Some real examples can be found in, e.g., the PyTorch examples repository:
Example 1: MNIST forward
    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = self.conv2(x)
        x = F.relu(x)
        x = F.max_pool2d(x, 2)
        x = self.dropout1(x)
        x = torch.flatten(x, 1)
        x = self.fc1(x)
        x = F.relu(x)
        x = self.dropout2(x)
        x = self.fc2(x)
        output = F.log_softmax(x, dim=1)
        return output
As with the _ = f(_) trick, this turns out to be surprisingly difficult to improve.
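For comparison, here is one way the same forward pass could be written with a small, hypothetical pipe helper (not part of any existing proposal; functools.reduce does the threading). The lambdas are needed wherever a step takes extra arguments, which is part of why none of this feels like a clear win:

from functools import reduce

def pipe(value, *fns):
    # Thread a value through each callable in turn: pipe(x, f, g) == g(f(x)).
    return reduce(lambda acc, fn: fn(acc), fns, value)

With that helper, the forward from Example 1 would become (inside the same Module class):

    def forward(self, x):
        return pipe(
            x,
            self.conv1, F.relu,
            self.conv2, F.relu,
            lambda t: F.max_pool2d(t, 2),
            self.dropout1,
            lambda t: torch.flatten(t, 1),
            self.fc1, F.relu,
            self.dropout2,
            self.fc2,
            lambda t: F.log_softmax(t, dim=1),
        )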
Example 2: GCN forward
    def forward(self, input_tensor, adj_mat):
        # Perform the first graph convolutional layer
        x = self.gc1(input_tensor, adj_mat)
        x = F.relu(x) # Apply ReLU activation function
        x = self.dropout(x) # Apply dropout regularization
        # Perform the second graph convolutional layer
        x = self.gc2(x, adj_mat)
        # Apply log-softmax activation function for classification
        return F.log_softmax(x, dim=1)
The same pattern again.
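The hypothetical pipe helper sketched above fits even more awkwardly here, since both graph-convolution layers also take adj_mat and have to be wrapped in lambdas:

    def forward(self, input_tensor, adj_mat):
        return pipe(
            input_tensor,
            lambda t: self.gc1(t, adj_mat),  # extra argument forces a lambda
            F.relu,
            self.dropout,
            lambda t: self.gc2(t, adj_mat),
            lambda t: F.log_softmax(t, dim=1),
        )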
Example 3: GCN features
    # Process features
    features = torch.FloatTensor(content_tensor[:, 1:-1].astype(np.int32)) # Extract feature values
    scale_vector = torch.sum(features, dim=1) # Compute sum of features for each node
    scale_vector = 1 / scale_vector # Compute reciprocal of the sums
    scale_vector[scale_vector == float('inf')] = 0 # Handle division by zero cases
    scale_vector = torch.diag(scale_vector).to_sparse() # Convert the scale vector to a sparse diagonal matrix
    features = scale_vector @ features # Scale the features using the scale vector
Similar patterns also occur frequently with NumPy. An improvement for this kind of code would be nice, but it's not obvious that any of the proposals so far would really help.
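To illustrate the NumPy point, array-preprocessing code often reads like this (an illustrative sketch, not taken from any particular project):

import numpy as np

def preprocess(x):
    # The same assign-back-to-the-same-name pattern, in NumPy.
    x = np.asarray(x, dtype=np.float64)
    x = x - x.mean(axis=0)
    x = x / (x.std(axis=0) + 1e-8)
    x = np.clip(x, -3.0, 3.0)
    return x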