Convolutional Neural Networks: Filters and Feature Maps with PyTorch

pytorch
This is a practice notebook for implementing a convolutional neural network (CNN) on the MNIST dataset with PyTorch. We will implement the now famous LeNet-5 from Yann LeCun, a 7-layer CNN from 1998. Then we will explore and visualize what the network learns, including filters, feature maps, and output layers.
Published

October 18, 2022

Introduction

In this notebook, we will train a LeNet-5 CNN model on the MNIST dataset with PyTorch. Below is a summary of the steps we follow.

  • Download the MNIST dataset. Split the data into Train and Validation datasets. Then convert them into mini-batches using the PyTorch DataLoader class
  • Create a Neural Net model configuration, an SGD optimizer, and a loss function
  • Create a pipeline that will train the model on given data and update the weights based on the loss
  • Visualize filters and feature maps from the trained model

Environment

This notebook is prepared with Google Colab.

Code
from platform import python_version
import numpy, matplotlib, pandas, torch

print("python==" + python_version())
print("numpy==" + numpy.__version__)
print("torch==" + torch.__version__)
print("matplotlib==" + matplotlib.__version__)
python==3.7.15
numpy==1.21.6
torch==1.12.1+cu113
matplotlib==3.2.2

Credits

This notebook takes inspiration and ideas from Daniel Voigt Godoy’s PyTorchStepByStep (github.com/dvgodoy/PyTorchStepByStep), among other sources.

Download MNIST Dataset

The MNIST dataset can be downloaded easily via PyTorch’s built-in datasets provided under torchvision.datasets. In this section, we will download its train and test splits and convert the images into PyTorch tensors.

  • Read more about the PyTorch MNIST dataset here
  • torchvision.transforms.Compose is like a container to hold a list of transformations you intend to apply. Read more about it here
  • torchvision.transforms.ToTensor converts a PIL Image or numpy.ndarray to tensor. It converts a PIL Image or numpy.ndarray (H x W x C) in the range [0, 255] to a torch.FloatTensor of shape (C x H x W) in the range [0.0, 1.0]. Here C=Channel, H=Height, W=Width. Read more about this transformation here
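To make the last point concrete, here is a minimal sketch of ToTensor in action; the random uint8 array below simply stands in for a PIL image or any image-like ndarray.

import numpy as np
import torchvision

img = np.random.randint(0, 256, size=(28, 28, 3), dtype=np.uint8)  # H x W x C, values in [0, 255]
t = torchvision.transforms.ToTensor()(img)
print(t.shape)                          # torch.Size([3, 28, 28]) -> C x H x W
print(t.min().item(), t.max().item())   # values rescaled into [0.0, 1.0]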
#collapse-output
import torchvision
import numpy as np

transform = torchvision.transforms.Compose([
    torchvision.transforms.ToTensor()
])

# pass the transform at construction time so every item comes back as a tensor
train_dataset = torchvision.datasets.MNIST('classifier_data', train=True, download=True, transform=transform)
test_dataset  = torchvision.datasets.MNIST('classifier_data', train=False, download=True, transform=transform)

print(f"Total training images: {len(train_dataset)}")
print(f"Shape of an image: {np.shape(train_dataset.data[7])}")
print(f"Values of an image: \n{train_dataset.data[7]}")
Total training images: 60000
Shape of an image: torch.Size([28, 28])
Values of an image: 
tensor([[  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
           0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0],
        [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
           0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0],
        [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
           0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0],
        [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
           0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0],
        [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
           0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0],
        [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,  38,  43, 105,
         255, 253, 253, 253, 253, 253, 174,   6,   0,   0,   0,   0,   0,   0],
        [  0,   0,   0,   0,   0,   0,   0,   0,   0,  43, 139, 224, 226, 252,
         253, 252, 252, 252, 252, 252, 252, 158,  14,   0,   0,   0,   0,   0],
        [  0,   0,   0,   0,   0,   0,   0,   0,   0, 178, 252, 252, 252, 252,
         253, 252, 252, 252, 252, 252, 252, 252,  59,   0,   0,   0,   0,   0],
        [  0,   0,   0,   0,   0,   0,   0,   0,   0, 109, 252, 252, 230, 132,
         133, 132, 132, 189, 252, 252, 252, 252,  59,   0,   0,   0,   0,   0],
        [  0,   0,   0,   0,   0,   0,   0,   0,   0,   4,  29,  29,  24,   0,
           0,   0,   0,  14, 226, 252, 252, 172,   7,   0,   0,   0,   0,   0],
        [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
           0,   0,   0,  85, 243, 252, 252, 144,   0,   0,   0,   0,   0,   0],
        [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
           0,   0,  88, 189, 252, 252, 252,  14,   0,   0,   0,   0,   0,   0],
        [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          91, 212, 247, 252, 252, 252, 204,   9,   0,   0,   0,   0,   0,   0],
        [  0,   0,   0,   0,   0,   0,   0,   0,   0,  32, 125, 193, 193, 193,
         253, 252, 252, 252, 238, 102,  28,   0,   0,   0,   0,   0,   0,   0],
        [  0,   0,   0,   0,   0,   0,   0,   0,  45, 222, 252, 252, 252, 252,
         253, 252, 252, 252, 177,   0,   0,   0,   0,   0,   0,   0,   0,   0],
        [  0,   0,   0,   0,   0,   0,   0,   0,  45, 223, 253, 253, 253, 253,
         255, 253, 253, 253, 253,  74,   0,   0,   0,   0,   0,   0,   0,   0],
        [  0,   0,   0,   0,   0,   0,   0,   0,   0,  31, 123,  52,  44,  44,
          44,  44, 143, 252, 252,  74,   0,   0,   0,   0,   0,   0,   0,   0],
        [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
           0,   0,  15, 252, 252,  74,   0,   0,   0,   0,   0,   0,   0,   0],
        [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
           0,   0,  86, 252, 252,  74,   0,   0,   0,   0,   0,   0,   0,   0],
        [  0,   0,   0,   0,   0,   0,   5,  75,   9,   0,   0,   0,   0,   0,
           0,  98, 242, 252, 252,  74,   0,   0,   0,   0,   0,   0,   0,   0],
        [  0,   0,   0,   0,   0,  61, 183, 252,  29,   0,   0,   0,   0,  18,
          92, 239, 252, 252, 243,  65,   0,   0,   0,   0,   0,   0,   0,   0],
        [  0,   0,   0,   0,   0, 208, 252, 252, 147, 134, 134, 134, 134, 203,
         253, 252, 252, 188,  83,   0,   0,   0,   0,   0,   0,   0,   0,   0],
        [  0,   0,   0,   0,   0, 208, 252, 252, 252, 252, 252, 252, 252, 252,
         253, 230, 153,   8,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0],
        [  0,   0,   0,   0,   0,  49, 157, 252, 252, 252, 252, 252, 217, 207,
         146,  45,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0],
        [  0,   0,   0,   0,   0,   0,   7, 103, 235, 252, 172, 103,  24,   0,
           0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0],
        [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
           0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0],
        [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
           0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0],
        [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
           0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0]],
       dtype=torch.uint8)

From the above cell output, there are 60,000 training images. The shape of each image is 28 x 28, which means it is a 2D matrix. We have also printed the values of one image, but they don’t make much sense unless we view them as an image. So let’s do that.

import matplotlib.pyplot as plt

plt.imshow(train_dataset.data[7], cmap='gray')
<matplotlib.image.AxesImage at 0x7feae1775310>

Load generated data into PyTorch Dataset and DataLoader class

Now let’s load our data into the Dataset and DataLoader classes. A PyTorch Dataset is a helper class that serves (data, label) tuples by index. DataLoader is another helper class that collates those tuples into batches. batch_size is the number of examples we want in a single batch. We have used 128 here, so each fetch from the DataLoader will give us a batch of 128 images along with their 128 labels. A minimal custom Dataset is sketched below.
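Here is that sketch: a minimal, hypothetical ToyDataset (the class and its data are made up purely for illustration).

from torch.utils.data import Dataset, DataLoader
import torch

class ToyDataset(Dataset):
    def __init__(self, n=10):
        self.x = torch.arange(n, dtype=torch.float32)
        self.y = (self.x % 2).long()        # a fake binary label
    def __len__(self):
        return len(self.x)
    def __getitem__(self, idx):
        return self.x[idx], self.y[idx]     # one (data, label) tuple

loader = DataLoader(ToyDataset(), batch_size=4)
for xb, yb in loader:
    print(xb, yb)                           # tuples collated into tensors of up to 4 examples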

import torch
from torch.utils.data import Dataset, DataLoader, random_split

train_size=len(train_dataset)

# Randomly split the data into non-overlapping train and validation set
# train size = 70% and validation size = 30%
train_data, val_data = random_split(train_dataset, [int(train_size*0.7), int(train_size - train_size*0.7)])

batch_size=128

# Load data into DataLoader class
train_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size)
valid_loader = torch.utils.data.DataLoader(val_data, batch_size=batch_size)

print(f"Batches in Train Loader: {len(train_loader)}")
print(f"Batches in Valid Loader: {len(valid_loader)}")

print(f"Examples in Train Loader: {len(train_loader.sampler)}")
print(f"Examples in Valid Loader: {len(valid_loader.sampler)}")
Batches in Train Loader: 329
Batches in Valid Loader: 141
Examples in Train Loader: 42000
Examples in Valid Loader: 18000
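Note how the batch counts follow from the sizes: the DataLoader rounds the last, partial batch up, so 42000 / 128 = 328.125 gives 329 training batches and 18000 / 128 = 140.625 gives 141 validation batches.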
##
# Helper function to plot images from DataLoader
def plot_images(images, targets, n_plot=30):
    n_rows = n_plot // 10 + ((n_plot % 10) > 0)
    fig, axes = plt.subplots(n_rows, 10, figsize=(15, 1.5 * n_rows))
    axes = np.atleast_2d(axes)

    for i, (image, target) in enumerate(zip(images[:n_plot], targets[:n_plot])):
        row, col = i // 10, i % 10    
        ax = axes[row, col]
        ax.set_title('#{} - Label:{}'.format(i, target), {'size': 12})
        # plot filter channel in grayscale
        ax.imshow(image.squeeze(), cmap='gray', vmin=0, vmax=1)

    for ax in axes.flat:
        ax.set_xticks([])
        ax.set_yticks([])
        ax.label_outer()

    plt.tight_layout()
    return fig

## Code taken from https://github.com/dvgodoy/PyTorchStepByStep/blob/master/plots/chapter5.py

Let’s plot some dataset images along with their labels from a batch.

images, labels = next(iter(train_loader))
fig = plot_images(images, labels, n_plot=40)

Define a class to implement training, validation, and mini-batch processing pipeline

In this section we will implement a class that encapsulates all the usual steps required to train a PyTorch model. This way we can focus more on the model architecture and performance, and be less concerned with the boilerplate training loop. The important parts of this class are:

  • __init__: Class constructor to define the main actors in a training cycle including model, optimizer, loss function, training and validation DataLoaders
  • _make_train_step_fn: The training pipeline is usually called the “training step” and includes the following steps
    1. Compute our model’s predicted output - the forward pass
    2. Compute the loss
    3. Compute gradients i.e., find the direction and scale to update the weights to reduce the loss
    4. Update weight parameters using gradients and the learning rate
  • _make_val_step_fn: The validation pipeline is usually called the “validation step” and includes the following steps
    1. Compute our model’s predicted output - the forward pass
    2. Compute the loss
    3. Note that during validation we only care about the loss, i.e., how well our model performs on the validation dataset, so we neither compute gradients nor update weights
  • _mini_batch: It loops over one DataLoader, processing one mini-batch at a time. For each mini-batch, we want to
    1. Get the next batch of data and labels (x, y) from the DataLoader iterator
    2. Perform a step on the batch. A step can be either training or validation
    3. Record the batch loss; the recorded losses are averaged over all batches at the end
  • train: Execute training and validation steps for a given number of epochs
  • predict: Make predictions from the model on the provided data
Code
class DeepLearningPipeline(object):
    def __init__(self, model, loss_fn, optimizer):
        # Here we define the attributes of our class
        
        # We start by storing the arguments as attributes 
        # to use them later
        self.model = model
        self.loss_fn = loss_fn
        self.optimizer = optimizer
        self.device = 'cuda' if torch.cuda.is_available() else 'cpu'
        # Let's send the model to the specified device right away
        self.model.to(self.device)

        # These attributes are defined here, but since they are
        # not informed at the moment of creation, we keep them None
        self.train_loader = None
        self.val_loader = None
        self.writer = None
        
        # These attributes are going to be computed internally
        self.losses = []
        self.val_losses = []
        self.total_epochs = 0

        # Creates the train_step function for our model, 
        # loss function and optimizer
        # Note: there are NO ARGS there! It makes use of the class
        # attributes directly
        self.train_step_fn = self._make_train_step_fn()
        # Creates the val_step function for our model and loss
        self.val_step_fn = self._make_val_step_fn()

    def set_loaders(self, train_loader, val_loader=None):
        # This method allows the user to define which train_loader (and val_loader, optionally) to use
        # Both loaders are then assigned to attributes of the class
        # So they can be referred to later
        self.train_loader = train_loader
        self.val_loader = val_loader

    def _make_train_step_fn(self):
        # This method does not need ARGS... it can refer to
        # the attributes: self.model, self.loss_fn and self.optimizer
        
        # Builds function that performs a step in the train loop
        def perform_train_step_fn(x, y):
            # Sets model to TRAIN mode
            self.model.train()

            # Step 1 - Computes our model's predicted output - forward pass
            yhat = self.model(x)
            # Step 2 - Computes the loss
            loss = self.loss_fn(yhat, y)
            # Step 3 - Computes gradients for all model parameters
            loss.backward()
            # Step 4 - Updates parameters using gradients and the learning rate
            self.optimizer.step()
            self.optimizer.zero_grad()

            # Returns the loss
            return loss.item()

        # Returns the function that will be called inside the train loop
        return perform_train_step_fn
    
    def _make_val_step_fn(self):
        # Builds function that performs a step in the validation loop
        def perform_val_step_fn(x, y):
            # Sets model to EVAL mode
            self.model.eval()

            # Step 1 - Computes our model's predicted output - forward pass
            yhat = self.model(x)
            # Step 2 - Computes the loss
            loss = self.loss_fn(yhat, y)
            # There is no need to compute Steps 3 and 4, 
            # since we don't update parameters during evaluation
            return loss.item()

        return perform_val_step_fn
            
    def _mini_batch(self, validation=False):
        # The mini-batch can be used with both loaders
        # The argument `validation` defines which loader and
        # corresponding step function is going to be used
        if validation:
            data_loader = self.val_loader
            step_fn = self.val_step_fn
        else:
            data_loader = self.train_loader
            step_fn = self.train_step_fn

        if data_loader is None:
            return None
            
        # Once the data loader and step function are set, this is the
        # same mini-batch loop we had before
        mini_batch_losses = []
        for x_batch, y_batch in data_loader:
            x_batch = x_batch.to(self.device)
            y_batch = y_batch.to(self.device)

            mini_batch_loss = step_fn(x_batch, y_batch)
            mini_batch_losses.append(mini_batch_loss)

        loss = np.mean(mini_batch_losses)
        return loss

    def set_seed(self, seed=42):
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False    
        torch.manual_seed(seed)
        np.random.seed(seed)
    
    def train(self, n_epochs, seed=42):
        # To ensure reproducibility of the training process
        self.set_seed(seed)

        for epoch in range(n_epochs):
            # Keeps track of the numbers of epochs
            # by updating the corresponding attribute
            self.total_epochs += 1

            # inner loop
            # Performs training using mini-batches
            loss = self._mini_batch(validation=False)
            self.losses.append(loss)

            # VALIDATION
            # no gradients in validation!
            with torch.no_grad():
                # Performs evaluation using mini-batches
                val_loss = self._mini_batch(validation=True)
                self.val_losses.append(val_loss)

            # If a SummaryWriter has been set...
            if self.writer:
                scalars = {'training': loss}
                if val_loss is not None:
                    scalars.update({'validation': val_loss})
                # Records both losses for each epoch under the main tag "loss"
                self.writer.add_scalars(main_tag='loss',
                                        tag_scalar_dict=scalars,
                                        global_step=epoch)
            
            print(f"epoch: {epoch:3}, train loss: {loss:.5f}, valid loss: {val_loss:.5f}")

        if self.writer:
            # Closes the writer
            self.writer.close()

    def predict(self, x):
        # Sets it to evaluation mode for predictions
        self.model.eval() 
        # Takes a Numpy input and makes it a float tensor
        x_tensor = torch.as_tensor(x).float()
        # Sends input to device and uses the model for prediction
        y_hat_tensor = self.model(x_tensor.to(self.device))
        # Set it back to train mode
        self.model.train()
        # Detaches it, brings it to CPU and back to Numpy
        return y_hat_tensor.detach().cpu().numpy()

    def plot_losses(self):
        fig = plt.figure(figsize=(10, 4))
        plt.plot(self.losses, label='Training Loss', c='b')
        plt.plot(self.val_losses, label='Validation Loss', c='r')
        plt.yscale('log')
        plt.xlabel('Epochs')
        plt.ylabel('Loss')
        plt.legend()
        plt.tight_layout()
        return fig

Create LeNet-5 model configuration

architecture_lenet.png

Source: Generated using Alexander Lenail’s NN-SVG and adapted by the author [“Daniel Voigt Godoy”]. For more details, see LeCun, Y., et al. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324

Image taken from dvgodoy/PyTorchStepByStep
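As a refresher on how the shape annotations in the code below come about: a convolution with kernel size K, padding P, and stride S turns an input of width W into an output of width (W - K + 2P)/S + 1, and a 2x2 max pooling halves each spatial dimension. So C1 gives (28 - 5 + 4)/1 + 1 = 28, pooled to 14; C3 gives (14 - 5)/1 + 1 = 10, pooled to 5; and C5 gives (5 - 5)/1 + 1 = 1.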

import torch.nn as nn

lenet = nn.Sequential()

# Featurizer
# Block 1: 1@28x28 -> 6@28x28 -> 6@14x14
lenet.add_module('C1', nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5, padding=2))
lenet.add_module('func1', nn.ReLU())
lenet.add_module('S2', nn.MaxPool2d(kernel_size=2))
# Block 2: 6@14x14 -> 16@10x10 -> 16@5x5
lenet.add_module('C3', nn.Conv2d(in_channels=6, out_channels=16, kernel_size=5))
lenet.add_module('func2', nn.ReLU())
lenet.add_module('S4', nn.MaxPool2d(kernel_size=2))
# Block 3: 16@5x5 -> 120@1x1
lenet.add_module('C5', nn.Conv2d(in_channels=16, out_channels=120, kernel_size=5))
lenet.add_module('func3', nn.ReLU())
# Flattening
lenet.add_module('flatten', nn.Flatten())

# Classification
# Hidden Layer
lenet.add_module('F6', nn.Linear(in_features=120, out_features=84))
lenet.add_module('func4', nn.ReLU())
# Output Layer
lenet.add_module('OUTPUT', nn.Linear(in_features=84, out_features=10))
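Before wiring up the optimizer, a quick shape sanity check; the zero tensor is just a stand-in for a real batch of one image.

dummy = torch.zeros(1, 1, 28, 28)   # batch of 1 image, 1 channel, 28x28
print(lenet(dummy).shape)           # torch.Size([1, 10]) -> one logit per digit class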

Now let’s create an optimizer and a loss function.

import torch.optim as optim

lr = 0.003
torch.manual_seed(42)

model = lenet

# Defines an SGD optimizer to update the parameters
optimizer = optim.SGD(model.parameters(), lr=lr)

# Defines a cross-entropy loss function for multiclass classification
loss_fn = nn.CrossEntropyLoss()
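As a side note, nn.CrossEntropyLoss expects raw logits and integer class labels and applies log-softmax internally, which is why the model above has no final softmax layer. A tiny sketch with made-up values:

logits = torch.randn(2, 10)              # a fake batch of 2 predictions over 10 classes
targets = torch.tensor([3, 7])           # the corresponding "true" classes
print(loss_fn(logits, targets).item())   # a single scalar loss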

Now let’s train our model for 20 epochs.

n_epochs = 20

dlp = DeepLearningPipeline(model, loss_fn, optimizer)
dlp.set_loaders(train_loader, valid_loader)
dlp.train(n_epochs)
epoch:   0, train loss: 2.29349, valid loss: 2.28632
epoch:   1, train loss: 2.27442, valid loss: 2.25871
epoch:   2, train loss: 2.22077, valid loss: 2.15503
epoch:   3, train loss: 1.85172, valid loss: 1.27419
epoch:   4, train loss: 0.82061, valid loss: 0.59110
epoch:   5, train loss: 0.50765, valid loss: 0.45872
epoch:   6, train loss: 0.41486, valid loss: 0.39295
epoch:   7, train loss: 0.36466, valid loss: 0.35130
epoch:   8, train loss: 0.33108, valid loss: 0.32108
epoch:   9, train loss: 0.30512, valid loss: 0.29665
epoch:  10, train loss: 0.28347, valid loss: 0.27550
epoch:  11, train loss: 0.26473, valid loss: 0.25711
epoch:  12, train loss: 0.24807, valid loss: 0.24056
epoch:  13, train loss: 0.23314, valid loss: 0.22590
epoch:  14, train loss: 0.21989, valid loss: 0.21282
epoch:  15, train loss: 0.20804, valid loss: 0.20097
epoch:  16, train loss: 0.19743, valid loss: 0.19028
epoch:  17, train loss: 0.18791, valid loss: 0.18065
epoch:  18, train loss: 0.17929, valid loss: 0.17199
epoch:  19, train loss: 0.17142, valid loss: 0.16408

Let’s see what our training and validation losses look like.

fig = dlp.plot_losses()
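We can also sanity-check the trained model with the pipeline’s predict method; a quick sketch on a single validation image:

x, y = val_data[0]
logits = dlp.predict(x.unsqueeze(0))   # numpy array of shape (1, 10)
print(f"Predicted: {logits.argmax()}, actual: {y}")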

Visualize model filters

Neural network convolutional layers are stacks of small square matrices called filters. We repeatedly slide these filters over images, and the outputs of this operation are called convolutions. These convolutions act as intermediate (or new) datasets generated from the images (a kind of dynamic feature engineering): the layers above learn from them, and a loss is calculated at the end. Initially, the filter values are chosen randomly, so the loss is high. Training then slowly changes the values, or weights, of these filters; changing the filter values also changes their outputs, or convolutions. If a change results in a slight decrease in the loss, we take it as a good sign and keep moving in that direction (make more similar changes). Repeating these steps can lead us to good convolutions (or good weights in the filters, as they are the source of the convolutions). By “good weights”, we mean that the final loss is significantly lower than what random values give.

These convolutional filters are created in PyTorch with the nn.Conv2d class. A filter can have one matrix or a stack of matrices under it. The matrices under a filter are sometimes called kernels, but I will stick to the terms filter and matrix to avoid confusion. Before looking at learned filters, the sketch below shows what a single hand-made filter does to an image.
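Here is that sketch: a hand-made vertical-edge filter applied with torch.nn.functional.conv2d to a tiny synthetic image (both the image and the filter values are invented for illustration).

import torch.nn.functional as F

# a 6x6 "image": dark left half, bright right half
img = torch.zeros(1, 1, 6, 6)
img[..., :, 3:] = 1.0
# one filter holding one 3x3 matrix that responds to vertical edges
edge = torch.tensor([[[[-1., 0., 1.],
                       [-1., 0., 1.],
                       [-1., 0., 1.]]]])
print(F.conv2d(img, edge).squeeze())   # strong responses only where the edge sits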

If we look at the LeNet-5 model configuration, we will find that we created three convolutional (nn.Conv2d) layers. Let’s print the dimensions of these layers.

model_weights = [] 
conv_layers = [] 
model_children = list(model.children())

# counter to keep count of the conv layers
counter = 0
# append all the conv layers and their respective weights to the list
for child in model_children:
    if isinstance(child, nn.Conv2d):
        counter += 1
        model_weights.append(child.weight)
        conv_layers.append(child)

print(f"Total convolutional layers: {counter}")

for weight, conv in zip(model_weights, conv_layers):
    print(f"CONV: {conv} ====> SHAPE: {weight.shape}")
Total convolutional layers: 3
CONV: Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2)) ====> SHAPE: torch.Size([6, 1, 5, 5])
CONV: Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1)) ====> SHAPE: torch.Size([16, 6, 5, 5])
CONV: Conv2d(16, 120, kernel_size=(5, 5), stride=(1, 1)) ====> SHAPE: torch.Size([120, 16, 5, 5])

The output of the above cell tells us that

  • There are three convolutional layers
  • The first layer’s weights have shape [6, 1, 5, 5]: 6 filters, each holding 1 matrix of dimension 5x5
  • The second layer’s weights have shape [16, 6, 5, 5]: 16 filters, each holding 6 matrices of size 5x5
  • The third layer’s weights have shape [120, 16, 5, 5]: 120 filters, each holding 16 matrices of size 5x5

Note that the learned weights of these matrices are stored in the model_weights list, which we will visualize in the next section.
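For instance, we can index into a layer’s weights to pull out a single matrix (a quick sketch):

# the first 5x5 matrix of the first filter in the first Conv2d layer
print(model_weights[0][0, 0].shape)   # torch.Size([5, 5])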

##
# Helper function to visualize filters
def visualize_filters(layer, n_plots=30):
    # Get layer dimensions. e.g. `[6, 1, 5, 5]` where
    # filters=6, kernels=1, kernel_height=5, kernel_width=5
    filters, kernels, kernel_height, kernel_width = layer.shape
    # total plots = total number of matrices present in a layer.
    # Each matrix's weights can be plotted as an image
    total_plots = filters * kernels
    # total_plots can be too many. So let's create an upper limit on them as 'MAX_PLOTS'
    MAX_PLOTS = min(n_plots, total_plots)
    # number of columns for our plots 'MAX_COL_PLOTS'
    MAX_COL_PLOTS = 10
    # number of rows for our plots 'MAX_ROW_PLOTS' (ceiling division, so every plot gets a slot)
    MAX_ROW_PLOTS = -(-MAX_PLOTS // MAX_COL_PLOTS)

    # specify some size of each plot image
    plt.figure(figsize=(20, 2.5*MAX_ROW_PLOTS)) # width, height
    plt.tight_layout()
    
    plot_count = 1
    # iterate over filters
    for f, filt in enumerate(layer):
        # iterate over kernels (matrices) under each filter
        for k, kernel in enumerate(filt):
            # plot a single kernel or a matrix
            plt.subplot(MAX_ROW_PLOTS, MAX_COL_PLOTS, plot_count)
            plt.imshow(kernel[:, :].detach().cpu().numpy(), cmap='gray')
            plt.title('#F:{} - K:{}'.format(f, k), {'size': 12})
            plt.axis('off')
            plot_count += 1
            
            # terminate on `MAX_PLOTS` 
            if plot_count > MAX_PLOTS:
                return plt.show()
    
    plt.show()

Visualize weights for first ‘Conv2d’ layer

visualize_filters(model_weights[0], 50)

Visualize weights for second ‘Conv2d’ layer

visualize_filters(model_weights[1], 50)

Visualize weights for third ‘Conv2d’ layer

visualize_filters(model_weights[2], 50)

What do these filter images tell us?

These learned filter (2D matrix) images seem very random. How can these filters create an output (a convolution, or intermediate dataset) that helps our model learn to classify an image into its correct class? Filters usually learn to find edges and curves in images. When these filters are applied to images, they amplify certain aspects of them, like edges, curves, lines, or other patterns.

Looking just at the filters does not give us much information. So in the next section, we will visualize the outputs (or convolutions) produced by these filters.

Visualize feature maps for convolutional layers

The output produced by a neural net layer is called its feature map. These layers can be convolutional, flattening, or linear (fully connected). For example, in our LeNet-5 model, we have three convolutional layers, and each of these layers produces a feature map. In this section, we will visualize them.

Define a hook

A hook is simply a function that we register on a module so that PyTorch executes it after the module’s forward (or backward) pass. We attach (or register) a hook directly on the layer we want to observe, e.g. model.C1.register_forward_hook(...). A forward hook function takes three arguments.

  • a layer or a model
  • a tensor representing the inputs to the layer or model
  • a tensor representing the outputs from the layer or model

Let’s define a function that will serve as our hook.

##
# 'activation' is a dictionary to store the output from the layer
# It should be defined outside the function, otherwise we will not be able to access it.
activation = {}
def get_activation(name):
    def hook(model, input, output):
        activation[name] = output.detach()
    return hook

Let’s also define a function to visualize the feature maps.

##
# Helper function to visualize the feature maps
def visualize_feature_map(layer, n_plots=30, cmap='gray', repeats=5, figsize=(5, 5)):
    # get feature map values and store them as 'act'
    # (detached and moved to the CPU so matplotlib/numpy can handle them)
    act = layer.squeeze().detach().cpu()

    # if the feature map has three dimensions (channels x height x width)
    if len(act.shape) == 3:
        total_plots, plot_height, plot_width = act.shape

        # total_plots can be too many so let's create an upper limit on them as 'MAX_PLOT'
        MAX_PLOT = min(total_plots, n_plots)
        # number of columns for our plots as 'MAX_COL_PLOTS'
        MAX_COL_PLOTS = 6
        # number of rows for our plots as 'MAX_ROW_PLOTS' (ceiling division)
        MAX_ROW_PLOTS = -(-MAX_PLOT // MAX_COL_PLOTS)

        # specify some size for each plot image
        plt.figure(figsize=(20, 3.5*MAX_ROW_PLOTS)) # width, height
        plt.tight_layout()

        plot_count = 0
        for i in range(MAX_ROW_PLOTS):
            for j in range(MAX_COL_PLOTS):
                plt.subplot(MAX_ROW_PLOTS, MAX_COL_PLOTS, plot_count+1)
                plt.imshow(act[plot_count].detach().cpu().numpy(), cmap=cmap)
                plt.title('#R:{} - C:{}'.format(i, j), {'size': 12})
                plt.axis('off')
                plot_count += 1

                # terminate if plot_count reaches MAX_PLOT
                if plot_count >= MAX_PLOT:
                    return plt.show()
    else:
        # the feature map is 1-D (e.g. the 120-value output of C5 or the logits)
        arr_r = np.repeat(act.reshape(1,-1), repeats=repeats, axis=0)
        plt.figure(figsize=figsize) # width, height
        plt.tight_layout()
        plt.imshow(arr_r, cmap=cmap)
        plt.axis('off')

        return plt.show()

Feature map from first ‘Conv2d’ layer

Let’s register our hook to the first convolutional layer and visualize its feature map.

##
# empty 'activation' as a precaution
activation = {}

# first 'Conv2d' layer is named as 'C1'
handle = model.C1.register_forward_hook(get_activation('C1'))

# take any dataset image; index 7 holds an image of the digit 3
data, label = train_dataset[7]
data.unsqueeze_(0)
# send the image to the same device as the model before the forward pass
output = model(data.to(dlp.device))

# remove hook
handle.remove()

print(f"Dimensions for C1 feature map: {activation['C1'].squeeze().shape}")
Dimensions for C1 feature map: torch.Size([6, 28, 28])
##
# visualize the feature map
# dimensions: 6@28x28
visualize_feature_map(activation['C1'])

Feature map from second ‘Conv2d’ layer

Let’s register our hook to the second convolutional layer and visualize its feature map.

##
# empty 'activation' as a precaution
activation = {}

# second 'Conv2d' layer is named as 'C3'
handle = model.C3.register_forward_hook(get_activation('C3'))

# take any dataset image; index 7 holds an image of the digit 3
data, label = train_dataset[7]
data.unsqueeze_(0)
output = model(data.to(dlp.device))

# remove hook
handle.remove()

print(f"Dimensions for C3 feature map: {activation['C3'].squeeze().shape}")
Dimensions for C3 feature map: torch.Size([16, 10, 10])
##
# visualize the feature map
# dimensions: 16@10x10
visualize_feature_map(activation['C3'])

Feature maps from the first and second ‘Conv2d’ layers together

Feature maps from the first layer are sharper, while feature maps from the second layer are more spread out, or convolved. Through this spreading-out or blurring effect, it seems that only the most significant features remain in the output while the rest slowly disappear. For example, in the case of the ‘3’, only the horizontal edge signal remains in the second layer’s feature maps, and any other signal gets dissolved.

To get a better intuition of what is happening here, let’s visualize both feature maps together and for multiple images.

##
# Visualize feature maps for C1 and C3 together
# Visualize them for first 5 train images
for i in range(5):
    # just a separator.
    print(f"{'*'*30} IMAGE {i} {'*'*30}")

    # empty 'activation' as a precaution
    activation = {}

    # create hooks for C1 and C3
    handle1 = model.C1.register_forward_hook(get_activation('C1'))
    handle2 = model.C3.register_forward_hook(get_activation('C3'))
    
    data, _ = train_dataset[i]
    data.unsqueeze_(0)
    output = model(data.to(dlp.device))

    # remove hooks
    handle1.remove()
    handle2.remove()

    # visualize feature maps
    # I have changed the colors of the output to sharpen the differences
    visualize_feature_map(activation['C1'], cmap='viridis')
    visualize_feature_map(activation['C3'], cmap='viridis')
****************************** IMAGE 0 ******************************

****************************** IMAGE 1 ******************************

****************************** IMAGE 2 ******************************

****************************** IMAGE 3 ******************************

****************************** IMAGE 4 ******************************

Feature map from third ‘Conv2d’ layer

Let’s register our hook to the third convolutional layer and visualize its feature map.

##
# empty 'activation' as a precaution
activation = {}

# third 'Conv2d' layer is named as 'C5'
handle = model.C5.register_forward_hook(get_activation('C5'))

# take any dataset image; index 7 holds an image of the digit 3
data, label = train_dataset[7]
data.unsqueeze_(0)
output = model(data.to(dlp.device))

# remove hook
handle.remove()

print(f"Dimensions for C5 feature map: {activation['C5'].squeeze().shape}")
Dimensions for C5 feature map: torch.Size([120])
##
# visualize the feature map
# dimensions: 120@1x1
visualize_feature_map(activation['C5'], figsize=(20,15))

Feature map from third ‘Conv2d’ layer for multiple images

The output from the third ‘Conv2d’ layer C5 looks like a signature bar code that the model learns to associate with a particular target class. Let’s visualize more of them for a set of images of the digits 1, 3, and 5.

##
# create a collection of 1, 3, and 5 images
bucket = {'1':[], '3':[], '5':[]}

# iterate through the dataset till we have 5 images for each class
count_1, count_3, count_5 = 0,0,0
for idx in range(100):
    _, label = train_dataset[idx]

    if label == 1 and count_1 < 5:
        bucket[str(label)].append(idx)
        count_1 +=1
    elif label == 3 and count_3 < 5:
        bucket[str(label)].append(idx)
        count_3 +=1
    elif label == 5 and count_5 < 5:
        bucket[str(label)].append(idx)
        count_5 +=1

    if count_1 + count_3 + count_5 == 15:
        break
##
# visualize feature maps for a set of images from class 1, 3 and 5
for key in bucket:
    print(f"{'*'*30} LAYER C5, LABEL {key} {'*'*30}")

    for i in range(len(bucket[key])):
        activation = {}
        
        # attach hook
        handle = model.C5.register_forward_hook(get_activation('C5'))
        idx = bucket[key][i]
        data, label = train_dataset[idx]
        data.unsqueeze_(0)
        output = model(data.to(dlp.device))
        
        # remove hook
        handle.remove()
        
        # visualize feature map
        # I have changed the output colormap to sharpen the differences
        visualize_feature_map(activation['C5'], cmap='hsv', figsize=(20,15))
****************************** LAYER C5, LABEL 1 ******************************

****************************** LAYER C5, LABEL 3 ******************************

****************************** LAYER C5, LABEL 5 ******************************

Visualize feature maps for a classifier or hidden layer

We have two ‘nn.Linear’ layers, also called fully connected layers, in the classification part of our model: the hidden layer F6 and the output layer OUTPUT. Let’s visualize the feature maps for them.

Visualize feature map for first ‘Linear’ layer

Let’s visualize the feature maps for the first hidden layer.

# visualize differences for images 1,3 and 5
for key in bucket:
    print(f"{'*'*30} LAYER F6, LABEL {key} {'*'*30}")
    for i in range(len(bucket[key])):

        activation = {}
        handle1 = model.F6.register_forward_hook(get_activation('F6'))
        idx = bucket[key][i]
        data, label = train_dataset[idx]
        
        data.unsqueeze_(0)
        output = model(data.to(dlp.device))
        handle1.remove()
        
        visualize_feature_map(activation['F6'], cmap='hsv', figsize=(20,15))
****************************** LAYER F6, LABEL 1 ******************************

****************************** LAYER F6, LABEL 3 ******************************

****************************** LAYER F6, LABEL 5 ******************************

Visualize feature map for second ‘Linear’ layer

Let’s visualize the feature maps for the second Linear layer, which is the model’s output layer.

# visualize differences for images 1,3 and 5
for key in bucket:
    print(f"{'*'*30} LAYER OUTPUT, LABEL {key} {'*'*30}")
    for i in range(len(bucket[key])):

        activation = {}
        handle1 = model.OUTPUT.register_forward_hook(get_activation('OUTPUT'))
        idx = bucket[key][i]
        data, label = train_dataset[idx]
        
        data.unsqueeze_(0)
        output = model(data.to(dlp.device))
        handle1.remove()
        
        visualize_feature_map(activation['OUTPUT'], repeats=1,cmap='hsv', figsize=(10,5))
****************************** LAYER OUTPUT, LABEL 1 ******************************

****************************** LAYER OUTPUT, LABEL 3 ******************************

****************************** LAYER OUTPUT, LABEL 5 ******************************

What do these OUTPUT feature maps tell us?

We can see that the output feature maps for images of the same digit are very similar. They show that our model has learned some hidden patterns that distinguish these images. For example:

  • For label 1, all output feature maps have a RED bar at the extreme left and a BLUE bar at the right side
  • For label 3, there are two RED bars around the middle and a light BLUE bar running along them
  • For label 5, RED bars are in the middle and at the extreme right side.

If we look at the first feature map for label 5, it is a bit different, a kind of outlier among the rest of the label 5 feature maps. What could be the reason for this?

Let’s plot these label 5 images to check their actual shape.

##
# label 5 image indexes
bucket['5']
[0, 11, 35, 47, 65]
##
# Plot label 5 images
plt.figure(figsize=(20, 10))
for i, idx in enumerate(bucket['5']):
    plt.subplot(1, 5, i+1)
    plt.imshow(train_dataset.data[idx], cmap='gray')
    plt.title(f"#Index:{idx} ")
    plt.axis('off')

Now we can see why the output feature map of the first label ‘5’ image is quite different from the rest: the leftmost digit is written in a slightly unusual style, and that oddness is reflected in the feature map it generates.