Sitemap

A list of all the posts and pages found on the site. For the robots out there, an XML version is available for digesting as well.

Pages

Posts

How to Implement a Convolutional Layer from Scratch?

4 minute read

Published:

A tutorial on implementing a convolutional layer from scratch with NumPy, covering both forward and backward propagation.

Introduction

Convolutional Neural Networks (CNNs) have been widely adopted in various Computer Vision tasks and have achieved remarkable performance. With the development of deep learning frameworks such as PyTorch and TensorFlow, researchers can quickly build CNN models and start experimenting in a short amount of time. While these frameworks undoubtedly accelerate research progress and allow us to avoid reinventing the wheel, it is still crucial to understand the details of a convolutional layer to foster further innovation. Motivated by this, in this technical blog we present an implementation of a convolutional layer from scratch.

Implementation

Forward Pass

Implementation of the forward pass for a convolutional layer. The input consists of N data points, each with C channels, height H and width W. We convolve each input with F different filters, where each filter spans all C channels and has height HH and width WW.

Input:

  • x: Input data of shape (N, C, H, W)
  • w: Filter weights of shape (F, C, HH, WW)
  • b: Biases, of shape (F,)
  • conv_param: A dictionary with the following keys:
      • 'stride': The number of pixels between adjacent receptive fields in the horizontal and vertical directions.
      • 'pad': The number of pixels that will be used to zero-pad the input.

Output:

  • out: Output data, of shape (N, F, H’, W’) where H’ and W’ are given by: H’ = 1 + (H + 2 * pad - HH) / stride and W’ = 1 + (W + 2 * pad - WW) / stride
  • cache: (x, w, b, conv_param)
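
For example, with a 32 × 32 input (H = W = 32), 3 × 3 filters (HH = WW = 3), pad = 1, and stride = 1, the output spatial size is H' = W' = 1 + (32 + 2 * 1 - 3) / 1 = 32, so the spatial resolution is preserved.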

Python Code

import numpy as np


def conv_forward(x, w, b, conv_param):
    N, C, H, W = x.shape
    F, _, HH, WW = w.shape
    pad = conv_param['pad']
    stride = conv_param['stride']

    # Output spatial dimensions.
    H_ = 1 + (H + 2 * pad - HH) // stride
    W_ = 1 + (W + 2 * pad - WW) // stride
    out = np.zeros((N, F, H_, W_))

    # Zero-pad the spatial dimensions only.
    x_padded = np.pad(x, [(0,), (0,), (pad,), (pad,)])

    # For every image, filter, and output location, take the dot product
    # of the filter with the corresponding receptive field.
    for n in range(N):
        for f in range(F):
            for i in range(H_):
                for j in range(W_):
                    out[n, f, i, j] = np.sum(
                        x_padded[n, :, i * stride:i * stride + HH,
                                 j * stride:j * stride + WW]
                        * w[f]) + b[f]

    cache = (x, w, b, conv_param)
    return out, cache
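
As a quick sanity check, the short snippet below (a minimal usage sketch with arbitrarily chosen sizes, not values from the post, reusing the import and conv_forward defined above) runs the forward pass on random data and compares the output shape against the formula above.

# Arbitrary example sizes: 2 RGB images of 8 x 8, four 3 x 3 filters.
x = np.random.randn(2, 3, 8, 8)
w = np.random.randn(4, 3, 3, 3)
b = np.random.randn(4)
conv_param = {'stride': 2, 'pad': 1}

out, cache = conv_forward(x, w, b, conv_param)
# H' = 1 + (8 + 2 * 1 - 3) // 2 = 4, and likewise W' = 4.
print(out.shape)  # (2, 4, 4, 4)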

Backward Pass

Implementation of the backward pass for a convolutional layer. The key observation is that the gradient with respect to the input is itself a convolution: the upstream gradient (dilated by the stride and zero-padded) is convolved with the filters rotated by 180 degrees, which is exactly what the code below does. The notation follows that of the forward pass.

Input:

  • dout: Upstream derivatives.
  • cache: A tuple of (x, w, b, conv_param) as in conv_forward

Output:

  • dx: Gradient with respect to x
  • dw: Gradient with respect to w
  • db: Gradient with respect to b

Python Code

def conv_backward(dout, cache):
    x, w, b, conv_param = cache
    N, C, H, W = x.shape
    F, _, HH, WW = w.shape
    stride, pad = conv_param['stride'], conv_param['pad']
    H_ = 1 + (H + 2 * pad - HH) // stride
    W_ = 1 + (W + 2 * pad - WW) // stride
    x_padded = np.pad(x, [(0,), (0,), (pad,), (pad,)])

    # Bias gradient: sum the upstream gradient over everything except the
    # filter dimension.
    db = np.sum(dout, axis=(0, 2, 3))

    # Weight gradient: each dw[f, c, i, j] sums (input pixel) * (upstream
    # gradient) over all images and output positions.
    dw = np.zeros_like(w)
    for f in range(F):
        for c in range(C):
            for i in range(HH):
                for j in range(WW):
                    dw[f, c, i, j] = np.sum(
                        x_padded[:, c, i:i + H_ * stride:stride,
                                 j:j + W_ * stride:stride]
                        * dout[:, f, :, :])

    # Input gradient: convolve the dilated, padded upstream gradient with
    # the 180-degree rotated filters.
    w_rot = np.rot90(w, 2, axes=(2, 3))
    w_rot = np.moveaxis(w_rot, 0, 1)  # shape (C, F, HH, WW)
    _, _, h_out, w_out = dout.shape

    # Insert (stride - 1) zeros between upstream gradient values, then pad.
    H_out = h_out + (stride - 1) * (h_out - 1)
    W_out = w_out + (stride - 1) * (w_out - 1)
    new_out = np.zeros((N, F, H_out, W_out))
    new_out[:, :, ::stride, ::stride] = dout
    new_out = np.pad(new_out, [(0,), (0,), (HH - 1,), (WW - 1,)])

    # Reuse the forward pass with stride 1 and no padding, then crop off
    # the rows and columns that correspond to the original zero-padding.
    dx_padded = conv_forward(new_out, w_rot, np.zeros(C),
                             {'stride': 1, 'pad': 0})[0]
    dx = dx_padded[:, :, pad:pad + H, pad:pad + W]

    return dx, dw, db
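
To gain confidence in the backward pass, we can compare the analytic gradients against numerical ones. The centered-difference helper below is written just for this check (it is not part of the layer implementation), and the sizes are again arbitrary choices.

def numerical_gradient(f, x, df, h=1e-5):
    # Centered-difference approximation of the gradient of f at x,
    # contracted with the upstream gradient df.
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=['multi_index'])
    while not it.finished:
        idx = it.multi_index
        old = x[idx]
        x[idx] = old + h
        fxph = f(x)
        x[idx] = old - h
        fxmh = f(x)
        x[idx] = old
        grad[idx] = np.sum((fxph - fxmh) * df) / (2 * h)
        it.iternext()
    return grad

x = np.random.randn(2, 3, 7, 7)
w = np.random.randn(3, 3, 3, 3)
b = np.random.randn(3)
conv_param = {'stride': 2, 'pad': 1}

out, cache = conv_forward(x, w, b, conv_param)
dout = np.random.randn(*out.shape)
dx, dw, db = conv_backward(dout, cache)

dx_num = numerical_gradient(lambda x: conv_forward(x, w, b, conv_param)[0], x, dout)
dw_num = numerical_gradient(lambda w: conv_forward(x, w, b, conv_param)[0], w, dout)
db_num = numerical_gradient(lambda b: conv_forward(x, w, b, conv_param)[0], b, dout)

# All three differences should be tiny (on the order of 1e-8 or smaller).
print(np.max(np.abs(dx - dx_num)))
print(np.max(np.abs(dw - dw_num)))
print(np.max(np.abs(db - db_num)))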

Limitations

We have presented a simple and concise Python implementation of a convolutional layer from scratch. However, the direct looping over the input data and filters is computationally expensive, especially for large inputs and filters. Several optimization techniques can address this inefficiency: vectorization, FFT-based convolution, parallelization, and Winograd's algorithm. Nevertheless, the primary aim of this post is to unravel the mysteries inside convolutional neural networks, and it serves as a good starting point for learning about CNNs and their inner workings.
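
To illustrate the vectorization idea, here is a rough im2col-style sketch of the forward pass (one possible approach under the same (N, C, H, W) layout as above, not the post's reference implementation): every receptive field is unrolled into a column so that the convolution reduces to a batched matrix multiplication.

def conv_forward_im2col(x, w, b, conv_param):
    # Vectorized forward-pass sketch using an explicit im2col step.
    N, C, H, W = x.shape
    F, _, HH, WW = w.shape
    stride, pad = conv_param['stride'], conv_param['pad']
    H_ = 1 + (H + 2 * pad - HH) // stride
    W_ = 1 + (W + 2 * pad - WW) // stride

    x_padded = np.pad(x, [(0,), (0,), (pad,), (pad,)])

    # Gather every receptive field into a column: (N, C*HH*WW, H_*W_).
    cols = np.zeros((N, C * HH * WW, H_ * W_))
    for i in range(H_):
        for j in range(W_):
            patch = x_padded[:, :, i * stride:i * stride + HH,
                             j * stride:j * stride + WW]
            cols[:, :, i * W_ + j] = patch.reshape(N, -1)

    # A single batched matrix multiplication replaces the four nested loops.
    out = w.reshape(F, -1) @ cols + b.reshape(1, F, 1)
    return out.reshape(N, F, H_, W_)

The remaining loops run only over the output positions, and the arithmetic itself is handled by optimized BLAS routines; the trade-off is the extra memory needed to hold the unrolled columns.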

portfolio

publications

talks

teaching
