Backpropagation in a convolutional network

The core equations of backpropagation in a network with fully-connected layers are (BP1)–(BP4) (link). Suppose we have a network containing a convolutional layer, a max-pooling layer, and a fully-connected output layer, as in the network discussed above. How are the equations of backpropagation modified?

In this post, I do not follow Michael Nielsen’s notation from Neural networks and deep learning above; rather, I use the notation from Learning From Data – A Short Course: $x^{(l)} = \theta(s^{(l)})$ (with $\theta$ being the activation function) and $s^{(l)} = (W^{(l)})^{\mathsf T} x^{(l-1)}$ (roughly), and $e$ is the cost function.

So we have 3 layers (I don’t count input layer):

• $l = 0$: Input layer.
• $l = 1$: Convolutional layer.
• $l = 2$: Pooling layer.
• $l = 3$: Output layer.

We also have the weights $W^{(1)}$ connecting $l = 0$ and $l = 1$, and $W^{(3)}$ connecting $l = 2$ and $l = 3$. There is no $W^{(2)}$ because I believe the pooling layer is non-parametric.
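To make the layer structure concrete, here is a minimal numpy sketch of the forward pass under some assumed sizes (a 4×4 input, a single 2×2 kernel, one pooling region covering the whole feature map, a sigmoid output); all names and dimensions are hypothetical, not taken from the original post.

```python
import numpy as np

rng = np.random.default_rng(0)

# l = 0: input layer, a hypothetical 4x4 "image"
x0 = rng.standard_normal((4, 4))

# l = 1: convolutional layer with one 2x2 kernel W1 (valid convolution),
#        followed by a ReLU activation as an example theta
W1 = rng.standard_normal((2, 2))
s1 = np.array([[np.sum(W1 * x0[i:i+2, j:j+2]) for j in range(3)]
               for i in range(3)])
x1 = np.maximum(s1, 0.0)

# l = 2: max-pooling layer -- non-parametric, so there is no W2;
#        for simplicity a single pool over the whole 3x3 map,
#        with theta = identity, so x2 is just a number
s2 = np.max(x1)
x2 = s2

# l = 3: fully-connected output layer with scalar weight W3
W3 = rng.standard_normal()
s3 = W3 * x2
x3 = 1.0 / (1.0 + np.exp(-s3))   # sigmoid output

print(x1.shape, float(x2), float(x3))
```

In a real network the pooling would of course run over many small regions (e.g. 2×2 windows), producing one number per region rather than one number overall.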

We have: $\delta^{(l)} = \frac{\partial e}{\partial s^{(l)}}$. It’s easy to compute $\delta^{(3)}$. How about $\delta^{(2)}$? If $\theta(s) = \max(s)$ (max-pooling) or $\theta(s) = \lVert s \rVert_2$ (L2 pooling), then $s$ must be a vector. Hence it would be a good idea to take $s^{(2)} = \max\big(x^{(1)}\big)$ with $\theta(s) = s$ (identity function) for max-pooling, or $s^{(2)} = \lVert x^{(1)} \rVert_2$ for L2 pooling; in such cases $s^{(2)}$ is still a number. There are no parameters to learn at the pooling layer, so we can go straight to the gradient update step. As hinted above, at this step we can define $\frac{\partial s^{(2)}}{\partial x^{(1)}_i} = \mathbb{1}\big[x^{(1)}_i = \max(x^{(1)})\big]$ for max-pooling and $\frac{\partial s^{(2)}}{\partial x^{(1)}_i} = \frac{x^{(1)}_i}{\lVert x^{(1)} \rVert_2}$ for L2 pooling. For the derivative of the max function, you can make a reference to Derivative of the max function.
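These two pooling derivatives can be sketched in a few lines of numpy: for max-pooling, the upstream gradient is routed entirely to the element that achieved the max; for L2 pooling, each element receives a share $x_i / \lVert x \rVert_2$. The function names are my own, not from any library.

```python
import numpy as np

def max_pool_backward(x, upstream):
    """Route the upstream gradient to the max element of x (ties: first max)."""
    grad = np.zeros_like(x)
    grad.flat[np.argmax(x)] = upstream
    return grad

def l2_pool_backward(x, upstream):
    """d||x||_2 / dx_i = x_i / ||x||_2, scaled by the upstream gradient."""
    return upstream * x / np.linalg.norm(x)

x = np.array([1.0, 3.0, 2.0])
print(max_pool_backward(x, 1.0))   # gradient lands on the 3.0 entry only
print(l2_pool_backward(x, 1.0))
```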

However, what plays the vital role is how we compute $\frac{\partial e}{\partial W^{(1)}}$, the gradient with respect to the shared kernel weights of the convolutional layer.

Dear future self, you should remember that at this time I suck at multivariate calculus (do not be surprised — I got this far on single-variable calculus alone, special thanks to Herbert Gross), so DO NOT TRUST ME blindly from this step onward. Thanks.

Well, by the chain rule, I guess this is how we calculate $\frac{\partial e}{\partial w_{ab}}$:

$$\frac{\partial e}{\partial w_{ab}} = \sum_{i}\sum_{j} \frac{\partial e}{\partial s^{(1)}_{ij}} \frac{\partial s^{(1)}_{ij}}{\partial w_{ab}} = \sum_{i}\sum_{j} \delta^{(1)}_{ij}\, x^{(0)}_{i+a,\,j+b},$$

with $w_{ab}$ being the value of the weight at position $(a, b)$ of the convolutional layer’s kernel matrix. The sum over $(i, j)$ appears because the kernel weights are shared: $w_{ab}$ touches every position of the feature map, so every $\delta^{(1)}_{ij}$ contributes to its gradient.
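As a sanity check on that chain-rule sum, here is a small numpy sketch that computes $\frac{\partial e}{\partial w_{ab}} = \sum_{i,j} \delta^{(1)}_{ij}\, x^{(0)}_{i+a,j+b}$ and compares it against a numerical gradient. The toy cost $e = \tfrac12 \lVert s^{(1)} \rVert^2$ (so that $\delta^{(1)} = s^{(1)}$) is my own choice for testing, not something from the post.

```python
import numpy as np

def conv2d_valid(x, W):
    """Valid 2-D cross-correlation: s[i, j] = sum_{a,b} w[a, b] * x[i+a, j+b]."""
    kH, kW = W.shape
    oH, oW = x.shape[0] - kH + 1, x.shape[1] - kW + 1
    return np.array([[np.sum(W * x[i:i+kH, j:j+kW]) for j in range(oW)]
                     for i in range(oH)])

def kernel_grad(x, delta1, kshape):
    """de/dw_ab = sum_{i,j} delta1[i, j] * x[i+a, j+b] (shared weights)."""
    kH, kW = kshape
    oH, oW = delta1.shape
    return np.array([[np.sum(delta1 * x[a:a+oH, b:b+oW])
                      for b in range(kW)] for a in range(kH)])

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 5))
W = rng.standard_normal((3, 3))

# Toy cost e = 0.5 * ||s||^2, so delta1 = de/ds = s
s = conv2d_valid(x, W)
dW = kernel_grad(x, s, W.shape)

# Central-difference numerical gradient for comparison
eps = 1e-6
num = np.zeros_like(W)
for a in range(3):
    for b in range(3):
        Wp = W.copy(); Wp[a, b] += eps
        Wm = W.copy(); Wm[a, b] -= eps
        num[a, b] = (0.5 * np.sum(conv2d_valid(x, Wp) ** 2)
                     - 0.5 * np.sum(conv2d_valid(x, Wm) ** 2)) / (2 * eps)

print(np.allclose(dW, num, atol=1e-5))
```

If the chain-rule sum above is right, the analytic and numerical gradients agree to within floating-point tolerance.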

That’s all, I think so.