# Backpropagation in a convolutional network

> The core equations of backpropagation in a network with fully-connected layers are (BP1)-(BP4) (link). Suppose we have a network containing a convolutional layer, a max-pooling layer, and a fully-connected output layer, as in the network discussed above. How are the equations of backpropagation modified?
In this post, I do not follow Michael Nielsen's notation from *Neural Networks and Deep Learning* above; rather, I use the notation from *Learning From Data – A Short Course*: $x_j^{(l)} = \theta\big(s_j^{(l)}\big)$ (with $\theta$ being the activation function) and $s_j^{(l)} = \sum_i w_{ij}^{(l)} x_i^{(l-1)}$ (roughly), $e$ is the cost function, and $\delta_j^{(l)} = \frac{\partial e}{\partial s_j^{(l)}}$.
So we have 3 layers (I don't count the input layer):
- $l = 0$: Input layer.
- $l = 1$: Convolutional layer.
- $l = 2$: Pooling layer.
- $l = 3$: Output layer.
We also have weights: $W^{(1)}$ connects $l = 0$ and $l = 1$, and $W^{(3)}$ connects $l = 2$ and $l = 3$. There is no $W^{(2)}$ because I believe the pooling layer is non-parametric.
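To make the architecture concrete, here is a minimal NumPy sketch of the forward pass for a single 2D input. The shapes, the $3 \times 3$ kernel, the $2 \times 2$ max-pooling, and the sigmoid activation are my own toy assumptions, not part of the exercise:

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

rng = np.random.default_rng(0)
x0 = rng.standard_normal((6, 6))   # l = 0: input image
W1 = rng.standard_normal((3, 3))   # shared kernel connecting l = 0 and l = 1

# l = 1: convolutional layer ("valid" cross-correlation): s^(1), then x^(1)
H = x0.shape[0] - W1.shape[0] + 1  # 4
s1 = np.array([[np.sum(W1 * x0[i:i+3, j:j+3]) for j in range(H)]
               for i in range(H)])
x1 = sigmoid(s1)

# l = 2: 2x2 max-pooling, no weights (no W^(2))
x2 = x1.reshape(2, 2, 2, 2).max(axis=(1, 3))

# l = 3: fully-connected output layer with weights W^(3)
W3 = rng.standard_normal((x2.size, 2))
s3 = W3.T @ x2.ravel()
x3 = sigmoid(s3)
```

Note that the pooling step has no parameters at all, matching the claim that there is no $W^{(2)}$.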
It's easy to compute $\delta^{(3)}$. How about $\delta^{(2)}$? If $s_j^{(2)} = \max_{i \in R_j} x_i^{(1)}$ (max-pooling) or $s_j^{(2)} = \sum_{i \in R_j} \big(x_i^{(1)}\big)^2$ (L2 pooling), where $R_j$ is the pooling region feeding unit $j$, then the input to a pooling unit must be a vector. Hence it would be a good idea if $\theta(s) = s$ (identity function) for max-pooling or $\theta(s) = \sqrt{s}$ for L2 pooling; in such cases $x_j^{(2)} = \theta\big(s_j^{(2)}\big)$ is still a number. There are no parameters to learn at the pooling layer, so we can pass over the gradient-update step there.
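To see that these choices indeed keep $x_j^{(2)}$ a single number, here is a tiny sketch on one flattened $2 \times 2$ pooling region (the values are made up):

```python
import numpy as np

region = np.array([0.2, -0.5, 0.9, 0.1])  # hypothetical pooling region of x^(1)

# Max-pooling: s_j^(2) = max over the region, theta = identity
s_max = region.max()
x_max = s_max                             # theta(s) = s, still a scalar

# L2 pooling: s_j^(2) = sum of squares, theta = sqrt
s_l2 = np.sum(region ** 2)
x_l2 = np.sqrt(s_l2)                      # theta(s) = sqrt(s), still a scalar
```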
As hinted above, at this step we can define $\theta'(s) = 1$ for max-pooling and $\theta'(s) = \frac{1}{2\sqrt{s}}$ for L2 pooling, so that $\delta_j^{(2)} = \theta'\big(s_j^{(2)}\big) \sum_k w_{jk}^{(3)} \delta_k^{(3)}$ as usual. Going one step further back, $\frac{\partial s_j^{(2)}}{\partial x_i^{(1)}} = \mathbf{1}\big[x_i^{(1)} = \max_{i' \in R_j} x_{i'}^{(1)}\big]$ for max-pooling and $2 x_i^{(1)}$ for L2 pooling. For the derivative of the max function, you can make a reference to *Derivative of the max function* (link).
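A sketch of how $\delta_j^{(2)}$ could be routed back through one pooling region to the conv-layer units, under the definitions above (the region values and the incoming delta are made up for illustration):

```python
import numpy as np

region = np.array([0.2, -0.5, 0.9, 0.1])  # x^(1) values in one pooling region
delta2 = 0.7                              # delta_j^(2) arriving at this pooling unit

# Max-pooling: the gradient flows only to the unit that attained the max
grad_max = np.where(region == region.max(), delta2, 0.0)

# L2 pooling with s^(2) = sum of squares: ds/dx_i = 2 * x_i
grad_l2 = delta2 * 2.0 * region
```

Multiplying each entry by $\theta'\big(s_i^{(1)}\big)$ would then give $\delta_i^{(1)}$ for the conv layer.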
However, what plays the vital role is how we compute $\frac{\partial e}{\partial w_{ab}^{(1)}}$, the gradient with respect to the shared kernel weights.
Dear future self, you should remember that at this time I suck at multivariate calculus (do not be surprised, I got this far on single-variable calculus alone, special thanks to Herbert Gross), so DO NOT TRUST ME blindly from this step on. Thanks.
Well, by the chain rule, I guess this is how we calculate $\frac{\partial e}{\partial w_{ab}^{(1)}}$:

$$\frac{\partial e}{\partial w_{ab}^{(1)}} = \sum_{i} \sum_{j} \frac{\partial e}{\partial s_{ij}^{(1)}} \frac{\partial s_{ij}^{(1)}}{\partial w_{ab}^{(1)}} = \sum_{i} \sum_{j} \delta_{ij}^{(1)} \, x_{(i+a)(j+b)}^{(0)},$$

with $w_{ab}^{(1)}$ being the value of the weight at position $(a, b)$ of the convolutional layer's kernel matrix. The sum over all output positions $(i, j)$ appears because the kernel weights are shared across the whole feature map.
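A small numerical sanity check of that chain-rule sum: with a toy cost $e = \frac{1}{2}\sum_{ij} \big(s_{ij}^{(1)}\big)^2$ and identity activation (both my own choices, so that $\delta_{ij}^{(1)} = s_{ij}^{(1)}$), the analytic kernel gradient should match a finite-difference estimate:

```python
import numpy as np

rng = np.random.default_rng(1)
x0 = rng.standard_normal((5, 5))   # input, l = 0
W1 = rng.standard_normal((3, 3))   # shared conv kernel

def conv_valid(x, w):
    # "valid" cross-correlation: s_ij = sum_ab w_ab * x[i+a, j+b]
    k = w.shape[0]
    h = x.shape[0] - k + 1
    return np.array([[np.sum(w * x[i:i+k, j:j+k]) for j in range(h)]
                     for i in range(h)])

def cost(w):
    # toy cost e = 0.5 * sum(s^(1)^2), identity activation
    s1 = conv_valid(x0, w)
    return 0.5 * np.sum(s1 ** 2)

# Analytic gradient from the chain-rule sum:
# de/dw_ab = sum_ij delta_ij^(1) * x0[i+a, j+b], with delta_ij^(1) = s1_ij here
s1 = conv_valid(x0, W1)
delta1 = s1
grad = np.array([[np.sum(delta1 * x0[a:a+3, b:b+3]) for b in range(3)]
                 for a in range(3)])

# Central finite-difference estimate for comparison
eps = 1e-6
num = np.zeros((3, 3))
for a in range(3):
    for b in range(3):
        Wp, Wm = W1.copy(), W1.copy()
        Wp[a, b] += eps
        Wm[a, b] -= eps
        num[a, b] = (cost(Wp) - cost(Wm)) / (2 * eps)
```

The two $3 \times 3$ gradient matrices agree to numerical precision, which is at least some evidence that the sum over output positions is the right way to handle weight sharing.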
That's all, I think.