## Learning From Data – A Short Course: Problem 7.1

Page 43 Implement the decision function below using a 3-layer perceptron. First I’ll construct a rectangle. It’s easy to see how: consider the four lines , , , and ; what we want is the hypothesis . (Figure: the corresponding MLP.) Next I’ll try to construct a cooler shape. Now consider the three lines , and [...]
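The rectangle construction can be sketched in Python. This is a minimal sketch that assumes a hypothetical axis-aligned rectangle [x1_lo, x1_hi] × [x2_lo, x2_hi] and ±1 outputs. The post's actual four lines are not reproduced here. Each hidden perceptron tests which side of one line the point lies on, and the output perceptron takes the AND of the four results.

```python
def sign(s):
    # perceptron activation with outputs in {-1, +1}
    return 1 if s > 0 else -1

def rectangle_mlp(x1, x2, x1_lo=0.0, x1_hi=2.0, x2_lo=0.0, x2_hi=1.0):
    """3-layer perceptron (2 inputs, 4 hidden nodes, 1 output) returning +1
    strictly inside the hypothetical rectangle and -1 outside it."""
    # hidden layer: one perceptron per boundary line, +1 on the rectangle's side
    h = [
        sign(x1 - x1_lo),   # right of the line x1 = x1_lo
        sign(x1_hi - x1),   # left of the line x1 = x1_hi
        sign(x2 - x2_lo),   # above the line x2 = x2_lo
        sign(x2_hi - x2),   # below the line x2 = x2_hi
    ]
    # output layer: AND of the 4 hidden outputs, sign(h1 + h2 + h3 + h4 - 3.5)
    return sign(sum(h) - 3.5)

print(rectangle_mlp(1.0, 0.5), rectangle_mlp(3.0, 0.5))
```

The output bias of -3.5 makes the sum positive only when all four hidden nodes output +1. This is why the AND node can stand in for "inside all four half-planes".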

## Backpropagation in Convolutional (Neural) Networks

*Neural Networks and Deep Learning*, Chapter 6, problem "Backpropagation in a convolutional network": The core equations of backpropagation in a network with fully-connected layers are (BP1)-(BP4). Suppose we have a network containing a convolutional layer, a max-pooling layer, and a fully-connected output layer, as in the network discussed in that chapter. How are the equations of backpropagation [...]
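The max-pooling part of the answer can be sketched in Python. The sketch assumes non-overlapping 2×2 windows, and the function names are hypothetical. The pooling layer has no weights, so backpropagation through it only routes deltas. Each pooled unit's delta goes to the input unit that achieved the maximum, and the rest of the window receives zero. This sketch does not cover the convolutional layer's equations.

```python
def maxpool2x2_forward(a):
    """Non-overlapping 2x2 max pooling of a 2D list with even height and width.
    Returns the pooled map and, per window, the (row, col) of its maximum
    (the first one found, if there is a tie)."""
    pooled, winners = [], []
    for i in range(0, len(a), 2):
        pooled_row, winner_row = [], []
        for j in range(0, len(a[0]), 2):
            window = [(a[i + di][j + dj], (i + di, j + dj)) for di in (0, 1) for dj in (0, 1)]
            value, pos = max(window, key=lambda t: t[0])
            pooled_row.append(value)
            winner_row.append(pos)
        pooled.append(pooled_row)
        winners.append(winner_row)
    return pooled, winners

def maxpool2x2_backward(delta_pooled, winners, rows, cols):
    """Pooling has no weights: each pooled delta is copied to the input unit
    that won its window, and every other input unit gets delta 0."""
    delta_in = [[0.0] * cols for _ in range(rows)]
    for i, winner_row in enumerate(winners):
        for j, (r, c) in enumerate(winner_row):
            delta_in[r][c] = delta_pooled[i][j]
    return delta_in

# usage on a small 4x4 feature map
feature_map = [[1, 2, 5, 0], [3, 4, 1, 1], [0, 0, 2, 9], [7, 1, 3, 3]]
pooled, winners = maxpool2x2_forward(feature_map)
deltas_in = maxpool2x2_backward([[0.1, 0.2], [0.3, 0.4]], winners, 4, 4)
```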

## Learning From Data – A Short Course: Exercise 7.19

Page 41 Previously, for our digit problem, we used symmetry and intensity. How do these features relate to deep networks? Do we still need them? Symmetry and intensity can serve as input features in the input layer, or they can appear as outputs of some hidden layer. We may intentionally inject them into the network or they may [...]
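The "hidden layer" remark can be made concrete with a small Python sketch. It assumes that intensity is the average pixel value. It also assumes that symmetry is the negative mean absolute difference between the image and its left-right mirror. Both are one common choice, and the book's exact definitions may differ. Intensity is a linear function of the pixels, so a single node with all weights equal to 1/n computes it exactly. Symmetry involves absolute values, so a network would have to approximate it with nonlinear hidden nodes.

```python
def intensity(img):
    # average pixel value over the whole image
    pixels = [p for row in img for p in row]
    return sum(pixels) / len(pixels)

def symmetry(img):
    # negative mean absolute difference between the image and its left-right mirror;
    # 0 means perfectly symmetric, more negative means less symmetric
    n = sum(len(row) for row in img)
    diff = sum(abs(p - q) for row in img for p, q in zip(row, reversed(row)))
    return -diff / n

def intensity_node(img):
    # the same intensity, computed as the signal of one linear node with weights 1/n
    pixels = [p for row in img for p in row]
    w = 1.0 / len(pixels)
    return sum(w * p for p in pixels)

img = [[0, 1, 1, 0], [1, 0, 0, 1]]   # hypothetical 2x4 image
print(intensity(img), symmetry(img), intensity_node(img))
```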

## Learning From Data – A Short Course: Exercise 7.18

Since the input is an image, it is convenient to represent it as a matrix of its pixels, which are black () or white (). The basic shape of identifies a set of these pixels which are black. (a) Show that feature can be computed by the neural network node . Set if the [...]
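Such a node can be sketched in Python. The pixel values are not shown above, so this sketch hypothetically assumes that black pixels are 1 and white pixels are 0. It also assumes the feature should fire exactly when every pixel in the shape S is black. Each shape pixel gets weight 1, every other pixel gets weight 0, and the bias is -(|S| - 0.5). With these weights the signal is positive only when all |S| shape pixels are black.

```python
def step(s):
    # hard-threshold activation with outputs in {0, 1}
    return 1 if s > 0 else 0

def shape_feature(img, shape):
    """img: 2D list of pixels, assumed 1 = black and 0 = white (hypothetical encoding).
    shape: set of (i, j) positions that must all be black.
    Node: step(sum over (i, j) in shape of x_ij - (|shape| - 0.5))."""
    s = sum(img[i][j] for (i, j) in shape)   # weight 1 on shape pixels, 0 elsewhere
    bias = -(len(shape) - 0.5)
    return step(s + bias)

shape = {(0, 0), (0, 1)}                      # hypothetical shape: the top row
print(shape_feature([[1, 1], [0, 1]], shape), shape_feature([[1, 0], [0, 1]], shape))
```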

## Learning From Data – A Short Course: Exercise 7.13

Page 27 Suppose you run gradient descent for 1000 iterations. You have 500 examples in , and you use 450 for and 50 for . You output the weight from iteration 50, with and . (a) Is an unbiased estimate of ? No. (b) Use the Hoeffding bound to get a bound for using [...]
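The Hoeffding error bar for part (b) can be computed with a short Python sketch. It uses the two-sided inequality P(|E_val - E_out| > ε) ≤ 2 exp(-2 ε² N_val) for a single hypothesis evaluated on N_val points. Solving for ε gives ε = sqrt(ln(2/δ) / (2 N_val)). The confidence level δ is left as a parameter because the excerpt cuts off before stating it. Strictly, the bound applies to a hypothesis fixed before looking at the validation data.

```python
import math

def hoeffding_error_bar(n_val, delta):
    """Smallest epsilon with |E_val - E_out| <= epsilon holding with probability
    at least 1 - delta, from P(|E_val - E_out| > eps) <= 2 exp(-2 eps^2 N)."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n_val))

# with the 50 validation examples and an assumed delta of 0.05
print(hoeffding_error_bar(50, 0.05))   # roughly 0.19
```

With only 50 validation points the error bar is large. Multiplying N_val by 10 shrinks it by a factor of sqrt(10).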

## Learning From Data – A Short Course: Exercise 7.11

Page 24 For weight elimination, show that . I use this differentiation formula: . Now I consider the following derivative: So we have: Argue that weight elimination shrinks small weights faster than large ones. There are many ways to do this. One of the fastest [...]
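A quick numeric check in Python assumes that the per-weight penalty is w²/(1 + w²), scaled by λ/N. This is an assumption, since the formula is not reproduced above. A gradient step on the penalty changes w by η(λ/N)·2w/(1 + w²)². The fraction of w removed is therefore η(λ/N)·2/(1 + w²)², and it falls as |w| grows. Small weights shrink proportionally faster, while large weights are barely touched.

```python
def penalty(w):
    # assumed weight-elimination penalty for one weight: w^2 / (1 + w^2)
    return w * w / (1.0 + w * w)

def penalty_grad(w):
    # d/dw [w^2 / (1 + w^2)] = 2w / (1 + w^2)^2
    return 2.0 * w / (1.0 + w * w) ** 2

def relative_shrink(w):
    # equals penalty_grad(w) / w for w != 0: the fraction of w removed per unit step
    return 2.0 / (1.0 + w * w) ** 2

for w in (0.1, 0.5, 1.0, 2.0, 5.0):
    print(f"w = {w}: gradient {penalty_grad(w):.4f}, relative shrink {relative_shrink(w):.4f}")
```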

## Learning From Data – A Short Course: Exercise 7.10

Page 20 How many weight parameters are there in a neural network with architecture specified by , a vector giving the number of nodes in each layer? Evaluate your formula for a 2-hidden-layer network with 10 hidden nodes in each hidden layer.
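Write the layer sizes as d^(0), ..., d^(L). Each node in layer l has one weight per node in layer l-1 plus a bias weight, so the count is Σ_{l=1}^{L} (d^(l-1) + 1) d^(l). The Python sketch below uses an input dimension of 2 and a single output node, and both are assumptions for illustration. For a general input dimension d with one output, the same formula gives 10(d + 1) + 110 + 11 = 10d + 131.

```python
def count_weights(dims):
    """Weights in a fully connected MLP with layer sizes dims = [d0, d1, ..., dL].
    Each node in layer l gets one weight per node in layer l-1, plus a bias."""
    return sum((dims[l - 1] + 1) * dims[l] for l in range(1, len(dims)))

print(count_weights([2, 10, 10, 1]))   # 151
```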

## Learning From Data – A Short Course: Exercise 7.9

Page 18 What can go wrong if you just initialize all the weights to exactly zero? For , if becomes zero then becomes zero. For , ( or ), if becomes zero then becomes zero. The gradient will then become zero, so the algorithm will stop immediately and blindly return as the final [...]
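A small numeric check of this argument follows as a Python sketch. It assumes one tanh hidden layer, a tanh output, and squared error on a single example, and the network size and data are hypothetical. With every weight at zero, the hidden outputs are all zero apart from the bias coordinate. As a result, every hidden-layer gradient vanishes, and so does every output-layer gradient except the one on the bias. The hidden weights therefore never move away from zero.

```python
import math

def forward_backward(W1, W2, x, y):
    """Backprop for a 1-hidden-layer tanh network with tanh output and
    squared error e = (h - y)^2 on a single example.
    W1: one weight list [bias, w_1, ..., w_d] per hidden unit.
    W2: output weights [bias, v_1, ..., v_m]."""
    x0 = [1.0] + list(x)                         # input plus bias coordinate
    s1 = [sum(w * xi for w, xi in zip(row, x0)) for row in W1]
    x1 = [1.0] + [math.tanh(s) for s in s1]      # hidden outputs plus bias coordinate
    h = math.tanh(sum(w * xi for w, xi in zip(W2, x1)))
    delta2 = 2.0 * (h - y) * (1.0 - h * h)       # de/ds at the output node
    grad_W2 = [xi * delta2 for xi in x1]
    # de/ds for hidden unit j: (1 - x1_j^2) * v_j * delta2
    delta1 = [(1.0 - x1[j + 1] ** 2) * W2[j + 1] * delta2 for j in range(len(W1))]
    grad_W1 = [[xi * d for xi in x0] for d in delta1]
    return grad_W1, grad_W2

def zero_init_gradients(d, m, x, y):
    """Gradients when every weight (biases included) starts at exactly 0."""
    W1 = [[0.0] * (d + 1) for _ in range(m)]
    W2 = [0.0] * (m + 1)
    return forward_backward(W1, W2, x, y)

grad_W1, grad_W2 = zero_init_gradients(2, 3, [0.5, -1.0], 1.0)
print(grad_W1, grad_W2)
```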

## Learning From Data – A Short Course: Exercise 7.7

Page 11 For the sigmoidal perceptron, , let the in-sample error be . Show that: . If , what happens to the gradient, and how is this related to why it is hard to optimize the perceptron? We observe that: That [...]
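The saturation effect can be seen in a numeric Python sketch. It assumes h(x) = tanh(wᵀx) and E_in(w) = (1/N) Σ (tanh(wᵀx_n) - y_n)², and the data set is hypothetical. The analytic gradient is (2/N) Σ (tanh(wᵀx_n) - y_n)(1 - tanh²(wᵀx_n)) x_n. Scaling w up pushes every tanh(wᵀx_n) toward ±1, so the factor 1 - tanh² drives the gradient toward zero even when the fit is poor, and gradient descent stalls.

```python
import math

def e_in(w, X, y):
    """E_in(w) = (1/N) * sum_n (tanh(w . x_n) - y_n)^2"""
    return sum((math.tanh(sum(wi * xi for wi, xi in zip(w, x))) - yn) ** 2
               for x, yn in zip(X, y)) / len(X)

def grad_ein(w, X, y):
    """(2/N) * sum_n (tanh(s_n) - y_n) * (1 - tanh(s_n)^2) * x_n, with s_n = w . x_n."""
    g = [0.0] * len(w)
    for x, yn in zip(X, y):
        t = math.tanh(sum(wi * xi for wi, xi in zip(w, x)))
        coeff = 2.0 * (t - yn) * (1.0 - t * t) / len(X)
        for i, xi in enumerate(x):
            g[i] += coeff * xi
    return g

def norm(v):
    return math.sqrt(sum(vi * vi for vi in v))

# hypothetical data; x_0 = 1 is the bias coordinate
X = [[1.0, 0.5], [1.0, -1.0], [1.0, 2.0]]
y = [1, -1, 1]
w = [0.1, 1.0]
for c in (1, 5, 50):
    print(c, norm(grad_ein([c * wi for wi in w], X, y)))
```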

## Learning From Data – A Short Course: Exercise 7.2

Page 3 (a) The Boolean OR and AND of two inputs can be extended to more than two inputs: OR is +1 if any one of the inputs is +1; AND is +1 if all the inputs equal +1. Give graph representations of OR and AND. (c) Give the graph representation of .
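One choice of weights can be sketched in Python under the ±1 convention. Every input gets weight 1, and only the bias differs: +(M - 0.5) for OR and -(M - 0.5) for AND. Each function is therefore a single perceptron node in the graph, and for M = 2 the biases are ±1.5. These weights are one valid choice, not the only one.

```python
from itertools import product

def sign(s):
    # perceptron activation with outputs in {-1, +1}
    return 1 if s > 0 else -1

def or_m(xs):
    # +1 if any input is +1: sign(x_1 + ... + x_M + (M - 0.5))
    return sign(sum(xs) + len(xs) - 0.5)

def and_m(xs):
    # +1 only if all inputs are +1: sign(x_1 + ... + x_M - (M - 0.5))
    return sign(sum(xs) - len(xs) + 0.5)

# print the full truth table for M = 3
for xs in product((-1, 1), repeat=3):
    print(xs, or_m(xs), and_m(xs))
```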