## Learning From Data – A Short Course: Exercise 7.19

Page 41 Previously, for our digit problem, we used symmetry and intensity. How do these features relate to deep networks? Do we still need them? Symmetry and intensity can serve as input features in the input layer, or they can appear as outputs of some hidden layer. We may intentionally inject them into the network, or they may [...]
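As a reminder of what those two features are, here is a minimal sketch of how they can be computed for a binary digit image (the exact normalization used in the book's digit experiments may differ; the pixel encoding below, 1 = black and 0 = white, is an assumption):

```python
def intensity(img):
    """Average pixel value of the image (assuming 1 = black, 0 = white)."""
    return sum(sum(row) for row in img) / (len(img) * len(img[0]))

def symmetry(img):
    """Negative mean absolute difference between the image and its
    left-right mirror; 0 means perfectly symmetric."""
    flipped = [row[::-1] for row in img]
    diff = sum(abs(a - b) for row, frow in zip(img, flipped)
               for a, b in zip(row, frow))
    return -diff / (len(img) * len(img[0]))

# A left-right symmetric 4x4 "image": its symmetry score is exactly 0.
img = [[0, 1, 1, 0],
       [1, 0, 0, 1],
       [1, 0, 0, 1],
       [0, 1, 1, 0]]
print(intensity(img))  # 0.5
print(symmetry(img))   # 0.0 (perfectly symmetric)
```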

If we apply logistic regression to a binary classification problem, then we have a parameter called the threshold (its usual value is 0.5). In a skewed-classes problem, the rare class is taken as the positive class (Andrew Ng). When the threshold increases: Intuition: if you are more picky about the positive class, then when you label a data point [...]
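To make the intuition concrete, here is a small sketch (the scores and labels are made up for illustration): raising the threshold can only shrink the set of points labeled positive, so recall can only go down, while precision tends to go up:

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall when predicting positive iff score >= threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(preds, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall

# Hypothetical predicted probabilities and true labels (1 = rare positive class).
scores = [0.95, 0.80, 0.70, 0.55, 0.40, 0.30, 0.20]
labels = [1,    1,    0,    1,    0,    0,    1]

p_low,  r_low  = precision_recall(scores, labels, 0.5)   # default threshold
p_high, r_high = precision_recall(scores, labels, 0.75)  # pickier threshold
print(p_low, r_low)    # 0.75 0.75
print(p_high, r_high)  # 1.0 0.5
```

Recall is always non-increasing in the threshold; precision usually improves but is not guaranteed to be monotone.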

## Learning From Data – A Short Course: Exercise 7.18

Since the input is an image, it is convenient to represent it as a matrix of its pixels, which are black or white. The basic shape identifies a set of these pixels that are black. (a) Show that the feature can be computed by a neural network node. Set the weights so that the node fires only if the [...]
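As a sketch of the kind of construction the exercise asks for (the encoding black = +1, white = −1 and the specific bias below are assumptions, not the book's exact weights), a single node with unit weights on the shape's pixels and a suitable bias fires exactly when every pixel of the shape is black:

```python
def shape_node(pixels, shape):
    """One neural-network node: sign of a weighted sum of the pixels.

    pixels: flat list with +1 = black, -1 = white (assumed encoding).
    shape:  indices of the pixels that make up the basic shape.
    Weight +1 on each shape pixel, bias -(|shape| - 1): the weighted sum
    reaches |shape| only when every shape pixel is +1, so the node
    outputs +1 exactly when the whole shape is black.
    """
    s = sum(pixels[i] for i in shape) - (len(shape) - 1)
    return 1 if s > 0 else -1

pixels = [+1, +1, +1, -1, -1, -1]
print(shape_node(pixels, [0, 1, 2, 3]))  # -1: one shape pixel is white
print(shape_node(pixels, [0, 1, 2]))     # +1: all shape pixels are black
```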

## Learning From Data – A Short Course: Exercise 3.9

Page 97 Consider pointwise error measures \(e_{\text{class}}(s,y) = [\![y \ne \operatorname{sign}(s)]\!]\), \(e_{\text{sq}}(s,y) = (y-s)^2\), and \(e_{\log}(s,y) = \ln(1 + e^{-ys})\), where the signal \(s = w^{\mathsf T}x\) and \(y \in \{-1,+1\}\). (b) Show that \(e_{\text{class}}(s,y) \le e_{\text{sq}}(s,y)\), and hence that the classification error is upper bounded by the squared error. If \(\operatorname{sign}(s) = y\) then \(e_{\text{class}} = 0 \le (y-s)^2\). If \(\operatorname{sign}(s) \ne y\) and \(y = +1\), then \(s \le 0\) and \((y-s)^2 = (1-s)^2 \ge 1\); if \(y = -1\), then \(s \ge 0\) and \((y-s)^2 = (1+s)^2 \ge 1\). In general, for whichever the case may be, we have \(e_{\text{class}}(s,y) \le e_{\text{sq}}(s,y)\). [...]
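A quick numerical sanity check of the bound (random signals and labels; this only spot-checks the inequality, it is not a proof):

```python
import random

def e_class(s, y):
    """Classification error: 1 if sign(s) disagrees with y, else 0."""
    return 1 if (1 if s > 0 else -1) != y else 0

def e_sq(s, y):
    """Squared error between signal and label."""
    return (y - s) ** 2

random.seed(0)
for _ in range(10000):
    s = random.uniform(-3, 3)
    y = random.choice([-1, 1])
    assert e_class(s, y) <= e_sq(s, y)
print("e_class <= e_sq held on 10000 random (s, y) pairs")
```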

## Learning From Data – A Short Course: Exercise 7.13

Page 27 Suppose you run gradient descent for 1000 iterations. You have 500 examples in \(\mathcal{D}\), and you use 450 for training and 50 for validation. You output the weight from iteration 50, with its \(E_{\text{in}}\) and \(E_{\text{val}}\). (a) Is \(E_{\text{val}}\) an unbiased estimate of \(E_{\text{out}}\)? No. (b) Use the Hoeffding bound to get a bound for \(E_{\text{out}}\) using [...]
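For part (b), the single-hypothesis Hoeffding bound on \(K = 50\) validation points gives \(P[\,|E_{\text{val}} - E_{\text{out}}| > \epsilon\,] \le 2e^{-2\epsilon^2 K}\); if the output weight was selected by looking at the validation error of several snapshots, a union bound adds a factor \(M\) in front. A small sketch of how wide the error bar gets (the 95% confidence level is just an illustrative choice):

```python
import math

def hoeffding_eps(K, delta, M=1):
    """Error bar eps with P[|E_val - E_out| > eps] <= delta, solved from
    2*M*exp(-2*eps^2*K) = delta. M = number of candidate hypotheses the
    validation set chose among (M = 1 for a single fixed hypothesis)."""
    return math.sqrt(math.log(2 * M / delta) / (2 * K))

K = 50
eps_single = hoeffding_eps(K, delta=0.05)            # one fixed hypothesis
eps_selected = hoeffding_eps(K, delta=0.05, M=1000)  # chosen among 1000 snapshots
print(round(eps_single, 3), round(eps_selected, 3))  # ~0.192 vs ~0.326
```

Even the single-hypothesis bar is wide here because K = 50 is small; selecting among many snapshots widens it further.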

## Learning From Data – A Short Course: Exercise 7.11

Page 24 For weight elimination, show that \(\frac{\partial}{\partial w_q}\left(\frac{w_q^2}{1+w_q^2}\right) = \frac{2w_q}{(1+w_q^2)^2}\). I have this differentiation formula: \(\left(\frac{u}{v}\right)' = \frac{u'v - uv'}{v^2}\). Now I consider the following derivative: \(\frac{\partial}{\partial w_q}\left(\frac{w_q^2}{1+w_q^2}\right) = \frac{2w_q(1+w_q^2) - w_q^2 \cdot 2w_q}{(1+w_q^2)^2}\). So we have: \(\frac{\partial}{\partial w_q}\left(\frac{w_q^2}{1+w_q^2}\right) = \frac{2w_q}{(1+w_q^2)^2}\). Argue that weight elimination shrinks small weights faster than large ones. There are many ways to do this. One of the fastest [...]
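A quick numerical look at that gradient (assuming the per-weight penalty \(w^2/(1+w^2)\)): the effective decay rate \(g(w)/w = 2/(1+w^2)^2\) is large near zero and vanishes for large weights, which is exactly the "shrink small weights faster" behavior:

```python
def grad(w):
    """Gradient of the weight-elimination penalty w^2 / (1 + w^2)."""
    return 2 * w / (1 + w ** 2) ** 2

def decay_rate(w):
    """Relative shrink rate grad(w)/w = 2/(1+w^2)^2: the fraction of the
    weight removed per step under gradient descent on the penalty alone."""
    return 2 / (1 + w ** 2) ** 2

for w in [0.1, 1.0, 10.0]:
    print(w, grad(w), decay_rate(w))
# Small weights decay at a nearly constant rate (like weight decay);
# large weights barely move, so they survive while small ones are eliminated.
```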

## Learning From Data – A Short Course: Exercise 7.10

Page 20 How many weight parameters are there in a neural network with architecture specified by \(\mathbf{d} = [d^{(0)}, d^{(1)}, \ldots, d^{(L)}]\), a vector giving the number of nodes in each layer? Counting the bias weight into every non-input node, the total is \(Q = \sum_{\ell=1}^{L} d^{(\ell)}\,(d^{(\ell-1)} + 1)\). Evaluate your formula for a 2-hidden-layer network with 10 hidden nodes in each hidden layer.
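That formula as a one-liner (the input dimension 5 and the single output node below are assumptions for illustration, since the excerpt doesn't pin them down):

```python
def num_weights(d):
    """Total weights for layer sizes d = [d0, d1, ..., dL],
    counting one bias weight into every non-input node."""
    return sum(d[l] * (d[l - 1] + 1) for l in range(1, len(d)))

# Hypothetical 2-hidden-layer net: 5 inputs, 10 + 10 hidden nodes, 1 output.
print(num_weights([5, 10, 10, 1]))  # 10*6 + 10*11 + 1*11 = 181
```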

## Learning From Data – A Short Course: Exercise 7.9

Page 18 What can go wrong if you just initialize all the weights to exactly zero? For \(x^{(\ell)} = \tanh(s^{(\ell)})\), if \(s^{(\ell)}\) becomes zero then \(x^{(\ell)}\) becomes zero. So with \(w = 0\), every signal \(s^{(\ell)}\) is zero and every output \(x^{(\ell)}\) (\(\ell \ge 1\)) becomes zero as well. The gradient will then become zero, so the algorithm will stop immediately and then blindly return \(w = 0\) as the final [...]
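A tiny illustration of part of the problem (the 2-2-1 architecture and tanh activations are assumptions for this sketch): with all weights zero, the hidden outputs are \(\tanh(0) = 0\) and the output-layer weights are zero, so the network's output is identically zero no matter how you perturb a first-layer weight; the error surface is flat in those directions and gradient descent cannot move them:

```python
import math

def forward(x, W1, W2):
    """2-2-1 tanh network; each weight row includes a bias as entry 0."""
    h = [math.tanh(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))
         for w in W1]
    return math.tanh(W2[0] + sum(wi * hi for wi, hi in zip(W2[1:], h)))

x = [0.7, -1.3]
W1 = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]  # hidden-layer weights, all zero
W2 = [0.0, 0.0, 0.0]                     # output-layer weights, all zero

out0 = forward(x, W1, W2)
W1[0][1] = 0.5                           # perturb a first-layer weight
out1 = forward(x, W1, W2)
print(out0, out1)  # both 0.0: the output is flat in the hidden weights
```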

## [Notes] Learning From Data – A Short Course: e-Chapter 7

Page 18: need an explanation for one of the steps. Page 20: \(d^{(1)}\) is the number of nodes in the first layer, \(d^{(0)}\) is the number of nodes in the input layer. Here is my guess: each input node must connect to at least one node in the first layer. So the first input node can choose one in [...]

## Learning From Data – A Short Course: Exercise 7.7

Page 11 For the sigmoidal perceptron, \(h(x) = \tanh(w^{\mathsf T}x)\), let the in-sample error be \(E_{\text{in}}(w) = \frac{1}{N}\sum_{n=1}^{N}\left(\tanh(w^{\mathsf T}x_n) - y_n\right)^2\). Show that \(\nabla E_{\text{in}}(w) = \frac{2}{N}\sum_{n=1}^{N}\left(\tanh(w^{\mathsf T}x_n) - y_n\right)\left(1 - \tanh^2(w^{\mathsf T}x_n)\right)x_n\). If \(w \to \infty\), what happens to the gradient; how is this related to why it is hard to optimize the perceptron? As \(\|w\| \to \infty\), \(\tanh(w^{\mathsf T}x_n) \to \pm 1\), so \(1 - \tanh^2(w^{\mathsf T}x_n) \to 0\) and the gradient vanishes: the error surface becomes flat, so gradient descent barely moves even when the error is still large. We observe that [...]
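A numerical sketch of that vanishing gradient (the tiny dataset is made up): scaling the same weight direction up drives the gradient norm toward zero as tanh saturates:

```python
import math

def grad_norm(w, data):
    """Norm of the gradient of E_in(w) = mean over data of (tanh(w.x) - y)^2."""
    g = [0.0] * len(w)
    for x, y in data:
        t = math.tanh(sum(wi * xi for wi, xi in zip(w, x)))
        c = 2 * (t - y) * (1 - t * t) / len(data)
        g = [gi + c * xi for gi, xi in zip(g, x)]
    return math.sqrt(sum(gi * gi for gi in g))

# Made-up 2D points with a leading bias coordinate, labels in {-1, +1}.
data = [([1.0, 0.5, -1.2], 1), ([1.0, -0.8, 0.3], -1), ([1.0, 1.5, 0.7], -1)]
w = [0.2, -0.4, 0.6]

for scale in [1, 10, 100]:
    ws = [scale * wi for wi in w]
    print(scale, grad_norm(ws, data))  # the norm shrinks as tanh saturates
```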