## Learning From Data – A Short Course: Exercise 8.9

Let be optimal for (8.10), and let be optimal for (8.11). (a) Show that . If is optimal for (8.10), then it must satisfy the constraint , hence . If , then it doesn’t matter what is: . If , then we can always choose , so . (b) Show that is feasible for (8.10). To show this, [...]

## Learning From Data – A Short Course: Problem 8.7

For any with and even, show that there exists a balanced dichotomy that satisfies , and (This is the geometric lemma that is needed to bound the VC-dimension of -fat hyperplanes by .) The following steps are a guide for the proof. Suppose you randomly select of the labels to be , the others being [...]

## Learning From Data – A Short Course: Exercise 8.7

Assume that the data is restricted to lie in a unit sphere. (a) Show that is non-increasing in . If a dichotomy can be separated by a hyperplane with a given margin, then the same hyperplane separates it at any smaller margin, since shrinking the required margin only weakens the constraint and the margin does not appear in the final hypothesis representation. (b) In 2 dimensions, show that for [...]

## Learning From Data – A Short Course: Exercise 8.3

For separable data that contain both positive and negative examples, and a separating hyperplane , define the positive-side margin to be the distance between and the nearest data point of class . Similarly, define the negative-side margin to be the distance between and the nearest data point of class . Argue that if is the [...]
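Although the full argument is elided above, the balance of the two one-sided margins at the optimum is easy to see numerically. A toy 1-D sketch (the points and the grid are my own, not from the book): sweep the separator between the nearest points of each class and observe that the overall margin, the minimum of the two one-sided margins, is maximized exactly where the two sides are equal.

```python
import numpy as np

# 1-D illustration: the "hyperplane" is a point t on the line.
# Positive-side margin = distance from t to the nearest +1 point,
# negative-side margin = distance from t to the nearest -1 point.
x_neg, x_pos = -1.0, 2.0             # nearest points of each class (toy values)

ts = np.linspace(-0.9, 1.9, 1000)    # candidate separators between the classes
margins = np.minimum(x_pos - ts, ts - x_neg)   # overall margin = min of the two

t_best = ts[np.argmax(margins)]
# The maximizing separator is (numerically) the midpoint, where the
# positive-side and negative-side margins coincide.
print(t_best)   # ~0.5
```

If the two one-sided margins differed, shifting the separator toward the larger side would increase the smaller one, so the optimal hyperplane must balance them.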

## Learning From Data – A Short Course: Exercise 8.1

Assume contains two data points and . Show that: (a) No hyperplane can tolerate noise radius greater than . Assume such a hyperplane exists. Consider the line segment connecting the two data points: since the points lie on opposite sides of the separating hyperplane, the segment must cross [...]

## Learning From Data – A Short Course: Problem 7.1

(Page 43) Implement the decision function below using a 3-layer perceptron. First I’ll construct a rectangle. It’s easy to see how: consider the four lines , , , ; what we want is the hypothesis . The corresponding MLP: Next I’ll try to construct a cooler shape: now consider the three lines , and [...]
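As a sketch of the rectangle construction in code (the actual lines are elided above, so I assume the rectangle 1 ≤ x1 ≤ 3, 1 ≤ x2 ≤ 2): one hidden sign unit per bounding line, and an output unit that ANDs the four hidden units by thresholding their sum.

```python
import numpy as np

def sign(z):
    # Sign activation used by every perceptron unit (+1 / -1).
    return np.where(z >= 0, 1.0, -1.0)

def rectangle_mlp(x1, x2):
    """3-layer perceptron that outputs +1 inside the rectangle
    1 <= x1 <= 3, 1 <= x2 <= 2 (assumed lines; the original ones
    are elided in the post)."""
    # Hidden layer: one perceptron per bounding line.
    h = sign(np.array([
        x1 - 1.0,   # +1 when x1 >= 1
        3.0 - x1,   # +1 when x1 <= 3
        x2 - 1.0,   # +1 when x2 >= 1
        2.0 - x2,   # +1 when x2 <= 2
    ]))
    # Output layer: AND of the four hidden units. All four must be +1,
    # so their sum must exceed 3.5.
    return sign(h.sum() - 3.5)
```

For example, `rectangle_mlp(2, 1.5)` returns +1 (inside) and `rectangle_mlp(0, 0)` returns -1 (outside). The "cooler shape" from the three lines works the same way, with three hidden units and the threshold adjusted.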

## Learning From Data – A Short Course: Exercise 8.17

(Page 45) Show that (1/N)·Σ_n ξ_n is an upper bound on E_in(w, b), where E_in is the classification error. We consider the classification error e_n and the slack ξ_n = max(0, 1 − y_n(wᵀx_n + b)) on data point (x_n, y_n). Correct classification: e_n = 0 ≤ ξ_n, since ξ_n ≥ 0 by definition. Wrong classification: y_n(wᵀx_n + b) ≤ 0, so ξ_n ≥ 1 = e_n. We have e_n ≤ ξ_n in both cases. Hence E_in = (1/N)·Σ_n e_n ≤ (1/N)·Σ_n ξ_n. So the statement follows.
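The claim, that each point's slack upper-bounds its 0/1 classification error, is easy to check numerically. A sketch on random data with an arbitrary hyperplane (the data and (w, b) here are my own; the pointwise bound holds for any choice):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data and an arbitrary (not optimized) hyperplane; any
# (w, b) works, since the bound holds point by point.
X = rng.normal(size=(100, 2))
y = rng.choice([-1.0, 1.0], size=100)
w, b = np.array([0.5, -1.0]), 0.2

signal = X @ w + b
e = (np.sign(signal) != y).astype(float)   # 0/1 classification error per point
xi = np.maximum(0.0, 1.0 - y * signal)     # slack (margin violation) per point

assert np.all(e <= xi + 1e-12)             # pointwise bound e_n <= xi_n
print(e.mean(), "<=", xi.mean())           # hence E_in <= (1/N) * sum(xi_n)
```

Averaging the pointwise inequality over the data set gives exactly the stated bound.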

## Learning From Data – A Short Course: Exercise 8.12

(Page 29) If all the data is from one class, then y_n = 1 for n = 1, …, N. (a) What is w*? (b) What is b*? From (8.23) we have w* = Σ_n α_n* y_n x_n. As all the data is from one class, we also have Σ_n α_n* y_n = Σ_n α_n* = 0, which together with α_n* ≥ 0 forces every α_n* = 0. Hence w* = 0, and any b* ≥ 1 satisfies all the constraints y_n(w*ᵀx_n + b*) ≥ 1.

## Backpropagation in Convolutional (Neural) Network

Neural Networks and Deep Learning, Chapter 6: Backpropagation in a convolutional network. The core equations of backpropagation in a network with fully-connected layers are (BP1)–(BP4). Suppose we have a network containing a convolutional layer, a max-pooling layer, and a fully-connected output layer, as in the network discussed above. How are the equations of backpropagation [...]
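To make the two pieces that change concrete, here is a minimal NumPy sketch (function names and shapes are my own, not Nielsen's): backprop through a max-pooling layer routes each pooled gradient entirely to the max location of its window, and the kernel gradient of a convolutional layer sums the per-position products, which is itself a valid correlation of the input with the layer's deltas.

```python
import numpy as np

def conv2d_valid(x, k):
    """'Valid' 2-D correlation of input x with kernel k (single channel).
    A minimal loop-based sketch; real CNN code vectorizes this."""
    H, W = x.shape
    kh, kw = k.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i+kh, j:j+kw] * k)
    return out

def maxpool_backward(a, grad_pooled, size=2):
    """Backprop through max-pooling: each entry of grad_pooled flows
    entirely to the argmax of its pooling window; every non-max unit
    receives zero gradient. Assumes a.shape is divisible by size."""
    grad = np.zeros_like(a)
    for i in range(0, a.shape[0], size):
        for j in range(0, a.shape[1], size):
            window = a[i:i+size, j:j+size]
            r, c = np.unravel_index(np.argmax(window), window.shape)
            grad[i + r, j + c] = grad_pooled[i // size, j // size]
    return grad

def conv_kernel_grad(x, delta):
    """dC/dk for the convolutional layer: the kernel is shared across
    positions, so its gradient sums the contribution of every position,
    which is again a 'valid' correlation of x with delta."""
    return conv2d_valid(x, delta)
```

For example, with a 5x5 input and a 2x2 kernel, `conv2d_valid` produces a 4x4 feature map, `maxpool_backward` spreads a 2x2 pooled gradient back over it, and `conv_kernel_grad` returns a 2x2 kernel gradient.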

## Proof for Softmax Regression gradient

The notation of this proof comes from the article Softmax Regression by UFLDL Tutorial. We have the cost function

J(θ) = −Σ_i Σ_j 1{y^(i) = j} · log( exp(θ_jᵀ x^(i)) / Σ_l exp(θ_lᵀ x^(i)) ).

To differentiate with respect to θ_j, write log p(y^(i) = k | x^(i); θ) = θ_kᵀ x^(i) − log Σ_l exp(θ_lᵀ x^(i)) and distinguish two cases. When k = j:

∇_{θ_j} log p(y^(i) = j | x^(i); θ) = x^(i) · (1 − p(y^(i) = j | x^(i); θ)).

When k ≠ j:

∇_{θ_j} log p(y^(i) = k | x^(i); θ) = −x^(i) · p(y^(i) = j | x^(i); θ).

Now we look carefully at the [...]
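Combining the two cases yields the closed form ∇_{θ_j} J(θ) = −Σ_i x^(i) · (1{y^(i) = j} − p(y^(i) = j | x^(i); θ)). That formula can be verified against finite differences; the sketch below uses my own variable names and random data, not anything from the UFLDL article.

```python
import numpy as np

def softmax_probs(Theta, X):
    # Theta: (k, n) class weights, X: (m, n) examples; returns (m, k) probabilities.
    S = X @ Theta.T
    S -= S.max(axis=1, keepdims=True)   # subtract row max for numerical stability
    E = np.exp(S)
    return E / E.sum(axis=1, keepdims=True)

def cost(Theta, X, y):
    # Negative log-likelihood J(theta) summed over the examples.
    P = softmax_probs(Theta, X)
    m = X.shape[0]
    return -np.log(P[np.arange(m), y]).sum()

def grad(Theta, X, y):
    # Closed form from the derivation: -sum_i x_i (1{y_i = j} - p_j(x_i)).
    P = softmax_probs(Theta, X)
    m = X.shape[0]
    Y = np.zeros_like(P)
    Y[np.arange(m), y] = 1.0
    return -(Y - P).T @ X

# Finite-difference check on random data (4 classes, 3 features).
rng = np.random.default_rng(1)
X = rng.normal(size=(8, 3))
y = rng.integers(0, 4, size=8)
Theta = rng.normal(size=(4, 3))

G = grad(Theta, X, y)
eps = 1e-6
num = np.zeros_like(Theta)
for idx in np.ndindex(*Theta.shape):
    Tp, Tm = Theta.copy(), Theta.copy()
    Tp[idx] += eps
    Tm[idx] -= eps
    num[idx] = (cost(Tp, X, y) - cost(Tm, X, y)) / (2 * eps)

assert np.allclose(G, num, atol=1e-5)   # analytic and numeric gradients agree
```

The agreement of the analytic and numeric gradients confirms the case analysis above.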