## Learning From Data – A Short Course: Exercise 4.9

Page 142: Referring to Figure 4.10, why are both curves increasing with $K$? Why do they converge to each other with increasing $K$? For the first question: as we use more data to validate, we use less data to train. Less training data produces a worse final hypothesis, hence the increase of both curves. [...]
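A quick simulation (my own sketch, not from the book) illustrating the first point: with a fixed data budget, reserving more points for validation leaves fewer for training, and the resulting hypothesis $g^-$ gets worse out of sample. The one-parameter model $y \approx w x$ and the noisy linear target are toy choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def avg_out_of_sample_error(n_train, trials=2000, sigma=0.5):
    """Fit y ~ w*x by least squares on n_train noisy points; return the
    squared error averaged over fresh test points and many trials."""
    errs = []
    for _ in range(trials):
        x = rng.uniform(-1, 1, n_train)
        y = 2.0 * x + sigma * rng.normal(size=n_train)
        w = (x @ y) / (x @ x)                      # least-squares slope
        x_test = rng.uniform(-1, 1, 200)
        y_test = 2.0 * x_test + sigma * rng.normal(size=200)
        errs.append(np.mean((w * x_test - y_test) ** 2))
    return np.mean(errs)

# Total budget N = 30: a larger validation set K means a smaller training set.
err_small_train = avg_out_of_sample_error(n_train=5)    # K = 25
err_large_train = avg_out_of_sample_error(n_train=25)   # K = 5
print(err_small_train, err_large_train)
```

The first average comes out larger, matching the rising curve: the price of a big validation set is a worse $g^-$.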

## Learning From Data – A Short Course: Exercise 4.8

Page 142: Is $E_{\text{val}}(g^-)$ an unbiased estimate for the out-of-sample error $E_{\text{out}}(g^-)$? Yes, it is: each validation point is drawn independently of $g^-$, so $\mathbb{E}_{\mathcal{D}_{\text{val}}}[e(g^-(x_n), y_n)] = E_{\text{out}}(g^-)$, and the average $E_{\text{val}}(g^-)$ therefore has expectation $E_{\text{out}}(g^-)$ ("unbiased estimate" and "biased estimate" are definitions from statistics).
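A Monte Carlo sanity check of unbiasedness (my own construction; the particular hypothesis and noise rate are arbitrary): fix an imperfect classifier $g^-$, estimate its true $E_{\text{out}}$ on a huge sample, then average $E_{\text{val}}$ over many small validation sets.

```python
import numpy as np

rng = np.random.default_rng(1)

# A fixed hypothesis g⁻ (fixed because it was learned on a separate
# training set) against a target y = sign(x), flipped with prob. 0.1.
def g_minus(x):
    return np.sign(x - 0.2)          # deliberately imperfect hypothesis

def draw(n):
    x = rng.uniform(-1, 1, n)
    y = np.sign(x)
    flip = rng.random(n) < 0.1
    y[flip] *= -1
    return x, y

# True E_out, estimated once on a very large sample.
x_big, y_big = draw(1_000_000)
e_out = np.mean(g_minus(x_big) != y_big)

# Average E_val over many independent validation sets of size K = 25.
e_vals = [np.mean(g_minus(x) != y) for x, y in (draw(25) for _ in range(4000))]
print(np.mean(e_vals), e_out)
```

The average of the small-sample estimates matches $E_{\text{out}}(g^-)$ to within simulation noise, even though each individual $E_{\text{val}}$ is quite variable.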

## Learning From Data – A Short Course: Exercise 4.6

Page 133: We have seen both the hard-order constraint and the soft-order constraint. Which do you expect to be more useful for binary classification using the perceptron model? I'm waiting for a reply here. After the first post, I think I will choose the soft-order constraint with some regularizer that can actually constrain the model (I'm wondering [...]
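One observation relevant to "a regularizer that can actually constrain the model" (my own demo, not from the book): the perceptron's output $\operatorname{sign}(w^{\mathsf{T}}x)$ is invariant to positive rescaling of $w$, so a budget of the form $w^{\mathsf{T}}w \le C$ can always be met by shrinking $w$ without changing a single prediction.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))
w = rng.normal(size=3)

# Shrinking w satisfies any budget w·w <= C, yet every prediction of the
# perceptron sign(w·x) is unchanged — the budget alone does not shrink
# the perceptron's hypothesis set.
w_small = 1e-3 * w
preds_equal = bool(np.array_equal(np.sign(X @ w), np.sign(X @ w_small)))
print(preds_equal)
```

This is why a plain soft-order budget needs care with the perceptron: it constrains the weight vector, not the set of classifiers it represents.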

## Learning From Data – A Short Course: Problem 4.8

Page 156: In the augmented error minimization with $\Gamma = I$ and $\lambda > 0$, assume that $E_{\text{in}}$ is differentiable and use gradient descent to minimize $E_{\text{aug}}$:

$$w(t+1) \leftarrow w(t) - \eta \, \nabla E_{\text{aug}}(w(t)).$$

Show that the update rule above is the same as

$$w(t+1) \leftarrow \left(1 - \frac{2\eta\lambda}{N}\right) w(t) - \eta \, \nabla E_{\text{in}}(w(t)).$$

A note here: by page 131, we know that $E_{\text{aug}}(w) = E_{\text{in}}(w) + \frac{\lambda}{N} w^{\mathsf{T}} w$, so $\nabla E_{\text{aug}}(w) = \nabla E_{\text{in}}(w) + \frac{2\lambda}{N} w$.
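A numerical check of the equivalence (my own toy setup: $E_{\text{in}}$ is an in-sample squared error, and the data, $\lambda$, and $\eta$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
N, d = 50, 3
X = rng.normal(size=(N, d))
y = rng.normal(size=N)
lam, eta = 0.1, 0.05

def grad_ein(w):
    # Gradient of E_in(w) = (1/N) ||Xw - y||^2.
    return (2.0 / N) * X.T @ (X @ w - y)

def grad_eaug(w):
    # E_aug(w) = E_in(w) + (lam/N) w·w, so ∇E_aug = ∇E_in + (2 lam/N) w.
    return grad_ein(w) + (2.0 * lam / N) * w

w = rng.normal(size=d)
step_aug = w - eta * grad_eaug(w)                              # descent on E_aug
step_decay = (1 - 2 * eta * lam / N) * w - eta * grad_ein(w)   # weight-decay form
print(np.max(np.abs(step_aug - step_decay)))
```

Both updates produce the same next iterate, which is exactly the algebra the problem asks for: the regularization term turns into a multiplicative shrinking of $w(t)$ before the ordinary gradient step.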

## Learning From Data – A Short Course: Exercise 4.5

Page 131: A more general soft constraint is the Tikhonov regularization constraint

$$w^{\mathsf{T}} \Gamma^{\mathsf{T}} \Gamma w \le C,$$

which can capture relationships among the $w_i$ (the matrix $\Gamma$ is the Tikhonov regularizer). (a) What should $\Gamma$ be to obtain the constraint $\sum_{q=0}^{Q} w_q^2 \le C$? I think $\Gamma = I$ in this case. (b) What should $\Gamma$ be to obtain the constraint $\left(\sum_{q=0}^{Q} w_q\right)^2 \le C$? $\Gamma$ is a matrix with first components of [...]
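Both choices are easy to verify numerically. For (b), one matrix that satisfies the algebra (my own check, not necessarily the book's intended phrasing) is a single row of ones:

```python
import numpy as np

Q = 3
w = np.array([1.0, -2.0, 0.5, 3.0])       # w_0 .. w_Q

# (a) Γ = I gives wᵀΓᵀΓw = Σ w_q².
gamma_a = np.eye(Q + 1)
lhs_a = w @ gamma_a.T @ gamma_a @ w
print(lhs_a, np.sum(w ** 2))

# (b) Γ = a single row of ones gives Γw = Σ w_q, so wᵀΓᵀΓw = (Σ w_q)².
gamma_b = np.ones((1, Q + 1))
lhs_b = w @ gamma_b.T @ gamma_b @ w
print(lhs_b, np.sum(w) ** 2)
```

In both cases the quadratic form reproduces the desired constraint's left-hand side for an arbitrary $w$.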

## Learning From Data – A Short Course: Exercise 4.7

Page 139: Fix $g^-$ (learned from $\mathcal{D}_{\text{train}}$) and define $\sigma^2_{\text{val}} = \operatorname{Var}_{\mathcal{D}_{\text{val}}}[E_{\text{val}}(g^-)]$. We consider how $\sigma^2_{\text{val}}$ depends on $K$. Let

$$\sigma^2(g^-) = \operatorname{Var}_x\!\left[e(g^-(x), y)\right]$$

be the pointwise variance in the out-of-sample error of $g^-$. (a) Show that $\sigma^2_{\text{val}} = \frac{1}{K}\sigma^2(g^-)$. We have $E_{\text{val}}(g^-) = \frac{1}{K}\sum_{n} e(g^-(x_n), y_n)$. Because the terms $e(g^-(x_n), y_n)$ are independent of each other, $\operatorname{Var}[E_{\text{val}}(g^-)] = \frac{1}{K^2}\sum_{n} \operatorname{Var}[e(g^-(x_n), y_n)] = \frac{1}{K}\sigma^2(g^-)$. Reference: Properties of variance and covariance. (b) In a classification problem, where $e(g^-(x), y) = [\![\, g^-(x) \ne y \,]\!]$, express [...]
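A simulation of part (a) for 0/1 error (my own sketch; the error probability $p$ is an assumed value): each validation point contributes a Bernoulli$(p)$ indicator, so the pointwise variance is $\sigma^2(g^-) = p(1-p)$ and the prediction is $\sigma^2_{\text{val}} = p(1-p)/K$.

```python
import numpy as np

rng = np.random.default_rng(4)
p = 0.3            # assumed P[g⁻(x) ≠ y] for some fixed g⁻

def sigma2_val(K, trials=20000):
    """Empirical variance of E_val(g⁻) over many validation sets of size K,
    with each point's 0/1 error an independent Bernoulli(p) indicator."""
    errors = rng.random((trials, K)) < p       # e(g⁻(x_n), y_n) indicators
    e_val = errors.mean(axis=1)
    return e_val.var()

v10 = sigma2_val(10)
v40 = sigma2_val(40)
print(v10, p * (1 - p) / 10)   # ≈ 0.021
print(v40, p * (1 - p) / 40)   # ≈ 0.00525
```

Quadrupling $K$ cuts the variance of the estimate by a factor of four, exactly the $1/K$ scaling of part (a).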

## Learning From Data – A Short Course: Exercise 4.3

Page 125: Deterministic noise depends on $\mathcal{H}$, as some models approximate $f$ better than others. (a) Assume $\mathcal{H}$ is fixed and we increase the complexity of $f$. Will deterministic noise in general go up or down? Is there a higher or lower tendency to overfit? Deterministic noise in general will go up because it is harder for [...]
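A small illustration (my own choice of model and targets): fix $\mathcal{H}$ to be lines and make the target wigglier. The best line's residual — the deterministic noise — grows with the target's complexity.

```python
import numpy as np

# Fix H = {lines h(x) = w0 + w1 x}; deterministic noise is the part of f
# that even the best h in H cannot capture.
x = np.linspace(-np.pi, np.pi, 2001)

def deterministic_noise(k):
    f = np.sin(k * x)
    A = np.column_stack([np.ones_like(x), x])
    w, *_ = np.linalg.lstsq(A, f, rcond=None)   # best line fit to f
    return np.mean((A @ w - f) ** 2)

dn1 = deterministic_noise(1)   # f = sin(x): close to linear on this range
dn5 = deterministic_noise(5)   # f = sin(5x): far from any line
print(dn1, dn5)
```

The more complex target leaves a much larger residual, consistent with the answer that deterministic noise goes up when $f$ gets more complex while $\mathcal{H}$ stays fixed.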

## Learning From Data – A Short Course: Problem 2.22

Page 74: When there is noise in the data, $y(x) = f(x) + \epsilon(x)$, where $\epsilon(x)$ is the noise term. If $\epsilon$ is a zero-mean noise random variable with variance $\sigma^2$, show that the bias-variance decomposition becomes

$$\mathbb{E}_{\mathcal{D},\epsilon}\!\left[\big(g^{(\mathcal{D})}(x) - y(x)\big)^2\right] = \sigma^2 + \text{bias} + \text{var}.$$

We expand the expectation, split it into two sub-expressions (the noise term and the noiseless bias-variance term), and combine them to derive the result.
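The split referred to above is the standard one. A sketch of the derivation, using the book's notation $\bar{g}(x) = \mathbb{E}_{\mathcal{D}}[g^{(\mathcal{D})}(x)]$ (the cross terms vanish because $\epsilon$ is zero-mean and independent of $\mathcal{D}$):

```latex
\begin{aligned}
\mathbb{E}_{\mathcal{D},\epsilon}\!\left[\big(g^{(\mathcal{D})}(x) - y(x)\big)^2\right]
  &= \mathbb{E}_{\mathcal{D},\epsilon}\!\left[\big(g^{(\mathcal{D})}(x) - f(x) - \epsilon(x)\big)^2\right] \\
  &= \mathbb{E}_{\mathcal{D}}\!\left[\big(g^{(\mathcal{D})}(x) - f(x)\big)^2\right]
     - 2\,\underbrace{\mathbb{E}_{\mathcal{D},\epsilon}\!\left[\big(g^{(\mathcal{D})}(x) - f(x)\big)\epsilon(x)\right]}_{=\,0}
     + \mathbb{E}_{\epsilon}\!\left[\epsilon(x)^2\right] \\
  &= \underbrace{\mathbb{E}_{\mathcal{D}}\!\left[\big(g^{(\mathcal{D})}(x) - \bar{g}(x)\big)^2\right]}_{\mathrm{var}(x)}
     + \underbrace{\big(\bar{g}(x) - f(x)\big)^2}_{\mathrm{bias}(x)}
     + \sigma^2 .
\end{aligned}
```

Taking the expectation over $x$ of $\mathrm{bias}(x)$ and $\mathrm{var}(x)$ gives the overall $\text{bias}$ and $\text{var}$, so the noisy decomposition is the noiseless one plus the irreducible $\sigma^2$.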

## Learning From Data – A Short Course: Problem 1.3

Page 33: Prove that the PLA eventually converges to a linear separator for separable data. The following steps will guide you through the proof. Let $w^*$ be an optimal set of weights (one which separates the data). The essential idea in this proof is to show that the PLA weights $w(t)$ get "more aligned" with $w^*$ with every [...]
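The convergence the proof establishes is easy to watch in practice. A minimal sketch (my own toy data; labels are generated by a hidden separating $w^*$, as in the problem's setup):

```python
import numpy as np

rng = np.random.default_rng(5)

# Separable data: labels come from a hidden separating weight vector w*.
w_star = np.array([0.3, 1.0, -0.8])
X = np.column_stack([np.ones(60), rng.uniform(-1, 1, (60, 2))])
y = np.sign(X @ w_star)

# PLA: repeatedly pick a misclassified point and update w ← w + y_n x_n.
w = np.zeros(3)
updates = 0
while True:
    mis = np.flatnonzero(np.sign(X @ w) != y)
    if mis.size == 0:
        break
    n = mis[0]
    w += y[n] * X[n]
    updates += 1

separated = bool(np.all(np.sign(X @ w) == y))
print(updates, separated)
```

The loop always terminates on separable data — which is exactly the content of the theorem: the number of updates is bounded, because each update increases the alignment $w^{\mathsf{T}}w^*$ faster than it can grow $\lVert w \rVert$.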

## Learning From Data – A Short Course: Exercise 1.3

Page 8: The weight update rule in (1.3) has the nice interpretation that it moves $w$ in the direction of classifying $x(t)$ correctly. (a) Show that $y(t)\,w^{\mathsf{T}}(t)\,x(t) < 0$. [Hint: $x(t)$ is misclassified by $w(t)$.] Because $x(t)$ is misclassified by $w(t)$, we have $\operatorname{sign}(w^{\mathsf{T}}(t)x(t)) \ne y(t)$. Case $y(t) = +1$: then $w^{\mathsf{T}}(t)x(t) < 0$. Hence $y(t)\,w^{\mathsf{T}}(t)x(t) < 0$. Case $y(t) = -1$: then $w^{\mathsf{T}}(t)x(t) > 0$. Hence $y(t)\,w^{\mathsf{T}}(t)x(t) < 0$. So in both cases $y(t)\,w^{\mathsf{T}}(t)x(t) < 0$. (b) Show that $y(t)\,w^{\mathsf{T}}(t+1)\,x(t) > y(t)\,w^{\mathsf{T}}(t)\,x(t)$. [Hint: Use (1.3).] My solution for (b) is wrong. [...]
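A toy numerical check of both parts (my own sketch). The key algebra for (b): with $w(t{+}1) = w(t) + y(t)x(t)$, the quantity $y(t)\,w^{\mathsf{T}}x(t)$ increases by exactly $y(t)^2\lVert x(t)\rVert^2 = \lVert x(t)\rVert^2 > 0$.

```python
import numpy as np

rng = np.random.default_rng(6)

# Build a misclassified example: force y so that y * wᵀx < 0 (part a).
w = rng.normal(size=3)
x = rng.normal(size=3)
y = -np.sign(w @ x)              # x is misclassified by w

before = y * (w @ x)             # part (a): this is negative
w_next = w + y * x               # the update rule (1.3)
after = y * (w_next @ x)         # part (b): strictly larger than `before`
print(before, after, after - before, x @ x)
```

The gap `after - before` equals $\lVert x \rVert^2$, which is the sense in which the update "moves $w$ in the direction of classifying $x(t)$ correctly".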