Page 133: We have seen both the hard-order constraint and the soft-order constraint. Which do you expect to be more useful for binary classification using the perceptron model? I will wait for a reply here. After the first post, I think I will choose the soft-order constraint with some regularizer that can actually constrain the model (I’m wondering [...]
Page 156: In the augmented error minimization with Γ = I and λ > 0, assume that E_in is differentiable and use gradient descent to minimize E_aug: w(t+1) ← w(t) − η∇E_aug(w(t)). Show that the update rule above is the same as w(t+1) ← (1 − 2ηλ/N)w(t) − η∇E_in(w(t)). A note here is that by page 131, we know that E_aug(w) = E_in(w) + (λ/N)wᵀw, so ∇E_aug(w) = ∇E_in(w) + (2λ/N)w; substituting this into the update rule gives the weight-decay form above.
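A quick numerical sanity check of this equivalence (a minimal sketch, assuming E_in is a least-squares in-sample error on toy data; X, y, eta, lam are made-up names for illustration): one gradient step on E_aug should coincide exactly with the weight-decay form of the step.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 20, 3
X = rng.normal(size=(N, d))   # toy inputs (assumed, not from the book)
y = rng.normal(size=N)        # toy targets
lam, eta = 0.5, 0.01          # regularization strength and learning rate

def grad_Ein(w):
    # gradient of E_in(w) = (1/N)||Xw - y||^2
    return (2.0 / N) * X.T @ (X @ w - y)

def grad_Eaug(w):
    # gradient of E_aug(w) = E_in(w) + (lam/N) w^T w
    return grad_Ein(w) + (2.0 * lam / N) * w

w = rng.normal(size=d)
step_aug = w - eta * grad_Eaug(w)                             # descent on E_aug
step_decay = (1 - 2 * eta * lam / N) * w - eta * grad_Ein(w)  # weight-decay form
print("updates match:", np.allclose(step_aug, step_decay))
```

The two updates agree to machine precision, since the algebra is exact rather than approximate.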
Page 131: A more general soft constraint is the Tikhonov regularization constraint wᵀΓᵀΓw ≤ C, which can capture relationships among the w_q (the matrix Γ is the Tikhonov regularizer). (a) What should Γ be to obtain the constraint Σ_q w_q² ≤ C? I think Γ = I in this case. (b) What should Γ be to obtain the constraint (Σ_q w_q)² ≤ C? Γ is a matrix with first components of [...]
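A quick numerical check of the two choices of Γ (a minimal sketch; for (b) I take Γ to be a single row of ones, which is my reading of the truncated answer, since then Γw = Σ_q w_q):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4
w = rng.normal(size=d)  # arbitrary weight vector

# (a) Gamma = I gives w^T Gamma^T Gamma w = sum_q w_q^2
G_a = np.eye(d)
print(np.isclose(w @ G_a.T @ G_a @ w, np.sum(w**2)))

# (b) Gamma = a single row of ones gives w^T Gamma^T Gamma w = (sum_q w_q)^2
G_b = np.ones((1, d))
print(np.isclose(w @ G_b.T @ G_b @ w, np.sum(w)**2))
```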
Page 139: Fix g⁻ (learned from D_train) and define σ²_val = Var_{D_val}[E_val(g⁻)]. We consider how σ²_val depends on K. Let σ²(g⁻) = Var_x[e(g⁻(x), y)] be the pointwise variance in the out-of-sample error of g⁻. (a) Show that σ²_val = σ²(g⁻)/K. We have: σ²_val = Var[(1/K) Σ_{k=1}^{K} e(g⁻(x_k), y_k)]. Because each e(g⁻(x_k), y_k) is independent of the others, the variance of the sum is the sum of the variances, so: σ²_val = (1/K²) Σ_{k=1}^{K} Var[e(g⁻(x_k), y_k)] = (1/K²) · K · σ²(g⁻) = σ²(g⁻)/K. Reference: properties of variance and covariance. (b) In a classification problem, where e(g⁻(x), y) = ⟦g⁻(x) ≠ y⟧, express [...]
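The 1/K scaling can also be checked by simulation. A minimal sketch, assuming 0/1 classification error so that each pointwise error e is Bernoulli(p) and σ²(g⁻) = p(1 − p); the values of p, K, and trials are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
p, K, trials = 0.3, 25, 200_000

# each trial: a validation set of K i.i.d. 0/1 errors with P[error] = p
errors = rng.random((trials, K)) < p
E_val = errors.mean(axis=1)      # validation error estimate per trial

emp_var = E_val.var()            # empirical variance of E_val across trials
theory = p * (1 - p) / K         # sigma^2(g^-)/K for 0/1 loss
print(emp_var, theory)
```

The empirical variance lands within a fraction of a percent of p(1 − p)/K, consistent with part (a).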
Page 125: Deterministic noise depends on H, as some models approximate f better than others. (a) Assume H is fixed and we increase the complexity of f. Will deterministic noise in general go up or down? Is there a higher or lower tendency to overfit? Deterministic noise in general will go up, because it is harder for [...]
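A small illustration of part (a) under stated assumptions: H is fixed as degree-2 polynomial fits, and the target f(x) = sin(freq·πx) grows in complexity with freq; the best achievable in-class squared error (a proxy for deterministic noise) grows with it.

```python
import numpy as np

x = np.linspace(-1, 1, 200)  # fixed input grid

def det_noise(freq):
    y = np.sin(freq * np.pi * x)      # target f of increasing complexity
    coef = np.polyfit(x, y, deg=2)    # best h in the fixed H (least squares)
    resid = y - np.polyval(coef, x)
    return np.mean(resid**2)          # mean squared residual = deterministic-noise proxy

noise = [det_noise(f) for f in (1, 2, 3)]
print(noise)  # strictly increasing with the complexity of f
```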