Page 139:

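Fix $g^-$ (learned from $\mathcal{D}_{\text{train}}$) and define $\sigma^2_{\text{val}} = \text{Var}_{\mathcal{D}_{\text{val}}}[E_{\text{val}}(g^-)]$. We consider how $\sigma^2_{\text{val}}$ depends on $K$, the size of the validation set. Let

$$\sigma^2(g^-) = \text{Var}_{\mathbf{x}}\left[e(g^-(\mathbf{x}), y)\right]$$

be the pointwise variance in the out-of-sample error of $g^-$.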

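(a) Show that $\sigma^2_{\text{val}} = \frac{1}{K}\sigma^2(g^-)$.

We have:

$$\sigma^2_{\text{val}} = \text{Var}_{\mathcal{D}_{\text{val}}}\left[E_{\text{val}}(g^-)\right] = \text{Var}\left[\frac{1}{K}\sum_{n=1}^{K} e(g^-(\mathbf{x}_n), y_n)\right] = \frac{1}{K^2}\,\text{Var}\left[\sum_{n=1}^{K} e(g^-(\mathbf{x}_n), y_n)\right]$$

Because the pointwise errors $e(g^-(\mathbf{x}_n), y_n)$ are independent of each other, the variance of their sum is the sum of their variances, so:

$$\sigma^2_{\text{val}} = \frac{1}{K^2}\sum_{n=1}^{K}\text{Var}_{\mathbf{x}_n}\left[e(g^-(\mathbf{x}_n), y_n)\right] = \frac{1}{K^2}\cdot K\,\sigma^2(g^-) = \frac{1}{K}\sigma^2(g^-)$$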

Reference: Properties of variance and covariance.
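The $1/K$ scaling in (a) can be checked numerically. The sketch below is only an illustration under an assumed setup, not part of the original solution: the pointwise error is taken to be a 0/1 variable that equals 1 with a hypothetical probability `p_err`, and all function names are made up for this example.

```python
import random
import statistics

def validation_error(p_err, K, rng):
    """Average of K independent 0/1 pointwise errors, each 1 with probability p_err."""
    return sum(1 if rng.random() < p_err else 0 for _ in range(K)) / K

def variance_of_validation_error(p_err, K, trials=5000, seed=0):
    """Empirical variance of E_val over many independently drawn validation sets."""
    rng = random.Random(seed)
    samples = [validation_error(p_err, K, rng) for _ in range(trials)]
    return statistics.pvariance(samples)

p_err = 0.3                           # assumed error probability of the fixed g^-
pointwise_var = p_err * (1 - p_err)   # variance of a single 0/1 error

for K in (5, 20, 80):
    empirical = variance_of_validation_error(p_err, K)
    # Columns: K, simulated variance of E_val, predicted sigma^2(g^-) / K
    print(K, round(empirical, 5), round(pointwise_var / K, 5))
```

The two printed columns should roughly agree for each $K$, up to Monte Carlo noise.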

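(b) In a classification problem, where $e(g^-(\mathbf{x}), y) = [\![g^-(\mathbf{x}) \neq y]\!]$, express $\sigma^2_{\text{val}}$ in terms of $\mathbb{P}[g^-(\mathbf{x}) \neq y]$.

First we observe that:

$$\mathbb{E}_{\mathbf{x}}\left[e(g^-(\mathbf{x}), y)\right] = \mathbb{P}[g^-(\mathbf{x}) \neq y]$$

and, since $e^2 = e$ for a 0/1 error:

$$\mathbb{E}_{\mathbf{x}}\left[e(g^-(\mathbf{x}), y)^2\right] = \mathbb{P}[g^-(\mathbf{x}) \neq y]$$

Hence, using $\text{Var}[e] = \mathbb{E}[e^2] - (\mathbb{E}[e])^2$ and the result of (a):

$$\sigma^2_{\text{val}} = \frac{1}{K}\sigma^2(g^-) = \frac{1}{K}\,\mathbb{P}[g^-(\mathbf{x}) \neq y]\left(1 - \mathbb{P}[g^-(\mathbf{x}) \neq y]\right)$$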

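(c) Show that for any $g^-$ in a classification problem, $\sigma^2_{\text{val}} \le \frac{1}{4K}$.

First we consider the function $f(x) = x(1 - x)$. We have $f'(x) = 1 - 2x$ and $f''(x) = -2 < 0$, so $f$ attains its maximum at $x = \frac{1}{2}$, where $f\left(\frac{1}{2}\right) = \frac{1}{4}$. This means $f(x) \le \frac{1}{4}$ for all $x$.

Similarly, writing $p = \mathbb{P}[g^-(\mathbf{x}) \neq y]$ and using the expression from (b), we have:

$$\sigma^2_{\text{val}} = \frac{1}{K}\,p(1 - p) \le \frac{1}{4K}$$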
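As a quick sanity check of the bound in (c), the following sketch evaluates $p(1-p)/K$ on a grid of error probabilities and compares the largest value with $1/(4K)$. The function name, the choice of $K$ and the grid are illustrative assumptions only.

```python
def classification_val_variance(p_err, K):
    """Variance of E_val for a 0/1 error with error probability p_err: p(1 - p) / K."""
    return p_err * (1 - p_err) / K

K = 25                                        # assumed validation set size
grid = [i / 1000 for i in range(1001)]        # candidate error probabilities in [0, 1]

# Error probability at which the variance is largest on the grid
worst_p = max(grid, key=lambda p: classification_val_variance(p, K))

print(worst_p, classification_val_variance(worst_p, K), 1 / (4 * K))
```

On this grid the worst case sits at $p = 0.5$, which is where the bound is tight.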

(d) Is there a uniform upper bound for $\text{Var}[E_{\text{val}}(g^-)]$ similar to (c) in the case of regression with squared error $e(g^-(\mathbf{x}), y) = (g^-(\mathbf{x}) - y)^2$?

No. Because the squared error is unbounded, its variance cannot be uniformly bounded. However, the result of (a) suggests that a large $K$ may help reduce the variance.
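To see why no uniform bound exists for squared error, one can look at how the variance of the pointwise error grows with the size of the residual. The sketch below assumes, purely for illustration, that the residual $g^-(\mathbf{x}) - y$ is Gaussian with a hypothetical standard deviation `scale`; neither the distribution nor the names come from the original problem.

```python
import random
import statistics

def squared_error_variance(scale, n_points=20000, seed=0):
    """Empirical variance of e = (g(x) - y)^2 when the residual g(x) - y ~ N(0, scale^2)."""
    rng = random.Random(seed)
    errors = [rng.gauss(0.0, scale) ** 2 for _ in range(n_points)]
    return statistics.pvariance(errors)

for scale in (1.0, 10.0, 100.0):
    # The variance grows like scale^4, so it can be made arbitrarily large
    print(scale, squared_error_variance(scale))
```

Since the residual scale can be arbitrarily large, no constant plays the role that $1/4$ plays in the classification case.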

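(e) For regression with squared error, if we train using fewer points (smaller $N - K$) to get $g^-$, do you expect $\sigma^2(g^-)$ to be higher or lower?

We have:

$$\sigma^2(g^-) = \text{Var}_{\mathbf{x}}\left[(g^-(\mathbf{x}) - y)^2\right]$$

As we use fewer points to train, $g^-$ gets worse. As $g^-$ gets worse, the squared error $(g^-(\mathbf{x}) - y)^2$ and its mean $\mathbb{E}_{\mathbf{x}}\left[(g^-(\mathbf{x}) - y)^2\right]$ take larger values. For a non-negative random variable, a larger mean often comes with a larger variance, so $\text{Var}_{\mathbf{x}}\left[(g^-(\mathbf{x}) - y)^2\right]$ often gets higher. Hence $\sigma^2(g^-)$ is expected to be higher.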
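The tendency described in (e) can be illustrated with a small simulation. This is a minimal sketch under a toy setup that is entirely an assumption of this example: the target is $y \sim N(0, 1)$ independent of $\mathbf{x}$, and $g^-$ is a constant hypothesis equal to the mean of the training targets. The function names are hypothetical.

```python
import random

def sample_variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def average_pointwise_variance(n_train, n_datasets=200, n_test=500, seed=0):
    """Average over training sets of Var[(g^-(x) - y)^2] for a constant fit."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_datasets):
        # Toy target (assumption): y ~ N(0, 1), independent of x
        train = [rng.gauss(0.0, 1.0) for _ in range(n_train)]
        g_minus = sum(train) / n_train  # constant hypothesis: the training mean
        # Pointwise squared errors of g^- on fresh points
        errors = [(g_minus - rng.gauss(0.0, 1.0)) ** 2 for _ in range(n_test)]
        total += sample_variance(errors)
    return total / n_datasets

for n_train in (2, 10, 50):
    # Fewer training points should tend to give a larger pointwise variance
    print(n_train, round(average_pointwise_variance(n_train), 3))
```

In this toy setup the averaged pointwise variance should shrink as `n_train` grows, in line with the heuristic argument above.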

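(f) Conclude that increasing the size of the validation set can result in a better or a worse estimate of $E_{\text{out}}$.

This follows from the answers to (d) and (e). By (a), a larger $K$ shrinks the factor $\frac{1}{K}$ in $\sigma^2_{\text{val}} = \frac{1}{K}\sigma^2(g^-)$, which makes the estimate better. But a larger $K$ also leaves fewer points, $N - K$, for training, so by (e) $\sigma^2(g^-)$ tends to grow, and by (d) there is no uniform bound on it for squared error, which can make the estimate worse. This trade-off is also discussed on the same page.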
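The two competing effects can be made concrete in a toy model. The sketch below assumes a setup that is not in the original problem: targets $y \sim N(0, 1)$ independent of $\mathbf{x}$ and a constant hypothesis $g^-$ equal to the mean of the $N - K$ training targets. Under that assumption the pointwise variance averaged over training sets is $2 + \frac{4}{N - K}$, so the expected variance of $E_{\text{val}}$ is $\left(2 + \frac{4}{N - K}\right)/K$. The function name and $N = 20$ are illustrative.

```python
def expected_val_variance(N, K):
    """Toy closed form (assumption): (2 + 4 / (N - K)) / K for a constant fit to N(0, 1) targets."""
    return (2.0 + 4.0 / (N - K)) / K

N = 20  # assumed total number of data points
for K in range(1, N):
    # Columns: validation set size K, expected variance of E_val in the toy model
    print(K, round(expected_val_variance(N, K), 4))

# Validation size with the smallest expected variance in this toy model
best_K = min(range(1, N), key=lambda K: expected_val_variance(N, K))
print("best K:", best_K)
```

In this toy model the variance first falls as $K$ grows and then rises again once too few points remain for training, so the best $K$ lies strictly between the extremes.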