Page 87:

Consider a noisy target $y = \mathbf{w}^{*T}\mathbf{x} + \epsilon$ for generating the data, where $\epsilon$ is a noise term with zero mean and variance $\sigma^2$, independently generated for every example $(\mathbf{x}, y)$. The expected error of the best possible linear fit to this target is thus $\sigma^2$.

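For the data $\mathcal{D} = \{(\mathbf{x}_1, y_1), \dots, (\mathbf{x}_N, y_N)\}$, denote the noise in $y_n$ as $\epsilon_n$ and let $\boldsymbol{\epsilon} = [\epsilon_1, \epsilon_2, \dots, \epsilon_N]^T$; assume that $X^TX$ is invertible. By following the steps below, show that the expected in-sample error of linear regression with respect to $\mathcal{D}$ is given by

$$\mathbb{E}_{\mathcal{D}}[E_{\text{in}}(\mathbf{w}_{\text{lin}})] = \sigma^2\left(1 - \frac{d+1}{N}\right).$$

The solution of this part of the exercise can be found here: (a) – (b) – (c) – (d). A note here is that: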

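We have $\mathbb{E}[\epsilon_n] = 0$ and $\mathbb{E}[\epsilon_n^2] = \sigma^2$ for every $n$: the noise event is not affected by the position of the data point at which it occurs. Hence the probability distribution of $\epsilon_n$ is the same as the probability distribution of $\epsilon_1$ (the noise term of the first chosen data point of the data set), and likewise for every other index.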
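The in-sample formula can be sanity-checked numerically. The following is a minimal sketch, not part of the original solution: it fixes one input matrix $X$, redraws the noise vector many times, and averages $E_{\text{in}}(\mathbf{w}_{\text{lin}})$. The function name, the Gaussian choice for the inputs and the noise, and the default values of `N`, `d`, `sigma`, and `trials` are all assumptions made for illustration.

```python
import numpy as np

def expected_in_sample_error(N=20, d=3, sigma=0.5, trials=20000, seed=0):
    """Monte Carlo estimate of E_D[E_in(w_lin)] for one fixed input matrix X.

    All defaults are illustrative assumptions, not values from the exercise.
    """
    rng = np.random.default_rng(seed)
    # Inputs with a leading column of ones (the bias coordinate x_0 = 1).
    X = np.hstack([np.ones((N, 1)), rng.standard_normal((N, d))])
    w_star = rng.standard_normal(d + 1)  # hypothetical target weights
    # Hat matrix H = X (X^T X)^{-1} X^T, computed without an explicit inverse.
    H = X @ np.linalg.solve(X.T @ X, X.T)
    # One noise vector per row: zero mean, variance sigma^2, independent entries.
    eps = sigma * rng.standard_normal((trials, N))
    y = X @ w_star + eps
    # H is symmetric, so each row of y @ H is the fitted vector H y for that trial.
    y_hat = y @ H
    e_in = np.mean((y_hat - y) ** 2, axis=1)  # E_in for each noise draw
    return e_in.mean()

if __name__ == "__main__":
    N, d, sigma = 20, 3, 0.5
    print("simulated:", expected_in_sample_error(N, d, sigma))
    print("formula:  ", sigma**2 * (1 - (d + 1) / N))
```

With enough trials the two printed numbers should agree closely. Holding $X$ fixed is sufficient here because the formula depends on $X$ only through $\operatorname{trace}(H) = d+1$.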

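For the expected out-of-sample error, we take a special case which is easy to analyze. Consider a test data set $\mathcal{D}_{\text{test}} = \{(\mathbf{x}_1, y_1'), \dots, (\mathbf{x}_N, y_N')\}$, which shares the same input vectors $\mathbf{x}_n$ with $\mathcal{D}$ but with a different realization of the noise terms. Denote the noise in $y_n'$ as $\epsilon_n'$ and let $\boldsymbol{\epsilon}' = [\epsilon_1', \epsilon_2', \dots, \epsilon_N']^T$. Define $E_{\text{test}}(\mathbf{w}_{\text{lin}})$ to be the average squared error on $\mathcal{D}_{\text{test}}$.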

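(e) Prove that $\mathbb{E}_{\mathcal{D}, \boldsymbol{\epsilon}'}[E_{\text{test}}(\mathbf{w}_{\text{lin}})] = \sigma^2\left(1 + \frac{d+1}{N}\right)$.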

The special test error $E_{\text{test}}$ is a very restricted case of the general out-of-sample error. Some detailed analysis shows that similar results can be obtained for the general case, as shown in Problem 3.11.

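Training data set: $\mathbf{y} = X\mathbf{w}^* + \boldsymbol{\epsilon}$. Hence, the final hypothesis: $\hat{\mathbf{y}} = X\mathbf{w}_{\text{lin}} = H\mathbf{y} = X\mathbf{w}^* + H\boldsymbol{\epsilon}$, where $H = X(X^TX)^{-1}X^T$ is the hat matrix and $HX = X$.

Test data set: $\mathbf{y}' = X\mathbf{w}^* + \boldsymbol{\epsilon}'$. Inherited from the results of the first part, we have:

$$E_{\text{test}}(\mathbf{w}_{\text{lin}}) = \frac{1}{N}\|\hat{\mathbf{y}} - \mathbf{y}'\|^2 = \frac{1}{N}\|H\boldsymbol{\epsilon} - \boldsymbol{\epsilon}'\|^2 = \frac{1}{N}\left(\boldsymbol{\epsilon}^T H \boldsymbol{\epsilon} - 2\,\boldsymbol{\epsilon}'^T H \boldsymbol{\epsilon} + \boldsymbol{\epsilon}'^T \boldsymbol{\epsilon}'\right),$$

where the last step uses $H^T = H$ and $H^2 = H$. ($\boldsymbol{\epsilon}$ and $\boldsymbol{\epsilon}'$ are two different vectors, but every component of both is drawn from the probability distribution of the scalar random variable $\epsilon$, not of a vector.)

We also have:

$$\mathbb{E}[\boldsymbol{\epsilon}^T H \boldsymbol{\epsilon}] = \sigma^2 \operatorname{trace}(H) = \sigma^2 (d+1), \qquad \mathbb{E}[\boldsymbol{\epsilon}'^T \boldsymbol{\epsilon}'] = N\sigma^2, \qquad \mathbb{E}[\boldsymbol{\epsilon}'^T H \boldsymbol{\epsilon}] = 0.$$

The cross term vanishes because $\boldsymbol{\epsilon}$ and $\boldsymbol{\epsilon}'$ are independent and both have zero mean. For further explanation, please refer to the solution of the first part.

Hence:

$$\mathbb{E}_{\mathcal{D}, \boldsymbol{\epsilon}'}[E_{\text{test}}(\mathbf{w}_{\text{lin}})] = \frac{1}{N}\left(\sigma^2 (d+1) - 0 + N\sigma^2\right) = \sigma^2\left(1 + \frac{d+1}{N}\right).$$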
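As a numerical cross-check of part (e), here is a minimal sketch (not part of the original solution) that simulates the special test set: the same inputs $X$, an independent second noise vector $\boldsymbol{\epsilon}'$, and the average squared error of the fitted values against the test targets. It compares the simulated average with $\sigma^2(1 + (d+1)/N)$. The function name, the Gaussian inputs and noise, and all default parameter values are assumptions chosen for illustration.

```python
import numpy as np

def expected_test_error(N=20, d=3, sigma=0.5, trials=20000, seed=1):
    """Monte Carlo estimate of E[E_test(w_lin)] on a test set sharing X with D.

    Returns (simulated average, value of sigma^2 * (1 + (d + 1) / N)).
    All defaults are illustrative assumptions, not values from the exercise.
    """
    rng = np.random.default_rng(seed)
    # Inputs with a leading column of ones (the bias coordinate x_0 = 1).
    X = np.hstack([np.ones((N, 1)), rng.standard_normal((N, d))])
    w_star = rng.standard_normal(d + 1)  # hypothetical target weights
    # Hat matrix H = X (X^T X)^{-1} X^T.
    H = X @ np.linalg.solve(X.T @ X, X.T)
    # Training noise and an independent realization for the test targets.
    eps = sigma * rng.standard_normal((trials, N))
    eps_test = sigma * rng.standard_normal((trials, N))
    y = X @ w_star + eps
    y_test = X @ w_star + eps_test
    # Fitted values from the training targets (H is symmetric, so y @ H = (H y^T)^T).
    y_hat = y @ H
    e_test = np.mean((y_hat - y_test) ** 2, axis=1)  # E_test for each trial
    return e_test.mean(), sigma**2 * (1 + (d + 1) / N)

if __name__ == "__main__":
    simulated, formula = expected_test_error()
    print("simulated:", simulated)
    print("formula:  ", formula)
```

The simulated value exceeds $\sigma^2$ by roughly the same amount that the in-sample error falls below it, which matches the $\pm (d+1)/N$ symmetry between the two formulas.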