Learning From Data – A Short Course: Exercise 9.4

Let \hat{x}_{1} and \hat{x}_{2} be independent with zero mean and unit variance. You measure inputs x_{1} = \hat{x}_{1} and x_{2} = \sqrt{1 - \epsilon^{2}}\hat{x}_{1} + \epsilon\hat{x}_{2}.

(a) What are variance (x_{1}), variance (x_{2}) and covariance (x_{1}, x_{2})?

First, by assumption the independent variables have zero mean and unit variance:

    \[ \mathbb{E}\left [ \hat{x}_{1} \right ] = \mathbb{E}\left [ \hat{x}_{2} \right ] = 0 \]

    \[ \text{Var}\left [ \hat{x}_{1} \right ] = \mathbb{E}\left [ \hat{x}_{1}^{2} \right ] = 1, \qquad \text{Var}\left [ \hat{x}_{2} \right ] = \mathbb{E}\left [ \hat{x}_{2}^{2} \right ] = 1 \]

and, since \hat{x}_{1} and \hat{x}_{2} are independent, \mathbb{E}\left [ \hat{x}_{1}\hat{x}_{2} \right ] = \mathbb{E}\left [ \hat{x}_{1} \right ]\mathbb{E}\left [ \hat{x}_{2} \right ] = 0.

Now we compute the moments of the measured inputs x_{1} and x_{2}.

Expected values:

    \[ \mathbb{E}\left [ x_{1} \right ] = \mathbb{E}\left [ \hat{x}_{1} \right ] = 0 \]

    \[ \mathbb{E}\left [ x_{2} \right ] = \mathbb{E}\left [ \sqrt{1 - \epsilon^{2}}\hat{x}_{1} + \epsilon\hat{x}_{2} \right ] = \sqrt{1 - \epsilon^{2}}\mathbb{E}\left [ \hat{x}_{1} \right ] + \epsilon\mathbb{E}\left [ \hat{x}_{2} \right ] = 0 \]

    \[ \begin{array}{ll} \mathbb{E}[x_{1}x_{2}] &= \mathbb{E}[\hat{x}_{1}(\sqrt{1 - \epsilon^{2}}\hat{x}_{1} + \epsilon\hat{x}_{2})]\\ &= \mathbb{E}[\sqrt{1 - \epsilon^{2}}\hat{x}_{1}^{2} + \epsilon\hat{x}_{1}\hat{x}_{2}]\\ &=\sqrt{1 - \epsilon^{2}}\mathbb{E}[\hat{x}_{1}^{2}] + \epsilon\mathbb{E}[\hat{x}_{1}\hat{x}_{2}]\\ &= \sqrt{1 - \epsilon^{2}}\text{Var}(\hat{x}_{1}) + \epsilon\mathbb{E}[\hat{x}_{1}]\mathbb{E}[\hat{x}_{2}]\\ &= \sqrt{1 - \epsilon^{2}}\\ \end{array} \]

(The last steps use \mathbb{E}[\hat{x}_{1}^{2}] = \text{Var}(\hat{x}_{1}), valid because \hat{x}_{1} has zero mean, and independence for the cross term.)

Variance:

    \[ \text{Var}\left ( x_{1} \right ) = \text{Var}\left ( \hat{x}_{1} \right ) = 1 \]

    \[ \begin{array}{ll} \text{Var}\left ( x_{2} \right ) &= \mathbb{E}[x_{2}^{2}] - \left ( \mathbb{E}[x_{2}] \right )^{2}\\ &= \mathbb{E}\left [\left ( \sqrt{1 - \epsilon^{2}}\hat{x}_{1} + \epsilon\hat{x}_{2}  \right )^{2} \right ] \qquad \left ( \text{since } \mathbb{E}[x_{2}] = 0 \right )\\ &= \mathbb{E}\left [(1 - \epsilon^{2})\hat{x}_{1}^{2} + 2\epsilon\sqrt{1 - \epsilon^{2}}\hat{x}_{1}\hat{x}_{2} + \epsilon^{2}\hat{x}_{2}^{2} \right ]\\ &= (1 - \epsilon^{2})\mathbb{E}\left [\hat{x}_{1}^{2} \right ] + 2\epsilon\sqrt{1 - \epsilon^{2}}\mathbb{E}\left [\hat{x}_{1}\hat{x}_{2} \right ] + \epsilon^{2}\mathbb{E}\left [\hat{x}_{2}^{2} \right ]\\ &= (1 - \epsilon^{2})\text{Var}\left (\hat{x}_{1} \right ) + 2\epsilon\sqrt{1 - \epsilon^{2}}\mathbb{E}\left [\hat{x}_{1} \right ] \mathbb{E}\left [\hat{x}_{2} \right ] + \epsilon^{2}\text{Var}\left (\hat{x}_{2} \right )\\ &= (1 - \epsilon^{2}) + \epsilon^{2}\\ &= 1 \end{array} \]

Covariance:

    \[ \begin{array}{ll} \text{cov}(x_{1}, x_{2}) &= \mathbb{E}[x_{1}x_{2}] - \mathbb{E}[x_{1}]\mathbb{E}[x_{2}]\\ &= \sqrt{1 - \epsilon^{2}} \end{array} \]
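
As a sanity check, the following short simulation (a sketch assuming Python with NumPy; the value \epsilon = 0.3 and the Gaussian choice for \hat{x}_{1}, \hat{x}_{2} are arbitrary) estimates these moments from a large sample. The outputs should be close to 1, 1 and \sqrt{1 - \epsilon^{2}} \approx 0.954.

    import numpy as np

    rng = np.random.default_rng(0)
    eps = 0.3                    # arbitrary example value
    N = 1_000_000

    # Independent variables with zero mean and unit variance.
    x1_hat = rng.standard_normal(N)
    x2_hat = rng.standard_normal(N)

    # Measured (correlated) inputs.
    x1 = x1_hat
    x2 = np.sqrt(1 - eps**2) * x1_hat + eps * x2_hat

    print(np.var(x1))            # ~ 1
    print(np.var(x2))            # ~ 1
    print(np.cov(x1, x2)[0, 1])  # ~ sqrt(1 - eps^2) ≈ 0.954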

(b) Suppose f(\hat{x}) = \hat{w}_{1}\hat{x}_{1} + \hat{w}_{2}\hat{x}_{2} (linear in the independent variables). Show that f is linear in the correlated inputs, f(x) = w_{1}x_{1} + w_{2}x_{2}. (Obtain w_{1}, w_{2} as functions of \hat{w}_{1}, \hat{w}_{2}.)

Solving the measurement equations for \hat{x}_{1} and \hat{x}_{2} (assuming \epsilon \neq 0):

    \[ \left\{\begin{matrix} \hat{x}_{1} = x_{1}\\ \hat{x}_{2} = \frac{x_{2} - \sqrt{1 - \epsilon^{2}}x_{1}}{\epsilon} \end{matrix}\right. \]

    \[ f(\hat{x}) = \hat{w}_{1}\hat{x}_{1} + \hat{w}_{2}\hat{x}_{2} = \hat{w}_{1}x_{1} + \hat{w}_{2}\left ( \frac{x_{2} - \sqrt{1 - \epsilon^{2}}x_{1}}{\epsilon}  \right ) = \left ( \hat{w}_{1} - \hat{w}_{2}\frac{\sqrt{1 - \epsilon^{2}}}{\epsilon} \right )x_{1} + \frac{\hat{w}_{2}}{\epsilon}x_{2} \]

Let:

    \[ w_{1} = \left ( \hat{w}_{1} - \hat{w}_{2}\frac{\sqrt{1 - \epsilon^{2}}}{\epsilon} \right ) \]

    \[ w_{2} = \frac{\hat{w}_{2}}{\epsilon} \]

Note that w_{1} and w_{2} are linear functions of \hat{w}_{1} and \hat{w}_{2}.

We get:

    \[ f(\hat{x}) = \hat{w}_{1}\hat{x}_{1} + \hat{w}_{2}\hat{x}_{2} = w_{1}x_{1} + w_{2}x_{2} = f(x) \]

That is, the same function f is linear in both coordinate systems: it is a linear combination of \hat{x}_{1}, \hat{x}_{2} and, equivalently, of x_{1}, x_{2}.
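
A quick numerical sketch (again assuming NumPy; the weights \hat{w}_{1} = 2, \hat{w}_{2} = -1 and \epsilon = 0.3 are arbitrary) confirming that the two forms of f agree on random inputs:

    import numpy as np

    rng = np.random.default_rng(1)
    eps = 0.3
    w1_hat, w2_hat = 2.0, -1.0   # arbitrary weights for illustration

    # Correlated-input weights from the derivation above.
    w1 = w1_hat - w2_hat * np.sqrt(1 - eps**2) / eps
    w2 = w2_hat / eps

    x1_hat = rng.standard_normal(5)
    x2_hat = rng.standard_normal(5)
    x1 = x1_hat
    x2 = np.sqrt(1 - eps**2) * x1_hat + eps * x2_hat

    f_hat = w1_hat * x1_hat + w2_hat * x2_hat
    f_x = w1 * x1 + w2 * x2
    print(np.allclose(f_hat, f_x))  # True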

(c) Consider the ‘simple’ target function f(\hat{x}) = \hat{x}_{1} + \hat{x}_{2}. If you perform regression with the correlated inputs x and regularization constraint w_{1}^{2} + w_{2}^{2} \leq C, what is the maximum amount of regularization you can use (minimum value of C) and still be able to implement the target?

For the target f(\hat{x}) = \hat{x}_{1} + \hat{x}_{2} we have \hat{w}_{1} = \hat{w}_{2} = 1. Implementing the target under the constraint requires:

    \[ w_{1}^{2} + w_{2}^{2} \leq C \Leftrightarrow  \left ( \hat{w}_{1} - \hat{w}_{2}\frac{\sqrt{1 - \epsilon^{2}}}{\epsilon} \right )^{2} + \left ( \frac{\hat{w}_{2}}{\epsilon} \right )^{2} \leq C \]

Substituting \hat{w}_{1} = \hat{w}_{2} = 1 and simplifying:

    \[ \left ( 1 - \frac{\sqrt{1 - \epsilon^{2}}}{\epsilon} \right )^{2} + \left ( \frac{1}{\epsilon} \right )^{2} \leq C \Leftrightarrow C \geq 2\times\frac{1 - \epsilon\sqrt{1 - \epsilon^{2}}}{\epsilon^{2}} \]

So the minimum value of C is 2(1 - \epsilon\sqrt{1 - \epsilon^{2}})/\epsilon^{2}.
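
A quick numerical check of this simplification (a sketch assuming NumPy; \epsilon = 0.3 is arbitrary, and any \epsilon in (0, 1] would do):

    import numpy as np

    eps = 0.3
    lhs = (1 - np.sqrt(1 - eps**2) / eps) ** 2 + (1 / eps) ** 2
    rhs = 2 * (1 - eps * np.sqrt(1 - eps**2)) / eps**2
    print(lhs, rhs, np.isclose(lhs, rhs))  # both ≈ 15.86, True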

(d) What happens to the minimum C as the correlation increases (\epsilon \rightarrow 0)?

First, let us see why the correlation increases as \epsilon \rightarrow 0. Recall that x_{1} = \hat{x}_{1} and x_{2} = \sqrt{1 - \epsilon^{2}}\hat{x}_{1} + \epsilon\hat{x}_{2}. As \epsilon \rightarrow 0, the term \sqrt{1 - \epsilon^{2}}\hat{x}_{1} dominates x_{2}, so x_{2} \approx \hat{x}_{1} = x_{1} and the two inputs become almost perfectly correlated. As \epsilon \rightarrow \pm 1, the term \epsilon\hat{x}_{2} dominates x_{2} and the influence of \hat{x}_{1} vanishes, so the inputs become uncorrelated. The covariance confirms this: \text{cov}(x_{1}, x_{2}) = \sqrt{1 - \epsilon^{2}} \rightarrow 1 as \epsilon \rightarrow 0 and \rightarrow 0 as \epsilon \rightarrow \pm 1.

    \[ \lim_{\epsilon \rightarrow 0} 2\times\frac{1 - \epsilon\sqrt{1 - \epsilon^{2}}}{\epsilon^{2}} = +\infty \]

So the minimum C approaches infinity as the correlation increases: implementing the target requires arbitrarily weak regularization.
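
A brief sketch (assuming NumPy) of how fast the minimum C blows up; expanding \sqrt{1 - \epsilon^{2}} shows the leading behaviour is roughly 2/\epsilon^{2}:

    import numpy as np

    for eps in [0.5, 0.1, 0.01, 0.001]:
        c_min = 2 * (1 - eps * np.sqrt(1 - eps**2)) / eps**2
        print(f"eps = {eps:6.3f}  ->  C_min = {c_min:.4g}")
    # eps = 0.5 gives ~4.54; eps = 0.01 already gives ~1.98e4.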

(e) Assuming that there is significant noise in the data, discuss your results in the context of bias and var.

With strong regularization (small C), the hypothesis set cannot implement the target using the correlated inputs, so we incur bias. To implement the target we must allow large weights, i.e. choose a large C and regularize weakly; with significant noise in the data, large weights make the fit highly sensitive to that noise, so the variance is high. Thus the more correlated x_{1} and x_{2} are, the larger C must be, and the more the learned model is susceptible to noise: lowering the bias here comes at the price of higher variance.
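
To illustrate the variance side, here is a small simulation sketch (assuming NumPy; the noise level, sample size, and the two \epsilon values are arbitrary choices). It repeatedly fits unregularized least squares to noisy data generated from the 'simple' target, using the correlated inputs, and reports how much the fitted weights vary across trials; as \epsilon shrinks, the weight estimates become far more variable:

    import numpy as np

    rng = np.random.default_rng(2)
    N, trials, sigma = 50, 1000, 0.5   # arbitrary sample size, repetitions, noise std

    for eps in [0.5, 0.05]:
        w_fits = []
        for _ in range(trials):
            x1_hat = rng.standard_normal(N)
            x2_hat = rng.standard_normal(N)
            y = x1_hat + x2_hat + sigma * rng.standard_normal(N)  # noisy target
            # Correlated design matrix [x1, x2].
            X = np.column_stack([x1_hat,
                                 np.sqrt(1 - eps**2) * x1_hat + eps * x2_hat])
            w, *_ = np.linalg.lstsq(X, y, rcond=None)
            w_fits.append(w)
        print(f"eps = {eps}: std of fitted (w1, w2) = {np.std(w_fits, axis=0)}")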

