Learning From Data – A Short Course: Problem 8.7

For any x_{1},..., x_{N} with  \left \| x_{n} \right \| \leq R and N even, show that there exists a balanced dichotomy y_{1},..., y_{N} that satisfies

 \sum_{n=1}^{N}y_{n} = 0, and  \left \| \sum_{n=1}^{N}y_{n}x_{n} \right \| \leq \frac{NR}{\sqrt{N-1}}

(This is the geometric lemma that is needed to bound the VC-dimension of \rho-fat hyperplanes by  \left \lceil \frac{R^{2}}{\rho^{2}} \right \rceil + 1.)  The following steps are a guide for the proof.

Suppose you randomly select \frac{N}{2} of the labels y_{1},..., y_{N} to be + 1, the others being - 1. By construction,  \sum_{n=1}^{N}y_{n} = 0.

(a) \left \| \sum_{n=1}^{N}y_{n}x_{n} \right \|^{2} = \sum_{n=1}^{N}\sum_{m=1}^Ny_{n}y_{m}x_{n}^{T}x_{m}

    \[ \left \| \sum_{n=1}^{N}y_{n}x_{n} \right \|^{2} = \left ( \sum_{n=1}^{N}y_{n}x_{n} \right )^{T}\left ( \sum_{m=1}^{N}y_{m}x_{m} \right ) = \sum_{n=1}^{N}y_{n}x^{T}_{n}\sum_{m=1}^{N}y_{m}x_{m} = \sum_{n=1}^{N}\sum_{m=1}^Ny_{n}y_{m}x_{n}^{T}x_{m} \]
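
Just to convince ourselves of this expansion numerically, here is a tiny Python check of mine (the points, the labels, and the variable names are arbitrary choices, not anything from the textbook):

    import numpy as np

    rng = np.random.default_rng(3)
    N, d = 6, 2
    X = rng.normal(size=(N, d))                                  # fixed points x_1, ..., x_N
    y = rng.permutation([1.0] * (N // 2) + [-1.0] * (N // 2))    # one balanced dichotomy

    lhs = np.linalg.norm(y @ X) ** 2
    rhs = sum(y[n] * y[m] * X[n] @ X[m] for n in range(N) for m in range(N))
    print(lhs, rhs)   # equal up to floating-point error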

(b) When n = m, what is y_{n}y_{m}? Show that  \mathbb{P}[y_{n}y_{m} = 1] = \left ( \frac{N}{2} - 1 \right ) / (N - 1) when n \neq m. Hence show that

    \[ \mathbb{E}[y_{n}y_{m}] = \left\{\begin{matrix} 1 & m = n\\ -\frac{1}{N-1} & m \neq n \end{matrix}\right. \]

When n = m, y_{n}y_{m} = y^{2}_{n} = y^{2}_{m} = 1.

When n \neq m, count ordered pairs of positions. There are N(\frac{N}{2} - 1) ordered pairs whose labels share the same sign: choose the position n (N choices), then choose m among the other positions carrying the same sign as y_{n} (the sign class of y_{n} has \frac{N}{2} members, minus one for y_{n} itself). Likewise, there are N(N - 1) ordered pairs of distinct positions in total (N choices for n, then N - 1 choices for m \neq n). Hence:

    \[ \mathbb{P}[y_{n}y_{m} = 1] = \frac{N(\frac{N}{2} - 1)}{N(N - 1)} = \frac{\frac{N}{2} - 1}{N - 1} \]

    \[ \mathbb{E}[y_{n}y_{m}] = \left\{\begin{matrix} 1 & m = n\\ 1 \times \frac{\frac{N}{2} - 1}{N - 1} - 1 \times \left ( 1 - \frac{\frac{N}{2} - 1}{N - 1} \right )& m \neq n \end{matrix}\right. \]

    \[ \Rightarrow \mathbb{E}[y_{n}y_{m}] = \left\{\begin{matrix} 1 & m = n\\ 2 \times \frac{\frac{N}{2} - 1}{N - 1} - 1 & m \neq n \end{matrix}\right. \]

    \[ \Rightarrow \mathbb{E}[y_{n}y_{m}] = \left\{\begin{matrix} 1 & m = n\\ \frac{N - 2}{N - 1} - 1 & m \neq n \end{matrix}\right. \]

    \[ \Rightarrow \mathbb{E}[y_{n}y_{m}] = \left\{\begin{matrix} 1 & m = n\\ -\frac{1}{N-1} & m \neq n \end{matrix}\right. \]
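
As a quick sanity check on these expectations (a sketch of mine, not part of the textbook's guide), the snippet below averages y_{n}y_{m} over all balanced dichotomies for a small even N; the helper name balanced_dichotomies is made up:

    import itertools
    import numpy as np

    def balanced_dichotomies(N):
        # Yield every y in {-1, +1}^N with exactly N/2 entries equal to +1.
        for plus in itertools.combinations(range(N), N // 2):
            y = -np.ones(N)
            y[list(plus)] = 1.0
            yield y

    N = 6
    avg = np.mean([np.outer(y, y) for y in balanced_dichotomies(N)], axis=0)
    print(avg[0, 0])   # diagonal term: 1
    print(avg[0, 1])   # off-diagonal term: -1/(N-1) = -0.2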

(c) Show that

 \mathbb{E} \left [ \left \| \sum_{n=1}^{N}y_{n}x_{n} \right \|^{2} \right ] = \frac{N}{N-1}\sum_{n=1}^{N}\left \| x_{n} - \bar{x} \right \|^{2}

where the average vector  \bar{x} = \frac{1}{N}\sum_{n=1}^{N}x_{n}.

    \[ \mathbb{E} \left [ \left \| \sum_{n=1}^{N}y_{n}x_{n} \right \|^{2}  \right ] = \mathbb{E} \left [ \sum_{n=1}^{N}\sum_{m=1}^Ny_{n}y_{m}x_{n}^{T}x_{m} \right ] = \sum_{n=1}^{N}\sum_{m=1}^N \mathbb{E} \left [ y_{n}y_{m}x_{n}^{T}x_{m} \right ] = \sum_{n=1}^{N}\sum_{m=1}^N x_{n}^{T}x_{m}\, \mathbb{E} \left [ y_{n}y_{m} \right ] \]

The last equality needs no independence assumption: the points x_{1},..., x_{N} are fixed, and only the labels y_{n} are random, so each inner product x_{n}^{T}x_{m} is a constant that can be pulled out of the expectation.

Plugging in the expectations from part (b):

    \[ \mathbb{E} \left [ \left \| \sum_{n=1}^{N}y_{n}x_{n} \right \|^{2}  \right ] = \sum_{n=1}^{N}\sum_{m=1}^N x_{n}^{T}x_{m}\, \mathbb{E} \left [ y_{n}y_{m} \right ] = \sum_{n=1}^{N} x_{n}^{T}x_{n} - \frac{1}{N - 1}\sum_{n=1}^{N}\sum_{m=1, m \neq n}^N x_{n}^{T}x_{m} \]

For the cross term, note that

    \[ \sum_{n=1}^{N}\sum_{m=1, m \neq n}^{N} x_{n}^{T}x_{m} = \sum_{n=1}^{N}x_{n}^{T}\left ( \sum_{m=1}^{N}x_{m} - x_{n} \right ) = \sum_{n=1}^{N}x_{n}^{T}\left ( N\bar{x} - x_{n} \right ) = N\sum_{n=1}^{N}x_{n}^{T}\bar{x} - \sum_{n=1}^{N}x_{n}^{T}x_{n} \]

Substituting this back:

    \[ \mathbb{E} \left [ \left \| \sum_{n=1}^{N}y_{n}x_{n} \right \|^{2}  \right ] = \sum_{n=1}^{N} x_{n}^{T}x_{n} - \frac{N}{N-1}\sum_{n=1}^{N}x_{n}^{T}\bar{x} + \frac{1}{N-1}\sum_{n=1}^{N}x_{n}^{T}x_{n} \]

    \[ = \frac{N}{N-1}\sum_{n=1}^{N} \left (x_{n}^{T}x_{n} - x_{n}^{T}\bar{x}  \right ) \]

It looks like we are still missing some terms, so let's expand the formula we need to arrive at:

    \[ \mathbb{E} \left [ \left \| \sum_{n=1}^{N}y_{n}x_{n} \right \|^{2} \right ] = \frac{N}{N-1}\sum_{n=1}^{N}\left \| x_{n} - \bar{x} \right \|^{2} \]

    \[ = \frac{N}{N-1}\sum_{n=1}^{N}\left ( x_{n} - \bar{x} \right )^{T}\left ( x_{n} - \bar{x} \right ) \]

    \[ = \frac{N}{N-1}\sum_{n=1}^{N}\left ( x_{n}^{T} - \bar{x}^{T} \right )\left ( x_{n} - \bar{x} \right ) \]

    \[ = \frac{N}{N-1}\sum_{n=1}^{N}\left ( x_{n}^{T}x_{n} - x_{n}^{T}\bar{x} - \bar{x}^{T}x_{n} + \bar{x}^{T}\bar{x} \right ) \]

Now let's look at the seemingly missing terms:

    \[ \frac{N}{N-1}\sum_{n=1}^{N}\left ( - \bar{x}^{T}x_{n} + \bar{x}^{T}\bar{x} \right ) \]

    \[ = -\frac{N}{N-1}\sum_{n=1}^{N}x_{n}^{T}\bar{x} + \frac{N}{N-1}\sum_{n=1}^{N}\bar{x}^{T}\bar{x} \]

    \[ = -\frac{N}{N-1}\sum_{n=1}^{N}x_{n}^{T}\bar{x} + \frac{N^{2}}{N-1}\bar{x}^{T}\bar{x} \]

    \[ = -\frac{N}{N-1}\sum_{n=1}^{N}x_{n}^{T}\bar{x} + \frac{N^{2}}{N-1}\left (\frac{1}{N}\sum_{n=1}^{N}x_{n}  \right )^{T}\bar{x} \]

    \[ = -\frac{N}{N-1}\sum_{n=1}^{N}x_{n}^{T}\bar{x} + \frac{N}{N-1}\sum_{n=1}^{N}x_{n}^{T}\bar{x} \]

    \[ = 0 \]

These extra terms cancel, so both expressions agree and the identity we needed to show holds.
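
If you want to see the identity in action, here is a small brute-force check (again my own illustrative sketch, with variable names of my choosing): it enumerates every balanced dichotomy for a handful of random points and compares both sides:

    import itertools
    import numpy as np

    rng = np.random.default_rng(0)
    N, d = 6, 3
    X = rng.normal(size=(N, d))            # fixed points x_1, ..., x_N

    # Exact expectation of || sum_n y_n x_n ||^2 over all balanced dichotomies.
    norms_sq = []
    for plus in itertools.combinations(range(N), N // 2):
        y = -np.ones(N)
        y[list(plus)] = 1.0
        norms_sq.append(np.linalg.norm(y @ X) ** 2)
    lhs = np.mean(norms_sq)

    # Right-hand side of the identity in part (c).
    x_bar = X.mean(axis=0)
    rhs = N / (N - 1) * np.sum(np.linalg.norm(X - x_bar, axis=1) ** 2)
    print(lhs, rhs)                        # the two values agree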

(d) Show that  \sum_{n=1}^{N}\left \| x_{n} - \bar{x} \right \|^{2} \leq \sum_{n=1}^{N}\left \| x_{n} \right \|^{2} \leq NR^{2}.

Based on the result from Problem 8.6, we have:

    \[ \sum_{n=1}^{N} \left \| x_{n} - \mu  \right \|^{2} \geq \sum_{n=1}^{N} \left \| x_{n} - \bar{x}\right \|^{2} \]

If we set \mu = 0, then:

    \[ \sum_{n=1}^{N} \left \| x_{n} \right \|^{2} \geq \sum_{n=1}^{N} \left \| x_{n} - \bar{x}\right \|^{2} \]

We also have:

    \[ \left \| x_{n} \right \| \leq R \Rightarrow \sum_{n=1}^{N} \left \| x_{n} \right \|^{2} \leq \sum_{n=1}^{N} R^{2} = NR^{2} \]

So the statement follows.
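
A tiny numerical confirmation of this chain of inequalities, for points sampled inside a ball of radius R (an illustrative sketch of mine; the sampling scheme and names are not from the textbook):

    import numpy as np

    rng = np.random.default_rng(1)
    N, d, R = 8, 3, 2.0
    # Fixed points with ||x_n|| <= R: random directions scaled by radii in [0, R].
    X = rng.normal(size=(N, d))
    X = X / np.linalg.norm(X, axis=1, keepdims=True) * rng.uniform(0, R, size=(N, 1))

    x_bar = X.mean(axis=0)
    centered = np.sum(np.linalg.norm(X - x_bar, axis=1) ** 2)
    uncentered = np.sum(np.linalg.norm(X, axis=1) ** 2)
    print(centered <= uncentered <= N * R ** 2)   # True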

(e) Conclude that

    \[ \mathbb{E} \left [ \left \| \sum_{n=1}^{N}y_{n}x_{n} \right \|^{2} \right ] \leq \frac{N^{2}R^{2}}{N - 1} \]

and hence that

    \[ \mathbb{P} \left [ \left \| \sum_{n=1}^{N}y_{n}x_{n} \right \| \leq \frac{NR}{\sqrt{N - 1}} \right ] > 0 \]

This means for some choice of y_{n},   \left \| \sum_{n=1}^{N}y_{n}x_{n} \right \| \leq \frac{NR}{\sqrt{N-1}}.

Combining parts (c) and (d):

    \[ \mathbb{E} \left [ \left \| \sum_{n=1}^{N}y_{n}x_{n} \right \|^{2} \right ] = \frac{N}{N-1}\sum_{n=1}^{N}\left \| x_{n} - \bar{x} \right \|^{2} \leq \frac{N}{N-1} \cdot NR^{2} = \frac{N^{2}R^{2}}{N - 1} \]

which is the first claim. For the second one, remember that:

    \[ \text{Var}(X) = \mathbb{E}\left [ \left ( X - \mathbb{E}\left [ X \right ] \right )^{2} \right ] = \mathbb{E}\left [ X^{2} \right ] - \left ( \mathbb{E} \left [ X \right ] \right )^{2} \geq 0 \]

So \mathbb{E}[X] \leq \sqrt{\mathbb{E}[X^{2}]} for any random variable X. Applying this with X = \left \| \sum_{n=1}^{N}y_{n}x_{n} \right \| and the bound above:

    \[ \mathbb{E} \left [\left \| \sum_{n=1}^{N}y_{n}x_{n} \right \|  \right ] \leq \frac{NR}{\sqrt{N - 1}} \]

The expectation is an average over the finitely many balanced dichotomies, and an average can only be attained if at least one value lies at or below it (and at least one at or above it). Hence:

    \[ \mathbb{P} \left [ \left \| \sum_{n=1}^{N}y_{n}x_{n} \right \| \leq \frac{NR}{\sqrt{N - 1}} \right ] > 0 \]

So the statement follows.
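
Finally, as an illustration of the conclusion (not a replacement for the argument above), a brute-force search over balanced dichotomies confirms that the best one meets the bound for a random point set; the setup and names below are my own:

    import itertools
    import numpy as np

    rng = np.random.default_rng(2)
    N, d, R = 8, 3, 1.0
    X = rng.normal(size=(N, d))
    X = X / np.linalg.norm(X, axis=1, keepdims=True) * rng.uniform(0, R, size=(N, 1))

    # Smallest || sum_n y_n x_n || over all balanced dichotomies.
    best = np.inf
    for plus in itertools.combinations(range(N), N // 2):
        y = -np.ones(N)
        y[list(plus)] = 1.0
        best = min(best, np.linalg.norm(y @ X))

    bound = N * R / np.sqrt(N - 1)
    print(best, bound, best <= bound)   # some dichotomy meets the bound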

