Learning From Data – A Short Course: Problem 1.3

Page 33:

Prove that the PLA eventually converges to a linear separator for separable data. The following steps will guide you through the proof. Let w^{*} be an optimal set of weights (one which separates the data). The essential idea in this proof is to show that the PLA weights w(t) get “more aligned” with w^{*} with every iteration. For simplicity, assume that w(0) = 0.

(a) Let  \rho = \min_{1 \leq n \leq N} {y_{n}(w^{*T}x_{n})}. Show that \rho > 0.

Because w^{*} seperate the data totally so, for any data point x_{n} we always have  y_{n}(w^{*T}x_{n}) > 0 (as argued in the Exercise 1.3). Hence \rho > 0.

(b) Show that w^{T}(t)w^{*} \geq w^{T}(t-1)w^{*}+\rho, and conclude that  w^{T}(t)w^{*} \geq t\rho. [Hint: Use induction.]

We have: w^{T}(t - 1) = (w(t) - y(t)x(t))^{T} = w^{T}(t) - y(t)x^{T}(t).

Hence:

    \[ w^{T}(t-1)w^{*}+\rho = w^{T}(t)w^{*} - y(t)x^{T}(t)w^{*} + \rho \]

We also have: \rho \leq y(t)x^{T}(t)w^{*} (as proved in part (a)), hence: \rho - y(t)x^{T}(t)w^{*} \leq 0.

So:

    \[ w^{T}(t)w^{*} - y(t)x^{T}(t)w^{*} + \rho \leq w^{T}(t)w^{*} \]

Base case: t = 0: w^{T}(0)w^{*} \geq 0\rho \Leftrightarrow 0 \geq 0. (We have assumed that w(0) = 0).

Induction step:

Assume:  w^{T}(t + 1)w^{*} \geq (t + 1)\rho.

We observe that:  (t + 1)\rho = t\rho + \rho \leq w^{T}(t)w^{*} + \rho \leq w^{T}(t + 1)w^{*}.

So the statement follows.

(c) Show that  \left \| w(t) \right \|^{2} \leq \left \| w(t - 1) \right \|^{2} + \left \| x(t - 1) \right \|^{2}.

Check the slide 13 of Lecture 01. We will use Law of Cosines here.

For y(t - 1) = + 1, we have:

    \[ \left \| w(t) \right \|^{2} = \left \| w(t - 1) \right \|^{2} + \left \| x(t - 1) \right \|^{2} - 2\left \| w(t - 1) \right \| \left \| x(t - 1) \right \| \cos \theta \]

\theta is the angle between - w(t - 1) and x(t - 1) and \theta \leq 90. Here is the proof:

y(t - 1)w^{T}(t - 1)x(t - 1) < 0 \Rightarrow w^{T}(t - 1)x(t - 1) < 0 \Leftrightarrow \cos (180 - \theta) < 0 \Leftrightarrow 180 - \theta > 90 \Leftrightarrow \theta < 90.

Hence:  \cos \theta \geq 0  \Leftrightarrow - 2\left \| w(t - 1) \right \| \left \| x(t - 1) \right \| \cos \theta \leq 0.

The case y(t - 1) = - 1 can be proved similarly.

So:

\left \| w(t) \right \|^{2} = \left \| w(t - 1) \right \|^{2} + \left \| x(t - 1) \right \|^{2} - 2\left \| w(t - 1) \right \| \left \| x(t - 1) \right \| \cos \theta \leq \left \| w(t - 1) \right \|^{2} + \left \| x(t - 1) \right \|^{2}

The statement then follows.

(d) Show by induction that  \left \| w(t) \right \|^{2} \leq tR^{2}, where R = \max_{1 \leq n \leq N}{\left \| x_{n} \right \|}.

Base case: t = 0 \left \| w(0) \right \|^{2} \leq 0R^{2} \Leftrightarrow 0 \leq 0.

Induction step:

Assume:  \left \| w(t + 1) \right \|^{2} \leq (t + 1)R^{2}.

We observe:  (t + 1)R^{2} = tR^{2} + R^{2} \geq \left \| w(t) \right \|^{2} + R^{2} \geq \left \| w(t) \right \|^{2} + \left \| x(t - 1) \right \|^{2} \geq \left \| w(t + 1) \right \|^{2}. (refer to part (c)).

So the statement follows.

(e) Using (b) and (d), show that:

    \[ \frac{w^{T}(t)}{\left \| w(t) \right \|}w^{*} \geq \sqrt{t}.\frac{\rho}{R} \]

and hence prove that:

    \[ t \leq \frac{R^{2} \left \| w^{*} \right \|^{2}}{\rho^{2}} \]

Special thanks to MaciekLeks for the indication.

Refer to (b), we have:

    \[ \frac{w^{T}(t)}{\left \| w(t) \right \|}w^{*} \geq \frac{w^{T}(t-1)w^{*}+\rho}{\left \| w(t) \right \|} \geq \frac{t\rho}{\left \| w(t) \right \|} \]

Refer to (d), we have:

    \[ \frac{t\rho}{\left \| w(t) \right \|} \geq \frac{t\rho}{\sqrt{t}R} =  \frac{\sqrt{t}\rho}{R}\]

Hence [1]:

    \[ \frac{w^{T}(t)}{\left \| w(t) \right \|}w^{*} \geq \sqrt{t}.\frac{\rho}{R} \]

We have:  \frac{w^{T}(t)}{\left \| w(t) \right \|}w^{*} = \left \| w^{*} \right \| \cos \theta, with \theta is the angle between  w^{T}(t) and  w^{*}.

We observe that: \sqrt{t}.\frac{\rho}{R} \geq 0 \Rightarrow  \frac{w^{T}(t)}{\left \| w(t) \right \|}w^{*} \geq 0 \Leftrightarrow \left \| w^{*} \right \| \cos \theta \geq 0 \Leftrightarrow \cos \theta \geq 0 \Leftrightarrow \theta \leq 90. And as t increases, the right-hand side of [1] increases, this leads to left-hand side of [1] increases too. The increase of  \left \| w^{*} \right \| \cos \theta means the increase of  \cos \theta, the increase of \cos \theta means the decrease of  \theta, the decrease of  \theta means the closer w(t) is to w^{*}. So as t increases, w(t) converges to w^{*}.

And the bound for t is:

 \frac{w^{T}(t)}{\left \| w(t) \right \|}w^{*} \geq \sqrt{t}.\frac{\rho}{R} \Leftrightarrow \left \| w^{*} \right \| \cos \theta \geq \sqrt{t}.\frac{\rho}{R} \Rightarrow \left \| w^{*} \right \| \geq \sqrt{t}.\frac{\rho}{R} \Leftrightarrow t \leq \frac{R^{2} \left \| w^{*} \right \|^{2}}{\rho^{2}}

 


Facebooktwittergoogle_plusredditpinterestlinkedinmail

Leave a Reply

Your email address will not be published. Required fields are marked *