# Learning From Data – A Short Course: Problem 1.3

Page 33:

Prove that the PLA eventually converges to a linear separator for separable data. The following steps will guide you through the proof. Let $\mathbf{w}^*$ be an optimal set of weights (one which separates the data). The essential idea in this proof is to show that the PLA weights $\mathbf{w}(t)$ get “more aligned” with $\mathbf{w}^*$ with every iteration. For simplicity, assume that $\mathbf{w}(0) = \mathbf{0}$.

(a) Let $\rho = \min_{1 \le n \le N} y_n \left(\mathbf{w}^{*T} \mathbf{x}_n\right)$. Show that $\rho > 0$.

Because $\mathbf{w}^*$ separates the data perfectly, for every data point $(\mathbf{x}_n, y_n)$ we always have $y_n \left(\mathbf{w}^{*T} \mathbf{x}_n\right) > 0$ (as argued in Exercise 1.3). Hence $\rho = \min_{1 \le n \le N} y_n \left(\mathbf{w}^{*T} \mathbf{x}_n\right) > 0$.
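As a quick numerical illustration, the sketch below computes $\rho$ on a small hypothetical dataset with a hand-chosen separating $\mathbf{w}^*$ (both the data and `w_star` are assumptions made up for this example):

```python
import numpy as np

# A small toy dataset (hypothetical, for illustration): inputs with a
# bias coordinate prepended, labels in {-1, +1}.
X = np.array([[1.0,  2.0,  1.0],
              [1.0, -1.0, -2.0],
              [1.0,  3.0,  0.5],
              [1.0, -2.0, -1.0]])
y = np.array([1, -1, 1, -1])

# A separating weight vector w* (chosen by hand for this dataset).
w_star = np.array([0.0, 1.0, 1.0])

# rho = min_n y_n * (w*^T x_n); it is positive iff w* separates the data.
rho = np.min(y * (X @ w_star))
print(rho)  # -> 3.0
```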

(b) Show that $\mathbf{w}^T(t)\mathbf{w}^* \ge \mathbf{w}^T(t-1)\mathbf{w}^* + \rho$, and conclude that $\mathbf{w}^T(t)\mathbf{w}^* \ge t\rho$.

[Hint: Use induction.]

We have: $\mathbf{w}(t) = \mathbf{w}(t-1) + y(t-1)\mathbf{x}(t-1)$.

Hence: $\mathbf{w}^T(t)\mathbf{w}^* = \mathbf{w}^T(t-1)\mathbf{w}^* + y(t-1)\mathbf{x}^T(t-1)\mathbf{w}^*$.

We also have: $y(t-1)\mathbf{x}^T(t-1)\mathbf{w}^* \ge \rho$ (by the definition of $\rho$ in part (a)), hence: $\mathbf{w}^T(t)\mathbf{w}^* \ge \mathbf{w}^T(t-1)\mathbf{w}^* + \rho$.

So: $\mathbf{w}^T(t)\mathbf{w}^* \ge t\rho$ follows by induction.

Base case: $t = 0$: $\mathbf{w}^T(0)\mathbf{w}^* = 0 \ge 0 \cdot \rho$. (We have assumed that $\mathbf{w}(0) = \mathbf{0}$.)

Induction step:

Assume: $\mathbf{w}^T(t-1)\mathbf{w}^* \ge (t-1)\rho$.

We observe that: $\mathbf{w}^T(t)\mathbf{w}^* \ge \mathbf{w}^T(t-1)\mathbf{w}^* + \rho \ge (t-1)\rho + \rho = t\rho$.

So the statement follows.
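The inequality from part (b) can be checked numerically. The sketch below runs PLA on hypothetical data labeled by a known `w_star` (a margin is enforced when sampling, an assumption made only so that $\rho$ is not vanishingly small and the run stays short), asserting $\mathbf{w}^T(t)\mathbf{w}^* \ge t\rho$ after every update:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical separable data: sample points, label them with a known w*,
# and keep only points with margin > 0.3 so rho is bounded away from zero.
w_star = np.array([0.0, 1.0, 1.0])
X = np.hstack([np.ones((100, 1)), rng.uniform(-1, 1, size=(100, 2))])
X = X[np.abs(X @ w_star) > 0.3]
y = np.sign(X @ w_star)
rho = np.min(y * (X @ w_star))

# Run PLA from w(0) = 0, checking the part-(b) bound after every update.
w = np.zeros(3)
t = 0
while np.any(np.sign(X @ w) != y):
    i = np.where(np.sign(X @ w) != y)[0][0]
    w = w + y[i] * X[i]  # PLA update on a misclassified point
    t += 1
    assert w @ w_star >= t * rho - 1e-9  # w^T(t) w* >= t*rho
print(t)
```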

(c) Show that $\|\mathbf{w}(t)\|^2 \le \|\mathbf{w}(t-1)\|^2 + \|\mathbf{x}(t-1)\|^2$.

Check slide 13 of Lecture 01. We will use the *Law of Cosines* here: for vectors $\mathbf{a}$ and $\mathbf{b}$, $\|\mathbf{a} + \mathbf{b}\|^2 = \|\mathbf{a}\|^2 + \|\mathbf{b}\|^2 + 2\|\mathbf{a}\|\|\mathbf{b}\|\cos\theta$, where $\theta$ is the angle between them.

For $y(t-1) = +1$, we have:

$$\|\mathbf{w}(t)\|^2 = \|\mathbf{w}(t-1) + \mathbf{x}(t-1)\|^2 = \|\mathbf{w}(t-1)\|^2 + \|\mathbf{x}(t-1)\|^2 + 2\|\mathbf{w}(t-1)\|\|\mathbf{x}(t-1)\|\cos\theta,$$

where $\theta$ is the angle between $\mathbf{w}(t-1)$ and $\mathbf{x}(t-1)$, and $\cos\theta \le 0$. Here is the proof:

Since $\mathbf{x}(t-1)$ was misclassified and $y(t-1) = +1$, we have $\mathbf{w}^T(t-1)\mathbf{x}(t-1) \le 0$, i.e., $\|\mathbf{w}(t-1)\|\|\mathbf{x}(t-1)\|\cos\theta \le 0$, so $\cos\theta \le 0$.

Hence: $\|\mathbf{w}(t)\|^2 \le \|\mathbf{w}(t-1)\|^2 + \|\mathbf{x}(t-1)\|^2$.

The case $y(t-1) = -1$ can be proved similarly (there the update is $\mathbf{w}(t) = \mathbf{w}(t-1) - \mathbf{x}(t-1)$ and $\mathbf{w}^T(t-1)\mathbf{x}(t-1) \ge 0$).

So in both cases:

$$\|\mathbf{w}(t)\|^2 \le \|\mathbf{w}(t-1)\|^2 + \|\mathbf{x}(t-1)\|^2.$$

The statement then follows.

(d) Show by induction that $\|\mathbf{w}(t)\|^2 \le tR^2$, where $R = \max_{1 \le n \le N} \|\mathbf{x}_n\|$.

Base case: $t = 0$: $\|\mathbf{w}(0)\|^2 = 0 \le 0 \cdot R^2$.

Induction step:

Assume: $\|\mathbf{w}(t-1)\|^2 \le (t-1)R^2$.

We observe: $\|\mathbf{w}(t)\|^2 \le \|\mathbf{w}(t-1)\|^2 + \|\mathbf{x}(t-1)\|^2 \le (t-1)R^2 + R^2 = tR^2$ (refer to part (c)).

So the statement follows.
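Both the one-step inequality from part (c) and the cumulative bound from part (d) can be verified along a PLA run. As before, the data and `w_star` below are hypothetical, with a sampling margin enforced only to keep the run short:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical separable data labeled by a known w* (margin enforced).
w_star = np.array([0.0, 1.0, 1.0])
X = np.hstack([np.ones((100, 1)), rng.uniform(-1, 1, size=(100, 2))])
X = X[np.abs(X @ w_star) > 0.3]
y = np.sign(X @ w_star)

R2 = np.max(np.sum(X ** 2, axis=1))  # R^2 = max_n ||x_n||^2

w = np.zeros(3)
t = 0
while np.any(np.sign(X @ w) != y):
    i = np.where(np.sign(X @ w) != y)[0][0]
    w_prev = w
    w = w + y[i] * X[i]  # PLA update on a misclassified point
    t += 1
    # Part (c): one-step growth is at most ||x(t-1)||^2.
    assert w @ w <= w_prev @ w_prev + X[i] @ X[i] + 1e-9
    # Part (d): cumulative bound ||w(t)||^2 <= t * R^2.
    assert w @ w <= t * R2 + 1e-9
print(t)
```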

(e) Using (b) and (d), show that:

$$\frac{\mathbf{w}^T(t)}{\|\mathbf{w}(t)\|}\mathbf{w}^* \ge \sqrt{t}\,\frac{\rho}{R},$$

and hence prove that:

$$t \le \frac{R^2\|\mathbf{w}^*\|^2}{\rho^2}.$$

Special thanks to MaciekLeks for pointing this out.

Referring to (b), we have: $\mathbf{w}^T(t)\mathbf{w}^* \ge t\rho$.

Referring to (d), we have: $\|\mathbf{w}(t)\|^2 \le tR^2$, i.e., $\|\mathbf{w}(t)\| \le \sqrt{t}\,R$.

Hence [1]:

$$\frac{\mathbf{w}^T(t)}{\|\mathbf{w}(t)\|}\mathbf{w}^* \ge \frac{t\rho}{\sqrt{t}\,R} = \sqrt{t}\,\frac{\rho}{R}.$$

We have: $\frac{\mathbf{w}^T(t)}{\|\mathbf{w}(t)\|}\mathbf{w}^* = \|\mathbf{w}^*\|\cos\alpha$, with $\alpha$ being the angle between $\mathbf{w}(t)$ and $\mathbf{w}^*$.

We observe that: $\|\mathbf{w}^*\|\cos\alpha \le \|\mathbf{w}^*\|$, so the left-hand side of [1] is bounded. As $t$ increases, the right-hand side of [1] increases, so the left-hand side must increase too. An increasing left-hand side means an increasing $\cos\alpha$; an increasing $\cos\alpha$ means a decreasing $\alpha$; and a decreasing $\alpha$ means $\mathbf{w}(t)$ gets closer in direction to $\mathbf{w}^*$. So as $t$ increases, $\mathbf{w}(t)$ converges toward $\mathbf{w}^*$.

And the bound for $t$ follows: combining [1] with $\frac{\mathbf{w}^T(t)}{\|\mathbf{w}(t)\|}\mathbf{w}^* \le \|\mathbf{w}^*\|$ gives $\sqrt{t}\,\frac{\rho}{R} \le \|\mathbf{w}^*\|$, i.e.:

$$t \le \frac{R^2\|\mathbf{w}^*\|^2}{\rho^2}.$$
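The final iteration bound can also be checked empirically. The sketch below (again on hypothetical data labeled by an assumed `w_star`, with a sampling margin so $\rho$ is not tiny) runs PLA to convergence and asserts that the number of updates never exceeds $R^2\|\mathbf{w}^*\|^2/\rho^2$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical separable data labeled by a known w* (margin enforced).
w_star = np.array([0.0, 1.0, 1.0])
X = np.hstack([np.ones((200, 1)), rng.uniform(-1, 1, size=(200, 2))])
X = X[np.abs(X @ w_star) > 0.3]
y = np.sign(X @ w_star)

rho = np.min(y * (X @ w_star))
R = np.max(np.linalg.norm(X, axis=1))
bound = (R ** 2) * (w_star @ w_star) / rho ** 2  # part (e) bound on t

# Run PLA to convergence, counting updates.
w = np.zeros(3)
t = 0
while np.any(np.sign(X @ w) != y):
    i = np.where(np.sign(X @ w) != y)[0][0]
    w = w + y[i] * X[i]  # PLA update on a misclassified point
    t += 1

assert t <= bound  # PLA halted within R^2 ||w*||^2 / rho^2 updates
print(t, bound)
```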