Learning From Data – A Short Course: Exercise 8.3

For separable data that contain both positive and negative examples, and a separating hyperplane h, define the positive-side margin p_{+}(h) to be the distance between h and the nearest data point of class +1. Similarly, define the negative-side margin p_{-}(h) to be the distance between h and the nearest data point of class -1. Argue that if h is the optimal hyperplane, then p_{+}(h) = p_{-}(h). That is, the thickness of the cushion on either side of the optimal h is equal.

Suppose, for the sake of contradiction, that p_{+}(h) < p_{-}(h). The optimal hyperplane (w^{*}, b^{*}) is normalized so that \min_{n} y_{n}(w^{*T}x_{n} + b^{*}) = 1; under our assumption this minimum is attained by the nearest positive point, while the nearest negative point satisfies y_{n}(w^{*T}x_{n} + b^{*}) = k for some k > 1. Then:

    \[ \left\{\begin{matrix} p_{+}(h) = \frac{1}{\left \| w^{*} \right \|}\\ p_{-}(h) = \frac{y_{n}(w^{*T}x_{n} + b^{*})}{\left \| w^{*} \right \|} = \frac{k}{\left \| w^{*} \right \|} \end{matrix}\right. \]

The nearest positive points then lie on the hyperplane w^{*T}x + b^{*} - 1 = 0, while the nearest negative points lie on the hyperplane y(w^{*T}x + b^{*}) = k \Leftrightarrow - (w^{*T}x + b^{*}) = k \Leftrightarrow w^{*T}x + b^{*} + k = 0. More generally:

    \[ \left\{\begin{matrix} \forall (x_{n}, +1) \in D, w^{*T}x_{n} + b^{*} \geq 1\\ \forall (x_{n}, -1) \in D, w^{*T}x_{n} + b^{*} \leq -k \end{matrix}\right. \]
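
As a quick numerical sanity check of these constraints, here is a minimal Python sketch. The toy data set and the hyperplane (w, b) below are made up for illustration, with (w, b) scaled so that the nearest positive point scores exactly 1 while every negative point scores at most -k for some k > 1:

    import numpy as np

    # Toy separable data (made up for illustration): the negative class
    # sits farther from the hyperplane than the positive class.
    X = np.array([[2.0, 1.0], [3.0, 2.0],       # class +1
                  [-2.0, -1.0], [-3.0, -2.0]])  # class -1
    y = np.array([1, 1, -1, -1])

    # Hypothetical hyperplane, scaled so the nearest positive point scores 1.
    w = np.array([1.0, 1.0])
    b = -2.0

    scores = X @ w + b
    p_plus = scores[y == 1].min() / np.linalg.norm(w)   # = 1 / ||w||
    k = -scores[y == -1].max()                          # nearest negative scores -k
    p_minus = k / np.linalg.norm(w)                     # = k / ||w||

    print(p_plus, p_minus, k)  # p_plus < p_minus, since k = 5 > 1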

Now consider the hyperplane h' : w^{*T}x + b^{*} + \frac{k - 1}{2} = 0. First, it is still a separating hyperplane; a quick check:

    \[ \forall (x_{n}, +1) \in D, w^{*T}x_{n} + b^{*} + \frac{k - 1}{2} \geq 1 + \frac{k - 1}{2} = \frac{k + 1}{2} > 0 \]

    \[ \forall (x_{n}, -1) \in D, w^{*T}x_{n} + b^{*} + \frac{k - 1}{2} \leq -k + \frac{k - 1}{2} = -\frac{k + 1}{2}  < 0 \]

Second, since h' classifies every point correctly, the distance from any data point to h' can be written without the absolute value:

    \[ \text{dist}(h', (x_{n},y_{n})) = \frac{y_{n}(w^{*T}x_{n} + b^{*} + \frac{k - 1}{2})}{\left \| w^{*} \right \|} \]

So yes: the nearest positive points are still the old nearest positive points, and the nearest negative points are still the old nearest negative points, because the constraints above give:

    \[ \begin{cases} y_{n}(w^{*T}x_{n} + b^{*} + \frac{k - 1}{2}) \geq 1 + \frac{k-1}{2} = \frac{k+1}{2} & \text{ if } y_{n} = +1 \\ y_{n}(w^{*T}x_{n} + b^{*} + \frac{k - 1}{2}) \geq k - \frac{k-1}{2} = \frac{k+1}{2} & \text{ if } y_{n} = -1 \end{cases} \]

The only difference is that p_{+}(h') = p_{-}(h') = \frac{k+1}{2}\frac{1}{\left \| w^{*} \right \|} > p_{+}(h), since k > 1. So h' separates the data with a larger margin than h, and h cannot be the optimal hyperplane if p_{+}(h) < p_{-}(h). The case p_{-}(h) < p_{+}(h) is symmetric. Hence the optimal hyperplane must have p_{+}(h) = p_{-}(h).
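
Continuing the numerical sketch above (same made-up data and hyperplane), shifting the bias by \frac{k - 1}{2} produces h' and, as derived, equalizes and enlarges the cushions:

    # Continuing the sketch above: h' shifts the bias by (k - 1) / 2.
    b_prime = b + (k - 1) / 2
    scores_prime = X @ w + b_prime

    p_plus_prime = scores_prime[y == 1].min() / np.linalg.norm(w)
    p_minus_prime = -scores_prime[y == -1].max() / np.linalg.norm(w)

    # Both cushions now equal (k + 1) / (2 ||w||) > p_plus.
    print(p_plus_prime, p_minus_prime)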

