Learning From Data – A Short Course: Exercise 8.14

Suppose that we removed a data point (x_{n}, y_{n}) with \alpha^{*}_{n} = 0.

(a) Show that the previous optimal solution \alpha^{*} remains feasible for the new dual problem (8.21) (after removing \alpha^{*}_{n}).

L is the old dual problem while L' is the new dual problem.

    \[ L(\alpha^{*}) = \frac{1}{2}\sum_{j=1}^{N}\sum_{i=1}^{N}y_{i}y_{j}\alpha^{*}_{i}\alpha^{*}_{j}x_{i}^{T}x_{j} - \sum_{i=1}^{N}\alpha^{*}_{i} = \frac{1}{2}\sum_{j=1,j\neq n}^{N}\sum_{i=1,i\neq n}^{N}y_{i}y_{j}\alpha^{*}_{i}\alpha^{*}_{j}x_{i}^{T}x_{j} - \sum_{i=1,i\neq n}^{N}\alpha^{*}_{i} = L'(\alpha^{*} \text{ with } \alpha^{*}_{n} \text{ removed}) \]

Every term of (8.21) that involves x_{n} is multiplied by \alpha_{n}, and vice versa, so with \alpha^{*}_{n} = 0 all such terms vanish: substituting \alpha^{*} into (8.21) gives the same value whether we set \alpha^{*}_{n} = 0 or remove the data point (x_{n}, y_{n}) altogether. Feasibility also carries over: the remaining components still satisfy \alpha^{*}_{i} \geq 0, and since \alpha^{*}_{n} = 0 we have \sum_{i \neq n} y_{i}\alpha^{*}_{i} = \sum_{i=1}^{N} y_{i}\alpha^{*}_{i} = 0.
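As a quick numerical sanity check (the vectors below are made up for illustration, not from the text), the argument that dropping a zero component preserves both dual constraints can be sketched as:

```python
import numpy as np

# Illustrative values (an assumption for the sketch): an "optimal" dual
# vector alpha_star with alpha_star[n] = 0 at n = 2.
alpha_star = np.array([0.5, 0.3, 0.0, 0.2])
y = np.array([+1.0, -1.0, +1.0, -1.0])
n = 2  # index of the removed point; alpha_star[n] == 0

# Original dual feasibility: alpha_i >= 0 and sum_i y_i alpha_i = 0.
assert np.all(alpha_star >= 0)
assert np.isclose(y @ alpha_star, 0.0)

# Drop the n-th point. Because alpha_star[n] = 0, the equality
# constraint's value is unchanged, and the remaining components
# are still non-negative, so the reduced vector is feasible for
# the new dual.
keep = np.arange(len(y)) != n
assert np.all(alpha_star[keep] >= 0)
assert np.isclose(y[keep] @ alpha_star[keep], 0.0)
print("reduced alpha_star is feasible for the new dual")
```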

(b) Show that if there is any other feasible solution for the new dual that has a lower objective value than \alpha^{*}, this would contradict the optimality of \alpha^{*} for the original dual problem.

If there exists a feasible \alpha' for the new dual with L'(\alpha') < L(\alpha^{*}), then we can extend \alpha' to a vector \alpha^{*}' for the original dual:

    \[ \left\{\begin{matrix} \alpha^{*}'_{i} = \alpha'_{i}, i \neq n\\ \alpha^{*}'_{n} = 0 \end{matrix}\right. \]

The added zero component keeps \alpha^{*}' feasible for the original dual, and by part (a) L(\alpha^{*}') = L'(\alpha') < L(\alpha^{*}). This would contradict the optimality of \alpha^{*} for the original problem.

(c) Hence, show that \alpha^{*} (minus \alpha^{*}_{n}) is optimal for the new dual.

By (a), \alpha^{*} (minus \alpha^{*}_{n}) is feasible for the new dual, and by (b) no feasible solution of the new dual achieves a lower objective value. Hence it is optimal for the new dual.

(d) Hence, show that the optimal fat-hyperplane did not change.

Since the optimal dual solution is unchanged, so are w^{*} = \sum_{i} y_{i}\alpha^{*}_{i}x_{i} and b^{*} (computed from any support vector). Hence the optimal fat-hyperplane does not change.
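A minimal empirical sketch of this fact, assuming scikit-learn is available and using a large C to approximate the hard-margin SVM on synthetic separable data (all of which are my assumptions, not from the text): fit, remove one non-support vector, refit, and check that (w, b) is numerically unchanged.

```python
import numpy as np
from sklearn.svm import SVC

# Synthetic, well-separated two-class data (assumption for the sketch).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 0.5, (20, 2)), rng.normal(2, 0.5, (20, 2))])
y = np.array([-1] * 20 + [1] * 20)

# Large C approximates the hard-margin (fat-hyperplane) SVM.
clf = SVC(kernel="linear", C=1e6).fit(X, y)
sv = set(clf.support_)
non_sv = next(i for i in range(len(y)) if i not in sv)  # a point with alpha* = 0

# Refit without that non-support vector.
keep = np.arange(len(y)) != non_sv
clf2 = SVC(kernel="linear", C=1e6).fit(X[keep], y[keep])

# The separating hyperplane (w, b) is unchanged up to solver tolerance.
assert np.allclose(clf.coef_, clf2.coef_, rtol=1e-3, atol=1e-3)
assert np.allclose(clf.intercept_, clf2.intercept_, rtol=1e-3, atol=1e-3)
print("hyperplane unchanged after removing a non-support vector")
```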

(e) Prove the bound on E_{cv} in (8.27).

I take (8.27) to be the leave-one-out cross-validation bound, E_{cv} \leq \frac{\text{number of support vectors}}{N}, so the following argument applies to LOOCV.

If we remove a non-support vector, then by (d) the optimal fat-hyperplane does not change, so the removed point, which was correctly classified before, remains correctly classified and contributes e_{n} = 0. If we remove a support vector, the optimal fat-hyperplane may change and, worse still, may misclassify the removed point, but in any case e_{n} \leq 1. Summing over all N leave-one-out rounds, N \cdot E_{cv} = \sum_{n=1}^{N} e_{n} \leq \text{number of support vectors}, which gives the bound (8.27).
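The bound can be checked empirically. Below is a hedged sketch (scikit-learn, synthetic separable data, and large C as a hard-margin stand-in are all my assumptions): run leave-one-out by hand and compare E_{cv} against (number of support vectors)/N from the full fit.

```python
import numpy as np
from sklearn.svm import SVC

# Synthetic separable data (assumption for the sketch).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 0.6, (15, 2)), rng.normal(2, 0.6, (15, 2))])
y = np.array([-1] * 15 + [1] * 15)
N = len(y)

# Full fit: count the support vectors for the right-hand side of (8.27).
full = SVC(kernel="linear", C=1e6).fit(X, y)
n_sv = len(full.support_)

# Leave-one-out: refit N times, each time testing on the held-out point.
errors = 0
for i in range(N):
    keep = np.arange(N) != i
    clf = SVC(kernel="linear", C=1e6).fit(X[keep], y[keep])
    errors += clf.predict(X[i:i + 1])[0] != y[i]

E_cv = errors / N
print(f"E_cv = {E_cv:.3f} <= {n_sv}/{N} = {n_sv / N:.3f}")
assert E_cv <= n_sv / N  # the bound (8.27)
```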

