Learning From Data – A Short Course: Exercise 7.7 (page 11)

For the sigmoidal perceptron, h(x) = \tanh(w^{T}x), let the in-sample error be E_{in}(w) = \frac{1}{N}\sum_{n=1}^{N}(\tanh(w^{T}x_{n}) - y_{n})^{2}. Show that:
\[ \nabla E_{in}(w) = \frac{2}{N}\sum_{n=1}^{N}(\tanh(w^{T}x_{n}) - y_{n})(1 - \tanh^{2}(w^{T}x_{n}))x_{n} \]
If w \rightarrow \infty, what happens to the gradient? How is this related to why it is hard to optimize the sigmoidal perceptron?

    \[ \nabla E_{in}(w) = \nabla_{w} (\frac{1}{N}\sum_{n=1}^{N}(\tanh(w^{T}x_{n}) - y_{n})^{2}) \]

    \[ = \frac{1}{N}\sum_{n=1}^{N}\nabla_{w} (\tanh(w^{T}x_{n}) - y_{n})^{2} \]

    \[ = \frac{1}{N}\sum_{n=1}^{N}\frac{\partial (\tanh(w^{T}x_{n}) - y_{n})^{2}}{\partial (\tanh(w^{T}x_{n}) - y_{n})}\nabla_{w}(\tanh(w^{T}x_{n}) - y_{n}) \]

    \[ = \frac{1}{N}\sum_{n=1}^{N} 2(\tanh(w^{T}x_{n}) - y_{n})\nabla_{w} (\tanh(w^{T}x_{n}) - y_{n}) \]

    \[ = \frac{2}{N}\sum_{n=1}^{N} (\tanh(w^{T}x_{n}) - y_{n})\nabla_{w} (\tanh(w^{T}x_{n}) - y_{n}) \]

    \[ = \frac{2}{N}\sum_{n=1}^{N} (\tanh(w^{T}x_{n}) - y_{n})\frac{\partial (\tanh(w^{T}x_{n}) - y_{n})}{\partial (w^{T}x_{n})}\nabla_{w} (w^{T}x_{n}) \]

    \[ = \frac{2}{N}\sum_{n=1}^{N} (\tanh(w^{T}x_{n}) - y_{n})(1 - \tanh^{2}(w^{T}x_{n}))\nabla_{w}(w^{T}x_{n}) \]

    \[ = \frac{2}{N}\sum_{n=1}^{N} (\tanh(w^{T}x_{n}) - y_{n})(1 - \tanh^{2}(w^{T}x_{n}))x_{n} \]
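As a quick sanity check on this closed form, here is a minimal numerical sketch (assuming NumPy and a small synthetic data set; the helper names `E_in`, `analytic_grad`, and `numerical_grad` are just illustrative) comparing the analytic gradient above with a central finite-difference approximation of E_{in}:

```python
import numpy as np

def E_in(w, X, y):
    # In-sample error: (1/N) * sum_n (tanh(w^T x_n) - y_n)^2
    return np.mean((np.tanh(X @ w) - y) ** 2)

def analytic_grad(w, X, y):
    # (2/N) * sum_n (tanh(w^T x_n) - y_n) * (1 - tanh^2(w^T x_n)) * x_n
    s = np.tanh(X @ w)
    return (2.0 / len(y)) * X.T @ ((s - y) * (1 - s ** 2))

def numerical_grad(w, X, y, eps=1e-6):
    # Central finite differences, one coordinate at a time
    g = np.zeros_like(w)
    for i in range(len(w)):
        e = np.zeros_like(w)
        e[i] = eps
        g[i] = (E_in(w + e, X, y) - E_in(w - e, X, y)) / (2 * eps)
    return g

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))        # 20 synthetic points in R^3
y = np.sign(rng.normal(size=20))    # labels in {-1, +1}
w = rng.normal(size=3)

# The two gradients should agree up to finite-difference error (roughly 1e-8 or less).
print(np.max(np.abs(analytic_grad(w, X, y) - numerical_grad(w, X, y))))
```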

We observe that:

    \[ \|w\| \rightarrow \infty \Rightarrow |w^{T}x_{n}| \rightarrow \infty \Rightarrow \begin{cases} \tanh(w^{T}x_{n}) \rightarrow +1 & \text{if } w^{T}x_{n} \rightarrow +\infty \\ \tanh(w^{T}x_{n}) \rightarrow -1 & \text{if } w^{T}x_{n} \rightarrow -\infty \end{cases} \]

    \[ \Rightarrow \tanh^{2}(w^{T}x_{n}) \rightarrow 1 \Rightarrow 1 - \tanh^{2}(w^{T}x_{n}) \rightarrow 0 \Rightarrow \nabla E_{in}(w) \rightarrow 0 \]

(Here we assume each x_{n} is not orthogonal to the direction along which w grows, so that w^{T}x_{n} \rightarrow \pm\infty.)
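The following sketch (again with NumPy and synthetic data; the scaling factors are arbitrary) illustrates this saturation numerically by scaling a fixed weight direction and printing the norm of the gradient derived above:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 3))        # 20 synthetic points in R^3
y = np.sign(rng.normal(size=20))    # labels in {-1, +1}
w0 = rng.normal(size=3)             # a fixed direction in weight space

def grad_E_in(w):
    # (2/N) * sum_n (tanh(w^T x_n) - y_n)(1 - tanh^2(w^T x_n)) x_n
    s = np.tanh(X @ w)
    return (2.0 / len(y)) * X.T @ ((s - y) * (1 - s ** 2))

# As ||w|| grows, tanh(w^T x_n) saturates and the gradient norm collapses toward 0.
for scale in [0.1, 1.0, 10.0, 100.0, 1000.0]:
    w = scale * w0
    print(f"||w|| = {np.linalg.norm(w):9.1f}   ||grad E_in(w)|| = {np.linalg.norm(grad_E_in(w)):.2e}")
```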

This means that once w becomes large, gradient descent makes only tiny changes to w. Even worse, it can effectively stop while E_{in} is still large, possibly at its largest possible value (when \tanh(w^{T}x_{n})y_{n} < 0 for every (x_{n}, y_{n})), and return that large w as its final hypothesis. This vanishing gradient in the saturated regions of \tanh is one reason the sigmoidal perceptron is hard to optimize.
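A small gradient-descent run started from a deliberately large w makes this concrete (a sketch under the same synthetic setup as above; the learning rate and iteration count are arbitrary): the saturated \tanh terms drive the gradient to essentially zero, so the weights barely move and E_{in} stays close to where it started.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(20, 3))        # synthetic data, as before
y = np.sign(rng.normal(size=20))

def E_in(w):
    return np.mean((np.tanh(X @ w) - y) ** 2)

def grad_E_in(w):
    s = np.tanh(X @ w)
    return (2.0 / len(y)) * X.T @ ((s - y) * (1 - s ** 2))

w = 100.0 * rng.normal(size=3)      # deliberately start from a very large w
eta = 0.1                           # illustrative fixed learning rate
E_start = E_in(w)
for _ in range(1000):
    w = w - eta * grad_E_in(w)

# Because tanh(w^T x_n) is saturated, the updates are essentially zero and E_in barely changes.
print(E_start, E_in(w))
```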


