Learning From Data – A Short Course: Exercise 7.11

Page 24

For weight elimination, show that \frac{\partial E_{\text{aug}}}{\partial w^{(l)}_{ij}} = \frac{\partial E_{\text{in}}}{\partial w^{(l)}_{ij}} + 2\frac{\lambda}{N}\times\frac{w^{(l)}_{ij}}{(1+(w^{(l)}_{ij})^{2})^{2}}.

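I start from the quotient rule:  d\frac{u}{v} = \frac{vdu - udv}{v^{2}}. Weight elimination adds the penalty \frac{\lambda}{N}\sum_{l,i,j}\frac{(w^{(l)}_{ij})^{2}}{1+(w^{(l)}_{ij})^{2}} to E_{\text{in}}, so with f(x) = \frac{x^{2}}{1+x^{2}} I need the following derivative: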

    \[ f'(x) = \left ( \frac{x^{2}}{1+x^{2}} \right )' \]

    \[ = \frac{(1+x^{2})(x^{2})' - (x^{2})(1+x^{2})'}{(1+x^{2})^{2}} \]

    \[ = \frac{(1+x^{2})(2x) - (x^{2})(2x)}{(1+x^{2})^{2}} \]

    \[ = \frac{2x}{(1+x^{2})^{2}} \]

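Substituting x = w^{(l)}_{ij}, we have:

    \[ f'(w^{(l)}_{ij}) = \frac{2w^{(l)}_{ij}}{(1+(w^{(l)}_{ij})^{2})^{2}} \]

Only one term of the regularizer depends on a given weight w^{(l)}_{ij}, so adding \frac{\lambda}{N} f'(w^{(l)}_{ij}) to \frac{\partial E_{\text{in}}}{\partial w^{(l)}_{ij}} gives the claimed formula.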
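The closed form above can be sanity-checked numerically. The snippet below is only a sketch: the helper name `central_difference`, the step size `h` and the sample points are my own choices for illustration, not part of the exercise.

```python
# Finite-difference check of f'(x) = 2x / (1 + x^2)^2 for f(x) = x^2 / (1 + x^2).

def f(x):
    return x**2 / (1 + x**2)

def f_prime(x):
    # Closed form obtained with the quotient rule.
    return 2 * x / (1 + x**2) ** 2

def central_difference(func, x, h=1e-6):
    # Symmetric difference quotient; h is an arbitrary small step (assumption).
    return (func(x + h) - func(x - h)) / (2 * h)

points = [-3.0, -0.5, 0.0, 0.2, 1.0, 4.0]  # arbitrary sample points
max_error = max(abs(central_difference(f, x) - f_prime(x)) for x in points)
print(max_error)  # should be tiny if the closed form is right
```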

Argue that weight elimination shrinks small weights faster than large ones.

There are many ways to do this. One of the fastest ways I can think of is to Google “graph for x / (1+x^2)^2”, haha. The more traditional way is to take the derivative of  g(x) = \frac{x}{(1+x^{2})^{2}} (I have dropped the factor 2 here because a positive constant does not change where g increases or decreases), then solve the equation below (I admit that I let Maple do the computation):

    \[ g'(x) = 0 \Rightarrow x \in \left \{ -\frac{\sqrt{3}}{3}, \frac{\sqrt{3}}{3} \right \} \]
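For readers without Maple, the critical points can also be recovered by hand with the same quotient rule. A sketch of that computation:

    \[ g'(x) = \frac{(1+x^{2})^{2} - x \cdot 2(1+x^{2})(2x)}{(1+x^{2})^{4}} = \frac{(1+x^{2}) - 4x^{2}}{(1+x^{2})^{3}} = \frac{1 - 3x^{2}}{(1+x^{2})^{3}} \]

The denominator is always positive, so g'(x) = 0 exactly when 3x^{2} = 1, that is, x = \pm\frac{\sqrt{3}}{3}.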

I check and see that g''\left ( -\frac{\sqrt{3}}{3} \right ) > 0 (local minimum) and g''\left ( \frac{\sqrt{3}}{3} \right ) < 0 (local maximum).
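The second-derivative test can be written out explicitly. Starting from g'(x) = \frac{1 - 3x^{2}}{(1+x^{2})^{3}} (the quotient rule applied to g) and differentiating once more, a sketch of the computation is:

    \[ g''(x) = \frac{-6x(1+x^{2}) - 6x(1-3x^{2})}{(1+x^{2})^{4}} = \frac{12x(x^{2}-1)}{(1+x^{2})^{4}} \]

At x = \pm\frac{\sqrt{3}}{3} the factor x^{2} - 1 = -\frac{2}{3} is negative, so g'' has the opposite sign of x: positive at -\frac{\sqrt{3}}{3} and negative at \frac{\sqrt{3}}{3}.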

    \[ \begin{array}{c|ccccccc} x & -\infty & & -\frac{\sqrt{3}}{3} & & +\frac{\sqrt{3}}{3} & & +\infty \\ \hline g'(x) & & - & 0 & + & 0 & - & \end{array} \]

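That means that on the interval (-\infty, -\frac{\sqrt{3}}{3}) the function g is decreasing: as x decreases, g(x) increases. By the same sign table, g is increasing on (-\frac{\sqrt{3}}{3}, \frac{\sqrt{3}}{3}) and decreasing again on (\frac{\sqrt{3}}{3}, +\infty).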

The only remaining concern is how g(x) behaves as x goes to infinity. The denominator grows like x^{4} while the numerator grows like x, so \lim_{x \rightarrow \pm\infty} g(x) = 0. I also observe that:

    \[ \begin{cases} g(x) = 0 & \text{ if } x = 0\\ g(x) < 0 & \text{ if } x < 0\\ g(x) > 0 & \text{ if } x > 0 \end{cases} \]


So far so good.

I should have realized before that  g(-x) = -g(x) and xg(x) \geq 0, so if I’m interested in how | g(x) | behaves when | x | is small / large in general, I only need to consider how g(x) behaves when x > 0 and then deduce the other case’s result.

    \[ \begin{cases} | x | \uparrow \Rightarrow | g(x) | \uparrow & \text{ if } x \in \left [ -\frac{\sqrt{3}}{3}, \frac{\sqrt{3}}{3} \right ]\\ | x | \uparrow \Rightarrow | g(x) | \downarrow & \text{ if } x \in \left ( -\infty, -\frac{\sqrt{3}}{3} \right ) \cup \left ( \frac{\sqrt{3}}{3}, +\infty \right )\\ | g(x) | \rightarrow 0 & \text{ if } | x | \rightarrow \infty \end{cases} \]
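These three properties are easy to spot-check numerically. The following is a rough sketch only: the grid (0.001 to 10 in steps of 0.001) and the far-away test point 1e6 are arbitrary choices of mine.

```python
import math

def g(x):
    return x / (1 + x**2) ** 2

# Arbitrary grid on x > 0 (assumption): 0.001, 0.002, ..., 10.0
xs = [i / 1000 for i in range(1, 10001)]

x_peak = max(xs, key=g)                                # grid point where g is largest
is_odd = all(math.isclose(g(-x), -g(x)) for x in xs)   # g(-x) = -g(x) on the grid
tail = abs(g(1e6))                                     # |g| far from the origin

print(x_peak, math.sqrt(3) / 3)  # peak location vs. the analytic critical point
print(is_odd, tail)
```

On x > 0 the largest grid value should sit next to \frac{\sqrt{3}}{3}, where g takes the value \frac{3\sqrt{3}}{16}.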

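So the statement follows: once | w | exceeds \frac{\sqrt{3}}{3}, the regularization gradient decays toward 0 as | w | grows, so large weights are barely pulled toward zero, whereas for small | w | we have g(w) \approx w, which is the usual weight-decay pull.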
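One way to read "shrinks faster" is in relative terms: what fraction of a weight is removed by one gradient step on the penalty alone. The sketch below makes that concrete under loud assumptions: the gradient of E_in is ignored entirely, and the learning rate `eta`, the values of `lam` and `n`, and the helper names are all hypothetical choices for illustration.

```python
def penalty_gradient(w, lam=1.0, n=1):
    # Gradient of the weight-elimination penalty with respect to one weight:
    # 2 * (lambda / N) * w / (1 + w^2)^2
    return 2 * (lam / n) * w / (1 + w**2) ** 2

def relative_shrink(w, eta=0.1, lam=1.0, n=1):
    # Fraction of w removed by a single step w <- w - eta * penalty_gradient(w).
    # The E_in gradient is deliberately left out (assumption).
    new_w = w - eta * penalty_gradient(w, lam, n)
    return (w - new_w) / w

for w in [0.1, 0.5, 1.0, 3.0, 10.0]:
    print(w, relative_shrink(w))
```

Under these assumptions the removed fraction equals 2\eta\frac{\lambda}{N}\frac{1}{(1+w^{2})^{2}}, which decreases as | w | grows, so small weights lose a larger share of their magnitude per step than large ones.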

