[Notes] Learning From Data – A Short Course: e-Chapter 7

  • Page 18:
    • Need an explanation for  \mathbb{E}_{w}\left [ \left | w^{T}x_{n} \right |^{2} \right ] = \sigma_{w}^{2} \left \| x_{n} \right \|^{2}.
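      A possible derivation, assuming the components of w are i.i.d. with zero mean and variance \sigma_{w}^{2} (the usual random-weight assumption; worth checking against the book's exact setup):

      ```latex
      \mathbb{E}_{w}\!\left[\left|w^{T}x_{n}\right|^{2}\right]
        = \mathbb{E}_{w}\!\Big[\sum_{i}\sum_{j} w_{i} w_{j}\, x_{n,i}\, x_{n,j}\Big]
        = \sum_{i}\sum_{j} x_{n,i}\, x_{n,j}\, \mathbb{E}[w_{i} w_{j}]
        = \sum_{i} x_{n,i}^{2}\,\sigma_{w}^{2}
        = \sigma_{w}^{2}\,\left\|x_{n}\right\|^{2}
      ```

      The key step is \mathbb{E}[w_{i} w_{j}] = \sigma_{w}^{2}\,\delta_{ij}: zero mean plus independence kills every cross term.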
  • Page 20:
    • m^{d}: m is the number of nodes in the first hidden layer, d is the number of nodes in the input layer. Here is my guess: each input node i must connect to at least one node j in the first layer (that is, w_{ij}^{1} > 0). So the first input node can choose any one of the m hidden nodes to connect to, the second input node can likewise choose any one of the m hidden nodes, et cetera, hence: m \times m \times \cdots \times m = m^{d}.
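      The counting argument above can be checked by brute force: enumerate every way to give each of the d input nodes exactly one outgoing edge into the m hidden nodes, and verify the count equals m^{d}. A minimal sketch (the sizes m, d are illustrative, not from the book):

      ```python
      from itertools import product

      def count_assignments(m, d):
          """Count the ways to assign each of d input nodes
          one target among m hidden nodes (one choice per input)."""
          return sum(1 for _ in product(range(m), repeat=d))

      # Illustrative sizes: 3 hidden nodes, 4 input nodes.
      m, d = 3, 4
      print(count_assignments(m, d), m ** d)  # both 81
      ```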
  • Page 26:
    • How is early stopping actually related to weight decay? Please refer to the figure on Page 130 [Chapter 4].
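      One way to see the connection numerically (a minimal sketch, not the book's argument): starting gradient descent from w = 0, the weight norm \left\| w \right\| grows with the number of iterations, so stopping early implicitly keeps \left\| w \right\| small — the same constraint that weight decay enforces explicitly through a penalty term. All sizes and the learning rate below are illustrative:

      ```python
      import numpy as np

      rng = np.random.default_rng(0)            # toy regression data
      X = rng.standard_normal((50, 5))
      y = X @ rng.standard_normal(5) + 0.1 * rng.standard_normal(50)

      w = np.zeros(5)                           # start at the origin
      eta = 0.01                                # small step size
      norms = []
      for _ in range(200):
          grad = X.T @ (X @ w - y) / len(y)     # squared-error gradient
          w -= eta * grad
          norms.append(np.linalg.norm(w))

      # ||w|| grows from 0 toward the unregularized solution's norm,
      # so an earlier stop acts like a tighter bound on ||w||.
      print(norms[10] < norms[50] < norms[-1])  # True
      ```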
