# Learning From Data – A Short Course: Exercise 7.7


For the sigmoidal perceptron, $h(\mathbf{x}) = \tanh(\mathbf{w}^{\mathsf T}\mathbf{x})$, let the in-sample error be $E_{\text{in}}(\mathbf{w}) = \frac{1}{N}\sum_{n=1}^{N}\left(\tanh(\mathbf{w}^{\mathsf T}\mathbf{x}_n) - y_n\right)^2$. Show that:

$$\nabla E_{\text{in}}(\mathbf{w}) = \frac{2}{N}\sum_{n=1}^{N}\left(\tanh(\mathbf{w}^{\mathsf T}\mathbf{x}_n) - y_n\right)\left(1 - \tanh^2(\mathbf{w}^{\mathsf T}\mathbf{x}_n)\right)\mathbf{x}_n.$$

If $\mathbf{w} \to \infty$, what happens to the gradient? How is this related to why it is hard to optimize the perceptron?
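The first part follows from a direct chain-rule computation, using the identity $\tanh'(s) = 1 - \tanh^2(s)$ (a sketch of the standard derivation):

```latex
\begin{aligned}
E_{\text{in}}(\mathbf{w})
  &= \frac{1}{N}\sum_{n=1}^{N}\bigl(\tanh(\mathbf{w}^{\mathsf T}\mathbf{x}_n) - y_n\bigr)^2,\\
\nabla E_{\text{in}}(\mathbf{w})
  &= \frac{2}{N}\sum_{n=1}^{N}\bigl(\tanh(\mathbf{w}^{\mathsf T}\mathbf{x}_n) - y_n\bigr)
     \,\tanh'(\mathbf{w}^{\mathsf T}\mathbf{x}_n)\,\mathbf{x}_n\\
  &= \frac{2}{N}\sum_{n=1}^{N}\bigl(\tanh(\mathbf{w}^{\mathsf T}\mathbf{x}_n) - y_n\bigr)
     \bigl(1 - \tanh^2(\mathbf{w}^{\mathsf T}\mathbf{x}_n)\bigr)\,\mathbf{x}_n.
\end{aligned}
```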

We observe that as $\|\mathbf{w}\| \to \infty$, each $\tanh(\mathbf{w}^{\mathsf T}\mathbf{x}_n) \to \pm 1$ (whenever $\mathbf{w}^{\mathsf T}\mathbf{x}_n \neq 0$), so each factor $1 - \tanh^2(\mathbf{w}^{\mathsf T}\mathbf{x}_n) \to 0$ and therefore $\nabla E_{\text{in}}(\mathbf{w}) \to \mathbf{0}$.

That means when $\|\mathbf{w}\|$ is large enough, the gradient descent algorithm will make almost no change to $\mathbf{w}$. Even worse, it may effectively stop while $E_{\text{in}}$ is near its largest possible value ($E_{\text{in}} \approx 4$, which occurs when $\tanh(\mathbf{w}^{\mathsf T}\mathbf{x}_n) \approx -y_n$ for every $n$) and return that large $\mathbf{w}$ as the final hypothesis, even though it is a very poor one.
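The vanishing gradient is easy to check numerically. The sketch below (on a small made-up dataset; the data values are assumptions for illustration) evaluates the gradient formula at a weight vector $\mathbf{w}$ and at scaled-up copies $10\mathbf{w}$ and $100\mathbf{w}$, showing that the gradient norm collapses as $\tanh$ saturates:

```python
import numpy as np

# Hypothetical tiny dataset: rows of X are inputs x_n (first column is the
# bias coordinate), labels y_n are in {-1, +1}. Values chosen arbitrarily.
X = np.array([[1.0,  0.5, -1.2],
              [1.0, -0.7,  0.3],
              [1.0,  1.1,  0.8]])
y = np.array([1.0, -1.0, 1.0])

def grad_Ein(w):
    """Gradient of E_in(w) = (1/N) sum_n (tanh(w.x_n) - y_n)^2."""
    t = np.tanh(X @ w)  # tanh of the signals w^T x_n
    # (2/N) sum_n (tanh(w.x_n) - y_n) (1 - tanh^2(w.x_n)) x_n
    return (2.0 / len(y)) * X.T @ ((t - y) * (1.0 - t**2))

w = np.array([0.1, -0.2, 0.3])
for scale in (1, 10, 100):
    g = grad_Ein(scale * w)
    print(f"scale = {scale:3d}, ||grad E_in|| = {np.linalg.norm(g):.3e}")
```

At scale 100 the signals $\mathbf{w}^{\mathsf T}\mathbf{x}_n$ are large, $\tanh$ is saturated, and the gradient norm is numerically indistinguishable from zero, which is exactly why plain gradient descent stalls in this regime regardless of how good or bad the current hypothesis is.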