## Learning From Data – A Short Course: Exercise 3.12

Page 103: We know that in the Euclidean plane, the perceptron model $\mathcal{H}$ cannot implement all 16 dichotomies on 4 points. That is, $m_{\mathcal{H}}(4) < 16$. Take the feature transform $\Phi$ in (3.12). (a) Show that $m_{\mathcal{H}_\Phi}(3) = 8$. We have proved (in Exercise 2.4) that the hypothesis set of the perceptron model in the Euclidean plane has $d_{\mathrm{vc}} = 3$, and by the definition of [...]

## Learning From Data – A Short Course: Exercise 3.10

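Page 98: (a) Define an error for a single data point $(\mathbf{x}_n, y_n)$ to be $e_n(\mathbf{w}) = \max(0, -y_n \mathbf{w}^T \mathbf{x}_n)$. Argue that PLA can be viewed as SGD on $e_n$ with learning rate $\eta = 1$. $e_n(\mathbf{w}) = 0$ when $y_n \mathbf{w}^T \mathbf{x}_n > 0$, which means that $\mathrm{sign}(\mathbf{w}^T \mathbf{x}_n)$ agrees with $y_n$ (no error at that point): $\nabla e_n(\mathbf{w}) = \mathbf{0}$. $e_n(\mathbf{w}) = -y_n \mathbf{w}^T \mathbf{x}_n$ when $\mathrm{sign}(\mathbf{w}^T \mathbf{x}_n)$ and $y_n$ disagree (that point is misclassified): $\nabla e_n(\mathbf{w}) = -y_n \mathbf{x}_n$. Hence: When there is no error at the [...]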
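To make the PLA-as-SGD claim concrete, here is a minimal Python sketch. It assumes the book's single-point error e_n(w) = max(0, -y_n w·x_n) and treats the tie case y_n w·x_n = 0 as misclassified; the function names `grad_e`, `sgd_step`, and `pla_step` are hypothetical, chosen only for this illustration.

```python
import random

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def grad_e(w, x, y):
    """(Sub)gradient of e_n(w) = max(0, -y * w.x) with respect to w."""
    if y * dot(w, x) > 0:
        # Point is classified correctly: e_n = 0, so the gradient is zero.
        return [0.0] * len(w)
    # Misclassified (assumption: ties count as errors): gradient is -y * x.
    return [-y * xi for xi in x]

def sgd_step(w, x, y, eta=1.0):
    """One SGD step on e_n: w <- w - eta * grad e_n(w)."""
    return [wi - eta * gi for wi, gi in zip(w, grad_e(w, x, y))]

def pla_step(w, x, y):
    """One PLA step: w <- w + y * x if the point is misclassified."""
    if y * dot(w, x) <= 0:
        return [wi + y * xi for wi, xi in zip(w, x)]
    return list(w)

# Run both rules over the same stream of points and compare the weights.
random.seed(0)
data = []
for _ in range(20):
    x = [1.0, random.uniform(-1, 1), random.uniform(-1, 1)]
    y = 1 if x[1] + x[2] > 0 else -1   # toy target, only for the demo
    data.append((x, y))

w_sgd = [0.0, 0.0, 0.0]
w_pla = [0.0, 0.0, 0.0]
for x, y in data:
    w_sgd = sgd_step(w_sgd, x, y, eta=1.0)
    w_pla = pla_step(w_pla, x, y)

same = all(abs(a - b) < 1e-12 for a, b in zip(w_sgd, w_pla))
print("SGD with eta = 1 matches PLA:", same)
```

With any other value of `eta` the two weight vectors generally differ in scale, which is why the exercise pins the learning rate at 1.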

## Learning From Data – A Short Course: Exercise 3.8

Page 94: The claim that $\hat{\mathbf{v}}$ is the direction which gives the largest decrease in $E_{\text{in}}$ only holds for small $\eta$. Why? The $O(\eta^2)$ term is small and ignorable only when $\eta$ is small. P.S. I’m bored. That’s why I’m posting these as separate posts.
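A quick numerical check of this point, as a sketch only: the error surface `E` below is a hypothetical quadratic picked for illustration, not anything from the book. The code searches a 1-degree grid of unit directions for the one that decreases `E` the most after a step of size `eta`, and compares it with the negative gradient direction.

```python
import math

def E(w):
    """Hypothetical error surface for illustration: E(w) = w1^2 + 10 * w2^2."""
    return w[0] ** 2 + 10.0 * w[1] ** 2

def grad_E(w):
    return (2.0 * w[0], 20.0 * w[1])

def best_direction_deg(w, eta, steps=360):
    """Angle in degrees of the unit vector v that minimizes E(w + eta * v),
    searched over an evenly spaced grid of directions."""
    best_angle, best_value = None, float("inf")
    for k in range(steps):
        theta = 2.0 * math.pi * k / steps
        value = E((w[0] + eta * math.cos(theta), w[1] + eta * math.sin(theta)))
        if value < best_value:
            best_angle, best_value = 360.0 * k / steps, value
    return best_angle

w0 = (1.0, 1.0)
g = grad_E(w0)
neg_grad_deg = math.degrees(math.atan2(-g[1], -g[0])) % 360.0

small = best_direction_deg(w0, eta=0.001)   # second-order term is negligible
large = best_direction_deg(w0, eta=1.0)     # second-order term matters

print("negative gradient direction:", round(neg_grad_deg, 2))
print("best direction, eta = 0.001:", small)
print("best direction, eta = 1.0:  ", large)
```

For the tiny step the best grid direction sits within a degree of the negative gradient; for the large step it moves well away from it, because the higher-order terms of the expansion are no longer ignorable.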

## Learning From Data – A Short Course: Exercise 3.4

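Page 87: Consider a noisy target $y = \mathbf{w}^{*T}\mathbf{x} + \epsilon$ for generating the data, where $\epsilon$ is a noise term with zero mean and $\sigma^2$ variance, independently generated for every example $(\mathbf{x}, y)$. The expected error of the best possible linear fit to this target is thus $\sigma^2$. For the data $\mathcal{D} = \{(\mathbf{x}_1, y_1), \dots, (\mathbf{x}_N, y_N)\}$, denote the noise in $y_n$ as $\epsilon_n$ and let $\boldsymbol{\epsilon} = [\epsilon_1, \epsilon_2, \dots, \epsilon_N]^T$, assume that [...]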

## Learning From Data – A Short Course: Exercise 3.3

Page 87, Exercise 3.3: Consider the hat matrix $H = X(X^T X)^{-1} X^T$, where $X$ is an $N$ by $d+1$ matrix, and $X^T X$ is invertible. (a) Show that $H$ is symmetric. (b) Show that $H^K = H$ for any positive integer $K$. (c) If $I$ is the identity matrix of size $N$, show that $(I - H)^K = I - H$ for any positive integer $K$. (d) Show that $\mathrm{trace}(H) = d + 1$, where the trace is [...]
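Before proving these properties, it can help to see them hold numerically. The sketch below builds the hat matrix H = X (XᵀX)⁻¹ Xᵀ for a small random X in plain Python and checks symmetry, idempotence of H and of I - H, and the trace. The helper names and the choice N = 8, d = 2 are assumptions made for this illustration; a numerical check is not a proof.

```python
import random

def transpose(A):
    return [list(row) for row in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def inverse(A):
    """Gauss-Jordan inverse with partial pivoting.
    Assumes A is square and invertible."""
    n = len(A)
    M = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]

def hat_matrix(X):
    Xt = transpose(X)
    return matmul(matmul(X, inverse(matmul(Xt, X))), Xt)

def max_abs_diff(A, B):
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

random.seed(1)
N, d = 8, 2
# Each row is (1, x_1, ..., x_d), so X is N by d+1.
X = [[1.0] + [random.gauss(0, 1) for _ in range(d)] for _ in range(N)]
H = hat_matrix(X)
I_minus_H = [[(1.0 if i == j else 0.0) - H[i][j] for j in range(N)]
             for i in range(N)]

sym_err = max_abs_diff(H, transpose(H))                       # (a) H = H^T
idem_err = max_abs_diff(matmul(H, H), H)                      # (b) H^2 = H
comp_err = max_abs_diff(matmul(I_minus_H, I_minus_H), I_minus_H)  # (c)
trace_H = sum(H[i][i] for i in range(N))                      # (d) d + 1

print(sym_err, idem_err, comp_err, trace_H)
```

Checking H² = H is enough for part (b), since H^K = H then follows for every positive integer K by induction; the same goes for I - H in part (c).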

## Learning From Data – A Short Course: Exercise 1.10

Page 23, Exercise 1.10: Here is an experiment that illustrates the difference between a single bin and multiple bins. Run a computer simulation for flipping 1,000 fair coins. Flip each coin independently 10 times. Let’s focus on 3 coins as follows: $c_1$ is the first coin flipped; $c_{\text{rand}}$ is a coin you choose at random; $c_{\min}$ is [...]
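A minimal sketch of the simulation in Python. The excerpt cuts off before the third coin is defined, so the code assumes it is the coin with the lowest frequency of heads; the function name `run_experiment`, the number of repetitions (200), and the seed are likewise assumptions for this illustration.

```python
import random

def run_experiment(rng, n_coins=1000, n_flips=10):
    """Flip n_coins fair coins n_flips times each.
    Returns the heads fractions (nu_1, nu_rand, nu_min)."""
    heads = [sum(rng.random() < 0.5 for _ in range(n_flips))
             for _ in range(n_coins)]
    nu = [h / n_flips for h in heads]
    nu_1 = nu[0]                         # the first coin flipped
    nu_rand = nu[rng.randrange(n_coins)]  # a coin chosen at random
    nu_min = min(nu)                     # assumption: lowest heads frequency
    return nu_1, nu_rand, nu_min

rng = random.Random(0)
results = [run_experiment(rng) for _ in range(200)]

# Average heads fraction of each of the three coins over all repetitions.
avg = [sum(r[i] for r in results) / len(results) for i in range(3)]
print("average nu_1, nu_rand, nu_min:", avg)
```

The first two averages should sit near 0.5, while the third is pulled far below it, because that coin is selected after looking at the outcomes of all 1,000 coins.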

## Learning From Data – A Short Course: Exercise 1.9

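Page 19, Exercise 1.9. If $\mu = 0.9$, use the Hoeffding Inequality to bound the probability that a sample of 10 marbles will have $\nu \le 0.1$ and compare the answer to the previous exercise. With $\mu = 0.9$, $\nu \le 0.1$ means $|\nu - \mu| \ge 0.8$, so we take $\epsilon = 0.8^-$, where $0.8^-$ means any number slightly less than $0.8$ (reference). Hence: $P[\nu \le 0.1] \le P[|\nu - \mu| > \epsilon] \le 2e^{-2\epsilon^2 N}$. If so: $P[\nu \le 0.1] \le 2e^{-2(0.8)^2(10)} \approx 5.52 \times 10^{-6}$. We observe that: $9.1 \times 10^{-9} \le 5.52 \times 10^{-6}$ is true.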
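The comparison can be reproduced in a few lines of Python. This is a sketch under stated assumptions: it uses the Hoeffding bound in the form 2·exp(-2ε²N), plugs in ε = 0.8 as the limiting value, and recomputes the exact binomial probability from the previous exercise; the function names are hypothetical.

```python
import math

def hoeffding_bound(eps, N):
    """Right-hand side of the Hoeffding Inequality: 2 * exp(-2 * eps^2 * N)."""
    return 2.0 * math.exp(-2.0 * eps ** 2 * N)

def exact_prob_at_most(mu, N, k_max):
    """P[at most k_max red marbles in a sample of N], with P[red] = mu."""
    return sum(math.comb(N, k) * mu ** k * (1 - mu) ** (N - k)
               for k in range(k_max + 1))

mu, N = 0.9, 10
bound = hoeffding_bound(0.8, N)        # eps taken at its limiting value 0.8
exact = exact_prob_at_most(mu, N, 1)   # nu <= 0.1 means at most 1 red marble

print("Hoeffding bound:  ", bound)
print("exact probability:", exact)
print("bound holds:      ", exact <= bound)
```

The bound is valid but loose here: it is a few hundred times larger than the exact probability.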

## Learning From Data – A Short Course: Exercise 1.8

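Page 19, Exercise 1.8. If $\mu = 0.9$, what is the probability that a sample of 10 marbles will have $\nu \le 0.1$? Here we have N = 10 (a sample of 10 marbles), so $\nu \le 0.1$ means at most one red marble: $P[\nu \le 0.1] = P[\nu = 0] + P[\nu = 0.1] = \binom{10}{0}(0.1)^{10} + \binom{10}{1}(0.9)(0.1)^{9}$. Hence: $P[\nu \le 0.1] = 10^{-10} + 9 \times 10^{-9} = 9.1 \times 10^{-9}$.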
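The arithmetic can be double-checked with exact rational numbers in Python. A sketch only: it assumes, as in the book's bin model, that each marble is red with probability μ independently, and the function name `prob_at_most_k_red` is hypothetical.

```python
from fractions import Fraction
from math import comb

def prob_at_most_k_red(mu, N, k_max):
    """Binomial probability of drawing at most k_max red marbles out of N,
    where each marble is red with probability mu."""
    return sum(comb(N, k) * mu ** k * (1 - mu) ** (N - k)
               for k in range(k_max + 1))

# nu <= 0.1 with N = 10 means zero or one red marble in the sample.
p = prob_at_most_k_red(Fraction(9, 10), 10, 1)
print(p)          # 91/10000000000
print(float(p))
```

Using `Fraction` avoids floating-point rounding, so the result is exactly 91/10^10, i.e. 9.1 × 10⁻⁹.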

## Learning From Data – A Short Course: Problem 2.5

Page 69, Problem 2.5. Prove by induction that $\sum_{i=0}^{D} \binom{N}{i} \le N^D + 1$, hence $m_{\mathcal{H}}(N) \le N^{d_{\mathrm{vc}}} + 1$. Base cases: Induction step for : We will prove later, for now, we will use its result: : : So follows. Prove by induction: Base case: Induction step for : [...]
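Before working through the induction, a brute-force check of the inequality from the book, that the sum of C(N, i) for i = 0..D is at most N^D + 1, can catch a misread statement early. The sketch below tests it for small N and D; the ranges and the names `lhs` and `rhs` are arbitrary choices for this illustration, and a finite check is of course not a proof.

```python
from math import comb

def lhs(N, D):
    """Sum of binomial coefficients C(N, 0) + ... + C(N, D)."""
    return sum(comb(N, i) for i in range(D + 1))

def rhs(N, D):
    return N ** D + 1

# Collect every (N, D) pair in the tested range where the bound fails.
violations = [(N, D)
              for N in range(1, 41)
              for D in range(0, 41)
              if lhs(N, D) > rhs(N, D)]

print("violations found:", violations)
print("example, N = 4, D = 2:", lhs(4, 2), "<=", rhs(4, 2))
```

Equality does occur in the tested range (for instance N = 2, D = 1 gives 3 on both sides), so the "+ 1" in the bound cannot be dropped.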

## [Book Note] Learning From Data – A Short Course

This book should be read alongside its corresponding online course. However, you should not watch the online course alone.

The Bin Model

| Page | Note |
| --- | --- |
| 19 | Reference: Malik Magdon-Ismail. |
| 31 | y here is a random variable. Reference: Malik Magdon-Ismail, The Elements of Statistical Learning page 28. |
| 32 | “we will assume the target to be a [...] |