Suppose you run gradient descent for 1000 iterations. You have 500 examples in $\mathcal{D}$, and you use 450 for training and 50 for validation. You output the weight $\hat{w}$ from iteration 50, with its in-sample error $E_{\text{in}}(\hat{w})$ and validation error $E_{\text{val}}(\hat{w})$.
(a) Is $E_{\text{val}}(\hat{w})$ an unbiased estimate of $E_{\text{out}}(\hat{w})$?
(b) Use the Hoeffding bound to get a bound for $E_{\text{out}}(\hat{w})$ using $E_{\text{val}}(\hat{w})$ plus an error bar. Your bound should hold with probability at least $1-\delta$.
If the problem is classification, refer to formula (2.1) on page 40 [Section 2.1],
$$E_{\text{out}}(g) \le E_{\text{in}}(g) + \sqrt{\tfrac{1}{2N}\ln\tfrac{2M}{\delta}},$$
and replace $E_{\text{in}}$ with $E_{\text{val}}$, $N$ with 50 (the size of the validation set), and $M$ with 1000 (one candidate weight per iteration, since the output iteration is selected using the validation set). The bound becomes: with probability at least $1-\delta$,
$$E_{\text{out}}(\hat{w}) \le E_{\text{val}}(\hat{w}) + \sqrt{\tfrac{1}{100}\ln\tfrac{2000}{\delta}}.$$
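As a quick numerical check, the error bar above can be evaluated directly. This is a sketch under stated assumptions: the confidence level $\delta = 0.05$ is not given in the problem, the function name `hoeffding_error_bar` is hypothetical, and taking $M = 1000$ treats each of the 1000 iterates as a candidate hypothesis.

```python
import math

def hoeffding_error_bar(n, m, delta):
    """Error bar from the finite-M Hoeffding bound (LFD eq. 2.1):
    E_out <= E_val + sqrt(ln(2M/delta) / (2N))."""
    return math.sqrt(math.log(2 * m / delta) / (2 * n))

# Assumed values: delta = 0.05 is not stated in the problem; M = 1000
# counts every gradient-descent iterate as a candidate hypothesis.
N, M, delta = 50, 1000, 0.05
bar = hoeffding_error_bar(N, M, delta)
print(f"error bar: {bar:.3f}")  # -> error bar: 0.326
```

Even with only 50 validation points, the union bound over 1000 candidates costs little: the $\ln 2M$ term grows only logarithmically, so the bar with $M = 1000$ (about 0.33) is not much wider than with $M = 1$ (about 0.19).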
(c) Can you bound $E_{\text{out}}(\hat{w})$ using $E_{\text{in}}(\hat{w})$, or do you need more information?
Because the hypothesis set explored by gradient descent over 1000 iterations is infinite (the weights range over a continuum), formula (2.1) no longer holds even when the problem is binary classification. For binary classification we would instead use the VC generalization bound, formula (2.12) on page 53 [Section 2.2]:
$$E_{\text{out}}(g) \le E_{\text{in}}(g) + \sqrt{\tfrac{8}{N}\ln\tfrac{4\,m_{\mathcal{H}}(2N)}{\delta}}.$$
In other cases, presumably a similar formula involving $d_{\text{vc}}$ would apply. In either case, we lack the information (the growth function or VC dimension of the hypothesis set) needed to actually compute the bound.
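To illustrate why more information is needed, the sketch below evaluates the VC error bar using the standard polynomial bound on the growth function, $m_{\mathcal{H}}(2N) \le (2N)^{d_{\text{vc}}} + 1$. The value $d_{\text{vc}} = 10$ is purely an assumption for illustration, as are $\delta = 0.05$ and the function name `vc_error_bar`; the problem gives no VC dimension, which is exactly the missing piece.

```python
import math

def vc_error_bar(n, dvc, delta):
    """Error bar from the VC generalization bound (LFD eq. 2.12),
    with the growth function bounded by m_H(2N) <= (2N)^dvc + 1."""
    growth = (2 * n) ** dvc + 1
    return math.sqrt(8 / n * math.log(4 * growth / delta))

# Assumed values: d_vc = 10 and delta = 0.05 are illustrative only.
# N = 450 is the training-set size from the problem.
N, dvc, delta = 450, 10, 0.05
print(f"VC error bar: {vc_error_bar(N, dvc, delta):.3f}")  # -> VC error bar: 1.135
```

Note that even with a modest assumed VC dimension, the bar exceeds 1 at $N = 450$, so the bound is vacuous for classification error; this is the usual observation that the VC bound is very loose at small sample sizes.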