[Book Note] Computer Vision: A Modern Approach (2nd Edition)

  • Page 458: I’m not sure I’m interpreting the statement correctly, but here is my guess: one of the risk functions in the example should look like this:  R(1, s) = p(1 \rightarrow 1 | s) \times L(1 \rightarrow 1) + p(1 \rightarrow -1 | s) \times L(1 \rightarrow -1).
  • Page 459:
    • The cost of refusing to decide on an x is:  \sum_{i} p(i|x) \times d = d.
    • The following arguments apply to Algorithm 15.1:
      • If d > 1 then refusing to decide costs more than any wrong decision (which costs at most 1), so we would rather fail than refuse. If d < 0 then refusing is cheaper than any decision, so we would always refuse. Hence 0 < d < 1.
      • If we choose type k then the cost will be 1 - p(k|x) = 1 - p. We then compare the cost of choosing type k with the cost of refusing to decide:
        • We will choose type k if: 1 - p < d \Leftrightarrow p > 1 - d.
        • We will choose not to decide if: 1 - p \geq d \Leftrightarrow p \leq 1 - d.
    • Reference: Bayesian Decision Theory.
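    • A minimal Python sketch of this reject-option rule (the function name and the dict interface are my own, not the book’s):

```python
# Sketch of the reject-option decision rule discussed above (naming is mine,
# not Algorithm 15.1's). Given posteriors p(k|x), choosing class k costs
# 1 - p(k|x); refusing to decide costs d, with 0 < d < 1.
def decide_with_reject(posteriors, d):
    """Return the chosen class label, or None to refuse to decide.

    posteriors: dict mapping class label -> p(k|x); d: refusal cost, 0 < d < 1.
    """
    k, p = max(posteriors.items(), key=lambda kv: kv[1])
    # Choose k iff its cost 1 - p is below the refusal cost d, i.e. p > 1 - d.
    return k if p > 1 - d else None

# With d = 0.3 we decide only when the best posterior exceeds 1 - d = 0.7.
print(decide_with_reject({1: 0.9, -1: 0.1}, 0.3))  # -> 1
print(decide_with_reject({1: 0.6, -1: 0.4}, 0.3))  # -> None (refuse)
```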
  • Page 461:
    • \ln\frac{p(1|x)}{p(-1|x)} = a^{T}x\\ \Leftrightarrow \ln\frac{p(1|x)}{1-p(1|x)} = a^{T}x\\ \Leftrightarrow \frac{p(1|x)}{1-p(1|x)} = e^{a^{T}x}\\ \Leftrightarrow p(1|x) = \frac{e^{a^{T}x}}{1+e^{a^{T}x}}, same goes for p(-1|x).
      • Note that   p(1|x) = \frac{e^{a^{T}x}}{1+e^{a^{T}x}} is the sigmoid (logistic) function.
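      • A quick numeric check (plain Python) that the two forms of the sigmoid agree and that the two posteriors sum to one:

```python
import math

def sigmoid(t):
    # p(1|x) = e^t / (1 + e^t) with t = a^T x; equivalently 1 / (1 + e^-t).
    return 1.0 / (1.0 + math.exp(-t))

t = 1.7
# Both algebraic forms give the same value...
assert abs(sigmoid(t) - math.exp(t) / (1 + math.exp(t))) < 1e-12
# ...and p(-1|x) = e^{-t}/(1+e^{-t}) = sigmoid(-t), so the posteriors sum to 1.
assert abs(sigmoid(t) + sigmoid(-t) - 1.0) < 1e-12
```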
    • For the MLE details, see page 91 of Learning From Data – A Short Course. Wikipedia also offers some insight at the end of the Principles section.
      • For this formula:  \frac{1+y_{i}}{2}a^{T}x-\ln(1+e^{a^{T}x}).
        • When y_{i} = 1 you will get:
           \frac{1+1}{2}a^{T}x-\ln(1+e^{a^{T}x})\\ = a^{T}x-\ln(1+e^{a^{T}x})\\ = \ln(e^{a^{T}x})-\ln(1+e^{a^{T}x})\\ = \ln{\frac{e^{a^{T}x}}{1+e^{a^{T}x}}}\\ = \ln{p(1|x)}
        • Same goes for y_{i} = -1.
    • It looks like the equality  \frac{1+y_{i}}{2}a^{T}x-\ln(1+e^{a^{T}x}) = -\ln(1 + e^{-y_{i}a^{T}x}) holds only for y_{i} = 1 or y_{i} = -1, not in general. Try  y_{i} = 0 and  a^{T}x = 1 and you will see the contradiction.
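    • A numeric check of both claims above (function names are mine): the two expressions agree at y_{i} = \pm 1, and the counterexample y_{i} = 0, a^{T}x = 1 breaks the equality:

```python
import math

def ll(y, t):
    # (1 + y)/2 * a^T x - ln(1 + e^{a^T x}), with t = a^T x
    return (1 + y) / 2 * t - math.log(1 + math.exp(t))

def neg_logistic_loss(y, t):
    # -ln(1 + e^{-y * a^T x})
    return -math.log(1 + math.exp(-y * t))

t = 1.0
# Agreement at y = +1 and y = -1, as derived in the note:
assert abs(ll(1, t) - neg_logistic_loss(1, t)) < 1e-12
assert abs(ll(-1, t) - neg_logistic_loss(-1, t)) < 1e-12
# The counterexample y = 0, a^T x = 1: the two values differ.
assert abs(ll(0, 1.0) - neg_logistic_loss(0, 1.0)) > 1e-6
```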
    • Wikipedia: “However, the logistic loss function does not assign zero penalty to any points. Instead, functions that correctly classify points with high confidence (i.e., with high values of |f(\vec{x})|) are penalized less.”
    • “but also due to examples that have \gamma_{i} near zero”: To be more specific, if \gamma_{i} and y_{i} have the same sign and \gamma_{i} is near zero, then the loss \ln(1 + e^{-y_{i}\gamma_{i}}) is still non-negligible (about \ln 2 at \gamma_{i} = 0), so many such examples contribute substantially. Sep 23, 2016: I wonder if this note is correct?
  • Page 462:
    • “i.e., the example is close to the decision boundary”: Notice that we have:  \lim_{a^{T}x \rightarrow 0}{p(1|x)} = \lim_{a^{T}x \rightarrow 0}{\frac{e^{a^{T}x}}{1+e^{a^{T}x}}} = \frac{e^{0}}{1+e^{0}} = \frac{1}{2} (same goes for p(-1|x)). Also remember that a point x is on the boundary if and only if p(1|x) = p(-1|x) = \frac{1}{2}.
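    • Checking this numerically (function name is mine): as a^{T}x approaches 0, both posteriors approach \frac{1}{2}:

```python
import math

def p1(t):
    # p(1|x) = e^t / (1 + e^t), with t = a^T x
    return math.exp(t) / (1 + math.exp(t))

# As a^T x -> 0, p(1|x) and p(-1|x) = 1 - p(1|x) both tend to 1/2,
# i.e. x sits on the decision boundary.
for t in (0.1, 0.01, 0.001):
    print(t, p1(t), 1 - p1(t))
assert abs(p1(0.0) - 0.5) < 1e-12
```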
    • Robust machine learning.
    • Hinge Loss. Solver.com: “A non-convex optimization problem is any problem where the objective or any of the constraints are non-convex, as pictured below. Such a problem may have multiple feasible regions and multiple locally optimal points within each region. It can take time exponential in the number of variables and constraints to determine that a non-convex problem is infeasible, that the objective function is unbounded, or that an optimal solution is the ‘global optimum’ across all feasible regions.” Bookmark: NIPS 2015 workshop on non-convex optimization, ResearchGate.
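    • A minimal sketch of the hinge loss \max(0, 1 - y\,a^{T}x) (function name is mine): unlike the logistic loss quoted above, it assigns exactly zero penalty to points classified correctly with margin at least 1.

```python
def hinge_loss(y, t):
    # max(0, 1 - y * a^T x), with t = a^T x.
    # Zero penalty once y * a^T x >= 1; the logistic loss, by contrast,
    # penalizes every point at least a little.
    return max(0.0, 1.0 - y * t)

print(hinge_loss(1, 2.0))   # correct with margin >= 1 -> 0.0
print(hinge_loss(1, 0.5))   # correct but inside the margin -> 0.5
print(hinge_loss(1, -1.0))  # wrong side of the boundary -> 2.0
```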

