It took me quite a lot of time to understand the questions, though. I do not post the test here, as I am not sure whether I am permitted to do that; this post serves my personal use. Even though the original questions are written in Vietnamese, I will write my solutions in English, as I am not used to solving these problems in Vietnamese.

PHẦN I

  1. a. Let X be a discrete random variable with n possible values x_{1}, \ldots, x_{n}; then H(X) = -\sum_{i=1}^{n}{P(X = x_{i})\log_{b}{P(X = x_{i})}}. Remember that \log_{b}{1} = 0. (A small Python sketch of this formula appears after this list.)
  2. f. Venn diagram for 3 sets A, B and C (image source: Science HQ); a numeric check of the computation below appears after this list:

        \[ \begin{array}{l}\left\{\begin{array}{lll} P(A) = 0.28 & P(B) = 0.29 & P(C) = 0.19 \\ P(A \cap B) = 0.14 & P(B \cap C) = 0.12 & P(A \cap C) = 0.1\\ P(A \cap B \cap C) = 0.08 & & \end{array}\right.\\ \Rightarrow P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(B \cap C) - P(A \cap C) + P(A \cap B \cap C) = 0.48\\ \Rightarrow P(\bar{A} \cap \bar{B} \cap \bar{C}) = 1 - P(A \cup B \cup C) = 0.52 \end{array} \]

  3. e. This question is a little tricky:
    1.  \sin(x) \leq 0.5 \Rightarrow x \in \left[ -\frac{7\pi}{6} + 2k\pi, \frac{\pi}{6} + 2k\pi \right ]. How are you going to express this solution in linear constraints?
    2. Similar to a.
    3. We only have y \neq 0, so we cannot bound x; hence no linear constraints. This question and question d could be clever traps in this type of question. Luckily, they are not traps in this test.
    4. Similar to c.
    5.  \frac{x}{y} \leq 10, y < 0 \Leftrightarrow x - 10y \geq 0, y < 0. A note here is that the constraint  y < 0 flips the inequality when we multiply both sides by y, so  \frac{x}{y} \leq 10 \Leftrightarrow x \geq 10y. (A quick numeric check appears after this list.)
    6. Well… no linear constraints?
  4. e. Again, another tricky question:
    1. No feasible solutions: Constraint:  x < 0, x > 0.
    2. No optimal solutions provided feasible solutions: Unbounded objective, e.g. Minimize: x, Subject to:  x < 0.
    3. Optimal solutions are feasible solutions: Minimize x, Subject to: x >= 0.
    4. “chỉ có nghiệm tối ưu” (“only has optimal solutions”)? Or should it be “chỉ có một nghiệm tối ưu” (“only has one optimal solution”)? If it is the latter, then: Minimize x, Subject to: x >= 0.
    5. If the constraints are  x <= 0, x >= 0, then yes, we have only one feasible solution. But suppose we want exactly two feasible solutions, say a and b with a > b. We cannot have two equality constraints like x = a, x = b. If we instead use inequalities like b \leq x \leq a, then how many x‘s are there in the interval [b; a]? Infinitely many.
    6. Infinitely many feasible solutions: Constraint: x > 0.
  5. IMHO, the answer should be e.
    1. Personally, I tend to prefer logistic regression over decision trees. However, as Machine Learning involves diverse approaches and Decision Tree is a name that can refer to a wide range of approaches, the answer remains indefinite. Reference: CrossValidated.
    2. To my knowledge, it is incorrect from a mathematical point of view, and in practice the phenomenon should be questioned further.
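Since question 1a leans on the entropy formula, here is a minimal Python sketch of it; the example distributions are made-up illustrations, not data from the test.

```python
import math

def entropy(probs, base=2):
    """Shannon entropy H(X) = -sum p_i * log_b(p_i), skipping zero-probability terms."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

# A degenerate distribution has zero entropy, since log_b(1) = 0.
print(entropy([1.0]))        # 0.0
print(entropy([0.5, 0.5]))   # 1.0 bit
```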
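And a quick numeric check of the inclusion-exclusion computation from question 2f, using the probabilities stated there:

```python
P_A, P_B, P_C = 0.28, 0.29, 0.19
P_AB, P_BC, P_AC = 0.14, 0.12, 0.10
P_ABC = 0.08

# Inclusion-exclusion for three events.
P_union = P_A + P_B + P_C - P_AB - P_BC - P_AC + P_ABC
print(P_union)       # ≈ 0.48 = P(A or B or C)
print(1 - P_union)   # ≈ 0.52 = P(not A and not B and not C)
```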
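Finally, a tiny sanity check for the x/y case of question 3e: with y < 0, multiplying both sides by y flips the inequality, so \frac{x}{y} \leq 10 becomes x \geq 10y. The test points below are arbitrary illustration values.

```python
# With y < 0, x/y <= 10 is equivalent to x >= 10*y (the inequality flips).
points = [(5, -1), (-20, -1), (0, -3), (-35, -3)]
for x, y in points:
    original = x / y <= 10
    linear = x >= 10 * y
    print((x, y), original, linear, original == linear)  # last column is always True
```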

PHẦN II

  1. I cannot understand what is meant by “Hãy tính giá trị Entropy của tập huấn luyện trên theo phân lớp dương” (“Compute the entropy of the above training set with respect to the positive class”). What? If we only consider the positive cases, there is no need to calculate the provided formula, as all the terms will be zero! I will therefore ignore the part “theo phân lớp dương”.
    •  E(a_{1}) = -\frac{4}{9}\left ( \frac{3}{4}\log_{2}{\frac{3}{4}} + \frac{1}{4}\log_{2}{\frac{1}{4}}\right ) - \frac{5}{9}\left ( \frac{1}{5}\log_{2}{\frac{1}{5}} + \frac{4}{5}\log_{2}{\frac{4}{5}}\right ) \approx 0.761639.
    •  E(a_{2}) = -\frac{5}{9}\left ( \frac{2}{5}\log_{2}{\frac{2}{5}} + \frac{3}{5}\log_{2}{\frac{3}{5}}\right ) - \frac{4}{9}\left ( \frac{2}{4}\log_{2}{\frac{2}{4}} + \frac{2}{4}\log_{2}{\frac{2}{4}}\right ) \approx 0.983861.
    • E(a_{3}) = -\frac{2}{9}\left ( \frac{1}{2}\log_{2}{\frac{1}{2}} + \frac{1}{2}\log_{2}{\frac{1}{2}}\right ) = \frac{2}{9}. Many of the component entropies of this split vanish because \log_{2}{1} = 0, so only this term remains.
    •  E(D) = -\left ( \frac{4}{9}\log_{2}{\frac{4}{9}} + \frac{5}{9}\log_{2}{\frac{5}{9}}\right ) \approx 0.991076.
  2. Information gain = entropy before splitting minus entropy after splitting. (A short Python check of these numbers appears at the end of this part.)
    •  \text{Gain}(a_{1}) = E(D) - E(a_{1}) \approx 0.991076 - 0.761639 = 0.229437.
    •  \text{Gain}(a_{2}) = E(D) - E(a_{2}) \approx 0.991076 - 0.983861 = 0.007215.
  3. I am not sure about the terms “Cost” and “Bias” mentioned in this question. However, I will assume that “Cost” refers to computational cost and “Bias” refers to the distance between the final hypothesis and the target function. My answer: C, D, A, B.
  4. I do not know how to address this question.
  5. I am not sure if I understand the question correctly. However, if I do, then (a numeric version appears at the end of this part):

        \[ \begin{array}{ll} &\left\{\begin{array}{l} \mathbb{P}(\text{doping}) = 0.1\\ \mathbb{P}(\text{positive} | \text{doping}) = 0.9\\ \mathbb{P}(\text{negative} | \text{no doping}) = 0.9\\ \end{array}\right.\\ \Rightarrow & \left\{\begin{array}{l} \mathbb{P}(\text{positive} \cap \text{doping}) = \mathbb{P}(\text{positive} | \text{doping})\mathbb{P}(\text{doping}) = 0.09\\ \mathbb{P}(\text{positive} | \text{no doping}) = 1 - \mathbb{P}(\text{negative} | \text{no doping}) = 0.1 \end{array}\right.\\ \Rightarrow & \left\{\begin{array}{l} \mathbb{P}(\text{positive} \cap \text{doping}) = 0.09\\ \mathbb{P}(\text{positive} \cap \text{no doping}) = \mathbb{P}(\text{positive} | \text{no doping})\mathbb{P}(\text{no doping}) = 0.09 \end{array}\right.\\ \Rightarrow & \mathbb{P}(\text{doping} | \text{positive}) = \frac{\mathbb{P}(\text{positive} \cap \text{doping})}{\mathbb{P}(\text{positive} \cap \text{doping}) + \mathbb{P}(\text{positive} \cap \text{no doping}) } = \frac{1}{2} \end{array} \]
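Here is a short Python check of the entropy and information-gain numbers from questions 1 and 2 above; the class counts are read off from the fractions in my computations, so they stand or fall with my reading of the training set.

```python
import math

def H(counts):
    """Entropy (base 2) of a class-count vector."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

def split_entropy(partitions, n):
    """Weighted entropy after splitting n examples into the given partitions."""
    return sum(sum(part) / n * H(part) for part in partitions)

E_D = H([4, 5])                            # ≈ 0.991076
E_a1 = split_entropy([[3, 1], [1, 4]], 9)  # ≈ 0.761639
E_a2 = split_entropy([[2, 3], [2, 2]], 9)  # ≈ 0.983861
print(E_D - E_a1, E_D - E_a2)              # gains ≈ 0.229437 and 0.007215
```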
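And a numeric version of the Bayes computation in question 5, under my reading of the statement:

```python
p_doping = 0.1
p_pos_given_doping = 0.9
p_neg_given_clean = 0.9

# Joint probabilities of a positive test with and without doping.
p_pos_and_doping = p_pos_given_doping * p_doping             # 0.09
p_pos_and_clean = (1 - p_neg_given_clean) * (1 - p_doping)   # 0.09

# Bayes: P(doping | positive).
print(p_pos_and_doping / (p_pos_and_doping + p_pos_and_clean))  # 0.5
```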

PHẦN III

  1. I cannot understand the role of x in this question.
    1. Because \Sigma is a covariance matrix, it is inherently symmetric. We have:

          \[ \begin{array}{ll} \begin{array}{ll} \text{maximize}_{\alpha_{k} \in \mathbb{R}^{p}} & \alpha_{k}^{T} \Sigma \alpha_{k}\\ \text{subject to} & \alpha_{k}^{T}\alpha_{k} = 1 \end{array} & \Rightarrow \left\{\begin{matrix} \alpha_{k}^{T}\alpha_{k} = 1 \\ \nabla_{\alpha_{k}}\left(\alpha_{k}^{T} \Sigma \alpha_{k}\right) = \nabla_{\alpha_{k}}\left(\lambda\alpha_{k}^{T}\alpha_{k}\right) \end{matrix}\right.\\ & \Rightarrow \left\{\begin{matrix} \alpha_{k}^{T}\alpha_{k} = 1 \\ (\Sigma + \Sigma^{T})\alpha_{k} = 2\lambda\alpha_{k} \end{matrix}\right.\\ & \Rightarrow \left\{\begin{matrix} \alpha_{k}^{T}\alpha_{k} = 1 \\ 2\Sigma\alpha_{k} = 2\lambda\alpha_{k} \end{matrix}\right.\\ & \Rightarrow \left\{\begin{matrix} \alpha_{k}^{T}\alpha_{k} = 1 \\ \Sigma\alpha_{k} = \lambda\alpha_{k} \end{matrix}\right. \end{array} \]

      So,  \alpha_{k} must be a unit eigenvector of \Sigma for \alpha_{k}^{T} \Sigma \alpha_{k} to reach a constrained extremum. It is easy to see that we need to choose the unit eigenvector \alpha_{k} = \alpha_{1} corresponding to the largest eigenvalue to maximize \alpha_{k}^{T} \Sigma \alpha_{k}. (A small numerical illustration appears after this part.)

    2. The same goes for \alpha_{2}. A note here is that as \Sigma is symmetric, it is diagonalizable, hence it has p linearly independent eigenvectors.
  2. I cannot understand why the test asks that. However (a numeric sketch appears after this part):
    1.     \[ \mathbb{P}(K = 1 | a = 1, b = 1, c = 0) \propto \mathbb{P}(K = 1)\mathbb{P}(a = 1 | K = 1)\mathbb{P}(b = 1 | K = 1)\mathbb{P}(c = 0 | K = 1) = \frac{2}{4}\frac{1}{4}\frac{2}{4}\frac{4}{8} = \frac{1}{32} \]


    2.     \[ \mathbb{P}(K = 0 | a = 1, b = 1, c = 0) \propto \mathbb{P}(K = 0)\mathbb{P}(a = 1 | K = 0)\mathbb{P}(b = 1 | K = 0)\mathbb{P}(c = 0 | K = 0) = \frac{2}{4}\frac{3}{4} = \frac{3}{8} \]

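To illustrate the eigenvector argument of question 1 numerically, here is a small numpy sketch; the covariance matrix is a made-up symmetric example, not anything from the test.

```python
import numpy as np

# Made-up symmetric covariance matrix for illustration.
Sigma = np.array([[4.0, 1.0, 0.5],
                  [1.0, 3.0, 0.2],
                  [0.5, 0.2, 1.0]])

# eigh handles symmetric matrices; eigenvalues come back in ascending order.
eigvals, eigvecs = np.linalg.eigh(Sigma)
alpha_1 = eigvecs[:, -1]          # unit eigenvector of the largest eigenvalue

print(alpha_1 @ Sigma @ alpha_1)  # equals the largest eigenvalue
print(eigvals[-1])

# Any other unit vector gives a quadratic form no larger than that eigenvalue.
rng = np.random.default_rng(0)
v = rng.normal(size=3)
v /= np.linalg.norm(v)
print(v @ Sigma @ v <= eigvals[-1] + 1e-12)  # True
```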
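And a sketch of the naive Bayes computation in question 2. The factors are copied verbatim from my computations above, so they inherit whatever reading I made of the data table; the only new step is normalizing the two un-normalized posteriors.

```python
# Un-normalized posteriors, copied from the two computations above.
score_k1 = (2 / 4) * (1 / 4) * (2 / 4) * (4 / 8)  # = 1/32
score_k0 = (2 / 4) * (3 / 4)                      # = 3/8

# Normalize so the two class posteriors sum to 1.
total = score_k1 + score_k0
print(score_k1 / total, score_k0 / total)  # P(K=1 | a, b, c) vs P(K=0 | a, b, c)
```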

