Todai Entrance Exam: Subject 2011 – Problem 3

(1)

The \text{num} variable is not resetted to 0 when we start counting a new \text{word}. Add \text{num} = 0 right after the line \text{output_pair}(\text{word}, \text{num}); to correct the code.

(2)

merge(L1, L2) {

  • for (word1, freq1) in L1 {
    • for (word2, freq2) in L2 {
      • if (word1 is word2) {
        • output_pair(word1, freq1 + freq2)
        • break
      • }
    • }
  • }

}

(3) We merge list L1 and L2 into L12, and then we merge L12 and L3 into L123,… We keep up that process untill all N lists are merged into one final list.

(4)

map(word):

  • sum = 0
  • for each character of word:
    • sum = sum + ASCII(character)
  • return sum % N

Here ASCII is the function convert a character to its corresponding ASCII value.

(5)

What if we assume that words with less length likely to have a higher frequency?

What if we do the hash function for each character instead of the sum of the word?


Facebooktwittergoogle_plusredditpinterestlinkedinmail

Leave a Reply

Your email address will not be published. Required fields are marked *