Many binary codes map characters to fixed-length strings of 1's and 0's. ASCII (American Standard Code for Information Interchange) is a binary code that maps each symbol to a string of seven bits (1's and 0's). For example, the symbol '(' is mapped to the string 0101000 and the symbol 'l' is mapped to the string 1101100.
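As a quick check of these two values, the following Python sketch prints the 7-bit ASCII code of a character using the built-in ord and format functions. The helper name ascii_bits is hypothetical.

```python
def ascii_bits(symbol: str) -> str:
    """Return the 7-bit ASCII code of one character as a string of 0s and 1s."""
    code = ord(symbol)  # the character's code point
    if code > 127:
        raise ValueError(f"{symbol!r} is not an ASCII character")
    return format(code, "07b")  # binary, zero-padded to seven digits


for symbol in "(l":
    print(symbol, ascii_bits(symbol))
```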
A Huffman code is a variable-length code that uses symbol frequencies to minimize the expected code length of encoded messages. As an example, suppose that we need to encode messages constructed from the symbols A, B, C, and D. Further, suppose that 75% of the symbols in our messages are A, 10% are B, 10% are C, and the remaining 5% are D.
Using a fixed-length code, each symbol would be encoded using two bits, since two bits give exactly four distinct strings. So, for example, the 10-symbol message BAAABCAACD would be encoded using 20 bits.
However, if we map the symbol A to the string 1, the symbol B to the string 01, the symbol C to the string 001, and the symbol D to the string 000, the message BAAABCAACD is encoded in 18 bits (011110100111001000).
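Both encodings of BAAABCAACD can be reproduced with a short Python sketch. The helper encode is hypothetical, and the fixed-length assignment A=00, B=01, C=10, D=11 is an arbitrary illustrative choice; any assignment of distinct two-bit strings gives the same 20-bit length.

```python
# Illustrative fixed-length table (any four distinct 2-bit strings would do).
FIXED = {"A": "00", "B": "01", "C": "10", "D": "11"}
# Variable-length table from the text.
VARIABLE = {"A": "1", "B": "01", "C": "001", "D": "000"}


def encode(message, table):
    """Concatenate the code string of each symbol in the message."""
    return "".join(table[symbol] for symbol in message)


message = "BAAABCAACD"
print(len(encode(message, FIXED)))     # 20
print(len(encode(message, VARIABLE)))  # 18
print(encode(message, VARIABLE))       # 011110100111001000
```

Because no code in VARIABLE is a prefix of another, the 18-bit string can still be split back into symbols unambiguously when read left to right.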
Saving two bits may not seem significant; however, the savings become more apparent for longer messages. In a message of 100 symbols, we can expect from our frequencies that 75 of the symbols will be A's requiring one bit each (1), 10 will be B's requiring two bits each (01), 10 will be C's requiring three bits each (001), and 5 will be D's requiring three bits each (000). The total length will therefore be 75 + 20 + 30 + 15 = 140 bits, 30% fewer than the 200 bits required by a fixed-length code.
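The 100-symbol estimate can be computed the same way in Python, assuming the stated percentages hold exactly. The function expected_length is a hypothetical helper; it uses Fraction so the result is exact rather than a rounded float.

```python
from fractions import Fraction

# Percentages and variable-length codes from the example.
PERCENT = {"A": 75, "B": 10, "C": 10, "D": 5}
CODES = {"A": "1", "B": "01", "C": "001", "D": "000"}


def expected_length(n_symbols, percent, codes):
    """Expected bits for n_symbols: (expected count of each symbol) * (its code length), summed."""
    return sum(
        Fraction(n_symbols * pct, 100) * len(codes[symbol])
        for symbol, pct in percent.items()
    )


print(expected_length(100, PERCENT, CODES))  # 140
print(100 * 2)                               # 200 bits for the fixed-length code
```

For a single symbol the same function gives 7/5, i.e. an average of 1.4 bits per symbol instead of 2.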
To construct a Huffman tree for a set of symbols, start by constructing a trivial (single-node) tree for each symbol in the alphabet, and keep track of each tree's frequency. The frequency of each initial tree is the frequency of the symbol it contains. On each step of the construction, select the two trees with the smallest frequencies and merge them into a new tree by making one of them the left child and the other the right child. The frequency of the new tree is the sum of the frequencies of the two trees used to construct it. These steps repeat until only one tree remains; that tree is the Huffman tree, and each symbol's code is read off the path from the root to that symbol's leaf, one bit per edge.
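One way to sketch this construction in Python is with a min-heap from the standard heapq module. This is an illustrative sketch, not a prescribed solution: it assumes left edges are labelled 0 and right edges 1 when reading codes off the tree, and it breaks ties between equal frequencies by insertion order using a counter. The function name huffman_codes is hypothetical.

```python
import heapq
from itertools import count


def huffman_codes(frequencies):
    """Build a Huffman tree from {symbol: frequency} and return {symbol: code}.

    Assumes left edges are labelled 0 and right edges 1; ties between equal
    frequencies are broken by insertion order, so other equally short codes exist.
    """
    tiebreak = count()
    # Heap entries are (frequency, tiebreak, tree). A tree is either a symbol
    # (a leaf) or a (left, right) tuple (an internal node). The unique tiebreak
    # keeps heapq from ever comparing two trees directly.
    heap = [(freq, next(tiebreak), symbol) for symbol, freq in frequencies.items()]
    heapq.heapify(heap)
    if not heap:
        return {}
    if len(heap) == 1:
        # A lone symbol still needs a non-empty code.
        return {heap[0][2]: "0"}

    # Repeatedly merge the two lowest-frequency trees until one tree remains.
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))

    # Read each code off the path from the root to the symbol's leaf.
    codes = {}

    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix

    walk(heap[0][2], "")
    return codes


freqs = {"A": 75, "B": 10, "C": 10, "D": 5}
codes = huffman_codes(freqs)
for symbol in freqs:
    print(symbol, codes[symbol])
```

With the example frequencies this produces A=1, B=011, C=00, and D=010. The strings differ from the code given earlier (B and C tie at 10%, and the left/right choice is arbitrary), but the expected length is the same 140 bits per 100 symbols.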
The output should consist of a line for each symbol in the input alphabet. Each line should have the symbol, followed by a single space, followed by the string that the symbol is mapped to.
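Assuming the codes are already held in a Python dict mapping each symbol to its bit string, the required output format could be produced as follows. The function name format_code_table is hypothetical, and the sketch emits symbols in the dict's insertion order because the format above does not specify an ordering.

```python
def format_code_table(codes):
    """Return one line per symbol: the symbol, a single space, then its code string."""
    return "\n".join(f"{symbol} {code}" for symbol, code in codes.items())


print(format_code_table({"A": "1", "B": "01", "C": "001", "D": "000"}))
```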