Homework 1

Due: Thursday, February 3, 2011, at the start of class.

All students:

  1. Devise a decision tree learning algorithm that supports backtracking; for reference, a minimal sketch of the plain greedy baseline appears after this list. Specifically, provide:
    1. Pseudocode for your proposed algorithm.
    2. The criterion that your algorithm uses to decide when to backtrack, including your justification for this choice.
    3. An analysis of the advantages and disadvantages of your algorithm.
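
For reference, here is a minimal Python sketch of the plain greedy, recursive tree-growing algorithm with the information-gain criterion, i.e., the baseline that a backtracking variant would extend. The names (entropy, information_gain, grow_tree) and the dict-based example representation are illustrative assumptions, not the course's reference implementation.

    import math
    from collections import Counter

    def entropy(labels):
        """Shannon entropy (bits) of a non-empty list of class labels."""
        n = len(labels)
        return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

    def information_gain(labels, left, right):
        """Entropy reduction from splitting labels into the two child lists."""
        n = len(labels)
        children = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
        return entropy(labels) - children

    def grow_tree(examples, labels, attributes):
        """Greedy, recursive tree growing (assumes non-empty training data).

        examples:   list of dicts mapping attribute name -> 0/1
        labels:     matching class labels
        attributes: attribute names not yet used on this path
        Returns a class label (leaf) or a tuple (attribute, subtree_for_0, subtree_for_1).
        """
        # Stop when the node is pure or no attributes remain; predict the majority class.
        if len(set(labels)) == 1 or not attributes:
            return Counter(labels).most_common(1)[0][0]

        # Greedily pick the attribute with the highest information gain.
        best_attr, best_gain = None, 0.0
        for a in attributes:
            left = [y for x, y in zip(examples, labels) if x[a] == 0]
            right = [y for x, y in zip(examples, labels) if x[a] == 1]
            if not left or not right:
                continue  # this attribute does not actually split the node
            gain = information_gain(labels, left, right)
            if gain > best_gain:
                best_attr, best_gain = a, gain

        if best_attr is None:  # no attribute improves purity; make a leaf
            return Counter(labels).most_common(1)[0][0]

        rest = [a for a in attributes if a != best_attr]
        side0 = [(x, y) for x, y in zip(examples, labels) if x[best_attr] == 0]
        side1 = [(x, y) for x, y in zip(examples, labels) if x[best_attr] == 1]
        return (best_attr,
                grow_tree([x for x, _ in side0], [y for _, y in side0], rest),
                grow_tree([x for x, _ in side1], [y for _, y in side1], rest))

The design point to notice is that this baseline commits to every split permanently and never revisits it; the backtracking requirement above concerns when and how such a commitment is undone.
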
Students enrolled in the 529 section should also do the following:
  1. Show that the entropy function is concave (i.e., anti-convex).
  2. Show that a binary, categorical decision tree, using information gain as its splitting criterion, always increases purity. That is, show that information gain is non-negative for every possible split, and equals 0 only when the split leaves the class distribution unchanged in both leaves. You may assume that all attributes (features) are binary, but the class variable may be an arbitrary categorical variable (that is, the class can be any integer in the range 1, ..., k, for some finite k). The standard definitions of entropy and information gain are restated after this list for reference.
  3. Prove that the basic decision tree learning algorithm (i.e., the greedy, recursive, tree-growing algorithm from class, with no early stopping or pruning), using the information gain splitting criterion, halts.
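
For the 529 problems, the following standard definitions of entropy and (binary-split) information gain are the ones used in the sketch above; the notation (H, IG, S_0, S_1) is assumed here and may differ slightly from the lecture notes.

    H(S) = -\sum_{i=1}^{k} p_i \log_2 p_i ,
    \quad \text{where } p_i \text{ is the fraction of examples in } S \text{ with class } i,

    IG(S, A) = H(S) - \frac{|S_0|}{|S|} H(S_0) - \frac{|S_1|}{|S|} H(S_1) ,
    \quad \text{where } S_a = \{ (x, y) \in S : x_A = a \}, \; a \in \{0, 1\}.

These are the quantities referred to in problems 1-3: problem 1 concerns the concavity of H, and problems 2 and 3 concern the behavior of IG under the greedy splitting rule.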
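
A quick numerical sanity check can also be useful before attempting the proof in problem 2. The sketch below (illustrative only, not part of the required submission; it restates the entropy helper so it runs standalone) draws random binary splits of random k-class label vectors and asserts that the resulting gain is never negative beyond rounding error.

    import math
    import random
    from collections import Counter

    def entropy(labels):
        """Shannon entropy (bits) of a non-empty list of class labels."""
        n = len(labels)
        return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

    random.seed(0)
    for _ in range(10_000):
        k = random.randint(2, 5)                      # arbitrary finite number of classes
        labels = [random.randrange(k) for _ in range(random.randint(2, 30))]
        attr = [random.randrange(2) for _ in labels]  # a random binary attribute
        left = [y for y, a in zip(labels, attr) if a == 0]
        right = [y for y, a in zip(labels, attr) if a == 1]
        if not left or not right:
            continue  # the attribute did not split this sample
        n = len(labels)
        gain = entropy(labels) - (len(left) / n) * entropy(left) \
                               - (len(right) / n) * entropy(right)
        assert gain >= -1e-12, (labels, attr, gain)   # non-negative up to rounding error
    print("no negative information gain found in 10,000 random splits")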