Homework 2
Due: Tues, Feb 21, 2012, start of class.
Turnin key: cs429-529.hw2
All students:
- Bishop problem 14.6
- For a fixed dimension, d, generate two d-dimensional "ideal data points", X and Y, one at the origin and one at (1,0,0,...,0). The true Euclidean distance between these is clearly 1. Now corrupt each of them by adding to it a random vector drawn from N(0, σ²I). (That is, add a normally distributed, independent random variable with standard deviation σ to each component of X, and a separate, independent draw to each component of Y.) Call the resulting points X' and Y'. While the distance between X and Y is fixed and independent of d, the distance between X' and Y' is a random variable. For d = 1, ..., 1000 and σ in {0.01, 0.1, 0.2, 0.5, 1}, plot the mean distance between X' and Y'. (Because dist(X',Y') is a random variable, this will require generating each, say, 20 times and averaging over the set.) What does this say about the effects of noise on k-nearest neighbors and other metric-based learning algorithms in high-dimensional spaces?
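A minimal simulation sketch in Python (NumPy and Matplotlib assumed available; the trial count and seed are illustrative, not prescribed):

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(0)
    dims = range(1, 1001)
    sigmas = [0.01, 0.1, 0.2, 0.5, 1.0]
    trials = 20  # independent draws per (d, sigma) pair, then averaged

    for sigma in sigmas:
        means = []
        for d in dims:
            # X at the origin, Y at (1, 0, ..., 0), each corrupted by N(0, sigma^2 I);
            # note X' - Y' = (X - Y) + (noise_X - noise_Y)
            diff = np.zeros((trials, d))
            diff[:, 0] = 1.0
            noise = rng.normal(0.0, sigma, (trials, d)) - rng.normal(0.0, sigma, (trials, d))
            means.append(np.linalg.norm(diff + noise, axis=1).mean())
        plt.plot(list(dims), means, label=f"sigma = {sigma}")

    plt.xlabel("dimension d")
    plt.ylabel("mean dist(X', Y')")
    plt.legend()
    plt.show()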
- Write a k-NN learner with an interchangeable metric function. (E.g., in an object-oriented language, you can use overloading and inheritance to get this effect, while in Matlab, you can use function handles -- see help function_handle and help feval.) Pick three data sets, of different dimension d, from the UCI Machine Learning Repository and apply your learner with at least three different metrics to these data sets. Can you detect any trends in terms of d, k, or the choice of metric? (A minimal sketch of the interchangeable-metric idea appears after this item.)
Extra credit: Use a KD-tree data structure to accelerate finding nearest neighbors.
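One way to get the interchangeable-metric effect, sketched in Python (the function names and brute-force search are illustrative; a KD-tree would replace the linear scan for the extra credit):

    import numpy as np
    from collections import Counter

    def euclidean(a, b):
        return np.linalg.norm(a - b)

    def manhattan(a, b):
        return np.abs(a - b).sum()

    def chebyshev(a, b):
        return np.abs(a - b).max()

    def knn_predict(X_train, y_train, x, k, metric=euclidean):
        """Majority vote among the k nearest training points under the
        supplied metric -- any callable taking two vectors works."""
        dists = [metric(xi, x) for xi in X_train]
        nearest = np.argsort(dists)[:k]
        return Counter(y_train[i] for i in nearest).most_common(1)[0][0]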
- Extend your decision tree learner from HW1 to handle weighted data.
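One common approach (an assumption here, not the only valid design) is to replace example counts with weight sums wherever the tree computes class proportions; a Python sketch of the weighted entropy that information gain would then use:

    import numpy as np

    def weighted_entropy(labels, weights):
        """Entropy with counts replaced by weight sums:
        p_c = (total weight of class c) / (total weight)."""
        total = weights.sum()
        ent = 0.0
        for c in np.unique(labels):
            p = weights[labels == c].sum() / total
            if p > 0:
                ent -= p * np.log2(p)
        return ent

With this change, an unweighted tree is just the special case of uniform weights.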
- Write an AdaBoost booster for an arbitrary learner. Apply it to (at least) your decision tree learner and your k-NN learner. Use both to learn the synthetic data sets from HW1 and TBA. Examine the behavior of your booster as a function of your smoothness parameters (λ and k, respectively) and the boosting depth parameter (T_max). Do you detect any trends or consistent patterns?
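A minimal sketch of the standard binary AdaBoost loop (labels assumed in {-1, +1}; the learner(X, y, w) interface is illustrative and matches the weighted-data extension above):

    import numpy as np

    def adaboost(learner, X, y, T_max):
        """learner(X, y, w) must return a hypothesis h with h(X) in {-1, +1}."""
        n = len(y)
        w = np.full(n, 1.0 / n)             # start with uniform weights
        hyps, alphas = [], []
        for _ in range(T_max):
            h = learner(X, y, w)
            pred = h(X)
            eps = w[pred != y].sum()        # weighted training error
            if eps == 0.0:                  # base learner is already perfect
                hyps, alphas = [h], [1.0]
                break
            if eps >= 0.5:                  # no better than chance; stop boosting
                break
            alpha = 0.5 * np.log((1 - eps) / eps)
            w *= np.exp(-alpha * y * pred)  # up-weight mistakes, down-weight hits
            w /= w.sum()
            hyps.append(h)
            alphas.append(alpha)
        return lambda Xq: np.sign(sum(a * h(Xq) for a, h in zip(alphas, hyps)))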
- Given d-vectors X and Y, we know that

    d_2(X,Y) = ((X-Y)^T (X-Y))^{1/2}

is a metric. Given a real, square matrix W, show that

    d_W(X,Y) = ((X-Y)^T W (X-Y))^{1/2}

is also a metric for some classes of matrix W, and describe the necessary and sufficient conditions on W for d_W() to remain a metric.
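For reference, the four standard properties that d_W() must satisfy to be a metric (stated here in LaTeX):

    \begin{align*}
      d_W(X,Y) &\ge 0                    && \text{(non-negativity)} \\
      d_W(X,Y) &= 0 \iff X = Y           && \text{(identity of indiscernibles)} \\
      d_W(X,Y) &= d_W(Y,X)               && \text{(symmetry)} \\
      d_W(X,Z) &\le d_W(X,Y) + d_W(Y,Z)  && \text{(triangle inequality)}
    \end{align*}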
- There is an intuition that, in high-dimensional spaces, the volume of a hypercube is concentrated in its "corners." Quantify this by deriving the ratio of the volume of a hypersphere of radius r in d-dimensional Euclidean space to the volume of its circumscribed hypercube (that is, the hypercube of edge length 2r, whose faces are all tangent to the hypersphere). What is the limit of this ratio as d → ∞? What does this imply about the difference between using d_2() (Euclidean distance) versus d_∞() (max-norm distance) in k-nearest neighbors?
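A quick numerical sanity check for your derivation, using the standard closed form V_d(r) = π^{d/2} r^d / Γ(d/2 + 1) for the volume of a d-ball (SciPy assumed available; computed in log space to avoid overflow at large d):

    import numpy as np
    from scipy.special import gammaln

    def sphere_to_cube_ratio(d, r=1.0):
        """Volume of a d-ball of radius r over the volume of its
        circumscribed hypercube of edge length 2r."""
        log_ball = (d / 2) * np.log(np.pi) + d * np.log(r) - gammaln(d / 2 + 1)
        log_cube = d * np.log(2 * r)
        return np.exp(log_ball - log_cube)

    for d in (1, 2, 3, 10, 100, 1000):
        print(d, sphere_to_cube_ratio(d))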
- How many vertices does a sphere in d_1() space have? What about a sphere in d_∞() space?
