CS 531/ECE 517 Computer HW 3

Due (Thursday) April 12

Consider the MNIST database of handwritten digits. Both the training and test datasets contain a LARGE number of samples, so you may want to develop your classifier on a small subset of each until you are sure it works. If you do use a subset for the assignment, be sure to report exactly what you used. Note also that each sample is a 28x28-pixel image of a handwritten digit, and each image is a single point in your feature space. How many dimensions does the feature space have?

1) Use the Nearest neighbor rule to compute the classification error for your test patterns.
      a) What is the error rate?
      b) What was the run-time for this classifier?
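A minimal sketch of the nearest-neighbor rule, assuming each image has already been flattened into a list of numbers. The helper names (`squared_distance`, `nearest_neighbor_predict`, `error_rate`) and the 2-D toy data are hypothetical stand-ins; run-time is measured with `time.perf_counter` around the full test loop.

```python
import time

def squared_distance(a, b):
    """Squared Euclidean distance; skipping the sqrt does not change which sample is nearest."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_neighbor_predict(train_x, train_y, x):
    """1-NN rule: return the label of the single closest training sample."""
    best = min(range(len(train_x)), key=lambda i: squared_distance(train_x[i], x))
    return train_y[best]

def error_rate(predict, test_x, test_y):
    """Fraction of test samples whose predicted label differs from the true one."""
    wrong = sum(1 for x, y in zip(test_x, test_y) if predict(x) != y)
    return wrong / len(test_y)

# Hypothetical 2-D toy data standing in for flattened MNIST images.
train_x = [[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]]
train_y = [0, 0, 1, 1]
test_x = [[0.05, 0.1], [1.05, 0.95], [0.2, 0.1]]
test_y = [0, 1, 1]

start = time.perf_counter()
err = error_rate(lambda x: nearest_neighbor_predict(train_x, train_y, x), test_x, test_y)
elapsed = time.perf_counter() - start
print(f"error rate = {err:.3f}, run-time = {elapsed:.6f} s")
```

On this toy data the third test point is labeled 1 but lies among the class-0 samples, so the error rate comes out to 1/3. A pure-Python loop like this is slow on the full MNIST set; it is meant to show the rule, not to be fast.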


2) Use a Parzen window (Gaussian phi) to compute the classification error for your test patterns.
      a) Identify the ideal radius. What is the error rate for each radius?
      b) Compare these results to nearest neighbor.
      c) What is the run-time for this classifier?
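One way to turn a Gaussian Parzen window into a classifier, sketched under the assumption that every class uses the same radius h: sum the kernel exp(-||x - x_i||^2 / (2h^2)) over each class's training samples and pick the class with the largest sum. That sum is proportional to P(class) * p(x | class), so the normalizing constant can be dropped. The function names and candidate radii below are hypothetical; scores are kept in log space because raw 0-255 pixel distances would otherwise make every kernel underflow to zero.

```python
import math

def squared_distance(a, b):
    """Squared Euclidean distance between two flattened feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def parzen_predict(train_x, train_y, x, h):
    """Return the class whose summed Gaussian kernel weight at x is largest."""
    log_kernels = {}
    for xi, yi in zip(train_x, train_y):
        # Log of the (unnormalized) Gaussian kernel centered on training sample xi.
        log_kernels.setdefault(yi, []).append(-squared_distance(xi, x) / (2.0 * h * h))
    return max(log_kernels, key=lambda c: log_sum_exp(log_kernels[c]))

def error_rate(predict, test_x, test_y):
    """Fraction of test samples whose predicted label differs from the true one."""
    wrong = sum(1 for x, y in zip(test_x, test_y) if predict(x) != y)
    return wrong / len(test_y)

# Hypothetical 2-D toy data standing in for flattened MNIST images.
train_x = [[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]]
train_y = [0, 0, 1, 1]
test_x = [[0.05, 0.1], [1.05, 0.95], [0.2, 0.1]]
test_y = [0, 1, 1]

# Sweep candidate radii (hypothetical values; choose a range that fits your pixel scale).
results = {}
for h in [0.05, 0.1, 0.5, 1.0]:
    results[h] = error_rate(lambda x: parzen_predict(train_x, train_y, x, h), test_x, test_y)
    print(f"h = {h}: error rate = {results[h]:.3f}")
```

Reporting the error rate for every radius in the sweep, not just the best one, also gives the comparison with nearest neighbor asked for in part b.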


3) Extra Credit 1. Use the K-Nearest Neighbors rule to compute the error for your test patterns. Compare these results to those of parts 1 and 2.
      a) What is the error rate?
      b) What is the run-time?
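A minimal k-nearest-neighbors sketch using a majority vote, assuming flattened images as before. The names and toy data are hypothetical. The tie-breaking rule is a design choice of this sketch: `Counter.most_common` lists equal counts in the order first encountered, and neighbors are fed in nearest-first, so a tied vote goes to the label with the closest representative.

```python
from collections import Counter

def squared_distance(a, b):
    """Squared Euclidean distance between two flattened feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def knn_predict(train_x, train_y, x, k):
    """Majority vote among the k training samples closest to x."""
    order = sorted(range(len(train_x)), key=lambda i: squared_distance(train_x[i], x))
    votes = Counter(train_y[i] for i in order[:k])  # built nearest-first
    return votes.most_common(1)[0][0]

# Hypothetical toy data: the class-1 point at [0.3, 0.3] is an outlier near class 0.
train_x = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.0], [0.3, 0.3], [1.0, 1.0], [0.9, 1.1]]
train_y = [0, 0, 0, 1, 1, 1]

query = [0.32, 0.32]
for k in (1, 3):
    print(f"k = {k}: predicted label {knn_predict(train_x, train_y, query, k)}")
```

With k = 1 the query follows the outlier and gets label 1; with k = 3 the two class-0 neighbors outvote it. Setting k = 1 reproduces the nearest-neighbor rule from part 1.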


4) Extra Credit 2. Come up with a strategy for eliminating samples from the training set while maintaining a decent error rate.
      a) How many samples do you use for each classifier (Nearest, Parzen, K-Nearest)?
      b) What is the error rate for each?
      c) What are the new run-times?
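One well-known strategy is Hart's condensed nearest neighbor, shown here as a minimal sketch; it is only one possible answer, not the required one. It starts with a single stored sample and repeatedly adds any training sample that 1-NN on the current store misclassifies, stopping after a full pass adds nothing. The names and toy data are hypothetical, and seeding the store with index 0 is an arbitrary choice.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two flattened feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nn_label(train_x, train_y, indices, x):
    """1-NN label of x using only the training samples listed in indices."""
    best = min(indices, key=lambda j: squared_distance(train_x[j], x))
    return train_y[best]

def condense(train_x, train_y):
    """Hart's condensed nearest neighbor; returns indices of the retained samples.

    Passes repeat until one adds nothing, so every discarded sample is then
    classified correctly by 1-NN on the retained subset.
    """
    keep = [0]          # arbitrary seed: the first training sample
    kept = {0}          # set copy for fast membership tests
    changed = True
    while changed:
        changed = False
        for i in range(len(train_x)):
            if i in kept:
                continue
            if nn_label(train_x, train_y, keep, train_x[i]) != train_y[i]:
                keep.append(i)
                kept.add(i)
                changed = True
    return keep

# Hypothetical toy data: two well-separated clusters.
train_x = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.0], [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]]
train_y = [0, 0, 0, 1, 1, 1]
kept_indices = condense(train_x, train_y)
print(f"kept {len(kept_indices)} of {len(train_x)} samples: {kept_indices}")
```

The retained subset can then be passed to the nearest-neighbor, Parzen, and k-NN classifiers in place of the full training set. Note that condensing is tuned for 1-NN, so the error rate for the other two classifiers may change more.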