Projects

Introduction to Machine Learning

Course Project Information

This course will include some small mid-term class projects involving reading from the current literature and implementing some standard algorithms. The primary project effort, however, will be focused on a final project of substantial content, culminating in a written report and a public presentation. As this is a dual-level course, my expectations for the final project will depend on the student's status, but the hope is that in any case the project will stretch imaginations without rupturing anything important.

The specific topic of the project is open to the student, modulo my guidelines for content and substance and some feedback on feasibility. I hope that students will take this opportunity to explore topics that interest them, and will introduce us all to new domains and ideas. While I'm open to a variety of subjects, here are some suggestions for possible topics. A number of these relate to my own research areas, so I'll be happy to discuss them at greater length beyond the course if there is interest in pursuing them over the longer term.

Undergraduate Credit

For undergraduates, the purpose of the final project is to explore existing topics more deeply than is possible in a single course. In general, two classes of projects are available here: implementation and topic survey.

Implementation Projects
The student will develop a learning program of reasonable scope and characterize its performance on a variety of real benchmark or plausible simulated data sets or environments. Possible topics include, but are not limited to:
  • Reinforcement learning agent in a simulated world (e.g., simulated mazes, simulated ecologies, or, for the very inspired, a 3-D game such as Quake III).
  • Learning to play deterministic board games such as checkers, N-dimensional tic-tac-toe, or Connect Four.
  • Learning to play non-deterministic games such as poker, craps, hearts, or bridge.
  • Bag of words/LSA analysis of web pages or other documents.
  • Linear and nonlinear (e.g., neural net) prediction of stock market trends.
  • Performance comparison of two or more algorithms (e.g., Support Vector Machine vs. Naive Bayes vs. decision tree classification of standard data).
  • Examination of financial data with time series models (linear predictors, Markov chains, HMMs, etc.).

Topic Survey Projects
The student will read a collection of current papers on a topic of interest and report on the current state of the art. The goal is either to branch out to a topic not covered in class or to examine a class topic in more depth, and then to present some aspect of that topic to the class. Some possibilities here include:

Graduate Credit

I expect those seeking graduate credit in this course to take a step beyond the undergraduate project level and introduce some element of novelty into their project. I've listed below a number of projects that I think would be interesting to explore. Some of these are fairly open ended, but rest assured: I'm not expecting a full dissertation in one semester! ;-) What I'm looking for is a dedicated effort and a demonstration of innovative thinking. Above all, I hope that students will become engaged and excited about some problem and will make an effort to take it a step further.

Topic Survey Projects
The student will survey two or more subject areas in an attempt to synthesize different ideas into a unified framework, or at least to understand the relationship between the areas better and point the way toward a common way of thinking about them. The areas can both come from within machine learning, or one can be an application domain and the other a branch of ML.
  • Relations between statistical mechanics and stochastic planning.
  • Relations between genetic algorithms and MDP sampling techniques.

Empirical Projects
The student will do one of the following:
  1. Compare and analyze the performance of two or more different learning algorithms that have not previously been directly compared empirically, using the same test data sets for each. The goal is to understand some properties, strengths, and weaknesses of the algorithms.
  2. Develop an extension to an existing algorithm or data structure.
  3. Apply or adapt an existing algorithm to a novel class of data (some domain problem that hasn't been widely discussed in the literature).
  4. Develop a novel learning method (harder than it might appear).
Topics here are very much subject to student interest and motivation. I hope that students will bring their own experiences in other fields into the class and motivate interesting applications of ML techniques. A few possibilities in this area might be:
  • Learning to play, analyze, or compose music.
  • Data formats for exchanging reinforcement learning and stochastic planning domains (MDP/POMDP data structures).
  • Learning to improve search strategies in optimization problems.
  • Feature subset selection or kernel methods for reinforcement learning.
  • Application of probabilistic rule analysis methods to biological domains.

Theoretical Projects
For the mathematically inclined, there are a number of open questions of varying difficulty in basic or theoretical machine learning. These projects are probably harder to resolve in a single semester than empirical projects are, so a thorough analysis and formulation of the problem would suffice for a semester project. If you're interested in a project in this category, please talk to me directly.