August 22, 2011
Data Analysis and Models for Complex Systems (CSCI 7000)
This semester, I am again teaching my topics course on data analysis and models for complex systems. The course content is a bit of a mash-up, intended to give students a crash course in probability theory, complex networks, model estimation, comparison and testing, random walks, a little on stochastic processes, simulation techniques, and lots of data. I'm continuing to require students both to give a technical lecture on a topic related to inference and models and to do a small independent project.
One goal is that, by the end of the course, students will be able to take a new data set, identify its interesting structure, and then develop, simulate and test probabilistic models of that structure.
For those of you interested in following along, here's a list of lecture topics, which I'll update with links to lecture notes as the lectures go by.
Computers are fundamentally changing how science is done by allowing us to record enormous amounts of data on social, biological, technological and physical systems. This data glut should allow us both to address old questions about complex systems in new ways and to answer fundamentally new kinds of questions. But our ability to do this relies on the development of new algorithms to automatically extract information about patterns in these huge data sets and to reliably test complex scientific hypotheses. Examples include understanding the evolution of the Internet (at various levels), social behavior on the Web (Facebook, Wikipedia, Twitter, etc.), the role of networks in structuring complex social behaviors and biological functions, the emergence of population-scale patterns from seemingly random individual-level behavior, and forecasting rare but important events.
This graduate-level topics course will cover a selection of recent developments in computational approaches to doing science with complex systems. It is not a scientific computing course. Topics will include statistical inference, the structure of complex networks, macro-phenomena in biological evolution and in wars and terrorism, simple mathematical models, and simulation techniques for more complicated models. The focus will be on using computational tools (algorithms) to do science (work with data; test hypotheses; build understanding; make predictions).
Lecture 0: Overview, probability distributions (lecture notes)
Lecture 1: Poisson process, exponential distribution, likelihood functions (lecture notes)
Lecture 2: Power-law distributions (mathematics and properties) (lecture notes)
Lecture 3: Power-law distributions (empirical data and mechanisms) (lecture notes)
Lecture 4: Testing models (hypothesis tests and model plausibility) (lecture notes)
Lecture 5: Comparing models (marginalization, likelihood ratios, etc.) (lecture notes)
Lecture 6: Time series analysis, random walks (lecture notes)
Lecture 7: More time series analysis, random walks (lecture notes)
Lecture 8: Random walk models of macroevolution (lecture notes)
Lecture 9: More macroevolution (lecture notes)
Lecture 10: Probabilistic models of terrorism and wars
Lecture 11: More terrorism and wars
Lecture 12: Introduction to complex networks (lecture notes)
Lecture 13: More networks (lecture notes)
Lecture 14: Random graph models (lecture notes)
Lecture 15: Citation networks (lecture notes)
Lecture 16: Modular and hierarchical network structures (lecture notes)
Lecture 17: More modules and hierarchies (lecture notes)
Lecture 18: (student lecture) The bootstrap
Lecture 19: (student lecture) Duplication-mutation models
Lecture 20: (student lecture) Privacy and de-anonymizing human data
Lecture 21: (student lecture) Lévy flights
Lecture 22: (student lecture) Regression models
Lecture 23: (student lecture) Metabolic networks
Lecture 24: (student lecture) Gaussian mixture models
Lecture 25: (student lecture) People are not? particles
Lecture 26+: Project presentations
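
To give a flavor of the "develop, simulate and test" workflow mentioned above, here is a minimal sketch in Python. It is not taken from the course materials: the function names, the choice of an exponential model (the distribution from Lecture 1), and the use of a Kolmogorov-Smirnov distance with a parametric bootstrap are all assumptions made for illustration. It fits the rate by maximum likelihood, then simulates synthetic data from the fitted model to ask how often the model fits its own data as badly as it fits the observed data.

```python
import math
import random


def fit_exponential(data):
    """Maximum-likelihood estimate of the exponential rate: n / sum(x)."""
    return len(data) / sum(data)


def ks_distance(data, rate):
    """Kolmogorov-Smirnov distance between the data and an exponential CDF."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        cdf = 1.0 - math.exp(-rate * x)
        # Compare the model CDF with the empirical CDF just below and at x.
        d = max(d, abs(cdf - i / n), abs((i + 1) / n - cdf))
    return d


def bootstrap_p_value(data, n_sims=200, seed=0):
    """Parametric bootstrap: fraction of synthetic data sets that fit worse."""
    rng = random.Random(seed)
    rate = fit_exponential(data)
    observed = ks_distance(data, rate)
    exceed = 0
    for _ in range(n_sims):
        synthetic = [rng.expovariate(rate) for _ in data]
        # Refit on each synthetic sample so estimation error is accounted for.
        if ks_distance(synthetic, fit_exponential(synthetic)) >= observed:
            exceed += 1
    return rate, observed, exceed / n_sims


if __name__ == "__main__":
    # Hypothetical "observed" data: 500 draws from an exponential with rate 2.
    rng = random.Random(42)
    sample = [rng.expovariate(2.0) for _ in range(500)]
    rate, d, p = bootstrap_p_value(sample)
    print(f"rate={rate:.3f}  KS={d:.3f}  p={p:.2f}")
```

A small p-value here would suggest the exponential model is a poor description of the data; a large one only says the model cannot be ruled out on this evidence.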