November 22, 2013
Network Analysis and Modeling (CSCI 5352)
This semester I developed and taught a new graduate-level course on network science, titled Network Analysis and Modeling (listed as CSCI 5352 here at Colorado). As usual, I lectured from my own set of notes, which total more than 150 pages of material. The class was aimed at graduate students, so the pace was relatively fast and I assumed they could write code, understood basic college-level math, and knew basic graph algorithms. I did not assume they had any familiarity with networks.
To some degree, the course followed Mark Newman's excellent textbook Networks: An Introduction, but I took a more data-centric perspective and covered a number of additional topics. A complete listing of the lecture notes are below, with links to the PDFs.
I also developed six problem sets, which, happily, the students tell me where challenging. Many of the problems were also drawn from Newman's textbook, although often with tweaks to make the more suitable for this particular class. It was a fun class to teach, and overall I'm happy with how it went. The students were enthusiastic throughout the semester and engaged well with the material. I'm looking forward to teaching it again next year, and using that time to work out some of the kinks and add several important topics I didn't cover this time.
Lecture 1,2 : An introduction and overview, including representing network data, terminology, types of networks (pdf)
Lecture 3 : Structurally "important" vertices, degree-based centralities, including several flavors of eigenvector centrality (pdf)
Lecture 4 : Geometric centralities, including closeness, harmonic and betweenness centrality (pdf)
Lecture 5 : Assortative mixing of vertex attributes, transitivity and clustering coefficients, and reciprocity (pdf)
Lecture 6 : Degree distributions, and power-law distributions in particular (pdf)
Lecture 7 : Degree distributions, and fitting models, like the power law, to data via maximum likelihood (pdf)
Lecture 8 : How social networks differ from biological and technological networks, small worlds and short paths (pdf)
Lecture 9 : Navigability in networks (discoverable short paths) and link-length distributions (pdf)
Lecture 10 : Probabilistic models of networks and the Erdos-Renyi random graph model in particular (pdf)
Lecture 11 : Random graphs with arbitrary degree distributions (the configuration model), their properties and construction (pdf)
Lecture 12 : Configuration model properties and its use as a null model in network analysis (pdf)
Lecture 13 : The preferential attachment mechanism in growing networks and Price's model of citation networks (pdf)
Lecture 14 : Vertex copying models (e.g., for biological networks) and their relation to Price's model (pdf)
Lecture 15 : Large-scale patterns in networks, community structure, and modularity maximization to find them (pdf)
Lecture 16 : Generative models of networks and the stochastic block model in particular (pdf)
Lecture 17 : Network inference and fitting the stochastic block model to data (pdf)
Lecture 18 : Using Markov chain Monte Carlo (MCMC) in network inference, and sampling models versus optimizing them (pdf)
Lecture 19 : Hierarchical block models, generating structurally similar networks, and predicting missing links (pdf)
Lecture 20 : Adding time to network analysis, illustrated via a dynamic proximity network (pdf)
Problem set 1 : Checking the basics and a few centrality calculations (pdf)
Problem set 2 : Assortativity, degree distributions, more centralities (pdf)
Problem set 3 : Properties and variations of random graphs, and a problem with thresholds (pdf)
Problem set 4 : Using the configuration model, and edge-swapping for randomization (pdf)
Problem set 5 : Growing networks through preferential or uniform attachment, and maximizing modularity (pdf)
Problem set 6 : Free and constrained inference with the stochastic block model (pdf)
January 10, 2013
Design and Analysis of Algorithms (CSCI 5454)
Returning from paternity leave last semester, it is back into the fire this semester with my large graduate-level algorithms class, Design and Analysis of Algorithms (CSCI 5454). This year, I am again shocked to find that it is the largest grad class in my department, with 60 enrolled students and 30 students on the waiting list. In the past, enrollment churn at the beginning of the semester  has been high enough that everyone on the wait list has had an opportunity to take the class. I am not sure that will be true this year. This year will also be the last time I teach it, for a while.
My version of this course covers much of the standard material in algorithms, but with a few twists. Having tinkered considerably with the material a year ago, I'm going to tinker a little less this year. That being said, improving is a never-ending process and there are a number of little things on the problem sets and lectures that I want to try differently this time. The overall goals remain the same: to show students how different strategies for algorithm design work (and sometimes don't work), to get them thinking about boundary cases and rigorous thinking, to get them to think carefully about whether an algorithm is correct or not, and to introduce them to several advanced topics and algorithms they might encounter out in industry.
For those of you interested in following along, I will again be posting my lecture notes on the class website.
 Caused in part by some students enrolling thinking the class would be easy, and then dropping it when they realize that it requires real effort. To be honest, I am okay with this, although it does make for scary enrollment numbers on the first day.
August 22, 2011
Data Analysis and Models for Complex Systems (CSCI 7000)
This semester, I am again teaching my topics course on data analysis and models for complex systems. The course content is a bit of a mash-up, intended to give students a crash course in probability theory, complex networks, model estimation, comparison and testing, random walks, a little stochastic processes, simulation techniques and lots of data. I'm continuing to require students to both give a technical lecture on a topic related to inference and models, and do a small independent project.
One goal is that, by the end of course, students will be able to take a new data set, identify its interesting structure, and then develop, simulate and test probabilistic models of that structure.
For those of you interested in following along, here's a list of lecture topics, which I'll update with links to lecture notes as they pass by.
Computers are fundamentally changing how science is done by allowing us to record enormous amounts of data on social, biological, technological and physical systems. This data glut should allow us both to address old questions about complex systems in new ways and to answer fundamentally new kinds of questions. But, our ability to do this relies on the development of new algorithms to automatically extract information about patterns in these huge data sets and to reliably test complex scientific hypotheses. Examples include understanding the evolution of the Internet (at various levels), social behavior on the Web (Facebook, Wikipedia, Twitter, etc.), the role of networks in structuring complex social behaviors and biological functions, the emergence of population-scale patterns from seemingly random individual-level behavior, and forecasting rare but important events in the future.
This graduate-level topics course will cover a selection of recent developments in computational approaches to doing science with complex systems. It is not a scientific computing course. Topics will include statistical inference, the structure of complex networks, macro-phenomena in biological evolution and in wars and terrorism, simple mathematical models, and simulation techniques for more complicated models. The focus will be on using computational tools (algorithms) to do science (work with data; test hypotheses; build understanding; make predictions).
Lecture 0: Overview, probability distributions (lecture notes)
Lecture 1: Poisson process, exponential distribution, likelihood functions (lecture notes)
Lecture 2: Power-law distributions (mathematics and properties) (lecture notes)
Lecture 3: Power-law distributions (empirical data and mechanisms) (lecture notes)
Lecture 4: Testing models (hypothesis tests and model plausibility) (lecture notes)
Lecture 5: Comparing models (marginalization, likelihood ratios, etc.) (lecture notes)
Lecture 6: Time series analysis, random walks (lecture notes)
Lecture 7: More time series analysis, random walks (lecture notes)
Lecture 8: Random walk models of macroevolution (lecture notes)
Lecture 9: More macroevolution (lecture notes)
Lecture 10: Probabilistic models of terrorism and wars
Lecture 11: More terrorism and wars
Lecture 12: Introduction to complex networks (lecture notes)
Lecture 13: More networks (lecture notes)
Lecture 14: Random graph models (lecture notes)
Lecture 15: Citation networks (lecture notes)
Lecture 16: Modular and hierarchical network structures (lecture notes)
Lecture 17: More modules and hierarchies (lecture notes)
Lecture 18: (student lecture) The bootstrap
Lecture 19: (student lecture) Duplication-mutation models
Lecture 20: (student lecture) Privacy and de-anonymizing human data
Lecture 21: (student lecture) Levy flights
Lecture 22: (student lecture) Regression models
Lecture 23: (student lecture) Metabolic networks
Lecture 24: (student lecture) Gaussian mixture models
Lecture 25: (student lecture) People are not? particles
Lecture 26+: Project presentations
April 26, 2011
Why students don't learn what we think we teach
It's that time of year when professors reflect on what their students have learned in the past semester. While mulling over my own class on the design and analysis of algorithms, by chance, I was pointed at this excellent lecture by Robert Duke, a professor of music and human learning at the University of Texas Austin, on "Why students don't learn what we think we teach" (hosted at Cornell University Videos).
Duke is captivating, and he makes a clear argument that students don't learn what we think we teach because they're too busy learning what we're actually teaching, which is, often, that precision is more important than understanding and that grades matter. The solution, he argues, is to teach, over and over, the things that we actually want our students to remember after the semester is over. And, that we should not defer learning about "The Good Stuff" until after they've suffered through boring prerequisites. Instead, we should teach the good stuff first and teach what we really enjoy.
This seems like pretty good advice, and I hope to be able to improve my courses by striving to follow it. The difficulty, of course, is trying to figure out what exactly are the most important ideas we want our students to remember after the course is over. Duke argues that many professors don't themselves have a clear idea of these things, which of course makes it difficult to teach in a way that emphasizes them. For quantitative fields like computer science, I think we sometimes convince ourselves that detailed problem sets are necessary to teach the quantitative (mathematics or programming) skills the students will need for more advanced work. But how much of what we really care about depends on those? For students destined for the computer industry, or for non-computer industries, what do we as computer-science teachers really want them to remember, even if they never write another line of code in their lives? Food for thought, for sure.
Tip to Larry Hunter.
Update 27 April 2011: While I'm on the subject , here's another fascinating video on teaching, this one a TED talk by Salman Kahn, the former hedge fund manager who started the Khan Academy by first doing short YouTube videos of simple math lessons and then expanded it to a fairly comprehensive-looking system for learn-at-your-own-pace education. Given how little time (and money) university professors and school teachers are provided to tinker with such novel course structures, it doesn't surprise me one bit that it was a hedge fund manager who developed this idea. I would very much like to introduce some of this stuff in my university courses, but the prospect of developing the videos from scratch is more than a little daunting!
 For long-time readers, you've probably noticed that I've only posted videos and comics for the past two months or so. This is because I'm a new professor and new professors as a rule, true to the warnings I was given a year ago, struggle to find time for anything other developing their courses and learning how to teach. I don't find it ironic that I was hired as a professor on the merits of my ability to do things that my job now systematically prevents me from doing. Instead, I find it maddening. Putting that aside, last week, I was mulling over writing something about the recent pieces in Nature on the brokenness of the PhD system (here and here; to be honest, the articles are not as insightful as their titles suggest), which is, I think, intimately related to the ideas posed above about teaching.
That is, universities are not really in the business of providing job-training for the private sector, but increasingly that is what both the private sector and the students want universities to do. On top of that, university professors generally have very little experience outside the academy and we're hired mainly on the merits of our ability to conduct and communicate novel research to our peers, who are mainly other university professors and program managers at federal agencies. It's a strange situation, frustrating on all sides, perhaps. The strangeness is, I think, driven by pretty deep societal structures related to the priorities of the people with money (which is basically students and the federal government) and to the way employers select among applicants (academic degrees are signals of legitimacy, competency and prestige, which are easier to sort on than actually taking the time to evaluate a person individually). These things are so deep they're not going to change much anytime soon, and from my brief experience so far, seem likely to only get more pronounced. The question then, is into what shape will that pressure push universities, over the long run?
February 05, 2011
Design and Analysis of Algorithms (CSCI 5454)
This semester, I'm teaching a new course (for me) on the design, analysis and evaluation of computer algorithms. This is a graduate-level course and covers a lot of standard material in algorithms. My hope, however, is to also show the students how different strategies for algorithm design work (and sometimes don't work) and to introduce them to a number of more advanced topics and algorithms they might encounter out in industry.
For those of you interested in following along, here's a list of my lecture notes and the topics I've covered. I'll update this entry with the rest of the lectures, once they're up on the course webpage.
CSCI 5454 Design and Analysis of Algorithms
Algorithms are the heart of computer science, and their essential nature is to automate some aspect of the collecting, organizing and processing of information. Today, information of all kinds is increasingly available in enormous quantities. However, our ability to make sense of all this information, to manage, organize and search it, and to use it for practical purposes, e.g., self-driving cars, adaptive computation, search algorithms for the Internet or for social networks, artificial intelligence, and many scientific applications, relies on the design of efficient algorithms, that is, algorithms that are fast, use little memory and provide guarantees on their performance.
This graduate-level course will cover a selection of topics related to algorithm design and analysis. Topics will include divide and conquer algorithms, greedy algorithms, graph algorithms, algorithms for social networks, computational biology, optimization algorithms, randomized data structures and their analysis. We will not cover any of these topics exhaustively. Rather, the focus will be on algorithmic thinking, efficient solutions to practical problems and understanding algorithm performance. Advanced topics will cover a selection of modern algorithms, many of which come from real-world applications.
Lecture 0: Why algorithms, and an example
Lecture 1: The divide and conquer strategy, quicksort and recurrence relations
Lecture 2: Randomized quicksort and probabilistic analysis
Lecture 3: Randomized data structures and skip lists
Lecture 4: Randomized data structures, hash tables, the Birthday Paradox and the Coupon Collector problem
Lecture 5: The greedy strategy, linear storage media and Huffman codes
Lecture 6: Graphs and networks, graph algorithms, search-trees
Lecture 7: Single-source shortest path algorithms and Google Maps
Lecture 8: Minimum spanning trees
Lecture 9: Graph clustering and modularity maximization
Lecture 10: Maximum flows and minimum cuts
Lecture 11: Phylogenetic trees and parsimony
Lecture 12: Averaging trees
Lecture 13: Optimization, simulated annealing
Lecture 14: Stable matchings (student lecture)
Lecture 15: Disjoint sets and union find (student lecture)
Lecture 16: Sequence alignment (student lecture)
Lecture 17: Byzantine agreement (student lecture)
Lecture 18: Linear programming and the simplex algorithm (student lecture)
Lecture 19: Expectation-maximization (student lecture)
Lecture 20: Cake cutting and fair division (student lecture)
Lecture 21: Approximation algorithms (student lecture)
Lecture 22: Link prediction in social networks (student lecture)
Lecture 23: Sudoku puzzles and backtracking algorithms (student lecture)
September 24, 2010
Learning how to teach
This semester has turned out to be a lot busier than I expected. Teaching, omg, takes a lot of time. At least, I've been putting a lot of time into it, developing the lectures, writing out my notes in a coherent format, developing the problem sets, refreshing my own understanding of technical topics, translating research-y material into a more instructional level, grading problem sets, meeting with students, etc. etc. etc.
This is, after all, what I signed up for as a professor. I'm also happy to say that I'm enjoying it a great deal. The students are engaging, ask interesting questions, and have already taught me new things. I do wish I had more time for research, but I suppose that almost goes without saying. One topic I expect to continue to struggle with is deciding how much material to cover and how deeply to go into it. Striking a good balance, making assignments and lectures challenging and interesting but not unreasonable or trivial, seems like an art. A good friend of mine warned me months ago that I should not underestimate how enormous a burden it would be to be completely and wholly responsible for an entire semester's material. I don't think I did underestimate it, but for sure I didn't understand, viscerally, what he meant until now.
For those of you interested in the course, here's a current list of my lecture notes and the topics I've covered. I'll update this entry with the rest of the lectures, once they're up on the course webpage.
CSCI 7000-003 Inference, Models and Simulation for Complex Systems
Lecture 1: The Poisson process, the exponential distribution and a brief introduction to maximum likelihood
Lecture 2: Mathematics of power-law distributions and power-law tails
Lecture 3: Power-law distributions in empirical data
Lecture 4: Model plausibility, hypothesis tests, model comparison and a fallacy
Lecture 5: Models of time series and simulations of random walks
Lecture 6: Random walks in empirical data
Lecture 7: Structural measures of networks: degrees, reciprocity, transitivity, similarity
Lecture 8: Structural measures of networks: distances, diameters, centrality, homophily
Lecture 9: Random graph models, degree distributions, giant components
Lecture 10: Preferential attachment, citation networks
Lecture 11: Large-scale structure, modules, communities and three ill omens
Lecture 12: Hierarchical structure, predicting missing links
Lecture 13: Macroevolution, deep time, and the evolution of species body sizes
Lecture 14: Macroevolution of whales, morphological disparity
August 22, 2010
Fall 2010, CSCI 7000-003
This Fall, as is customary for new faculty in my department, I'm teaching a section of Current Topics in Computer Science. This series is basically the CS department's vehicle for advanced topics at the graduate level. I'll be posting my lecture notes, problem sets and class readings on the class webpage, if anyone here is interested in that material. Here's a short ad for the course:
CSCI 7000-003 Inference, Models and Simulation for Complex Systems
First meeting is in MUEN E114 on Tuesday, August 24, 11:00am.
This graduate-level topics course will cover a selection of recent developments in computational approaches to doing science with complex systems. It is not a scientific computing course. Topics will include statistical inference, the structure of complex networks, macro-phenomena in biological evolution and in wars and terrorism, simple mathematical models, and simulation techniques for more complicated models. The focus will be on using computational tools (algorithms) to do science (work with data; test hypotheses; build understanding; make predictions). The first half of the class will be driven by lectures and problem sets. The second half will revolve around student-run lectures and an independent research project.
I should add that this will be a rather biased (and in no way even remotely complete) selection of "recent developments", with the bias going heavily in the direction of stuff I like and find useful. Much of the material will be drawn from my own research, although at least in the beginning I'm going to pare some things down to a more introductory level. The second half of the course will be an adventure, with the student-led lectures on topics of (largely) their choosing.