
October 31, 2006

Future of computing

The New York Times has a short piece covering a recent symposium run by the National Academies on the future of computing. And, naturally enough, the intertwining of social networks and technological networks (e.g., YouTube and MySpace) was a prominent topic. Jon Kleinberg represented our community at the symposium. From the article:

But with the rise of the Internet, social networks and technology networks are becoming inextricably linked, so that behavior in social networks can be tracked on a scale never before possible.

“We’re really witnessing a revolution in measurement,” Dr. Kleinberg said.

The new social-and-technology networks that can be studied include e-mail patterns, buying recommendations on commercial Web sites like Amazon, messages and postings on community sites like MySpace and Facebook, and the diffusion of news, opinions, fads, urban myths, products and services over the Internet. Why do some online communities thrive, while others decline and perish? What forces or characteristics determine success? Can they be captured in a computing algorithm?

Isn't it nice to see your research topics in print in a major newspaper? To me, the two exciting things about this area are the sheer number of interesting questions to study and the potential for their answers to qualitatively improve our lives. (Although, being more of a theoretician than an engineer, I suppose I leave the latter to other people.) For the Web in particular, new ideas like YouTube and del.icio.us have made it possible to study social behavior in ways that simply weren't feasible before. Physicists and computer scientists deserve some credit for recognizing that these sources of data offer a fundamentally new perspective on questions that sociologists have been kicking around for several decades now.

As is true of perhaps any relatively new field, there is still a lot of debate and jostling about how to do good science here. But mostly, that just adds to the excitement. There's a lot of opportunity to say important and interesting things about these systems, and to develop new and useful applications on top of them.

Update Nov 2: In a separate piece, the NYTimes discusses Tim Berners-Lee's efforts to found a "Web science" field that combines aspects of Computer Science with Sociology and Business. Sounds familiar, no? Here's the final thought from the article:

Ben Shneiderman, a professor at the University of Maryland, said Web science was a promising idea. “Computer science is at a turning point, and it has to go beyond algorithms and understand the social dynamics of issues like trust, responsibility, empathy and privacy in this vast networked space,” Professor Shneiderman said. “The technologists and companies that understand those issues will be far more likely to succeed in expanding their markets and enlarging their audiences.”

(Tip to C. Moore)

posted October 31, 2006 05:07 PM in Computer Science | permalink | Comments (0)

October 11, 2006

Hierarchy in networks

After sitting on it for several months, I've finally posted a new paper (actually written more than five months ago!) to the arXiv on the hierarchical decomposition of network structure. I presented it at the 23rd International Conference on Machine Learning (ICML) Workshop on Social Network Analysis in June.

Aaron Clauset, Cristopher Moore, M. E. J. Newman, "Structural Inference of Hierarchies in Networks", to appear in Lecture Notes in Computer Science (Springer-Verlag). physics/0610051

One property of networks that has received comparatively little attention is hierarchy, i.e., the property of having vertices that cluster together in groups, which then join to form groups of groups, and so forth, up through all levels of organization in the network. Here, we give a precise definition of hierarchical structure, give a generic model for generating arbitrary hierarchical structure in a random graph, and describe a statistically principled way to learn the set of hierarchical features that most plausibly explain a particular real-world network. By applying this approach to two example networks, we demonstrate its advantages for the interpretation of network data, the annotation of graphs with edge, vertex and community properties, and the generation of generic null models for further hypothesis testing.
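
(To give a flavor of the model without reproducing the paper: in a hierarchical random graph, a dendrogram over the vertices carries a probability at each internal node, and each pair of vertices is connected independently with the probability attached to their lowest common ancestor. The sketch below, a toy balanced-tree version with made-up probabilities, is my own illustration rather than the code or the fitting procedure used in the paper.)

```python
# A toy hierarchical random graph: a balanced binary dendrogram over
# 2**n_levels leaves, where each internal node carries a connection
# probability, and each pair of vertices is linked with the probability
# of their lowest common ancestor.  The balanced tree and the chosen
# probabilities are illustrative assumptions, not fitted values.
import itertools
import random

def lca_depth(i, j, n_levels):
    """Depth (root = 0) of the lowest common ancestor of leaves i and j."""
    steps = 0
    while i != j:        # climb the balanced tree until the leaves merge
        i //= 2
        j //= 2
        steps += 1
    return n_levels - steps

def sample_hrg(n_levels, p_by_depth, seed=0):
    """Sample an edge list on 2**n_levels vertices from the toy dendrogram."""
    rng = random.Random(seed)
    n = 2 ** n_levels
    edges = []
    for i, j in itertools.combinations(range(n), 2):
        if rng.random() < p_by_depth[lca_depth(i, j, n_levels)]:
            edges.append((i, j))
    return edges

# Deep (specific) ancestors link densely while the root links sparsely:
# groups within groups, i.e., an assortative hierarchy.
edges = sample_hrg(3, p_by_depth={0: 0.05, 1: 0.3, 2: 0.9})
print(len(edges), "edges among 8 vertices:", edges)
```

Reversing the probabilities (dense at the root, sparse near the leaves) would instead give a disassortative hierarchy, which is part of what makes the representation so general.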

posted October 11, 2006 07:44 PM in Self Referential | permalink | Comments (0)

October 07, 2006

The Netflix Prize

The big news in Computer Science this week is the announcement of the Netflix Prize competition, whose goal is a whole new way of generating recommendations. NPR has an interview with James Bennett (of Netflix) about it, and the NYTimes broke the story. The game here is that Netflix provides you with an (anonymized) sample of its database of movie ratings by users. Your job is to come up with a new recommendation algorithm that beats their in-house approach by at least 10%. If you win, they'll pay out a cool million USD.
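
(For concreteness, here's a deliberately naive sketch of what "a recommendation algorithm" and scoring it could look like: predict every rating as that movie's average and measure the error on held-out ratings. The toy data and the use of RMSE as the error measure are my own assumptions; the announcement above doesn't say exactly how Netflix scores the 10% improvement.)

```python
# Naive baseline recommender: predict each movie's mean rating from the
# training data, fall back to the global mean for unseen movies, and
# score by RMSE on held-out ratings.  Toy tuples and the choice of RMSE
# are assumptions for illustration only.
from collections import defaultdict
from math import sqrt

train = [("u1", "m1", 4), ("u2", "m1", 5), ("u1", "m2", 2)]  # (user, movie, rating)
test = [("u2", "m2", 3), ("u3", "m1", 4)]

def movie_means(ratings):
    """Average rating per movie."""
    totals = defaultdict(lambda: [0.0, 0])
    for _, movie, rating in ratings:
        totals[movie][0] += rating
        totals[movie][1] += 1
    return {movie: s / n for movie, (s, n) in totals.items()}

def rmse(predict, ratings):
    """Root-mean-square error of predict(user, movie) on held-out ratings."""
    errors = [(predict(user, movie) - rating) ** 2 for user, movie, rating in ratings]
    return sqrt(sum(errors) / len(errors))

means = movie_means(train)
global_mean = sum(rating for _, _, rating in train) / len(train)

def baseline(user, movie):
    """Ignores the user entirely; a serious entry would have to do much better."""
    return means.get(movie, global_mean)

print("baseline RMSE:", round(rmse(baseline, test), 3))
```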

There are a couple of exciting things about this competition. First, the field of collaborative filtering (CF) has apparently been squeezed dry of fresh ideas -- ideas that Amazon and Netflix have already implemented, tweaked and adjusted to build their respective recommendation systems -- and Netflix wants to use the competition as a way to get the creative juices flowing again. Although I'm certainly not an expert on CF, nor am I aware of the latest techniques in this area, I sure hope that the ML community (which will probably be most interested in this competition) does more than simply try to solve yet another problem with Bayes nets, SVMs and their ilk. For a fairly narrow problem like this, it seems that some careful thinking might produce something significantly more compelling than those techniques. Lance Fortnow and John Langford both seem to feel the same way. John, who works in machine learning, seems to think that the "progress prize" of $50k (what you get for being the best so far without clearing the 10% threshold) is achievable with some careful thought.

The second thing is the data set itself, which is huge: for the collaborative filtering people, it's easily 100x bigger than the previous best public data set. Closer to home for me, the Netflix data can easily be turned into a bipartite network, with people on one side and movies on the other. Bipartite networks are old news themselves, but what's cool about this one is that the data comes as a series of tuples {userID, movieID, date, rating}, so it should be possible to watch how the network has grown over time. The temporal information alone is a big enough addition to the store of complex-network data to produce a slew of new papers simply from applying the old tools as a function of time. But I think we can ask more interesting questions than that. For instance, can our network analysis techniques be extended to explicitly account for the temporal behavior of nodes (velocity, acceleration, etc.)? And what would it mean to do community structure inference on a bipartite network in a meaningful way?
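
(As a minimal sketch of the bipartite-network idea: the snippet below turns rating tuples of the form described above into a growing user-movie graph and takes monthly snapshots. The toy records and the monthly slicing are illustrative choices of mine, not the actual Netflix data format.)

```python
# Turn {userID, movieID, date, rating} tuples into a bipartite user-movie
# graph and watch it grow, taking a snapshot at the end of each month.
# The records and the monthly slicing are stand-ins, not the real data.
from collections import defaultdict

ratings = [
    (1, "m17", "2004-03-02", 4),
    (2, "m17", "2004-06-11", 5),
    (1, "m99", "2005-01-20", 3),
]

def monthly_snapshots(ratings):
    """Yield (month, user -> set of movies) as edges accumulate over time."""
    adjacency = defaultdict(set)
    current_month = None
    for user, movie, date, rating in sorted(ratings, key=lambda r: r[2]):
        month = date[:7]                       # e.g. "2004-03"
        if current_month is not None and month != current_month:
            yield current_month, {u: set(m) for u, m in adjacency.items()}
        adjacency[user].add(movie)             # one edge per (user, movie) pair
        current_month = month
    if current_month is not None:
        yield current_month, dict(adjacency)

for month, graph in monthly_snapshots(ratings):
    n_edges = sum(len(movies) for movies in graph.values())
    print(month, "-", len(graph), "users,", n_edges, "edges")
```

From snapshots like these one could track degree distributions, component sizes, or community structure as a function of time, which is exactly the sort of question the temporal information invites.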

posted October 7, 2006 12:01 AM in Computer Science | permalink | Comments (1)

October 04, 2006

Fighting the dominant paradigm

Sean over at Cosmic Variance has a nice review of Lee Smolin's The Trouble With Physics, which is itself a critique of theoretical physics' focus on string theory as the way to unify gravity with the other forces. Most of the review focuses on Smolin's criticism of string theory's dominance, but Sean points out that Smolin is actually making two arguments: one about string theory, and one about supporting interesting alternative ideas.

Smolin talks a great deal about the need for physics, and academia more generally, to support plucky upstart ideas and scholars with the courage and vision to think big and go against the grain. This is a larger point than the specific argument about how to best quantize gravity, and ultimately far more persuasive; it is likely, unfortunately, to be lost amidst the conflict between string theory and its discontents. Faculty positions and grant money are scarce commodities, and universities and funding agencies are naturally risk-averse. Under the current system, a typical researcher might spend five years in graduate school, three to six as a postdoc, and another six or seven as an assistant professor before getting tenure – with an expectation that they will write several competent papers in every one of those years. Nobody should be surprised that, apart from a few singular geniuses, the people who survive this gauntlet are more likely to be those who show technical competence within a dominant paradigm, rather than those who will take risks and pursue their idiosyncratic visions. The dogged pursuit of string theory through the 1970’s by Green and Schwarz is a perfect example of the ultimate triumph of the latter approach, and Smolin is quite correct to lament the lack of support for this kind of research today.

Although he's talking about theoretical physicists, the same applies just as much to other disciplines (perhaps with shorter postdoc periods) and their relationship to upstart ideas. Of course, finding the right balance between "normal science" and "paradigm-shifting science" is not easy, and there is a big difference between supporting interesting new ideas and supporting crackpots. Sometimes that distinction can be hard to see at first, but all good new ideas ultimately lead to really excellent science. Fortunately, there are places that actively encourage both excellent work and thinking about crazy ideas.

Update Oct. 4: Suresh blogs about Sean's review as well, and also zeroes in on the same passage. He makes some valuable points about how important it is to build your own model of how to do good research. Separately, Dave Bacon blogs about Peter Shor's review of Smolin's book, in which Shor likens Nature to an insurance salesman.

posted October 4, 2006 09:42 AM in Interdisciplinarity | permalink | Comments (0)

October 01, 2006

Visualizing biology

After seeing it blogged on Cosmic Variance and Shtetl-Optimized, I wasn't surprised when my friend Josh Adelman wrote to share it too, with "it" being this wonderful bit of bio-visualization:

A brief popular explanation of the video, which gives substantial background on Harvard's sponsorship of XVIVO, the company that did the work, is here. The video above is apparently a short version of a longer one that will be used in Harvard's undergraduate education; the short version will play at the SIGGRAPH 2006 Electronic Theater.

Anyway, what prompted me to blog about this video is that Josh, being a biophysicist, added some valuable additional insight into what's being shown here and into the new(-ish) trend of popularizing this stuff through beautiful animations.

For instance, that determined-looking "walker" in the video is actually a kinesin molecule walking along a microtubule; the mechanism was originally worked out by the Vale Lab at UCSF, which has its own (more technical) animation of how the walker actually works. Truly, an amazing little protein.

Another of the more visually stunning bits of the film is the self-assembling behavior of the microtubules themselves. This work was done by the Nogales Lab at Berkeley, and they too have some cool animations that explain how microtubules dynamically assemble and disassemble.

DNA replication hardly makes an appearance in the video above, but the Walter & Eliza Hall Institute produced several visually stunning shorts that show how this process works (the sound effects are cheesy, but it's clear the budget was well spent on other things).

posted October 1, 2006 03:43 PM in Scientifically Speaking | permalink | Comments (0)