January 31, 2007
My kingdom for a good null-model
The past few days, I've been digging into the literature on extreme value theory, which is a rather nice branch of probability theory that shows how the distribution of the largest (or smallest) observed value varies. This exercise has been mostly driven by a desire to understand how it connects to my own research on power-law distributions (I'm reluctant to admit that I'm actually working on a lengthy review article on the topic, partially in an attempt to clear up what seems to be substantial confusion over both their significance and how to go about measuring them in real data). But, that's a topic for a future post. What I really want to mention is an excellent example of good statistical reasoning in experimental high energy physics (hep), as related by Prof. John Conway over at CosmicVariance. Conway is working on the CDF experiment (at Fermilab), and his story kicks off with the appropriate quip "Was it real?" The central question Conway faces is whether or not a deviation / fluctuation in his measurements is significant. If it is, then it's evidence for the existence of a particular particle called the Higgs boson - a long sought-after component of the Standard Model of particle physics. If not, then it's back to searching for the Higgs. What I liked most about Conway's post is the way the claims of significance - the bump is real - are carefully vetted against both theoretical expectations of random fluctuations and a desire to not over-hype the potential for discovery.
In the world of networks (and power laws), the question of "Is it real?" is one that I wish were asked more often. When looking at complex network structure, we often want to know whether a pattern or the value of some statistical measure could have been caused by chance. Crucially, though, our ability to answer this question depends on our model of chance itself - this point is identical to the one that Conway faces. For hep experiments, however, the error models are substantially more precise than what we have for complex networks. Historically, network theorists have used either the Erdos-Renyi random graph or the configuration model (see cond-mat/0202208) as the model of chance. Unfortunately, neither of these looks anything like the real world, and thus both probably provide terrible over-estimates of the significance of any particular network pattern. As a modest proposal, I suggest that hierarchical random graphs (HRGs) seem to serve as a more robust null-model, since they can capture a wide range of the heterogeneity that we observe in the real world, e.g., community structure, skewed degree distribution, high clustering coefficient, etc. The real problem, of course, is that a good null-model depends heavily on what kind of question is being asked. In the hep experiment, we know enough about what the results would look like without the Higgs that, if it does exist, then we'd see large (i.e., statistically large) fluctuations at a specific location in the distribution.
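To make the "could this have been caused by chance?" question concrete, here is a minimal sketch of an empirical significance test against a null-model. It assumes the simplest possible model of chance (an Erdos-Renyi style graph with the same number of vertices and edges) and uses the triangle count as the statistic; the function names and the toy graph are hypothetical, chosen only for illustration. Swapping in a more realistic null-model changes only the `random_graph_gnm` step.

```python
import random
from itertools import combinations

def count_triangles(n, edges):
    """Count triangles in an undirected simple graph on vertices 0..n-1.

    Assumes `edges` lists each undirected edge exactly once.
    """
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    total = 0
    for u, v in edges:
        total += len(adj[u] & adj[v])  # each common neighbour closes a triangle
    return total // 3  # every triangle is counted once per edge

def random_graph_gnm(n, m, rng):
    """Draw a graph with n vertices and exactly m edges, uniformly at random."""
    all_pairs = list(combinations(range(n), 2))
    return rng.sample(all_pairs, m)

def empirical_p_value(n, edges, statistic, trials=1000, seed=0):
    """Fraction of null-model graphs whose statistic is >= the observed value."""
    rng = random.Random(seed)
    observed = statistic(n, edges)
    hits = sum(
        statistic(n, random_graph_gnm(n, len(edges), rng)) >= observed
        for _ in range(trials)
    )
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (trials + 1)

# Toy "observed" network: two disjoint 4-cliques on 8 vertices.
two_cliques = list(combinations(range(4), 2)) + list(combinations(range(4, 8), 2))
p = empirical_p_value(8, two_cliques, count_triangles, trials=500)
```

A small `p` here says only that the triangle count is unusual relative to this particular model of chance. Under a null-model that already builds in clustering, the same observation might be entirely unremarkable, which is exactly why the choice of null-model matters so much.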
Looking forward, the general problem of coming up with good null-models of network structure, against which we can reasonably benchmark our measurements and their deviations from our theoretical expectations, is hugely important, and I'm sure it will become increasingly so as we delve more deeply into the behavior of dynamical processes that run on top of a network (e.g., metabolism or signaling). For instance, what would a reasonable random-graph model of a signaling network look like? And, how can we know if the behavior of a real-world signaling network is within statistical fluctuations of its normal behavior? How can we tell whether two metabolic networks are significantly different from each other, or whether their topology is identical up to a small amount of noise? Put another way, how can we tell when a metabolic network has made a significant shift in its behavior or structure as a result of natural selection? One could even phrase the question of "What is a species?" as a question of whether the difference between two organisms is within statistical fluctuations of a canonical member of the species.
January 30, 2007
Since most of my intellectual activities depend, in some way, on the generosity of the American people (via taxes) and the political will of politicians, I can't help but follow the problems of funding for science. For those of you not nearly as obsessed with the relationship between science and our society, let me catch you up on the recent political turmoil. President Bush announced the "American Competitiveness Initiative" in his State of the Union 2006 address, which proposed to substantially increase federal funding of science (via agencies like NSF, NIST and the DOE). But, as is usually the case with such programs, there was always the question of whether real money would follow the promise. Then, when funding for FY2007 started getting tight, Congress froze almost all government agencies' funding at their FY2006 levels, which basically killed the idea of increasing funding for science. But, in a recent turn-about (largely the result of Democrats' actions), Congress passed a "continuing resolution" that would exempt the main basic-science agencies from the freeze. From the Computing Research (CRA) Policy Blog:
Science was one of just a few priorities protected by Congressional Democrats in the bill -- it joins federal highway programs, veteran's health care, the FBI and local law enforcement, and Pell grant funding.
The result is that the basic-science agencies will see a slight increase in funding, although not quite what the President's initiative promised. Good news for science, and good news for society. Why the latter? Because this kind of investment is what makes our country special, and thus worth defending, in the first place.
(Tip to Lance Fortnow.)
January 27, 2007
Fish are the new birds?
Given my apparent fascination with bird brains, and their evident ability to functionally imitate mammalian brains, imagine my surprise to discover that fish (specifically the males of a species of cichlid called A. burtoni) employ similar logical inference techniques to birds and mammals. The experimental setup allowed a bystander cichlid to observe fights between five others, through which a social hierarchy of A > B > C > D > E was constructed. In subsequent pairings between the bystander and the members of the hierarchy, the bystander preferred pairing with the losers in the hierarchy, i.e., near E and D. The idea is that the bystander is hedging his bet on where he stands in the hierarchy by preferring to fight losers over winners.
One interesting implication of this study is that logical inference - in this case something called "transitive inference", which allows the user to use chains of relationships to infer additional relationships that shortcut the chain, e.g., A > B and B > C implies A > C - may have evolved extremely early in the history of life; alternatively, it could be that the ability to do logical inference is something that brains can acquire relatively quickly when natural selection favors it slightly. In the case of the cichlids, it may be that transitive inference evolved in tandem with their becoming highly territorial.
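For readers who like to see the logic spelled out, transitive inference is just the transitive closure of the observed relations. Here is a minimal sketch, assuming the bystander sees only the four adjacent fights in the A > B > C > D > E hierarchy (the fight list and function name are hypothetical; the paper's actual training pairs may differ):

```python
def transitive_closure(observed):
    """Return every dominance pair implied by chaining the observed ones."""
    known = set(observed)
    changed = True
    while changed:
        changed = False
        for a, b in list(known):
            for c, d in list(known):
                # a > b and b > d together imply a > d
                if b == c and (a, d) not in known:
                    known.add((a, d))
                    changed = True
    return known

# Only adjacent pairs are observed directly.
fights = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
inferred = transitive_closure(fights) - set(fights)
```

The six pairs in `inferred` (such as A > E or B > D) are the relationships the bystander never witnessed but could still act on.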
I wonder what other cerebral capabilities fish and birds have in common...
L. Grosenick, T. S. Clement and R. D. Fernald, "Fish can infer social rank by observation alone." Nature 445, 429 (2007).
January 25, 2007
DIMACS - Complex networks and their applications (Day 3)
The third day of the workshop focused on applications to biochemical networks (no food webs), with a lot of that focus being on the difficulties of taking fuzzy biological data (e.g., gene expression data) and converting it into an accurate and meaningful form for further analysis or for hypothesis testing. Only a few of the talks were theoretical, but this perhaps reflects the current distribution of focus in biology today. After the workshop was done, I wondered just how much information crossed between the various disciplines represented at the workshop - certainly, I came away from it with a few new ideas, and a few new insights from the good talks I attended. And I think that's the sign of a successful workshop.
Complex Networks in Biology
Chris Wiggins (Columbia) delivered a great survey of interesting connections between machine learning and biochemical networks. It's probably fair to say that biologists are interested in constructing an understanding of cellular-level systems that compares favorably to an electrical engineer's understanding of circuits (Pointer: Can a Biologist Fix a Radio?). But, this is hard because living stuff is messy, inconsistent in funny ways, and has a tendency to change while you're studying it. So, it's harder to get a clean view of what's going on under the hood than it was with particle physics. This, of course, is where machine learning is going to save us - ML offers powerful and principled ways to sift through (torture) all this data.
The most interesting part of his talk, I think, was his presentation of NetBoost, a mechanism discriminator that can tell you which (among a specific suite of existing candidates) is the most likely to have generated your observed network data. For instance, was it preferential attachment (PA) or duplication-mutation-complementation (DMC) that produced a given protein-interaction network? (Conclusion: the latter is better supported.) The method basically works by constructing a decision tree that looks at the subgraph decomposition of a network and scores its belief that each of the various mechanisms produced it. With the ongoing proliferation of network mechanisms (theorists really don't have enough to do these days), this kind of approach serves as an excellent way to test a new mechanism against the data it's supposed to be emulating.
One point Chris made that resonated strongly with me - and which Cris and Mark made yesterday - is the problem with what you might call "soft validation". Typically, a study will cluster or do some other kind of analysis with the data, and then tell a biological story about why these results make sense. On the other hand, forcing the clustering to make testable predictions would be a stronger kind of validation.
Network Inference and Analysis for Systems Biology
Just before lunch, Joel Bader (Johns Hopkins) gave a brief talk about his work on building a good view of the protein-protein interaction network (PPIN). The main problems with this widely studied data are the high error rate, both for false positives (interactions that we think exist, but don't) and false negatives (interactions that we think don't exist, but do). To drive home just how bad the data is, he pointed out that two independent studies of the human PPIN showed just 1% overlap in the sets of "observed" interactions.
He's done a tremendous amount of work on trying to improve the accuracy of our understanding of PPINs, but here he described a recent approach that fits degree-based generative models to the data using our old friend expectation-maximization (EM). His results suggest that we're seeing about 30-40% of the real edges, but that our false positive rate is about 10-15%. This is a depressing signal-to-noise ratio (roughly 1%), because the number of real interactions is O(n), while our false positive rate is O(n^2). Clearly, the biological methods used to infer the interactions need to be improved before we have a clear idea of what this network looks like, but it also suggests that a lot of the previous results on this network are almost surely wrong. Another question is whether it's possible to incorporate these kinds of uncertainties into our analyses of the network structure.
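The O(n) versus O(n^2) argument is easy to check with back-of-the-envelope arithmetic. The sketch below is one reading of the numbers above, not Bader's model: it assumes the false positive rate applies per non-interacting pair of proteins, and that each protein has a fixed average number of true partners. The parameters `n` and `mean_degree` are hypothetical inputs, not values from the talk.

```python
def observed_edge_precision(n, mean_degree, coverage, false_positive_rate):
    """Fraction of reported interactions that are real.

    n                    -- number of proteins (assumed)
    mean_degree          -- average true partners per protein (assumed)
    coverage             -- fraction of real edges that are detected
    false_positive_rate  -- chance a non-interacting pair is reported
    """
    true_edges = n * mean_degree / 2          # grows like O(n)
    all_pairs = n * (n - 1) / 2               # grows like O(n^2)
    non_edges = all_pairs - true_edges
    true_hits = coverage * true_edges
    false_hits = false_positive_rate * non_edges
    return true_hits / (true_hits + false_hits)

# Same error rates, increasingly large (hypothetical) proteomes.
small = observed_edge_precision(500, 6, 0.35, 0.125)
large = observed_edge_precision(5000, 6, 0.35, 0.125)
```

Holding the error rates fixed, the precision falls roughly like 1/n as the network grows, because the pool of non-edges available to produce false positives grows quadratically while the pool of real edges grows only linearly.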
Activating Interaction Networks and the Dynamics of Biological Networks
Meredith Betterton (UC-Boulder) presented some interesting work on signaling and regulatory networks. One of the more surprising tidbits she used in her motivation is the following. In yeast, the mRNA transcription undergoes a consistent 40-minute genome-wide oscillation, but when exposed to an antidepressant (in this case, phenelzine), the period doubles. (The fact that gene expression oscillates like this poses another serious problem for the results of gene expression analysis that doesn't account for such oscillations.)
The point Meredith wanted to drive home, though, was that we shouldn't just think of biochemical networks as static objects - they also represent the form that the cellular dynamics must follow. Using a simple dynamical model of activation and inhibition, she showed that, given their structure (who points to whom, and whether an edge inhibits or activates its target), a real-world circadian rhythm network and a real-world membrane-based signal cascade behave basically as you would expect - one oscillates and the other doesn't. But, then she showed that it only takes a relatively small number of flips (activation to inhibition, or vice versa) to dramatically change the steady-state behavior of these cellular circuits. In a sense, this suggests that these circuits are highly adaptable, given a little pressure.
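A toy illustration of how much a single sign flip can matter, with the loud caveat that this is not the model from the talk: it is a three-node Boolean ring with synchronous updates, where each node simply copies (activation) or negates (inhibition) its one input. All names and the wiring are hypothetical.

```python
from itertools import product

def step(state, edges):
    """Synchronously update a Boolean network.

    `edges` maps each target node to (source node, sign), where sign is
    +1 for activation and -1 for inhibition.
    """
    new_state = list(state)
    for target, (source, sign) in edges.items():
        new_state[target] = state[source] if sign > 0 else 1 - state[source]
    return tuple(new_state)

def fixed_points(edges, n):
    """All states that the update rule maps to themselves (steady states)."""
    return [s for s in product((0, 1), repeat=n) if step(s, edges) == s]

# Ring 0 -> 1 -> 2 -> 0 in which every edge inhibits: a negative feedback loop.
negative_loop = {0: (2, -1), 1: (0, -1), 2: (1, -1)}
# The same ring with one edge flipped from inhibition to activation.
flipped_loop = {0: (2, +1), 1: (0, -1), 2: (1, -1)}

steady_before = fixed_points(negative_loop, 3)
steady_after = fixed_points(flipped_loop, 3)
```

With all three edges inhibiting, there is no steady state at all, so the circuit has no choice but to cycle. Flipping a single edge creates two steady states. (Under synchronous updates the flipped ring can still cycle from other starting states, so this shows only that rest becomes possible, not that it is guaranteed.)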
There are several interesting questions that came to mind while she was presenting. For instance, if we believe there are modules within the signaling pathways that accomplish a specific function, how can we identify them? Do sparsely connected dense subgraphs (assortative community structure) map onto these functional modules? What are the good models for understanding these dynamics: systems of differential equations, discrete time and matrix multiplication, or something more akin to a cellular version of Ohm's Law?
 M. Middendorf, E. Ziv and C. Wiggins, "Inferring Network Mechanisms: The Drosophila melanogaster Protein Interaction Network." PNAS USA 102 (9), 3192 (2005).
 Technically, it's using these subgraphs as generic features and then crunching the feature vectors from examples of each mechanism through a generalized decision tree in order to learn how to discriminate among them. Boosting is used within this process in order to reduce the error rates. The advantage of this approach to model selection and validation, as Chris pointed out, is that it doesn't assume a priori which features (e.g., degree distribution, clustering coefficient, distance distribution, whatever) are interesting, but rather chooses the ones that can actually discriminate between things we believe are different.
 Chris called it "biological validation," but the same thing happens in sociology and Internet modeling, too.
 I admit that I'm a little skeptical of degree-based models of these networks, since they seem to assume that we're getting the degree distribution roughly right. That assumption is only reasonable if our sampling of the interactions attached to a particular vertex is unbiased, which I'm not sure about.
 After some digging, I couldn't find the reference for this work. I did find this one, however, which illustrates a different technique for a related problem. I. Iossifov et al., "Probabilistic inference of molecular networks from noisy data sources." 20 (8), 1205 (2004).
 C. M. Li and R. R. Klevecz, "A rapid genome-scale response of the transcriptional oscillator to perturbation reveals a period-doubling path to phenotypic change." PNAS USA 103 (44), 16254 (2006).
 Maribeth Oscamou pointed out to me during the talk that any attempt to construct such rules has to account for processes like the biochemical degradation of the signals. That is, unlike electric circuits, there's no strict conservation of the "charge" carrier.
January 24, 2007
DIMACS - Complex networks and their applications (Day 2)
There were several interesting talks today, or rather, I should say that there were several talks today that made me think about things beyond just what the presenters said. Here's a brief recap of the ones that made me think the most, and some commentary about what I thought about. There were other good talks today, too. For instance, I particularly enjoyed Frank McSherry's talk on doing PageRank on his laptop. There was also one talk on power laws and scale-free graphs that stimulated a lot of audience, ah, interaction - it seems that there's a lot of confusion both over what a scale-free graph is (admittedly the term has no consistent definition in the literature, although there have been some recent attempts to clarify it in a principled manner), and how to best show that some data exhibit power-law behavior. Tomorrow's talks will be more about networks in various biological contexts.
Complex Structures in Complex Networks
Mark Newman's (U. Michigan) plenary talk mainly focused on the importance of having good techniques to extract information from networks, and being able to do so without making a lot of assumptions about what the technique is supposed to look for. That is, rather than assume that some particular kind of structure exists and then look for it in our data, why not let the data tell you what kind of interesting structure it has to offer? The tricky thing about this approach to network analysis, though, is working out a method that is flexible enough to find many different kinds of structure, and to present only that which is unusually strong. (Point to ponder: what should we mean by "unusually strong"?) This point was a common theme in a couple of the talks today. The first example that Mark gave of a technique that has this nice property was a beautiful application of spectral graph theory to the task of finding a partition of the vertices that gives an extremal value of modularity. If we ask for the maximum modularity, this heuristic method, using the positive eigenvalues of the resulting solution, gives us a partition with very high modularity. But, using the negative eigenvalues gives a partition that minimizes the modularity. I think we normally think of modules as meaning assortative structures, i.e., sparsely connected dense subgraphs. But, some networks exhibit modules that are approximately bipartite, i.e., they are disassortative, being densely connected sparse subgraphs. Mark's method naturally allows you to look for either. The second method he presented was a powerful probabilistic model of node clustering that can be appropriately parameterized (fitted to data) via expectation-maximization (EM). This method can be used to accomplish much the same results as the previous spectral method, except that it can look for both assortative and disassortative structure simultaneously in the same network.
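To give a flavor of the spectral idea, here is a minimal sketch of a single bisection step: build the modularity matrix B, with entries B_ij = A_ij - k_i k_j / 2m, find the eigenvector of its most positive eigenvalue, and split the vertices by the sign of their entries. This is only an illustration under simplifying assumptions, not Mark's full method: it uses plain power iteration in pure Python, does no refinement or repeated subdivision, and the function name and toy graph are hypothetical.

```python
def leading_eigenvector_split(n, edges, iterations=500):
    """Split vertices 0..n-1 into two groups by the sign of the leading
    eigenvector of the modularity matrix B_ij = A_ij - k_i * k_j / (2m)."""
    adj = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        adj[u][v] = adj[v][u] = 1.0
    degree = [sum(row) for row in adj]
    two_m = sum(degree)
    B = [[adj[i][j] - degree[i] * degree[j] / two_m for j in range(n)]
         for i in range(n)]
    # Shifting by the largest absolute row sum makes every eigenvalue of
    # B + shift*I non-negative, so power iteration converges to the
    # eigenvector of the most positive eigenvalue of B.
    shift = max(sum(abs(x) for x in row) for row in B)
    # Deterministic, asymmetric start; it must not be orthogonal to the
    # leading eigenvector (a random start is the usual safeguard).
    vec = [float(i + 1) for i in range(n)]
    for _ in range(iterations):
        new = [sum(B[i][j] * vec[j] for j in range(n)) + shift * vec[i]
               for i in range(n)]
        norm = sum(x * x for x in new) ** 0.5
        vec = [x / norm for x in new]
    return [1 if x > 0 else 0 for x in vec]

# Toy network: two triangles joined by a single bridging edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
labels = leading_eigenvector_split(6, edges)
```

On this toy network the split separates the two triangles. Running the same iteration on `-B` instead would pick out the most negative eigenvalue, which is the direction to look in for bipartite-like (disassortative) structure.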
Hierarchical Structure and the Prediction of Missing Links
In an afternoon talk, Cris Moore (U. New Mexico) presented a new and powerful model of network structure, the hierarchical random graph (HRG). (Disclaimer: this is joint work with me and Mark Newman.) A lot of people in the complex networks literature have talked about hierarchy, and, presumably, when they do so, they mean something roughly along the lines of the HRG that Cris presented. That is, they mean that nodes with a common ancestor low in the hierarchical structure are more likely to be connected to each other, and that different cuts across it should produce partitions that look like communities. The HRG model Cris presented makes these notions explicit, but also naturally captures the kind of assortative hierarchical structure and the disassortative structure that Mark's methods find. (Test to do: use HRG to generate mixture of assortative and disassortative structure, then use Mark's second method to find it.) There are several other attractive qualities of the HRG, too. For instance, using Markov chain Monte Carlo (MCMC), you can find the hierarchical decomposition of a single real-world network, and then use the HRG to generate a whole ensemble of networks that are statistically similar to the original graph. And, because the MCMC samples the entire posterior distribution of models-given-the-data, you can look not only at models that give the best fit to the data, but also at the large number of models that give an almost-best fit. Averaging properties over this ensemble can give you more robust estimates of unusual topological patterns, and Cris showed how it can also be used to predict missing edges. That is, suppose I hide some edges and then ask the model to predict which ones I hid. If it can do well at this task, then we've shown that the model is capturing real correlations in the topology of the real graph - it has the kind of explanatory power that comes from making correct predictions.
These kinds of predictions could be extremely useful for laboratory or field scientists who manually collect network data (e.g., protein interaction networks or food webs). Okay, enough about my own work!
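(One last aside, for the concretely minded.) The generative half of the HRG is simple enough to sketch in a few lines: each internal node of the dendrogram carries a probability, and two vertices are connected with the probability stored at their lowest common ancestor. The nested-tuple representation and function names below are hypothetical conveniences for illustration; the fitting step (the MCMC over dendrograms) is not shown.

```python
import random

def leaves(tree):
    """List the vertices (integer leaves) below a dendrogram node."""
    if isinstance(tree, int):
        return [tree]
    _, left, right = tree
    return leaves(left) + leaves(right)

def sample_hrg(tree, rng):
    """Sample an edge list from a dendrogram.

    An internal node is a tuple (p, left, right). Every pair of leaves
    split between `left` and `right` has this node as its lowest common
    ancestor, and is joined independently with probability p.
    """
    if isinstance(tree, int):
        return []
    p, left, right = tree
    edges = sample_hrg(left, rng) + sample_hrg(right, rng)
    for u in leaves(left):
        for v in leaves(right):
            if rng.random() < p:
                edges.append((u, v))
    return edges

# Assortative toy hierarchy: dense within {0, 1} and {2, 3}, sparse between.
assortative = (0.1, (0.9, 0, 1), (0.9, 2, 3))
# Disassortative version: the probabilities are simply reversed.
disassortative = (0.9, (0.1, 0, 1), (0.1, 2, 3))

graph = sample_hrg(assortative, random.Random(42))
```

Calling `sample_hrg` repeatedly on a fitted dendrogram is what produces the ensemble of statistically similar networks, and high probabilities near the root (rather than near the leaves) are how the same model expresses disassortative structure.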
The Optimization Origins of Preferential Attachment
Although I've seen Raissa D'Souza (UC Davis) talk about competition-induced preferential attachment before, it's such an elegant generalization of PA that I enjoyed it a second time today. Raissa began by pointing out that most power laws in the real world can't extend to infinity - in most systems, there are finite limits to the size that things can be (the energy released in an earthquake or the number of edges a vertex can have), and these finite effects will typically manifest themselves as exponential cutoffs in the far upper tail of the distribution, which takes the probability of these super-large events to zero. She used this discussion as a springboard to introduce a relatively simple model of resource constraints and competition among vertices in a growing network that produces a power-law degree distribution with such an exponential cutoff. The thing I like most about this model is that it provides a way for (tempered) PA to emerge from microscopic and inherently local interactions (normally, to get pure PA to work, you need global information about the system). The next step, of course, is to find some way to measure evidence for this mechanism in real-world networks. I also wonder how brittle the power-law result is, i.e., if you tweak the dynamics a little, does the power-law behavior disappear?
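To see what an exponential cutoff does to the upper tail, here is a small sketch of the functional form only, namely an unnormalized weight proportional to k^(-alpha) * exp(-k / kappa). It is not Raissa's growth model, and the exponent `alpha` and cutoff scale `kappa` are hypothetical values picked for illustration.

```python
import math

def tail_weight(k, alpha, kappa=None):
    """Unnormalized weight of degree k: k**(-alpha), optionally tempered
    by an exponential cutoff exp(-k / kappa)."""
    weight = k ** (-alpha)
    if kappa is not None:
        weight *= math.exp(-k / kappa)
    return weight

alpha, kappa = 2.5, 100.0
# Ratio of tempered to pure power-law weight, below and above the cutoff.
suppression = {
    k: tail_weight(k, alpha, kappa) / tail_weight(k, alpha)
    for k in (10, 100, 1000)
}
```

Well below `kappa` the two forms are nearly indistinguishable, while an order of magnitude above it the tempered weight is suppressed by a factor of exp(-10), which is why such cutoffs are only visible in the far upper tail.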
Web Search and Online Communities
Andrew Tomkins (of Yahoo! Research) is a data guy, and his plenary talk drove home the point that Web 2.0 applications (i.e., things that revolve around user-generated content) are creating a huge amount of data, and offering unparalleled challenges for combining, analyzing, and visualizing this data in meaningful ways. He used Flickr (a recent Y! acquisition) as a compelling example by showing an interactive (with fast-rewind and fast-forward features) visual stream of the trends in user-generated tags for user-posted images, annotated with notable examples of those images. He talked a little about the trickiness of the algorithms necessary to make such an application, but what struck me most was his plea for help and ideas in how to combine information drawn from social networks with user behavior with blog content, etc. to make more meaningful and more useful applications - there's all this data, and they only have a few ideas about how to combine it. The more I learn about Y! Research, the more impressed I am with both the quality of their scientists (they recently hired Duncan Watts), and the quality of their data. Web 2.0 stuff like this gives me the late-1990s shivers all over again. (Tomkins mentioned that in Korea, unlike in the US, PageRank-based search has been overtaken by an engine called Naver, which is driven by users building good sets of responses to common search queries.)
 To be more concrete, and perhaps in lieu of having a better way of approaching the problem, much of the past work on network analysis has taken the following approach. First, think of some structure that you think might be interesting (e.g., the density of triangles or the division into sparsely connected dense subgraphs), design a measure that captures that structure, and then measure it in your data (it turns out to be non-trivial to do this in an algorithm independent way). Of course, the big problem with this approach is that you'll never know whether there is other structure that's just as important as, or maybe more important than, the kind you looked for, and that you just weren't clever enough to think to look for it.
 Heuristic because Mark's method is a polynomial time algorithm, while the problem of modularity maximization was recently (finally...) shown to be NP-complete. The proof is simple, and, in retrospect, obvious - just as most such proofs inevitably end up being. See U. Brandes et al. "Maximizing Modularity is hard." Preprint (2006).
 M. E. J. Newman, "Finding community structure in networks using the eigenvectors of matrices." PRE 74, 036104 (2006).
 M. E. J. Newman and E. A. Leicht, "Mixture models and exploratory data analysis in networks." Submitted to PNAS USA (2006).
 A. Clauset, C. Moore and M. E. J. Newman, "Structural Inference of Hierarchies in Networks." In Proc. of the 23rd ICML, Workshop on "Statistical Network Analysis", Springer LNCS (Pittsburgh, June 2006).
 This capability seems genuinely novel. Given that there are an astronomical number of ways to rearrange the edges on a graph, it's kind of amazing that the hierarchical decomposition gives you a way to do such a rearrangement, but one which preserves the statistical regularities in the original graph. We've demonstrated this for the degree distribution, the clustering coefficient, and the distribution of pair-wise distances. Because of the details of the model, it sometimes gets the clustering coefficient a little wrong, but I wonder just how powerful / how general this capability is.
 More generally though, I think the idea of testing a network model by asking how well it can predict things about real-world problems is an important step forward for the field; previously, "validation" consisted of showing only a qualitative (or worse, a subjective) agreement between some statistical measure of the model's behavior (e.g., degree distribution is right-skewed) and the same statistical measure on a real-world network. By being more quantitative - by being more stringent - we can say stronger things about the correctness of our mechanisms and models.
 R. M. D'Souza, C. Borgs, J. T. Chayes, N. Berger, and R. Kleinberg, "Emergence of Tempered Preferential Attachment From Optimization", To appear in PNAS USA, (2007).
 I think the best candidate here would be the BGP graph, since there is clearly competition there, although I suspect that the BGP graph structure is a lot more rich than the simple power-law-centric analysis has suggested. This is primarily due to the fact that almost all previous analyses have ignored the fact that the BGP graph exists as an expression of the interaction of business interests with the affordances of the Border Gateway Protocol itself. So, its topological structure is meaningless without accounting for the way it's used, and this means accounting for complexities of the customer-provider and peer-to-peer relationships on the edges (to say nothing of the sampling issues involved in getting an accurate BGP map).
January 23, 2007
DIMACS - Complex networks and their applications (Day 1)
Today and tomorrow, I'm at the DIMACS workshop on complex networks and their applications, held at Georgia Tech's College of Computing. Over the course of the workshop, I'll be blogging about the talks I see and whatever ideas they stimulate (sadly, I missed most of the first day because of travel).
The most interesting talk I saw Monday afternoon was by Ravi Kumar (Yahoo! Research), who took location data of users on LiveJournal, and asked: do we see the same kind of routable structure - i.e., an inverse-square law relationship between the distance between people and the likelihood that they have an LJ connection - that Kleinberg showed was optimal for distributed / local search? Surprisingly, they were able to show that in the US, once you correct for the fact that there can be many people at a single "location" in geographic space (approximated to the city level), you do indeed observe exactly the kind of power law that Kleinberg predicted. Truly, this was a kind of stunning confirmation of Kleinberg's theory. So now, the logical question would be, what mechanism might produce this kind of structure in geographic space? Although you could probably get away with assuming a priori the population distribution, what linking dynamics would construct the observed topological pattern? My first project in graduate school asked exactly this question for the pure Kleinberg model, and I wonder if it could be adapted to the geographic version that Kumar et al. consider.
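For reference, the pure Kleinberg ingredient is easy to state in code: each person picks a long-range contact with probability proportional to distance^(-2) on a two-dimensional lattice. The sketch below assumes Manhattan distance on integer grid coordinates and uses hypothetical function names; it does not include the population-density correction that Kumar and colleagues needed for the LiveJournal data.

```python
import random

def manhattan(a, b):
    """Lattice (Manhattan) distance between two grid points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def long_range_contact(source, nodes, rng, exponent=2.0):
    """Pick one long-range contact for `source`, with probability
    proportional to distance**(-exponent)."""
    others = [v for v in nodes if v != source]
    weights = [manhattan(source, v) ** (-exponent) for v in others]
    return rng.choices(others, weights=weights, k=1)[0]

# A 10 x 10 grid of people, one per lattice site.
grid = [(x, y) for x in range(10) for y in range(10)]
rng = random.Random(7)
contacts = [long_range_contact((0, 0), grid, rng) for _ in range(2000)]
```

With `exponent=2.0` on a two-dimensional lattice, links are spread roughly evenly across all distance scales, which is the property that makes greedy local routing efficient in Kleinberg's analysis.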
 D. Liben-Nowell, et al. "Geographic Routing in Social Networks." PNAS USA 102, 33 11623-1162 (2005).
January 20, 2007
I've had a wonderful Saturday morning watching various TED Talks. Here's an assorted list of good ones. Most of them are about 18 minutes long.
Richard Dawkins. Normally, I can only handle Dawkins in small doses since I've heard most of his polemics on religion before. But here, he waxes philosophical about the nature of our ability to understand the world around us, and the peculiar biases we have as a result of growing up (as a species) on the savannah.
David Deutsch. Echoing Dawkins' sentiment, Deutsch (a renowned quantum theorist) walks us through his thoughts on why genuine knowledge production - and by this he specifically means our learning how to model the world around us with increasing accuracy - is the thing that sets humans apart from all other pieces of matter.
Sir Ken Robinson. Robinson is apparently known as an expert on creativity, although I'm not sure why. His talk touches on many of the same themes that Dawkins and Deutsch mention, although he focuses more on the importance of cultivating our innate and differing abilities to produce knowledge.
Larry Brilliant. The year after I was born, smallpox was declared eradicated, and the man who helped oversee its eradication was Brilliant. In this talk, he describes that work, but also impresses on us just how devastating a global pandemic would be, economically, socially and culturally. His message: early detection, early response.
Steven Levitt. The author of Freakonomics gives a fascinating account of the economics of gangs during the crack-cocaine era. The best part is the quotations at the end where gang members explain basic economic theory, but translated into the language of hustlers.
Barry Schwartz. I remember Prof. Schwartz from my freshman psychology course - he's a highly entertaining speaker and, apparently, still loves to use New Yorker cartoons to illustrate his points. Here, he talks about how having more choices makes it harder to choose, and less likely that we'll be pleasantly surprised. A nice counter-point to Malcolm Gladwell's talk on the benefits of diversity of choice.
Michael Shermer. When I was a teenager just getting interested in science, I remember being fascinated by Shermer's column in Scientific American where he debunked bad science of all kinds. His talk is like a live version of one of his columns.
Peter Donnelly. On a similar note as Shermer, Donnelly, a statistician from Oxford, gives an excellent talk about just how bad humans are with reasoning through uncertainty - a topic everyone should be better educated about, given how much authority our society places in numbers today.
January 19, 2007
Ken Robinson on education and creativity
I started running across the TED talks a while back, and thought the whole conference seemed like a wonderful idea. I'd completely forgotten about them until I was perusing the PresentationZen blog (which is chock full of useful tips on making presentations better - something I think about regularly since part of my life is spent standing in front of audiences blathering on about things I hope they find interesting), which linked to this one by Sir Ken Robinson on creativity and education. It's quite funny, but also makes some excellent points. (The link goes to a TED page with the same video file.)
Another I quite enjoyed was Hans Rosling's talk on global health. He's quite a champion of both collecting and visualizing data, and his animations of how the nations of the world have changed (fertility, income, child mortality, etc.) over the past 40 years are truly impressive.
January 16, 2007
Big brains and bird brains
How did I become so fascinated by bird brains?
Last week, NewScientist posted an interesting article describing several recent studies on birds and brains. The hypothesis that both studies consider is whether larger brains offer a particular evolutionary advantage for birds.
In the first, Susanne Shultz and her colleagues in the UK considered whether large-brained birds are better able to adapt to the changing environmental conditions induced by human farming activities than their small-brained cousins. It turns out that it's not just having a big brain that helps birds here. Instead, it's having a big cerebrum - that part of the brain that is overly developed in us humans.
Although their study is purely empirical, there are some interesting theoretical questions here. For instance, past studies on the decline of farmland birds succeeded only in pointing out that so-called "generalists" fared better than "specialists." This distinction is one I've encountered before in the ecology literature, but it's never very well defined. Is it possible to come up with, perhaps from first principles, a reasonable quantity that captures the notion that species vary in how specific their needs for survival are? Humans, for instance, would seem to be generalists, but are we more or less general than ravens? Until we have such a quantity, the question of, for instance, whether humans or cockroaches are more general, doesn't make any sense. But, such details haven't stopped some scientists in the past, and it doesn't stop Shultz et al. from suggesting that bigger cerebrums correlate with more generalist behavior, and that explains why these bird species are able to adapt to changing conditions.
In summary, our results suggest that the farmland birds whose populations have suffered most under agricultural intensification are those with more specialized resource and habitat use and lesser cognitive abilities.
The work of Sol et al. (the second article) seems to support this idea, however. They consider whether larger bird-brains correlate with a higher degree of success in the colonization of new areas. The impressive thing about this work is the extent to which the authors try to control for other factors that might misleadingly give the appearance of a correlation between brain size and success. (This kind of careful statistical analysis makes their conclusion - that big brains help - all the more persuasive.)
Our findings support the hypothesis that large or elaborated brains function, and hence may have evolved, to deal with changes in the environment... [The many hypotheses to the origin of large brains] are essentially based on the same principle, that enlarged brains enhance the cognitive skills necessary to respond to changes in the environment...
The fact that their results suggest such a connection seems to contradict the idea that in order to be successful, an invading species must fit into a previously unexploited niche or outcompete previously established species. That is, here, successful species adapt to their new environment by figuring out ways to get food, avoid becoming food, and reproduce. I'm not sure this kind of argument applies beyond higher vertebrates, though. What's missing, it seems, is some kind of theoretical explanation that connects brain size with adaptability. Of course, such a theory would itself depend on knowing what exactly brains do and why making certain parts of them (i.e., the cerebrum) bigger seems to improve one's ability to do certain things.
Shultz et al. "Brain size and resource specialization predict long-term population trends in British birds." Proc. Royal Society B 272, p2305-2311 (2005).
Sol et al. "Big brains, enhanced cognition, and response of birds to novel environments." Proc. National Academy of Sciences USA 102 (15), p5460 (2005).
January 02, 2007
One brain, two brains, Red brain, blue brains.
Brains, brains, brains.
How do they do that thing that they do?
One of my first posts here, almost two years ago, was a musing on the structure and function of brains, and how, although bird brains and primate brains are structured quite differently, they seem to perform many of the same "high cognitive" tasks that we associate with intelligence. Carrion crows that use tools, and magpies with a sense of permanence (my niece just recently learned this fact, and is infinitely amused by it). From my musing in early 2005:
So how is it that birds, without a neocortex, can be so intelligent? Apparently, they have evolved a set of neurological clusters that are functionally equivalent to the mammal's neocortex, and this allows them to learn and predict complex phenomena. The equivalence is an important point in support of the belief that intelligence is independent of the substrate on which it is based; here, we mean specifically the types of supporting structures, but this independence is a founding principle of the dream of artificial intelligence (which is itself a bit of a misnomer). If there is more than one way that brains can create intelligent behavior, it is reasonable to wonder if there is more than one kind of substance from which to build those intelligent structures, e.g., transistors and other silicon parts.
Parrots, those inimitable imitators, are linguistic acrobats, but are they actually intelligent? There is, apparently, evidence that they are. Starting in 1977, Irene Pepperberg (Dept. Psychology, Brandeis University) began training an African Grey parrot named Alex in the English language. Amazingly, Alex has apparently mastered a vocabulary of about a hundred words, understands concepts like color and size, can convey his desires, and can count. (Pepperberg has a short promotional video (3MB) that demonstrates some of these abilities, although her work has been criticized as nothing but glorified operant conditioning by Noam Chomsky. Of course, one could probably also argue that what humans do is actually nothing more than the same.)
How long will it be, I wonder, before they stick Alex in an MRI machine to see what his brain is doing? Can we tell a difference, neurologically, between operant conditioning and true understanding? Can an inter-species comparative neuroscience resolve questions about how the brain does what it does? For instance, do Alex's cortical clusters specialize in tasks in the same way that regions of the mammalian brain are specialized? I wonder, too, what the genetics of such a comparative neuroscience would say - are there genes and genetic regulatory structures that are conserved between both (intelligent) bird and (intelligent) mammal species? Many, many interesting questions here...
Sadly, I must admit, what brought Alex to my attention was not his amazingly human-like linguistic abilities. Rather, it was an article in the BBC about another African Grey named N'kisi, who has been used to try to demonstrate telepathy in animals. N'kisi, trained by the artist Aimée Morgana, has a larger vocabulary than Alex, and also seems to have a (wry) sense of humor.
In the BBC article, there's a cryptic reference to an experiment that apparently demonstrates N'kisi's talent with language. But, a little digging reveals that this experiment was actually intended to show that N'kisi has a telepathic connection with Morgana. And this is what got the BBC to do an article about the intelligence of parrots, even though the article makes no overt mention of the pseudo-scientific nature of the experiment.