April 01, 2012
Things to read while the simulator runs (higher ed. edition)
Reforming Science: Methodological and Cultural Reforms by Arturo Casadevall and Ferric C. Fang
An opinion piece published by the American Society for Microbiology (with a popular summary here, titled Has Modern Science Become Dysfunctional?) about the increasingly perverse incentives in modern science. Despite the implication of the title, the authors are not arguing that science is failing to produce new or useful knowledge, merely that the social system of rewards discourages the type of curiosity-driven investigation that drove the dazzling scientific discoveries of the 20th century. On this point, I am highly sympathetic.
To be successful, today's scientists must often be self-promoting entrepreneurs whose work is driven not only by curiosity but by personal ambition, political concerns, and quests for funding.
They go on to argue that this hyper-competitive system does not, in fact, produce the best possible science because scientists choose projects based on their likelihood of increasing their payoffs within these narrow domains (more grant dollars, more high-profile publications, etc.), rather than on their likelihood to produce genuine scientific progress. In short, scientific progress only sometimes aligns with ambition, politics or funding priorities, but behavior almost always does, eventually.
This line resonated particularly strongly. When my non-academic friends ask me how I like being a professor, I often describe my first two years in exactly these terms: it's like running a startup company (nonstop fundraising, intense competition, enormous pressure) with all "temp" labor (students have to be trained, and then they leave after a few years). There are certainly joys associated with the experience, but it is a highly stressful one, and I am acutely cognizant of the reward structures currently in place.
Psychology's Bold Initiative, by Siri Carpenter
Continuing the above theme, Carpenter writes about particular issues within psychology. The problem is known as the "file-drawer problem": negative results tend not to be published. Combine that with noisy results from small-sample experiments and you have a tendency for statistical flukes to be published in high-profile journals as if they were facts. Carpenter describes an interesting community-based effort called PsychFileDrawer to push back against this pattern. The idea is to provide a venue for studies that only try to replicate existing claims, rather than focus on novel results in experimental psychology. Carpenter's coverage is both thoughtful and encouraging.
That being said, experimental psychology seems better positioned to succeed with something like this than, say, complex systems. "Complex systems" as a field (or even network science, if you prefer that) is so broad that well defined communities of dedicated, objective researchers have not really coalesced around specific sets of questions, a feature that seems necessary for there to be any incentive to check the results of published studies.
These pieces cover parts of the larger discussion about the crisis in higher education. The first is more about the humanities (which seem to have a greater disdain for non-academic careers than the sciences?), while the second focuses more on the benefits of getting a science PhD, in terms of intellectual sophistication, problem solving, project completion, etc., even if your career trajectory takes you outside of science. I'm highly sympathetic to the latter idea, and indeed, my impression is that many of the most exciting job opportunities at technology companies really demand the kind of training that only a PhD in a scientific or technical field can give you.
Update 6 April: Regarding the point in #2 about complex systems, perhaps I was too hasty. What I'd meant to suggest was that having a community of researchers all interested in answering the same basic questions seems like a sufficient condition for science to be productive of genuinely new knowledge. In other words, the best way to make forward progress is to have many critical eyes all examining the problem from multiple, and somewhat redundant, angles, publishing both new and repeated results via peer review. But, this statement seems not to be true.
Cancer research can hardly be said to have a dearth of researchers, and yet a new editorial written by a former head of research at the big biotech firm Amgen and an oncology professor at the University of Texas argues that the majority of 'landmark' (their term) studies in oncology, many of which were published in the top journals and many of which spawned entire sub-disciplines with hundreds of followup papers, cannot be reproduced.
First, we know that a paper being peer reviewed and then published does not imply that its results are correct. Thus, that 47 out of 53 results could not be reproduced is not by itself worrying. But, what makes it a striking statement is that these results were chosen for testing because they are viewed as very important or influential and that many of them did generate ample followup studies. That is, something seems to have interfered with the self-corrective ideal of the scientific community that scientists are taught in graduate school, even in a field as big as cancer research.
Derek Lowe provides some nice commentary on the article, and points to a popular press story that includes some additional comments by Begley, the former Amgen researcher. The important point is that Begley's position at Amgen provided him with the resources necessary to actually check the results of many of the studies. Here's the money quote from the popular piece:
Part way through his project to reproduce promising studies, Begley met for breakfast at a cancer conference with the lead scientist of one of the problematic studies.
"We went through the paper line by line, figure by figure," said Begley. "I explained that we re-did their experiment 50 times and never got their result. He said they'd done it six times and got this result once, but put it in the paper because it made the best story. It's very disillusioning."
"The best story." Since when has science been about the best story? Well, since always. The problem, I think, is not so much the story telling, but rather the emphasis on the story at the expense of the scientific, i.e., objectively verifiable, knowledge. It's not clear to me who in the peer-review pipeline, at the funding agencies or in the scientific community at large should be responsible for discouraging such a misplaced emphasis, but it does seem to be a problem, not just in young and diffuse fields like complex systems (or complex networks), but also in big and established fields like cancer research.
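The "ran it six times, put in the one that worked" anecdote is easy to quantify with a toy simulation (a hypothetical sketch, with arbitrary noise parameters): even when the true effect is exactly zero, reporting only the best of six noisy measurements makes the "result" look substantial.

```python
import random

random.seed(0)

def best_of(n_tries, true_effect=0.0, noise=1.0):
    """Re-run a noisy measurement n_tries times and report only the largest."""
    return max(random.gauss(true_effect, noise) for _ in range(n_tries))

# Even with a true effect of exactly zero, publishing only the best of six
# runs yields an apparent effect of roughly +1.3 noise units on average.
reported = [best_of(6) for _ in range(10_000)]
mean_reported = sum(reported) / len(reported)
print(f"true effect: 0.0, mean reported effect: {mean_reported:.2f}")
```

Selective reporting alone, with no fraud and no mistakes, is enough to manufacture a "landmark" effect out of pure noise.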
 Being published after peer review is supposed to mean that there are no obviously big mistakes. But, in practice, peer review is more of an aspirational statement, and passing it does not necessarily imply that there are no mistakes, even big ones, or obvious ones. In short, peer review is a human process, complete with genuine altruism, social pressures, "bad hair days," weird incentives and occasional skullduggery. The hope is that it works, on average, over the long run. Like democracy is for forms of government, peer review may be the worst way to vet scientific research, except for all the others.
 Can you imagine the NIH funding such a big effort to check the results of so many studies? I cannot.
October 07, 2011
This video of Steve Jobs' commencement address at Stanford in 2005 sat on my to-watch list for months. Now seemed like a good time to watch it. Good words, Steve.
December 14, 2010
Statistical Analysis of Terrorism
Much of the article focuses on the weird empirical fact that the frequency of severe terrorist attacks is well described by a power-law distribution [3,4], although it also discusses my work on robust patterns of behavior in terrorist groups, for instance, showing that they typically increase the frequency of their attacks as they get older (and bigger and more experienced), and moreover that they do it in a highly predictable way. There are several points I like most about Michael's article. First, he emphasizes that these patterns are not just nice statistical descriptions of things we already know, but rather they show that some things we thought were fundamentally different and unpredictable are actually related and that we can learn something about large but rare events by studying the more common smaller events. And second, he emphasizes the fact that these patterns can actually be used to make quantitative, model-based statistical forecasts about the future, something current methods in counter-terrorism struggle with.
Of course, there's a tremendous amount of hard-nosed scientific work that remains to be done to develop these empirical observations into practical tools, and I think it's important to recognize that they will not be a silver bullet for counter-terrorism, but they do show us that much more can be done here than has been traditionally believed and that there are potentially fundamental constraints on terrorism that could serve as leverage points if exploited appropriately. That is, so to speak, there's a forest out there that we've been missing by focusing only on the trees, and that thinking about forests as a whole can in fact help us understand some things about the behavior of trees. I don't think studying large-scale statistical patterns in terrorism or other kinds of human conflict takes away from the important work of studying individual conflicts, but I do think it adds quite a bit to our understanding overall, especially if we want to think about the long-term. How does that saying go again? Oh right, "those who do not learn from history are doomed to repeat it" (George Santayana, 1863-1952) .
The Miller-McCune article is fairly long, but here are a few good excerpts that capture the points pretty well:
Last summer, physicist Aaron Clauset was telling a group of undergraduates who were touring the Santa Fe Institute about the unexpected mathematical symmetries he had found while studying global terrorist attacks over the past four decades. Their professor made a comment that brought Clauset up short. "He was surprised that I could think about such a morbid topic in such a dry, scientific way," Clauset recalls. "And I hadn’t even thought about that. It was just … I think in some ways, in order to do this, you have to separate yourself from the emotional aspects of it."
But it is his terrorism research that seems to be getting Clauset the most attention these days. He is one of a handful of U.S. and European scientists searching for universal patterns hidden in human conflicts — patterns that might one day allow them to predict long-term threats. Rather than study historical grievances, violent ideologies and social networks the way most counterterrorism researchers do, Clauset and his colleagues disregard the unique traits of terrorist groups and focus entirely on outcomes — the violence they commit.
“When you start averaging over the differences, you see there are patterns in the way terrorists’ campaigns progress and the frequency and severity of the attacks,” he says. “This gives you hope that terrorism is understandable from a scientific perspective.” The research is no mere academic exercise. Clauset hopes, for example, that his work will enable predictions of when terrorists might get their hands on a nuclear, biological or chemical weapon — and when they might use it.
It is a bird’s-eye view, a strategic vision — a bit blurry in its details — rather than a tactical one. As legions of counterinsurgency analysts and operatives are trying, 24-style, to avert the next strike by al-Qaeda or the Taliban, Clauset’s method is unlikely to predict exactly where or when an attack might occur. Instead, he deals in probabilities that unfold over months, years and decades — probability calculations that nevertheless could help government agencies make crucial decisions about how to allocate resources to prevent big attacks or deal with their fallout.
 Here are the relevant scientific papers:
On the Frequency of Severe Terrorist Attacks, by A. Clauset, M. Young and K. S. Gleditsch. Journal of Conflict Resolution 51(1), 58-88 (2007).
Power-law distributions in empirical data, by A. Clauset, C. R. Shalizi and M. E. J. Newman. SIAM Review 51(4), 661-703 (2009).
A generalized aggregation-disintegration model for the frequency of severe terrorist attacks, by A. Clauset and F. W. Wiegel. Journal of Conflict Resolution 54(1), 179-197 (2010).
The Strategic Calculus of Terrorism: Substitution and Competition in the Israel-Palestine Conflict, by A. Clauset, L. Heger, M. Young and K. S. Gleditsch. Cooperation & Conflict 45(1), 6-33 (2010).
The developmental dynamics of terrorist organizations, by A. Clauset and K. S. Gleditsch. arXiv:0906.3287 (2009).
A novel explanation of the power-law form of the frequency of severe terrorist events: Reply to Saperstein, by A. Clauset, M. Young and K.S. Gleditsch. Forthcoming in Peace Economics, Peace Science and Public Policy.
 It was also slashdotted.
 If you're unfamiliar with power-law distributions, here's a brief explanation of how they're weird, taken from my 2010 article in JCR:
What distinguishes a power-law distribution from the more familiar Normal distribution is its heavy tail. That is, in a power law, there is a non-trivial amount of weight far from the distribution's center. This feature, in turn, implies that events orders of magnitude larger (or smaller) than the mean are relatively common. The latter point is particularly true when compared to a Normal distribution, where there is essentially no weight far from the mean.
Although there are many distributions that exhibit heavy tails, the power law is special and exhibits a straight line with slope alpha on doubly-logarithmic axes. (Note that some data being straight on log-log axes is a necessary, but not a sufficient, condition for being power-law distributed.)
Power-law distributed quantities are not uncommon, and many characterize the distribution of familiar quantities. For instance, consider the populations of the 600 largest cities in the United States (from the 2000 Census). Among these, the average population is only x-bar = 165,719, and metropolises like New York City and Los Angeles seem to be "outliers" relative to this size. One clue that city sizes are not well explained by a Normal distribution is that the sample standard deviation sigma = 410,730 is significantly larger than the sample mean. Indeed, if we modeled the data in this way, we would expect to see 1.8 times fewer cities at least as large as Albuquerque (population 448,607) than we actually do. Further, because it is more than a dozen standard deviations above the mean, we would never expect to see a city as large as New York City (population 8,008,278), and the largest city we would expect to see would be Indianapolis (population 781,870).
As a more whimsical second example, consider a world where the heights of Americans were distributed as a power law, with approximately the same average as the true distribution (which is convincingly Normal when certain exogenous factors are controlled). In this case, we would expect nearly 60,000 individuals to be as tall as the tallest adult male on record, at 2.72 meters. Further, we would expect ridiculous facts such as 10,000 individuals being as tall as an adult male giraffe, one individual as tall as the Empire State Building (381 meters), and 180 million diminutive individuals standing a mere 17 cm tall. In fact, this same analogy was recently used to describe the counter-intuitive nature of the extreme inequality in the wealth distribution in the United States, whose upper tail is often said to follow a power law.
Although much more can be said about power laws, we hope that the curious reader takes away a few basic facts from this brief introduction. First, heavy-tailed distributions do not conform to our expectations of a linear, or normally distributed, world. As such, the average value of a power law is not representative of the entire distribution, and events orders of magnitude larger than the mean are, in fact, relatively common. Second, the scaling property of power laws implies that, at least statistically, there is no qualitative difference between small, medium and extremely large events, as they are all succinctly described by a very simple statistical relationship.
 In some circles, power-law distributions have a bad reputation, which is not entirely undeserved given the way some scientists have claimed to find them everywhere they look. In this case, though, the data really do seem to follow a power-law distribution, even when you do the statistics properly. That is, the power-law claim is not just a crude approximation, but a bona fide and precise hypothesis that passes a fairly harsh statistical test.
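For the curious, the estimation step at the heart of "doing the statistics properly" is itself simple; here is a minimal sketch of the continuous maximum-likelihood estimator from the SIAM Review paper listed above (the synthetic data is illustrative; the full method also selects xmin from the data and runs a goodness-of-fit test):

```python
import math
import random

random.seed(2)

def mle_alpha(xs, xmin):
    """Continuous power-law MLE from Clauset, Shalizi & Newman (2009):
    alpha_hat = 1 + n / sum(ln(x_i / xmin)), over the x_i >= xmin."""
    tail = [x for x in xs if x >= xmin]
    return 1 + len(tail) / sum(math.log(x / xmin) for x in tail)

# Sanity check on synthetic data with a known exponent (inverse-transform sample)
true_alpha, xmin = 2.5, 1.0
sample = [xmin * (1 - random.random()) ** (-1 / (true_alpha - 1)) for _ in range(50_000)]
print(f"true alpha = {true_alpha}, estimated alpha = {mle_alpha(sample, xmin):.3f}")
```

Fitting a straight line to a log-log histogram, by contrast, gives biased exponent estimates, which is one reason the likelihood-based approach matters.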
 Also quoted as "Those who cannot remember the past are condemned to repeat their mistakes".
November 05, 2010
Nathan Explains Science, welcome to the blogosphere!
Nathan is a former theoretical astrophysicist who holds a PhD in political science. The first time I met him, I thought this meant that he studied awesome things like galactic warfare, black hole cover-ups, and the various environmental disasters that come with unregulated and rampant terraforming. Fortunately for us, he instead studies actual politics and social science, which is probably more useful (and sadly more interesting) than astro-politics.
Here's Nathan explaining why he's now also a blogger:
...this gives me a place to tell you about science news that I think is interesting but that isn't necessarily going to get published in Science News or Nature's news section. For a variety of reasons, social science news especially doesn't get discussed as science, and that's unfortunate because there are scientific results coming out of psychology, political science, and economics that are vitally important for understanding the problems we face and the solutions we should pursue. In fact, there are a lot of old results that people should know about but don't because social science news seems less attractive than, say, finding a galaxy farther away than any other.
And, if you want to read more about the science, try out these stories, in which Nathan explains the heck out of narcissism, what makes us vote, political grammar and baby introspection:
Is Narcissism Good For Business?
Narcissists, new experiments show, are great at convincing others that their ideas are creative even though they're just average. Still, groups with a handful of narcissists come up with better ideas than those with none, suggesting that self-love contributes to real-world success.
Sweaty Palms and Puppy Love: The Physiology of Voting
Does your heart race at the sight of puppies? Do pictures of vomit make you sweat? If so, you may be more likely to vote.
Politicians, Watch Your Grammar
As congressional midterm elections approach in the United States, politicians are touting their credentials, likability, and, yes, sometimes even their policy ideas. But they may be forgetting something crucial: grammar. A new study indicates that subtle changes in sentence structure can make the difference between whether voters view a politician as promising or unelectable.
‘Introspection’ Brain Networks Fully Formed at Birth
Could a fetus lying in the womb be planning its future? The question comes from the discovery that brain areas thought to be involved in introspection and other aspects of consciousness are fully formed in newborn babies...
More Evidence for Hidden Particles?
Like Lazarus from the dead, a controversial idea that there may be a new, superhard-to-spot kind of particle floating around the universe has made a comeback. Using a massive particle detector, physicists in Illinois have studied the way elusive particles called antineutrinos transform from one type or "flavor" to another, and their data bolster a decade-old claim that the rate of such transformation is so high that it requires the existence of an even weirder, essentially undetectable type of neutrino. Ironically, the same team threw cold water on that idea just 3 years ago, and other researchers remain skeptical.
Update 6 November 2010: added a new story by Nathan, on hidden particles.
November 06, 2009
Things to read while the simulator runs; part 8
While chatting with Jake Hofman the other day, he pointed me to some analysis by the Facebook Data Team about the way people use online social networks. One issue that seems to come up pretty regularly with Facebook is how many of your "friends" are "real" in some sense (for instance, this came up on a radio show this morning, and my wife routinely teases me for having nearly 400 "friends" on Facebook).
The answer, according to the Facebook Data Team, is that while it depends on how you define "real," with access to the underlying data, you can pretty clearly see how much interaction actually flows across the different links. One neat thing they found (within a lot of interesting analysis) is that the amount of interaction across all your connections scales up with the number of connections you have. That is, the more friends you have, the more friends you interact with. (It can't be a linear relationship, though, since otherwise, people with 1000s of friends would be spending all of their free time on Facebook... oh wait, some people actually do that.)
A related point that I've found myself discussing several times recently with my elders (some of whom I think are, at some level, alienated and befuddled by computer and Web technology), is whether Facebook (or, technology in general) increases social isolation, and thus is leading to some kind of collapse of civil society. I've argued passionately that it's human nature to be social and thus extremely unlikely that technology alone is having this effect, and that technology instead actually facilitates social interactions, allowing people to be even more social overall (even if they may spend slightly less time face-to-face) than before. Mobile phones are my favorite example of social facilitation, since they allow people to interact with their friends in situations when previously they could not (e.g., standing in line at the bank, walking around town, etc.), even if occasionally it leads to ridiculous situations like two people sitting next to each other, but each texting or talking on their phones with people elsewhere.
And, just in time to bolster my arguments, The Pew Internet and American Life Project released a study this week (also discussed in the NYTimes) showing that technology users are more social than non-technology users, and that other, non-technological trends are to blame for the apparent decrease in the size of (non-technology using) Americans' social circles over the past 20 years. Of course, access to and use of technology often correlates with affluence, so what really might be going on is that, like with nutrition, the affluent are better positioned to lead physically and socially healthy lives than the poor.
Recently, for a project on evolution, I've been reading pretty deeply in the paleontology and marine mammal literature (more on that in the next post). The first thing that I noticed is how easy it is now to access vast amounts of scientific literature from the comfort of your office. Occasionally, I had to get up to see Margaret, our librarian, but most of the time I could get what I needed through electronic access. But, sometimes I would encounter a pay wall that my institutional access wouldn't allow me to circumvent.
At first, it was extremely irritating and induced open-access revolutionary spirits in me. Then, I did what I suspect many of you have done, too, which is to ask my friends at other universities to try to get access to the paper using their institutional access, and to send me a copy. On a small scale, this is like asking your friends to share individual musical tracks with you. So, naturally, the logical solution to the problem is to make a P2P sharing system for scientific papers, right? Exactly. There's apparently already such a system for mainly medical papers, but I think the time is ripe for something more ambitious. Given what's been learned about how to run a good P2P system for music, it should be pretty simple to develop a good system (distributed, searchable, scalable) for sharing PDFs of journal papers, right? I can't wait until the academic publishing industry starts suing researchers for sharing papers...
If you're male, when you use a public restroom, what do you think about for those seconds while your body is busy but your mind is free to wander? Randall Munroe, of xkcd fame, apparently thinks about the mathematics of restroom awkwardness and minimum-awkwardness packing arrangements for men using urinals. Who knew something so mundane could be so amusing?
Finally, this next bit is already almost a year old, but it's just so good. Remember last year when the media went predictably bonkers over two studies, by Nicholas Christakis and James Fowler, showing that happiness and obesity were (socially) contagious? That is, if you're depressed, you can blame your friends for not cheering you up, and if you're fat, you can blame your friends for making you eat poorly. (Or, wait, maybe it's that misery loves company...?) Shortly after those studies hit the media, a wonderful followup study was published by Cohen-Cole and Fletcher. Their study used the same techniques as Christakis and Fowler and showed that acne, headaches and height are also socially contagious! If only we had the data, I'm sure social network analysis could show that hair color, IQ and wealth are socially contagious, too. Their concluding thoughts say it all, really:
There is a need for caution when attributing causality to correlations in health outcomes between friends using non-experimental data. Confounding is only one of many empirical challenges to estimating social network effects, but researchers do need to attempt to minimise its impact. Thus, while it will probably not be harmful for policy makers and clinicians to attempt to use social networks to spread the benefits of health interventions and information, the current evidence is not yet strong enough to suggest clear evidence based recommendations. There are many unanswered questions and avenues for future research, including use of more robust empirical methods to assess social network effects, crafting and implementing additional empirical solutions to the many difficulties with this research, and further understanding of how social networks are formed and operate.
E. Cohen-Cole and J. M. Fletcher, "Detecting implausible social network effects in acne, height, and headaches: longitudinal analysis." BMJ 337, a2533 (2008).
March 01, 2009
The future of privacy
Bruce Schneier (wikipedia page) recently wrote a nice essay on the consequences of computer technology on privacy. Here's the opening paragraph
Welcome to the future, where everything about you is saved. A future where your actions are recorded, your movements are tracked, and your conversations are no longer ephemeral. A future brought to you not by some 1984-like dystopia, but by the natural tendencies of computers to produce data.
Schneier hits the issue on the head: increasingly, our actions and statements are not lost in the sands of time, but are recorded, stored, and analyzed for profit and power. Sometimes recording information about yourself is desirable, since it can create a convenient way to remember things you might've forgotten. But right now, it's rarely you who actually stores and controls the data on yourself. Instead, corporations and governments "own" data about you, and use it to advance their own interests.
Ideally, we would each choose how much personal data to divulge and to which party we divulge it based on how much we value the services that use our personal data. For instance, advertising is a modern annoyance that could theoretically be made less annoying if advertisers could be more targeted. That is, part of the reason advertising is annoying is that most of the ads we see are not remotely interesting to us. Enter the miracle of personal data: if only advertisers knew enough about each of our real interests, then they would know which ads weren't interesting to us, and they would show us only ads that were actually interesting! This argument is basically a lie, but it highlights the idea that there should be a tradeoff between our privacy and our convenience, and that we should get to choose which we value more.
My favorite example of this tradeoff is Facebook, a place where people divulge all sorts of private information. Given that Facebook holds a treasure trove of demographic, interest, and social data for ad targeting, it's an obvious business plan to try to monetize it through advertising. Facebook's efforts to do so, e.g., their Beacon initiative and the recent revision to their Terms of Service, have gotten a strong backlash because people really do care about how their personal data is used, and whether it's being used in a way that serves their interests or another's.
Another Facebook example comes from employers monitoring their employees' Facebook pages, and holding them accountable for their private actions (e.g., here and here). This issue exemplifies a deeper problem with the public availability of private data. Schneier mentions in his opening paragraph that it's bad that conversations are often no longer ephemeral. But, what does that really mean? Well, consider what it might be like to try to run for public office (say, Congress) in 2030, having grown up with most of your actions and statements being recorded, by Facebook, by modern advertisers, etc. During the campaign, all those records of the stupid, racy, naive things you did or said when you were young, innocent and didn't know better will come back to haunt you. In the past, you could be assured that most of that stuff was forgotten, and you could grow into a new, better, more mature person by leaving your past behind. If everything is recorded, you can never leave your past behind. Nothing is forgotten, and nothing is truly forgiven.
So, the cost of losing our privacy is not measured simply in terms of how much other people know about our current actions and attitudes, which is a high cost anyway. It's also the cost of defending your past actions and statements (potentially even those of your childhood), and of having those judged by unfriendly or unsympathetic voices. Sometimes I wonder whether blogging now will hurt me in the future, since it would be easy for a future potential employer to trawl my blog for statements that seem controversial or attitudes that they deem undesirable. There used to be a stronger respect for the division between public and private lives, but I think that's been fading for a long time now. Blogs are public forums. Facebook is a semi-public forum. Your workplace is under surveillance by your employer. Your streets are watched by the government (for your protection, naturally). In fact, the only truly private place is your home.
The upside of recording everything, and a point missed by Schneier, is that it's not easy to use all this data in a coherent and coordinated fashion. Credit card companies know a tremendous amount about each of us from our purchase histories, but they struggle to use that information effectively because they don't have the computational tools to individually understand their customers. Instead, they build aggregate profiles or "segments", and throw out all the other details. Although the computational tools will certainly improve, and there will be startling revelations about how much corporations, governments and our neighbors know about us, I'm not terribly worried about the dystopian future Schneier paints. That is, for most of us, we'll be hiding in plain sight because there will be too much information out there for us to stick out. The real dangers lie in believing that you shouldn't be careful about what you let be recorded, that you can avoid being noticed regardless of what you do or say (aka security through obscurity), or that you can continue hiding once you've been noticed. Privacy is not dead, it's just a lot more complicated than it used to be.
 My favorite feature of Safari is "Reopen All Windows From Last Session", which lets me recover what I was looking at before I rebooted my computer, or before Safari crashed.
 Who owns this data is a critical question that will ultimately require an act of Congress to sort out, I think. (I wonder if copyright law will eventually be applied here, in the sense that I "author" the data about myself and should thus be able to control who is able to profit from it.)
Generally, I come down on the side of personal control over data about yourself, at least for private parties. That is, I should be able to authorize a company to use data about myself, and I should be able to revoke that authority and know that the company will not store or sell information about me to other parties. With governments, I think the issue is a little trickier, since they have legitimate reasons to know some things about their citizens.
 The marginal cost of showing an ad to someone who's not interested in it is so low that you'd be crazy to expect economics to drive advertisers to show you fewer of them. Besides, it's hard to say before seeing an ad whether we're actually not interested in it.
 This point makes it clear that businesses have a legitimate path to getting hold of their customers' personal information, which is to give people something in return for it. Ideally, this would be a customized service that uses the personal data to make better recommendations, etc., but sadly it's often a one-time payment like a discount, and the data is then sold to advertisers.
 To their credit, Facebook gives its users better control over who can see what aspects of their profile than many past social networking websites.
 And if you live in a city, attached to city services, even your home is not as private as you might think. One of my favorite examples of this comes from testing the raw sewage of a neighborhood for traces of illegal drugs.
June 30, 2008
More familiar than we thought
The nearly 10,000 living species of birds are amazingly diverse, and yet we often think of them as being fundamentally different from the more familiar 4000-odd mammalian species. For instance, bird brains are organized very differently from mammalian brains: birds lack the neocortex that we humans exhibit so prominently, among other things. The tacit presumption derived from this structural difference has long been that birds should not exhibit some of the neurological behaviors that mammals exhibit. And yet, evidence continues to emerge demonstrating that birds are at least functionally very much like mammals, exhibiting tool use, cultural knowledge, long-term planning, and creativity, among other things.
A recent study in the Proceedings of the National Academy of Sciences (USA) adds another similarity: sleep [1,2], at least among songbirds. By hooking up some zebra finches to the machinery usually used to measure the brain activity of sleeping mammals, Philip Low and his colleagues discovered that songbird brains exhibit the same kinds of sleeping-brain activity (slow waves, REM, etc.) normally seen in mammals. The authors avoid the simplistic explanation that this similarity is due to shared ancestry, i.e., that mammalian-style sleep evolved in the common ancestor of birds and mammals roughly 340 million years ago (with the origin of the amniotes). That hypothesis would imply (1) that all birds should sleep this way (but the current evidence suggests that only songbirds do), and (2) that other amniotes like lizards should have mammalian-like sleep patterns (which they apparently do not).
The similarity must therefore be an example of convergent evolution, i.e., birds and mammals evolved this kind of sleep independently. The authors suggest that the convergence arises because there are functionally equivalent regions of mammal and bird brains (a familiar idea for long-time readers of this blog) and that these necessitate the same kind of sleep behavior. That is, songbirds and mammals sleep the same way for the same reason. But without understanding what mammalian-like sleep is actually for, this remains speculation, even if it seems to be on the right track. Given the other similarities of complex behavior seen in birds and mammals, it's possible that this kind of sleep is fundamental to complex learning, although there could be other explanations too (e.g., see below). At the very least, this similarity of behavior in evolutionarily distant species gives us a new handle on understanding why we, and other species, sleep the way we do.
Update 30 June 2008: The New York Times also has an article in its science section about this phenomenon.
 "Mammalian-like features of sleep structure in zebra finches." P. S. Low, S. S. Shank, T. J. Sejnowski and D. Margoliash. PNAS 105, 9081-9086 (2008).
A suite of complex electroencephalographic patterns of sleep occurs in mammals. In sleeping zebra finches, we observed slow wave sleep (SWS), rapid eye movement (REM) sleep, an intermediate sleep (IS) stage commonly occurring in, but not limited to, transitions between other stages, and high amplitude transients reminiscent of K-complexes. SWS density decreased whereas REM density increased throughout the night, with late-night characterized by substantially more REM than SWS, and relatively long bouts of REM. Birds share many features of sleep in common with mammals, but this collective suite of characteristics had not been known in any one species outside of mammals. We hypothesize that shared, ancestral characteristics of sleep in amniotes evolved under selective pressures common to songbirds and mammals, resulting in convergent characteristics of sleep.
 New Scientist has a popular science piece about the PNAS article.
 Mammals and birds have another important convergent similarity: both are warm-blooded, but their common ancestor was cold-blooded. Warm-bloodedness thus had to evolve independently in birds and in mammals, which makes the warm-blooded animals a polyphyletic group. One interesting hypothesis is that warm-bloodedness and mammalian-like sleep patterns are linked somehow; if so, then presumably sleep has something fundamental to do with metabolism, rather than with learning, as is more popularly thought. Of course, the fact that the similarity in sleep seems to be confined to songbirds rather than all birds poses some problems for the metabolism idea.
November 11, 2007
Things to read while the simulator runs; part 7
Using continuous systems for computation, for example sorting with differential equations. In that application, the system of ODEs essentially reproduces a bubble sort, in which the sorted list is built up incrementally through comparisons of adjacent elements. Although the engineering applications of this stuff might be obvious, there are some interesting scientific questions embedded in this work, too. For instance, can we use insights into how to make our typically discrete computations analog to recognize when Nature is performing an effective computation? It's currently quite trendy to think of biological systems as being inherently computational, but our understanding of how living systems compute is mostly metaphorical. (tip to Jake)
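For the curious, the flavor of continuous-time sorting can be sketched with Brockett's classic double-bracket flow (the linked work may use a different construction; this is just my illustration). The trick is to hide the numbers as the eigenvalues of a symmetric matrix H and integrate dH/dt = [H, [H, N]], which drives H to a diagonal matrix whose diagonal holds the eigenvalues in sorted order:

```python
import numpy as np
from scipy.integrate import solve_ivp

def bracket(A, B):
    return A @ B - B @ A

def ode_sort(values, t_final=80.0, seed=0):
    """Sort `values` by integrating Brockett's double-bracket flow
    dH/dt = [H, [H, N]].  H starts as a dense symmetric matrix whose
    eigenvalues are the numbers to sort; the flow is isospectral and
    converges to a diagonal matrix, sorted ascending for N = diag(1..n)."""
    n = len(values)
    rng = np.random.default_rng(seed)
    # Hide the inputs as eigenvalues of a random symmetric matrix.
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    H0 = Q @ np.diag(values) @ Q.T
    N = np.diag(np.arange(1.0, n + 1))   # target ordering

    def rhs(_, h):
        H = h.reshape(n, n)
        return bracket(H, bracket(H, N)).ravel()

    sol = solve_ivp(rhs, (0.0, t_final), H0.ravel(),
                    rtol=1e-9, atol=1e-12)
    return np.diag(sol.y[:, -1].reshape(n, n))

print(ode_sort([3.0, 1.0, 4.0, 2.0]))   # approximately [1. 2. 3. 4.]
```

No comparisons or swaps anywhere: the "algorithm" is just a smooth dynamical system whose stable fixed points happen to be sorted lists.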
I have rather mixed feelings about Malcolm Gladwell's writing. His books (e.g., Tipping Point and Blink) tend to be rather pseudo-scientific, but that hasn't stopped them from becoming virtual textbooks in the business community. On the other hand, his writing for The New Yorker is often excellent. For instance, his latest essay on criminal profiling is a wonderful skewering of the sleight of mind that criminal profilers (perhaps unknowingly) perform on their law enforcement clients.
Planarity is a wonderfully cute game (one that should come standard on every iPhone and iPod Touch!) based around a simple task. You're given a tangled "planar" graph, and your job is to untangle it by moving the graph's vertices around on the page until no two edges cross each other. Like sudoku, this game is based on some nice computational principles. Unlike sudoku, though, planarity is what computer scientists would call "easy", being solvable in polynomial time. (This recent arxiv paper discusses the relevant issues, and shows just how hard it is to untangle the graph quickly.)
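For a taste of the underlying mathematics, here's the classic edge-counting bound that follows from Euler's formula. It's only a necessary condition (full planarity testing is actually linear-time), but it instantly certifies the two canonical non-planar graphs:

```python
def euler_planarity_bound(v, e, triangle_free=False):
    """Necessary (not sufficient) condition for planarity from Euler's
    formula: a simple planar graph on v >= 3 vertices has at most
    3v - 6 edges (at most 2v - 4 if it contains no triangles)."""
    limit = 2 * v - 4 if triangle_free else 3 * v - 6
    return e <= limit

# K5: 5 vertices, 10 edges; 10 > 3*5 - 6 = 9, so K5 cannot be planar.
print(euler_planarity_bound(5, 10))                      # False
# K4: 4 vertices, 6 edges; 6 <= 6, bound satisfied (K4 is in fact planar).
print(euler_planarity_bound(4, 6))                       # True
# K3,3: 6 vertices, 9 edges, triangle-free; 9 > 2*6 - 4 = 8, not planar.
print(euler_planarity_bound(6, 9, triangle_free=True))   # False
```

By Kuratowski's theorem, every non-planar graph contains a subdivision of one of those two failing cases, K5 or K3,3.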
In fact, I think there's probably a great business opportunity for taking classic computer science problems (SAT, vertex cover, sparsest cut, etc.) and turning them into entertaining puzzle games.
Finally, Brian Hayes, an intelligent amateur mathematician who frequently writes for American Scientist magazine, has a pleasantly diverting essay on division and the various algorithms for doing it. I remember learning the long-division algorithm in grade school and hating it. The only value I can see in learning long division is that it teaches us that there's a symmetry among mathematical operations (addition and subtraction, multiplication and division, integration and differentiation, etc.), and it helps build intuition about how various numbers relate to each other. Otherwise, I think it's a complete waste of time (no one, not even those who make careers in mathematics, uses long division). Hayes, however, points out that while the algorithms for addition, subtraction and multiplication are relatively straightforward, a legitimate and interesting question is: why is the algorithm for division so opaque?
(Another of Brian Hayes's essays at American Scientist is on a topic near to my heart: fat tails and power-law things. The byline is great: "Sometimes the average is anything but average.")
September 09, 2007
Losing the night sky
Last week I read an excellent article by David Owen in The New Yorker about light pollution, and the phenomenon of "sky glow". Living in New Mexico, I'm lucky to have relatively dark night skies compared to places on the coasts, where light from the big cities consistently washes out most of the stars. But the New Mexican sky (pictured below, in a long exposure; picture taken from the article) is not nearly as dark as the sky I remember in central Belize, where it was so dark that you could see both man-made satellites passing overhead (easy to spot because of their slow but steady movement) and the magnificent Milky Way. To think that, for most of human history, every view of the night sky was like that is a bit mind-boggling.
The article is well written, describing the general phenomenon of light pollution and its effects, the grass-roots efforts by people to cut down on light pollution from cities, the effects of poor outdoor lighting design on local ecologies, human safety and power consumption, etc. What most piqued my interest, though, was the idea that losing our night skies to light pollution (for instance, light from Las Vegas dims the stars in the sky as far away as the Grand Canyon) has led to a loss of philosophical perspective on our place in the universe. From the introduction to the article:
In Galileo’s time, nighttime skies all over the world would have merited the darkest Bortle ranking, Class 1. Today, the sky above New York City is Class 9, at the other extreme of the scale, and American suburban skies are typically Class 5, 6, or 7. The very darkest places in the continental United States today are almost never darker than Class 2, and are increasingly threatened. For someone standing on the North Rim of the Grand Canyon on a moonless night, the brightest feature of the sky is not the Milky Way but the glow of Las Vegas, a hundred and seventy-five miles away. To see skies truly comparable to those which Galileo knew, you would have to travel to such places as the Australian outback and the mountains of Peru. And civilization’s assault on the stars has consequences far beyond its impact on astronomers. Excessive, poorly designed outdoor lighting wastes electricity, imperils human health and safety, disturbs natural habitats, and, increasingly, deprives many of us of a direct relationship with the nighttime sky, which throughout human history has been a powerful source of reflection, inspiration, discovery, and plain old jaw-dropping wonder.
More information is available at the International Dark-Sky Association (IDA), including information about lighting fixtures that both reduce light pollution and save energy, lighting laws, and dark-sky activities in parks and preserves.
August 13, 2007
Things to read while the simulator runs; part 6
Back from a relaxing but oppressively humid vacation on the South Carolina shore, I've missed a lot of interesting science news. Someone asked me recently if I literally read the things I blog about while I let my simulators run. The answer is... well, sometimes. It's true that I do spend both a lot of time reading and a lot of time simulating things, so naturally there will be a lot of overlap between the two activities. Anyway, here is a list of some interesting stories I missed over the past week.
Nanotubes plus paper make for flexible batteries. (Nature News; also via Ars Technica here)
Slightly helpful mutations in E. coli much more plentiful than thought. (Nature News)
Year-round schools don't boost learning (Science Blog)
X-ray images help explain limits to insect body size (Science Blog)
Baby DVDs may make kids dumb (Science Blog; also via Ars Technica here)
Fat is the new normal (Science Blog)
Homeland Security tests automated "Hostile Intent" detector (Ars Technica)
Big Media losing grip thanks to the Internet and America's political divide (Ars Technica)
The religious state of Islamic science (Salon)
Single amino acid change turns West Nile Virus into a killer (Ars Technica Science)
Watching the heat flow through a molecule (Ars Technica Science)
President Bush signs law boosting science funding (Ars Technica Science)
Speciation and the transcription factor shuffle (Ars Technica Science)
July 20, 2007
Things to read while the simulator runs; part 5
Continuing the ever-popular series of things to read while the simulator runs, here's a collection of papers I've either read this month or have added to my never-vanishing stack of papers to read.
S. Redner, "Random Multiplicative Processes: An Elementary Tutorial." Am. J. Phys. 58, 267 (1990).
An elementary discussion of the statistical properties of the product of N independent random variables is given. The motivation is to emphasize the essential differences between the asymptotic behavior of the random product and the asymptotic behavior of a sum of random variables -- a random additive process. For this latter process, it is widely appreciated that the asymptotic behavior of the sum and its distribution is provided by the central limit theorem. However, no such universal principle exists for a random multiplicative process. [Ed: Emphasis added.] ...
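A quick simulation (my own illustration, not from the tutorial) makes the contrast vivid. For a random product, the typical (median) outcome and the mean diverge wildly, because the mean is dominated by exponentially rare runs of large factors, whereas a random sum's mean and median agree by the central limit theorem:

```python
import numpy as np

rng = np.random.default_rng(42)
N, trials = 100, 50_000

# Each factor is 2 or 1/2 with equal probability.  The *typical*
# product is 1 (the log-product has mean 0), but the *true mean*
# product is ((2 + 0.5)/2)**N = 1.25**100, about 5e9: it is carried
# almost entirely by exponentially rare runs of mostly-2 draws.
x = rng.choice([2.0, 0.5], size=(trials, N))
products = x.prod(axis=1)

print(np.median(products))   # ~1: the typical outcome
print(products.mean())       # enormous compared to the median, yet still
                             # far below 1.25**100 -- the dominating runs
                             # are too rare to appear in 50,000 samples
```

This is exactly the sense in which "no such universal principle exists" for multiplicative processes: sample averages converge agonizingly slowly, and the limiting distribution is lognormal rather than normal.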
A. Csikasz-Nagy, D. Battogtokh, K.C. Chen, B. Novak and J.J. Tyson, "Analysis of a generic model of eukaryotic cell-cycle regulation." Biophysical Journal 90, 4361-4379 (2006).
We propose a protein interaction network for the regulation of DNA synthesis and mitosis that emphasizes the universality of the regulatory system among eukaryotic cells. The idiosyncrasies of cell cycle regulation in particular organisms can be attributed, we claim, to specific settings of rate constants in the dynamic network of chemical reactions. The values of these rate constants are determined ultimately by the genetic makeup of an organism. To support these claims, we convert the reaction mechanism into a set of governing kinetic equations and provide parameter values (specific to budding yeast, fission yeast, frog eggs, and mammalian cells) that account for many curious features of cell cycle regulation in these organisms...
E.F. Keller, "Revisiting 'scale-free' networks." BioEssays 27, 1060-1068 (2005).
Recent observations of power-law distributions in the connectivity of complex networks came as a big surprise to researchers steeped in the tradition of random networks. Even more surprising was the discovery that power-law distributions also characterize many biological and social networks. Many attributed a deep significance to this fact, inferring a 'universal architecture' of complex systems. Closer examination, however, challenges the assumptions that (1) such distributions are special and (2) they signify a common architecture, independent of the system's specifics. The real surprise, if any, is that power-law distributions are easy to generate, and by a variety of mechanisms. The architecture that results is not universal, but particular; it is determined by the actual constraints on the system in question.
N. Tishby, F.C. Pereira and W. Bialek, "The information bottleneck method." In Proc. 37th Ann. Allerton Conf. on Comm., Control and Computing, B Hajek & RS Sreenivas, eds, 368-377 (1999).
We define the relevant information in a signal x \in X as being the information that this signal provides about another signal y \in Y. Examples include the information that face images provide about the names of the people portrayed, or the information that speech sounds provide about the words spoken. Understanding the signal x requires more than just predicting y, it also requires specifying which features of X play a role in the prediction. We formalize this problem as that of finding a short code for X that preserves the maximum information about Y. That is, we squeeze the information that X provides about Y through a 'bottleneck' formed by a limited set of codewords X-bar. ... Our variational principle provides a surprisingly rich framework for discussing a variety of problems in signal processing and learning...
Update 21 July: Cosma points me to a very nice article related to the information bottleneck method: C.R. Shalizi and J.P. Crutchfield, "Information Bottlenecks, Causal States, and Statistical Relevance Bases: How to Represent Relevant Information in Memoryless Transduction." Advances in Complex Systems, 5, 91-95 (2002).
R.E. Schapire, "The strength of weak learnability." Machine Learning 5, 197-227 (1990).
... A concept class is learnable (or strongly learnable) if, given access to a source of examples of the unknown concept, the learner with high probability is able to output an hypothesis that is correct on all but an arbitrarily small fraction of the instances. The concept class is weakly learnable if the learner can produce an hypothesis that performs only slightly better than random guessing. In this paper, it is shown that these two notions of learnability are equivalent...
Update 22 July: I should also add the following.
P. W. Anderson, "More is Different." Science 177 393-396 (1972).
The reductionist hypothesis may still be a topic for controversy among philosophers, but among the great majority of active scientists I think it is accepted without question. The workings of our minds and bodies, and of all the animate or inanimate matter of which we have any detailed knowledge, are assumed to be controlled by the same set of fundamental laws, which except under certain extreme conditions we feel we know pretty well. ... The main fallacy in [thinking that the only research of any valuable is on the fundamental laws of nature] is that the reductionist hypothesis does not by any means imply a "constructivist" one: The ability to reduce everything to simple fundamental laws does not imply the ability to start from those laws and reconstruct the universe. ... The behavior of large and complex aggregates of elementary particles, it turns out, is not to be understood in terms of a simple extrapolation of the properties of a few particles. Instead, at each level of complexity entirely new properties appear, and the understanding of the new behaviors requires research which I think is as fundamental in its nature as any other. ...
June 16, 2007
My long-time friend Nick Yee recently sat on a CNN panel to discuss the impact and evolution of virtual worlds (online video games). This video is an 8 minute clip from the longer program on the subject. Nick is probably the world's leading expert on the culture and psychology of online gaming, particularly the massively multiplayer online roleplaying games (MMORPGs) like World of Warcraft, City of Heroes, or Second Life. He started his research (much of which appears on his Daedalus Project page) while we were both at Haverford, before online gaming had hit the mainstream the way it has today. If you want to understand what virtual worlds and online gaming are about, his work is the place to go.
Also briefly featured is Philip Rosedale (CEO of Linden Lab, which makes Second Life), whom I recently met at a Santa Fe Institute business network meeting. Nick tells me that the clip is airing worldwide on CNN this week, and that he's working on a book on the subject as well.
Update 16 June: Nick's posted a brief writeup about his appearance.
March 19, 2007
Things to read while the simulator runs; part 4
Having been awakened this morning at an ungodly hour by my allergies, and being incapable of coherent intellectual thought, I spent the quiet hours of the morning trawling for interesting stuff online. Here's a selection of the good bits. (Tip to 3quarksdaily for many of these.)
YouTube and Viacom are finishing their negotiation over Viacom's content's availability on YouTube in the courts. My sympathies certainly lie with YouTube's democratization of content creation and distribution. And, much as my democratic prejudices incline me to distrust monopolistic or imperial authorities, I agree that old media companies like Viacom should have a role in the future of content, since they're rather invested in the media whose creation they helped manage. Mostly, I think these old media empires are afraid that new technologies (like YouTube) will fundamentally change their (disgustingly successful) business model, by shifting the balance of power within the content business. And rightly so.
The physics of the very small still has a few tricks up its sleeve. Actually, I find the discovery of the D-meson state mixing (previously thought impossible on theoretical grounds) rather reassuring. There are clearly several things that are deeply confusing about our current understanding of the universe, and it's nice to be reminded that even boring old particle physics is a little more complicated than the ridiculously successful Standard Model gives it credit for.
(This one is a bit old, but still good, even 5 months out.) Last year, during the American leg of his book tour for The God Delusion, the indefatigable Richard Dawkins (whom I often write about; try here, here, here, and here) read a few excerpts to an audience at Randolph-Macon Woman's College in Lynchburg, VA. Immediately afterward, he did an hour of Q&A with the audience, many of whom were from nearby Liberty University. Dawkins handled the frequently banal questions with both wit and aplomb.
Building on this theme, The New Atlantis has a nice article on the topic of the moral role that modern science plays in society.
On a similar point, American Scientist has a piece on the use and misuse of complex quantitative models in public policy, at least with regard to the environment. Being militantly critical of bad science in all its forms, I wholeheartedly agree with the basic argument here: good models must be predictive, accurate, interpretable, and live as close to the empirical evidence as possible. Since models and theories are basically interchangeable formalisms, let me mangle one of Einstein's more popular quotations: evidence without theory is lame, theory without evidence is blind.
February 25, 2007
Things to read while the simulator runs; part 3
Who knew that you could fold a single piece of paper into a silverfish, or a Chinese dragon? This story is about a physicist who left research to become a professional origami folder. I'm still wrapping my head around that picture of the dragon... (via 3 Quarks Daily)
I've long been interested in video games (purely on an academic basis, I assure you), and have often found myself defending them to non-gamers. Sometime soon, I'll blog about a recent trip to an SFI business workshop that focused on how to make various kinds of office work more interesting by borrowing ideas from games, but that's not the point of this entry. Rather, I want to point you to the results of a comprehensive meta-analysis of the connection between violent behavior and violent video games. The conclusion? That current research doesn't support any conclusions. (via Ars Technica)
As someone who spent grade-school years in both tracked gifted programs and non-tracked general education, I found that Alexandre Borovik's notes on how mathematically gifted children's brains operate differently from other children's resonated deeply with me. One observation that's probably quite prevalent:
[A mathematically] able child can show no apparent interest in mathematics whatsoever because he or she is bored by a dull curriculum and uninspiring teacher.
Update Feb 27: Yesterday, I came across an interesting article in the New York Magazine about the problem with praising children for their intelligence rather than their hard work. The research described herein seems both simple and persuasive - praising kids for their intelligence encourages them to not take risks that might make them look unintelligent, while praising kids for their hard work encourages them to apply themselves, even to problems beyond their capabilities. I can't help but wonder if the tendency to praise mathematically talented kids for being smart, rather than hard working, somehow tends to push girls out of the field. And finally, on the subject of hard work, Richard Hamming says that this is partially what distinguishes great researchers from the merely good. End Update.
Finally, here's a great little video. Not sure about the whole "teaching the Machine" thing, but otherwise, spot on and very well done.
January 20, 2007
I've had a wonderful Saturday morning watching various TED Talks. Here's an assorted list of good ones. Most of them are about 18 minutes long.
Richard Dawkins. Normally, I can only handle Dawkins in small doses since I've heard most of his polemics on religion before. But here, he waxes philosophical about the nature of our ability to understand the world around us, and the peculiar biases we have as a result of growing up (as a species) on the savannah.
David Deutsch. Echoing Dawkins' sentiment, Deutsch (a renowned quantum theorist) walks us through his thoughts on why genuine knowledge production - and by this he specifically means our learning how to model the world around us with increasing accuracy - is the thing that sets humans apart from all other pieces of matter.
Sir Ken Robinson. Robinson is apparently known as an expert on creativity, although I'm not sure why. His talk touches on many of the same themes that Dawkins and Deutsch mention, although he focuses more on the importance of cultivating our innate and differing abilities to produce knowledge.
Larry Brilliant. The year after I was born, smallpox was declared eradicated, and Brilliant helped oversee its eradication. In this talk, he describes that work, but also impresses on us just how devastating a global pandemic would be economically, socially, and culturally. His message: early detection, early response.
Steven Levitt. The author of Freakonomics gives a fascinating account of the economics of gangs during the crack-cocaine era. The best part is the quotations at the end, where gang members explain basic economic theory, but translated into the language of hustlers.
Barry Schwartz. I remember Prof. Schwartz from my freshman psychology course - he's a highly entertaining speaker and, apparently, still loves to use New Yorker cartoons to illustrate his points. Here, he talks about how having more choices makes it harder to choose, and less likely that we'll be pleasantly surprised. A nice counter-point to Malcolm Gladwell's talk on the benefits of diversity of choice.
Michael Shermer. When I was a teenager just getting interested in science, I remember being fascinated by Shermer's column in Scientific American where he debunked bad science of all kinds. His talk is like a live version of one of his columns.
Peter Donnelly. On a similar note as Shermer, Donnelly, a statistician from Oxford, gives an excellent talk about just how bad humans are with reasoning through uncertainty - a topic everyone should be better educated about, given how much authority our society places in numbers today.
January 19, 2007
Ken Robinson on education and creativity
I started running across the TED talks a while back, and thought the whole conference seemed like a wonderful idea. I'd completely forgotten about them until I was perusing the PresentationZen blog (which is chock full of useful tips on making presentations better - something I think about regularly since part of my life is spent standing in front of audiences blathering on about things I hope they find interesting), which linked to this one by Sir Ken Robinson on creativity and education. It's quite funny, but also makes some excellent points. (The link goes to a TED page with the same video file.)
Another I quite enjoyed was Hans Rosling's talk on global health. He's quite a champion of both collecting and visualizing data, and his animations of how the nations of the world have changed (fertility, income, child mortality, etc.) over the past 40 years are truly impressive.
November 25, 2006
Unreasonable effectiveness (part 3)
A little more than twenty years after Hamming's essay, the computer scientist Bernard Chazelle penned an essay on the importance of the algorithm, in which he offers his own perspective on the unreasonable effectiveness of mathematics.
Mathematics shines in domains replete with symmetry, regularity, periodicity -- things often missing in the life and social sciences. Contrast a crystal structure (grist for algebra's mill) with the World Wide Web (cannon fodder for algorithms). No math formula will ever model whole biological organisms, economies, ecologies, or large, live networks.
Perhaps this, in fact, is what Hamming meant by saying that much of physics is logically deducible, that the symmetries, regularities, and periodicities of physical nature constrain it in such strong ways that mathematics alone (and not something more powerful) can accurately capture its structure. But, complex systems like organisms, economies and engineered systems don't have to, and certainly don't seem to, respect those constraints. Yet, even these things exhibit patterns and regularities that we can study.
Clearly, my perspective matches Chazelle's: algorithms offer a better path toward understanding complexity than the mathematics of physics does. Or, to put it another way, complexity is inherently algorithmic. As an example of this kind of inherent complexity through algorithms, Chazelle cites Craig Reynolds' boids model. Boids is one of the canonical simulations of "artificial life"; in this particular simulation, a trio of simple algorithmic rules produces surprisingly realistic flocking / herding behavior when followed by a group of "autonomous" agents. There are several other surprisingly effective algorithmic models of complex behavior (as I mentioned before, cellular automata are perhaps the most successful), but they all exist in isolation, as models of disconnected phenomena.
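For readers who haven't seen boids, the three rules (separation, alignment, cohesion) are short enough to sketch in full; the parameters below are illustrative guesses, not Reynolds' originals:

```python
import numpy as np

def boids_step(pos, vel, dt=0.1, r=1.0,
               w_sep=0.05, w_ali=0.05, w_coh=0.01):
    """One update of Reynolds' three flocking rules.
    pos, vel: (n, 2) arrays of positions and velocities."""
    n = len(pos)
    acc = np.zeros_like(pos)
    for i in range(n):
        d = pos - pos[i]                   # vectors from boid i to all boids
        dist = np.linalg.norm(d, axis=1)
        nbr = (dist > 0) & (dist < r)      # neighbors within radius r
        if not nbr.any():
            continue
        # separation: steer away from nearby neighbors
        acc[i] -= w_sep * d[nbr].sum(axis=0)
        # alignment: steer toward neighbors' average velocity
        acc[i] += w_ali * (vel[nbr].mean(axis=0) - vel[i])
        # cohesion: steer toward neighbors' center of mass
        acc[i] += w_coh * (pos[nbr].mean(axis=0) - pos[i])
    vel = vel + dt * acc
    pos = pos + dt * vel
    return pos, vel

rng = np.random.default_rng(1)
pos = rng.uniform(0, 2, (30, 2))
vel = rng.normal(0, 0.1, (30, 2))
for _ in range(500):
    pos, vel = boids_step(pos, vel)
```

Nothing in the code mentions a flock; the coherent group-level motion emerges entirely from the three local rules, which is precisely Chazelle's point.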
So, I think one of the grand challenges for a science of complexity will be to develop a way to collect the results of these isolated models into a coherent framework. Just as we have powerful tools for working with a wide range of differential-equation models, we need similar tools for working with competitive agent-based models, evolutionary models, etc. That is, we would like to be able to write down the model in an abstract form, and then draw strong, testable conclusions about it, without simulating it. For example, imagine being able to write down Reynolds' three boids rules and deriving the observed flocking behavior before coding them up. To me, that would prove that the algorithm is unreasonably effective at capturing complexity. Until then, it's just a dream.
 This citation is particularly amusing to me considering that most computer scientists seem to be completely unaware of the fields of complex systems and artificial life. This is, perhaps, attributable to computer science's roots in engineering and logic, rather than in studying the natural world.
 It's true that problems of intractability (P vs NP) and undecidability lurk behind these questions, but analogous questions lurk behind much of mathematics (thank you, Gödel). For most practical situations, mathematics has sidestepped these questions; can we (where here I'm thinking more of modeling the natural world) also sidestep them for algorithms?
November 24, 2006
Unreasonable effectiveness (part 2)
In keeping with the theme, twenty years after Wigner's essay on The Unreasonable Effectiveness of Mathematics in the Natural Sciences, Richard Hamming (who has graced this blog previously) wrote a piece by the same name for The American Mathematical Monthly (87 (2), 1980). Hamming takes issue with Wigner's essay, suggesting that the physicist has dodged the central question of why mathematics has been so effective. In Hamming's piece, he offers a few new thoughts on the matter: primarily, he suggests, mathematics has been successful in physics because much of it is logically deducible, and that we often change mathematics (i.e., we change our assumptions or our framework) to fit the reality we wish to describe. His conclusion, however, puts the matter best.
From all of this I am forced to conclude both that mathematics is unreasonably effective and that all of the explanations I have given when added together simply are not enough to explain what I set out to account for. I think that we -- meaning you, mainly -- must continue to try to explain why the logical side of science -- meaning mathematics, mainly -- is the proper tool for exploring the universe as we perceive it at present. I suspect that my explanations are hardly as good as those of the early Greeks, who said for the material side of the question that the nature of the universe is earth, fire, water, and air. The logical side of the nature of the universe requires further exploration.
Hamming, it seems, has dodged the question as well. But, Hamming's point that we have changed mathematics to suit our needs is important. Let's return to the idea that computer science and the algorithm offer a path toward capturing the regularity of complex systems, e.g., social and biological ones. Historically, we've demanded that algorithms yield guarantees on their results, and that they don't take too long to return them. For example, we want to know that our sorting algorithm will actually sort a list of numbers, and that it will do so in the time we allow. Essentially, our formalisms and methods of analysis in computer science have been driven by engineering needs, and our entire field reflects that bias.
But, if we want to use algorithms to accurately model complex systems, it stands to reason that we should orient ourselves toward constraints that are more suitable for the kinds of behaviors those systems exhibit. In mathematics, it's relatively easy to write down an intractable system of equations; similarly, it's easy to write down an algorithm whose behavior is impossible to predict. The trick, it seems, will be to develop simple algorithmic formalisms for modeling complex systems that we can analyze and understand in much the same way that we do for mathematical equations.
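As a tiny illustration of how easy it is to write down an algorithm whose long-term behavior defies analysis, consider the Collatz map, a standard example of my own choosing (not one from Chazelle's essay): a rule that fits in three lines, yet whether every starting value eventually reaches 1 remains an open problem.

```python
# The Collatz map: halve n if it is even, otherwise send n to 3n + 1.
# Whether this always terminates at 1 is a famous open question.

def collatz_steps(n):
    """Count iterations of the Collatz map until n reaches 1 (assumes it does)."""
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps
```

For instance, starting from 6 the trajectory is 6, 3, 10, 5, 16, 8, 4, 2, 1, yet no general formula for the trajectory length is known.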
I don't believe that one set of formalisms will be suitable for all complex systems, but perhaps biological systems are consistent enough that we could use one set for them, and perhaps another for social systems. For instance, biological systems are all driven by metabolic needs, and by a need to maintain structure in the face of degradation. Similarly, social systems are driven by, at least, competitive forces and asymmetries in knowledge. These are needs that things like sorting algorithms have no concept of.
 A common theme, it seems. What topic wouldn't be complete without its own Wikipedia article?
November 23, 2006
Unreasonable effectiveness (part 1)
Einstein apparently once remarked that "The most incomprehensible thing about the universe is that it is comprehensible." In a famous paper in Communications on Pure and Applied Mathematics (13 (1), 1960), the physicist Eugene Wigner (Nobel in 1963 for his work on the theory of the atomic nucleus and elementary particles) discussed "The Unreasonable Effectiveness of Mathematics in the Natural Sciences". The essay is not too long (for an academic piece), but I think this example of the application of mathematics gives the best taste of what Wigner is trying to point out.
The second example is that of ordinary, elementary quantum mechanics. This originated when Max Born noticed that some rules of computation, given by Heisenberg, were formally identical with the rules of computation with matrices, established a long time before by mathematicians. Born, Jordan, and Heisenberg then proposed to replace by matrices the position and momentum variables of the equations of classical mechanics. They applied the rules of matrix mechanics to a few highly idealized problems and the results were quite satisfactory.
However, there was, at that time, no rational evidence that their matrix mechanics would prove correct under more realistic conditions. Indeed, they say "if the mechanics as here proposed should already be correct in its essential traits." As a matter of fact, the first application of their mechanics to a realistic problem, that of the hydrogen atom, was given several months later, by Pauli. This application gave results in agreement with experience. This was satisfactory but still understandable because Heisenberg's rules of calculation were abstracted from problems which included the old theory of the hydrogen atom.
The miracle occurred only when matrix mechanics, or a mathematically equivalent theory, was applied to problems for which Heisenberg's calculating rules were meaningless. Heisenberg's rules presupposed that the classical equations of motion had solutions with certain periodicity properties; and the equations of motion of the two electrons of the helium atom, or of the even greater number of electrons of heavier atoms, simply do not have these properties, so that Heisenberg's rules cannot be applied to these cases.
Nevertheless, the calculation of the lowest energy level of helium, as carried out a few months ago by Kinoshita at Cornell and by Bazley at the Bureau of Standards, agrees with the experimental data within the accuracy of the observations, which is one part in ten million. Surely in this case we "got something out" of the equations that we did not put in.
As someone (apparently) involved in the construction of "a physics of complex systems", I have to wonder whether mathematics is still unreasonably effective at capturing these kinds of inherent patterns in nature. Formally, the kind of mathematics that physics has historically used is equivalent to a memoryless computational machine (if there is some kind of memory, it has to be explicitly encoded into the current state); but, the algorithm is a more general form of computation that can express ideas that are significantly more complex, at least partially because it inherently utilizes history. This suggests to me that a physics of complex systems will be intimately connected to the mechanics of computation itself, and that select tools from computer science may ultimately let us express the structure and behavior of complex, e.g., social and biological, systems more effectively than the mathematics used by physics.
One difficulty in this endeavor, of course, is that the mathematics of physics is already well-developed, while the algorithms of complex systems are not. There have been some wonderful successes with algorithms already, e.g., cellular automata, but it seems to me that there's a significant amount of cultural inertia here, perhaps at least partially because there are so many more physicists than computer scientists working on complex systems.
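To show just how little machinery a successful algorithmic model can require, here is a minimal elementary cellular automaton in Python. The rule numbering follows Wolfram's convention; rule 110 (used in the test below) is a standard example known to produce complex behavior, though the ring-shaped boundary here is my own simplifying choice.

```python
# An elementary cellular automaton: each cell is 0 or 1, and its next state is
# determined by its own state and its two neighbors', looked up in an 8-bit rule.

def ca_step(cells, rule=110):
    """One synchronous update of a binary ring under an elementary CA rule
    (Wolfram numbering): the neighborhood (left, self, right) forms a 3-bit
    index into the rule's binary expansion."""
    n = len(cells)
    return [(rule >> (cells[(i - 1) % n] * 4 + cells[i] * 2 + cells[(i + 1) % n])) & 1
            for i in range(n)]
```

Iterating this one-liner from a single live cell already generates the intricate patterns that make cellular automata the poster child for algorithmic models of complexity.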
September 24, 2006
Things to read while the simulator runs; part 2
In my frenetic drive to get a paper revised and resubmitted to a journal, and a parallel project writing a lengthy solo-paper on complex systems methodology, I now have a backlog of interesting tidbits to share. So, rather than blog about each individually, I'm collecting them together in the second of our multi-part series of things to read while you wait for the simulator to finish.
The Dwight H. Terry Lectures (at Yale) on Science and Religion. These are a series of six videos of lectures by prominent thinkers on the subject. Notables include Dr. Lawrence M. Krauss and Dr. Kenneth R. Miller. Well worth the time, with Ken Miller's talk being highly enjoyable. (tip to Carl Zimmer)
As if this topic is on many people's minds of late, Cosma blogs about Ginsparg's excellent perspective piece in The Journal of Neuroscience entitled As We May Read (which is a riff on Vannevar Bush's 1945 piece in The Atlantic entitled As We May Think, about the obligation of post-war scientists to make the store of human knowledge more accessible). Ginsparg, the creator of the arXiv and a recipient of a MacArthur award in 2002, has a lot to say about the future of academic publishing and the role that journals have in disseminating information. On a related point, hunch.net has some perspective on the development of collaborative research, and, for instance, the impact of Wikipedia on Vannevar Bush's dream.
Bernard Chazelle, professor of Computer Science at Princeton (who has graced this blog before), has penned a new version of his perspective piece on computers, and the significance of the algorithm to modern science. (tip to Suresh)
Many of you may recall Larry Summers, former president of Harvard, commenting on why women are under-represented among scientists in this country. The National Academy of Sciences has finally weighed in on the subject with a comprehensive report in which they completely dismantle Summers' claim, showing that women are under-represented because of systemic biases in the institutions of the academy. Cornelia Dean writes on this for the New York Times. Bitch Ph.D. writes "I, personally, am expecting the apologies from Larry Summers' apologists to start pouring in any day now." Hear, hear. (tip to Cosmic Variance)
Liberal arts colleges have a special place in the constellation of academic training, although most Americans would be hard-pressed to name any of them. Writing in a 1999 special issue of Daedalus on liberal arts colleges, Thomas R. Cech (Nobel Prize in Chemistry, 1989; current President of the HHMI; graduated from Grinnell College in 1970) writes on the disproportionately large number of liberal arts graduates who make up the nation's top scientists, why liberal arts colleges can give a better science education than large universities, and the importance of protecting the contributions these institutions make to science.
August 03, 2006
Things to read while the simulator runs; part 1
To commemorate the creation of a new recurrent topic here, two things caught my attention today:
Cosmic Variance, always an inimitable source of cosmological weirdness, has a wonderfully detailed discussion of Boltzmann's Anthropic Brain. That is, the argument that Boltzmann used, while thinking deeply about the nature of this entropy business and its incessant drumbeat of chaos, to explain the origin of the universe. The discussion begins with a simple question, "Why is the past different from the future, or equivalently, why was the entropy in the early universe so much smaller than it could have been?", and proceeds apace.
The unexpected consequence of Boltzmann’s microscopic definition of entropy is that the Second Law is not iron-clad — it only holds statistically... Faced with the deep puzzle of why the early universe had a low entropy, Boltzmann hit on the bright idea of taking advantage of the statistical nature of the Second Law. Instead of a box of gas, think of the whole universe. Imagine that it is in thermal equilibrium, the state in which the entropy is as large as possible. By construction the entropy can’t possibly increase, but it will tend to fluctuate, every so often diminishing just a bit and then returning to its maximum.
You can see where this is going: maybe our universe is in the midst of a fluctuation away from its typical state of equilibrium. The low entropy of the early universe, in other words, might just be a statistical accident, the kind of thing that happens every now and then. On the diagram, we are imagining that we live either at point A or point B, in the midst of the entropy evolving between a small value and its maximum. It’s worth emphasizing that A and B are utterly indistinguishable. People living in A would call the direction to the left on the diagram “the past,” since that’s the region of lower entropy; people living at B, meanwhile, would call the direction to the right “the past.”
Sean, our tour guide for this picturesque stroll through Boltzmann's mind, proceeds to explain why there is more going on here. Although the argument is flawed for classical reasons, it also fails more directly because it omits things we can't fairly hold Boltzmann responsible for not knowing: General Relativity, inflation theory, and quantum mechanics. It seems that the mystery of the arrow of time (or of time at all!) remains.
And finally, in case you have not been hiding under a rock for the past two years, you may have heard of a game called Sudoku. Many of my non-academic friends have succumbed to the faddishness of them, and I spy my neighbors on the plane scribbling away in books of these things. Even my younger sister is playing them now, on her Nintendo DS - oh, how the tools of procrastination have advanced since my youth, when we were forced to entertain ourselves with poorly wrapped bundles of dead trees. But, I do find Sudoku to be an interesting computational puzzle. That is, how difficult is it to solve these things, in general? The generalized n-squared-by-n-squared version of the problem is clearly in NP, and is in fact known to be NP-complete.
Lance Fortnow uses its NP membership as a nice little vehicle to explain how you can do a zero-knowledge interactive proof using Sudoku.
Victor has tried and failed to solve the latest Sudoku game and exclaims that no solution exists. His wife Paula has already solved the game. How does Paula convince Victor that a solution exists without giving the solution away?
It turns out that this can be easily done without the usual mapping to graph coloring. The result is intuitively pleasant and, because of Sudoku's popularity, probably highly accessible outside computer science. Not a bad use of something that reminds me of a glorified constraint satisfaction problem, or of that unpleasant "analytic" section of the GRE. In fact, any "puzzle" that can be solved by manually running a simple backtracking algorithm, or via first-order logic on the initial configuration, seems like a waste of brain power to me. Why not just write a computer program to solve it, and instead spend your time reading something interesting?
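To underline how mechanical the backtracking approach is, here is a bare-bones Sudoku solver in Python, a generic sketch of the standard technique rather than anything tied to Fortnow's post. It tries digits in order, recurses, and undoes any choice that leads to a dead end; no insight required.

```python
# A plain backtracking Sudoku solver: 0 marks an empty cell in a 9x9 grid.

def ok(grid, r, c, d):
    """True if digit d can legally be placed at row r, column c."""
    if any(grid[r][j] == d for j in range(9)):
        return False
    if any(grid[i][c] == d for i in range(9)):
        return False
    br, bc = 3 * (r // 3), 3 * (c // 3)  # top-left corner of the 3x3 box
    return all(grid[br + i][bc + j] != d for i in range(3) for j in range(3))

def solve(grid):
    """Fill the grid in place by depth-first search; return True if solvable."""
    for r in range(9):
        for c in range(9):
            if grid[r][c] == 0:
                for d in range(1, 10):
                    if ok(grid, r, c, d):
                        grid[r][c] = d
                        if solve(grid):
                            return True
                        grid[r][c] = 0  # undo and try the next digit
                return False  # no digit fits here: backtrack
    return True  # no empty cells remain
```

The whole thing is a dozen lines of bookkeeping, which is rather the point: the puzzle yields to rote search, not cleverness.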
Update, Aug. 24: in truth, even problems that admit a more elegant solution can be solved by brute force via backtracking or reduction to satisfiability, but Sudoku doesn't yet admit such elegance, and I've never met anyone who solves them by doing something other than the two (boring) methods I mention. So, in my mind, by extension, that makes Sudoku a boring activity.