March 30, 2012
Visualizing the ocean and the wind
Last week while I was in Germany at the DPG 2012, I noticed some buzz about a cool visualization of wind flows over the United States. The web animation, put together by Fernanda Viégas and Martin Wattenberg (who lead Google's "Big Picture" visualization research group), pulls surface wind speed and direction forecast data directly from NOAA's National Digital Forecast Database. While these are forecasts rather than direct measurements, I expect that the National Weather Service forecasts are pretty close to what actually happens. These wind patterns are a mesmerizing example of fluid turbulence.
Another beautiful example of global turbulence comes from a recent visualization of the world's ocean surface currents (here's a popular description), put together by NASA using three years of satellite and other data. These empirical measurements were used to parameterize a detailed ocean dynamics simulation called the MIT General Circulation Model. The output of the model is a visualization that looks a lot like van Gogh's Starry Night, with whorls and vortices abounding.
One cool thing is the scale of the coherent flows. We've all seen pictures of vortices in fluids, but typically these are somewhere between a few millimeters and a few meters in size. But, like a truly turbulent system, ocean currents exhibit structure at all scales, and that means vortices up to hundreds of miles across, in addition to all the small scale structure we normally think about. These are so big that you wouldn't know you were in a whirlpool because the curvature of the flow is so gentle that it would just feel like a regular current. Other cool things include the vortex shedding around South Africa, which persists well out into the Atlantic Ocean, thousands of miles away.
The wind visualization uses almost real-time model-based forecasts, but the ocean visualization is reconstructed from historical data. It would be especially cool if the latter could also be done in near real time. I can't think of a practical benefit for it (well, maybe container ships or pirates would like to know), but it would be cool.
March 19, 2012
Oops, I tweeted again
After some peer pressure from friends, I've signed up for Twitter. This will be a purely professional account, focusing on science and research. If you're into that kind of thing, you can follow me @aaronclauset.
March 15, 2012
Today is a milestone. About a year ago, I blogged about the meteoric rate at which my paper with Cosma Shalizi and Mark Newman, on power-law distributions in empirical data, was collecting citations. On that day, the paper had just crossed 500 Google Scholar citations and I used that milestone as an excuse to ask when it might cross the mind-boggling 1000 citations. Since we know a thing or two about citation counts, I decided to apply a little model-based statistical forecasting to come up with a principled guess.
This produced a probability distribution of answers, with the modal crossing date among all the bootstrap models being 21 April 2012 (the 90% bootstrap confidence interval was 11 Jan. 2012 to 29 Nov. 2013, but the bulk of the distribution is centered on Spring 2012). And, to my surprise and great amusement, 15 March 2012 was the actual crossing date, only about a month off from the prediction. Here's what the forecasts from a year ago looked like, along with the actual citation data overlaid. I've also marked where on the forecast distribution the actual crossing date landed.
In the new citation data, we again see strange drops and jumps in the citation count. These are presumably from the Google Scholar team tinkering with their algorithms. In fact, the crossing today was caused by the sudden appearance of 44 new citations in the past 5 days, which is well above the normal accumulation rate. But, since this may have been a change to the algorithm that restored the citations misplaced in the large drop that occurred in late 2011, it seems reasonable to treat this as a real event. Either way, the closeness of the true crossing date to the forecasted one is a little eerie.
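For readers curious how a bootstrap forecast of a crossing date works in general, here is a minimal sketch in Python. It is not the model behind the forecast discussed above, whose details are not given in this post. It assumes, purely for illustration, that future weekly citation gains look like past weekly gains resampled with replacement. The weekly gains, the start date, and the function names are all made up.

```python
import random
from collections import Counter
from datetime import date, timedelta


def bootstrap_crossing_dates(increments, current_total, target, start,
                             n_boot=1000, seed=0, max_steps=10_000):
    """Simulate n_boot futures by resampling past weekly gains.

    Returns the sorted list of dates on which each simulated
    citation count first reaches `target`.
    """
    rng = random.Random(seed)
    crossings = []
    for _ in range(n_boot):
        total, steps = current_total, 0
        # max_steps guards against histories whose gains are not positive.
        while total < target and steps < max_steps:
            total += rng.choice(increments)  # one resampled week
            steps += 1
        crossings.append(start + timedelta(weeks=steps))
    return sorted(crossings)


def percentile_interval(sorted_values, level=0.90):
    """Central percentile interval of an already sorted sample."""
    n = len(sorted_values)
    lo = sorted_values[int((1 - level) / 2 * n)]
    hi = sorted_values[min(n - 1, int((1 + level) / 2 * n))]
    return lo, hi


# Hypothetical inputs: eight made-up weekly gains and a made-up start date.
weekly_gains = [8, 11, 9, 12, 10, 7, 13, 10]
forecast_start = date(2011, 3, 15)

dates = bootstrap_crossing_dates(weekly_gains, 500, 1000, forecast_start)
modal_date, _ = Counter(dates).most_common(1)[0]
low, high = percentile_interval(dates)
print("modal crossing date:", modal_date)
print("90% interval:", low, "to", high)
```

The modal date and the percentile interval play the same roles as the modal crossing date and the 90% bootstrap interval quoted above, though the numbers this toy produces mean nothing.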
So, there you have it. A milestone. Huzzah. Perhaps I'll buy a mug to commemorate the event.
It is worth saying that the popularity of this paper has been both pleasantly surprising and gratifying, and I am immensely grateful for what great collaborators Cosma and Mark were on the paper.
March 12, 2012
Friends for the win!
Individuals often compete for personal status, for jobs, for mates; groups of people, whether formalized as an organization or not, often compete for glory, for dominance, for financial rewards. Although the most visible form of human competition today is probably professional sports, competition via computer games is an increasingly common form of entertainment for regular people [1]. And, most of us play these games socially, playing with or against our friends, but sometimes playing with strangers.
Last year, with Winter Mason, I started a project aimed at understanding the dynamics of complex competitions where decisions are made largely in real-time under large uncertainties at both the player and the game level [2]. The idea was to investigate whether there are general patterns in the way humans compete in these environments, how well we could explain those patterns in terms of exogenous effects like player skill versus endogenous effects due to the rules of the game or the game environment, and whether we could build better tools for either predicting the outcome of competitions or for designing better games overall.
This is all very high-minded, but the starting point was much more mundane: the rich and detailed data that Bungie made available for their blockbuster MMOFPS [3] Halo: Reach. It is hard to describe just how Big this data is. When I blogged about this project last summer, Reach players had already produced a staggering 700,000,000 competitions. The number now stands at 958,887,052 (and counting) [4]. Through a web API, Bungie let us download statistics about each and every game of Reach ever played.
These data provided the raw behavioral information about what happens inside the 4-on-4 or 8-on-8, etc. player-versus-player competitions (as well as data on the various player-versus-environment game types). But they didn't tell us much about who was playing. To gather this information, we launched an anonymous web survey in which we asked Reach players to tell us a little about themselves, how they play Halo and who they play it with. The goal was to get real data from real players so that we could understand the role that friendships play in determining success by both the individual and the team in these complex competitive environments.
What we found was cool and surprising in several ways. Friendships, it turns out, are extremely important in shaping not only the performance of individuals, but also their teams. Friendships also shape the way we play the game.
Before diving into the friendships stuff, let's start with some cute results from the survey. First, we had 1191 unique individuals represented in the survey. The distribution of reported ages looks like this:
Unsurprisingly, there's a large population of college-aged players, and the median age was 20 years old [5]. This contrasts with the statistics for MMORPGs, where the plurality of players are in their 30s. In our sample, only about 13% of players reported being 30 or older, so Reach is largely played by younger adults. Another interesting point is that the average number of hours per week spent playing video games of all kinds by our participants was 23.3 (3.3 hours per day). This might seem high to non-gamers, but it is slightly lower than the 25.9 reported for MMORPG players and the 27.5 reported in 2007 by the industry association for all gamers [6]. The point is that our survey participants were not unusually serious gamers [7].
By looking at the game histories of each of our participants, we did discover several interesting age-related patterns. First, unlike the stereotype described so vividly in Gus Mastrapa's Wired Magazine article "21st-Century Shooters Are No Country for Old Men," older players are, in fact, better at the game (kills per game) than younger players, and this is especially true for the team-oriented players. Here's the figure:
The difference in the number of kills between the age groups is not large, but it is definitely an increase. Also, we define "older players" as being at least 24 years old (the oldest 30% of the population), which may not be what everyone thinks of as being "old". Second, in Reach, it's possible to make an "own goal" by killing a player on your own team. If this happens, it counts as a penalty against the team and may result in the offending player being booted from the competition. What we found is that younger players (17 or younger) commit this anti-social act much more often than older players (18 or older):
Age does seem to correlate with the preferred style of playing, with younger players (slightly) favoring the "lone wolf" style. This supports one popular perception about younger players, but it turns out that younger players are not actually better at this role than older players (see the previous figure). That all being said, most players (almost 80%) prefer team-oriented roles. That is, Reach players seem to be strongly motivated by the social aspects of the game.
In fact, players seem to structure their activities within the game around opportunities to play with friends. Using fairly simple heuristics like looking at the length of "runs" of two players playing together, we can fairly easily recover the ground-truth labels on friendship we collected. That is, we asked our survey respondents to tell us who among all the other players they played games with were their friends. Accurately guessing these friendship labels turns out not to be a hard task when you have access to the game history alone.
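To make the "runs" heuristic concrete, here is a small Python sketch of one way such a rule could work. It illustrates the general idea rather than the exact procedure from our analysis: the data layout (a chronological list of co-player sets for one focal player), the threshold `min_run`, and the function names are all assumptions made for this example.

```python
from collections import defaultdict


def longest_runs(game_history):
    """Longest streak of consecutive games shared with each co-player.

    game_history: chronological list of sets, one per game, holding the
    IDs of the other players in that game (hypothetical layout).
    """
    longest = defaultdict(int)
    current = defaultdict(int)
    for co_players in game_history:
        # Anyone missing from this game has their current run broken.
        for player in list(current):
            if player not in co_players:
                del current[player]
        # Everyone present extends (or starts) a run.
        for player in co_players:
            current[player] += 1
            longest[player] = max(longest[player], current[player])
    return dict(longest)


def guess_friends(game_history, min_run=3):
    """Label as a likely friend anyone with a run of at least min_run games.

    min_run=3 is an arbitrary illustrative threshold, not a fitted value.
    """
    runs = longest_runs(game_history)
    return {player for player, run in runs.items() if run >= min_run}


# Made-up history: "a" sticks around for three games in a row.
history = [{"a", "b"}, {"a", "c"}, {"a", "d"}, {"e"}, {"a", "b"}]
print(longest_runs(history))
print(guess_friends(history))
```

The intuition is that matchmaking shuffles strangers between games, so someone who keeps reappearing in consecutive games is probably there on purpose.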
Given that we can identify the friends, we can now ask whether playing with them changes a person's behavior in the game or changes their success. The answers are yes and yes. Two places we see a strong friendship effect are again with the betrayals, and with "assists," where two players cooperate to score a point.
It turns out that friendship matters a lot and encourages strong pro-social (cooperative) behavior within a competition. As the number of friends on a given team increases, the number of assists increases while the number of betrayals decreases. And, these are large effects, with the assist rate increasing by almost 50% and the betrayal rate decreasing by almost 25% between a team of all strangers and a team of all friends [8].
Friendship also has positive effects on both the performance of individual players and the team overall. That is, friends who play together tend to play better together than when they play on their own, even if they play with other good players. This shows up both in the net number of points scored by a player (above and beyond what you'd expect based on skill alone) and the probability of winning, both of which increase with the proportion of friends on the team.
What's important about this "friends for the win" effect is that it appears despite Reach's best efforts to eliminate it. That is, when Reach assembles a new competition from the pool of currently online players, it explicitly tries to balance the teams so that they have equal skill levels. From a game designer's perspective, balance is important because a mismatch might lead to a frustrating user experience: a fun competition is a close match. But, the algorithm Reach uses [9] does not control for the synergistic effect that comes from playing with friends, an effect that we see clearly in the data.
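As noted in the footnote on the matchmaking algorithm, Reach uses TrueSkill, which treats a team's skill as the sum of its members' skills. The Python sketch below shows what that additive assumption implies for a predicted win probability, and how a synergy term would shift it. The skill values are the conventional TrueSkill defaults (mean 25, standard deviation 25/3, performance noise 25/6), not Reach's actual settings, which I don't know. The `bonus_a` parameter is a purely hypothetical stand-in for a friendship effect and is not part of TrueSkill.

```python
import math


def team_skill(players):
    """Additive team skill: sum of means and sum of variances.

    players: list of (mu, sigma) tuples, one per team member.
    """
    mu = sum(m for m, _ in players)
    var = sum(s * s for _, s in players)
    return mu, var


def win_probability(team_a, team_b, beta=25 / 6, bonus_a=0.0):
    """Probability that team A beats team B under the additive model.

    bonus_a is a hypothetical synergy bonus added to team A's mean;
    with bonus_a=0 this is the plain additive prediction.
    """
    mu_a, var_a = team_skill(team_a)
    mu_b, var_b = team_skill(team_b)
    n_players = len(team_a) + len(team_b)
    # Total uncertainty: skill variances plus per-player performance noise.
    denom = math.sqrt(var_a + var_b + n_players * beta * beta)
    z = (mu_a + bonus_a - mu_b) / denom
    # Standard normal CDF written with the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Two 4-player teams with identical (default) skill estimates.
default_player = (25.0, 25 / 3)
strangers = [default_player] * 4
friends = [default_player] * 4

print(win_probability(friends, strangers))               # balanced on paper
print(win_probability(friends, strangers, bonus_a=5.0))  # with made-up synergy
```

A matchmaker that sees only the first number would call this a fair match, which is exactly the blind spot described above.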
It is not really surprising that teams composed of "friends" [10] do better than teams composed of strangers. Friends have likely spent considerable time practicing together and thus may be able to effectively anticipate or adapt to each other's actions or strategies without an explicit need for verbal (and thus time-consuming) communication or coordination. Friends may be able to divide up multi-person tasks more efficiently by falling into familiar, pre-determined, and practiced roles. And, these benefits are precisely what sports teams and military units are aiming to reap when they train together. What's nice is that we see these effects appear even in a virtual environment like Halo, suggesting that they may be fairly universal, and not merely limited to the traditional domains like sports and war, where practicing together has a long tradition.
There's more, of course, but these were some results that seemed particularly interesting. If you'd like to read the rest, there's an arXiv version of the paper available [11].
[1] The Entertainment Software Association claims that in 2011, 72% of American households played computer or video games. It's not clear exactly what they count as "playing" a game (probably something like "did you do it a non-zero number of times over all of 2011"), but it's certainly a very common form of entertainment today. The report I linked to above is filled with made-for-media factoids and you can absorb the entire 13-page document in about 30 seconds of skimming.
[2] This contrasts with classic game theory where generally much more of the game structure is known to the players and decision-making is typically not so highly constrained. That being said, there are some interesting extensions of game theory to similar domains.
[3] World of Warcraft, a social online RPG-style computer game played by millions, is called a massively multiplayer online role playing game, or MMORPG. But Halo: Reach, a social console FPS-style game played by millions, should probably be called a massively multiplayer online first person shooter, or MMOFPS.
[4] To give you a sense of the raw popularity of this game, and how quickly its popularity faded, the first 130,000,000 competitions were generated within the first 2 weeks after the game was released on 14 September 2010. That rate of roughly 10M games per day then gradually declined to about 2M games per day by 6 months later. That is an immense amount of Halo.
[5] You'll notice the anomalously large spike at 18. This is almost surely due to under-18-year-olds misreporting their age in order to bypass the IRB-required parental consent step in the survey. But, the left tail of the distribution does not look badly distorted and a large number of under-18s did successfully participate despite the extra consent step.
[6] The MMORPG number is likely fairly accurate. The industry association number may not be.
[7] But they were certainly unusually skilled at Reach relative to the typical player. This is not surprising given that we advertised the study through Halo community forums, where folks with a serious emotional investment in the game tend to hang out.
[8] The fact that the betrayal rate does not go to zero suggests that friendship only goes so far toward encouraging purely pro-social behavior.
[9] It uses the TrueSkill algorithm, which by design assumes that the skill of a team is the sum of the skills of the individual team members.
[10] As a caveat, it's true that we have not been precise about what exactly we mean by friendship here. We did not tell our survey respondents exactly what "friendship" meant, but instead allowed them to decide for themselves who was and wasn't an "online" or "offline" friend. Respondents did use the distinct labels, so they do mean something. That being said, it is possible, even plausible, that people labeled as "online friends" were, in fact, simply familiar individuals with whom they have practiced a great deal, rather than some stronger notion. Or, it could indicate a stronger bond. It's not clear.
[11] Winter Mason and Aaron Clauset, "Friends FTW! Friendship and competition in Halo: Reach." Preprint, arXiv:1203.2268 (2012).