Things to read while the simulator runs; part 8
1.
While chatting with Jake Hofman the other day, he pointed me to some analysis by the Facebook Data Team about the way people use online social networks. One issue that seems to come up pretty regularly with Facebook is how many of your "friends" are "real" in some sense (for instance, this came up on a radio show this morning, and my wife routinely teases me for having nearly 400 "friends" on Facebook).
The answer, according to the Facebook Data Team, is that while it depends on how you define "real," with access to the underlying data, you can pretty clearly see how much interaction actually flows across the different links. One neat thing they found (within a lot of interesting analysis) is that the amount of interaction across all your connections scales up with the number of connections you have. That is, the more friends you have, the more friends you interact with. (It can't be a linear relationship, though, since otherwise, people with 1000s of friends would be spending all of their free time on Facebook... oh wait, some people actually do that.)
2.
A related point that I've found myself discussing several times recently with my elders (some of whom I think are, at some level, alienated and befuddled by computer and Web technology), is whether Facebook (or, technology in general) increases social isolation, and thus is leading to some kind of collapse of civil society. I've argued passionately that it's human nature to be social and thus extremely unlikely that technology alone is having this effect, and that technology instead actually facilitates social interactions, allowing people to be even more social overall (even if they may spend slightly less time face-to-face) than before. Mobile phones are my favorite example of social facilitation, since they allow people to interact with their friends in situations when previously they could not (e.g., standing in line at the bank, walking around town, etc.), even if occasionally it leads to ridiculous situations like two people sitting next to each other, but each texting or talking on their phones with people elsewhere.
And, just in time to bolster my arguments, the Pew Internet and American Life Project released a study this week (also discussed in the NYTimes) showing that technology users are more social than non-technology users, and that other, non-technological trends are to blame for the apparent decrease in the size of (non-technology using) Americans' social circles over the past 20 years. Of course, access to and use of technology often correlates with affluence, so what really might be going on is that, as with nutrition, the affluent are better positioned to lead physically and socially healthy lives than the poor.
3.
Recently, for a project on evolution, I've been reading pretty deeply in the paleontology and marine mammal literature (more on that in the next post). The first thing that I noticed is how easy it is now to access vast amounts of scientific literature from the comfort of your office. Occasionally, I had to get up to see Margaret, our librarian, but most of the time I could get what I needed through electronic access. But, sometimes I would encounter a paywall that my institutional access wouldn't allow me to circumvent.
At first, it was extremely irritating and induced open-access revolutionary spirits in me. Then, I did what I suspect many of you have done, too, which is to ask my friends at other universities to try to get access to the paper using their institutional access, and to send me a copy. On a small scale, this is like asking your friends to share individual musical tracks with you. So, naturally, the logical solution to the problem is to make a P2P sharing system for scientific papers, right? Exactly. There's apparently already such a system for mainly medical papers, but I think the time is ripe for something more ambitious. Given what's been learned about how to run a good P2P system for music, it should be pretty simple to develop a good system (distributed, searchable, scalable) for sharing PDFs of journal papers, right? I can't wait until the academic publishing industry starts suing researchers for sharing papers...
4.
If you're male, when you use a public restroom, what do you think about for those seconds while your body is busy but your mind is free to wander? Randall Munroe, of xkcd fame, apparently thinks about the mathematics of restroom awkwardness and minimum-awkwardness packing arrangements for men using urinals. Who knew something so mundane could be so amusing?
5.
Finally, this next bit is already almost a year old, but it's just so good. Remember last year when the media went predictably bonkers over two studies, by Nicholas Christakis and James Fowler, showing that happiness and obesity were (socially) contagious? That is, if you're depressed, you can blame your friends for not cheering you up, and if you're fat, you can blame your friends for making you eat poorly. (Or, wait, maybe it's that misery loves company...?) Shortly after those studies hit the media, a wonderful followup study was published by Cohen-Cole and Fletcher. Their study used the same techniques as Christakis and Fowler and showed that acne, headaches and height are also socially contagious! If only we had the data, I'm sure social network analysis could show that hair color, IQ and wealth are socially contagious, too. Their concluding thoughts say it all, really:
There is a need for caution when attributing causality to correlations in health outcomes between friends using non-experimental data. Confounding is only one of many empirical challenges to estimating social network effects, but researchers do need to attempt to minimise its impact. Thus, while it will probably not be harmful for policy makers and clinicians to attempt to use social networks to spread the benefits of health interventions and information, the current evidence is not yet strong enough to suggest clear evidence based recommendations. There are many unanswered questions and avenues for future research, including use of more robust empirical methods to assess social network effects, crafting and implementing additional empirical solutions to the many difficulties with this research, and further understanding of how social networks are formed and operate.
E. Cohen-Cole and J. M. Fletcher, "Detecting implausible social network effects in acne, height, and headaches: longitudinal analysis." BMJ 337, a2533 (2008).
Posted on November 06, 2009 in Things to Read | permalink | Comments (1)
The trouble with community detection
I'm a little (a month!) late in posting it, but here's a new paper, largely by my summer student Ben Good, about the trouble with community detection algorithms.
The short story is that the popular quality function called "modularity" (invented by Mark Newman and Michelle Girvan) admits serious degeneracies that make it somewhat impractical to use in situations where the network is large or has a non-trivial number of communities (a.k.a. modules). At the end of the paper, we briefly survey some ways to potentially mitigate this problem in practical contexts.
The performance of modularity maximization in practical contexts
Benjamin H. Good, Yves-Alexandre de Montjoye, Aaron Clauset, arxiv:0910.0165 (2009).
Although widely used in practice, the behavior and accuracy of the popular module identification technique called modularity maximization is not well understood. Here, we present a broad and systematic characterization of its performance in practical situations. First, we generalize and clarify the recently identified resolution limit phenomenon. Second, we show that the modularity function Q exhibits extreme degeneracies: that is, the modularity landscape admits an exponential number of distinct high-scoring solutions and does not typically exhibit a clear global maximum. Third, we derive the limiting behavior of the maximum modularity Q_max for infinitely modular networks, showing that it depends strongly on the size of the network and the number of module-like subgraphs it contains. Finally, using three real-world examples of metabolic networks, we show that the degenerate solutions can fundamentally disagree on the composition of even the largest modules. Together, these results significantly extend and clarify our understanding of this popular method. In particular, they explain why so many heuristics perform well in practice at finding high-scoring partitions, why these heuristics can disagree on the composition of the identified modules, and how the estimated value of Q_max should be interpreted. Further, they imply that the output of any modularity maximization procedure should be interpreted cautiously in scientific contexts. We conclude by discussing avenues for mitigating these behaviors, such as combining information from many degenerate solutions or using generative models.
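To make the resolution limit and the degeneracy a bit more concrete, here is a minimal Python sketch. It is not the code from the paper. It assumes the standard Newman-Girvan definition of modularity, Q = sum over modules of [e_c/m - (d_c/2m)^2], where m is the number of edges, e_c the number of edges inside module c, and d_c the total degree of module c. The toy network (a ring of 30 five-node cliques joined by single edges) and all function names are my own choices for illustration.

```python
from itertools import combinations


def ring_of_cliques(num_cliques, clique_size):
    """Edge list for a toy network: cliques joined in a ring by single edges."""
    edges = []
    for c in range(num_cliques):
        first = c * clique_size
        nodes = range(first, first + clique_size)
        edges.extend(combinations(nodes, 2))  # all edges inside clique c
        # one edge from the last node of clique c to the first node of the next
        next_first = ((c + 1) % num_cliques) * clique_size
        edges.append((first + clique_size - 1, next_first))
    return edges


def modularity(edges, module_of):
    """Q = sum_c [ e_c / m - (d_c / 2m)^2 ] for an undirected, unweighted graph."""
    m = len(edges)
    internal = {}  # e_c: number of edges with both ends in module c
    degree = {}    # d_c: total degree of the nodes in module c
    for u, v in edges:
        cu, cv = module_of[u], module_of[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree.items())


def grouped_partition(num_cliques, clique_size, group, shift=0):
    """Put `group` consecutive cliques in each module, starting at offset `shift`."""
    return {
        c * clique_size + i: ((c + shift) % num_cliques) // group
        for c in range(num_cliques)
        for i in range(clique_size)
    }


K, N = 30, 5  # assumed toy sizes, chosen only for illustration
edges = ring_of_cliques(K, N)
singles = grouped_partition(K, N, group=1)                 # one clique per module
pairs = grouped_partition(K, N, group=2)                   # cliques (0,1), (2,3), ...
shifted_pairs = grouped_partition(K, N, group=2, shift=1)  # cliques (29,0), (1,2), ...

q_singles = modularity(edges, singles)
q_pairs = modularity(edges, pairs)
q_shifted = modularity(edges, shifted_pairs)
print(f"one clique per module: Q = {q_singles:.4f}")
print(f"adjacent pairs:        Q = {q_pairs:.4f}")
print(f"pairs, shifted by one: Q = {q_shifted:.4f}")
```

On this toy network, merging adjacent cliques scores higher than the "obvious" partition with one clique per module (the resolution limit), and the two different ways of pairing the cliques tie with each other. Those two tied partitions are only the simplest of many distinct high-scoring solutions, which is the flavor of degeneracy the abstract describes.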
Posted on November 03, 2009 in Networks | permalink | Comments (0)
Happy halloween!
Last year I was in New York City for Halloween. But, this year, I was at home, which meant it was time to carve another pumpkin. This time, I made a starry night:

(This was my first time using power tools to carve a pumpkin, and I have to say, they make it a lot easier and a lot more fun!)
Posted on October 31, 2009 in Self Referential | permalink | Comments (3)
Irony, tinged with truth
During the G-20 protests in Pittsburgh held September 22-25, CMU machine learning students took to the streets to support their causes. "Support vector machines!" and "Ban genetic algorithms!", they demanded. "Bayesians against discrimination!", they cried. And my favorite of all:

Luckily, the news media, in the form of the indomitable John Oliver, were there to cover and support the efforts. (And thus these savvy protesters made it on the Oct. 1 Daily Show for about 3 seconds at the 9m11s mark; blink and you'll miss them.)

Tip to Jake Hofman and Arthur Gretton (whose photos these are).
Posted on October 24, 2009 in Humor | permalink | Comments (1)
This is the life I've chosen
An oldie, but goodie: John Oliver reporting on how academia really works.
[Embedded video: The Daily Show With Jon Stewart, "Human's Closest Relative"]
If that's not enough hilarity about chimps vs. orangs, or, if you were really intrigued by the arguments in favor of orangs, read this.
Tip to Jake Hofman.
Posted on October 24, 2009 in Simply Academic | permalink | Comments (0)
National Computer Science Education Week, or: It's About Time
Is it cliche to say "it's about time"?
The ACM, with Microsoft, Google, Intel and some other organizations, managed to persuade the US Congress that Computer Science is a Good Thing(tm) and that it deserves some recognition for driving economic growth (you know, making things like medicine, movies, music, and cars) [1]. To recognize the goodness, Congress passed a resolution (H. RES. 558) to designate the week of December 7 as “National Computer Science Education Week.” [2]
The resolution, H. RES. 558, sponsored by Congressmen Vernon Ehlers (R-MI) and Jared Polis (D-CO) [3], designates the week of December 7 as “National Computer Science Education Week.” Citing the influence of computing technology as a significant contributor to U.S. economic output, the House resolution calls on educators and policymakers to improve computer science learning at all educational levels, and to motivate increased participation in computer science.
“Increasing energy efficiency, advancing healthcare, and improving communication in the digital age are just a few of the national priorities that depend on computer science, which Congress has recognized. Computer science teaches students design, logical reasoning, and problem-solving – all skills that have value well beyond the classroom,” said Rick Rashid, senior vice president of Research for Microsoft.
“Despite serious economic challenges confronting the nation, computer science-related jobs are among the fastest-growing and highest paying over the next decade,” said Alfred Spector, vice president of Research and Special Initiatives at Google, Inc. “These times require an increasing supply of diverse students exposed to rigorous and engaging computing courses at the K-12 level, and National Computer Science Education Week can help to reinforce this effort.”
Good fanfare, and good effort for sure. It's a small gesture really, but I guess it does give organizations like the ACM a hook to hang their public campaigns on. And for sure, computers, computer science, and their use (and abuse) in society are things the public could do with some educating on.
Tip to Tanya Berger-Wolf.
-----
[1] Thankfully, they didn't mention that computer science and computers have also produced massive amounts of wasted time, the estimation of which never ceases to amuse me. (If you'd like to estimate it for yourself, try this.)
[2] In a fit of gender-neutrality (something Computer Science is not known for), the date was chosen to honor Grace Hopper, who wrote the first compiler and helped invent the indispensable COBOL, in addition to being a Rear Admiral in the Navy, and having a Navy destroyer named after her.
[3] Incidentally, I was very happy to discover that Mr. Polis represents the 2nd District of Colorado, and he'll be my representative once I move to Boulder next summer.
Posted on October 22, 2009 in Computer Science | permalink | Comments (0)
Machinima meets science geekery
Very poetic.
We are all connected (ft. Sagan, Feynman, deGrasse Tyson & Bill Nye)
Tip to Cris Moore.
Posted on October 21, 2009 in Pleasant Diversions | permalink | Comments (0)
