[Colloquium] ConceptDoppler: A Weather Tracker for Internet Censorship

September 7, 2007

  • Date: Friday, September 7th, 2007 
  • Time: 1 pm — 2:30 pm 
  • Place: ME 218

Jed Crandall 
Department of Computer Science, UNM

Abstract: Imagine you want to remove the history of the Wounded Knee massacre from the Library of Congress. Two ways to do this are: 1) remove “Bury My Heart at Wounded Knee” and a few other selected books; or 2) remove every book in the entire library that contains the word “massacre” in its text. If you view the Internet as one large library, Chinese Internet censorship based on keyword filtering is the equivalent of the latter. In this talk I’ll present results from a paper we recently published about China’s keyword-based Internet censorship mechanism.

We present two sets of results: 1) Internet measurements of keyword filtering by the Great “Firewall” of China (GFC); and 2) initial results of using latent semantic analysis (LSA) as an efficient way to reproduce a blacklist of censored words via probing.

Our Internet measurements suggest that the GFC’s keyword filtering is more a panopticon than a firewall, i.e., it need not block every illicit word, but only enough to promote self-censorship. China’s largest ISP, ChinaNET, performed 83.3% of all filtering of our probes, and 99.1% of all filtering that occurred at the first hop past the Chinese border. Filtering occurred beyond the third hop for 11.8% of our probes, and there were sometimes as many as 13 hops past the border to a filtering router. Approximately 28.3% of the Chinese hosts we sent probes to were reachable along paths that were not filtered at all. While more tests are needed to provide a definitive picture of the GFC’s implementation, our results disprove the notion that GFC keyword filtering is a firewall strictly at the border of China’s Internet.

While evading a firewall a single time defeats its purpose, it would be necessary to evade a panopticon almost every time. Thus, in lieu of evasion, we propose ConceptDoppler, an architecture for maintaining a censorship “weather report” about what keywords are filtered over time. Probing with potentially filtered keywords is arduous due to the GFC’s complexity and can be invasive if not done efficiently. Just as an understanding of the mixing of gases preceded effective weather reporting, understanding of the relationship between keywords and concepts is essential for tracking Internet censorship. We show that LSA can effectively pare down a corpus of text and cluster filtered keywords for efficient probing, present 122 keywords we discovered by probing, and underscore the need for tracking and studying censorship blacklists by discovering some surprising blacklisted keywords such as (in Chinese) conversion rate, Mein Kampf, and International geological scientific federation (Beijing).
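
To illustrate the general idea of using latent semantic analysis to group related keywords, here is a minimal sketch. It is not the pipeline from the paper: the toy corpus DOCS, the function names lsa_term_vectors and rank_by_similarity, and the choice of k = 2 are all assumptions made for illustration. The sketch builds a term-document count matrix, truncates its SVD, and ranks terms by cosine similarity to a seed term, so that the closest terms could be tried first when probing.

```python
import numpy as np

# Hypothetical toy corpus (invented for illustration): two unrelated topics.
DOCS = [
    "protest march square students",
    "students protest government square",
    "government crackdown protest march",
    "rain forecast wind storm",
    "storm wind forecast temperature",
    "temperature rain forecast clouds",
]


def lsa_term_vectors(docs, k):
    """Return (vocab, vectors): one k-dimensional latent vector per term."""
    vocab = sorted({w for d in docs for w in d.split()})
    index = {w: i for i, w in enumerate(vocab)}
    # Term-document count matrix: rows are terms, columns are documents.
    counts = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.split():
            counts[index[w], j] += 1.0
    # Truncated SVD: keep only the k largest singular values.
    u, s, _ = np.linalg.svd(counts, full_matrices=False)
    return vocab, u[:, :k] * s[:k]


def rank_by_similarity(seed, vocab, vectors):
    """Rank every other term by cosine similarity to the seed, highest first."""
    i = vocab.index(seed)
    norms = np.linalg.norm(vectors, axis=1)
    norms[norms == 0] = 1.0  # guard against division by zero
    sims = (vectors @ vectors[i]) / (norms * norms[i])
    order = np.argsort(-sims)
    return [(vocab[j], float(sims[j])) for j in order if j != i]


if __name__ == "__main__":
    vocab, vectors = lsa_term_vectors(DOCS, k=2)
    for term, sim in rank_by_similarity("students", vocab, vectors):
        print(f"{term:12s} {sim:+.3f}")
```

In this toy corpus, “crackdown” never appears in the same document as “students”, yet the two land close together in the latent space because they share neighbors such as “protest” and “government”. That is the property that makes a concept-based ranking a plausible way to choose which candidate keywords to probe first.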

(Joint work with Daniel Zinn, Michael Byrd, and Earl Barr of UC Davis, and Rich East)

Bio: Jed received his Ph.D. from the University of California at Davis and his B.S. from Embry-Riddle Aeronautical University in Prescott, Arizona. He is currently an Assistant Professor with the Department of Computer Science, University of New Mexico.

Jed’s research area is computer security and privacy, with a background in computer architecture, ranging from architectural support for systems security to the capture and analysis of Internet worms. More recent work includes behavior-based analysis of malicious code, such as a new technique called temporal search that detects timebombs within computer viruses based on their use of hardware timers. He also studies the technical issues of government censorship.