Information was something guessed at rather than spoken of, something implied in a dozen ways before it was finally tied down. Information was a presence offstage. It was there in the studies of the physiologist Hermann von Helmholtz, who, electrifying frog muscles, first timed the speed of messages in animal nerves just as Thomson was timing the speed of messages in wires. It was there in the work of physicists like Rudolf Clausius and Ludwig Boltzmann, who were pioneering ways to quantify disorder—entropy—little suspecting that information might one day be quantified in the same way. Above all, information was in the networks that descended in part from that first attempt to bridge the Atlantic. In the attack on the practical engineering problems of connecting Points A and B—what is the smallest number of wires we need to string up to handle a day’s load of messages? how do we encrypt a top-secret telephone call?—the properties of information itself, in general, were gradually uncovered.
By the time of Claude Shannon’s childhood, the world’s communications networks were no longer passive wires acting as conduits for electricity, a kind of electron plumbing, as they were in Thomson’s day. They were continent-spanning machines, arguably the most complex machines in existence. Vacuum-tube amplifiers strung along the telephone lines added power to voice signals that would have otherwise attenuated and died out on their thousand-mile journeys. A year before Shannon was born, in fact, Bell and Watson inaugurated the transcontinental phone line by reenacting their first call, this time with Bell in New York and Watson in San Francisco. By the time Shannon was a young wig-wag signaling champion, feedback systems managed the phone network’s amplifiers automatically, holding the voice signals stable and silencing the “howling” or “singing” noises that plagued early phone calls, even as the seasons turned and the weather changed around the sensitive wires that carried them. Each year that Shannon placed a call, he was less likely to speak to a human operator and more likely to have his call placed by machine, by one of the automated switchboards that Bell Labs grandly called a “mechanical brain.” In the process of assembling and refining these sprawling machines, Shannon’s generation of scientists came to understand information in much the same way that an earlier generation of scientists came to understand heat in the process of building steam engines.
It was Shannon who made the final synthesis, who defined the concept of information and effectively solved the problem of noise. It was Shannon who was credited with gathering the threads into a new science. But he had important predecessors at Bell Labs, two engineers who had shaped his thinking since he discovered their work in Ann Arbor, who were the first to consider how information might be put on a scientific footing, and whom Shannon’s landmark paper singled out as pioneers.
One was Harry Nyquist. When he was eighteen, his family left its Swedish farm and joined the wave of Scandinavian immigration to the upper Midwest; he worked construction in Sweden for four years to pay for his share of the passage. Ten years after his arrival, he had a doctorate in physics from Yale and a job as a scientist in the Bell System. A Bell lifer, Nyquist was responsible for one of the first prototype fax machines: he sketched out a proposal for “telephotography” as early as 1918. By 1924 there was a working model: a machine that scanned a photograph, represented the brightness of each chunk with its own level of electrical current, and sent those currents in pulses over the phone lines, where they were retranslated into a photographic negative on the other end, ready for the darkroom. Impressive as the display was, the market had little appetite for it, especially with its seven minutes of transmission time for a single small photo. But Nyquist’s thoughts on a less glamorous technology, the telegraph, were published in the same year. Those insights would prove far more lasting.
By the 1920s, telegraphy was an old technology; it had not been at the leading edge of innovation for decades. The exciting hardware developments were in telephone networks and even, as Nyquist showed, in telephotography—applications that made use of continuous signals, while the telegraph could only speak in dot and dash. Yet the Bell System still operated a massive telegraph network, and money and careers were still riding on the same problems with which Thomson had grappled: how to send signals through that network at a maximum of speed and a minimum of noise.
Engineers already understood, Nyquist recalled, that the electrical signals carrying messages through networks—whether telegraph, -phone, or -photo—fluctuated wildly up and down. Represented on paper, the signals would look like waves: not calmly undulating sine waves, but a chaotic, wind-lashed line seemingly driven without a pattern. Yet there was a pattern. Even the most anarchic fluctuation could be resolved into the sum of a multitude of calm, regular waves, all crashing on top of one another at their own frequencies until they frothed into chaos. (This was the same math, in fact, that revealed tidal fluctuations to be the sum of many simple functions, and so helped make possible the first analog computers.) In this way, communications networks could carry a range, or a “band,” of frequencies. And it seemed that a greater range of frequencies imposed on top of one another, a greater “bandwidth,” was needed to generate the more interesting and complex waves that could carry richer information. To efficiently carry a phone conversation, the Bell network needed frequencies ranging from about 200 to 3,200 hertz, or a bandwidth of 3,000 hertz. Telegraphy required less; television would require 2,000 times more.
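To make that decomposition concrete, here is a minimal sketch, assuming nothing from the text beyond the idea itself: a jagged signal built from three sine waves inside the phone band, which a Fourier transform pulls back apart. The frequencies, amplitudes, and sample rate are illustrative choices, not figures from the period.

```python
# A minimal sketch (not from the text): a "chaotic" signal is just the sum of
# calm sine waves, and a Fourier transform recovers those hidden frequencies.
import numpy as np

fs = 8000                                    # samples per second (assumed)
t = np.arange(0, 1, 1/fs)                    # one second of signal
# three regular waves, all inside the 200-3,200 Hz phone band, summed together
messy = (1.0 * np.sin(2*np.pi*200*t)
         + 0.6 * np.sin(2*np.pi*1100*t)
         + 0.3 * np.sin(2*np.pi*3100*t))

spectrum = np.abs(np.fft.rfft(messy)) / (fs/2)   # amplitude at each frequency
freqs = np.fft.rfftfreq(len(messy), 1/fs)
print(freqs[spectrum > 0.1])                 # -> [ 200. 1100. 3100.]
```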
Nyquist showed how the bandwidth of any communications channel provided a cap on the amount of “intelligence” that could pass through it at a given speed. But this limit on intelligence meant that the distinction between continuous signals (like the message on a phone line) and discrete signals (like dots and dashes or, we might add, 0’s and 1’s) was much less clear-cut than it seemed. A continuous signal still varied smoothly in amplitude, but you could also represent that signal as a series of samples, or discrete time-slices—and within the limit of a given bandwidth, no one would be able to tell the difference. Practically, that result showed Bell Labs how to send telegraph and telephone signals on the same line without interference between the two. More fundamentally, as a professor of electrical engineering wrote, it showed that “the world of technical communications is essentially discrete or ‘digital.’ ”
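What follows is a small numerical sketch of that sampling idea, on assumed numbers rather than Nyquist's: a signal containing no frequencies above B hertz is pinned down by 2B samples per second, and the in-between values can be rebuilt from the samples alone.

```python
# A sketch, assuming a toy band-limited signal: sample it at twice its highest
# frequency and sinc-interpolate the samples; the continuous waveform comes back.
import numpy as np

B = 4.0                          # highest frequency in the signal, in hertz
fs = 2 * B                       # the Nyquist sampling rate

def signal(t):
    # tones at 1, 2.5, and 3.5 Hz -- all below B, so the signal is band-limited
    return (np.sin(2*np.pi*1.0*t) + 0.5*np.sin(2*np.pi*2.5*t)
            + 0.2*np.cos(2*np.pi*3.5*t))

n = np.arange(-400, 400)         # sample indices around t = 0
samples = signal(n / fs)         # the discrete time-slices

def reconstruct(t):
    # Whittaker-Shannon interpolation: each sample weighted by a shifted sinc
    return np.sum(samples * np.sinc(fs*t - n))

t0 = 0.37                        # an instant between two sample points
print(signal(t0), reconstruct(t0))   # the two agree, up to truncation error
```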
Nyquist’s most important contribution to the idea of information was buried in the middle of a 1924 paper read into the record of an engineers’ technical conference in Philadelphia. It was only four short paragraphs under the unpromising heading “Theoretical Possibilities Using Codes with Different Numbers of Current Values.” Those four paragraphs were, it turned out, a first crack at explaining the relationship between the physical properties of a channel and the speed with which it could transmit intelligence. It was a step beyond Thomson: intelligence was not electricity.
So what was it? In Nyquist’s words, “by the speed of transmission of intelligence is meant the number of characters, representing different letters, figures, etc., which can be transmitted in a given length of time.” This was much less clear than it might have been—but for the first time, someone was groping toward a meaningful way of treating messages scientifically. Here, then, is Nyquist’s formula for the speed at which a telegraph can send intelligence:
W = k log m
W is the speed of intelligence. m is the number of “current values” that the system can transmit. A current value is a discrete signal that a telegraph system is equipped to send: the number of current values is something like the number of possible letters in an alphabet. If the system can only communicate “on” or “off,” it has two current values; if it can communicate “negative current,” “off,” and “positive current,” it has three; and if it can communicate “strong negative,” “negative,” “off,” “positive,” and “strong positive,” it has five.I Finally, k is the number of current values the system is able to send each second.
In other words, Nyquist showed that the speed at which a telegraph could transmit intelligence depended on two factors: the speed at which it could send signals, and the number of “letters” in its vocabulary. The more “letters” or current values that were possible, the fewer that would actually have to be sent over the wire. As an extreme case, imagine that there were a single ideogram that represented the entire content of this paragraph, and another single ideogram that represented the entire content of the paragraph just above; if that were the case, then we could convey the intelligence in these paragraphs to you hundreds of times faster. That was Nyquist’s surprising result: the larger the number of “letters” a telegraph system could use, the faster it could send a message. Or we can look at it the other way around. The larger the number of possible current values we can choose from, the greater the density of intelligence in each signal, or in each second of communication. In the same way, our hypothetical ideogram could carry as much intelligence as all 1,262 characters in this paragraph—but only because it would have been chosen from a dictionary of millions and millions of ideograms, each somehow representing an entire paragraph of its own.II
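A few lines of arithmetic, with an assumed signaling rate chosen for the sake of the example, show how Nyquist's W = k log m behaves: the speed of intelligence climbs with the logarithm of the number of current values.

```python
# A small illustration of W = k log m (base 2, so W comes out in bits per
# second; the base is a choice of units). The rate k here is an assumption.
import math

k = 10                           # signals the system can send each second
for m in (2, 3, 5, 32):          # current values: on/off, three-level, five-level...
    W = k * math.log2(m)
    print(f"m = {m:2d} current values -> W = {W:5.1f} bits per second")
# Going from 2 to 32 current values packs five times the intelligence into
# each signal, because 32 = 2 to the fifth power.
```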
Nyquist’s short digression on current values offered the first hint of a connection between intelligence and choice. But it remained just that. Nyquist was more interested in engineering more efficient systems than in speculating about the nature of this intelligence; and, more to the point, he was still expected to produce some measure of practical results. So, after recommending to his colleagues that they build more current values into their telegraph networks, he turned to other work. Nor, after leaving the tantalizing suggestion that all systems of communication resembled the telegraph in their digital nature, did he go on to generalize about communication itself. At the same time, his way of defining intelligence—“different letters, figures, etc.”—remained distressingly vague. Behind the letters and figures there was—what, exactly?
From intelligence to information: such a change in names tells us little about the math that underlies them. But in this case, the renaming is a useful marker. It is a border—arbitrary, in the way that very many borders are—between the adolescence and the maturity of a new science.
Reading the work of Ralph Hartley, Shannon said, was “an important influence on my life.” Not simply on his research or his studies: Shannon spent much of his life working with the conceptual tools that Hartley built, and for the better part of his life, much of his public identity—“Claude Shannon, Father of Information Theory”—was bound up in having been the one who extended Hartley’s ideas far beyond what Hartley, or anyone, could have imagined. Aside from George Boole, that obscure logician, no one shaped Shannon’s thought more. In the 1939 letter in which Shannon first laid out the study of communications that he would complete nine years later, he used Nyquist’s “intelligence.” By the time the work was finished, he used Hartley’s crisper term: “information.” While an engineer like Shannon would not have needed the reminder, it was Hartley who made meaning’s irrelevance to information clearer than ever.
After his graduation from Oxford as one of the first Rhodes scholars, Hartley was put to work on yet another effort to bridge the Atlantic. He led the Bell System team designing receivers for the first transatlantic voice call, one sent over radio waves, not wires. This time, the hindrance was not physical, but political. By the time the test was ready, in 1915, Europe was at war. The Bell engineers had to beg the French authorities for the use of the continent’s highest radio antenna, which doubled as a key military asset. In the end, the Americans were allowed just minutes of precious time atop that antenna, the Eiffel Tower, but they were enough: Hartley’s receivers were a success, and a human voice sent from Virginia was heard at the top of the tower.
From the beginning, Hartley’s interests in communications networks were more promiscuous than Nyquist’s: he was in search of a single framework that could encompass the information-transmitting power of any medium—a way of comparing telegraph to radio to television on a common scale. And Hartley’s 1927 paper, which brought Nyquist’s work to a higher level of abstraction, came closer to the goal than anyone yet. Suiting that abstraction, the paper Hartley presented to a scientific conference at Lake Como, in Italy, was simply called “Transmission of Information.”
It was an august crowd that had assembled at the foot of the Alps for the conference. In attendance were Niels Bohr and Werner Heisenberg, two founders of quantum physics, and Enrico Fermi, who would go on to build the world’s first nuclear reactor, under the bleacher seats at the University of Chicago’s stadium—and Hartley was at pains to show that the study of information belonged in their company. He began by asking his audience to consider a thought experiment. Imagine a telegraph system with three current values: negative, off, and positive. Instead of allowing a trained operator to select the values with his telegraph key, we hook the key up to a random device, say, “a ball rolling into one of three pockets.” We roll the ball down the ramp, send a random signal, and repeat as many times as we’d like. We’ve sent a message. Is it meaningful?
It depends, Hartley answered, on what we mean by meaning. If the wire was sound and the signal undistorted, we’ve sent a clear and readable set of symbols to our receiver—much clearer, in fact, than a human-generated message over a faulty wire. But however clearly it comes through, the message is also probably gibberish: “The reason for this is that only a limited number of the possible sequences have been assigned meanings,” and a random choice of sequence is far more likely to be outside that limited range. We’ve arbitrarily agreed that the sequence dot dot dot dot, dot, dot dash dot dot, dot dash dot dot, dash dash dash carries meaning, while the sequence dot dot dot dot, dot, dot dash dash dot, dot dash dot dot, dash dash dash carries nonsense.III There’s only meaning where there’s prior agreement about our symbols. And all communication is like this, from waves sent over electrical wires, to the letters agreed upon to symbolize words, to the words agreed upon to symbolize things.
For Hartley, these agreements on the meaning of symbol vocabularies all depend on “psychological factors”—and those were two dirty words. Some symbols were relatively fixed (Morse code, for instance), but the meaning of many others varied with language, personality, mood, tone of voice, time of day, and so much more. There was no precision there. If, following Nyquist, the quantity of information had something to do with choice from a number of symbols, then the first requirement was getting to clarity on the number of symbols, free from the whims of psychology. A science of information would have to make sense of the messages we call gibberish, as well as the messages we call meaningful. So in a crucial passage, Hartley explained how we might begin to think about information not psychologically, but physically: “In estimating the capacity of the physical system to transmit information we should ignore the question of interpretation, make each selection perfectly arbitrary, and base our results on the possibility of the receiver’s distinguishing the result of selecting any one symbol from that of selecting any other.”
In this, Hartley formalized an intuition already wired into the phone company—which was, after all, in the business of transmission, not interpretation. As in the thought experiment of a telegraph controlled by a rolling ball, the only requirements are that the symbols make it through the channel, and that someone at the other end can tell them apart.
The real measure of information is not in the symbols we send—it’s in the symbols we could have sent, but did not. To send a message is to make a selection from a pool of possible symbols, and “at each selection there are eliminated all of the other symbols which might have been chosen.” To choose is to kill off alternatives. We see this most clearly, Hartley observed, in the cases in which messages happen to bear meaning. “For example, in the sentence, ‘Apples are red,’ the first word eliminated other kinds of fruit and all other objects in general. The second directs attention to some property or condition of apples, and the third eliminates other possible colors.” This rolling process of elimination holds true for any message. The information value of a symbol depends on the number of alternatives that were killed off in its choosing. Symbols from large vocabularies bear more information than symbols from small ones. Information measures freedom of choice.
In this way, Hartley’s thoughts on choice were a strong echo of Nyquist’s insight into current values. But what Nyquist demonstrated for telegraphy, Hartley proved true for any form of communication; Nyquist’s ideas turned out to be a subset of Hartley’s. In the bigger picture, for those discrete messages in which symbols are sent one at a time, only three variables controlled the quantity of information: the number k of symbols sent per second, the size s of the set of possible symbols, and the length n of the message. Given these quantities, and calling the amount of information transmitted H, we have:
H = k log sⁿ
If we make random choices from a set of symbols, the number of possible messages increases exponentially as the length of our message grows. For instance, in our 26-letter alphabet there are 676 possible two-letter strings (or 26²), but 17,576 three-letter strings (or 26³). Hartley, like Nyquist before him, found this inconvenient. A measure of information would be more workable if it increased linearly with each additional symbol, rather than exploding exponentially. In this way, a 20-letter telegram could be said to hold twice as much information as a 10-letter telegram, provided that both messages used the same alphabet. That explains what the logarithm is doing in Hartley’s formula (and Nyquist’s): it’s converting an exponential change into a linear one. For Hartley, this was a matter of “practical engineering value.”IV
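The arithmetic behind that paragraph can be laid out in a short sketch; the per-second factor k is set aside here, and the bit is used as the unit, both of which are choices made for the example.

```python
# The count of possible messages grows exponentially with length, but the
# logarithm turns that growth into a straight line: log(s^n) = n log s.
import math

s = 26                                    # size of the symbol set (our alphabet)
for n in (2, 3, 10, 20):                  # message lengths, in symbols
    possible = s ** n                     # distinct messages of length n
    H = n * math.log2(s)                  # information, in bits
    print(f"n = {n:2d}: {possible:,} possible messages, H = {H:6.1f} bits")
# n = 2 gives 676 messages and n = 3 gives 17,576, the figures above, while H
# simply doubles when the message length doubles (10 letters -> 20 letters).
```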
Engineering value was indeed what he was after, despite efforts to pin down information that sounded more like those of a philosopher or a linguist. What is the nature of communication? What happens when we send a message? Is there information in a message you can’t even understand? These were powerful questions in their own right. But in all the generations of human communication, those questions were posed with urgency and rigor just then because the answers had suddenly grown exceptionally valuable. In the profusion of undersea cables, transcontinental radio calls, pictures sent by phone line, and moving images passing through the air, our sudden skill at communicating had outstripped our knowledge of communication itself. And whether in disaster—a fried cable—or merely an inconvenience—the flicker and blur of the first televisions—that ignorance exacted its toll.
Hartley came the nearest thus far to the essence of information. More than that, his work reflected the dawning awareness that clarity about information was already extending engineers’ powers. For instance, they could chop up continuous signals, such as the human voice, into digital samples—and with that done, the information content of any message, continuous or discrete, could be held to a single standard. How much information, for instance, is in a picture? We can think of a picture just as we think of a telegraph. In the same way we can break a telegraph into a discrete string of dots and dashes, we can break a picture into a discrete number of squares that Hartley called “elementary areas”: what were later termed picture elements, or pixels. Just as telegraph operators choose from a finite set of symbols, each elementary area is defined by a choice from a finite number of intensities. The larger the set of intensities, and the larger the number of elementary areas, the more information the picture holds. That explains why color images hold more information than images in black and white—the choice made in each pixel comes from a larger vocabulary of symbols.
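Hartley's picture argument invites a back-of-the-envelope calculation; the resolution and intensity counts below are modern, assumed numbers, not his.

```python
# Information in a picture = (number of elementary areas) x log(number of
# intensity levels each area can take). Resolution and levels are assumptions.
import math

pixels = 640 * 480                       # elementary areas in the picture

H_gray  = pixels * math.log2(256)        # 256 shades of gray per pixel
H_color = pixels * math.log2(256 ** 3)   # a 256-level choice on three channels

print(f"grayscale: {H_gray / 8 / 1024:.0f} KiB of information")
print(f"color:     {H_color / 8 / 1024:.0f} KiB of information")
# Three times the information in color, because each pixel's choice is made
# from a far larger vocabulary of symbols.
```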
Squares and intensities: the image might be the Last Supper or a dog’s breakfast, but information is indifferent. In this notion that even a picture can be quantified, there’s an insight into information’s radically utilitarian premises, its almost Faustian exchange. But when we accept those premises, we have the first inklings of a unity behind every message.
And if some humans can achieve indifference to meaning only with great, practically ascetic effort, our machines are wired for this indifference: they have it effortlessly. So a common measure of information might allow us to express the limits of our machines and the content of our human messages in the same equations—how to shape machines and messages to a common fit. A measure for information, for example, helps us uncover the connections between the bandwidth of a medium, and the information in the message, and the time devoted to sending it. As Hartley showed, there is always a trade-off between these three quantities. To send a message faster, we can pay for more bandwidth or simplify the message. If we save on bandwidth, we pay the price in less information or a longer transmission time. This explained why, in the 1920s, sending an image over phone lines took so impractically long: phone lines lacked the bandwidth for something so complicated. Treating information, bandwidth, and time as three precise, swappable quantities could show which ideas for sending messages were “within the realm of physical possibility”—and which shouldn’t even be attempted.
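Here is a rough sketch of that three-way trade-off. It leans on Nyquist's rule of thumb that a channel of bandwidth B can carry about 2B distinguishable symbols per second; the message size and bandwidth figures are assumptions (the 6 MHz case echoes the "television needs 2,000 times more" comparison earlier).

```python
# Time to send a message = information / (symbols per second x bits per symbol),
# where symbols per second is taken as roughly 2 x bandwidth (Nyquist's rate).
import math

def transmission_time(bits, bandwidth_hz, symbol_levels):
    bits_per_symbol = math.log2(symbol_levels)
    symbols_per_second = 2 * bandwidth_hz
    return bits / (symbols_per_second * bits_per_symbol)

picture = 2_457_600   # bits in the grayscale picture sketched above (assumed)
print(transmission_time(picture, 3_000, 2))         # phone-line band: ~410 s
print(transmission_time(picture, 3_000, 32))        # richer symbols:  ~82 s
print(transmission_time(picture, 6_000_000, 2))     # TV-scale band:   ~0.2 s
```

With these made-up numbers, the phone-line case lands in the same several-minute range as Nyquist's telephotography, and widening the band or enriching the symbol vocabulary buys the time back, which is precisely the kind of trade Hartley had in mind.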
Last, clarity about information might lead to clarity about noise. Noise might be something more precise than the crackle of static or a series of electric pulses lost somewhere under the Atlantic; noise might be measurable, too. Hartley ventured only part of the way toward this goal, but he shed light on a specific kind of distortion he called “intersymbol interference.” If the main criterion for a valid message was that the receiver tells the symbols apart, then an especially worrisome kind of inaccuracy was the type that causes symbols to blur into unreadability, as in the overlap of telegraph pulses sent by an overeager operator. With a measurement of information, we might calculate not only the time required to send any message over a given bandwidth, but the number of symbols we can send each second before they arrive too quickly to be distinguished.
This, then, was roughly where information sat when Claude Shannon picked up the thread. What began in the nineteenth century as an awareness that we might speak to one another more accurately at a distance if we could somehow quantify our messages had—almost—ripened into a new science. Each step was a step into higher abstraction. Information was the electric flow through a wire. Information was a number of characters sent by a telegraph. Information was choice among symbols. At each iteration, the concrete was falling away.
As Shannon chewed all of this over for a decade in his bachelor’s apartment in the West Village or behind his closed door at Bell Labs, it seemed as if the science of information had nearly ground to a halt. Hartley himself was still on the job at Bell Labs, a scientist nearing retirement when Shannon signed on, but too far out of the mainstream for the two to collaborate effectively. The Hartley whom Shannon finally met in person seemed far removed from the Hartley who had captivated him in school. Shannon remembered him as
very bright in some ways, but in some ways he got hung up on things. He was kind of hung up on a theory that Einstein was wrong. That Newtonian classical physics could be rescued, you see. And he was spending all his time trying to explain all the things that relativity explained by changing the picture, just as people did . . . back in the 1920s or so, but the scientific community had finally come around to realizing that Einstein was right. All the scientific community except Hartley I guess.
So from Hartley to Shannon, said Bell Labs’ John Pierce, the science of information “appears to have taken a prolonged and comfortable rest.” Blame Hartley’s relativity fixation, perhaps. Or blame the war—a war that unleashed tremendous applications in plane-tracking robot bombs and digital telephony, in code making and codebreaking and computing, but a war that saw few scientists with the time or incentive to step back and ask what had been learned about communication in general. Or simply blame the fact that the next and decisive step after Hartley could only be found with genius and time. We can say, from our hindsight, that if the step were obvious, it surely wouldn’t have stayed untaken for twenty years. If the step were obvious, it surely wouldn’t have been met with such astonishment.
“It came as a bomb,” said Pierce.
I. Even with three, five, or more current values, such a system is still digital: it still uses discrete steps from one value to another (as on a digital clock), rather than a continuous sweep (as on an analog clock). Digital systems are very often binary (they have only two values, as in Shannon’s discussion of switching circuits), but they don’t have to be.
II. Of course, the impossibility of maintaining and memorizing such a dictionary points to the price of wildly accumulating symbols or current values. What an alphabetic language loses in density and efficiency, it can gain in ease of comprehension. In the same way, there is a point at which the cost of building more current values into a telegraph system outweighs the savings of faster messaging.
III. Decoded from Morse code, the first sequence reads as “hello,” the second as “heplo.” Still, the receiver might recognize “heplo” as a typo or transmission error thanks to the redundancy of our language—an idea that would prove highly useful to Shannon.
IV. It’s entirely fair to design a new measurement with human needs in mind, as long as the measurement is internally consistent. By comparison, there’s no natural reason why a single degree Celsius should cover a wider range of temperature than a single degree Fahrenheit—it’s just that many people find it convenient to think of water’s freezing point as 0° and its boiling point as 100° and define the degrees in between accordingly. Choosing whether to think of information as a quantity that increases exponentially or linearly with message length is a matter of human convenience in the same way, which is why Shannon would describe the logarithmic scale for information as “nearer to our intuitive feelings as to the proper measure.”