May 21, 2012
If it disagrees with experiment, it is wrong
The eloquent Feynman on the essential nature of science. And, he nails it exactly: science is a process of making certain types of guesses about the world around us (what we call "theories" or hypotheses), deriving their consequences (what we call "predictions") and then comparing those consequences with experiment (what we call "validation" or "testing").
Although he doesn't elaborate them, the two transitions around the middle step are, I think, quite crucial.
First, how do we derive the consequences of our theories? It depends, in fact, on what kind of theory it is. Classically, these were mathematical equations and deriving the consequences meant deriving mathematical structures from them, in the form of relationships between variables. Today, with the rise of computational science, theories can be algorithmic and stochastic in nature, and this makes the derivation of their consequences trickier. The degree to which we can derive clean consequences from our theory is a measure of how well we have done in specifying our guess, but not a measure of how likely our guess is to be true. If a theory makes no measurable predictions, i.e., if there are no consequences that can be compared with experiment or if there is no experiment that could disagree with the theory, then it is not a scientific theory. Science is the process of learning about the world around us and measuring our mistakes relative to our expectations is how we learn. Thus, a theory that can make no mistakes teaches us nothing. 
Second, how do we compare our predictions with experiment? Here, the answer is clear: using statistics. This part remains true regardless of what kind of theory we have. If the theory predicts that two variables should be related in a certain way, when we measure those variables, we must decide to what degree the data support that relation. This is a subtle point even for experienced scientists because it requires making specific but defensible choices about what constitutes a genuine deviation from the target pattern and how large a deviation is allowed before the theory is discarded (or must be modified) . Choosing an optimistic threshold is what gets many papers into trouble.
For experimental science, designing a better experiment can make it easier to compare predictions with data , although complicated experiments necessarily require sensitive comparisons, i.e., statistics. For observational science (which includes astronomy, paleontology, as well as many social and biological questions), we are often stuck with the data we can get rather than the data we want, and here careful statistics is the only path forward. The difficulty is knowing just how small or large a deviation is allowed by your theory. Here again, Feynman has something to say about what is required to be a good scientist:
I'm talking about a specific, extra type of integrity that is not lying, but bending over backwards to show how you are maybe wrong, that you ought to have when acting as a scientist. And this is our responsibility as scientists...
This is a very high standard to meet. But, that is the point. Most people, scientists included, find it difficult to be proven wrong, and Feynman is advocating the active self-infliction of these psychic wounds. I've heard a number of (sometimes quite eminent) scientists argue that it is not their responsibility to disprove their theories, only to show that their theory is plausible. Other people can repeat the experiments and check if it holds under other conditions. At its extreme, this is a very cynical perspective, because it can take a very long time for the self-corrective nature of science to get around to disproving celebrated but ultimately false ideas.
The problem, I think, is that externalizing the validation step in science, i.e., lowering the bar of what qualifies as a "plausible" claim, assumes that other scientists will actually check the work. But that's not really the case since almost all the glory and priority goes to the proposer not the tester. There's little incentive to close that loop. Further, we teach the next generation of scientists to hold themselves to a lower standard of evidence, and this almost surely limits the forward progress of science. The solution is to strive for that high standard of truth, to torture our pet theories until the false ones confess and we are left with the good ideas .
 Duty obliges me to recommend Mayo's classic book "Error and the Growth of Experimental Knowledge." If you read one book on the role of statistics in science, read this one.
 A good historical example of precisely this problem is Millikan's efforts to measure the charge on an electron. The most insightful analysis I know of is the second chapter of Goodstein's "Fact or Fraud". The key point is that Millikan had a very specific and scientifically grounded notion of what constituted a real deviation from his theory, and he used this notion to guide his data collection efforts. Fundamentally, the "controversy" over his results is about this specific question.
 I imagine Lord Rutherford would not be pleased with his disciples in high energy physics.
 There's a soft middle ground here as to how much should be done by the investigator and how much should be done by others. Fights with referees during the peer-review process often seem to come down to a disagreement about how much and what kind of torture methods should be used before publication. This kind of adversarial relationship with referees is also a problem, I think, as it encourages authors to do what is conventional rather than what is right, and it encourages referees to nitpick or to move the goal posts.