No! Questions like this invariably lead to self-fulfilling prophecies. Tell yourself statistics is hard, and it's hard. Tell yourself statistics is easy, and it's easy! As with most activities rich enough to demand formal study, there are traps for the unwary that must be avoided. Fall into them at the beginning, and statistics is hard. Avoid them from the outset, and you'll wonder what the fuss is all about. The amount of success and the speed with which you achieve it depend in large part on how quickly these particular lessons are learned.
One thing most people (even statisticians!) would like to do is describe how likely a theory or hypothesis might be in light of a particular set of data. This is not possible in the commonly used classical/frequentist approach to statistics, which is the approach taken in these notes. Instead, statistics talks about the probability of observing particular sets of data, assuming a theory holds. We are NOT allowed to say, "Based on these data, there is a 10% chance that the theory is true." We are allowed to say only, "If the theory is true, there is a 10% chance of seeing data like these."
The first statement is relatively clear. If we could say that based on a particular set of data a theory has a 10% chance of being true, then the theory has a 10% chance of being true. The second statement is murky. If the result is that there is only a 10% chance of seeing data like these if a theory is true, is that small enough to make us doubt the theory? How likely are the data under some other theory? Perhaps there's no theory under which data like these are more likely! This means we need methods for translating this latter type of statement into a declaration that a theory is true or false. As a result...
Statistical methods are convoluted! In order to show an effect exists, statistics begins by assuming there is no effect.
As a result, it is easy to argue that statistics is about...nothing! To show that an effect or difference exists, classical or frequentist statistics begins by asking what would happen if there were no effect...nothing. The analyst compares study data to what is expected when there is nothing. If the data are not typical of what is seen when there is nothing, there must be something! Thus, I've spent my professional life studying nothing, but I know how it behaves in exquisite detail!
Simple? Maybe. Intuitive? Certainly not! Does it have to be done this way? Only because the consensus at the moment is that this is the approach that makes the most sense. Another worthwhile exercise is to spend a few minutes (hours, years, professional lifetimes,...) thinking about how it might be done differently.
In the convoluted way of showing an effect exists, a statistician uses data to draw up a list of all of the reasonable ways that the observed data could have been generated. If one of these possibilities is "no effect", it is said that the statistical test fails to demonstrate an effect.
If the data could reasonably have been generated without an effect being present, the data fail to demonstrate an effect. That's common sense. An effect is demonstrated only when the possibility of "no effect" has been ruled out.
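This "comparing to nothing" logic can be sketched in a few lines of Python. The sketch below is a minimal permutation test with invented numbers (the groups and values are hypothetical, not from the notes): shuffling the group labels generates datasets in which, by construction, there is no effect, and we then ask whether the observed difference is typical of that "nothing."

```python
import random
import statistics

random.seed(42)  # reproducible shuffles

# Hypothetical outcomes in a treated and a control group (invented numbers).
treated = [7.1, 6.8, 7.4, 7.9, 6.5, 7.2, 7.6, 7.0]
control = [6.2, 6.9, 6.0, 6.7, 6.4, 6.1, 6.6, 6.3]

observed = statistics.mean(treated) - statistics.mean(control)

# "Study nothing": shuffle the group labels so that, by construction,
# there is no effect, and record how large the difference in means
# tends to be when nothing is going on.
pooled = treated + control
n1 = len(treated)
n_sims = 10_000
n_extreme = 0
for _ in range(n_sims):
    random.shuffle(pooled)
    fake_diff = statistics.mean(pooled[:n1]) - statistics.mean(pooled[n1:])
    if abs(fake_diff) >= abs(observed):
        n_extreme += 1

# Fraction of "nothing" datasets at least as extreme as what we saw.
p_value = n_extreme / n_sims
print(f"observed difference: {observed:.2f}")
print(f"p-value under 'nothing': {p_value:.4f}")
```

If the observed difference is not typical of what "nothing" produces (a small p-value), the assumption of no effect becomes hard to sustain, and the data point toward something.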
Arguably, the most serious statistical error committed today is failing to demonstrate an effect (that is, failing to rule out "no effect") and then claiming that there is no effect whatsoever. These are two different things.
A typical misstatement is, "There is no effect," when the analyst should be saying, "The data failed to demonstrate an effect." The distinction might appear to be trivial, but it is critical. If there is no effect, there is no more work to be done. We know something--no effect. The line of inquiry can be abandoned. On the other hand, it is possible to fail to demonstrate an effect without showing that there is no effect. This usually happens with small samples.
This is best illustrated by an example.
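Here is one hypothetical illustration, a sketch with invented numbers rather than the author's original example: a small two-group study (four subjects per group) in which a sizable difference is observed, yet the 95% confidence interval still includes zero, so "no effect" cannot be ruled out. The same interval also includes effects large enough to matter, so claiming "no effect" would be wrong. The equal-variance two-sample setup and the critical value (2.447, the 97.5th percentile of the t distribution with 6 degrees of freedom) are standard textbook choices, not anything specific to these notes.

```python
import math
import statistics

# Hypothetical small study: 4 subjects per group (invented numbers).
treated = [9.0, 6.0, 8.0, 7.0]
control = [6.0, 7.0, 5.5, 6.5]

diff = statistics.mean(treated) - statistics.mean(control)

# Pooled standard error of the difference (equal-variance two-sample setup).
n1, n2 = len(treated), len(control)
s2 = ((n1 - 1) * statistics.variance(treated)
      + (n2 - 1) * statistics.variance(control)) / (n1 + n2 - 2)
se = math.sqrt(s2 * (1 / n1 + 1 / n2))

# 97.5th percentile of the t distribution with n1 + n2 - 2 = 6 df.
t_crit = 2.447
lo, hi = diff - t_crit * se, diff + t_crit * se

print(f"estimated effect: {diff:.2f}")
print(f"95% CI: ({lo:.2f}, {hi:.2f})")
# The interval includes 0, so the data fail to demonstrate an effect --
# but it also includes values of real practical importance, so it would
# be wrong to conclude that there is no effect.
```

With these numbers the interval runs from slightly below zero to about 3: the data are consistent with no effect, but equally consistent with a large one. Only a bigger study can tell the two apart.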
This example demonstrates why it is essential that the analyst report all effects that are consistent with the data when no effect has been shown. Only if none of the possibilities are of any practical importance may the analyst claim "no effect" has been demonstrated.
As with any skill, practice makes perfect. The reason seasoned analysts can easily dismiss a data set that might confound novices is that the experienced analysts have seen it all before...many times! This excerpt from The Learning Curve by Atul Gawande (The New Yorker, January 28, 2002, pp 52-61) speaks directly to the importance of practice.
There have now been many studies of elite performers--concert violinists, chess grandmasters, professional ice-skaters, mathematicians, and so forth--and the biggest difference researchers find between them and lesser performers is the amount of deliberate practice they've accumulated. Indeed, the most important talent may be the talent for practice itself. K. Anders Ericsson, a cognitive psychologist and expert on performance, notes that the most important role that innate factors play may be in a person's willingness to engage in sustained training. He has found, for example, that top performers dislike practicing just as much as others do. (That's why, for example, athletes and musicians usually quit practicing when they retire.) But, more than others, they have the will to keep at it anyway.
I and others are good at what we do because we keep doing it over and over (and over and over until we get it right!). Persevere and you will succeed. For students, this means working every problem and dataset at their disposal. For those who have completed enough coursework to let them work with data, this means analyzing data every time the opportunity presents itself.