NHST: Can Psychology Do Better?

Mark A. Pitt and In Jae Myung

Department of Psychology

The Ohio State University

[Note: This is a commentary submitted on Hagen (1997), "In praise of the null hypothesis statistical test," which appeared in American Psychologist, 52, 15-24.]

In his interesting and insightful paper on Null Hypothesis Significance Testing (NHST; e.g., Figure 1), Hagen (1997) argues that NHST is a reasonable and informative method of statistical inference. But is it the best method available to psychologists? In this reply, we suggest that Bayesian Inference (BI), which Hagen discusses on pages 18-19, offers many attractive advantages over NHST. After briefly describing the basics of the approach, we discuss its appeal as well as misconceptions about its use.

In BI, the hypothesis or parameter under inquiry is treated as a random variable rather than a fixed but unknown quantity, as in NHST. It is expressed quantitatively as a degree of subjective belief (i.e., a probability) in the hypothesis. When BI is applied for the first time to a given question, all available information is used to form the initial probability (the prior). Using Bayes' rule (see Hagen, p. 18), the prior is combined with data collected in an experiment to yield an updated probability of the hypothesis (the posterior). When additional data are collected, this process is repeated, except that the current posterior probability serves as an (updated) prior probability. This recursive quality of statistical inference is unique to BI.
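This recursive updating can be sketched in a few lines of code. The following is a hypothetical illustration, not part of the original commentary: the two coin-tossing hypotheses, the 0.7 bias value, and the particular data are assumptions chosen only to make the mechanics concrete.

```python
# Sequential Bayesian updating: the posterior from each experiment
# serves as the prior for the next. Two simple hypotheses about a coin
# are compared: H_fair (P(heads) = 0.5) vs. H_biased (P(heads) = 0.7).

def update(prior_fair, heads, tosses):
    """One application of Bayes' rule: combine the prior with new data."""
    tails = tosses - heads
    like_fair = 0.5 ** heads * 0.5 ** tails    # p(D | H_fair)
    like_biased = 0.7 ** heads * 0.3 ** tails  # p(D | H_biased)
    evidence = prior_fair * like_fair + (1 - prior_fair) * like_biased
    return prior_fair * like_fair / evidence   # posterior p(H_fair | D)

p_fair = 0.5  # initial prior: no preference between the hypotheses
for heads, tosses in [(7, 10), (6, 10), (8, 10)]:  # three experiments
    p_fair = update(p_fair, heads, tosses)  # posterior becomes new prior
print(round(p_fair, 3))
```

After three experiments totaling 21 heads in 30 tosses, the posterior probability of the fair-coin hypothesis has fallen well below its 0.5 starting point; the accumulating data, not the initial prior, dominate the conclusion.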

Why is BI appealing? To begin with, it is intuitive, providing an estimate of the degree of support for the hypothesis of interest, that is, the probability of the hypothesis (H) given the data (D), p(H|D). Inference in NHST is made indirectly through the inverse conditional probability, p(D|H), the probability of the data given the hypothesis; furthermore, statistical conclusions pertain only to the null hypothesis (H0) and not to the hypothesis of interest (H1).

The errors that are made in interpreting p-values in NHST (e.g., taking 1 - alpha as evidence for H1; see Cohen, 1994) are informative because they reveal the information researchers often seek: what the data say about the probability of their hypothesis, p(H|D). The posterior probability in BI is just this, a quantifiable estimate of the degree of support for the hypothesis.

Another attractive feature of BI is that the iterative application of Bayes' rule complements the cumulative nature of science. Data collected in multiple experiments within a single study, or across studies, build upon each other to yield a quantifiable measure of support for a hypothesis (Hagen, 1997, p. 18 provides an example of this). It is in this respect that BI differs from meta-analysis, which assesses the size and reliability of a result across studies, but not necessarily the support for the hypothesis of interest (H1).

Criticisms of BI frequently focus on the subjectivity involved in determining the priors. During initial applications of Bayes' rule, priors may have a disproportionately large and potentially misleading effect on the inference process. Over many experiments, and thus successive applications of Bayes' rule, the original prior's contribution drops to a negligible level and the data are weighted more and more heavily. The initial prior should be thought of as a best-guess assumption that bootstraps the inference process. The data then take over and determine the eventual accuracy (probability) of the hypothesis.

It should be pointed out that setting priors is not a matter of wizardry. Justifiable estimates can be made. For example, noninformative, neutral priors such as a uniform distribution can be used to avoid unfounded biases in favoring one hypothesis over another. Alternatively, an empirical Bayes method can be employed, in which priors are obtained from past or even current data (Casella, 1985). This last method is becoming increasingly popular among biostatisticians (Breslow, 1990).
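The idea of a noninformative prior can be made concrete with a small example. The Beta-binomial setup below is our illustrative assumption, not something specified in the text: a uniform Beta(1, 1) prior over a success probability has a simple conjugate update, so the posterior is shaped almost entirely by the data.

```python
# Conjugate Beta-binomial updating: with a Beta(a, b) prior and
# h successes / t failures observed, the posterior is Beta(a + h, b + t).
# A uniform (noninformative) prior corresponds to a = b = 1.

def posterior_mean(h, t, a=1.0, b=1.0):
    """Posterior mean of the success probability under a Beta(a, b) prior."""
    return (a + h) / (a + b + h + t)

# With a uniform prior the estimate is pulled only slightly toward 0.5,
# and the pull vanishes as the sample grows.
small = posterior_mean(7, 3)      # 10 observations, raw rate 0.70
large = posterior_mean(700, 300)  # 1000 observations, raw rate 0.70
print(round(small, 3), round(large, 3))
```

The small-sample estimate (8/12, about 0.667) is mildly shrunk toward 0.5, while the large-sample estimate (701/1002) is essentially the observed rate, illustrating how a neutral prior avoids favoring one hypothesis over another while still letting the data dominate.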

Although the subjectivity involved in setting priors in BI may explain why it has been so maligned and ignored, it is useful to remember that subjectivity is very much a part of NHST. An over-reliance on the p-value as the key determinant of the validity of an experimental result can lead one to believe otherwise, however. Too much weight is often given to whether an analysis is statistically significant, rather than to what the summary data actually say, probably because the use of a decision criterion (i.e., the alpha level) makes statistical inference appear objective. Ambiguous data will always be ambiguous; an appeal to a p-value for a decision will not alter the outcome. The objectivity is illusory because there is no objective means of justifying one value of alpha over another; the subjective opinion of the researcher is necessary. Furthermore, for a given data set, the p-value also depends on the intended experimental design (Berger & Berry, 1988). It may be because the alpha level has attained benchmark status that this subjective aspect of NHST tends to be overlooked.

The computationally rigorous nature of BI was once a primary drawback of the approach, but this is a thing of the past. Recent breakthroughs in Bayesian computation, such as Markov Chain Monte Carlo methods (e.g., Gelman et al., 1995), have removed this obstacle, so computational difficulty is no longer an excuse to eschew BI. Furthermore, a plethora of Bayesian software is available (Carlin & Louis, 1996), and there is even an introductory, undergraduate-level statistics textbook written entirely from a Bayesian perspective (Berry, 1996).
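To give a flavor of the Markov Chain Monte Carlo idea mentioned above, here is a toy random-walk Metropolis sampler. It is a sketch under assumed details, not code from any of the cited sources: it draws samples from a posterior density known only up to a normalizing constant, which is exactly the situation that makes exact Bayesian computation hard.

```python
import math
import random

def target(x):
    """Unnormalized density (a standard normal, chosen for illustration)."""
    return math.exp(-0.5 * x * x)

random.seed(0)
x, samples = 0.0, []
for _ in range(20000):
    proposal = x + random.gauss(0.0, 1.0)  # propose a nearby value
    # Accept with probability min(1, target(proposal) / target(x)).
    if random.random() < target(proposal) / target(x):
        x = proposal
    samples.append(x)

mean = sum(samples) / len(samples)
print(round(mean, 2))
```

The chain's sample mean settles near the target's mean (here 0) even though the normalizing constant of the density was never computed; more elaborate versions of this scheme are what modern Bayesian software automates.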

BI could be used to answer many questions about behavior, but it is probably not ideally suited for addressing all of them. Indeed, it may be a bit naive to think that there is a single statistical inference method that is perfectly suited for addressing all questions about behavior. NHST might be satisfactory for asking some questions (e.g., testing the efficacy of drugs) but less so for others (e.g., testing psychological models).

REFERENCES

     Berger, J. O., & Berry, D. A. (1988). Statistical analysis and the illusion of objectivity. American Scientist, 76, 159-165.

     Berry, D. A. (1996). Statistics: A Bayesian Perspective. Duxbury Press.

     Breslow, N. (1990). Biostatistics and Bayes. Statistical Science, 5, 269-298.

     Carlin, B. P., & Louis, T. A. (1996). Bayes and Empirical Bayes Methods for Data Analysis (Appendix B: Software Guide). New York, NY: International Thomson Publishing.

     Casella, G. (1985). An introduction to empirical Bayes data analysis. The American Statistician, 39, 83-87.

     Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49, 997-1003.

     Gelman, A., Carlin, J. B., Stern, H. S., & Rubin, D. B. (1995). Bayesian Data Analysis. London: Chapman & Hall.

     Hagen, R. L. (1997). In praise of the null hypothesis statistical test. American Psychologist, 52, 15-24.
