[Note: This is a commentary submitted on Hagen (1997), "In
praise of the null hypothesis statistical test," which appeared
in American Psychologist, 52, 15-24.]
In his interesting and insightful paper on Null Hypothesis Significance
Testing (NHST; e.g., Figure 1), Hagen (1997) argues that NHST
is a reasonable and informative method of statistical inference.
But is it the best method available to psychologists? In this
reply, we suggest that Bayesian Inference (BI), which Hagen discusses
on pages 18-19, offers many attractive advantages over NHST.
After briefly describing the basics of the approach, we discuss
its appeal as well as misconceptions about its use.
In BI, the hypothesis or parameter under inquiry is considered
a random variable rather than a fixed but unknown quantity,
as in NHST. It is expressed quantitatively as the degree of subjective
belief (i.e., probability) about the hypothesis. When BI is applied
for the first time to a given question, all available information
is utilized to form the initial probability (prior). Using Bayes
rule (see Hagen, p. 18), the prior is combined with data collected
in an experiment to yield an updated probability of the hypothesis
(posterior). When additional data are collected, this process
is repeated, except that the current posterior probability serves
as an (updated) prior probability. This recursive quality of statistical
inference is unique to BI.
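The updating cycle just described can be made concrete with a short numerical sketch (ours, not Hagen's). Assuming a conjugate beta-binomial model, in which a Beta(a, b) prior combined with binomial data yields another beta distribution, the posterior from one experiment serves directly as the prior for the next:

```python
# Sequential Bayesian updating with a beta prior and binomial data.
# The Beta(a, b) distribution is conjugate to the binomial likelihood,
# so each posterior is again a beta and can serve as the next prior.

def update(prior, successes, failures):
    """Combine a Beta(a, b) prior with binomial data via Bayes' rule."""
    a, b = prior
    return (a + successes, b + failures)

# Start from a uniform prior, Beta(1, 1): no hypothesis is favored.
prior = (1, 1)

# Two experiments analyzed in sequence; the posterior from the first
# becomes the prior for the second.
posterior1 = update(prior, successes=7, failures=3)
posterior2 = update(posterior1, successes=6, failures=4)

# Updating once with the pooled data gives the same answer, which is
# exactly the recursive quality described in the text.
pooled = update(prior, successes=13, failures=7)
assert posterior2 == pooled

a, b = posterior2
print("posterior mean:", a / (a + b))  # 14/22, about 0.64
```

The data here (7 of 10, then 6 of 10 successes) are invented for illustration; the point is that sequential and pooled analyses agree.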
Why is BI appealing? To begin with, it is intuitive, providing
an estimate of the degree of support for the hypothesis of interest,
that is, the probability of the hypothesis (H) given the data
(D), p(H|D). Inference in NHST is made indirectly, through the
inverse conditional probability p(D|H), and furthermore,
statistical conclusions pertain only to the null hypothesis (H0)
and not to the hypothesis of interest (H1).
The errors that are made in interpreting p-values in NHST (e.g.,
that 1-alpha is evidence for H1; see Cohen, 1994) are informative
because they reveal what information researchers often seek: what
the data say about the probability of their hypothesis, p(H|D).
The posterior probability in BI is just this, a quantifiable estimate
of the degree of support for the hypothesis.
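The distinction between p(D|H) and p(H|D) can be shown with a small computation. The numbers below are hypothetical, chosen only to illustrate how Bayes' rule converts one quantity into the other:

```python
# p(D|H) and p(H|D) are different quantities; Bayes' rule links them.
# Two competing hypotheses with equal priors, and the probability each
# assigns to the observed data (all numbers invented for illustration).

p_H0, p_H1 = 0.5, 0.5            # prior probabilities
p_D_given_H0 = 0.04              # likelihood of the data under H0
p_D_given_H1 = 0.20              # likelihood of the data under H1

p_D = p_D_given_H0 * p_H0 + p_D_given_H1 * p_H1   # total probability
p_H0_given_D = p_D_given_H0 * p_H0 / p_D          # Bayes' rule

print(p_H0_given_D)  # 1/6, about 0.17 -- not the same as p(D|H0) = 0.04
```

Reading 0.04 as the probability of H0 given these data would be exactly the kind of interpretive error discussed above.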
Another attractive feature of BI is that the iterative application
of Bayes rule complements the cumulative nature of science. Data
collected in multiple experiments within a single study, or across
studies, build upon each other to yield a quantifiable measure
of support for a hypothesis (Hagen, 1997, p. 18, provides an example
of this). It is in this respect that BI differs from meta-analysis,
which assesses the size and reliability of a result across studies,
but not necessarily the support for the hypothesis of interest
(H1).
Criticisms of BI frequently focus on the subjectivity involved
in determining the priors. During initial applications of Bayes
rule, priors may have a disproportionately large and potentially
misleading effect on the inference process. Over many experiments,
and thus successive applications of Bayes rule, the original prior's
contribution drops to a negligible level and data are weighted
more and more heavily. The initial prior should be thought of
as a best-guess assumption that bootstraps the inference process.
The data then take over and determine the eventual accuracy (probability)
of the hypothesis.
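The diminishing contribution of the prior can be sketched numerically. In the hypothetical beta-binomial setup below (our illustration, not from Hagen's paper), two analysts begin with sharply opposed priors, yet their posterior means converge as the data accumulate:

```python
# Sketch of how the prior's influence fades: posterior means under two
# very different beta priors converge as binomial data accumulate.
# (Illustrative numbers only.)

def posterior_mean(a, b, successes, n):
    """Mean of the Beta(a + successes, b + n - successes) posterior."""
    return (a + successes) / (a + b + n)

skeptical = (1, 9)    # prior mean 0.1
optimistic = (9, 1)   # prior mean 0.9

for n in (10, 100, 10000):
    s = int(0.6 * n)  # suppose 60% successes at every sample size
    m1 = posterior_mean(*skeptical, s, n)
    m2 = posterior_mean(*optimistic, s, n)
    print(n, round(m1, 3), round(m2, 3))

# At n = 10 the priors still dominate (0.35 vs 0.75);
# at n = 10000 both posterior means sit near 0.60.
```

By the last sample size the two analysts agree to three decimal places, which is the sense in which "the data take over."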
It should be pointed out that setting priors is not a matter of
wizardry. Justifiable estimates can be made. For example, noninformative,
neutral priors, such as a uniform distribution, can be used to avoid
unwarranted bias in favor of one hypothesis over another. Alternatively,
an empirical Bayes method can be employed, in which priors are
obtained from past or even current data (Casella, 1985). This
last method is becoming increasingly popular among biostatisticians
(Breslow, 1990).
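A simplified sketch of the empirical Bayes idea, in the spirit of Casella (1985), follows. A Beta(a, b) prior is estimated from the observed proportions of many groups by moment matching, and each group's estimate is then shrunk toward the pooled mean. (All data are invented, and matching moments of the raw proportions ignores binomial sampling noise; a fuller treatment would correct for it.)

```python
# Simplified empirical Bayes: estimate a Beta(a, b) prior from many
# groups' observed proportions by moment matching, then compute a
# posterior mean that shrinks a raw estimate toward the pooled mean.

props = [0.55, 0.62, 0.48, 0.70, 0.58, 0.65, 0.52, 0.60]  # made-up data
n = 50  # observations behind each proportion

m = sum(props) / len(props)                       # pooled mean
v = sum((p - m) ** 2 for p in props) / (len(props) - 1)

# Solve the beta distribution's mean/variance equations for a and b.
strength = m * (1 - m) / v - 1                    # a + b
a, b = m * strength, (1 - m) * strength

# Posterior mean for the group that observed 0.70: its raw estimate
# is pulled part of the way back toward the pooled mean.
x = int(0.70 * n)
post_mean = (a + x) / (a + b + n)
print(round(post_mean, 3))  # lies between the pooled mean and 0.70
```

The prior here is "obtained from the data" in precisely the sense the text describes: no subjective input beyond the model family is required.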
Although the subjectivity involved in setting priors in BI may
explain why it has been so maligned and ignored, it is useful
to remember that subjectivity is very much a part of NHST. An
over-reliance on the p-value as the key determiner of the validity
of an experimental result can lead one to believe otherwise, however.
A likely reason why too much weight is often given to whether
an analysis is statistically significant, rather than to what the
summary data actually say, is that use of a decision criterion
(i.e., the alpha level) makes statistical inference appear objective.
Ambiguous data will always be ambiguous; an appeal to a p-value
for a decision will not alter the outcome. Objectivity is illusory
because there is no objective means by which to justify setting
alpha at one value versus another; the subjective opinion of the
researcher is necessary. Furthermore, for a given data set, the
p-value is also dependent on the intended experimental design
(Berger & Berry, 1988). It may be because the alpha level
has attained benchmark status that this subjective aspect of NHST
tends to be overlooked.
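The dependence of the p-value on the intended design can be demonstrated with a standard example of the kind Berger and Berry (1988) discuss. Suppose 9 successes and 3 failures are observed and H0 states that the success probability is 0.5; the one-sided p-value differs according to whether n = 12 was fixed in advance or sampling continued until the third failure:

```python
# The same data yield different p-values under different designs.
from math import comb

# Design 1: n = 12 trials fixed in advance (binomial sampling).
# One-sided p-value: P(9 or more successes in 12 trials).
p_binomial = sum(comb(12, k) for k in range(9, 13)) / 2 ** 12

# Design 2: sampling continued until the 3rd failure (negative
# binomial). "As or more extreme" means 12 or more trials, i.e.
# at least 9 successes among the first 11 trials.
p_negbinom = sum(comb(11, k) for k in range(9, 12)) / 2 ** 11

print(round(p_binomial, 3), round(p_negbinom, 3))  # 0.073 vs 0.033
```

With alpha at the conventional .05, the identical data are "not significant" under one design and "significant" under the other, even though the likelihood, and hence the Bayesian conclusion, is the same in both cases.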
The heavy computational demands of BI were once a primary drawback
of the approach, but this is a thing of the past. Recent breakthroughs
in Bayesian computation, such as Markov chain Monte Carlo methods
(e.g., Gelman et al., 1995), have revolutionized the field, so that
computation is no longer an excuse to eschew BI. Furthermore, a plethora
of Bayesian software is available (Carlin & Louis, 1996),
and there is even an introductory, undergraduate-level statistics
textbook written entirely from a Bayesian perspective (Berry,
1996).
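To give a flavor of these methods, here is a minimal random-walk Metropolis sampler, the simplest member of the MCMC family. The toy problem is ours, not from the text: inferring a normal mean (known standard deviation 1, N(0, 10^2) prior) from three made-up observations.

```python
# Minimal random-walk Metropolis sampler for the posterior of a normal
# mean. Toy data and model chosen for illustration only.
import math
import random

data = [1.2, 0.8, 1.5]

def log_posterior(mu):
    """Unnormalized log posterior: N(0, 10^2) prior, unit-sd likelihood."""
    log_prior = -mu ** 2 / (2 * 10 ** 2)
    log_lik = sum(-(x - mu) ** 2 / 2 for x in data)
    return log_prior + log_lik

random.seed(0)
mu, samples = 0.0, []
for _ in range(20000):
    proposal = mu + random.gauss(0, 1)  # symmetric random-walk proposal
    # Accept with probability min(1, posterior ratio).
    if math.log(random.random()) < log_posterior(proposal) - log_posterior(mu):
        mu = proposal
    samples.append(mu)

burned = samples[5000:]  # discard burn-in draws
print(round(sum(burned) / len(burned), 2))  # near the data mean of ~1.17
```

The retained draws approximate the posterior distribution itself, so any summary of interest (mean, interval, probability that the parameter exceeds some value) can be read off directly, with no conjugate algebra required.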
BI could be used to answer many questions about behavior, but
it is probably not ideally suited for addressing all of them.
Indeed, it may be a bit naive to think that there is a single
statistical inference method that is perfectly suited for addressing
all questions about behavior. NHST might be satisfactory for asking
some questions (e.g., testing the efficacy of drugs) but less
so for others (e.g., testing psychological models).
Berger, J. O., & Berry, D. A. (1988). Statistical analysis
and the illusion of objectivity. American Scientist, 76,
159-165.
Berry, D. A. (1996). Statistics: A Bayesian Perspective. Duxbury Press.
Breslow, N. (1990). Biostatistics and Bayes. Statistical Science, 5, 269-298.
Carlin, B. P., & Louis, T. A. (1996). Bayes and Empirical Bayes Methods for Data Analysis (Appendix B: Software Guide). New York, NY: International Thomson Publishing.
Casella, G. (1985). An introduction to empirical Bayes data analysis. The American Statistician, 39, 83-87.
Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49, 997-1003.
Gelman, A., Carlin, J. B., Stern, H. S., & Rubin, D. B. (1995). Bayesian Data Analysis. Chapman & Hall.
Hagen, R. L. (1997). In praise of the null hypothesis statistical
test. American Psychologist, 52, 15-24.