The origins of the label "significant" in statistics

greenspun.com : LUSENET : History & Theory of Psychology : One Thread

Does anyone know how the label "significant" came to be applied to a statistic with an unlikely probability in its sampling distribution (e.g., p < .05)? If so, why was it called a "significant" statistic? Why not simply "unlikely"? It seems that much misunderstanding in research settings (from equating "significant" with "important") could have been avoided had a different term been used.

-- Daniel J. Denis (dand@yorku.ca), April 16, 2002

Answers

You've made the error of forgetting that one word may have several denotations. Significant derives from "sign" (= sain) which refers to "meaning." If you ask "what is the significance of this?" you may be asking "what does it mean?" as well as "what is its importance?" So when you ask "what is the statistical significance" of a set of numbers, you are asking first of all "what do the numbers mean?" Early articles using the word "significant" appear to use it in the sense of "relevance." Although I don't know the details of the history of significance testing, I suspect that there has been a gradual drift in the direction of the logic you're applying. According to the Oxford Dictionary of Etymology, "significance" in the sense of meaning dates from the 15th century, and "significance" in the sense of importance dates from the 18th century.

You might find an answer in one of the following books:

Statistics in psychology: an historical perspective. Cowles, Michael. Mahwah, NJ: L. Erlbaum Associates, 1989, 2001

Statistics on the table: the history of statistical concepts and methods. Stigler, Stephen M. Cambridge, MA: Harvard University Press, 1999

Quantification and psychology: toward a "new" history. Graff, Harvey J.; Monaco, Paul. Conference on Quantification in History and Psycho-history (1977: University of Texas at Dallas). Washington: University Press of America, 1980

-- Hendrika Vande Kemp (hendrika@earthlink.net), April 16, 2002.


[Posted for RDT by cdg.]

Good question!

A quick browse shows that Gosset ("Student") used the term in 1923, in an article on agricultural analysis using Fisher's then-new analysis of variance. And Garrett used the term in an off-hand way in his 1926 "Statistics in Psychology and Education." Garrett preferred the term "reliability" to mean what we might call significance.

C.S. Myers used the term in pretty much the modern sense (but without specifying much about the minimum difference needed for significance) in his "Text-Book of Experimental Psychology" 1909.

That's the earliest I've found; I'll keep looking.

Karl Pearson is generally credited with developing the first test, chi-square, in 1900, but I don't have the paper handy.

-- Ryan D. Tweney (tweney@bgnet.bgsu.edu), April 16, 2002.


Good question. I think David Salsburg is correct when he comments in his recent book THE LADY TASTING TEA (W.H. Freeman & Co., 2001) that the term 'significance' is used with the "late nineteenth-century English meaning" that the result signifies or shows something. This answer corresponds to Hendrika Vande Kemp's response. Salsburg's book, by the way, is a quick read and full of bits that are quite interesting if you're interested in a largely non-technical survey of the history of statistics. Certainly Salsburg's comment fits with how F.Y. Edgeworth used the term in his statistical papers in the 1880s. For example, Stephen M. Stigler in his THE HISTORY OF STATISTICS: THE MEASUREMENT OF UNCERTAINTY BEFORE 1900 (The Belknap Press, 1986) has reprinted the syllabus for Edgeworth's 1885 lectures, where in lecture III he refers to "significant difference". I'll give you a couple of sentences: "Nevertheless, if a considerable set, say a thousand men, is taken at random from that large group, and the mean of the statures of that set is formed, and again the mean of another and another set of a thousand, the means thus formed do (tend to) range under the typical law of error. Accordingly the reasoning based upon that law may be employed to determine whether within that department there is a significant difference of stature between different classes, e.g., artisans and agriculturalists." (p 364, Appendix A). Certainly Pearson and Fisher use the term. There is probably someone who used the term before Edgeworth, but we see that certainly by 1885 it was used to tell us that there is "a sign" of a difference between groups. I hope this is helpful. Sincerely, Dale

-- Dale Stout (dstout@ubishops.ca), April 19, 2002.

I just discovered another old history of statistics that might be useful: Helen M. Walker, Studies in the History of Statistical Methods (Baltimore: Williams & Wilkins, 1931).

-- Hendrika Vande Kemp (hendrika@earthlink.net), April 23, 2002.
