E. S. Pearson’s reviews of R. A. Fisher’s Statistical Methods for Research Workers
John Aldrich, University of Southampton, UK. August 2003, most recent changes May 2019.
R. A. Fisher’s Statistical Methods for Research Workers (1925) was probably the most influential statistics book of the 20th century. It presented in book form for the first time Fisher’s work on maximum likelihood, t-tests (including applications to regression), the z-transformation of the correlation coefficient, the analysis of variance, randomisation and blocking in the design of experiments, etc. The first edition is available on Christopher Green’s Classics in the History of Psychology website.
The first edition of Statistical Methods was widely reviewed; six other reviews are also available. All are worth looking at, as they emphasise different facets of the book.
Egon Sharpe Pearson (ESP) reviewed the first two editions of Fisher’s book. Fisher replied to the first review and eventually to the second, though only after their common friend, W. S. Gosset (“Student”), tried unsuccessfully to appease him. The reviews and the ensuing published letters are reproduced here with links to Fisher’s book and to one of his papers.
The private letters, which are an important part of the human story, are not included here (see Letters).
Fisher, ESP, KP and Gosset
In 1926, when he reviewed the first edition of Fisher’s book, Egon Pearson (1895-1980) was a lecturer of only a few years’ standing in the department of his father, Karl Pearson (KP), at University College London. Fisher and Karl Pearson were settled into a feud; for details of the feud see here or here. Student (William Sealy Gosset (1876-1937)), who was a friend to all concerned, wanted Fisher and the younger Pearson to enjoy a good relationship. In September 1926 he wrote to Fisher (see Letters) to—in effect—recommend Pearson. The question had come up of ESP joining Fisher on a committee of the British Association, and Gosset wrote to Fisher as follows:
… He says he will be very glad to go on the committee but asks me to warn you that he is likely to be rather a silent member.
The fact is that he is rather shy but I think you will find him a nice lad and willing to be helpful.
He naturally takes a biased view of the controversy between his Father and yourself but you will doubtless make allowances.
The next move lies with you and I suggest that if possible you should not only get him co-opted fairly soon but see if you can’t meet him on neutral ground. He has met Huxley casually, do you think Huxley could invite you both to tea or something of the kind?
Perhaps a quotation from his letter may be in order—
“I am all against these violent animosities which always seem to hang about biometry and biology too. I don’t think physics or chemistry are ever troubled in the same way. So will you thank him for his idea of getting hold of me and say I am very willing to do anything I can to help.”
Possibly somewhat naïve but natural!
In the event relations between Fisher and ESP were soured by the latter’s review of the second edition of Statistical Methods.
First Edition: The awkwardness of Egon Pearson’s position is clear from his review, which reflects admiration for Fisher’s research, difficulty in understanding it and defensiveness about his father’s work. The two topics mentioned in the review—the number of degrees of freedom and the correlation ratio—were those where Fisher was most critical of KP’s treatment, though of course Fisher’s applications of the notion of degrees of freedom went far beyond the χ2 distribution that KP had introduced. In his reply—ESP’s was the only review to elicit one—Fisher explained his position on the correlation ratio.
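Pearson’s complaint and Fisher’s reply turn on a computable statistic. As a minimal illustration (in Python, with simulated data of my own choosing, not figures from either author), the sketch below computes the correlation ratio and exhibits the behaviour Fisher pointed to: with y wholly independent of x, regrouping the same observations into more arrays inflates the statistic, because its sampling distribution depends on the number of arrays.

```python
import numpy as np

def correlation_ratio(groups):
    """Correlation ratio: sqrt(between-array sum of squares /
    total sum of squares), from data grouped into arrays."""
    all_y = np.concatenate(groups)
    grand_mean = all_y.mean()
    ss_total = ((all_y - grand_mean) ** 2).sum()
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    return np.sqrt(ss_between / ss_total)

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = rng.normal(size=200)  # wholly independent of x

# Fisher's point: cutting the same x-range into more arrays drives
# the ratio up even though y bears no relation to x at all.
etas = {}
for k in (4, 20, 50):
    bins = np.quantile(x, np.linspace(0, 1, k + 1))
    idx = np.clip(np.searchsorted(bins, x, side="right") - 1, 0, k - 1)
    groups = [y[idx == j] for j in range(k) if (idx == j).any()]
    etas[k] = correlation_ratio(groups)
    print(k, round(etas[k], 3))
```

The printed values rise with the number of arrays, which is exactly why Fisher held that the ratio, uncorrected for the number of arrays, is of extremely limited utility as a descriptive statistic.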
Second Edition: This exchange from 1929 is interesting in bringing out the attitudes of Fisher, Pearson and Student to robustness. The question of robustness was not prompted by anything Fisher had added to the 2nd edition—it was there to be asked by anyone who did not think the normal distribution was especially normal. Fisher considered normality quite appropriate and said so in his 1926 article The Arrangement of Field Experiments. When ESP reviewed the 2nd edition he was beginning to investigate the robustness of the test procedures Fisher was advocating. The pioneer here was W. A. Shewhart. Pearson’s (unsigned) review was, in effect, a plea for robustness studies. Lehmann (1999) discusses the issues involved. The story of Fisher’s furious reaction and how Student became involved is told by Pearson in his book ‘Student’ (pp. 95-101). An important part of the story was the correspondence between Student and Fisher—in a little over a month 8 letters passed between them (see Letters from Gosset to Fisher).
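The kind of robustness study Pearson was calling for is easy to sketch today. The following Monte Carlo simulation (illustrative only; the parent distributions and the sample size of 10 are my choices, not taken from the review) checks how closely the nominal 5 per cent level of “Student’s” t holds when small samples are drawn from a normal, a skew, and a rectangular parent.

```python
import numpy as np

rng = np.random.default_rng(42)
n, reps = 10, 20000
t_crit = 2.262  # two-sided 5% point of Student's t with 9 degrees of freedom

def rejection_rate(draw):
    """Fraction of samples (each from a parent with true mean 0)
    in which |t| exceeds the normal-theory critical value."""
    x = draw((reps, n))
    t = x.mean(axis=1) / (x.std(axis=1, ddof=1) / np.sqrt(n))
    return float(np.mean(np.abs(t) > t_crit))

rates = {
    "normal": rejection_rate(lambda s: rng.normal(size=s)),
    # a markedly skew parent, centred so that its mean is 0
    "exponential": rejection_rate(lambda s: rng.exponential(size=s) - 1.0),
    # a rectangular parent, of the kind used in Shewhart's sampling experiments
    "uniform": rejection_rate(lambda s: rng.uniform(-1, 1, size=s)),
}
for name, rate in rates.items():
    print(f"{name:12s} {rate:.3f}")
```

Under the normal parent the rate sits near the nominal 0.05; the other parents show how far, and in which direction, the “exact” test drifts, which is precisely the information Pearson wanted stated in the book.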
Fisher and Egon Pearson were never to have a happy relationship. In 1933 Pearson published his most famous contribution to statistical theory, the joint paper with Jerzy Neyman, “On the Problem of the Most Efficient Tests of Statistical Hypotheses,” Phil. Trans. A, 231, 289-337 (see the Earliest Uses entry Hypothesis & hypothesis testing for further information). In the same year Karl Pearson retired and ESP became Head of the new Department of Applied Statistics. Fisher inherited the rest of KP’s empire, becoming Professor of Eugenics in a separate department. The unhappy story of the co-existence of two statistics departments is told in Box’s biography of Fisher and Reid’s biography of Neyman. Neyman had been collaborating with ESP since 1928 and was later a member of his department. See A Guide to R. A. Fisher for an account of Fisher’s relations with Neyman and further references.
In 1935 Fisher published The Design of Experiments. There is an allusion to Pearson’s work in §21, Test of Wider Hypothesis. Fisher writes, “There has, … in recent years, been a tendency for theoretical statisticians, not closely in touch with the requirements of experimental data, to stress the element of normality, in the hypothesis tested, as though it were a serious limitation to the test applied.” As in the 1929 letter reprinted below, Fisher dismissed the danger of non-normality but he also used a variation of the sign test described in Statistical Methods §24, Ex 19 pp. 108-9 to test “the wider hypothesis, which merely asserts that the two series are drawn from the same population, without specifying that this is normally distributed.” In the seventh edition of The Design of Experiments, published in 1960, Fisher added a sub-section §21.1, “Non-parametric” Tests, where he castigated mathematicians “who often discuss experimentation without personal knowledge of the material.”
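Fisher’s test of the wider hypothesis can be sketched as a randomization test on the signs of the paired differences. The sketch below (in Python) uses Darwin’s cross- versus self-fertilised plant differences, in eighths of an inch, the dataset Fisher analysed in The Design of Experiments; the figures are the standard published values, reproduced as an illustration rather than checked against any particular edition.

```python
from itertools import product

def randomization_p(diffs):
    """Randomization test on paired differences: under the wider
    hypothesis that both series come from the same population, each
    difference is equally likely to be + or -, so the observed total
    is compared with the totals from all 2^n sign assignments."""
    n = len(diffs)
    observed = abs(sum(diffs))
    count = 0
    for signs in product((1, -1), repeat=n):
        total = sum(s * d for s, d in zip(signs, diffs))
        count += abs(total) >= observed  # count the equally or more extreme cases
    return count / 2 ** n

# Darwin's 15 paired differences (cross- minus self-fertilised height)
darwin = [49, -67, 8, 16, 6, 23, 28, 41, 14, 29, 56, 24, 75, 60, -48]
p_value = randomization_p(darwin)
print(p_value)
```

The enumeration over all 32,768 sign assignments makes no appeal to normality, which is the sense in which the test asserts only “that the two series are drawn from the same population.”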
There is a nice article on ESP by David and a memoir by Bartlett, but the best source for ESP’s relationship with his father, with Student and with Fisher is the book he wrote about ‘Student’ which was assembled after his death by Plackett.
Statistical Methods for Research Workers. By Mr. R. A. Fisher.
Mr. Fisher has undertaken the very difficult task of attempting to put before research workers in biology and agriculture, who are without any special mathematical training, a summary covering a great range of methods and results in the mathematical theory of statistics. The book is chiefly concerned with the best methods of handling small samples, and purposely no general attempt is made to supply proofs of the results quoted, but forty-five examples are worked out fully in the text illustrating different methods of attacking a variety of problems. After conscientiously working through the examples, the student should feel able to apply the methods to exactly similar problems, but it appears a little doubtful whether without a thorough grasp of the underlying principles he could be safely trusted to tackle other problems where the conditions are somewhat altered. This is of course a criticism which applies to all books that attempt to provide a short cut to advanced results by avoiding a thorough grounding in the elements.
From the theoretical point of view it is necessary to read the book in conjunction with the author's papers published elsewhere, and one must confess to some difficulty in following several of the proofs based on the idea of degrees of freedom [chapter IV, chapter V, chapter VII] which appear to rest on arguments from analogy. Again, a long-established method such as the use of the correlation ratio [§45 The “Correlation Ratio” η] is passed over in a few words without adequate description, which is perhaps hardly fair to the student who is given no opportunity of judging its scope for himself.
But if old methods are dismissed somewhat summarily there are several fresh results of considerable interest as well as new tables, and anyone interested in the theory of small samples can hardly pass over Mr. Fisher's contribution to the subject.
E. S. P.
DEAR SIR,—The kindly notice of my book on Statistical Methods for Research Workers, which appears on pp. 733-4 of the April number of SCIENCE PROGRESS, contains one sentence which, if uncorrected, might give rise to some misapprehension, and that on an important point of statistical theory. “E.S.P.” writes:
“Again, a long-established method such as the use of the correlation ratio is passed over in a few words without adequate description, which is perhaps hardly fair to the student who is given no opportunity of judging its scope for himself.”
May I point out that my sin, if I am at fault, is one of commission, not of omission. I warn the student as plainly as I can (p. 219) “As a descriptive statistic the utility of the correlation ratio is extremely limited.” This conclusion (with which I cannot, of course, expect “E.S.P.” to agree) was not formed without laborious examinations of the theory and practice of this “long-established” method, as the result of which I was able to establish (1922) ["The goodness of fit of regression formulae, and the distribution of regression coefficients" p. 602ff] the true distribution of the sampling errors of this statistic, and so to investigate the pitfalls into which eminent biometricians had repeatedly fallen. The three pages given in my book to the correlation ratio and Blakeman’s criterion, are there simply to warn the student against a roundabout process, which has already wasted too much valuable time. I do little more than indicate the main reason for the failure, both of the ratio and of the criterion, namely, that the sampling distributions are not merely modified by, but are wholly dependent on, the number of arrays, and that this number is left entirely out of account in both cases.
Statistical Methods for Research Workers. By Dr. R. A. Fisher.
WITH the increasing application of statistical methods to new fields of work, the problem of the handling of small samples has become more and more important. It is true that the larger the sample the more trustworthy are the inferences which can be drawn from it, but there are certain problems, whether biological or industrial, in which the time and cost involved in obtaining even a moderately large sample would be quite prohibitive. This need for a development of small sample theory has emphasised the importance of placing the methods of inference on a clearly defined and logical basis. For loose thinking and careless interpretation are both easier and more dangerous when dealing with small than with large samples. The aim of the statistician must be to bring the simplifying assumptions of theoretical analysis into correspondence with the varied and complex situations of practical work.
Dr. Fisher sets out in the introduction to this book, of which a second edition has been published recently, what may be termed his statistical philosophy. It may not be perhaps easy to follow at a first reading—perhaps not before his mathematical papers published elsewhere have been read and if necessary interpreted in more familiar terms—but a grasp of the ideas involved is essential to a clear understanding of his methods. These are perhaps, after all, more like those criticised than he will allow, but the line of approach is somewhat different. His aim has been to develop on systematic lines a series of tests appropriate for use in a great variety of problems. This has involved a very considerable extension of theory, based in several cases upon a most elegant use of the geometry of multiple space. These proofs are not, of course, given in the present book, which is primarily intended for biological research workers, but the practical applications of these methods to a variety of problems are given with numerical illustrations, and the necessary probability tables.
To discuss how far the author has achieved this object of putting clearly before the research worker the means of applying statistical tests, would require perhaps a reviewer who is a non-mathematical biologist. There is one criticism, however, which must be made from the statistical point of view. A large number of tests are developed upon the assumption that the population sampled is of ‘normal’ form. That this is the case may be gathered from a very careful reading of the text, but the point is not sufficiently emphasised. It does not appear reasonable to lay stress on the ‘exactness’ of tests, when no means whatever are given of appreciating how rapidly they become inexact as the population sampled diverges from normality. That the tests, for example, connected with the analysis of variance are far more dependent on normality than those involving 'Student's' z (or t) distribution is almost certain, but no clear indication of the need for caution in their application is given to the worker. It would seem wiser in the long run, even in a text-book, to admit the incompleteness of the theory in this direction, rather than risk giving the reader the impression that the solution of all his problems has been achieved. The author's contributions to the development of ‘normal’ theory will stand by themselves, both for their direct practical value and as an important preliminary to the wider extension of theory, without any suggestion of undue completeness.
A last chapter on the principles of statistical estimation has been added to this edition. It provides a good illustration of the application of the ideas contained in the introduction and elsewhere, although perhaps it may prove stiff reading for the biologist.
IN the review of Dr.
That such a misconception should arise is perhaps not unnatural when a mathematician is trying to explain what he has been doing to those who lack the mathematical outlook, but this would presumably not apply to the reviewer; yet by his use of the word “admit” when doubtless he meant “stress” (“It would seem wiser … to admit the incompleteness of the theory …”) he runs the risk of seeming to support a misstatement which Dr. Fisher may well resent.
The question of the applicability of normal theory to non-normal material is, however, of considerable importance and merits attention both from the mathematician and from those of us whose province it is to apply the results of his labours to practical work. Personally, I have always believed, without perhaps any very definite grounds for this belief, that in point of fact ‘Student’s’ distribution will be found to be very little affected by the sort of small departures from normality which obtain in most biological and experimental work, and recent work on small samples confirms me in this belief. We should all of us, however, be grateful to Dr. Fisher if he would show us elsewhere on theoretical grounds what sort of modification of his tables we require to make when the samples with which we are working are drawn from populations which are neither symmetrical nor mesokurtic.
The Galton Laboratory,
IN NATURE of July 20, “Student” propounds to me the problem of what sort of modification of my tables for the analysis of variance would be required to adapt that process to non-normal distributions. Since he and others evidently feel that a legitimate extension of my methods might be made along these lines, it may be worth while to point out the reasons for which, quite unusually, I disagree with his view.
The theoretical reasons may be made quite clear by ignoring the limits of practical possibility, and supposing that an army of computers had extended the existing tables some two hundred fold, with the view of providing tests of significance for all the distributions conforming to the Pearsonian system of frequency curves. The system of tests of significance as produced would then be exposed to criticism from three different angles.
(1) Following the lead of the reviewer in NATURE of June 8, p. 486, of my book “Statistical Methods for Research Workers”, it would be said that the new tables still needed ‘correction’ in order to include equally possible forms of distribution outside the Pearsonian system.
(2) A student of “Student” would surely point out that the parameters needed to enter the new tables must be calculated from the data available, and that allowance must be made for their sampling errors, by eliminating the Pearsonian parameters and replacing them by formulae involving statistics only.
(3) A worker along my own lines would suggest that the particular statistics, means and mean squares entering these tests are only efficient for the normal distribution, and that for Pearsonian curves quite other statistics are required, and not merely revised distributions of the familiar statistics appropriate to normal material.
The last two points could be met by dropping the Pearsonian system for one for which the moments are appropriate, when the way would lie open for the development of an analysis of the third and higher semi-invariants, bearing the same relation to the first attempt as the analysis of variance bears to some of the earlier calculus of correlations. An expert in cubic and bi-quadratic forms might here open out a new realm of statistical theory, for the application of which adequate data might in time be accumulated.
If the first objection is not ignored, more and more parameters may be introduced, but the patient investigator is still pursued by the analogues of these three criticisms; nor is there any doubt as to the limit of the process. It is not, as at first it may seem, the stultification of all statistical methods, but merely the abandonment of the theory of errors. Beyond all questions of metrical variates there is, largely undeveloped, a system of tests which depend only upon frequency and on order of magnitude. Examples exist in “Student’s” writings, and in my own. [Statistical Methods §24, Ex 19 pp. 108-9] They are free from all taint of normality, but are too insensitive to be very useful; still, their development would be of more interest than the programme of research first considered.
On the practical side there is little enough room for anxiety, especially among biologists, who are used to checking the adequacy of their methods by control experiments. The difficulty of obtaining decisive results often flows from the heterogeneity of material, often from causes of bias, often too, from the difficulty of setting up an experiment in such a way as to obtain a valid estimate of error. I have never known difficulty to arise in biological work from imperfect normality of the variation, often though I have examined data for this particular cause of difficulty; nor is there, I believe, any case to the contrary in the literature. This is not to say that the deviation from “Student’s” t-distribution found by Shewhart and Winters, for samples from rectangular and triangular distributions, may not have a real application in some technological work, but rather that such deviations have not been found, and are scarcely to be looked for, in biological research as ordinarily conducted.
Rothamsted Experimental Station,
Harpenden, July 26.
IN a letter to NATURE of Aug. 17,
Turning to the more theoretical aspects of the problem, no one who appreciates the lines along which Dr. Fisher has developed the theory of sampling will deny that as the form of variation deviates more and more from the normal, not only may (a) the frequency constants or “statistics” cease to be distributed in random samples according to the “normal theory” law; but also (b), even if they are still approximately so distributed, they begin to lose efficiency as discriminating criteria. For each form of variation there exist in theory different “statistics” leading to the most efficient tests of the significance of differences observed on samples. The subject is one of extreme interest, but, as Dr. Fisher writes, this new realm of statistical theory is at present scarcely opened. But whether it were opened or not, I am inclined to think that a fundamental difficulty would still be present. It is this.
In practice the worker in small samples can rarely be certain from the evidence available of the exact form of variation in his population; he must, therefore, use some standard form of analysis, and if he believes from previous experience, that deviations from normality if existing are unlikely to be great, he will naturally use the “normal theory” tests. But, logically, confidence in his results is then only justified if he is certain that deviations from normality of this order will not introduce the difficulties (a) or (b) above.
As a concrete example, suppose that a biologist wishes to compare the eggs of two groups of a species of bird living in different habitats. He has collected and measured an egg from each of some ten or a dozen nests from both groups. The numbers are small, but would be of value at any rate in a preliminary inquiry. May he now compare the means and standard deviations of, say, length and breadth of eggs for the two groups, using Dr. Fisher’s t and z tests? His samples are too small to give him any information on this point, but he may turn to literature containing egg measurements on a large scale and see whether the variation in length and breadth of eggs is generally normally distributed. He would find, for example, in Biometrika:
Length of cuckoo’s egg, 1572 cases. Frequency constants β1 = .0044, β2 = 3.3483
Breadth of common tern’s egg (1914), 1592 cases. Frequency constants β1 = .2618, β2 = 3.5315
Breadth of common tern’s egg (1920), 956 cases. Frequency constants β1 = .1624, β2 = 3.9276
(From Biometrika, iv. p. 368, xii. p.348, xv. p. 337)