What Is Scientific Exactitude?

April 1931 Issue

I

NOT long ago I heard a Unitarian minister (over the radio!) define religion as a spiritual service to ideal and somewhat intangible values. Having adopted that very intelligent postulate, he built an interesting and quite logical argument upon it. But he offended me in one way, and that was by saying, ‘Of course, I cannot demonstrate the spiritual forces of religion to you with “scientific certitude.” They do not stand up under the rigors of laboratory analysis, but they are none the less real, I assure you.’ Similarly I have often heard workers in the social or the biological sciences say to workers in physical science: ‘Of course, we cannot make statements about things in our domain with the scientific exactitude you can use in yours, but our findings do have value.’

What is this scientific exactitude? Twenty years’ intimate experience with physical science has failed to reveal it to me. I can understand religious certitude and mathematical exactitude, but scientific exactitude completely eludes me. The Unitarian minister definitely knew what he was talking about; he defined the entity quite well, all things considered, and, granting his postulate, his logical structure was very good. He believed in what he said. It had an emotional appeal for him. Ergo it was absolutely true for him, he had certitude, and there was nothing further to be said. In reality his logical structure was a gloss or embellishment and was quite unnecessary, for his hearers must each experience religious truth intuitively, and it had for them an individual significance in each case. The minister’s logic cannot produce that intuitive conviction in anyone because it is not a thing of logic. It is felt to be certainly true, and it possesses an absolutism no scientific proposition ever can possess.

I am aware that ‘mathematical exactitude’ is an expression often abused and that many advanced mathematicians rebuke laymen, saying, ‘You must not say that, because mathematics is no longer exact.’ I disagree. At least I disagree with their definition of mathematics and believe that, when they say that, they have science mixed up with mathematics. Thus, when you say that the square on the hypothenuse of a right-angled triangle is equal to the sum of the squares on the other two sides, you state something which is exact. But when you say, ‘The square on the hypothenuse of this right-angled triangle which I have drawn here on paper is equal to the sum of the squares on the other two sides,’ you are over in a different preserve altogether. You are then dealing with science, and your statement is not exact because no instrument of precision exists sufficiently refined for you absolutely to prove it, while you cannot draw on paper points without dimensions and lines with only one dimension to begin with. You cannot even prove that you are dealing with a right-angled triangle in the first place.

This is obviously the distinction Bertrand Russell had in mind when he wrote: —

Pure mathematics consists entirely of such asseverations as that, if such and such a proposition is true of anything, then such and such another proposition is true of that same thing. It is essential not to discuss whether the first proposition is really true, and not to mention what the anything is of which it is supposed to be true. If our hypothesis is about anything, and not about some one or more particular things, then our deductions constitute mathematics.

Such statements, based, of course, upon a few undefined initial postulates, have the most precise logical exactitude it is possible to attain.

In the laboratory, however, I met little of such exactitude. If in reading a polariscope (or any instrument of precision), or in making my final weighing incidental to a phosphorus determination, I managed to get four or five results which were exactly alike, I at once became suspicious of myself. I felt quite certain that something was ‘wrong.’ Instead of such exactitude I expected to find errors of five per cent not at all unusual, while I should have been a victim of extreme anxiety if most results did not bob back and forth, above and below the ideal mathematical value, to the extent of two or three per cent. To secure what science calls exactitude I averaged these results and took the final average as correct, but, please observe that immediately I did so I had invoked the austere logic of mathematics and had left science and reality altogether.
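The averaging step can be sketched in a few lines of Python. The five readings are invented for illustration, since the author gives none. The sketch only shows that the value reported as "correct" is a computed mean, that the individual readings scatter a percent or two around it, and that it need not match any single reading.

```python
from statistics import mean

# Hypothetical replicate readings of a single determination (invented for illustration).
readings = [10.21, 9.94, 10.08, 9.87, 10.15]

# The value reported as "correct" is the arithmetic mean of the readings.
average = mean(readings)

# Percent deviation of each reading above (+) or below (-) that mean.
deviations = [100 * (r - average) / average for r in readings]

print(f"average = {average:.2f}")
for r, d in zip(readings, deviations):
    print(f"reading {r:.2f}: {d:+.1f}% from the average")
```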

This also is precisely the attitude of the physicist confronted with a gas composed, he assumes, of minute particles which are in violent straight-line motion of different velocity per particle. Suppose he wants to compute the position and velocity of a specific molecule. Each molecule is supposed to be a rigid sphere and its motions are assumed to be governed by the ordinary laws of collisions in mechanics which work so well when two automobiles come together. Yet the task is hopeless. For no physicist can hope to know exactly the initial conditions in a complex case like this where the number of entities (particles) is so great. So what does the scientist do — stand awed? Not at all. He falls back on the exactitude of mathematics; he ceases to attend to individual molecules and begins to deal with aggregates en masse; he employs the principle of large numbers and concentrates on averages; he invokes statistics. He leaves reality and utilizes the æsthetic exactitude of pure mathematics, thus attaining his type of certitude.
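The retreat from individual molecules to averages can be illustrated with a toy simulation. This is an assumption-laden sketch, not kinetic theory proper: each "molecule" gets velocity components drawn from a Gaussian (the Maxwell-Boltzmann form, in arbitrary units), and the seed and sample sizes are arbitrary choices. The point is only that single speeds are unpredictable while the mean over a great many molecules barely changes from trial to trial.

```python
import math
import random

def sample_speeds(n, rng, sigma=1.0):
    """Draw n molecular speeds; each velocity component is Gaussian (arbitrary units)."""
    speeds = []
    for _ in range(n):
        vx, vy, vz = (rng.gauss(0.0, sigma) for _ in range(3))
        speeds.append(math.sqrt(vx * vx + vy * vy + vz * vz))
    return speeds

rng = random.Random(1931)  # fixed seed so the run is repeatable

# Individual molecules: speeds differ widely from one draw to the next.
print([round(s, 2) for s in sample_speeds(5, rng)])

# Aggregates: the mean speed of 100,000 molecules is nearly the same in every trial.
means = [sum(sample_speeds(100_000, rng)) / 100_000 for _ in range(3)]
print([round(m, 3) for m in means])

# Theoretical mean speed for this distribution: sigma * sqrt(8 / pi).
print(round(math.sqrt(8 / math.pi), 3))
```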

Thirty years ago Rowland said: ‘I do not know what an atom of iron may be, but it must be as complicated a structure as a grand piano.’ To-day the electrons in Bohr’s atom are in rapid motion in their orbits; however, in Schrödinger’s atom — built by another system of logic — the electrons do not move about, but fluctuate in intensity and set up light waves in surrounding space! All of these scientific objects are abstractions or fictions having reality only so far as they are mathematical.

Scientists at Mount Wilson Observatory measure the speeds of three hitherto unclocked nebulae and report them as 3100, 4600, and 4900 miles per second in a direction away from the sun. But astronomers note that the further away a nebula is the faster it moves. Dr. Walter S. Adams suggests that all these measured speeds may be illusory or fictional. Dr. Harlow Shapley remarks that ‘the measured velocity is probably not a measure of actual motion, but more likely a measure of crumpling space.’ Professor Eddington insists that astronomers have merely been observing the slowing down of vibrations due to the fact that the light from these distant objects has traveled part way around the cosmos, and that the nebulæ are probably not receding from us at all. The exactitude of the measurements made may have been considerable, mathematically, but the scientific certitude they offer is almost nil.

II

As I see it, we confront only three kinds of knowledge, whatever sphere of activity we consider. The first is the intuitive knowledge of the mystic, which is based squarely upon feeling, is not called upon to be logical, and cannot be verified empirically. That such knowledge satisfies many people I have no doubt. That they can live successfully thereby is obvious. It is, however, unfortunate that such people will repeatedly resort to logic and attempts at empirical verification when their knowledge is purely intuitive. That causes much confusion, and the great masses of men would be far better off if it could be avoided.

The other two types of knowledge we confront are the mathematical and the scientific. They are based upon postulates just as surely as is intuitive knowledge, and therefore do not differ qualitatively. But they are based upon as few undefined postulates as possible; their other terms are very carefully and precisely defined and they are logically consistent in the systems they produce — for, of course, in mathematics or in science you can have many systems, each logically consistent, each useful to a certain extent, yet each contradictory of the other.

Then how do science and mathematics differ? They differ in that the propositions of the latter are purely hypothetical and cannot be verified in experience, while the propositions of the former are empirical and can be verified in experience. Mathematics is, therefore, more autonomous than science, but it is purely a convention. Both mathematics and science use logic for purposes of their own. In both cases the final propositions are true only if founded logically upon the basic, undefined postulates, and the systems are true only if the postulates are true.

In this classification the simple geometry of surveying comes under science, because it can be verified empirically; the same goes for practical arithmetic. On the other hand, geometric systems like those of Euclid, Riemann, and Lobachevsky are purely logical mathematical systems because they cannot be verified in experience. We have no instruments of sufficiently refined precision, for instance, to prove that the sum of the angles of any specific triangle is equal to two right angles, but there is a strong, logical necessity for our general acceptance of this assumption and we find it useful.

But my interest at the moment is in the fact that all three types of knowledge are more similar than most laymen think, and that science is at best far from being the domain of precise exactitude. According to myth it is stated that the Buddha was born of a virgin. I am not familiar with any attempt on the part of followers of this religious leader to prove the fact logically. My impression is that they realize this is mystic or intuitive knowledge and that logic and empiricism have nothing whatever to do with its validity. But I believe it is quite generally known that people in much more ‘advanced’ countries than those whose inhabitants worship Buddha attempt the impossible feat of establishing by ‘logical proofs’ the fact that another religious leader was born of a virgin.

The people who could be convinced by logical proofs would not be mystically religious in the first place, and, if convinced by logical proof, would be believing something altogether different from what mystically religious people believe anyway. The validity of their religion rests, for them, upon something quite different from logical proof. It has, indeed, certitude and a finality which science might well envy, but it should not be subjected to the processes either of logic or of empiricism.

On the other hand, you can make a very great many exact observations without being able to reach a logically valid conclusion. As Poincaré put it, you may measure every bit of wood on the ship in the most meticulous fashion without ever arriving at data from which you can calculate the age of the captain. In the same way, while we have a vast store of physical and astronomical data, it is very doubtful whether any of our theories about the formation of the universe have much greater validity than myths, for the simple reason that we are now and seem destined to remain ignorant of the initial condition of the matter composing this universe. In this case we have something quite like a mystical intuition, even though it be presumably arrived at after the correlation of many careful observations. These two cases offer an excellent contrast between science and religion.

III

How indeed does science reach its conclusions anyway, and how exact are they?

This question brings us back to postulates which all scientists, like mathematicians or religious people, must make, and to the actual manner in which data are collected and conclusions or generalizations are reached. I shall take a simple example for purposes of explanation. I shall stick to biology — in fact, to a nutrition investigator testing a food for its vitamin content by the use of rats in the usual manner. How does he use logic and mathematics in his work? Just what exactitude have his final generalizations? Can he be entirely impersonal? What safeguards might he use in evaluating his results objectively?

Since the scientist does not live forever, and therefore cannot observe everything, he must be selective. He must first select the problems with which he cares to deal. The so-called ‘pure’ scientist selects those problems which are, or appear to be, the most important links in the chain of scientific logic. This has to be done whether immediate applications impend or not, otherwise there can be no chain upon which to depend. Other scientists make practical applications of scientific principles to immediately useful ends. But even the pure scientist must not only select his problem; he must select the results he intends to regard as significant, and, in order to do that, must be as free from emotionally disturbing factors as it is possible to have him.

Suppose, to take a simple instance, a nutrition scientist wants to test a food for its content of the growth-promoting vitamin A. He uses rats. These are the progeny of brother and sister matings and have been bred with extreme care; they are as nearly like one another as it is possible to get living organisms. Since the rat reacts to the absence of vitamin A very much as a human being does, and since a human being is a complex organism with which to experiment, the choice of carefully bred rats is judicious. The rats are next depleted of their entire store of vitamin A, for rats, like humans, can store small amounts of this vitamin. When their store is gone the young rats show symptoms of an easily recognized eye disease, and young rats are used, of course, because growth is the function regulated by vitamin A. The investigator now usually decides to divide his rats into six groups, and each group is to be fed a different amount of the food containing vitamin A — say from one to six grams thereof per rat per day for a period of eight weeks, no other source of this vitamin being fed.

When the results are examined it is quite evident that one gram of the food does not contain enough vitamin A to sustain the animals, because all ten of them die. This is a clear-cut result. On two grams, however, four of the group live and four die, and the individuals lose or gain in weight the following number of grams in the eight weeks, or until their death: -36, -25, -15, -16, -8, +2, -44, -12, of which the average is -19. It will be seen that some of the animals lost vastly more weight than others, and that one actually gained in weight. Also half died and half lived. Yet all were treated alike. Finally, how much does the average loss of 19 grams for the group actually mean with divergences of from -44 to +2 involved? Nevertheless these results are typical of those regularly reported by scientific investigators who use such averages.
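The arithmetic can be checked directly. The figures are the author's own hypothetical weight changes for the two-gram group; the sketch simply recomputes the reported average and shows how wide a range that single number summarizes.

```python
from statistics import mean

# Weight changes (grams) for the two-gram group, as given in the text.
two_gram = [-36, -25, -15, -16, -8, +2, -44, -12]

print(mean(two_gram))                # -19.25, reported as -19
print(min(two_gram), max(two_gram))  # -44 2
print(max(two_gram) - min(two_gram)) # 46 grams between the extremes
```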

On the three-gram ration, ten rats live and five die, and the losses or gains in weight are as follows: -1, +27, +14, +17, +9, +15, -14, +15, -28, +21, +1, +6, +26, -5, +2, or an average of +7 for the group. Does this average mean anything with divergences within the group of from a weight loss of 28 to a weight gain of 27 grams during the test period? Again it will be seen that one third of the rats died and two thirds lived. But such results (and these have merely been formulated hypothetically by way of illustration) are typical, and their averages are commonly reported. Why could not investigators cull out large negative figures like -14 and -28 and average the others, especially if some manufacturer who was paying for the investigation very much wanted his special food to appear especially high in vitamin A? Suppose by mere chance this hypothetical investigator had decided to use only eight rats in this case and that he had picked the individuals giving gains of +27, +14, +17, +15, +9, +15, +21, and +26 — see how potent the food tested would have appeared in vitamin A!
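The effect of such a selection is easy to quantify. Using the author's hypothetical three-gram figures, the sketch compares the honest average of all fifteen rats with the average of the eight favorable ones, after confirming that every selected value really occurs in the full data.

```python
from statistics import mean

# Weight changes (grams) for the three-gram group, as given in the text.
three_gram = [-1, +27, +14, +17, +9, +15, -14, +15, -28, +21, +1, +6, +26, -5, +2]

# The eight rats the hypothetical investigator might have "happened" to pick.
picked = [+27, +14, +17, +15, +9, +15, +21, +26]

# Check that every picked value occurs in the full data (remove raises ValueError otherwise).
remaining = list(three_gram)
for gain in picked:
    remaining.remove(gain)

print(mean(three_gram))  # 7: average of all fifteen rats
print(mean(picked))      # 18: average of the favorable eight
```

Nothing in the selected eight is false; the distortion comes entirely from which observations are admitted.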

At this point I should remind you that such results are usually expressed by a scientist in the form of a graph or a curve. He does not pretend to know all of the points on that curve, because they are infinite in number. He does ascertain a certain finite number of points and then draws his curve. He does not draw what we might call a ‘natural’ curve which zigzags in every direction, because he assumes that no such complex curve would express any finding in physical or natural science. That, please observe, is a metaphysical assumption of a postulated simplicity at the bottom of things. In order, therefore, to get a ‘smooth’ curve, the scientist usually omits points which stand too far out of line and assumes that he has made a sensible accidental error in that case. Points very near the line are averaged, so to speak, to make a smooth curve, and the assumption is made that the errors involved were unavoidably inherent in the method used and that they will cancel each other.

If the introduction of a typical curve here can be pardoned, — and I confess that I do this with trepidation, for I am quite as timid about mathematics in its more intricately ornate forms as any of my readers, — I believe I can show what I mean still more dramatically. As a matter of fact, I shall be quite bold while I am about it and introduce three curves. These curves happen to have been drawn theoretically by a scientist interested in statistical methods as applied to forestry. But the exposition I am giving applies to such curves generally, whether they represent the actions of rats or stars or trees.

Curve A, it is quite apparent, is a highly fictional piece of business; while it does express the general drift of things in a very superficial way, it is marked by so little exactitude that not a single individual observation is touched by it. The particular bits of data in curve B, however, all hover much more closely to the line, and we may say that the curve therefore expresses the general state of things much more exactly than curve A. It is disconcerting to reflect, however, that though the data differ so markedly, the curves themselves are exactly alike.

But let us turn to curve C. This is based on averages. The two crosses in the second block in A and B, the three in the third, and so on, have been averaged, and, whichever of these two you use, you get curve C as a result. We now have the thing demonstrated in two ways. Not only will individual data as diverse as those in curves A and B give, when plotted, a curve that is identical and runs from 5 and 14 to 25 and 100 in exactly the same sweep, but if a third curve is plotted, using the averaged data from either A or B, this new curve, C, runs in the same identical way from 5 and 14 to 25 and 100. It is therefore apparent that the artistic ability to plot an æsthetically gratifying curve, which satisfies the eye and the passion for regularity and continuity, does not add the value of exactitude to the individual observations originally made or transmute inexactitude into precision. The final curve is beautifully exact, but it is mathematical, not scientific.
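Averaging by blocks, as for curve C, discards exactly the information that distinguished A from B. The small data sets below are invented so that each block has the same mean in both sets (two observations in the first block, three in the second, four in the third); the sketch shows the block means agree while the within-block spread differs tenfold.

```python
from collections import defaultdict
from statistics import mean, pstdev

def blocks(xs, ys):
    """Group observations sharing the same x value into one 'block'."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[x].append(y)
    return dict(sorted(groups.items()))

# Hypothetical data: same x values, very different scatter.
xs = [5, 5, 10, 10, 10, 15, 15, 15, 15]
ys_a = [4, 24, 14, 34, 54, 34, 44, 64, 74]   # widely scattered, like curve A
ys_b = [13, 15, 32, 34, 36, 51, 53, 55, 57]  # tightly clustered, like curve B

means_a = {x: mean(v) for x, v in blocks(xs, ys_a).items()}
means_b = {x: mean(v) for x, v in blocks(xs, ys_b).items()}
spread_a = {x: pstdev(v) for x, v in blocks(xs, ys_a).items()}
spread_b = {x: pstdev(v) for x, v in blocks(xs, ys_b).items()}

print(means_a == means_b)  # True: the averaged curve cannot tell A from B
print(spread_a)
print(spread_b)
```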

IV

Returning now to our hypothetical nutrition investigator: when he fed four grams of the food to the rats, let us suppose that he used eighteen animals of which two died and sixteen lived. The individual gains or losses in weight in grams during the eight weeks, or until death, were, we may assume: -14, +4, -7, +21, +12, +17, +20, -18, +17, +30, +20, +21, +18, -11, +23, +13, +22, +10, giving an average for the group of +11 grams. Then eight rats were fed five grams daily, all lived, the average gain in weight for the group was 22 grams, and individual gains were 25, 16, 29, 17, 19, 12, 31, and 27 grams. On a six-gram ration all five animals lived, the average gain for the group was 39 grams, and the individuals gained 26, 45, 43, 36, and 45 grams respectively. These detailed figures are introduced because they are typical of actual data often reported in scientific investigations of this character, and because by comparison it is very easy to see that the selective function of the scientist could do a great deal to make a food appear to contain an excessive amount of vitamin A when it was really not so rich in that substance after all.
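Setting all of the author's hypothetical figures side by side makes the overlap between rations visible. The one-gram group is omitted because all ten rats died and no weights are given. The sketch tabulates size, mean, sample standard deviation, and range for each ration.

```python
from statistics import mean, stdev

# Weight changes (grams) per daily ration, as given in the text.
groups = {
    2: [-36, -25, -15, -16, -8, 2, -44, -12],
    3: [-1, 27, 14, 17, 9, 15, -14, 15, -28, 21, 1, 6, 26, -5, 2],
    4: [-14, 4, -7, 21, 12, 17, 20, -18, 17, 30, 20, 21, 18, -11, 23, 13, 22, 10],
    5: [25, 16, 29, 17, 19, 12, 31, 27],
    6: [26, 45, 43, 36, 45],
}

print("g/day   n    mean   s.d.   min   max")
for grams, gains in groups.items():
    print(f"{grams:5d} {len(gains):3d} {mean(gains):7.1f} {stdev(gains):6.1f}"
          f" {min(gains):5d} {max(gains):5d}")
```

The best rats on three grams (+27) outgain the worst on five grams (+12), which is the opening the selecting investigator can exploit.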

This could be done with a show of great honesty, too, and high talk of exactitude and certitude. For did not certain rats thrive and gain 26 or 27 grams when fed only three grams of the substance for the eight-week test period? Surely they did. But just as surely two individuals lost 18 and 11 grams in weight, and finally died when fed four grams of the food daily throughout the test period. I think this should dispose of the notion that the scientist is a mere dispassionate observer, studying purely objective facts with instruments of precision and recording his results willy-nilly. Instead of that he is compelled to exercise judgment and to be the selecting agency determining first the problem to be studied, and secondly the data to be admitted as significant. These functions it is apparent he cannot exercise as he should when depending upon fees from commercial firms, or when he is animated by strong sentimental loyalties to individuals, or prey to powerful personal prejudices.

In the theoretical investigation we have just so briefly examined there were certain deficiencies. In the first place, such results do not have real value unless at least 100 and perhaps 200 or 250 rats were used in the crucial experiments — that is, for the feedings at the three- and four-gram rates. Consequently the averages given — and just such averages are given repeatedly as exact — are statistically absurd, because there is entirely too great a deviation among the individuals for any average to be made justifiably. There are investigators who are far more critical of their results and who are slow to claim positive certitude. One comes to mind — this is an actual case — whose work may be examined to show how a more competent scientist works.
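Why so many rats are needed can be sketched with the standard error of a mean, which shrinks as one over the square root of the number of animals. The sketch uses the author's hypothetical three- and four-gram figures, a rough normal-approximation 95% interval (a t-interval would be somewhat wider for groups this small), and an explicit assumption that the same scatter would persist if 200 rats were used.

```python
import math
from statistics import mean, stdev

# Weight changes (grams) from the text's hypothetical three- and four-gram groups.
three_gram = [-1, 27, 14, 17, 9, 15, -14, 15, -28, 21, 1, 6, 26, -5, 2]
four_gram = [-14, 4, -7, 21, 12, 17, 20, -18, 17, 30, 20, 21, 18, -11, 23, 13, 22, 10]

def interval(gains, n=None):
    """Rough 95% interval for the group mean (normal approximation).

    If n is given, assume (hypothetically) that the same scatter had been
    observed in n rats instead of len(gains) rats.
    """
    n = n or len(gains)
    standard_error = stdev(gains) / math.sqrt(n)
    m = mean(gains)
    return m - 1.96 * standard_error, m + 1.96 * standard_error

for label, gains in (("3 g/day", three_gram), ("4 g/day", four_gram)):
    lo, hi = interval(gains)
    lo200, hi200 = interval(gains, n=200)
    print(f"{label}: n={len(gains)}: {lo:.1f} to {hi:.1f} g;"
          f" if n=200: {lo200:.1f} to {hi200:.1f} g")
```

With the actual group sizes the three-gram interval even includes a net loss, and the two rations overlap almost entirely.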

This scientist wanted to find out whether feeding iron citrate would cure anæmia in suckling pigs. To do so he ran a so-called paired-feeding experiment. In such an experiment a group of twenty suckling pigs is selected, and individuals are then paired which are just as much alike as it is possible for two pigs to be. The animals got a basic ration which contained only a minimum amount of iron. Then one little pig of each pair was fed three grams of ferric citrate daily and the other little pig got none. The results, among other things, appeared to show that giving the iron depressed the growth rate, or the tendency to gain in weight, of the pigs. Many an investigator would have let things go at that and so reported. This one did not.

He found that, in the first place, he had 143 pair-weeks as a basis of comparison. Hence, if the iron citrate did not affect the growth rate, 71.5 pair-weeks would favor the check pigs and 71.5 the test pigs. Upon examining the weekly gains in weight of the animals, he found that 79.5 actually favored the check and 63.5 the test pigs, meaning that there was a deviation of 8 from the ideal result, and, on the face of it, it certainly appeared that the pigs which got no iron gained weight faster. But the scientist next calculated the probabilities, and the so-called ‘standard deviation of the frequency distribution’ of the outcome in a group of 143 events came to 6. In plain language, this meant that a deviation of 8 could easily occur by pure chance in about one trial out of six; the apparent result was not significant and could safely be regarded as negligible. In short, and without getting mixed up in a great deal of detail which would be useless to us here, there are mathematical means of finding out whether a result in such an experiment is significant or whether it could just as likely have been achieved by pure chance.
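The article does not spell out how "one trial in six" was obtained. One standard route, assumed here, treats each pair-week as a fair coin toss if iron has no effect (the half-weeks presumably being ties split evenly), so the count favoring the checks has standard deviation √(143 × ½ × ½) ≈ 6. A normal approximation then gives the chance of a deviation at least this large in either direction.

```python
import math

pair_weeks = 143
expected = pair_weeks / 2   # 71.5 if iron had no effect
observed_for_checks = 79.5  # pair-weeks favoring the pigs that got no iron

deviation = observed_for_checks - expected  # 8.0
sd = math.sqrt(pair_weeks * 0.5 * 0.5)      # about 5.98, which "came to 6"

# Normal approximation: probability of a deviation at least this large, either direction.
z = deviation / sd
p_two_sided = math.erfc(z / math.sqrt(2))

print(f"deviation = {deviation}, sd = {sd:.2f}, z = {z:.2f}, p = {p_two_sided:.2f}")
print(f"roughly 1 chance in {1 / p_two_sided:.1f}")
```

The result, about one chance in five or six, agrees with the investigator's conclusion that the deviation is not significant.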

This same investigator later tested the iron content of the blood of seven pairs of the pigs, and found that in each of the seven cases the blood of the pig which was fed iron contained more iron and more red cells than the blood of its control mate. This, now, is a consistent outcome in seven comparisons, and statistical examination showed that such a result could occur by chance only about once in 128 trials, and it was therefore mathematically, if not scientifically, significant. Hence it seemed safe to report that feeding suckling pigs additional iron increased the iron content and the red-cell count of their blood. In fact, the red-cell count on the pigs which got the iron averaged 8,070,000, and on those which got no iron 7,210,000, a difference of nearly 12 per cent. Also the iron content of the blood of the former was 0.0479 per cent and of the latter 0.0438, or a difference of 9.4 per cent. However, this should not be overlooked (and the investigator did not overlook it): the pigs which got no additional iron were not anæmic. That was an empirical observation of scientific significance. The investigator therefore did not consider even these results significant, and his ultimate report was negative for the entire work. Unfortunately, too many investigators are in too great a hurry to get a positive report into print to undertake such devastating and thoroughgoing examination of their own results, though, needless to say, the true scientist follows the example of the man who studied the pigs.
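The figures in this paragraph can be verified in a few lines. If iron had no effect, each of the seven pairs would be equally likely to favor either pig, so the chance that all seven favor the iron-fed pig is (½)⁷ (a one-sided probability). The percentage differences are taken relative to the control pigs.

```python
# Seven pairs, each equally likely (under "no effect") to favor either pig.
pairs = 7
p_all_favor_iron = 0.5 ** pairs
print(f"1 in {1 / p_all_favor_iron:.0f}")  # 1 in 128

def percent_difference(treated, control):
    """Difference expressed as a percentage of the control value."""
    return 100 * (treated - control) / control

print(f"{percent_difference(8_070_000, 7_210_000):.1f}%")  # 11.9% (red cells)
print(f"{percent_difference(0.0479, 0.0438):.1f}%")        # 9.4% (iron content)
```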

This does not mean that the careful scientist has attained exactitude and can speak with positive certitude. While means of refining exactitude exist, we know of none to make scientific exactitude perfect. The minute we perfect our results, by averaging or by drawing a curve and assuming that we have the correct solution to our problem, we have become mathematical. An irreducible minimum of scientific doubt must remain after the most patiently critical investigation, for scientific exactitude is a fiction.

V

I have adhered rather closely to biology, but the same general rules hold in physical science. The kinetic theory of gas pressure assumes that a gas consists of infinitesimal particles all of which are in violent, straight-line motion of varying velocities and never stop unless impeded by a confining wall or another particle. We derive a certain theory of gas pressure in gross, however, by assuming a uniform velocity for the particles, a thing which is certainly untrue, for some surely go slower and some faster. However, the very simple law works well enough; it is stated in terms of an ‘ideal’ gas which does not exist, and we use that as a measuring rod with which to evaluate the actions of ‘real’ gases. Here is a logical structure of pure theory put to some practical use. But beneath this simple law it is certainly true that the tiny particles of gases behave quite erratically, although some deeper simplicity still may be found in time to underlie their antics. However, certain assumptions are made and certain data are selected, correlated, and put to good use.
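Why the uniform-velocity shortcut "works well enough" can be sketched numerically. In the standard kinetic theory of an ideal gas, pressure is p = (1/3)(N/V)m⟨v²⟩, which depends only on the mean of the squared speeds. So replacing every particle's speed by one uniform speed gives the same pressure, provided that speed is the root-mean-square value; the ordinary average speed would not do. The speeds, seed, and unit mass and volume below are hypothetical.

```python
import math
import random

def pressure(speeds, mass, volume):
    """Kinetic-theory pressure of an ideal gas: p = (1/3) * (N / V) * m * <v^2>."""
    n = len(speeds)
    mean_square = sum(v * v for v in speeds) / n
    return (n / volume) * mass * mean_square / 3

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical, deliberately unequal speeds for 10,000 particles (arbitrary units).
speeds = [rng.uniform(0.0, 2.0) for _ in range(10_000)]
n = len(speeds)

# The simplifying assumption: every particle moves at one uniform speed.
v_rms = math.sqrt(sum(v * v for v in speeds) / n)  # root-mean-square speed
v_mean = sum(speeds) / n                           # ordinary average speed

p_real = pressure(speeds, mass=1.0, volume=1.0)
p_uniform_rms = pressure([v_rms] * n, mass=1.0, volume=1.0)
p_uniform_mean = pressure([v_mean] * n, mass=1.0, volume=1.0)

print(f"unequal speeds:        {p_real:.1f}")
print(f"uniform speed = v_rms: {p_uniform_rms:.1f}")
print(f"uniform speed = mean:  {p_uniform_mean:.1f}")
```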

Scientists make calculations involving gravitation — say calculations of the orbit of a planet. They assume that there is not in the near-by universe some enormous, yet invisible, body which may be exerting a powerful force hereabouts. That is possible, but the assumption is made that no such body exists. Also, in all physical measurement, such assumptions are made as that the velocity of light, or of the earth in its orbit, is constant. There is no final way of testing these assumptions. There is no final way of knowing that some distant stellar body does not exert a peculiar, but powerful, influence upon some physical phenomena we examine here, meanwhile assuming the effect to be negligible. Finally, when it comes to drawing a graph or curve to express his findings, the physicist, just like the biologist, must content himself with knowing the location of only a few points out of an infinite number on that curve; others he must ignore because they are too far out of line; others still he must interpolate, and assume the position of, when he draws a pretty, smooth curve to illustrate the general principle he has found, or wishes to use to ‘explain’ phenomena he has observed. In short, no scientist can ever be purely objective and act as nature’s automaton-secretary. He must select, interpret, interpolate, and, finally, dare to generalize without knowing all the particulars concerned.
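Interpolation rests on the postulate of continuity: the values between measured points are assumed, not observed. A minimal sketch, assuming straight-line interpolation between neighboring points (the article does not say which method any particular physicist uses, and the points are hypothetical):

```python
def interpolate(points, x):
    """Linear interpolation between the nearest measured points on either side of x.

    Assumes continuity: nothing unexpected happens between measurements.
    """
    points = sorted(points)
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x lies outside the measured range; interpolation cannot reach it")

# Hypothetical measured points; everything between them is assumed, not observed.
measured = [(0.0, 1.0), (2.0, 5.0), (5.0, 11.0)]
print(interpolate(measured, 1.0))  # 3.0
print(interpolate(measured, 3.5))  # 8.0
```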

This leads us directly back to the three types of knowledge we noted in the beginning, which seem to confuse so many people. The religious person or the mystic has intuitive knowledge. This knowledge has emotional validity, and that emotional validity, for the persons accepting it, is its truth. It does not depend upon logic, nor does it require verification in experience. Its certitude is compelling and final. If these simple facts could only be widely comprehended, not only would dissension between creeds be abolished, but religious people would cease trying to prove their contentions by objective means, and the so-called ‘warfare’ between science and religion would simply be annihilated.

Mathematical, like religious, knowledge is based upon undefined postulates, but it adopts just as few such undefined terms as it possibly can, instead of using them wholesale. It builds systems upon these postulates with logical consistency. There are various mathematical systems, depending upon the basic postulates taken. They differ from each other, and are, in fact, often in logical contradiction, however consistent within themselves; but it is also a curious fact that, logic being what it is, each one of them may be applied practically to reality and will assist man in dealing with the phenomena his senses apprehend. Mathematical systems have real æsthetic beauty, appreciated, like all high art, by the few, for their perfect logic is worthy of admiration. Their curious utility is still more interesting.

Scientific knowledge also assumes postulates and is compelled to take a certain number of them as undefined terms or axioms upon which to build. One of them is the assumption that light has a constant velocity; another is the postulate of continuity which permits the scientist to interpolate points on his graphs. In building up from these postulates, both logic and mathematics are used, but they are not paramount. The paramount thing in science is checking up against the phenomena our senses bring us, which we call reality. As we apprehend reality more completely, scientific principles must fall and be rebuilt, using logic and mathematics in the process. But scientific exactitude (and therefore scientific certainty) can never be absolute. Such certitude is possible only in a theoretical system like mathematics, or in an intuitive but emotionally valid system like a religion. It is very curious to me, and long has been, that this simple fact should not have been realized and that those who alone have absolute certitude in their possession so often refer to science as alone exact and seek to use it as a criterion of metaphysical values which need no such buttresses.
