In experiments in which it is possible to observe each experimental subject before and after administering a single treatment, we will test a hypothesis about the average change the treatment produces instead of the difference in average responses with and without the treatment. This approach reduces the variability in the observations due to differences between individuals and yields a more sensitive test.
Figure 9-1 illustrates this point. Figure 9-1A shows daily urine production in two samples of 10 different people each; one group took a placebo and the other took a drug. Since there is little difference in the mean response relative to the standard deviations, it would be hard to assert that the treatment produced an effect on the basis of these observations. In fact, t computed using the methods of Chapter 4 is only 1.33, which comes nowhere near $t_{.05} = 2.101$, the critical value for $\nu = n_{\text{pla}} + n_{\text{drug}} - 2 = 10 + 10 - 2 = 18$ degrees of freedom.
Now consider Figure 9-1B. It shows urine production values identical to those in Figure 9-1A, but for an experiment in which urine production was measured in a single sample of 10 individuals before and after they took the drug. A straight line connects the two observations for each individual. Figure 9-1B shows that the drug increased urine production in 8 of the 10 people in the sample. This result suggests that the drug is an effective diuretic.
By concentrating on the change in each individual that accompanied taking the drug (in Fig. 9-1B), we could detect an effect that was masked by the variability between individuals when different people received the placebo and the drug (in Fig. 9-1A).
Now, let us develop a statistical procedure to quantify our subjective impression in such experiments. The paired t test can be used to test the hypothesis that there is, on the average, no change in each individual after receiving the treatment under study. Recall that the general definition of the t statistic is
$$t = \frac{\text{parameter estimate} - \text{true value of population parameter}}{\text{standard error of parameter estimate}}$$
The parameter we wish to estimate is the average change in response, δ, that the treatment produces in each individual. If we let d equal the observed change in each individual that accompanies the treatment, we can use $\bar{d}$, the mean change, to estimate δ. The standard deviation of the observed differences is
$$s_d = \sqrt{\frac{\sum (d - \bar{d})^2}{n - 1}}$$
So the standard error of the mean difference is
$$s_{\bar{d}} = \frac{s_d}{\sqrt{n}}$$
Therefore,
$$t = \frac{\bar{d} - \delta}{s_{\bar{d}}}$$
To test the hypothesis that there is, on the average, no response to the treatment, set δ = 0 in this equation to obtain
$$t = \frac{\bar{d}}{s_{\bar{d}}}$$
The resulting value of t is compared with the critical value of t for ν = n − 1 degrees of freedom, where n is the number of subjects.
To recapitulate, when analyzing data from an experiment in which it is possible to observe each individual before and after applying a single treatment:
Compute the change in response, d, that accompanies the treatment in each individual.
Compute the mean change $\bar{d}$ and the standard error of the mean change $s_{\bar{d}}$.
Use these numbers to compute $t = \bar{d}/s_{\bar{d}}$.
Compare this t with the critical value for ν = n − 1 degrees of freedom, where n is the number of experimental subjects.
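The four steps above translate almost line for line into code. The following Python sketch uses only the standard library; the function name `paired_t` and the before/after numbers in the usage lines are hypothetical, made up purely to exercise the function. It stops at computing t and ν; the critical value still has to be looked up (e.g., in Table 4-1).

```python
import math
from statistics import mean, stdev


def paired_t(before, after):
    """Return (d_bar, s_d_bar, t, nu) for a paired t test of H0: delta = 0."""
    if len(before) != len(after):
        raise ValueError("need one 'before' and one 'after' value per subject")
    n = len(before)
    if n < 2:
        raise ValueError("need at least two subjects")

    # Step 1: change in response that accompanies the treatment in each subject
    d = [a - b for b, a in zip(before, after)]

    # Step 2: mean change and standard error of the mean change
    d_bar = mean(d)
    s_d = stdev(d)                  # sample SD, n - 1 in the denominator
    s_d_bar = s_d / math.sqrt(n)

    # Step 3: t with the hypothesized mean change delta set to 0
    t = d_bar / s_d_bar

    # Step 4: degrees of freedom for looking up the critical value
    nu = n - 1
    return d_bar, s_d_bar, t, nu


# Hypothetical before/after measurements on 5 subjects (illustration only)
before = [10, 12, 9, 14, 11]
after = [12, 15, 9, 17, 12]
d_bar, s_d_bar, t, nu = paired_t(before, after)
print(f"mean change = {d_bar:.2f}, SE = {s_d_bar:.3f}, t = {t:.2f}, nu = {nu}")
```

With these made-up numbers the changes are 2, 3, 0, 3, and 1, giving a mean change of 1.8 and t ≈ 3.09 with ν = 4.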
Note that the number of degrees of freedom, ν, associated with the paired t test is n − 1, half the 2(n − 1) degrees of freedom associated with analyzing these data using an unpaired t test. This loss of degrees of freedom increases the critical value of t that must be exceeded to reject the null hypothesis of no difference. While this situation would seem undesirable, because of the biological variability typically found between individuals, this loss of degrees of freedom is virtually always more than compensated for by focusing on differences within subjects, which reduces the variability in the results used to compute t. All other things being equal, paired designs are almost always more powerful than unpaired designs for detecting effects in biological data.
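To see how much the loss of degrees of freedom raises the bar, the sketch below computes two-tailed critical values of t numerically instead of reading them from Table 4-1. It is a minimal sketch, assuming that Simpson's rule on the t density combined with bisection is accurate enough for illustration; in practice a table or a statistics library (for example SciPy's `scipy.stats.t.ppf`) would be used. The helper names `t_pdf`, `area_0_to_x`, and `t_critical` are hypothetical.

```python
import math


def t_pdf(x, nu):
    """Density of Student's t distribution with nu degrees of freedom."""
    log_c = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2) - 0.5 * math.log(nu * math.pi)
    return math.exp(log_c) * (1 + x * x / nu) ** (-(nu + 1) / 2)


def area_0_to_x(x, nu, steps=2000):
    """Area under the t density between 0 and x (Simpson's rule, even steps)."""
    h = x / steps
    total = t_pdf(0.0, nu) + t_pdf(x, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return total * h / 3


def t_critical(alpha, nu):
    """Two-tailed critical value: the t that cuts off alpha/2 in each tail."""
    target = (1 - alpha) / 2        # area between 0 and the critical value
    lo, hi = 0.0, 50.0
    for _ in range(50):             # bisection: the area grows with x
        mid = (lo + hi) / 2
        if area_0_to_x(mid, nu) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


n = 11  # subjects, each observed before and after one treatment
for design, nu in [("paired", n - 1), ("unpaired", 2 * (n - 1))]:
    print(f"{design:8s} nu = {nu:2d}  t.05 = {t_critical(0.05, nu):.3f}")
```

For 11 subjects the computed 5% critical values are about 2.228 (paired, ν = 10) and 2.086 (unpaired, ν = 20). The paired test therefore demands a somewhat larger t, a modest price compared with the reduction in variability that pairing typically buys.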
Finally, the paired t test, like all t tests, is predicated on a normally distributed population. In the t test for unpaired observations developed in Chapter 4, responses needed to be normally distributed. In the paired t test, the differences (changes within each subject) associated with the treatment need to be normally distributed.
9.1.1. Cigarette Smoking and Platelet Function
Smokers are more likely than nonsmokers to develop diseases caused by abnormal blood clots (thromboses), including heart attacks and occlusion of peripheral arteries. Platelets are small bodies that circulate in the blood and stick together to form blood clots. Because smokers experience more disorders related to undesirable blood clots, Peter Levine drew blood samples from 11 people before and after they smoked a single cigarette and measured the extent to which their platelets aggregated when exposed to a standard stimulus. This stimulus, adenosine diphosphate, makes platelets release their granular contents, which, in turn, makes them stick together and form a blood clot.
Figure 9-2 shows the results of this experiment, with platelet stickiness quantified as the maximum percentage of all the platelets that aggregated after being exposed to adenosine diphosphate. The pair of observations made in each individual before and after smoking the cigarette is connected by a straight line. The mean percentage aggregations were 43.1% before smoking and 53.5% after smoking, with standard deviations of 15.9% and 18.7%, respectively. Simply looking at these numbers does not suggest that smoking had an effect on platelet aggregation. This approach, however, omits an important fact about the experiment: the platelet aggregations were not measured in two different (independent) groups of people, smokers and nonsmokers, but in a single group of people who were observed both before and after smoking the cigarette.
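To make the point concrete, here is the unpaired calculation from Chapter 4 applied to the summary numbers just quoted, treating the before and after measurements as if they came from two independent groups of 11 people, which is precisely the wrong analysis for this design. It is a sketch assuming equal group sizes and a pooled variance; the function name is hypothetical. Because the reported means are rounded, the difference 53.5 − 43.1 = 10.4 is only approximate.

```python
import math


def unpaired_t_from_summary(mean1, sd1, mean2, sd2, n):
    """Unpaired t for two independent groups of equal size n, from summary stats."""
    pooled_var = (sd1 ** 2 + sd2 ** 2) / 2          # pooled variance (equal n)
    se_diff = math.sqrt(pooled_var / n + pooled_var / n)
    t = (mean2 - mean1) / se_diff
    nu = 2 * (n - 1)
    return t, nu


# Summary numbers from Fig. 9-2, analyzed (wrongly) as two independent groups
t_unpaired, nu_unpaired = unpaired_t_from_summary(43.1, 15.9, 53.5, 18.7, n=11)
print(f"unpaired t = {t_unpaired:.2f} with nu = {nu_unpaired}")
```

The result is t ≈ 1.4 with ν = 20, well below the 5% critical value of about 2.09 for that many degrees of freedom, which matches the impression that the means and standard deviations alone give.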
In all but one individual, the maximum platelet aggregation increased after smoking the cigarette, suggesting that smoking facilitates thrombus formation. The means and standard deviations of platelet aggregation before and after smoking for all people taken together did not suggest this pattern because the variability between individuals masked the change in platelet aggregation that was due to smoking the cigarette. When we took into account the fact that the data consisted of pairs of observations made before and after smoking in each individual, we could focus on the change in response and so remove the variability that arose because different people have different platelet-aggregation tendencies regardless of whether they smoked a cigarette.
The changes in maximum percent platelet aggregation that accompany smoking are (from Fig. 9-2) 2%, 4%, 10%, 12%, 16%, 15%, 4%, 27%, 9%, −1%, and 15%. Therefore, the mean change in percent platelet aggregation with smoking in these 11 people is $\bar{d}$ = 10.3%. The standard deviation of the change is 8.0%, so the standard error of the change is $s_{\bar{d}} = 8.0/\sqrt{11} = 2.41\%$. Finally, our test statistic is
$$t = \frac{\bar{d}}{s_{\bar{d}}} = \frac{10.3}{2.41} = 4.27$$
This value exceeds 3.169, the value that defines the most extreme 1% of the t distribution with ν = n − 1 = 11 − 1 = 10 degrees of freedom (from Table 4-1). Therefore, we report that smoking increases platelet aggregation (P < .01).
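The arithmetic above can be checked directly from the 11 changes. The sketch below uses Python's standard `statistics` module; the critical value 3.169 is taken from the text (Table 4-1) rather than computed, and the constant name `T_CRIT_1_PERCENT` is just an illustrative label. With the unrounded standard deviation (about 7.98%) the standard error comes out near 2.40% rather than 2.41%; the text's 2.41% uses the standard deviation rounded to 8.0%, and t is 4.27 either way.

```python
import math
from statistics import mean, stdev

# Changes in maximum percent platelet aggregation with smoking (from Fig. 9-2)
changes = [2, 4, 10, 12, 16, 15, 4, 27, 9, -1, 15]

n = len(changes)
d_bar = mean(changes)           # mean change, about 10.3%
s_d = stdev(changes)            # SD of the changes, about 7.98% (8.0% in the text)
s_d_bar = s_d / math.sqrt(n)    # standard error of the mean change, about 2.40%
t = d_bar / s_d_bar             # about 4.27
nu = n - 1                      # 10 degrees of freedom

T_CRIT_1_PERCENT = 3.169        # two-tailed 1% critical value for nu = 10, from Table 4-1
if abs(t) > T_CRIT_1_PERCENT:
    print(f"t = {t:.2f} exceeds {T_CRIT_1_PERCENT} (nu = {nu}): reject H0, P < .01")
else:
    print(f"t = {t:.2f} does not exceed {T_CRIT_1_PERCENT} (nu = {nu})")
```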
How convincing is this experiment that a constituent specific to tobacco smoke produced the observed change, as opposed to other chemicals common to smoke in general (e.g., carbon monoxide) or even the stress of the experiment? To investigate this question, Levine also had his subjects “smoke” an unlit cigarette and a lettuce-leaf cigarette that contained no nicotine. Figure 9-3 shows the results of these experiments, together with the results of smoking a standard cigarette (from Fig. 9-2).
When the experimental subjects merely pretended to smoke or smoked a non-nicotine cigarette made of dried lettuce, there was no discernible change in platelet aggregation. This situation contrasts with the increase in platelet aggregation that followed smoking a single tobacco cigarette. This experimental design illustrates an important point:
In a well-designed experiment, the only difference between the treatment group and the control group, both chosen at random from a population of interest, is the treatment.
In this experiment the treatment of interest was the tobacco constituents in the smoke, so it was important to compare the results with observations obtained after exposing the subjects to nontobacco smoke. This step helped ensure that the observed changes were due to the tobacco rather than to smoking in general. The more carefully the investigator can isolate the treatment effect, the more convincing the conclusions will be.
There are also subtle biases that can cloud the conclusions from an experiment. Most investigators, and their colleagues and technicians, want the experiments to support their hypothesis. In addition, the experimental subjects, when they are people, generally want to be helpful and wish the investigator to be correct, especially if the study is evaluating a new treatment that the experimental subject hopes will provide a cure. These factors can lead the people doing the study to slant judgment calls (often required when collecting the data) toward making the study come out the way everyone wants. For example, the laboratory technicians who measure platelet aggregation might read the control samples on the low side and the smoking samples on the high side without even realizing it. Perhaps some psychological factor among the experimental subjects (analogous to a placebo effect) led their platelet aggregation to increase when they smoked the tobacco cigarette. Levine avoided these difficulties by doing the experiments in a double-blind manner, in which the investigator, the experimental subject, and the laboratory technicians who analyzed the blood samples did not know the content of the cigarettes being smoked until after all experiments were complete and specimens analyzed. As discussed in Chapter 2, double-blind studies are the most effective way to eliminate bias due to both the observer and the experimental subject.
In single-blind studies, one party, usually the investigator, knows which treatment is being administered. This approach controls biases due to the placebo effect but not observer biases. Some studies are also partially blind: the participants know something about the treatment but do not have full information. For example, the blood platelet study might be considered partially blind because both the subject and the investigator obviously knew when the subject was only pretending to smoke. It was possible, however, to withhold this information from the laboratory technicians who actually analyzed the blood samples to avoid biases in their measurements of percent platelet aggregation.
The paired t test can be used to test hypotheses when observations are taken before and after administering a single treatment to a group of individuals. To generalize this procedure to experiments in which the same individuals are subjected to a number of treatments, we now develop repeated measures analysis of variance.
To do so, we must first introduce some new nomenclature for analysis of variance. To ease the transition, we begin with the analysis of variance presented in Chapter 3, in which each treatment was applied to different individuals. After reformulating this type of analysis of variance, we will go on to the case of repeated measurements on the same individual.