Book
Primer of Biostatistics, 7th Edition

by Stanton A. Glantz

A concise, engagingly written introduction to understanding statistics as they apply to medicine and the life sciences

CD-ROM performs 30 statistical tests

Don't be afraid of biostatistics anymore! Primer of Biostatistics, 7th Edition demystifies this challenging topic in an interesting and enjoyable manner that assumes no prior knowledge of the subject. Faster than you thought possible, you'll understand test selection and be able to evaluate biomedical statistics critically, knowledgeably, and confidently.

With Primer of Biostatistics, you'll start with the basics, including analysis of variance and the t test, then advance to multiple comparison testing, contingency tables, regression, and more. Illustrative examples and challenging problems, culled from the recent biomedical literature, highlight the discussions throughout and help to foster a more intuitive approach to biostatistics.

The companion CD-ROM contains everything you need to run thirty statistical tests of your own data. Review questions and summaries in each chapter facilitate the learning process and help you gauge your comprehension. By combining whimsical studies of Martians and other planetary residents with actual papers from the biomedical literature, the author makes the subject fun and engaging.

Coverage includes:

• How to summarize data

• How to test for differences between groups

• The t test

• How to analyze rates and proportions

• What does "not significant" really mean?

• Confidence intervals

• How to test for trends

• Experiments when each subject receives more than one treatment

• Alternatives to analysis of variance and the t test based on ranks

• How to analyze survival data

Book Chapter
10. Alternatives to Analysis of Variance and the t test Based on Ranks

As already noted, analysis of variance is called a parametric statistical method because it is based on estimates of the two population parameters, the mean and standard deviation (or variance), that completely define a normal distribution. Given the assumption that the samples are drawn from normally distributed populations, one can compute the distributions of the F or t test statistics that will occur in all possible experiments of a given size when the treatments have no effect. The critical values that define a "big" value of F or t can then be obtained from that distribution. When the assumptions of parametric statistical methods are satisfied, they are the most powerful tests available.

If the populations the observations were drawn from are not normally distributed (or are not reasonably compatible with other assumptions of a parametric method, such as equal variances in all the treatment groups), parametric methods become quite unreliable because the mean and standard deviation, the key elements of parametric statistics, no longer completely describe the population. In fact, when the population substantially deviates from normality, interpreting the mean and standard deviation in terms of a normal distribution can produce a very misleading picture.

For example, recall our discussion of the distribution of heights of the entire population of Jupiter. The mean height of all Jovians is 37.6 cm in Figure 2-3A and the standard deviation is 4.5 cm. Rather than being equally distributed about the mean, the population is skewed toward taller heights. Specifically, the heights of Jovians range from 31 to 52 cm, with most heights around 35 cm. Figure 2-3B shows what the population of heights would have been if, instead of being skewed toward taller heights, they had been normally distributed with the same mean and standard deviation as the actual population (in Figure 2-3A). The heights would have ranged from 26 to 49 cm, with most heights around 37 to 38 cm. Simply looking at Figure 2-3 should convince you that envisioning a population on the basis of the mean and standard deviation can be quite misleading if the population does not, at least approximately, follow the normal distribution.

The same thing is true of statistical tests that are based on the normal distribution. When the population the samples were drawn from does not at least approximately follow the normal distribution, these tests can be quite misleading. In such cases, it is possible to use the ranks of the observations rather than the observations themselves to compute statistics that can be used to test hypotheses. By using ranks rather than the actual measurements it is possible to retain much of the information about the relative size of responses without making any assumptions about how the population the samples were drawn from is distributed. Since these tests are not based on the parameters of the underlying population, they are called nonparametric or distribution-free methods.[1] All the methods we will discuss require only that the distributions under the different treatments have similar shapes, but there is no restriction on what those shapes are.[2]

When the observations are drawn from normally distributed populations, the nonparametric methods in this chapter are about 95% as powerful as the analogous parametric methods. As a result, power for these tests can be estimated by computing the power of the analogous parametric test. When the observations are drawn from populations that are not normally distributed, nonparametric methods are not only more reliable but also more powerful than parametric methods.

Unfortunately, you can never observe the entire population. So how can you tell whether the assumptions such as normality are met, to permit using the parametric tests such as analysis of variance? The simplest approach is to plot the observations and look at them. Do they seem compatible with the assumptions that they were drawn from normally distributed populations with roughly the same variances, that is, within a factor of 2 to 3 of each other? If so, you are probably safe in using parametric methods. If, on the other hand, the observations are heavily skewed (suggesting a population such as the Jovians in Fig. 2-3A) or appear to have more than one peak, you probably will want to use a nonparametric method. When the standard deviation is about the same size or larger than the mean and the variable can take on only positive values, this is an indication that the distribution is skewed. (A normally distributed variable would have to take on negative values.) In practice, these simple rules of thumb are often all you will need.

There are two ways to make this procedure more objective. The first is to plot the observations as a normal probability plot. A normal probability plot has a distorted vertical scale that makes normally distributed observations plot as a straight line (just as exponential functions plot as a straight line on a semilogarithmic graph). Examining how straight the line is will show how compatible the observations are with a normal distribution. One can also construct a χ2 statistic to test how closely the observed data agree with those expected if the population is normally distributed with the same mean and standard deviation. Since in practice simply looking at the data is generally adequate, we will not discuss these approaches in detail.[3]
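Both objective checks are easy to carry out with standard statistical software. The following is a minimal sketch using SciPy on hypothetical, deliberately skewed data (it is not from the book, whose programs ran from the companion CD-ROM); the probability-plot correlation r measures how straight the normal probability plot is, and the Shapiro-Wilk test is used here as a common modern stand-in for the χ2 goodness-of-fit test mentioned above.

```python
# Sketch of the two objective normality checks; data are hypothetical.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.lognormal(mean=3.6, sigma=0.12, size=20)  # skewed, like the Jovian heights

# Normal probability plot: probplot returns the ordered data against normal
# quantiles plus the correlation coefficient r of the best-fit line. The
# closer r is to 1, the straighter the line, i.e., the more compatible the
# observations are with a normal distribution.
(quantiles, ordered), (slope, intercept, r) = stats.probplot(sample, dist="norm")
print(f"probability-plot correlation r = {r:.3f}")

# A formal goodness-of-fit test (Shapiro-Wilk, standing in for the chi-square
# test described in the text).
w, p = stats.shapiro(sample)
print(f"Shapiro-Wilk P = {p:.3f}")
```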

Unfortunately, none of these methods is especially convincing one way or the other for the small sample sizes common in biomedical research, and your choice of approach (i.e., parametric versus nonparametric) often has to be based more on judgment and preference than hard evidence.

One informal approach is to do the analysis with both the applicable parametric and nonparametric methods, then compare the results. If the data are from a normal population, then the parametric method should be more sensitive (and so provide a lower P value), whereas if there is substantial nonnormality then the nonparametric method should be more sensitive (and so provide the lower P value). If the data are only slightly nonnormal, the two approaches should give similar results.
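As a minimal sketch of this informal approach, with hypothetical and deliberately skewed data: analyze the same two samples with the parametric unpaired t test and its rank-based analog, the Mann-Whitney rank-sum test, and compare the two P values.

```python
# Run both the parametric and the rank-based test on the same (hypothetical) data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
control = rng.lognormal(2.0, 0.6, size=12)  # heavily skewed samples
treated = rng.lognormal(2.4, 0.6, size=12)

t, p_param = stats.ttest_ind(control, treated)
u, p_rank = stats.mannwhitneyu(control, treated, alternative="two-sided")

print(f"t test P = {p_param:.4f}; Mann-Whitney P = {p_rank:.4f}")
# With clearly nonnormal data the rank-based test will often give the lower P;
# with near-normal data the two P values should be similar.
```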

Things basically come down to the following difference of opinion: Some people think that in the absence of evidence that the data were not drawn from a normally distributed population, one should use parametric tests because they are more powerful and more widely used. These people say that you should use a nonparametric test only when there is positive evidence that the populations under study are not normally distributed. Others point out that the nonparametric methods discussed in this chapter are 95% as powerful as parametric methods when the data are from normally distributed populations and more reliable when the data are not from normally distributed populations. They also believe that investigators should assume as little as possible when analyzing their data. They therefore recommend that nonparametric methods be used except when there is positive evidence that parametric methods are suitable. At the moment, there is no definitive answer stating which attitude is preferable. And there probably never will be such an answer.

Book Chapter
1. Biostatistics and Clinical Practice

Suppose researchers believe that administering some drug increases urine production in proportion to the dose. To study this, they give different doses of the drug to five different people and plot each person's urine production against the dose of drug. The resulting data, shown in Figure 1-2A, reveal a strong relationship between the drug dose and daily urine production in the five people who were studied. This result would probably lead the investigators to publish a paper stating that the drug was an effective diuretic.

Figure 1-2 (A) Results of an experiment in which researchers administered five different doses of a drug to five different people and measured their daily urine production. Output increased as the dose of drug increased in these five people, suggesting that the drug is an effective diuretic in all people similar to those tested. (B) If the researchers had been able to administer the drug to all people and measure their daily urine output, it would have been clear that there is no relationship between the dose of drug and urine output. The five specific individuals who happened to be selected for the study in panel A are shown as shaded points. It is possible, but not likely, to obtain such an unrepresentative sample that leads one to believe that there is a relationship between the two variables when there is none. A set of statistical procedures called tests of hypotheses permits one to estimate the chance of getting such an unrepresentative sample.

The only statement that can be made with absolute certainty is that as the drug dose increased, so did urine production in the five people in the study. The real question of interest, however, is: How is the drug likely to affect all people who receive it? The assertion that the drug is effective requires a leap of faith from the limited experience, shown in Figure 1-2A, to all people.

Now, pretend that we knew how every person who would ever receive the drug would respond. Figure 1-2B shows this information. There is no systematic relationship between the drug dose and urine production! The drug is not an effective diuretic.

How could we have been led so far astray? The dark points in Figure 1-2B represent the specific individuals who happened to be studied to obtain the results shown in Figure 1-2A. While they are all members of the population of people we are interested in studying, the five specific individuals we happened to study, taken as a group, were not really representative of how the entire population of people responds to the drug.

Looking at Figure 1-2B should convince you that obtaining such an unrepresentative sample of people, though possible, is not very probable. One set of statistical procedures, called tests of hypotheses, permits you to estimate the likelihood of concluding that two things are related as Figure 1-2A suggests when the relationship is really due to bad luck in selecting people for study, and not a true effect of the drug. In this example, one can estimate that such a sample of people will turn up in a study of the drug only about 5 times in 1000 when the drug actually has no effect.

Of course it is important to realize that although statistics is a branch of mathematics, there can be honest differences of opinion about the best way to analyze a problem. This fact arises because all statistical methods are based on relatively simple mathematical models of reality, so the results of the statistical tests are accurate only to the extent that the reality and the mathematical model underlying the statistical test are in reasonable agreement.

Book Chapter
7. Confidence Intervals

In Chapter 4, we defined the t statistic to be

$$t = \frac{\text{difference of sample means}}{\text{standard error of difference of sample means}}$$

then computed its value for the data observed in an experiment. Next, we compared the result with the value tα that defined the most extreme 100α percent of the possible values of t that would occur (in both tails) if the two samples were drawn from a single population. If the observed value of t exceeded tα (given in Table 4-1), we reported a "statistically significant" difference, with P < α. As Figure 4-4 showed, the distribution of possible values of t has a mean of zero and is symmetric about zero.

On the other hand, if the two samples are drawn from populations with different means, the distribution of values of t associated with all possible experiments involving two samples of a given size is not centered on zero; it does not follow the t distribution. As Figures 6-3 and 6-5 showed, the actual distribution of possible values of t has a nonzero mean that depends on the size of the treatment effect. It is possible to revise the definition of t so that it will be distributed according to the t distribution in Figure 4-4 regardless of whether or not the treatment actually has an effect. This modified definition of t is

$$t = \frac{\text{difference of sample means} - \text{true difference in population means}}{\text{standard error of difference of sample means}}$$

Notice that if the hypothesis of no treatment effect is correct, the difference in population means is zero and this definition of t reduces to the one we used before. The equivalent mathematical statement is

$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{s_{\bar{X}_1 - \bar{X}_2}}$$

In Chapter 4 we computed t from the observations, then compared it with the critical value for a "big" value of t with ν = n1 + n2 − 2 degrees of freedom to obtain a P value. Now, however, we cannot follow this approach since we do not know all the terms on the right side of the equation. Specifically, we do not know the true difference in mean values of the two populations from which the samples were drawn, μ1 − μ2. We can, however, use this equation to estimate the size of the treatment effect, μ1 − μ2.

Instead of using the equation to determine t, we will select an appropriate value of t and use the equation to estimate μ1μ2. The only problem is that of selecting an appropriate value for t.

By definition, 100α percent of all possible values of t are more negative than −tα or more positive than +tα. For example, only 5% of all possible t values will fall outside the interval between −t.05 and +t.05, where t.05 is the critical value of t that defines the most extreme 5% of the t distribution (tabulated in Table 4-1). Therefore, 100(1 − α) percent of all possible values of t fall between −tα and +tα. For example, 95% of all possible values of t will fall between −t.05 and +t.05.

Every different pair of random samples we draw in our experiment will be associated with different values of $\bar{X}_1 - \bar{X}_2$ and $s_{\bar{X}_1 - \bar{X}_2}$, and 100(1 − α) percent of all possible experiments involving samples of a given size will yield values of t that fall between −tα and +tα. Therefore, for 100(1 − α) percent of all possible experiments

$$-t_\alpha \le \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{s_{\bar{X}_1 - \bar{X}_2}} \le +t_\alpha$$

Solving this inequality for the true difference of the population means gives

$$(\bar{X}_1 - \bar{X}_2) - t_\alpha s_{\bar{X}_1 - \bar{X}_2} \le \mu_1 - \mu_2 \le (\bar{X}_1 - \bar{X}_2) + t_\alpha s_{\bar{X}_1 - \bar{X}_2}$$

In other words, the actual difference of the means of the two populations from which the samples were drawn will fall within tα standard errors of the difference on either side of the observed difference of the sample means. (tα has ν = n1 + n2 − 2 degrees of freedom, just as when we used the t distribution in hypothesis testing.) This range is called the 100(1 − α) percent confidence interval for the difference of the means. For example, the 95% confidence interval for the true difference of the population means is

$$(\bar{X}_1 - \bar{X}_2) - t_{.05} s_{\bar{X}_1 - \bar{X}_2} \le \mu_1 - \mu_2 \le (\bar{X}_1 - \bar{X}_2) + t_{.05} s_{\bar{X}_1 - \bar{X}_2}$$

This equation defines the range that will include the true difference in the means for 95% of all possible experiments that involve drawing samples from the two populations under study.

Since this procedure to compute the confidence interval for the difference of two means uses the t distribution, it is subject to the same limitations as the t test. In particular, the samples must be drawn from populations that follow a normal distribution at least approximately.[2]
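As a minimal sketch of the computation just derived, with hypothetical numbers: scipy.stats.t.ppf supplies the critical value t.05 that Table 4-1 tabulates, and the pooled-variance standard error is the one defined in Chapter 4.

```python
# Compute a 95% confidence interval for the difference of two means.
# The two samples are hypothetical.
import numpy as np
from scipy import stats

group1 = np.array([1200, 980, 1130, 1070, 1010, 1210, 990, 1100])
group2 = np.array([1310, 1170, 1260, 1400, 1220, 1330, 1290, 1250])

n1, n2 = len(group1), len(group2)
diff = group1.mean() - group2.mean()

# Pooled-variance standard error of the difference of the means.
sp2 = ((n1 - 1) * group1.var(ddof=1) + (n2 - 1) * group2.var(ddof=1)) / (n1 + n2 - 2)
se_diff = np.sqrt(sp2 * (1 / n1 + 1 / n2))

# Critical value cutting off the most extreme 5% (2.5% in each tail).
t_crit = stats.t.ppf(0.975, df=n1 + n2 - 2)

lo, hi = diff - t_crit * se_diff, diff + t_crit * se_diff
print(f"95% CI for mu1 - mu2: {lo:.1f} to {hi:.1f}")
```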

Book Chapter
9. Experiments When Each Subject Receives More Than One Treatment

In experiments in which it is possible to observe each experimental subject before and after administering a single treatment, we will test a hypothesis about the average change the treatment produces instead of the difference in average responses with and without the treatment. This approach reduces the variability in the observations due to differences between individuals and yields a more sensitive test.

Figure 9-1 illustrates this point. Figure 9-1A shows daily urine production in two samples of 10 different people each; one sample group took a placebo and the other took a drug. Since there is little difference in the mean response relative to the standard deviations, it would be hard to assert that the treatment produced an effect on the basis of these observations. In fact, t computed using the methods of Chapter 4 is only 1.33, which comes nowhere near t.05 = 2.101, the critical value for ν = npla + ndrug − 2 = 10 + 10 − 2 = 18 degrees of freedom.

Figure 9-1 (A) Daily urine production in two groups of 10 different people. One group of 10 people received the placebo and the other group of 10 people received the drug. The diuretic does not appear to be effective. (B) Daily urine production in a single group of 10 people before and after taking a drug. The drug appears to be an effective diuretic. The observations are identical to those in panel A; by focusing on changes in each individual's response rather than the response of all the people taken together, it is possible to detect a difference that was masked by the between subjects variability in panel A.

Now consider Figure 9-1B. It shows urine productions identical to those in Figure 9-1A but for an experiment in which urine production was measured in one sample of 10 individuals before and after administering the drug. A straight line connects the observations for each individual. Figure 9-1B shows that the drug increased urine production in 8 of the 10 people in the sample. This result suggests that the drug is an effective diuretic.

By concentrating on the change in each individual that accompanied taking the drug (in Fig. 9-1B), we could detect an effect that was masked by the variability between individuals when different people received the placebo and the drug (in Fig. 9-1A).

Now, let us develop a statistical procedure to quantify our subjective impression in such experiments. The paired t test can be used to test the hypothesis that there is, on the average, no change in each individual after receiving the treatment under study. Recall that the general definition of the t statistic is

$$t = \frac{\text{parameter estimate} - \text{true value of population parameter}}{\text{standard error of parameter estimate}}$$

The parameter we wish to estimate is the average difference in response δ in each individual due to the treatment. If we let d equal the observed change in each individual that accompanies the treatment, we can use $\bar{d}$, the mean change, to estimate δ. The standard deviation of the observed differences is

$$s_d = \sqrt{\frac{\sum (d - \bar{d})^2}{n - 1}}$$

So the standard error of the difference is

$$s_{\bar{d}} = \frac{s_d}{\sqrt{n}}$$

Therefore,

$$t = \frac{\bar{d} - \delta}{s_{\bar{d}}}$$

To test the hypothesis that there is, on the average, no response to the treatment, set δ = 0 in this equation to obtain

$$t = \frac{\bar{d}}{s_{\bar{d}}}$$

The resulting value of t is compared with the critical value of t for ν = n − 1 degrees of freedom.

To recapitulate, when analyzing data from an experiment in which it is possible to observe each individual before and after applying a single treatment:

  • Compute the change in response that accompanies the treatment in each individual, d.

  • Compute the mean change $\bar{d}$ and the standard error of the mean change $s_{\bar{d}}$.

  • Use these numbers to compute $t = \bar{d}/s_{\bar{d}}$.

  • Compare this t with the critical value for ν = n −1 degrees of freedom, where n is the number of experimental subjects.
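
A minimal sketch of this four-step recipe, using hypothetical before-and-after observations (the numbers are invented for illustration):

```python
# Paired t test following the four steps above; data are hypothetical.
import numpy as np
from scipy import stats

before = np.array([1050, 990, 1120, 1000, 1080, 1030, 960, 1100, 1010, 1070])
after  = np.array([1180, 1000, 1250, 1090, 1220, 1150, 950, 1260, 1120, 1200])

d = after - before                      # step 1: change in each individual
d_bar = d.mean()                        # step 2: mean change...
se_d = d.std(ddof=1) / np.sqrt(len(d))  # ...and its standard error
t = d_bar / se_d                        # step 3: the paired t statistic

# Step 4: compare with the critical value for nu = n - 1 degrees of freedom
# (equivalently, report the two-tailed P value).
p = 2 * stats.t.sf(abs(t), df=len(d) - 1)
print(f"t = {t:.2f}, P = {p:.4f}")      # scipy.stats.ttest_rel(after, before) agrees
```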

Note that the number of degrees of freedom, ν, associated with the paired t test is n − 1, less than the 2(n − 1) degrees of freedom associated with analyzing these data using an unpaired t test. This loss of degrees of freedom increases the critical value of t that must be exceeded to reject the null hypothesis of no difference. While this might seem undesirable, the loss is virtually always more than compensated for by focusing on differences within subjects, which, given the biological variability that typically occurs between individuals, reduces the variability in the results used to compute t. All other things being equal, paired designs are almost always more powerful for detecting effects in biological data than unpaired designs.

Finally, the paired t test, like all t tests, is predicated on a normally distributed population. In the t test for unpaired observations developed in Chapter 4, responses needed to be normally distributed. In the paired t test, the differences (changes within each subject) associated with the treatment need to be normally distributed.

9.1.1. Cigarette Smoking and Platelet Function

Smokers are more likely to develop diseases caused by abnormal blood clots (thromboses), including heart attacks and occlusion of peripheral arteries, than nonsmokers. Platelets are small bodies that circulate in the blood and stick together to form blood clots. Since smokers experience more disorders related to undesirable blood clots than nonsmokers, Peter Levine[1] drew blood samples in 11 people before and after they smoked a single cigarette and measured the extent to which platelets aggregated when exposed to a standard stimulus. This stimulus, adenosine diphosphate, makes platelets release their granular contents, which, in turn, makes them stick together and form a blood clot.

Figure 9-2 shows the results of this experiment, with platelet stickiness quantified as the maximum percentage of all the platelets that aggregated after being exposed to adenosine diphosphate. The pair of observations made in each individual before and after smoking the cigarette is connected by straight lines. The mean percentage aggregations were 43.1% before smoking and 53.5% after smoking, with standard deviations of 15.9% and 18.7%, respectively. Simply looking at these numbers does not suggest that smoking had an effect on platelet aggregation. This approach, however, omits an important fact about the experiment: the platelet aggregations were not measured in two different (independent) groups of people, smokers and nonsmokers, but in a single group of people who were observed both before and after smoking the cigarette.

In all but one individual, the maximum platelet aggregation increased after smoking the cigarette, suggesting that smoking facilitates thrombus formation. The means and standard deviations of platelet aggregation before and after smoking for all people taken together did not suggest this pattern because the variability between individuals masked the variability in platelet aggregation that was due to smoking the cigarette. When we took into account the fact that the data consisted of pairs of observations done before and after smoking in each individual, we could focus on the change in response and so remove the variability that was due to the fact that different people have different platelet-aggregation tendencies regardless of whether they smoked a cigarette or not.

The changes in maximum percent platelet aggregation that accompany smoking are (from Fig. 9-2) 2%, 4%, 10%, 12%, 16%, 15%, 4%, 27%, 9%, −1%, and 15%. Therefore, the mean change in percent platelet aggregation with smoking in these 11 people is $\bar{d}$ = 10.3%. The standard deviation of the change is 8.0%, so the standard error of the change is $s_{\bar{d}} = 8.0/\sqrt{11} = 2.41\%$. Finally, our test statistic is

$$t = \frac{\bar{d}}{s_{\bar{d}}} = \frac{10.3}{2.41} = 4.27$$

This value exceeds 3.169, the value that defines the most extreme 1% of the t distribution with ν = n − 1 = 11 − 1 = 10 degrees of freedom (from Table 4-1). Therefore, we report that smoking increases platelet aggregation (P < .01).
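This arithmetic is easy to verify; the sketch below reproduces the numbers above from the 11 within-subject changes listed in the text.

```python
# Reproduce the paired t test on the 11 changes read off Figure 9-2.
import numpy as np
from scipy import stats

d = np.array([2, 4, 10, 12, 16, 15, 4, 27, 9, -1, 15])  # percent aggregation changes

d_bar = d.mean()                        # 10.3%
se_d = d.std(ddof=1) / np.sqrt(len(d))  # 8.0 / sqrt(11) = 2.41%
t = d_bar / se_d                        # 4.27

p = 2 * stats.t.sf(t, df=len(d) - 1)
print(f"d_bar = {d_bar:.1f}, se = {se_d:.2f}, t = {t:.2f}, P = {p:.4f}")  # P < .01
```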

How convincing is this experiment that a constituent specific to tobacco smoke, as opposed to other chemicals common to smoke in general (e.g., carbon monoxide), or even the stress of the experiment produced the observed change? To investigate this question, Levine also had his subjects “smoke” an unlit cigarette and a lettuce leaf cigarette that contained no nicotine. Figure 9-3 shows the results of these experiments, together with the results of smoking a standard cigarette (from Fig. 9-2).

Figure 9-2 Maximum percentage platelet aggregation before and after smoking a tobacco cigarette in 11 people. (Adapted with permission of the American Heart Association, Inc. from Fig. 1 of Levine PH. An acute effect of cigarette smoking on platelet function: a possible link between smoking and arterial thrombosis. Circulation. 1973;48:619–623.)
Figure 9-3 Maximum percentage platelet aggregation in 11 people before and after pretending to smoke (“sham smoking”), before and after smoking a lettuce-leaf cigarette that contained no nicotine, and before and after smoking a tobacco cigarette. These observations, taken together, suggest that it was something in the tobacco smoke, rather than the act of smoking or other general constituents of smoke, that produced the change in platelet aggregation. (Redrawn with permission of the American Heart Association, Inc. from Fig. 1 of Levine PH. An acute effect of cigarette smoking on platelet function: a possible link between smoking and arterial thrombosis. Circulation. 1973;48:619–623.)

When the experimental subjects merely pretended to smoke or smoked a non-nicotine cigarette made of dried lettuce, there was no discernible change in platelet aggregation. This situation contrasts with the increase in platelet aggregation that followed smoking a single tobacco cigarette. This experimental design illustrates an important point:

In a well-designed experiment, the only difference between the treatment group and the control group, both chosen at random from a population of interest, is the treatment.

In this experiment the treatment of interest was tobacco constituents in the smoke, so it was important to compare the results with observations obtained after exposing the subjects to non-tobacco smoke. This step helped ensure that the observed changes were due to the tobacco rather than smoking in general. The more carefully the investigator can isolate the treatment effect, the more convincing the conclusions will be.

There are also subtle biases that can cloud the conclusions from an experiment. Most investigators, and their colleagues and technicians, want the experiments to support their hypothesis. In addition, the experimental subjects, when they are people, generally want to be helpful and wish the investigator to be correct, especially if the study is evaluating a new treatment that the experimental subject hopes will provide a cure. These factors can lead the people doing the study to tend to slant judgment calls (often required when collecting the data) toward making the study come out the way everyone wants. For example, the laboratory technicians who measure platelet aggregation might read the control samples on the low side and the smoking samples on the high side without even realizing it. Perhaps some psychological factor among the experimental subjects (analogous to a placebo effect) led their platelet aggregation to increase when they smoked the tobacco cigarette. Levine avoided these difficulties by doing the experiments in a double-blind manner in which the investigator, the experimental subject, and the laboratory technicians who analyzed the blood samples did not know the content of the cigarettes being smoked until after all experiments were complete and specimens analyzed. As discussed in Chapter 2, double-blind studies are the most effective way to eliminate bias due to both the observer and experimental subject.

In single-blind studies one party, usually the investigator, knows which treatment is being administered. This approach controls biases due to the placebo effect but not observer biases. Some studies are also partially blind, in which the participants know something about the treatment but do not have full information. For example, the blood platelet study might be considered partially blind because both the subject and the investigator obviously knew when the subject was only pretending to smoke. It was possible, however, to withhold this information from the laboratory technicians who actually analyzed the blood samples to avoid biases in their measurements of percent platelet aggregation.

The paired t test can be used to test hypotheses when observations are taken before and after administering a single treatment to a group of individuals. To generalize this procedure to experiments in which the same individuals are subjected to a number of treatments, we now develop repeated measures analysis of variance.

To do so, we must first introduce some new nomenclature for analysis of variance. To ease the transition, we begin with the analysis of variance presented in Chapter 3, in which each treatment was applied to different individuals. After reformulating this type of analysis of variance, we will go on to the case of repeated measurements on the same individual.

Book Chapter
5. How to Analyze Rates and Proportions

Before we can quantify the certainty of our descriptions of a population on the basis of a limited sample, we need to know how to describe the population itself. Since we have already visited Mars and met all 200 Martians (in Chapter 2), we will continue to use them to develop ways to describe populations. In addition to measuring the Martians' heights, we noted that 50 of them were left-footed and the remaining 150 were right-footed. Figure 5-1 shows the entire population of Mars divided according to footedness. The first way in which we can describe this population is by giving the proportion p of Martians who are in each class. In this case, pleft = 50/200 = 0.25 and pright = 150/200 = 0.75. Since there are only two possible classes, notice that pright = 1 − pleft. Thus, whenever there are only two possible classes and they are mutually exclusive, we can completely describe the division in the population with the single parameter p, the proportion of members with one of the attributes. The proportion of the population with the other attribute is always 1 − p.

Figure 5-1 Of the 200 Martians 50 are left-footed, and the remaining 150 are right-footed. Therefore, if we select one Martian at random from this population, there is a pleft = 50/200 = 0.25 = 25% chance it will be left-footed.

Note that p also is the probability of drawing a left-footed Martian if one selects one member of the population at random.

Thus p plays a role exactly analogous to that played by the population mean μ in Chapter 2. To see why, suppose we associate the value X = 1 with each left-footed Martian and a value of X = 0 with each right-footed Martian. The mean value of X for the population is

$$\mu = \frac{\sum X}{N} = \frac{1 + 1 + \cdots + 1 + 0 + 0 + \cdots + 0}{200} = \frac{50(1) + 150(0)}{200} = \frac{50}{200} = 0.25$$

which is pleft.

This idea can be generalized quite easily using a few equations. Suppose M members of a population of N individuals have some attribute and the remaining N − M members of the population do not. Associate a value of X = 1 with the population members having the attribute and a value of X = 0 with the others. The mean of the resulting collection of numbers is

$$\mu = \frac{\sum X}{N} = \frac{M(1) + (N - M)(0)}{N} = \frac{M}{N} = p$$

the proportion of the population having the attribute.

Since we can compute a mean in this manner, why not compute a standard deviation in order to describe variability in the population? Even though there are only two possibilities, X = 1, and X = 0, the amount of variability will differ, depending on the value of p. Figure 5-2 shows three more populations of 200 individuals each. In Figure 5-2A only 10 of the individuals are left-footed; it exhibits less variability than the population shown in Figure 5-1. Figure 5-2B shows the extreme case in which half the members of the population fall into each of the two classes; the variability is greatest. Figure 5-2C shows the other extreme; all the members fall into one of the two classes, and there is no variability at all.

Figure 5-2 This figure illustrates three different populations, each containing 200 members but with different proportions of left-footed members. The standard deviation, $\sigma = \sqrt{p(1 - p)}$, quantifies the variability in the population. (A) When most of the members fall in one class, σ is a small value, 0.2, indicating relatively little variability. (B) In contrast, if half the members fall into each class, σ reaches its maximum value of 0.5, indicating the maximum possible variability. (C) At the other extreme, if all members fall into the same class, there is no variability at all and σ = 0.

To quantify this subjective impression, we compute the standard deviation of the 1s and 0s we associated with each member of the population when we computed the mean. By definition, the population standard deviation is

$$\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$$

X = 1 for M members of the population and 0 for the remaining N − M members, and μ = p; therefore

$$\sigma = \sqrt{\frac{(1 - p)^2 + (1 - p)^2 + \cdots + (1 - p)^2 + (0 - p)^2 + (0 - p)^2 + \cdots + (0 - p)^2}{N}} = \sqrt{\frac{M(1 - p)^2 + (N - M)p^2}{N}} = \sqrt{\frac{M}{N}(1 - p)^2 + \left(1 - \frac{M}{N}\right)p^2}$$

But since M/N = p is the proportion of population members with the attribute,

$$\sigma = \sqrt{p(1 - p)^2 + (1 - p)p^2} = \sqrt{\left[p(1 - p) + p^2\right](1 - p)}$$

which simplifies to

$$\sigma = \sqrt{p(1 - p)}$$

This equation for the population standard deviation produces quantitative results that agree with the qualitative impressions we developed from Figures 5-1 and 5-2. As Figure 5-3 shows, σ = 0 when p = 0 or p = 1, that is, when all members of the population either do or do not have the attribute, and σ is maximized when p = .5, that is, when any given member of the population is as likely to have the attribute as not.
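A quick numerical check of both results, coding the 200 Martians of Figure 5-1 as 1s and 0s (a sketch, not the book's own software):

```python
# The mean of the 0/1 coding equals p, and the population standard deviation
# equals sqrt(p(1 - p)).
import numpy as np

x = np.array([1] * 50 + [0] * 150)     # the 200 Martians of Figure 5-1
p = x.mean()                           # 0.25
sigma = x.std()                        # population SD (ddof=0 by default)

print(p, sigma, np.sqrt(p * (1 - p)))  # 0.25  0.4330...  0.4330...
```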

Figure 5-3 The standard deviation of a population divided into two categories varies with p, the proportion of members in one of the categories. There is no variation if all members are in one category or the other (so σ = 0 when p = 0 or 1) and maximum variability when a given member is equally likely to fall in one class or the other (σ = 0.5 when p = 0.5).

Since σ depends only on p, it really does not contain any additional information (in contrast to the mean and standard deviation of a normally distributed variable, where μ and σ provide two independent pieces of information). It will be most useful in computing a standard error associated with estimates of p based on samples drawn at random from populations such as those shown in Figures 5-1 or 5-2.

Book Chapter
11. How to Analyze Survival Data

The tobacco industry, having been driven farther and farther from Earth by protectors of the public health, invades Pluto and starts to promote smoking in bars. Since it is very cold on Pluto, Plutonians spend most of their time indoors and begin dropping dead from the secondhand tobacco smoke in bars. Since it would be unethical to purposely expose Plutonians to secondhand smoke, we will simply observe how long it takes Plutonians to drop dead after they begin to be exposed to secondhand smoke in bars.

Figure 11-1A shows the observations for 10 nonsmoking Plutonians selected at random and observed over the course of a study lasting for 15 Pluto months. Subjects entered the study when they started hanging out at smoky bars, and they were followed up until they dropped dead or the study ended. As with many survival studies, individuals were recruited into the study at various times as the study progressed. Of the 10 subjects, 7 died during the period of the study (A, B, C, F, G, H, and J). As a result, we know the exact length of time that they lived after their exposure to secondhand smoke in bars. These observations are uncensored. In contrast, two of the Plutonians were still alive at the end of the study (D and I); we know that they lived at least until the end of the study, but do not know how long they lived after being exposed to secondhand smoke. In addition, Plutonian E was vaporized in a freak accident while on vacation before the study was completed, so was lost to follow-up. We do know, however, that these individuals lived at least as long as we observed them. These observations are censored.

Figure 11-1 (A) This graph shows the observations in our study of the effect of hanging out in a smoky bar on Plutonians. The horizontal axis represents calendar time, with Plutonians entering the study at various times, when tobacco smoke invades their bars. Solid points indicate known times. Lighter points indicate the time at which observations are censored. Seven of the Plutonians die during the study (A, B, C, F, G, H, and J), so we know how long they were breathing secondhand smoke when they expired. Two of the Plutonians were still alive when the study ended at time 15 (D and I), and one (E) was lost to observation during the study, so we know that they lived at least as long as we were able to observe them, but do not know their actual time of death. (B) This graph shows the same data as panel A, except that the horizontal axis is the length of time each subject was observed after they entered the study, rather than calendar time.

Figure 11-1B shows the data in another format, where the horizontal axis is the length of time that each subject is observed after starting exposure to secondhand smoke, as opposed to calendar time. The Plutonians who died by the end of the study have a solid point at the end of the line; those that were still alive at the end of the observation period are indicated with a lighter point. Thus, we know that Plutonian A lived exactly 7 months after starting to go to a smoky bar (an uncensored observation), whereas Plutonian D lived at least 12 months after hanging out in a smoky bar (a censored observation).

This study has the necessary features of a clinical follow-up study:

  • There is a well-defined starting time for each subject (date smoking started in this example or date of diagnosis or medical intervention in a clinical study).

  • There is a well-defined end point (death in this example or relapse in many clinical studies).

  • The subjects in the study are selected at random from a larger population of interest.

If all subjects were studied for the same length of time or until they reached a common end point (such as death), we could use the methods of Chapters 5 or 10 to analyze the results. These methods require researchers to assess the outcomes at a fixed time following the intervention, then classify each subject as either having or not having the outcome of interest. Unfortunately, in clinical studies these conditions often do not hold. The fact that the study period often ends before all the subjects have reached the end point makes it impossible to know the actual time at which all the subjects reach the common end point. In addition, because subjects are recruited throughout the duration of the study, the follow-up time often varies for different subjects. These two facts require that we develop new approaches to analyzing these data that explicitly take into account the length of follow-up when assessing outcomes. The first step is to characterize the pattern of the occurrence of end points (such as death). This pattern is quantified with a survival curve. We will now examine how to characterize survival curves and test hypotheses about them.
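The standard way to characterize such a curve in the presence of censoring is the product-limit (Kaplan-Meier) estimate. The sketch below implements it for follow-up times patterned after Figure 11-1B; because the excerpt does not list the exact Plutonian death times, the numbers are hypothetical stand-ins (died = 1 marks a death, died = 0 a censored observation).

```python
# Product-limit (Kaplan-Meier) estimate of a survival curve with censoring.
def kaplan_meier(times, died):
    """Return (time, S) steps of the estimated survival curve."""
    data = sorted(zip(times, died))
    n_at_risk = len(data)
    s = 1.0
    curve = [(0, 1.0)]
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = sum(1 for tt, dd in data if tt == t and dd)
        n_at_t = sum(1 for tt, _ in data if tt == t)
        if deaths:  # survival drops by the fraction of those at risk who died at t
            s *= 1 - deaths / n_at_risk
            curve.append((t, s))
        n_at_risk -= n_at_t  # deaths and censored subjects leave the risk set
        i += n_at_t
    return curve

times = [4, 6, 7, 8, 9, 10, 11, 12, 13, 15]  # months of follow-up (hypothetical)
died  = [1, 1, 1, 0, 1, 1, 1, 0, 1, 0]       # 7 deaths, 3 censored (like D, E, I)

for t, s in kaplan_meier(times, died):
    print(f"month {t:>2}: S = {s:.2f}")
```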

Book Chapter
2. How to Summarize Data

The heights of Martians and Venusians are known as interval data because heights are measured on a scale with constant intervals, in this case, centimeters. For interval data, the absolute difference between two values can always be determined by subtraction.[1] The difference in heights of Martians who are 35 and 36 cm tall is the same as the difference in height of Martians who are 48 and 49 cm tall. Other variables measured on interval scales include temperature (because a 1°C difference always means the same thing), blood pressure (because a 1 mmHg difference in pressure always means the same thing), height, and weight.

There are other data, such as gender, state of birth, or whether or not a person has a certain disease, that are not measured on an interval scale. These variables are examples of nominal or categorical data, in which individuals are classified into two or more mutually exclusive and exhaustive categories. For example, people could be categorized as male or female, dead or alive, or as being born in one of the 50 states, District of Columbia, or outside the United States. In every case, it is possible to categorize each individual into one and only one category. In addition, there is no arithmetic relationship or even ordering between the categories.[2]

Ordinal data fall between interval and nominal data. Like nominal data, ordinal data fall into categories, but there is an inherent ordering (or ranking) of the categories. Level of health (excellent, very good, good, fair, or poor) is a common example of a variable measured on an ordinal scale. The different values have a natural order, but the differences or “distances” between adjoining values on an ordinal scale are not necessarily the same and may not even be comparable. For example, excellent health is better than very good health, but this difference is not necessarily the same as the difference between fair and poor health. Indeed, these differences may not even be strictly comparable.

For the remainder of this chapter, we will concentrate on how to describe interval data, particularly how to describe the location and shape of the distributions.[3] Because of the similar shapes of the distributions of heights of Martians and Venusians, we will reduce all the information in Figures 2-1 and 2-2 to a few numbers, called parameters, of the distributions. Indeed, since the shapes of the two distributions are so similar, we only need to describe how they differ; we do this by computing the mean height and the variability of heights about the mean.

Book Chapter
3. How to Test for Differences between Groups

To begin our experiment, we randomly select four groups of seven people each from a small town with 200 healthy adult inhabitants. All participants give informed consent. People in the control group continue eating normally; people in the second group eat only spaghetti; people in the third group eat only steak; and people in the fourth group eat only fruit and nuts. After 1 month, each person has a cardiac catheter inserted and his or her cardiac output is measured.

As with most tests of significance, we begin with the hypothesis that all treatments (diets) have the same effect (on cardiac output). Since the study includes a control group (as experiments generally should), this hypothesis is equivalent to the hypothesis that diet has no effect on cardiac output. Figure 3-1 shows the distribution of cardiac outputs for the entire population, with each individual's cardiac output represented by a circle. The specific individuals who were randomly selected for each diet are indicated by shaded circles, with different shading for different diets. Figure 3-1 shows that the null hypothesis is, in fact, true. Unfortunately, as investigators we cannot observe the entire population and are left with the problem of deciding whether or not to reject the null hypothesis from the limited data shown in Figure 3-2. There are obviously differences between the samples; the question is: Are these differences due to the fact that the different groups of people ate differently or are these differences simply a reflection of the random variation in cardiac output between individuals?

Figure 3-1 The values of cardiac output associated with all 200 members of the population of a small town. Since diet does not affect cardiac output, the four groups of seven people each selected at random to participate in our experiment (control, spaghetti, steak, and fruit and nuts) simply represent four random samples drawn from a single population.
Figure 3-2 An investigator cannot observe the entire population but only the four samples selected at random for treatment. This figure shows the same four groups of individuals as in Figure 3-1 with their means and standard deviations as they would appear to the investigator. The question facing the investigator is: Are the observed differences due to the different diets or simply random variation? The figure also shows the collection of sample means together with their standard deviation, which is an estimate of the standard error of the mean.

To use the data in Figure 3-2 to address this question, we proceed under the assumption that the null hypothesis that diet has no effect on cardiac output is correct. Since we assume that it does not matter which diet any particular individual ate, we assume that the four experimental groups of seven people each are four random samples of size 7 drawn from a single population of 200 individuals. Since the samples are drawn at random from a population with some variance, we expect the samples to have different means and standard deviations, but if our null hypothesis that the diet has no effect on cardiac output is true, the observed differences are simply due to random sampling.

Forget about statistics for a moment. What is it about different samples that leads you to believe that they are representative samples drawn from different populations? Figures 3-2, 3-3, and 3-4 show three different possible sets of samples of some variable of interest. Simply looking at these pictures makes most people think that the four samples in Figure 3-2 were all drawn from a single population, while the samples in Figures 3-3 and 3-4 were not. Why? The variability within each sample, quantified with the standard deviation, is approximately the same. In Figure 3-2, the variability in the mean values of the samples is consistent with the variability one observes within the individual samples. In contrast, in Figures 3-3 and 3-4, the variability among sample means is much larger than one would expect from the variability within each sample. Notice that we reach this conclusion whether all (Fig. 3-3) or only one (Fig. 3-4) of the sample means appear to differ from the others.

Figure 3-3 The four samples shown are identical to those in Figure 3-2 except that the variability in the mean values has been increased substantially. The samples now appear to differ from each other because the variability between the sample means is larger than one would expect from the variability within each sample. Compare the relative variability in mean values with the variability within the sample groups with that seen in Figure 3-2.
Figure 3-4 When the mean of even one of the samples (sample 2) differs substantially from the other samples, the variability computed from the sample means is substantially larger than one would expect from examining the variability within the groups.

Now let us formalize this analysis of variability to analyze our diet experiment. The standard deviation or its square, the variance, is a good measure of variability. We will use the variance to construct a procedure to test the hypothesis that diet does not affect cardiac output.

Chapter 2 showed that two population parameters—the mean and standard deviation (or, equivalently, the variance)—completely describe a normally distributed population. Therefore, we will use our raw data to compute these parameters and then base our analysis on their values rather than on the raw data directly. Since the procedures we will now develop are based on these parameters, they are called parametric statistical methods. Because these methods assume that the population from which the samples were drawn can be completely described by these parameters, they are valid only when the real population approximately follows the normal distribution. Other procedures, called nonparametric statistical methods, are based on frequencies, ranks, or percentiles and do not require this assumption.[1] Parametric methods generally provide more information about the treatment being studied and are more likely to detect a real treatment effect when the underlying population is normally distributed.

We will estimate the parameter population variance in two different ways: (1) The standard deviation or variance computed from each sample is an estimate of the standard deviation or variance of the entire population. Since each of these estimates of the population variance is computed from within each sample group, the estimates will not be affected by any differences in the mean values of different groups. (2) We will use the values of the means of each sample to determine a second estimate of the population variance. In this case, the differences between the means will obviously affect the resulting estimate of the population variance. If all the samples were, in fact, drawn from the same population (i.e., the diet had no effect), these two different ways to estimate the population variance should yield approximately the same number. When they do, we will conclude that the samples were likely to have been drawn from a single population; otherwise, we will reject this hypothesis and conclude that at least one of the samples was drawn from a different population. In our experiment, rejecting the original hypothesis would lead to the conclusion that diet does alter cardiac output.
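A sketch of these two estimates on hypothetical data (four samples of seven cardiac outputs each, all drawn from one population, so the null hypothesis is true by construction); their ratio is essentially the F statistic this chapter goes on to develop.

```python
# Two estimates of the population variance: one from within the groups,
# one from the variability of the group means. Data are hypothetical.
import numpy as np

rng = np.random.default_rng(3)
groups = [rng.normal(5.0, 1.0, size=7) for _ in range(4)]  # cardiac outputs, L/min
n = 7  # sample size per group

# Estimate 1: average the variances computed *within* each sample; this
# estimate is unaffected by differences between the group means.
within = np.mean([g.var(ddof=1) for g in groups])

# Estimate 2: the variance *of the sample means* estimates sigma^2 / n,
# so n times it estimates the population variance; this estimate grows
# if the treatment shifts the group means apart.
means = np.array([g.mean() for g in groups])
between = n * means.var(ddof=1)

F = between / within  # near 1 if all samples come from a single population
print(f"within = {within:.2f}, between = {between:.2f}, F = {F:.2f}")
```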

Book Chapter
8. How to Test for Trends

As in all other statistical procedures, we want to use a sample drawn at random from a population to make statements about the population. Chapters 3 and 4 discussed populations whose members are normally distributed with mean μ and standard deviation σ and used estimates of these parameters to design test statistics (such as F and t) that permitted us to examine whether or not some discrete treatment was likely to have affected the mean value of a variable of interest. Now, we add another parametric procedure, linear regression, to analyze experiments in which the samples were drawn from populations characterized by a mean response that varies continuously with the size of the treatment. To understand the nature of this population and the associated random samples, we return again to Mars, where we can examine the entire population of 200 Martians.

Figure 2-1 showed that the heights of Martians are normally distributed with a mean of 40 cm and a standard deviation of 5 cm. In addition to measuring the heights of each Martian, let us also weigh each one. Figure 8-1 shows a plot in which each point represents the height x and weight y of one Martian. Since we have observed the entire population, there is no question that tall Martians tend to be heavier than short Martians.

Figure 8-1 The relationship between height and weight in the population of 200 Martians, with each Martian represented by a circle. The weights at any given height follow a normal distribution. In addition, the mean weight of Martians at any given height increases linearly with height, and the variability in weight at any given height is the same regardless of height. A population must have these characteristics to be suitable for linear regression or correlation analysis.
Figure 8-2 The line of means for the population of Martians in Figure 8-1.

There are a number of things we can conclude about the heights and weights of Martians as well as the relationship between these two variables. As noted in Chapter 2, the heights are normally distributed with mean μ = 40 cm and standard deviation σ = 5 cm. The weights are also normally distributed with mean μ = 12 g and standard deviation σ = 2.5 g. The most striking feature of Figure 8-1, however, is that the mean weight of Martians at each height increases as height increases.

For example, the Martians who are 32 cm tall weigh 7.1, 7.9, 8.3, and 8.8 g, so the mean weight of Martians who are 32 cm tall is 8 g. The eight Martians who are 46 cm tall weigh 13.7, 14.5, 14.8, 15.0, 15.1, 15.2, 15.3, and 15.8 g, so the mean weight of Martians who are 46 cm tall is 15 g. Figure 8-2 shows that the mean weight of Martians at each height increases linearly as height increases.

This line does not make it possible, however, to predict the weight of an individual Martian if you know his or her height. Why not? There is variability in weights among Martians at each height. Figure 8-1 reveals that the standard deviation of weights of Martians at any given height is about 1 g. We need to distinguish this standard deviation from the standard deviation of weights of all Martians computed without regard for the fact that mean weight varies with height.

8.1.1. The Population Parameters

Now, let us define some new terms and symbols so that we can generalize from Martians to other populations with similar characteristics. Since we are considering how weight varies with height, call height the independent variable x and weight the dependent variable y. In some instances, including the example at hand, we can only observe the independent variable and use it to predict the expected mean value of the dependent variable. (There is variability in the dependent variable at each value of the independent variable.) In other cases, including controlled experiments, it is possible to manipulate the independent variable to control, with some uncertainty, the value of the dependent variable. In the first case, it is only possible to identify an association between the two variables, whereas in the second case it is possible to conclude that there is a causal link.[2]

For any given value of the independent variable x, it is possible to compute the value of the mean of all values of the dependent variable corresponding to that value of x. We denote this mean μy·x to indicate that it is the mean of all the values of y in the population at a given value of x. These means fall along a straight line given by

$$\mu_{y \cdot x} = \alpha + \beta x$$

in which α is the intercept and β is the slope[3] of the line of means. For example, Figure 8-2 shows that, on the average, the weight of Martians increases by 0.5 g for every 1-cm increase in height, so the slope β of the μy·x versus x line is 0.5 g/cm. The intercept α of this line is −8 g. Hence,

$$\mu_{y \cdot x} = -8 \text{ g} + (0.5 \text{ g/cm})\,x$$

There is variability about the line of means. For any given value of the independent variable x, the values of y for the population are normally distributed with mean μy·x and standard deviation σy·x. This notation indicates that σy·x is the standard deviation of weights (y) computed after allowing for the fact that mean weight varies with height (x). As noted above, the residual variation about the line of means for our Martians is 1 g; σy·x =1 g. The amount of this variability is an important factor in determining how useful the line of means is for predicting the value of the dependent variable, for example, weight, when you know the value of the independent variable, for example, height. The methods we develop below require that this standard deviation be the same for all values of x. In other words, the variability of the dependent variable about the line of means is the same regardless of the value of the independent variable.

In summary, we will be analyzing the results of experiments in which the observations were drawn from populations with these characteristics:

  • The mean of the population of the dependent variable at a given value of the independent variable increases (or decreases) linearly as the independent variable increases.

  • For any given value of the independent variable, the possible values of the dependent variable are distributed normally.

  • The standard deviation of the population of the dependent variable about its mean at any given value of the independent variable is the same for all values of the independent variable.

The parameters of this population are α and β, which define the line of means, the dependent-variable population mean at each value of the independent variable, and σy·x, which defines the variability about the line of means.

Now let us turn our attention to the problem of estimating these parameters from samples drawn at random from such populations.
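As a preview of that estimation problem, here is a minimal sketch using ordinary least squares on a synthetic sample patterned on the Martian population (α = −8 g, β = 0.5 g/cm, σy·x = 1 g); a, b, and s_yx below denote the sample estimates of α, β, and σy·x.

```python
# Estimate the line-of-means parameters from a random sample by least squares.
# The synthetic population mimics the Martians of Figures 8-1 and 8-2.
import numpy as np

rng = np.random.default_rng(4)
height = rng.normal(40, 5, size=25)                  # x, cm
weight = -8 + 0.5 * height + rng.normal(0, 1, 25)    # y, g: line of means + scatter

# Least-squares estimates b (slope) and a (intercept) of beta and alpha.
b = np.sum((height - height.mean()) * (weight - weight.mean())) / np.sum(
    (height - height.mean()) ** 2)
a = weight.mean() - b * height.mean()

# Estimate of sigma_yx: SD of the residuals about the fitted line
# (n - 2 degrees of freedom, since two parameters were estimated).
resid = weight - (a + b * height)
s_yx = np.sqrt(np.sum(resid ** 2) / (len(height) - 2))

print(f"a = {a:.1f} g, b = {b:.2f} g/cm, s_yx = {s_yx:.2f} g")
```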