Book
Primer of Biostatistics, 7th Edition

by Stanton A. Glantz

A concise, engagingly written introduction to understanding statistics as they apply to medicine and the life sciences

CD-ROM performs 30 statistical tests

Don't be afraid of biostatistics anymore! Primer of Biostatistics, 7th Edition demystifies this challenging topic in an interesting and enjoyable manner that assumes no prior knowledge of the subject. Faster than you thought possible, you'll understand test selection and be able to evaluate biomedical statistics critically, knowledgeably, and confidently.

With Primer of Biostatistics, you'll start with the basics, including analysis of variance and the t test, then advance to multiple comparison testing, contingency tables, regression, and more. Illustrative examples and challenging problems, culled from the recent biomedical literature, highlight the discussions throughout and help to foster a more intuitive approach to biostatistics.

The companion CD-ROM contains everything you need to run thirty statistical tests of your own data. Review questions and summaries in each chapter facilitate the learning process and help you gauge your comprehension. By combining whimsical studies of Martians and other planetary residents with actual papers from the biomedical literature, the author makes the subject fun and engaging.

Coverage includes:

• How to summarize data

• How to test for differences between groups

• The t test

• How to analyze rates and proportions

• What does "not significant" really mean?

• Confidence intervals

• How to test for trends

• Experiments when each subject receives more than one treatment

• Alternatives to analysis of variance and the t test based on ranks

• How to analyze survival data

Book
Schaum's Outline of Statistics, 6th Edition

by the late Dr. Murray R. Spiegel and Larry J. Stephens

Tough test questions? Missed lectures? Not enough time? Fortunately, there's Schaum's!

More than 40 million students have trusted Schaum's to help them succeed in the classroom and on exams. Schaum's is the key to faster learning and higher grades in every subject. Each Outline presents all the essential course information in an easy-to-follow, topic-by-topic format. Helpful tables and illustrations increase your understanding of the subject at hand.

Schaum's Outline of Statistics, Sixth Edition, includes more than 500 fully solved problems, examples, and practice exercises to sharpen your problem-solving skills. Plus, you will have access to 25 detailed videos featuring math instructors who explain how to solve the most commonly tested problems—it's just like having your own virtual tutor! You'll find everything you need to build confidence, skills, and knowledge for the highest score possible. This powerful resource features:

• Over 500 problems, solved step by step

• Information on frequency distribution, elementary probability theory, elementary sampling theory, statistical decision theory, and analysis of variance

• Updated content to match the latest curriculum

• An accessible format for quick and easy review

• Clear explanations for key concepts

• Access to the revised Schaums.com website with 25 problem-solving videos and more

Book Chapter
10. Alternatives to Analysis of Variance and the t test Based on Ranks

As already noted, analysis of variance is called a parametric statistical method because it is based on estimates of the two population parameters, the mean and standard deviation (or variance), that completely define a normal distribution. Given the assumption that the samples are drawn from normally distributed populations, one can compute the distributions of the F or t test statistics that will occur in all possible experiments of a given size when the treatments have no effect. The critical values that define an extreme value of F or t can then be obtained from that distribution. When the assumptions of parametric statistical methods are satisfied, they are the most powerful tests available.

If the populations the observations were drawn from are not normally distributed (or are not reasonably compatible with other assumptions of a parametric method, such as equal variances in all the treatment groups), parametric methods become quite unreliable because the mean and standard deviation, the key elements of parametric statistics, no longer completely describe the population. In fact, when the population substantially deviates from normality, interpreting the mean and standard deviation in terms of a normal distribution can produce a very misleading picture.

For example, recall our discussion of the distribution of heights of the entire population of Jupiter. The mean height of all Jovians is 37.6 cm in Figure 2-3A and the standard deviation is 4.5 cm. Rather than being symmetrically distributed about the mean, the population is skewed toward taller heights. Specifically, the heights of Jovians range from 31 to 52 cm, with most heights around 35 cm. Figure 2-3B shows what the population of heights would have been if, instead of being skewed toward taller heights, they had been normally distributed with the same mean and standard deviation as the actual population (in Figure 2-3A). The heights would have ranged from 26 to 49 cm, with most heights around 37 to 38 cm. Simply looking at Figure 2-3 should convince you that envisioning a population on the basis of the mean and standard deviation can be quite misleading if the population does not, at least approximately, follow the normal distribution.

The same thing is true of statistical tests that are based on the normal distribution. When the population the samples were drawn from does not at least approximately follow the normal distribution, these tests can be quite misleading. In such cases, it is possible to use the ranks of the observations rather than the observations themselves to compute statistics that can be used to test hypotheses. By using ranks rather than the actual measurements it is possible to retain much of the information about the relative size of responses without making any assumptions about how the population the samples were drawn from is distributed. Since these tests are not based on the parameters of the underlying population, they are called nonparametric or distribution-free methods.[1] All the methods we will discuss require only that the distributions under the different treatments have similar shapes, but there is no restriction on what those shapes are.[2]

When the observations are drawn from normally distributed populations, the nonparametric methods in this chapter are about 95% as powerful as the analogous parametric methods. As a result, power for these tests can be estimated by computing the power of the analogous parametric test. When the observations are drawn from populations that are not normally distributed, nonparametric methods are not only more reliable but also more powerful than parametric methods.

Unfortunately, you can never observe the entire population. So how can you tell whether assumptions such as normality are met, permitting the use of parametric tests such as analysis of variance? The simplest approach is to plot the observations and look at them. Do they seem compatible with the assumption that they were drawn from normally distributed populations with roughly the same variances, that is, within a factor of 2 to 3 of each other? If so, you are probably safe in using parametric methods. If, on the other hand, the observations are heavily skewed (suggesting a population such as the Jovians in Fig. 2-3A) or appear to have more than one peak, you probably will want to use a nonparametric method. When the standard deviation is about the same size as or larger than the mean and the variable can take on only positive values, this is an indication that the distribution is skewed. (A normally distributed variable with that mean and standard deviation would have to take on negative values.) In practice, these simple rules of thumb are often all you will need.

There are two ways to make this procedure more objective. The first is to plot the observations as a normal probability plot. A normal probability plot has a distorted vertical scale that makes normally distributed observations plot as a straight line (just as exponential functions plot as a straight line on a semilogarithmic graph). Examining how straight the line is will show how compatible the observations are with a normal distribution. One can also construct a χ2 statistic to test how closely the observed data agree with those expected if the population is normally distributed with the same mean and standard deviation. Since in practice simply looking at the data is generally adequate, we will not discuss these approaches in detail.[3]
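
As an illustration, here is a minimal Python sketch of both checks (not from the book); it assumes NumPy, SciPy, and matplotlib are available, uses hypothetical skewed data, and substitutes SciPy's Shapiro-Wilk test for the chapter's χ2 goodness-of-fit statistic.

```python
# Sketch: assessing normality before choosing a parametric test.
# The sample data are hypothetical and deliberately skewed.
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
observations = rng.lognormal(mean=3.6, sigma=0.12, size=20)  # skewed, like the Jovian heights

# Normal probability plot: normally distributed data fall near a straight line.
fig, ax = plt.subplots()
stats.probplot(observations, dist="norm", plot=ax)
plt.show()

# A formal normality test (Shapiro-Wilk here; the chapter's chi-square
# goodness-of-fit approach is an alternative). Small samples have little
# power either way, as the text cautions.
w, p = stats.shapiro(observations)
print(f"Shapiro-Wilk W = {w:.3f}, P = {p:.3f}")
```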

Unfortunately, none of these methods is especially convincing one way or the other for the small sample sizes common in biomedical research, and your choice of approach (i.e., parametric versus nonparametric) often has to be based more on judgment and preference than hard evidence.

One informal approach is to do the analysis with both the applicable parametric and nonparametric methods, then compare the results. If the data are from a normal population, then the parametric method should be more sensitive (and so provide a lower P value), whereas if there is substantial nonnormality then the nonparametric method should be more sensitive (and so provide the lower P value). If the data are only slightly nonnormal, the two approaches should give similar results.
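
A minimal sketch of this "run both and compare" approach, assuming SciPy; the two treatment groups are hypothetical.

```python
# Run the parametric and rank-based tests on the same data and
# compare the resulting P values.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
group_a = rng.normal(50, 10, size=12)
group_b = rng.normal(58, 10, size=12)

t_stat, p_param = stats.ttest_ind(group_a, group_b)        # parametric t test
u_stat, p_nonparam = stats.mannwhitneyu(group_a, group_b)  # rank-based alternative

print(f"t test P = {p_param:.4f}; Mann-Whitney P = {p_nonparam:.4f}")
# Similar P values suggest at most mild nonnormality; a clearly smaller
# nonparametric P hints at substantial nonnormality.
```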

Things basically come down to the following difference of opinion: Some people think that in the absence of evidence that the data were not drawn from a normally distributed population, one should use parametric tests because they are more powerful and more widely used. These people say that you should use a nonparametric test only when there is positive evidence that the populations under study are not normally distributed. Others point out that the nonparametric methods discussed in this chapter are 95% as powerful as parametric methods when the data are from normally distributed populations and more reliable when the data are not from normally distributed populations. They also believe that investigators should assume as little as possible when analyzing their data. They therefore recommend that nonparametric methods be used except when there is positive evidence that parametric methods are suitable. At the moment, there is no definitive answer stating which attitude is preferable. And there probably never will be such an answer.

Book Chapter
16. Analysis of Variance

In Chapter 8 we used sampling theory to test the significance of differences between two sampling means. We assumed that the two populations from which the samples were drawn had the same variance. In many situations there is a need to test the significance of differences between three or more sampling means or, equivalently, to test the null hypothesis that the sample means are all equal.

EXAMPLE 1.

Suppose that in an agricultural experiment four different chemical treatments of soil produced mean wheat yields of 28, 22, 18, and 24 bushels per acre, respectively. Is there a significant difference in these means, or is the observed spread due simply to chance?

Problems such as this can be solved by using an important technique known as analysis of variance, developed by Fisher. It makes use of the F distribution already considered in Chapter 11.
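
As a sketch of how such a problem is set up in software, the following assumes SciPy and invents per-plot yields whose group means match the four means quoted above (the example itself reports only the means):

```python
# One-way analysis of variance on hypothetical per-plot wheat yields.
from scipy import stats

treatment_1 = [27, 29, 28, 28]  # mean 28 bushels per acre
treatment_2 = [21, 23, 22, 22]  # mean 22
treatment_3 = [17, 19, 18, 18]  # mean 18
treatment_4 = [23, 25, 24, 24]  # mean 24

f_stat, p_value = stats.f_oneway(treatment_1, treatment_2,
                                 treatment_3, treatment_4)
print(f"F = {f_stat:.2f}, P = {p_value:.4f}")
# A small P value rejects the null hypothesis that all four
# treatment means are equal.
```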

Book Chapter
7. The Binomial, Normal, and Poisson Distributions

If p is the probability that an event will happen in any single trial (called the probability of a success) and q = 1 − p is the probability that it will fail to happen in any single trial (called the probability of a failure), then the probability that the event will happen exactly X times in N trials (i.e., X successes and N − X failures will occur) is given by

$$p(X) = \binom{N}{X} p^X q^{N-X} = \frac{N!}{X!\,(N-X)!}\, p^X q^{N-X}$$

(1)

where X = 0, 1, 2, …, N; N! = N(N − 1)(N − 2) ⋯ 1; and 0! = 1 by definition (see Problem 6.34).
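
As an aside, formula (1) translates directly into code; the sketch below assumes only the Python standard library.

```python
# Direct translation of formula (1).
from math import comb

def binomial_probability(x: int, n: int, p: float) -> float:
    """P(exactly x successes in n trials), each with success probability p."""
    q = 1.0 - p
    return comb(n, x) * p**x * q**(n - x)

print(binomial_probability(2, 6, 0.5))  # 0.234375 = 15/64, as in Example 1
```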

EXAMPLE 1.

The probability of getting exactly 2 heads in 6 tosses of a fair coin is

$$\binom{6}{2}\left(\frac{1}{2}\right)^2\left(\frac{1}{2}\right)^{6-2} = \frac{6!}{2!\,4!}\left(\frac{1}{2}\right)^6 = \frac{15}{64}$$

using formula (1) with N = 6, X = 2, and p = q = 1/2.

Using EXCEL, the evaluation of the probability of 2 heads in 6 tosses is given by the following: =BINOMDIST(2,6,0.5,0), where the function BINOMDIST has 4 parameters.

The first parameter is the number of successes, the second is the number of trials, the third is the probability of success, and the fourth is a 0 or 1. A zero gives the probability of the number of successes and a 1 gives the cumulative probability. The function =BINOMDIST(2,6,0.5,0) gives 0.234375 which is the same as 15/64.

EXAMPLE 2.

The probability of getting at least 4 heads in 6 tosses of a fair coin is

$$\binom{6}{4}\left(\frac{1}{2}\right)^4\left(\frac{1}{2}\right)^{6-4} + \binom{6}{5}\left(\frac{1}{2}\right)^5\left(\frac{1}{2}\right)^{6-5} + \binom{6}{6}\left(\frac{1}{2}\right)^6\left(\frac{1}{2}\right)^{6-6} = \frac{15}{64} + \frac{6}{64} + \frac{1}{64} = \frac{11}{32}$$

The discrete probability distribution (1) is often called the binomial distribution since for X = 0, 1, 2,…, N it corresponds to successive terms of the binomial formula, or binomial expansion,

$$(q+p)^N = q^N + \binom{N}{1} q^{N-1} p + \binom{N}{2} q^{N-2} p^2 + \cdots + p^N$$

(2)

where $1, \binom{N}{1}, \binom{N}{2}, \ldots$ are called the binomial coefficients.

Using EXCEL, the solution to Example 2 is =1-BINOMDIST(3,6,0.5,1), which gives 0.34375, the same as 11/32. Since Pr{X ≥ 4} = 1 − Pr{X ≤ 3}, and BINOMDIST(3,6,0.5,1) = Pr{X ≤ 3}, this computation gives the probability of at least 4 heads.
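
For readers working outside EXCEL, the same two computations can be reproduced with SciPy's binom distribution (an assumption of this sketch, not part of the text); pmf corresponds to BINOMDIST(..., 0) and cdf to BINOMDIST(..., 1).

```python
# Python equivalents of the two EXCEL computations above.
from scipy.stats import binom

print(binom.pmf(2, 6, 0.5))      # 0.234375 = 15/64 (exactly 2 heads)
print(1 - binom.cdf(3, 6, 0.5))  # 0.34375  = 11/32 (at least 4 heads)
```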

EXAMPLE 3.

$$(q+p)^4 = q^4 + \binom{4}{1} q^3 p + \binom{4}{2} q^2 p^2 + \binom{4}{3} q p^3 + p^4 = q^4 + 4q^3p + 6q^2p^2 + 4qp^3 + p^4$$

Some properties of the binomial distribution are listed in Table 7.1.

Table 7.1 Binomial Distribution

Mean: $\mu = Np$
Variance: $\sigma^2 = Npq$
Standard deviation: $\sigma = \sqrt{Npq}$
Moment coefficient of skewness: $\alpha_3 = \dfrac{q - p}{\sqrt{Npq}}$
Moment coefficient of kurtosis: $\alpha_4 = 3 + \dfrac{1 - 6pq}{Npq}$

EXAMPLE 4.

In 100 tosses of a fair coin the mean number of heads is $\mu = Np = (100)\left(\frac{1}{2}\right) = 50$; this is the expected number of heads in 100 tosses of the coin. The standard deviation is $\sigma = \sqrt{Npq} = \sqrt{(100)\left(\frac{1}{2}\right)\left(\frac{1}{2}\right)} = 5$.
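
A quick check of Example 4, assuming NumPy; the simulation is illustrative and its sample statistics will vary slightly from run to run.

```python
# Theoretical mean and standard deviation, then a simulation check.
import numpy as np

n, p = 100, 0.5
q = 1 - p
print(n * p, np.sqrt(n * p * q))  # theoretical: 50.0 5.0

rng = np.random.default_rng(2)
heads = rng.binomial(n, p, size=100_000)  # 100,000 experiments of 100 tosses each
print(heads.mean(), heads.std())          # close to 50 and 5
```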

Book Chapter
1. Biostatistics and Clinical Practice

Suppose researchers believe that administering some drug increases urine production in proportion to the dose. To study it, they give different doses of the drug to five different people, plotting their urine production against the dose of drug. The resulting data, shown in Figure 1-2A, reveal a strong relationship between the drug dose and daily urine production in the five people who were studied. This result would probably lead the investigators to publish a paper stating that the drug was an effective diuretic.

Figure 1-2 (A) Results of an experiment in which researchers administered five different doses of a drug to five different people and measured their daily urine production. Output increased as the dose of drug increased in these five people, suggesting that the drug is an effective diuretic in all people similar to those tested. (B) If the researchers had been able to administer the drug to all people and measure their daily urine output, it would have been clear that there is no relationship between the dose of drug and urine output. The five specific individuals who happened to be selected for the study in panel A are shown as shaded points. It is possible, but not likely, to obtain such an unrepresentative sample that leads one to believe that there is a relationship between the two variables when there is none. A set of statistical procedures called tests of hypotheses permits one to estimate the chance of getting such an unrepresentative sample.

The only statement that can be made with absolute certainty is that as the drug dose increased, so did urine production in the five people in the study. The real question of interest, however, is: How is the drug likely to affect all people who receive it? The assertion that the drug is effective requires a leap of faith from the limited experience, shown in Figure 1-2A, to all people.

Now, pretend that we knew how every person who would ever receive the drug would respond. Figure 1-2B shows this information. There is no systematic relationship between the drug dose and urine production! The drug is not an effective diuretic.

How could we have been led so far astray? The dark points in Figure 1-2B represent the specific individuals who happened to be studied to obtain the results shown in Figure 1-2A. While they are all members of the population of people we are interested in studying, the five specific individuals we happened to study, taken as a group, were not really representative of how the entire population of people responds to the drug.

Looking at Figure 1-2B should convince you that obtaining such an unrepresentative sample of people, though possible, is not very probable. One set of statistical procedures, called tests of hypotheses, permits you to estimate the likelihood of concluding that two things are related as Figure 1-2A suggests when the relationship is really due to bad luck in selecting people for study, and not a true effect of the drug. In this example, one can estimate that such a sample of people will turn up in a study of the drug only about 5 times in 1000 when the drug actually has no effect.

Of course it is important to realize that although statistics is a branch of mathematics, there can be honest differences of opinion about the best way to analyze a problem. This fact arises because all statistical methods are based on relatively simple mathematical models of reality, so the results of the statistical tests are accurate only to the extent that the reality and the mathematical model underlying the statistical test are in reasonable agreement.

Book Chapter
12. The Chi-Square Test

As we have already seen many times, the results obtained in samples do not always agree exactly with the theoretical results expected according to the rules of probability. For example, although theoretical considerations lead us to expect 50 heads and 50 tails when we toss a fair coin 100 times, it is rare that these results are obtained exactly.

Suppose that in a particular sample a set of possible events E1, E2, E3,…, Ek (see Table 12.1) are observed to occur with frequencies o1, o2, o3,…, ok, called observed frequencies, and that according to probability rules they are expected to occur with frequencies e1, e2, e3,…, ek, called expected, or theoretical, frequencies. Often we wish to know whether the observed frequencies differ significantly from the expected frequencies.

Table 12.1

Event: E1, E2, E3, …, Ek
Observed frequency: o1, o2, o3, …, ok
Expected frequency: e1, e2, e3, …, ek
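
As a sketch of how this comparison is carried out in software (assuming SciPy; the coin-toss counts are hypothetical):

```python
# Chi-square comparison of observed and expected frequencies.
from scipy.stats import chisquare

observed = [55, 45]  # heads, tails in 100 tosses
expected = [50, 50]  # what a fair coin predicts

chi2, p_value = chisquare(f_obs=observed, f_exp=expected)
print(f"chi-square = {chi2:.2f}, P = {p_value:.3f}")
# A large P value means the observed frequencies are compatible
# with the expected ones.
```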

Book Chapter
7. Confidence Intervals

In Chapter 4, we defined the t statistic to be

$$t = \frac{\text{difference of sample means}}{\text{standard error of difference of sample means}}$$

then computed its value for the data observed in an experiment. Next, we compared the result with the value tα that defined the most extreme 100α percent of the possible values of t that would occur (in both tails) if the two samples were drawn from a single population. If the observed value of t exceeded tα (given in Table 4-1), we reported a "statistically significant" difference, with P < α. As Figure 4-4 showed, the distribution of possible values of t has a mean of zero and is symmetric about zero.

On the other hand, if the two samples are drawn from populations with different means, the distribution of values of t associated with all possible experiments involving two samples of a given size is not centered on zero; it does not follow the t distribution. As Figures 6-3 and 6-5 showed, the actual distribution of possible values of t has a nonzero mean that depends on the size of the treatment effect. It is possible to revise the definition of t so that it will be distributed according to the t distribution in Figure 4-4 regardless of whether or not the treatment actually has an effect. This modified definition of t is

$$t = \frac{\text{difference of sample means} - \text{true difference in population means}}{\text{standard error of difference of sample means}}$$

Notice that if the hypothesis of no treatment effect is correct, the difference in population means is zero and this definition of t reduces to the one we used before. The equivalent mathematical statement is

$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{s_{\bar{X}_1 - \bar{X}_2}}$$

In Chapter 4 we computed t from the observations, then compared it with the critical value for a “big” value of t with ν = n1 + n2 − 2 degrees of freedom to obtain a P value. Now, however, we cannot follow this approach since we do not know all the terms on the right side of the equation. Specifically, we do not know the true difference in mean values of the two populations from which the samples were drawn, μ1μ2. We can, however, use this equation to estimate the size of the treatment effect, μ1μ2.

Instead of using the equation to determine t, we will select an appropriate value of t and use the equation to estimate μ1μ2. The only problem is that of selecting an appropriate value for t.

By definition, 100α percent of all possible values of t are more negative than −tα or more positive than +tα. For example, only 5% of all possible t values will fall outside the interval between −t.05 and +t.05, where t.05 is the critical value of t that defines the most extreme 5% of the t distribution (tabulated in Table 4-1). Therefore, 100(1 − α) percent of all possible values of t fall between −tα and +tα. For example, 95% of all possible values of t will fall between −t.05 and +t.05.

Every different pair of random samples we draw in our experiment will be associated with different values of $\bar{X}_1 - \bar{X}_2$ and $s_{\bar{X}_1 - \bar{X}_2}$, and 100(1 − α) percent of all possible experiments involving samples of a given size will yield values of t that fall between −tα and +tα. Therefore, for 100(1 − α) percent of all possible experiments,

$$-t_\alpha \le \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{s_{\bar{X}_1 - \bar{X}_2}} \le +t_\alpha$$

Solving this equation for the true difference of the population means gives

$$(\bar{X}_1 - \bar{X}_2) - t_\alpha s_{\bar{X}_1 - \bar{X}_2} \le \mu_1 - \mu_2 \le (\bar{X}_1 - \bar{X}_2) + t_\alpha s_{\bar{X}_1 - \bar{X}_2}$$

In other words, the actual difference of the means of the two populations from which the samples were drawn will fall within tα standard errors of the observed difference of the sample means. (tα has ν = n1 + n2 − 2 degrees of freedom, just as when we used the t distribution in hypothesis testing.) This range is called the 100(1 − α) percent confidence interval for the difference of the means. For example, the 95% confidence interval for the true difference of the population means is

$$(\bar{X}_1 - \bar{X}_2) - t_{.05}\, s_{\bar{X}_1 - \bar{X}_2} \le \mu_1 - \mu_2 \le (\bar{X}_1 - \bar{X}_2) + t_{.05}\, s_{\bar{X}_1 - \bar{X}_2}$$

This equation defines the range that will include the true difference in the means for 95% of all possible experiments that involve drawing samples from the two populations under study.
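
A minimal sketch of this computation, assuming SciPy and two hypothetical samples; it uses the pooled-variance standard error that underlies the equal-variance t test.

```python
# 95% confidence interval for the difference of two means.
import numpy as np
from scipy import stats

sample_1 = np.array([81., 87., 90., 84., 86., 88.])
sample_2 = np.array([74., 79., 83., 77., 80., 76.])

n1, n2 = len(sample_1), len(sample_2)
diff = sample_1.mean() - sample_2.mean()

# Pooled-variance standard error of the difference of the means.
sp2 = ((n1 - 1) * sample_1.var(ddof=1) +
       (n2 - 1) * sample_2.var(ddof=1)) / (n1 + n2 - 2)
se_diff = np.sqrt(sp2 * (1 / n1 + 1 / n2))

t_crit = stats.t.ppf(0.975, df=n1 + n2 - 2)  # two-tailed t.05 critical value
print(f"95% CI: {diff - t_crit * se_diff:.2f} to {diff + t_crit * se_diff:.2f}")
```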

Since this procedure to compute the confidence interval for the difference of two means uses the t distribution, it is subject to the same limitations as the t test. In particular, the samples must be drawn from populations that follow a normal distribution at least approximately.[2]

Book Chapter
14. Correlation Theory

In Chapter 13 we considered the problem of regression, or estimation, of one variable (the dependent variable) from one or more related variables (the independent variables). In this chapter we consider the closely related problem of correlation, or the degree of relationship between variables, which seeks to determine how well a linear or other equation describes or explains the relationship between variables.

If all values of the variables satisfy an equation exactly, we say that the variables are perfectly correlated or that there is perfect correlation between them. Thus the circumferences C and radii r of all circles are perfectly correlated since C = 2πr. If two dice are tossed simultaneously 100 times, there is no relationship between corresponding points on each die (unless the dice are loaded); that is, they are uncorrelated. Such variables as the height and weight of individuals would show some correlation.
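
These two extremes are easy to verify numerically; the sketch below assumes NumPy and uses hypothetical radii and dice rolls.

```python
# Perfect correlation (C = 2*pi*r is exactly linear) versus no correlation.
import numpy as np

r = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # radii
c = 2 * np.pi * r                        # circumferences

print(np.corrcoef(r, c)[0, 1])  # 1.0: perfect correlation

# Two independently tossed dice show essentially no correlation.
rng = np.random.default_rng(3)
die1 = rng.integers(1, 7, size=100)
die2 = rng.integers(1, 7, size=100)
print(np.corrcoef(die1, die2)[0, 1])  # near 0
```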

When only two variables are involved, we speak of simple correlation and simple regression. When more than two variables are involved, we speak of multiple correlation and multiple regression. This chapter considers only simple correlation. Multiple correlation and regression are considered in Chapter 15.

Book Chapter
13. Curve Fitting and the Method of Least Squares

Very often in practice a relationship is found to exist between two (or more) variables. For example, weights of adult males depend to some degree on their heights, the circumferences of circles depend on their radii, and the pressure of a given mass of gas depends on its temperature and volume.

It is frequently desirable to express this relationship in mathematical form by determining an equation that connects the variables.
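
As a small illustration (not from the chapter), a straight line can be fitted by the method of least squares with NumPy; the height/weight pairs below are hypothetical.

```python
# Least-squares straight-line fit to hypothetical height/weight data.
import numpy as np

heights = np.array([160., 165., 170., 175., 180., 185.])  # cm
weights = np.array([60., 64., 69., 73., 79., 84.])         # kg

slope, intercept = np.polyfit(heights, weights, deg=1)  # minimizes squared error
print(f"weight ~ {slope:.2f} * height + {intercept:.1f}")
```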