MathCS.org: Intro to Statistics

8.1 Statistical Testing

In this chapter we will introduce hypothesis testing to enable us to answer questions such as the following:

A company labels its product as weighing 10 oz on average. However, the last package we bought weighed only 8.5 oz, so we suspect the company is cheating and puts less product in the package than it states on the label. We want to determine whether our suspicion is true or not.
A new medical drug is supposed to work better in lowering a person's cholesterol level than currently existing drugs. From past experiments we know that the existing drugs lower cholesterol levels by 10 units, on average (I made up the numbers -:). We want to determine whether the new drug really is more effective than the existing ones.

In general:

We are interested in testing a particular hypothesis and we want to decide whether it is true or not. Moreover, we want to associate a probability with our decision so that we know how certain (or uncertain) we are that our decision is correct.

We will approach this problem like a trial. Recall that in a standard trial in front of a judge or jury there are two mutually exclusive hypotheses:

The defendant is either guilty or not guilty

During the trial, evidence is collected and weighed either in favor of the defendant being guilty (the job of the DA) or in favor of the defendant being not guilty (the job of the Defense Lawyer). At the end of the trial the judge (or jury) decides between the two alternatives and either convicts the defendant (if he/she was proven guilty beyond a reasonable doubt) or lets them go (if there was sufficient doubt about the defendant's guilt).

Note that a defendant is "innocent until proven guilty". If the judge (or jury) decides a defendant is not guilty, that does not necessarily mean he/she is innocent. It simply means there was not enough evidence for a conviction.

In general, a statistical test involves four elements:

Null Hypothesis (written as H₀): The "tried and true situation", or "the status quo", or "innocent until proven guilty"
Alternative Hypothesis (written as H_a): This is what you suspect (or hope) is really true, the new situation, "guilty" - in general it is the opposite of the null hypothesis
Test Statistic: Collecting evidence - in our case we usually select a random sample and compute some number based on the sample data
Rejection Region: Do we reject the null hypothesis (and therefore accept the alternative), or do we declare our test inconclusive? And if we do decide to reject the null hypothesis, what is the probability that our decision is incorrect?

Please note that our final conclusion is always one of two options: we either reject the null hypothesis or we declare the test inconclusive. We never conclude anything else, such as accepting the null hypothesis.

Rejecting the null hypothesis when in fact it is true is called a Type I error. The probability of exactly this error is what we will compute in the procedure above when we reject the null hypothesis. It should, of course, be small so that we can be confident in our decision to reject the null hypothesis.
Accepting the null hypothesis when in fact it is false is called a Type II error. The probability of this type of error is not covered by our procedure (which is why we never accept the null hypothesis; we rather declare our test inconclusive if necessary).
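
To get a feeling for what a Type I error looks like in practice, here is a small simulation sketch. Everything in it is an assumption made for illustration only: the population values (mean 10, standard deviation 5.1), the sample size 62, the cutoff of 0.05 for calling a probability "small", and the z-type statistic, which is just one common way to turn a sample into a single number. The null hypothesis is true in every simulated sample, so each rejection counted below is a Type I error.

```python
import math
import random
import statistics

def two_sided_p(z):
    """Two-sided tail probability 2*P(Z > |z|) for a standard normal Z."""
    return math.erfc(abs(z) / math.sqrt(2))

def type_one_error_rate(trials=2000, n=62, mu0=10.0, sigma=5.1,
                        alpha=0.05, seed=1):
    """Fraction of samples that reject H0 even though H0 is true.

    All default values are illustrative assumptions, not part of the text.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    rejections = 0
    for _ in range(trials):
        # Draw a sample from a population where H0 (mean = mu0) really holds.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        mean = statistics.fmean(sample)
        sd = statistics.stdev(sample)
        z = (mean - mu0) / (sd / math.sqrt(n))
        # Rejecting here is a mistake, because H0 is true by construction.
        if two_sided_p(z) < alpha:
            rejections += 1
    return rejections / trials

rate = type_one_error_rate()
print(f"Share of wrong rejections: {rate:.3f}")
```

The printed share should land close to the chosen cutoff: if we only reject when the computed probability is below 0.05, we wrongly reject a true null hypothesis in roughly 5% of all samples.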

Example: A new antihypertensive drug is tested. It is supposed to lower blood pressure more than other drugs. Other drugs have been found to lower the pressure by 10 mmHg on average, so we suspect (or hope) that our drug will lower blood pressure by more than 10 mmHg. To collect evidence, we select a random sample of size n = 62 (say), which was found to have a sample mean of 11.3 and a sample standard deviation of 5.1. Is the new drug better than the old drugs, i.e. does the new drug lower blood pressure more than other drugs?

Since the sample mean is 11.3, which is bigger than the 10 mmHg achieved by other drugs, it looks like this sample supports the claim. But - knowing that we can never be 100% certain - we must compute a probability and associate that with our conclusion, if indeed we want to make that conclusion.

In other words, we need to set up the four components of a statistical test. The population here is the amount of decrease in blood pressure in people who have been given the new drug.

The Null Hypothesis is the "tried and true" assumption that all drugs are about the same and the new drug has about the same effect as all other drugs. Thus, the null hypothesis is that the average decrease in blood pressure (the population mean) is 10 mmHg, just as for all other drugs.
The Alternative Hypothesis is what we hope to be true, i.e. that the new drug results in a larger decrease than the traditional drugs. Thus, the alternative hypothesis is that the average decrease in blood pressure (the population mean) is more than 10 mmHg.
For our Test Statistic we collect evidence in the form of our random sample. We found that for this random sample the sample mean is 11.3 mmHg, the sample standard deviation is 5.1 mmHg, and the sample size n is 62. These figures are converted into a single number (as described in the next chapter). In this case the test statistic will turn out to be

z = 2.01
Rejection Region: Finally we use the test statistic z = 2.01 to compute the probability p of committing an error in deciding to reject the null hypothesis (the Type I error). If that probability is small, we do indeed decide to reject the null hypothesis; otherwise we declare the test to be inconclusive. In this case the probability will turn out to be (see next chapter):

p = 2*P(z > 2.01) = 0.044 or 4.4%

So, if we decide to reject the null hypothesis, that decision is incorrect with a probability of about 4% (or correct with a probability of about 96%). That's good enough for us, so we decide indeed to reject the null hypothesis. Since we reject the null hypothesis we automatically accept the alternative, and thus we think there is sufficient evidence that the new drug is better than the existing drugs in lowering blood pressure.
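
For the curious, the two numbers in this example can be reproduced with a few lines of Python. This is only a sketch under an assumption: it uses the usual large-sample formula z = (sample mean - hypothesized mean) / (sample standard deviation / sqrt(n)), which matches the value 2.01 quoted above but is not spelled out until the next chapter. The function names are made up for this illustration, and the factor of 2 is taken directly from the formula for p shown above.

```python
import math

def z_statistic(sample_mean, mu0, sample_sd, n):
    """Standardized distance between the sample mean and the mean under H0."""
    return (sample_mean - mu0) / (sample_sd / math.sqrt(n))

def upper_tail(z):
    """P(Z > z) for a standard normal variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Values from the blood pressure example.
z = z_statistic(sample_mean=11.3, mu0=10.0, sample_sd=5.1, n=62)

# The text rounds z to two decimals before looking up the probability.
z_rounded = round(z, 2)
p = 2 * upper_tail(z_rounded)

print(f"z = {z_rounded:.2f}")
print(f"p = {p:.3f}")
```

Using the unrounded z instead of the rounded one changes p only slightly, in the third decimal place, so the conclusion stays the same.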

So, how do we compute the above numbers to arrive at this decision ... read the next section -:)
