MathCS.org - Statistics

back | next

5.2 Percentage Tables

So far we have analyzed data one variable at a time. We have seen how to compute mean, mode, median, and variance, but each formula only applied to one variable at a time. Now we want to investigate two (or more) variables simultaneously. Usually, a typical question about two variables is:

Is there some relation between one variable and another one, and if so, how can one use knowledge about one variable to predict, approximately, the other.

Answers to such questions can be very useful:

In most cases the "if" part is the difficult one to determine, i.e. it is not so easy to find out whether two variables (for example smoking and cancer) are indeed related, and even if they are related, it is even harder to determine which is cause and which is result (if smokers have higher cancer rates, does smoking cause cancer, or does having cancer cause you to smoke). In general, correlation does not necessarily imply causation. Here are two examples of incorrectly infering causation from correlation (see http://en.wikipedia.org/wiki/Correlation_does_not_imply_causation):

But let's start at the beginning. We will start our investigation about relationships between variables by taking a closer look at representing data in tables. We won't worry about correlation or causation for now.

Example: The residents of Green Township were asked what their opinion about a new Zoning Ordinance was. The answers were broken down by age of the people who were questioned. The result of the survey is summarized in the following table:

age
50 or under
age
over 50
Total
For Zoning 92 87 179
Against or no opinion 158 75 233
Total 250 162 412

This table can, of course, be entered into Excel directly, and using a few tools that Excel provides the data entry and formatting is quick and effortless.

Note that some labels may not be completely visible - we will rectify that later automatically.

If your table does not look like this it is fine, but it should contain the appropriate row and column totals.

Row and Column Percenta

The above table looks nice, but is not very helpful. It is not meaningful, for example, to know that 92 people age 50 or under are for zoning. What would be useful, of course, would be to know percentages instead of actual figures. So, to better analyze the data we will convert each number to percent. However, for each entry there are three possible percentages to compute:

Row and column percentages are the most useful, and we will first show how to generate each of them, then discuss when to use which. 

Let's convert each number into the appropriate row percentages, i.e. we will use the row total to convert a number into percent:

First, we will copy the original table to a new location just below it:
  1. select all cells from the table
  2. choose "Copy" from the "Edit" menu
  3. position the cursor in the first column a few cells below the original table
  4. select "Paste" from the "Edit" menu.

A second copy of the original table will appear.

At this point you have a choice:

Excel would do the computations for you, of course, but in this case the power of Excel might actually be overkill. So, we will first convert the figures by hand, using a normal calculator, and then - as an appendix for the Excel enthusiast - we'll show how to use Excel if you insist.

Converting to Row Percentage

In cell B8 the 'raw data' is 92, indicating that there were 92 people age 50 or under who were 'for' zoning. The total in row 8 is 179, indicating that a total of 179 people were 'for' zoning. Thus, the row percentage of people 'for' zoning who are age 50 or under is

92 / 179 = 0.514 = 51.4%

Thus, 0.514 (or 51.4%) of those people 'for' zoning were age 50 or under, and therefore we manually enter the number 0.514 into cell B8, replacing the original value of 92.

Similarly, in cell C8 the raw data is 87, which we convert into row percentage as follows

87 / 179 = 0.486

Thus, 0.486 (or 48.6%) of those people 'for' zoning were over 50 years of age, so we manually enter the number 0.486, replacing the value of 87.

We repeat similar calculations to turn all of the raw data in rows 9 (dividing by 233) and 10 (dividing by 412) into row percentages, and we replace the row totals by 1 (or 100%). Our converted table should look like this, showing the converted numbers but not in percentages:

Finally, it is simple to get Excel to format the numbers in cells B8:D10 as percentages:

  1. Mark the cells containing the ratios in the second table (B8 to D10)
  2. Click the "%" symbol on the "Home" ribbon, or choose "Percentage" from the drop down menu in the "Number" group.

Converting to Column Percentage

Similarly, we can create a third table, containing column percentages. The details are left as an exercise, but for the calculations we need to convert the raw data values by dividing by the column totals instead of the row totals. If we do everything correctly our three tables (properly formatted) should look like this:

When to use Row Percentages, when Column Percentages

Now that we know how to create either row or column percentages, let's try to determine when to use which. For that, consider two similar but very much different questions:

These questions seem similar: we are looking at the intersection between the row "For Zoning" and the column "Age 50 or under". From the first table we know that 92 people fall into that category, but that number is not in percent. On the other hand, there are two candidates for the percentage number, 52.4% from the row percentages or 36.8% from the column percentages. Which one answers which question ?

From that example we see that the key to answer questions such as these is which group is considered the "total" group for the particular question:

It seems that generating these percentage tables is a fair amount of work. Of course Excel provides an easier method for generating such tables from actual data, which we will explore soon.

Appendix: Converting to Percentages using Excel

In this case it is simpler doing the proper computations using a regular calculator, but we could use Excel just as well. I recommmend you check out the video below, or follow the step-by-step instructions after that, but the video contaiins a few fun and helpful Excel tricks - you don't want to miss them.

If you prefer written instructions, try these: our starting point is the original table with the raw data values, copied to a second version as in this picture:

Now position the cursor into cell B8 (second table, containing the value 92). We will replace this "raw" value by a computed one as follows:

  1. type "=" (the equal sign)
  2. click once in the first table on the value 92 (cell B3)
  3. type "/" (the division symbol)
  4. click once in the first table on the value 179 (cell D3)
  5. hit ENTER
The value 92 in the second table will now be replaced by the appropriate ration 0.5139 (which is not in percent, but do not worry about that right now).

We repeat this calculation for the next cell, C9 (second table, containing the value 87). We compute row percentages:
  1. type "=" (the equal sign)
  2. click once in the first table on the value 87 (cell C3)
  3. type "/" (the division symbol)
  4. click once in the first table on the value 179 (cell D3)
  5. hit ENTER

If we keep going this way, we can convert all numbers into row percentages. Actually, all numbers so far will be decimal values, but we can easily format them as percentages as outlined above.

And it should be possible, now, to convert the raw data into column percentages, using Excel - see above for the final tables.

The advantages of this approach is that - while more work than using a calculator - the percentage table(s) will automatically update if the raw data values change if we use Excel formulas.

Practicing

To practice, consider the following table of raw numbers, relating sex (or gender) with puls rate (the numbers are from a survey for this class):

low pulse rate high pulse rate Totals
Male 1 1 2
Female 9 6 15
Totals 10 7 17

Convert the table into (a) row percentages and (b) column percentages, then use the appropriate figures to answer the following questions: