4.1 What is Numerical Data Representation

As we saw in Chapter 1, statistics is the study of making sense of data and consists of four components: collecting, summarizing, analyzing, and presenting data. In the second and third chapters we focused on summarizing data graphically; in this chapter we will concern ourselves with summarizing data numerically.

While charts are certainly very nice and often convincing, they have at least one major drawback: they are not very "portable". If you conduct an experiment measuring the cholesterol levels of male and female patients, it is certainly useful to create appropriate histograms to illustrate the outcome. However, if you are asked to summarize your results in words, for example on a radio show or during a conversation, these charts will not help much.

Instead, you need a simple, short, and easy-to-memorize summary of your data that, despite being short and simple, is meaningful to others with whom you might share your results.

For example, in our study of cholesterol levels we could condense the results by stating that the "average" level of cholesterol for men is X, while the average for women is Y, and most people would understand. Of course, when we condense data in this way some detail is lost, but in exchange we can communicate the result quickly.

This chapter will discuss some "statistics" that can be used to summarize data numerically while still trying to capture much of the detailed structure hidden in the data. Among the descriptive statistics we will study are the mean, mode, and median; the range, variance, and standard deviation; and more detailed descriptors such as percentiles and skewness. Towards the end of the chapter we will learn about the "box plot", which combines many of the numerical descriptors in one picture.
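To give a first impression of what such a numerical summary might look like, here is a minimal sketch using Python's standard `statistics` module. The five cholesterol readings are made up purely for illustration and are not taken from any study. The variance and standard deviation shown are the sample versions (dividing by n - 1), which is an assumption about the convention that will be used.

```python
import statistics

# Hypothetical cholesterol readings in mg/dL, invented for illustration only.
levels = [180, 200, 200, 210, 230]

summary = {
    "mean": statistics.mean(levels),          # arithmetic average
    "median": statistics.median(levels),      # middle value of the sorted data
    "mode": statistics.mode(levels),          # most frequent value
    "range": max(levels) - min(levels),       # largest minus smallest value
    "variance": statistics.variance(levels),  # sample variance (divides by n - 1)
    "std_dev": statistics.stdev(levels),      # square root of the sample variance
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

A handful of numbers like these can be read aloud or quoted in a sentence, which is exactly the kind of "portable" summary a chart cannot provide.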