Chapter 15
Descriptive Statistics

 


Figure 15.1 gives an overview of the field of statistics. As the figure shows, statistics can be divided into descriptive statistics and inferential statistics. Inferential statistics, which is the topic of the next chapter, has further subdivisions.

 

 

 

This chapter is about descriptive statistics (i.e., the use of statistics to describe, summarize, and explain or make sense of a given set of data).

 

 

Frequency Distributions

One useful way to view the data of a variable is to construct a frequency distribution (i.e., an arrangement in which the frequencies, and sometimes percentages, of the occurrence of each unique data value are shown).
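
As a rough illustration, a frequency distribution can be tabulated in a few lines of Python. The scores are made up for this example, and `frequency_distribution` is a hypothetical helper name, not something from the textbook.

```python
from collections import Counter

def frequency_distribution(values):
    """Return (value, frequency, percentage) rows, sorted by value."""
    counts = Counter(values)
    total = len(values)
    return [(v, n, 100 * n / total) for v, n in sorted(counts.items())]

# Hypothetical data: ten quiz scores (not taken from the textbook)
scores = [3, 5, 4, 5, 3, 5, 2, 4, 5, 4]

print("Value Frequency Percent")
for value, freq, pct in frequency_distribution(scores):
    print(f"{value:>5} {freq:>9} {pct:>6.1f}%")
```

Each row shows one unique data value, how often it occurs, and what percentage of all cases that is.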


Graphic Representations of Data

 

Another excellent way to describe your data (especially for visually oriented learners) is to construct graphical representations of the data (i.e., pictorial representations of the data in two-dimensional space).

 

Bar Graphs

A bar graph uses vertical bars to represent the data.

 

 

 

 

Histograms

A histogram is a graphic that shows the frequencies and shape that characterize a quantitative variable.
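
To get a feel for what a histogram shows, here is a minimal text-only sketch that groups scores into equal-width intervals and draws one `#` per case. The scores, the interval width of 10, and the helper name `text_histogram` are all assumptions made for this example. It assumes whole-number scores and simply skips intervals that contain no cases.

```python
from collections import Counter

def text_histogram(values, bin_width):
    """Group integer values into equal-width bins; one '#' per case."""
    # Map each value to the lower limit of its bin, then count per bin
    bins = Counter((v // bin_width) * bin_width for v in values)
    lines = []
    for start in sorted(bins):
        label = f"{start}-{start + bin_width - 1}"
        lines.append(f"{label:>7} | {'#' * bins[start]}")
    return lines

# Hypothetical test scores (not from the textbook)
scores = [52, 58, 61, 64, 67, 70, 71, 73, 75, 78, 82, 85, 91]
print("\n".join(text_histogram(scores, 10)))
```

The length of each row plays the role of the height of a bar, so the overall shape of the distribution is visible at a glance.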

 

 

 

Line Graphs

A line graph uses one or more lines to depict information about one or more variables.

 

 

 

 

Scatterplots 

A scatterplot is used to depict the relationship between two quantitative variables.

 

 

 

 

 

Measures of Central Tendency

 

Measures of central tendency provide descriptive information about the single numerical value that is considered to be the most typical of the values of a quantitative variable.

 

The mode is simply the most frequently occurring number.

 

The median is the center point in a set of numbers; it is also the fiftieth percentile.

·        Rule One. If you have an odd number of numbers, the median is the center number (e.g., three is the median for the numbers 1, 1, 3, 4, 9).

·        Rule Two. If you have an even number of numbers, the median is the average of the two innermost numbers (e.g., 2.5 is the median for the numbers 1, 2, 3, 7).
 

The mean is the arithmetic average (e.g., the average of the numbers 2, 3, 3, and 4 is equal to 3).
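
If you want to check the three examples above by computer, Python's built-in `statistics` module has a function for each measure. This is only a quick sketch using the same numbers given in the text.

```python
import statistics

# Mode: the most frequently occurring number
print(statistics.mode([1, 1, 3, 4, 9]))

# Median, Rule One: odd number of numbers, so take the center number
print(statistics.median([1, 1, 3, 4, 9]))

# Median, Rule Two: even number of numbers, so average the two innermost (2 and 3)
print(statistics.median([1, 2, 3, 7]))

# Mean: the arithmetic average
print(statistics.mean([2, 3, 3, 4]))
```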
 

A Comparison of the Mean, Median, and Mode

The mean, median, and mode are affected by what is called skewness (i.e., lack of symmetry) in the data.

 

 

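When the data are skewed to the left (negatively skewed): mean < median < mode.

When the data are skewed to the right (positively skewed): mean > median > mode.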

You can use the following two rules to provide some information about skewness even when you cannot see a line graph of the data (i.e., all you need is the mean and the median):

1.      Rule One. If the mean is less than the median, the data are skewed to the left.

2.      Rule Two. If the mean is greater than the median, the data are skewed to the right.
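
The two rules can be sketched as a small Python function. The name `skew_direction` and the sample numbers are made up for illustration, and the rules are only a rule of thumb, so treat the result as a hint rather than a formal test of skewness.

```python
import statistics

def skew_direction(data):
    """Apply the mean-versus-median rule of thumb to a list of numbers."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    if mean < median:
        return "left"    # Rule One
    if mean > median:
        return "right"   # Rule Two
    return "no skew indicated"

# Hypothetical data: one unusually large value pulls the mean above the median
print(skew_direction([1, 2, 3, 4, 20]))
```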

 

Measures of Variability

 

Measures of variability tell you how "spread out" a set of numbers is; that is, how much the numbers tend to differ from one another. Measures of variability should be reported along with measures of central tendency because the two provide different but complementary information. To fully interpret one (e.g., a mean), it is helpful to know about the other (e.g., a standard deviation).

 

An easy way to get the idea of variability is to look at two sets of data, one that is highly variable and one that is not very variable.

 

For example, which of these two sets of numbers appears to be more spread out, Set A or Set B?

 

If you said Set B is more spread out, then you are right! The numbers in Set B are more "spread out"; that is, they have more variability.

 

All of the measures of variability should give us an indication of the amount of variability in a set of data. We will discuss three indices of variability: the range, the variance, and the standard deviation.

 

Range

A relatively crude indicator of variability is the range (i.e., the difference between the highest and lowest numbers).
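
Here is a minimal sketch of the range in Python. The two data sets are hypothetical stand-ins chosen for this example (they are not the textbook's Set A and Set B), and `value_range` is an invented helper name.

```python
def value_range(data):
    """Range = highest number minus lowest number."""
    return max(data) - min(data)

# Hypothetical sets with the same mean (10) but different spread
set_a = [8, 9, 10, 11, 12]
set_b = [2, 6, 10, 14, 18]

print(value_range(set_a))
print(value_range(set_b))
```

Because the range uses only the two most extreme numbers, a single unusual score can change it a great deal, which is one reason it is considered crude.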

 

Variance and Standard Deviation

Two commonly used indicators of variability are the variance and the standard deviation.

Table 15.4 shows you how to easily calculate, by hand, the variance and standard deviation.
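
The usual by-hand steps can also be sketched in Python. This sketch assumes the population formula (dividing by the number of scores, N); whether Table 15.4 divides by N or by N - 1 is something to check against the book, so the function accepts either. The data and the name `variance_by_hand` are made up for the example.

```python
import math

def variance_by_hand(data, sample=False):
    """Follow the by-hand steps: mean, deviations, squares, average."""
    mean = sum(data) / len(data)                     # Step 1: find the mean
    squared_devs = [(x - mean) ** 2 for x in data]   # Steps 2-3: deviate, square
    # Step 4: divide the sum of squares by N (assumed) or by N - 1 (sample)
    denominator = len(data) - 1 if sample else len(data)
    return sum(squared_devs) / denominator

# Hypothetical data (not the numbers used in Table 15.4)
data = [2, 4, 6, 8, 10]
variance = variance_by_hand(data)
sd = math.sqrt(variance)   # the standard deviation is the square root of the variance

print(variance, round(sd, 2))
```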

 

Virtually everyone in education is already familiar with the normal curve (a picture of one is shown in Figure 15.7 on page 449).

 

If data are normally distributed, then an easy rule to apply to the data is what we call "the 68, 95, 99.7 percent rule." That is . . .
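
To see where those three percentages come from, the short sketch below computes the share of a normal distribution that lies within 1, 2, and 3 standard deviations of the mean. It relies on the standard relationship between the normal curve and the error function (`math.erf`); the helper name `share_within` is made up for this example.

```python
import math

def share_within(k):
    """Proportion of a normal distribution within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD of the mean: {100 * share_within(k):.1f}%")
```

The computed values are slightly more precise than the rounded figures in the name of the rule.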

 

 

Measures of Relative Standing

Measures of relative standing are used to provide information about where a particular score falls in relation to the other scores in a distribution of data. Two commonly used measures of relative standing are percentile ranks and Z-scores. 

 

Figure 15.8 shows these and some additional types of standard scores. You can find the mean of each type of standard score by looking under Mean, and you can find its standard deviation by looking at how much the scores increase as you move from the mean to 1 SD.

 


 

Percentile Ranks

A percentile rank tells you the percentage of scores in a reference group (i.e., in the norming group) that fall below a particular raw score.
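
A minimal sketch of that definition in Python follows. The norming-group scores and the function name `percentile_rank` are hypothetical, and published tests often compute percentile ranks with refinements (for example, for tied scores) that this sketch ignores.

```python
def percentile_rank(raw_score, reference_scores):
    """Percentage of scores in the reference group that fall below raw_score."""
    below = sum(1 for s in reference_scores if s < raw_score)
    return 100 * below / len(reference_scores)

# Hypothetical norming group of ten scores
norm_group = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]
print(percentile_rank(85, norm_group))
```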

Z-Scores

A z-score tells you how many standard deviations (SD) a raw score falls from the mean.

 

To transform a raw score into z-score units, just use the following formula:

 
Z-score = (Raw score - Mean) / Standard Deviation

 

For example, you know that the mean for IQ scores is 100 and the standard deviation for IQ scores is 15 (because we told you this in the book and because you can see it by examining Figure 15.8).

 

Therefore, if your IQ is 115, you can get your z-score...

 

Z-score = (115 - 100) / 15 = 15 / 15 = 1

 

An IQ of 115 falls one standard deviation above the mean.
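
The same calculation can be written as a one-line Python function. The function name `z_score` is simply a label chosen for this sketch; the IQ mean of 100 and standard deviation of 15 come from the text.

```python
def z_score(raw, mean, sd):
    """Number of standard deviations a raw score falls from the mean."""
    return (raw - mean) / sd

print(z_score(115, 100, 15))  # one SD above the mean
print(z_score(85, 100, 15))   # one SD below the mean (negative z-score)
```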

 

Note that once you have a set of z-scores, you can convert to any other scale by using this formula: New score = (Z-score × SD of the new scale) + mean of the new scale.
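
As a sketch of that conversion, the function below turns a z-score into a score on another scale. The IQ scale (mean 100, SD 15) comes from the text; the T-score scale (mean 50, SD 10) is assumed here as a second illustration, and the name `rescale` is invented for the example.

```python
def rescale(z, new_mean, new_sd):
    """New score = z-score times the SD of the new scale, plus its mean."""
    return z * new_sd + new_mean

z = 1.0
print(rescale(z, 100, 15))  # IQ scale
print(rescale(z, 50, 10))   # T-score scale (assumed mean 50, SD 10)
```

Note that a z-score of 1 lands exactly one standard deviation above the mean on whichever scale you convert to.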

 

Examining Relationships Among Variables

 

We have been talking about relationships among variables throughout your textbook. For example, we have already talked about correlation (e.g., see Figure 2.2 on page 44), partial correlation (e.g., see page 341), analysis of variance, which is used for factorial designs (e.g., see pages 286-291), and analysis of covariance (e.g., see pages 274-275 and pages 341-342).

 

At this point in this chapter on descriptive statistics, I introduce two additional techniques that you can also use for examining relationships among variables: contingency tables and regression analysis.

 

Contingency Tables

When all of your variables are categorical, you can use contingency tables to see if your variables are related.
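
To make the idea concrete, here is a small sketch that builds a two-variable contingency table from raw observations and converts the cell counts to row percentages. The groups, outcomes, and helper names are all hypothetical, and calculating percentages within rows is just one possible choice made for this example.

```python
from collections import Counter

# Hypothetical observations: one (group, outcome) pair per person
observations = [
    ("A", "pass"), ("A", "pass"), ("A", "pass"), ("A", "fail"),
    ("B", "pass"), ("B", "fail"), ("B", "fail"), ("B", "fail"),
]

def contingency_table(pairs):
    """Count how often each (row category, column category) cell occurs."""
    return Counter(pairs)

def row_percentages(table):
    """Express each cell count as a percentage of its row total."""
    row_totals = Counter()
    for (row, _col), n in table.items():
        row_totals[row] += n
    return {cell: 100 * n / row_totals[cell[0]] for cell, n in table.items()}

table = contingency_table(observations)
for cell, pct in sorted(row_percentages(table).items()):
    print(cell, f"{pct:.0f}%")
```

In this made-up data the pass rate differs between the two groups, which is the kind of pattern that suggests the two categorical variables are related.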

 

When interpreting a contingency table, remember to use the following two rules:

 

 

Regression Analysis

Regression analysis is a set of statistical procedures used to explain or predict the values of a quantitative dependent variable based on the values of one or more independent variables.

 

On pages 455-459, I show you the components of the regression equations (e.g., the Y-intercept and the regression coefficients). Here are the important definitions:

 

 

 

Ŷ = 9,234.56 + 7,638.85 (X)
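
Once you have an equation like this, obtaining a predicted value is just arithmetic. The sketch below plugs values of X into the equation above; the function name `predict` and the X values tried are chosen only for illustration, and what X and Y stand for is left to the textbook example.

```python
Y_INTERCEPT = 9234.56            # predicted Y when X is zero
REGRESSION_COEFFICIENT = 7638.85  # predicted change in Y per one-unit change in X

def predict(x):
    """Predicted Y for a given X using the simple regression equation."""
    return Y_INTERCEPT + REGRESSION_COEFFICIENT * x

print(round(predict(0), 2))
print(round(predict(3), 2))
print(round(predict(4) - predict(3), 2))  # a one-unit increase in X
```

The last line shows that moving X up by one unit changes the predicted value by the regression coefficient.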

 

 

On pages 458-459, I show a multiple regression equation with two independent variables.