Chapter 16
Inferential Statistics

 

(REMINDER: as you read the lectures, it’s a good idea to also look at the concept map for each chapter. The concept maps help give you the big picture and show how the concepts are related. Here is the link to all of the concept maps; just select the one for this chapter: http://www.southalabama.edu/coe/bset/johnson/dr_johnson/2conceptmaps.htm)

 

This is probably the most challenging chapter in your book. However, you can understand it. It just takes attention and effort. After you carefully study the material, it will become clear to you. I will also be available to answer any questions you have.

 

Please start this chapter by taking a look (again) at the divisions in the field of statistics that were shown in Figure 15.1 (p. 434) and also shown in the previous lecture.

 

Inferential statistics is defined as the branch of statistics that is used to make inferences about the characteristics of populations based on sample data.

 

Looking at Table 16.1 (p. 464 and shown below), you can see that statisticians use Greek letters to symbolize population parameters (i.e., numerical characteristics of populations, such as means and correlations) and English letters to symbolize sample statistics (i.e., numerical characteristics of samples, such as means and correlations).

 

For example, we use the Greek letter mu (i.e., µ) to symbolize the population mean and the Roman/English letter X with a bar over it (X̄, called "X bar") to symbolize the sample mean.

 

Sampling Distributions

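One of the most important concepts in inferential statistics is that of the sampling distribution. That's because the use of sampling distributions is what allows us to make "probability" statements in inferential statistics. A sampling distribution is the theoretical probability distribution of the values of a statistic that results when all possible random samples of a particular size are drawn from a population. For example, the sampling distribution of the mean is the distribution of the means of all possible random samples of a given size.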

Although I just described the sampling distribution of the mean, it is important to remember that a sampling distribution can be obtained for any statistic. For example, you could also obtain the following sampling distributions:

 

The standard deviation of a sampling distribution is called the standard error. In other words, the standard error is just a special kind of standard deviation, and you learned what a standard deviation was in the last chapter.
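Researchers do not build sampling distributions this way, but a short computer simulation can make the idea concrete. The Python sketch below (an illustration, not part of the textbook) draws many random samples of size 25 from a made-up population and records each sample mean. The standard deviation of those means is an empirical standard error, which should come out close to the theoretical value σ/√n. The population, the sample size, and the number of samples are all arbitrary assumptions.

```python
import random
import statistics


def sample_means(population, sample_size, num_samples, seed=0):
    """Draw num_samples random samples (without replacement) and return their means."""
    rng = random.Random(seed)
    return [statistics.mean(rng.sample(population, sample_size))
            for _ in range(num_samples)]


# Hypothetical population of 10,000 scores (mean about 50, SD about 10).
pop_rng = random.Random(42)
population = [pop_rng.gauss(50, 10) for _ in range(10_000)]

n = 25
means = sample_means(population, n, num_samples=5_000)

# The standard deviation of the sampling distribution is the standard error.
empirical_se = statistics.stdev(means)
# Theory: standard error of the mean = population SD / sqrt(n)
# (ignoring the tiny finite-population correction).
theoretical_se = statistics.pstdev(population) / n ** 0.5

print(f"mean of the sample means:   {statistics.mean(means):.2f}")
print(f"population mean:            {statistics.mean(population):.2f}")
print(f"empirical standard error:   {empirical_se:.3f}")
print(f"theoretical standard error: {theoretical_se:.3f}")
```

The mean of the sample means lands very close to the population mean, and the spread of the sample means is much smaller than the spread of the individual scores.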

 

It is important to understand that researchers do not actually empirically construct sampling distributions! When conducting research, researchers typically select only one sample from the population of interest; they do not collect all possible samples.

So please remember that the idea of sampling distributions (i.e., the idea of probability distributions obtained from repeated sampling) underlies our ability to make probability statements in inferential statistics.

 

Now I'm going to cover the two branches of inferential statistics that were shown in Figure 15.1: estimation and hypothesis testing.

 

 

Estimation

The key estimation question is "Based on my random sample, what is my estimate of the population parameter?"

 

There are actually two types of estimation: point estimation and interval estimation.

In other words, a point estimate is a single number, and an interval estimate is a range of numbers.

For example, when you use the value of the sample mean as the estimate of the population mean, you are making a point estimate.

We often put an interval around a point estimate as a reminder that the actual population value is probably somewhat different from the point estimate, because sampling error is always present when we sample.
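Here is a minimal sketch of point and interval estimation for a population mean, assuming a small set of hypothetical starting salaries (not the data in Table 15.1). The sample mean serves as the point estimate, and the interval is the point estimate plus or minus a critical value times the estimated standard error (s/√n). To stay dependency-free, the sketch uses the normal (z) critical value; with a sample this small, statistical software would normally use a t critical value, which gives a somewhat wider interval.

```python
import math
import statistics
from statistics import NormalDist


def estimate_mean(sample, confidence=0.95):
    """Return (point_estimate, lower, upper) for a population mean.

    Uses the normal (z) critical value as a large-sample approximation.
    """
    point_estimate = statistics.mean(sample)
    # Estimated standard error of the mean: s / sqrt(n).
    standard_error = statistics.stdev(sample) / math.sqrt(len(sample))
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    margin = z * standard_error
    return point_estimate, point_estimate - margin, point_estimate + margin


# Hypothetical starting salaries (illustrative only, not Table 15.1).
salaries = [31000, 34500, 29800, 36200, 33100, 30400, 35800, 32700, 31900, 34000]
estimate, lower, upper = estimate_mean(salaries)
print(f"point estimate: {estimate:,.0f}")
print(f"95% interval estimate: {lower:,.0f} to {upper:,.0f}")
```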

 

 

 

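You might ask: So why don’t we just use 99% confidence intervals rather than 95% intervals, since you will make fewer mistakes? The answer is that, for the same data, a 99% interval is wider than a 95% interval, so the extra confidence comes at the cost of precision.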
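The trade-off between confidence and precision shows up directly in the critical values. Assuming a z-based (normal approximation) interval, the width is 2 × critical value × standard error, so for the same data the ratio of the widths is just the ratio of the critical values. A short sketch:

```python
from statistics import NormalDist

z_95 = NormalDist().inv_cdf(0.975)  # critical value for a 95% interval
z_99 = NormalDist().inv_cdf(0.995)  # critical value for a 99% interval

# With the same standard error, interval width = 2 * z * SE, so the
# ratio of widths equals the ratio of critical values.
print(f"95% critical value: {z_95:.3f}")
print(f"99% critical value: {z_99:.3f}")
print(f"99% interval is {z_99 / z_95:.2f} times as wide as the 95% interval")
```

The 99% interval comes out about 31 percent wider, which is the price paid for the extra confidence.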

 

 

Hypothesis Testing

Hypothesis testing is the branch of inferential statistics that is concerned with how well the sample data support a null hypothesis and when the null hypothesis can be rejected in favor of the alternative hypothesis.

 

To get the idea of null hypothesis testing in your head, reread Exhibit 16.1 (p. 473 and shown below).

 

Exhibit 16.1  An Analogy From Jurisprudence

The United States criminal justice system operates on the assumption that the defendant is innocent until proven guilty beyond a reasonable doubt. In hypothesis testing, this assumption is called the null hypothesis. That is, researchers assume that the null hypothesis is true until the evidence suggests that it is not likely to be true. The researcher's null hypothesis might be that a technique of counseling does not work any better than no counseling. The researcher is kind of like a prosecuting attorney. The prosecuting attorney brings someone to trial when he or she believes there is some evidence against the accused, and the researcher brings a null hypothesis to "trial" when he or she believes there is some evidence against the null hypothesis (i.e., the researcher actually believes that the counseling technique does work better than no counseling). In the courtroom, the jury decides what constitutes reasonable doubt, and they make a decision about guilt or innocence. The researcher uses inferential statistics to determine the probability of the evidence under the assumption that the null hypothesis is true. If this probability is low, the researcher is able to reject the null hypothesis and accept the alternative hypothesis. If this probability is not low, the researcher is not able to reject the null hypothesis. No matter what decision is made, things are still not completely settled because a mistake could have been made. In the courtroom, decisions of guilt or innocence are sometimes overturned or found to be incorrect. Similarly, in research, the decision to reject or not reject the null hypothesis is based on probability, so researchers sometimes make a mistake. However, inferential statistics gives researchers the probability of their making a mistake.

 

 

Now take a look at the research questions and the null and alternative hypotheses shown below and in Table 16.2 (p. 474).

 

 

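You may be wondering: when do you actually reject the null hypothesis and make the decision to tentatively accept the alternative hypothesis? The short answer is that you reject the null hypothesis when the probability value (p-value) is less than your preset significance level, which is usually .05.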
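The decision rule used throughout this lecture (reject the null hypothesis when the probability value is less than the preset significance level, usually .05) is simple enough to write as a tiny function. This is only a sketch; the function name and messages are made up for illustration.

```python
def significance_decision(p_value, alpha=0.05):
    """Compare a probability value to a preset significance level.

    Rejects the null hypothesis when p < alpha, as in this lecture.
    (Some texts state the rule as p <= alpha.)
    """
    if not 0 <= p_value <= 1:
        raise ValueError("p_value must be between 0 and 1")
    if p_value < alpha:
        return "reject the null hypothesis (statistically significant)"
    return "fail to reject the null hypothesis (not statistically significant)"


# Probability values reported later in this lecture.
for p in (0.048, 0.001):
    print(p, "->", significance_decision(p))
```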


This full process of hypothesis testing is summarized in Table 16.3 (p. 480), which is shown below in case you don't have your book handy.

 

 

 

Step 5 shows that you must decide what the results of your research study actually mean.

 

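Next, realize that whenever you conduct a hypothesis test, you will either make a correct decision about statistical significance or make an error. There are two types of errors: a Type I error is rejecting a null hypothesis that is actually true, and a Type II error is failing to reject a null hypothesis that is actually false.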
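The significance level is the probability of a Type I error (rejecting a true null hypothesis) that you are willing to accept. A small simulation can illustrate this, assuming a one-sample z test with a known population standard deviation (a simplification chosen so the sketch needs only the Python standard library). Every sample comes from a population in which the null hypothesis is true, so every rejection is a Type I error, and the rejection rate should land near .05.

```python
import random
import statistics
from statistics import NormalDist


def z_test_p_value(sample, mu0, sigma):
    """Two-tailed p-value for a one-sample z test with known sigma."""
    z = (statistics.mean(sample) - mu0) / (sigma / len(sample) ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))


rng = random.Random(1)
alpha, n, trials = 0.05, 30, 10_000
type_i_errors = 0
for _ in range(trials):
    # The null hypothesis (mu = 100) is TRUE here, so every rejection is a Type I error.
    sample = [rng.gauss(100, 15) for _ in range(n)]
    if z_test_p_value(sample, mu0=100, sigma=15) < alpha:
        type_i_errors += 1

type_i_rate = type_i_errors / trials
print(f"Type I error rate: {type_i_rate:.3f}")  # should be close to alpha
```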

 

 
 

 

Hypothesis Testing in Practice

In this last section of the chapter, I apply the process of hypothesis testing (which is also called "significance testing") to the data set given in Table 15.1 (p. 435) and shown again below.

 

 

 

 

(The answers to the earlier questions about the two types of errors: in the first case, a Type I error was made, and in the second case, a Type II error was made.)
 

 

Note that in all of the following examples I will be doing the same thing. I will get the p-value and compare it to my preset significance level of .05 to see whether the relationship is statistically significant. Then I will interpret the results by looking at the data, by looking at an effect size indicator, and by thinking about the practical importance of the result.

 

t-Test for Independent Samples
One frequently used statistical test is the t-test for independent samples. We use it when we want to determine whether the difference between two group means is statistically significant.

 

Here is an example of the t-test for independent samples using our recent college graduate data set:

 

 

 

 


The probability value was .048 (I got this from my SPSS printout).
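For readers who want to see what the software is doing, here is a minimal sketch of the pooled-variance (Student's) t-test for independent samples. It is not the SPSS procedure used above, and the salaries are hypothetical rather than the recent college graduate data. Because Python's standard library has no t distribution, the two-tailed p-value is computed by numerically integrating the t density with Simpson's rule; in practice a statistics package such as SciPy would be used instead. Cohen's d is included as one common effect size indicator.

```python
import math
import statistics


def t_two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value P(|T| >= |t|) for Student's t with df degrees of freedom.

    Integrates the t density from 0 to |t| with Simpson's rule (steps must be even).
    """
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def density(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    h = abs(t) / steps
    total = density(0.0) + density(abs(t))
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    area = total * h / 3  # P(0 < T < |t|)
    return max(0.0, 1.0 - 2.0 * area)


def independent_t_test(group1, group2):
    """Pooled-variance t-test for two independent groups.

    Returns (t, df, two_tailed_p, cohens_d).
    """
    n1, n2 = len(group1), len(group2)
    m1, m2 = statistics.mean(group1), statistics.mean(group2)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * statistics.variance(group1)
                  + (n2 - 1) * statistics.variance(group2)) / df
    t = (m1 - m2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    cohens_d = (m1 - m2) / math.sqrt(pooled_var)  # standardized mean difference
    return t, df, t_two_tailed_p(t, df), cohens_d


# Hypothetical starting salaries for two groups (NOT the textbook data set).
group_a = [33500, 35200, 31800, 36900, 34400, 32600]
group_b = [30100, 32800, 29500, 31700, 33000, 30600]
t, df, p, d = independent_t_test(group_a, group_b)
print(f"t({df}) = {t:.2f}, p = {p:.3f}, Cohen's d = {d:.2f}")
```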

 

One-Way Analysis of Variance
One-way analysis of variance is used to compare two or more group means for statistical significance.

 

Here is an example using our “recent college graduate” data set:

 

 

 

 

The probability value was .001 (I got this from my SPSS printout).
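The one-way ANOVA F statistic compares the variability between the group means to the variability within the groups. The sketch below computes F and eta squared (the proportion of variance in the outcome accounted for by group membership, one common ANOVA effect size) for three groups; the salary values are invented for illustration. It stops at the F statistic: the probability value comes from the F distribution with (df_between, df_within) degrees of freedom, which statistical software such as SPSS reports.

```python
import statistics


def one_way_anova(*groups):
    """Return (F, df_between, df_within, eta_squared) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_values)
    group_means = [statistics.mean(g) for g in groups]
    # Between-groups sum of squares: how far each group mean is from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-groups sum of squares: spread of scores around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    eta_squared = ss_between / (ss_between + ss_within)  # variance accounted for
    return f_stat, df_between, df_within, eta_squared


# Hypothetical starting salaries by college (illustrative values only).
education = [28000, 29500, 27800, 30100, 28600]
arts_and_sciences = [30500, 31900, 29800, 32400, 31000]
business = [34800, 36200, 33900, 35500, 37000]
f_stat, df_b, df_w, eta_sq = one_way_anova(education, arts_and_sciences, business)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}, eta squared = {eta_sq:.2f}")
```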

 

Post Hoc Tests in Analysis of Variance
Here are the three average starting salaries for the three groups examined in the previous analysis of variance (i.e., these are the three sample means):

 

The question in post hoc testing is "Which pairs of means are significantly different?"

 

In this case, that results in three post hoc tests that need to be conducted:

  1. Is the difference between education and arts and sciences statistically significant?

  2. Is the difference between education and business statistically significant?

  3. Is the difference between arts and sciences and business statistically significant?
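The comparisons above are simply every pair that can be formed from the three groups, so with k groups there are k(k - 1)/2 post hoc tests. The lecture does not name the post hoc procedure it used; as one common option, the sketch below also shows a Bonferroni adjustment, which divides the significance level by the number of comparisons to keep the overall chance of a Type I error at or below .05. Tukey's HSD is another widely used choice.

```python
from itertools import combinations

groups = ["education", "arts and sciences", "business"]
pairs = list(combinations(groups, 2))  # every pair of groups, in listed order
for first, second in pairs:
    print(f"Is the difference between {first} and {second} statistically significant?")

# Bonferroni adjustment (one possible approach): divide alpha by the number of tests.
alpha = 0.05
bonferroni_alpha = alpha / len(pairs)
print(f"{len(pairs)} post hoc tests; Bonferroni-adjusted alpha = {bonferroni_alpha:.4f}")
```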


In short, based on my post hoc tests, I have found that two of the differences in starting salary were statistically significant, and, in my view, these differences were also practically significant. 

 

The t-Test for Correlation Coefficients
This test is used to determine whether an observed correlation coefficient is statistically significant.

 

Here is an example using our “recent college graduate” data set:

 

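·        Null hypothesis: ρ = 0 (i.e., there is no correlation between GPA and starting salary in the population)

·        Alternative hypothesis: ρ ≠ 0 (i.e., there is a correlation between GPA and starting salary in the population)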

·        The observed correlation in the sample was .63.

·        The probability value was .001.

·        Since .001 is < .05, I reject the null hypothesis.

·        The observed correlation was statistically significant.

·        I conclude that GPA and starting salary are correlated in the population.

·        If you square the correlation coefficient, you obtain a “variance accounted for” effect size indicator: .63 squared is .397, which means that almost 40 percent of the variance in starting salary is explained or accounted for by GPA.

·        Because the effect size is large and because GPA is something that students can control through studying, I conclude that this statistically significant correlation is also practically significant.
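The arithmetic in the bullets above can be checked in a few lines. The test statistic for a correlation coefficient is t = r√(n − 2)/√(1 − r²), with n − 2 degrees of freedom. The sample size used below is a hypothetical placeholder, since it is not restated in this part of the lecture, so the printed t value is for illustration only.

```python
import math


def correlation_t(r, n):
    """t statistic for testing H0: rho = 0; it has n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


r = 0.63  # observed sample correlation between GPA and starting salary
r_squared = r ** 2  # "variance accounted for" effect size
print(f"r squared = {r_squared:.3f}")

n = 25  # HYPOTHETICAL sample size, for illustration only
print(f"t({n - 2}) = {correlation_t(r, n):.2f}")
```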

 

 

The t-Test for Regression Coefficients
This test is used to determine whether a regression coefficient is statistically significant.

 

The multiple regression equation analyzed in the last chapter is shown here again, but this time we will test each of the two regression coefficients for statistical significance.

 

Ŷ = 3,890.05 + 4,675.41(X1) + 26.13(X2)

where

Ŷ is predicted starting salary

3,890.05 is the Y intercept (or predicted starting salary when GPA and GRE Verbal are zero)

4,675.41 is the regression coefficient for grade point average

26.13 is the regression coefficient for GRE Verbal

X1 is grade point average (GPA)

X2 is GRE Verbal
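The prediction equation above can be written directly as a function. For the significance tests in the research questions, each coefficient's t statistic is the coefficient divided by its standard error; those standard errors come from the software output and are not reproduced in this lecture, so the helper takes them as an input rather than guessing them. The graduate's GPA and GRE Verbal score in the usage line are hypothetical.

```python
def predicted_salary(gpa, gre_verbal):
    """Y-hat from the lecture's multiple regression equation."""
    return 3890.05 + 4675.41 * gpa + 26.13 * gre_verbal


def coefficient_t(b, standard_error):
    """t statistic for testing whether a regression coefficient differs from zero."""
    return b / standard_error


# Hypothetical graduate: GPA of 3.5 and GRE Verbal score of 500.
print(f"predicted starting salary: {predicted_salary(3.5, 500):,.2f}")
```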

 

Research Question One: Is there a statistically significant relationship between starting salary (Y) and GPA (X1), controlling for GRE Verbal (X2)? That is, is the first regression coefficient statistically significant?

 

 

 

 

Research Question Two: Is there a statistically significant relationship between starting salary (Y) and GRE Verbal (X2), controlling for GPA (X1)? That is, is the second regression coefficient statistically significant?

 

 

 


The Chi-Square Test for Contingency Tables
This test is used to determine whether a relationship observed in a contingency table is statistically significant.
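Here is a minimal sketch of the Pearson chi-square test for a 2 × 2 contingency table, using hypothetical counts. Each expected count is (row total × column total) / grand total, and the statistic sums (observed − expected)² / expected over the cells. With 1 degree of freedom, the p-value can be computed exactly from the standard library's complementary error function; larger tables need a general chi-square distribution function, such as the one in a statistics package. No continuity correction is applied.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test (no continuity correction) for a 2 x 2 table.

    Returns (chi_square, p_value). With 1 degree of freedom, the upper-tail
    probability equals erfc(sqrt(chi_square / 2)).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi_square = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi_square += (observed - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value


# Hypothetical counts (rows: two groups; columns: two outcome categories).
observed = [[10, 20],
            [20, 10]]
chi2, p = chi_square_2x2(observed)
print(f"chi-square = {chi2:.3f}, p = {p:.4f}")
```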


Believe it or not, we are done. My goal in this last section was to show that every single time we do one of these tests, we do the same thing: we get the probability value, compare it to the significance level, and, finally, make a decision.

 

You have now come a long way toward understanding the logic of significance testing. Remember, when reading journal articles, look out for those probability values (to see if they are less than .05), and also look for effect sizes and statements about whether a finding is practically significant.

 

Congratulations!