Chapter 16
Inferential Statistics

(REMINDER: as you read the lectures, it’s a good idea to also look at the concept map for each  chapter. The concept maps help to give you the big picture and see how the concepts are related. Here is the link to all of the concept maps; just select the one for this chapter: http://www.southalabama.edu/coe/bset/johnson/dr_johnson/2conceptmaps.htm)

This is probably the most challenging chapter in your book. However, you can understand it. It just takes attention and effort. After you carefully study the material, it will become clear to you. I will also be available to answer any questions you have.

Please start this chapter by taking a look (again) at the divisions in the field of statistics that were shown in Figure 15.1 (p. 434) and also shown in the previous lecture.

• This shows the "big picture."
• As you can see, inferential statistics is divided into estimation and hypothesis testing, and estimation is further divided into point and interval estimation.

Inferential statistics is defined as the branch of statistics that is used to make inferences about the characteristics of populations based on sample data.

• The goal is to go beyond the data at hand and make inferences about population parameters.
• In order to use inferential statistics, it is assumed that either random selection or random assignment was carried out (i.e., some form of randomization is assumed).

Looking at Table 16.1 (p.464 and shown below) you can see that statisticians use Greek letters to symbolize population parameters (i.e., numerical characteristics of populations, such as means and correlations) and English letters to symbolize sample statistics (i.e., numerical characteristics of samples, such as means and correlations).

For example, we use the Greek letter mu (i.e., µ) to symbolize the population mean and the Roman/English letter X with a bar over it, X̄ (called "X bar"), to symbolize the sample mean.

Sampling Distributions

One of the most important concepts in inferential statistics is that of the sampling distribution. That's because the use of sampling distributions is what allows us to make "probability" statements in inferential statistics.

• A sampling distribution is defined as "The theoretical probability distribution of the values of a statistic that results when all possible random samples of a particular size are drawn from a population." (For simplicity you can view the idea of "all possible samples" as taking a million random samples. That is, just view it as taking a whole lot of samples!)
• One specific type of sampling distribution is called the sampling distribution of the mean. If you wanted to generate this distribution through the laborious process of doing it by hand (which you would NOT need to do in practice), you would randomly select a sample, calculate the mean, randomly select another sample, calculate the mean, and continue this process until you have calculated the means for all possible samples. This process will give you a lot of means, and you can construct a line graph to depict your sampling distribution of the mean (e.g., see Figure 16.1 on page 468).
• The sampling distribution of the mean is normally distributed (as long as your sample size is about 30 or more).
• Also, note that the mean of the sampling distribution of the mean is equal to the population mean! That tells you that repeated sampling will, over the long run, produce the correct mean. The spread or variance shows you that any particular sample mean will tend to be somewhat different from the true population mean.

Although I just described the sampling distribution of the mean, it is important to remember that a sampling distribution can be obtained for any statistic. For example, you could also obtain the following sampling distributions:

• Sampling distribution of the percentage (or proportion).
• Sampling distribution of the variance.
• Sampling distribution of the correlation.
• Sampling distribution of the regression coefficient.
• Sampling distribution of the difference between two means.

The standard deviation of a sampling distribution is called the standard error. In other words, the standard error is just a special kind of standard deviation and you learned what a standard deviation was in the last chapter.

• The smaller the standard error, the less the amount of variability present in a sampling distribution.
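The "repeated sampling" idea described above is easy to simulate. Here is a minimal sketch using a made-up population of incomes (these numbers are my own illustration, not the book's data set): we draw many random samples, record each sample mean, and then check that the mean of all those sample means sits near the population mean, and that their standard deviation (the standard error) is near σ/√n.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical population of 100,000 incomes (illustration only)
population = rng.normal(loc=45_000, scale=10_000, size=100_000)

# Draw many random samples of size n and record each sample mean
n = 30
sample_means = [rng.choice(population, size=n).mean() for _ in range(10_000)]

# The mean of the sampling distribution lands close to the population mean,
# and its standard deviation (the standard error) lands close to sigma/sqrt(n)
mean_of_means = np.mean(sample_means)
empirical_se = np.std(sample_means)
theoretical_se = population.std() / np.sqrt(n)
```

Notice that the simulated standard error shrinks if you increase `n`, which matches the bullet above: less variability in the sampling distribution.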

It is important to understand that researchers do not actually empirically construct sampling distributions! When conducting research, researchers typically select only one sample from the population of interest; they do not collect all possible samples.

• The computer program that a researcher uses (e.g., SPSS or SAS) applies the appropriate sampling distribution for you.
•  The computer program will look at the type of statistical analysis you select (and also consider certain additional information that you have provided, such as the sample size in your study), and then the statistical program selects the appropriate sampling distribution.
• (It's kind of like the Greyhound Bus analogy: Leave the driving to us...SPSS will take care of generating the appropriate sampling distribution for you if you give it the information it needs.)

So please remember that the idea of sampling distributions (i.e., the idea of probability distributions obtained from repeated sampling) underlies our ability to make probability statements in inferential statistics.

Now, I'm going to cover the two branches of inferential statistics that were shown in Figure 15.1: estimation and hypothesis testing.

Estimation

The key estimation question is "Based on my random sample, what is my estimate of the population parameter?"

• The basic idea is that you are going to use your sample data to provide information about the population.

There are actually two types of estimation.

• They can be first understood through the following analogy: Let's say that you take your car to your local car dealer's service department and you ask the service manager how much it will cost to repair your car. If the manager says it will cost you \$500 then she is providing a point estimate. If the manager says it will cost somewhere between \$400 and \$600 then she is providing an interval estimate.

In other words, a point estimate is a single number, and an interval estimate is a range of numbers.

• A point estimate is the value of your sample statistic (e.g., your sample mean or sample correlation), and it is used to estimate the population parameter (e.g., the population mean or the population correlation).
• For example, if you take a random sample from adults living in the United States and you find that the average income for the people in your sample is \$45,000, then your best guess or your point estimate for the population of adults in the U.S. will be \$45,000.

In the above example, you used the value of the sample mean as the estimate of the population mean.

• Again, whenever you engage in point estimation, all you need to do is to use the value of your sample statistic as your "best guess" (i.e., as your estimate) of the (unknown) population parameter.

Oftentimes, we like to put an interval around our point estimates because sampling error is always present in sampling; the actual population value is likely to be somewhat different from our point estimate.

• An interval estimate (also called a confidence interval) is a range of numbers inferred from the sample that has a known probability of capturing the population parameter over the long run (i.e., over repeated sampling).
• See Figure 16.2 (p. 471) for a picture of twenty different confidence intervals randomly jumping around the population mean from sample to sample. Here it is for your convenience:

• The "beauty" of confidence intervals is that we know their probability (over the long run) of including the true population parameter. (You can't do this with a point estimate.)
• Specifically, if you have the computer provide you with a 95 percent confidence interval (based on your data), then you will be able to be "95% confident" that it will include the population parameter. That is, your “level of confidence” is 95%.
• For example, you might take the point estimate of annual income of U.S. adults of \$45,000 (used earlier as a point estimate) and surround it by a 95% confidence interval. You might find that the confidence interval is \$43,000 to \$47,000. In this case, you can be "95% confident" that the average income is somewhere between \$43,000 and \$47,000.
• If you have the computer program give you a 99% confidence interval, then you can be "99% confident" that the confidence interval provided will include the population parameter (i.e., it will capture the true parameter 99% of the time in the long run).

You might ask: So why don’t we just use 99% confidence intervals rather than 95% intervals, since you will make fewer mistakes?

• The answer is that for a given sample size, the 99% confidence interval will be wider (i.e., less precise) than a 95% confidence interval. For example, the interval \$40,000 to \$50,000 is wider than the interval \$43,000 to \$47,000.
• 95% confidence intervals are popular with many researchers. However, you may, at times, want to use other confidence intervals (e.g.,  90% confidence intervals or 99% confidence intervals).
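Here is a small sketch of interval estimation with made-up data (a hypothetical sample of 25 incomes, not the book's data set). It builds a 95% and a 99% confidence interval for a mean using the t distribution, and shows concretely that the 99% interval is wider:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical sample of 25 incomes (illustration only)
incomes = rng.normal(45_000, 8_000, size=25)

n = len(incomes)
mean = incomes.mean()
se = incomes.std(ddof=1) / np.sqrt(n)          # estimated standard error

# 95% CI uses the t distribution with n - 1 degrees of freedom
t_95 = stats.t.ppf(0.975, df=n - 1)
ci_95 = (mean - t_95 * se, mean + t_95 * se)

# A 99% interval uses a larger critical value, so it is wider (less precise)
t_99 = stats.t.ppf(0.995, df=n - 1)
ci_99 = (mean - t_99 * se, mean + t_99 * se)
```

In practice SPSS (or any statistical package) reports these intervals for you; the point of the sketch is just that the 99% interval always brackets the 95% interval for the same data.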

Hypothesis Testing

Hypothesis testing is the branch of inferential statistics that is concerned with how well the sample data support a null hypothesis and when the null hypothesis can be rejected in favor of the alternative hypothesis.

• First note that the null hypothesis is usually the prediction that there is no relationship in the population.
• The alternative hypothesis is the logical opposite of the null hypothesis and says there is a relationship in the population.
• We use hypothesis testing when we expect a relationship to be present; in other words, we usually hope to “nullify” the null hypothesis and tentatively accept the alternative hypothesis. (Note: if you expect the null to be true, you can use the estimation approach described in this chapter; several additional procedures for this special case are discussed in Shadish, Cook, and Campbell’s book Experimental and Quasi-Experimental Designs, 2002, pp. 52-53)
• Here is the key question that is answered in hypothesis testing: "Is the value of my sample statistic unlikely enough (assuming that the null hypothesis is true) for me to reject the null hypothesis and tentatively accept the alternative hypothesis?"
• Note that it is the null hypothesis that is directly tested in hypothesis testing (not the alternative hypothesis).

To get the idea of null hypothesis testing in your head, reread Exhibit 16.1 (p. 473 and shown below).

Exhibit 16.1  An Analogy From Jurisprudence

The United States criminal justice system operates on the assumption that the defendant is innocent until proven guilty beyond a reasonable doubt. In hypothesis testing, this assumption is called the null hypothesis. That is, researchers assume that the null hypothesis is true until the evidence suggests that it is not likely to be true. The researcher's null hypothesis might be that a technique of counseling does not work any better than no counseling. The researcher is kind of like a prosecuting attorney. The prosecuting attorney brings someone to trial when he or she believes there is some evidence against the accused, and the researcher brings a null hypothesis to "trial" when he or she believes there is some evidence against the null hypothesis (i.e., the researcher actually believes that the counseling technique does work better than no counseling). In the courtroom, the jury decides what constitutes reasonable doubt, and they make a decision about guilt or innocence. The researcher uses inferential statistics to determine the probability of the evidence under the assumption that the null hypothesis is true. If this probability is low, the researcher is able to reject the null hypothesis and accept the alternative hypothesis. If this probability is not low, the researcher is not able to reject the null hypothesis. No matter what decision is made, things are still not completely settled because a mistake could have been made. In the courtroom, decisions of guilt or innocence are sometimes overturned or found to be incorrect. Similarly, in research, the decision to reject or not reject the null hypothesis is based on probability, so researchers sometimes make a mistake. However, inferential statistics gives researchers the probability of their making a mistake.

• Here is the main point: In the United States System of Jurisprudence, a defendant is "presumed innocent" until evidence calls this assumption into question. That is, the jury is told to assume that a person is innocent until they have heard all of the evidence and can make a decision. Likewise, in hypothesis testing, the null hypothesis is assumed to be true (i.e., it is assumed that there is no relationship) until evidence clearly calls this assumption into question.
• In jurisprudence, the jury rejects the claim of innocence (rejects the null) in the face of strong evidence to the contrary and makes the opposite conclusion that the defendant is guilty. Likewise, in hypothesis testing, the researcher rejects the null hypothesis in the face of strong evidence to the contrary.
• In hypothesis testing, "strong evidence to the contrary" is found in a small probability value, which says the research result is unlikely if the null hypothesis is true. When the researcher rejects the null hypothesis (i.e., rejects the assumption of no relationship), he or she tentatively accepts the alternative hypothesis (i.e., which says there is a relationship in the population).
• In short  .  .  .   in the procedure called hypothesis testing the researcher states the null and alternative hypotheses. Then if the probability value is small, the researcher rejects the null hypothesis and goes with the alternative hypothesis and makes the claim that statistical significance has been found.

Now take a look at the research questions and the null and alternative hypotheses shown below and in Table 16.2 (p.474).

• When you look at the table be sure to notice that the null hypothesis has the equality sign in it and the alternative hypothesis has the "not equals" sign in it.
• You can also see in the table that hypotheses can be tested for many different kinds of research questions such as questions about means, correlations, and regression coefficients.

You may be wondering, when do you actually reject the null hypothesis and make the decision to tentatively accept the alternative hypothesis?

• Earlier I mentioned that you reject the null hypothesis when the probability of your result assuming a true null is very small. That is, you reject the null when the evidence would be unlikely under the assumption of the null.
• In particular, you set a significance level (also called the alpha level) to use in your research study, which is the point at which you would consider a result to be very unlikely. Then, if your probability value is less than or equal to your significance level, you reject the null hypothesis.
• It is essential that you understand the difference between the probability value (also called the p-value) and the significance level (also called the alpha level).
• The probability value is a number that is obtained from the SPSS computer printout. It is based on your empirical data, and it tells you the probability of your result or a more extreme result when it is assumed that there is no relationship in the population (i.e., when you are assuming that the null hypothesis is true, which is what we do in hypothesis testing and in jurisprudence).
• The significance level is just that point at which you would consider a result to be "rare." You are the one who decides on the significance level to use in your research study. A significance level is not an empirical result; it is the level that you set so that you will know what probability value will be small enough for you to reject the null hypothesis.
• The significance level that is usually used in education is .05.
• It boils down to this: if your probability value is less than or equal to the significance level (e.g., .05) then you will reject the null hypothesis and tentatively accept the alternative hypothesis. If not (i.e., if it is > .05) then you will fail to reject the null. You just compare your probability value with your significance level.
• You must memorize the definitions of probability value and significance level right away because they are at the heart of hypothesis testing. At the most simple level, the process just boils down to seeing whether your probability value is less than (or equal to) your significance level. If it is, you are happy because you can reject the null hypothesis and make the claim of statistical significance. (Still don’t forget the last step of determining practical significance.)
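The decision rule in the bullets above is simple enough to write as a couple of lines of code (the function name is mine, just for illustration):

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Compare a p-value from the printout to the preset significance level."""
    if p_value <= alpha:
        return "reject the null (statistically significant)"
    return "fail to reject the null"

print(decide(0.03))   # reject the null (statistically significant)
print(decide(0.20))   # fail to reject the null
```

Every significance test in the rest of this chapter is, at bottom, this one comparison.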

This full process of hypothesis testing is summarized in Table 16.3 (p.480) and shown below.

• Be sure to note the final step shown in the table, because after conducting a hypothesis test, you must interpret your results, make a substantive, real-world decision, and determine the practical significance of your result.

Here is Table 16.3, in case you don't have your book handy.

Step 5 shows that you must decide what the results of your research study actually mean.

• Statistical significance does not tell you whether you have practical significance. At the end of step four you will know whether your result is statistically significant.
• If a finding is statistically significant then you can claim that the evidence suggests that the observed result (e.g., your observed correlation or your observed difference between two means) was probably not just due to chance. That is, there probably is some non-zero relation present in the population.
• An effect size indicator can aid in your determination of practical significance and should always be examined to help interpret the strength of a statistically significant relationship. An effect size indicator is defined as a measure of the strength of a relationship.
• A finding is practically significant when the difference between the means or the size of the correlation is big enough, in your opinion, to be of practical use. For example, a correlation of .15 would probably not be practically significant, even if it was statistically significant. On the other hand, a correlation of .85 would probably be practically significant.
• Practical significance requires you to make a non-quantitative decision and to think about many different factors such as the size of the relationship, whether an intervention would transfer well to the real world, the costs of using a statistically significant intervention in the real world, etc. It is a decision that YOU make.

The next idea is for you to realize that you will either make a correct decision about statistical significance or you will make an error whenever you conduct a hypothesis test.

• This idea is shown in Table 16.5 (p. 482) and below for your convenience.

• Looking at the top of the table (i.e., above the two columns) you will see that the null hypothesis is either true or not true in the empirical world.
• If you look at the side of the table (i.e., beside the two rows) you will see that you must make a decision to either fail to reject or to reject the null hypothesis.
• When the null is false you want to reject it, but when it is true you do not want to reject it.
• The four logical possibilities of hypothesis testing are shown in the table.
• When the null hypothesis is true you can make the correct decision (i.e., fail to reject the null) or you can make the incorrect decision (rejecting the true null). The incorrect decision is called a Type I error or a "false positive" because you have erroneously concluded that there is an effect or relationship in the population.
• When the null hypothesis is false you can also make the correct decision (i.e., rejecting the false null) or you can make the incorrect decision (failure to reject the false null). The incorrect decision is called a Type II error or a "false negative" because you have erroneously concluded that there is no effect or relationship in the population.
• You need to memorize the definitions of Type I and Type II errors, and after working with many examples of hypothesis testing they will become easier to ponder.
• Exercise: In law, a person is presumed to be innocent (i.e., that is the null hypothesis). Explain the idea of Type I and Type II errors here. Which error has occurred when an innocent person is found guilty?  Which error has occurred when a guilty person is found innocent by the jury?  (The answers are below.)
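One more way to build intuition about Type I errors: if the null hypothesis really is true, the significance level (alpha) is the long-run rate at which we will falsely reject it. A quick simulation (all numbers are made up for illustration) draws both groups from the same population, so the null is true by construction, and counts how often a t-test still comes out "significant":

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
alpha = 0.05
trials = 2_000
false_positives = 0

# Both groups come from the SAME population, so the null is true by construction
for _ in range(trials):
    a = rng.normal(50, 10, size=30)
    b = rng.normal(50, 10, size=30)
    _, p = stats.ttest_ind(a, b)
    if p <= alpha:
        false_positives += 1   # a Type I error: rejecting a true null

type_one_rate = false_positives / trials   # should hover around alpha (.05)
```

This is why alpha is also called the Type I error rate: over many studies of true nulls, roughly 5% of .05-level tests will produce a "false positive."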

Hypothesis Testing in Practice

In this last section of the chapter, I apply the process of hypothesis testing (which is also called "significance testing") to the data set given in Table 15.1 (p. 435) and shown again here (below).

• Since we are now using this data set for inferential statistics, we will assume that the 25 people were randomly selected.
• Note that there are three quantitative variables and two categorical variables (can you list them?).
• Also note that I will use the significance level of .05 for all of my statistical tests below.

(The answers to the earlier exercise about the two types of errors: when an innocent person is found guilty, a Type I error has been made; when a guilty person is found innocent, a Type II error has been made.)

• Before I test some hypotheses, I want to point out the reason WHY we use hypothesis or significance testing: We do it because researchers do not want to interpret findings that are not statistically significant because these findings are probably nothing but a reflection of chance fluctuations.

Note that in all of the following examples I will be doing the same thing. I will get the p-value and compare it to my preset significance level of .05 to see if the relationship is statistically significant. And then I will also interpret the results by looking at the data, looking at an effect size indicator, and by thinking about the practical importance of the result.

• Again, after practice, significance testing becomes very easy because you do the same procedure every single time. Determining the practical significance is probably the hardest part.

t-Test for Independent Samples
One frequently used statistical test is called the t-test for independent samples. We do this when we want to determine if the difference between two groups is statistically significant.

Here is an example of the t-test for independent samples using our recent college graduate data set:

• Research Question: Is the average starting salary for males significantly different from the average starting salary for females?

• Here are the hypotheses (note that they are stated in terms of population parameters):

• Null Hypothesis                        Ho: µM = µF  (i.e., the population mean for males equals the population mean for females)

• Alternative Hypothesis              H1: µM ≠ µF (i.e., the population mean for males does not equal the population mean for females)

The probability value was .048 (I got this off of my SPSS printout).

• Since my probability value of .048 is less than my significance level of .05, I reject the null hypothesis and accept the alternative.
• I conclude that the difference between the two means is statistically significant.
• Now I would need to look at the actual means and interpret them for substantive and practical significance.
• The males’ mean is \$34,333.33 and the females’ mean is \$31,076.92.
• I can simply look at these means and see how different they are.
• To help in judging how different the means are, I also calculated an effect size indicator called eta-squared which was equal to .16. This tells me that gender explains 16% of the variance in starting salary in my data set.
• I conclude that males earn more than females, and because this is an important issue in society, I also conclude that this difference is practically significant.
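The same analysis can be sketched outside of SPSS. The salary numbers below are hypothetical stand-ins (the book's actual 25 cases are not reproduced here); the point is the procedure: run the t-test, get the p-value, and compute eta-squared from the t statistic as t²/(t² + df):

```python
import numpy as np
from scipy import stats

# Hypothetical starting salaries in dollars (NOT the book's data set)
males = np.array([34, 36, 31, 38, 33, 35, 37, 30, 36, 34, 35, 33], dtype=float) * 1000
females = np.array([31, 29, 33, 30, 32, 28, 31, 33, 30, 29, 32, 31, 30], dtype=float) * 1000

# Independent-samples t-test
t_stat, p_value = stats.ttest_ind(males, females)

# Eta-squared from the t statistic: t^2 / (t^2 + df)
df = len(males) + len(females) - 2
eta_squared = t_stat**2 / (t_stat**2 + df)

significant = p_value <= 0.05   # compare p-value to the significance level
```

With these made-up numbers the gap between the group means is large relative to the spread, so the test comes out significant; interpreting the means and the effect size for practical significance is still up to you.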

One-Way Analysis of Variance
One-way analysis of variance is used to compare two or more group means for statistical significance.

Here is an example using our “recent college graduate” data set:

• Research Question: Is there a statistically significant difference in the starting salaries of education majors, arts and sciences majors, and business majors?

• Here are the hypotheses (note that they are stated in terms of population parameters):

• Null Hypothesis.                       Ho: µE = µA&S = µB  (i.e., the population means for education students, arts and sciences students, and business students are all the same)

• Alternative Hypothesis.             H1: Not all equal (i.e., the population means are not all the same)

The probability value was .001 (I got this off of my SPSS printout).

• Since .001 is less than .05, I reject the null hypothesis and accept the alternative. I conclude that at least two of the means are significantly different.
• The effect size indicator, eta-squared, was equal to .467, which says that almost 47 percent of the variance in starting salary was explained or accounted for by differences in college major.
• Now I need to find out which of the three means are different.
• In order to decide which of these three means are significantly different, I must follow the "post hoc testing" procedure explained in the next section. Notice that if I had done an ANOVA with an independent variable composed of only two groups, I would not need follow-up tests (which are only needed when there are three or more groups).
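Here is the overall ANOVA step as a sketch, again with made-up salaries for three majors (not the book's data). It also computes eta-squared directly from the sums of squares (between-groups SS divided by total SS):

```python
import numpy as np
from scipy import stats

# Hypothetical starting salaries by major, in dollars (illustration only)
education = np.array([28, 30, 29, 31, 29.5, 28.5, 30.5, 29.5]) * 1000
arts_sci  = np.array([31, 33, 32, 34, 31.5, 32.5, 33.5, 30.5]) * 1000
business  = np.array([35, 37, 36, 38, 34.5, 36.5, 35.5, 37.5]) * 1000

# One-way analysis of variance across the three groups
f_stat, p_value = stats.f_oneway(education, arts_sci, business)

# Eta-squared = between-groups SS / total SS
all_vals = np.concatenate([education, arts_sci, business])
grand = all_vals.mean()
ss_between = sum(len(g) * (g.mean() - grand) ** 2
                 for g in (education, arts_sci, business))
ss_total = ((all_vals - grand) ** 2).sum()
eta_squared = ss_between / ss_total
```

A significant F tells you only that at least two group means differ; it does not say which pairs, which is why the post hoc tests in the next section are needed.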

Post Hoc Tests in Analysis of Variance
Here are the three average starting salaries for the three groups examined in the previous analysis of variance (i.e., these are the three sample means):

• Education: \$29,500
• Arts and Sciences: \$32,300

The question in post hoc testing is "Which pairs of means are significantly different?"

In this case that results in three post hoc tests that need to be conducted:

1. First, is the difference between education and arts and sciences statistically significant?
• Here are the null and alternative hypotheses for this first post hoc test:
• Null Hypothesis                        Ho: µE = µA&S  (i.e., the population mean for education majors equals the population mean for arts and sciences majors)

• Alternative Hypothesis              H1: µE ≠ µA&S (i.e., the population mean for education majors does not equal the population mean for arts and sciences majors)
• The Bonferroni "adjusted" p-value, which I got off the SPSS printout, was .233.
• Since .233 is  > .05, I fail to reject the null that the population means for education and arts and sciences are equal.
• In short, this difference was not statistically significant.

2. Second, is the difference between education and business statistically significant?
• Here are the null and alternative hypotheses for this second post hoc test:
• Null Hypothesis                        Ho: µE = µB  (i.e., the population mean for education majors equals the population mean for business majors)
• Alternative Hypothesis              H1: µE ≠ µB (i.e., the population mean for education majors does not equal the population mean for business majors)
• The adjusted p-value was .001.
• Since .001 is < .05, I reject the null that the two population means are equal.
• I make the claim that the difference between the means is statistically significant.
• I also claim that the salaries are higher for business than for education students in the populations from which they were randomly selected.
• Because this finding could affect many students’ choices about majors and because it may also reflect the nature of salary setting by the private versus public sectors, I also conclude that this difference is practically significant.

3. Third, is the difference between arts and sciences and business statistically significant?
• Here are the null and alternative hypotheses for this third post hoc test:
• Null Hypothesis                        Ho: µB = µA&S  (i.e., the population mean for business majors equals the population mean for arts and sciences majors)
• Alternative Hypothesis              H1: µB ≠ µA&S (i.e., the population mean for business majors does not equal the population mean for arts and sciences majors)
• The adjusted p-value was .031.
• Since .031 is < .05, I reject the null hypothesis that the two population means are equal.
• I make the claim that this difference between the means is statistically significant.
• I also claim that the salaries are higher for business than for arts and sciences students in the populations from which they were randomly selected.
• Because this finding could affect students’ choices about majoring in business versus arts and sciences, I believe that this finding is practically significant.

In short, based on my post hoc tests, I have found that two of the differences in starting salary were statistically significant, and, in my view, these differences were also practically significant.
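One common way a program carries out these post hoc comparisons is to run each pairwise t-test and apply a Bonferroni adjustment (multiply each p-value by the number of comparisons, capped at 1). A sketch with the same hypothetical groups as before (not the book's data, and SPSS's exact Bonferroni computation may differ in detail):

```python
from itertools import combinations
import numpy as np
from scipy import stats

# Hypothetical starting salaries by major, in dollars (illustration only)
groups = {
    "education": np.array([28, 30, 29, 31, 29.5, 28.5, 30.5, 29.5]) * 1000,
    "arts_sci":  np.array([31, 33, 32, 34, 31.5, 32.5, 33.5, 30.5]) * 1000,
    "business":  np.array([35, 37, 36, 38, 34.5, 36.5, 35.5, 37.5]) * 1000,
}

n_tests = 3  # three pairwise comparisons among three groups
adjusted_p = {}
for (name1, g1), (name2, g2) in combinations(groups.items(), 2):
    _, p = stats.ttest_ind(g1, g2)
    adjusted_p[(name1, name2)] = min(p * n_tests, 1.0)  # Bonferroni adjustment
```

Each adjusted p-value is then compared to .05 exactly as before; the adjustment just keeps the overall Type I error rate from inflating when you run several tests.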

The t-Test for Correlation Coefficients
This test is used to determine whether an observed correlation coefficient is statistically significant.

Here is an example using our “recent college graduate” data set:

• Research Question: Is there a statistically significant correlation between GPA (X) and starting salary (Y)?
• Here are the hypotheses:

• Null Hypothesis.                       H0: ρXY = 0  (i.e., there is no correlation in the population)

• Alternative Hypothesis.             H1: ρXY ≠ 0  (i.e., there is a correlation in the population)

• The observed correlation in the sample was .63.
• The probability value was .001.
• Since .001 is < .05, I reject the null hypothesis.
• The observed correlation was statistically significant.
• I conclude that GPA and starting salary are correlated in the population.
• If you square the correlation coefficient you obtain a “variance accounted for” effect size indicator: .63 squared is .397, which means that almost 40 percent of the variance in starting salary is explained or accounted for by GPA.
• Because the effect size is large and because GPA is something that students can control through studying, I conclude that this statistically significant correlation is also practically significant.
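The same steps in code, with simulated GPA and salary data (the relationship is built in on purpose; these are not the book's numbers): get the correlation and its p-value, then square r for the "variance accounted for" effect size.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n = 25

# Hypothetical data with a built-in positive GPA-salary relationship
gpa = rng.uniform(2.0, 4.0, size=n)
salary = 20_000 + 7_000 * gpa + rng.normal(0, 2_000, size=n)

# Pearson correlation and its significance test
r, p_value = stats.pearsonr(gpa, salary)

r_squared = r ** 2   # "variance accounted for" effect size
```

As always, a significant r only licenses the claim of a non-zero correlation in the population; how big r² needs to be for practical significance is your call.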

The t-Test for Regression Coefficients
This test is used to determine whether a regression coefficient is statistically significant.

The multiple regression equation analyzed in the last chapter is shown here again, but this time we will test each of the two regression coefficients for statistical significance.

Ŷ = 3,890.05  +  4,675.41(X1)  +  26.13(X2)

where,

Ŷ is predicted starting salary
3,890.05 is the Y intercept (or predicted starting salary when GPA and GRE Verbal are zero)
4,675.41 is the regression coefficient for grade point average
26.13 is the regression coefficient for GRE Verbal
X1 is grade point average (GPA)
X2 is GRE Verbal

Research Question One: Is there a statistically significant relationship between starting salary (Y) and GPA (X1) controlling for GRE Verbal (X2)? That is, is the first regression coefficient statistically significant?

• Here are the hypotheses:
• Null Hypothesis.                       H0: βYX1.X2  =  0  (i.e., the population regression coefficient expressing the relationship between starting salary and GPA, controlling for GRE Verbal is equal to zero; that is, there is no relationship)
• Alternative Hypothesis.             H1 : βYX1.X2  ≠   0  (i.e., the population regression coefficient expressing the relationship between starting salary and GPA, controlling for GRE Verbal is NOT equal to zero; that is, there IS a relationship)

• The observed regression coefficient was 4,675.41.
• The probability value was .035.
• Since .035 is < .05, I conclude that the relationship expressed by this regression coefficient is statistically significant.
• A good measure of effect size for regression coefficients is the semi-partial correlation squared (sr²). In this case it is equal to .10, which means that 10% of the variance in starting salary is uniquely explained by GPA.
• Because GPA is something we can control and because it explains a good amount of the variance in starting salary, I conclude that the relationship expressed by this regression coefficient is practically significant.

Research Question Two: Is there a statistically significant relationship between starting salary (Y) and GRE Verbal (X2), controlling for GPA (X1)? That is, is the second regression coefficient statistically significant?

• Here are the hypotheses:
• Null Hypothesis.                       H0: βYX2.X1  =  0  (i.e., the population regression coefficient expressing the relationship between starting salary and GRE Verbal, controlling for GPA is equal to zero; that is, there is no relationship)
• Alternative Hypothesis.             H1 : βYX2.X1  ≠   0  (i.e., the population regression coefficient expressing the relationship between starting salary and GRE Verbal, controlling for GPA is NOT equal to zero; that is, there IS a relationship)

• The observed regression coefficient was 26.13.
• The probability value was .014.
• Since .014 is < .05, I conclude that the relationship expressed by this regression coefficient is statistically significant.
• A good measure of effect size for regression coefficients is the semi-partial correlation squared (sr²). In this case it is equal to .15, which means that 15% of the variance in starting salary is uniquely explained by GRE Verbal.
• Because GRE Verbal is also something we can work at (as well as take preparation programs for) and because it explains 15% of the variance in starting salary, I conclude that the relationship expressed by this regression coefficient is practically significant.
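Behind the scenes, the t-test for each regression coefficient divides the coefficient by its standard error and refers the result to a t distribution with n − k − 1 degrees of freedom. A bare-bones sketch with simulated data (variable names and numbers are my own illustration, not the book's data set):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 25

# Hypothetical predictors and outcome (illustration only)
gpa = rng.uniform(2.0, 4.0, size=n)          # X1
gre_verbal = rng.uniform(300, 700, size=n)   # X2
salary = 4_000 + 4_500 * gpa + 25 * gre_verbal + rng.normal(0, 2_000, size=n)

# Design matrix with an intercept column
X = np.column_stack([np.ones(n), gpa, gre_verbal])
beta, *_ = np.linalg.lstsq(X, salary, rcond=None)

# Standard error of each coefficient, then a two-tailed t-test for each
residuals = salary - X @ beta
df = n - X.shape[1]
mse = residuals @ residuals / df
se = np.sqrt(np.diag(mse * np.linalg.inv(X.T @ X)))
t_stats = beta / se
p_values = 2 * stats.t.sf(np.abs(t_stats), df)   # one p-value per coefficient
```

`p_values[1]` and `p_values[2]` play the role of the two printout p-values compared to .05 in the bullets above; in practice the statistical package computes all of this for you.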

The Chi-Square Test for Contingency Tables
This test is used to determine whether a relationship observed in a contingency table is statistically significant.

• Research Question: Is the observed relationship between college major and gender statistically significant?
• The probability value was .046.
• Since .046 is < .05, I conclude that the observed relationship in the contingency table shown in Table 16.6 (p.492) is statistically significant.
• The effect size indicator used for this contingency table is Cramer’s V. It was equal to .496, which tells us that the relationship is moderately large.
• Because the effect size indicator suggested a moderately large relationship and because of the importance of these variables in real world politics, I would also conclude that this relationship is practically significant.
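The chi-square step can be sketched the same way. The contingency table below uses made-up counts for college major by gender (Table 16.6 in the book has its own numbers); the code runs the chi-square test and then computes Cramér's V as √(χ² / (N × (min(rows, cols) − 1))):

```python
import numpy as np
from scipy import stats

# Hypothetical major-by-gender counts (illustration only; not Table 16.6)
table = np.array([
    [8, 2],   # education:        8 female, 2 male
    [4, 4],   # arts and sciences
    [2, 5],   # business
])

# Chi-square test of independence for the contingency table
chi2, p_value, dof, expected = stats.chi2_contingency(table)

# Cramer's V effect size: sqrt(chi2 / (N * (min(rows, cols) - 1)))
n_total = table.sum()
k = min(table.shape) - 1
cramers_v = np.sqrt(chi2 / (n_total * k))
```

As with every other test in this section, the decision is the same comparison: if the p-value is at or below .05, the relationship in the table is declared statistically significant, and Cramér's V helps you judge its strength.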

Believe it or not, we are done. My goal in this last section was to show that every single time we do one of these tests, you do the same thing. You get your probability value, compare it to your significance level, and, finally, you make a decision.

You have now come a long way toward understanding the logic of significance testing. Remember, when reading journal articles look out for those probability values (to see if they are less than .05), and also look for effect sizes and statements about whether a finding is practically significant.

Congratulations!