Chapter 5

5.1. What is measurement?

Measurement is the act of assigning symbols or numbers to something according to a specific set of rules. It involves identifying the dimensions, quantity, capacity, type, or degree of something.

5.2. What are the four different levels or scales of measurement, and what are the essential characteristics of each one?

The four levels of measurement are the nominal, ordinal, interval, and ratio scales. Note that their first letters spell NOIR (which means black in French).

· The most basic level of measurement is the nominal level, which simply involves assigning symbols or names to identify the groups or categories of something (e.g., gender and college major are nominal variables).

· The next level of measurement is the ordinal level, in which the levels take on the new property of rank order (e.g., students’ ranks on an exam, such as 1st, 2nd, and 3rd, form an ordinal variable).

· The next level of measurement is the interval level, which adds the new property that the distances between adjacent points are equal (in addition to having the property of rank order). An example is the Fahrenheit temperature scale, where the difference between 70 and 75 degrees is the same as the difference between 75 and 80 degrees. Note, however, that you cannot say that 80 degrees is twice as hot as 40 degrees, because the zero point on an interval scale is arbitrary.

· The highest level of measurement is the ratio scale, which has the properties of rank order and equal distances and adds the new property of an absolute or true zero point. You have a true zero point when zero means none of the property being measured. Annual income and height are examples. Note that a person who is six feet tall is twice as tall as a person who is three feet tall. Unlike with interval scales, we can make these types of ratio statements with ratio scales (e.g., 6/3 = 2).
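The difference between interval and ratio scales can be checked numerically by changing units. In the minimal Python sketch below (the helper names are hypothetical), Fahrenheit is converted to Celsius, which rescales and also shifts the arbitrary zero point, while feet are converted to inches, which only rescales because zero feet and zero inches both mean no height. Equal temperature differences stay equal, but the statement "80 degrees is twice 40 degrees" does not survive the conversion, whereas "6 feet is twice 3 feet" does.

```python
import math


def fahrenheit_to_celsius(f):
    """Interval-scale unit change: rescale AND shift the arbitrary zero point."""
    return (f - 32) * 5 / 9


def feet_to_inches(feet):
    """Ratio-scale unit change: rescale only, so zero feet is still zero inches."""
    return feet * 12


def ratio_survives(a, b, convert):
    """True if the ratio a/b is the same before and after converting units."""
    return math.isclose(a / b, convert(a) / convert(b))


# Equal Fahrenheit differences (70 to 75, 75 to 80) stay equal in Celsius.
diff_low = fahrenheit_to_celsius(75) - fahrenheit_to_celsius(70)
diff_high = fahrenheit_to_celsius(80) - fahrenheit_to_celsius(75)
print(math.isclose(diff_low, diff_high))              # True

# "80 degrees is twice 40 degrees" does not survive: in Celsius the ratio is about 6.
print(ratio_survives(80, 40, fahrenheit_to_celsius))  # False

# "6 feet is twice 3 feet" survives: 72 inches / 36 inches is still 2.
print(ratio_survives(6, 3, feet_to_inches))           # True
```

The two conversions are illustrative choices; any change of units on an interval scale (a rescaling plus a shift of zero) breaks ratio statements in the same way.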

5.3. What are the twelve assumptions underlying testing and measurement?

Note that it takes a lot of hard work to make these 12 assumptions hold in practice. The 12 assumptions are:

1. Psychological traits and states exist (they are social constructions that name phenomena of interest to researchers as they attempt to understand the world)

2. Psychological traits and states can be quantified and measured

3. Various approaches to measuring aspects of the same thing can be useful

4. Assessment can provide answers to some of life’s most momentous questions

5. Assessment can pinpoint phenomena that require further attention or study

6. Various sources of data enrich and are part of the assessment process

7. Various sources of error are always part of the assessment process

8. Tests and other measurement techniques have strengths and weaknesses

9. Test-related behavior predicts non-test-related behavior

10. Present-day behavior sampling predicts future behavior

11. Testing and assessment can be conducted in a fair and unbiased manner

12. Testing and assessment benefit society.

Also be sure to know the three definitions included in this section: traits (distinguishable, relatively enduring ways in which one individual differs from another), states (less enduring ways in which individuals vary), and error (the difference between a person’s true score and the person’s observed score).
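Written in the usual classical test theory notation (X for the observed score, T for the true score, and E for error; these symbols are an addition, not taken from this chapter summary), the definition of error is a simple subtraction. The usual convention takes E = X - T, so a positive error means the observed score overstates the true score. The numbers below are hypothetical, since a person's true score can never be observed directly.

```latex
X = T + E \qquad\Longrightarrow\qquad E = X - T

% Hypothetical example: true score T = 80, observed score X = 84
E = 84 - 80 = 4
```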

5.4. What is the difference between reliability and validity? Which is more important?

Reliability refers to the consistency or stability of the test scores; validity refers to the accuracy of the inferences or interpretations you make from the test scores. Both of these characteristics are important. Note also that reliability is a necessary but not sufficient condition for validity (i.e., you can have reliability without validity, but in order to obtain validity you must have reliability).
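A small simulation can illustrate why you can have reliability without validity. In this hypothetical Python sketch (every variable name, the sample size, and the error sizes are assumptions), a test measures a stable trait with little random error, so two administrations agree closely, but the criterion we actually want to draw inferences about is generated independently of that trait. The test-retest correlation comes out high while the correlation with the criterion stays near zero.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(42)   # fixed seed so the run is repeatable
n = 5000                  # hypothetical number of test takers

# The test consistently measures this trait, with only a little random error.
trait = [rng.gauss(0, 1) for _ in range(n)]
first_testing = [t + rng.gauss(0, 0.2) for t in trait]
second_testing = [t + rng.gauss(0, 0.2) for t in trait]

# The criterion we want to draw inferences about is unrelated to that trait.
criterion = [rng.gauss(0, 1) for _ in range(n)]

reliability = pearson_r(first_testing, second_testing)  # high: consistent scores
validity = pearson_r(first_testing, criterion)          # near zero: unsupported inferences
print(round(reliability, 2), round(validity, 2))
```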

5.5. What are the definitions of reliability and reliability coefficient?

Reliability refers to the consistency or stability of a set of test scores. The reliability coefficient is a correlation coefficient that is used as an index of reliability.

Note that there are several different forms of reliability. The first is test-retest reliability (the consistency of a group of individuals’ scores over time). The second is equivalent-forms reliability (the consistency of a group of individuals’ scores on two equivalent forms of a test). The third type of reliability is internal consistency reliability (the consistency with which the items measure a single construct). The two subtypes of internal consistency are split-half reliability and coefficient alpha. The fourth major type of reliability is inter-scorer reliability (the consistency or degree of agreement between two or more scorers, judges, or raters).
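Because a reliability coefficient is a correlation coefficient, it can be computed with the ordinary Pearson formula. The Python sketch below uses hypothetical scores for five people who took the same test twice (a test-retest design); the helper name `pearson_r` is an assumption for illustration, not a function from the textbook. The same function would serve for equivalent-forms and basic inter-scorer reliability, with the two lists holding form A and form B scores, or rater 1 and rater 2 ratings.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores for the same five people tested on two occasions.
time_1 = [10, 12, 14, 16, 18]
time_2 = [11, 12, 15, 16, 19]
print(round(pearson_r(time_1, time_2), 3))  # prints 0.985
```

A coefficient this close to 1 would indicate that people kept nearly the same relative standing from the first testing to the second.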

5.6. What are the different ways of assessing reliability?

Most of the types of reliability are assessed with simple correlation coefficients (called reliability coefficients). Test-retest reliability is the correlation between a group’s scores on the same test given at two different times (i.e., give a set of people a test twice and see whether the two sets of scores are correlated). Equivalent-forms reliability is the correlation between a group’s scores on two forms of the same test (i.e., give everyone in a group two forms of the same test and correlate those two sets of scores). Split-half reliability is the correlation between a group’s scores on two halves of the same test (everyone in the group takes the test once, each person receives a score on each of the two halves, and you correlate those two sets of half scores). Coefficient alpha is closely related to the average of the correlations of all of the items on a test with each other (e.g., if a test had only 3 items, that average would be taken over the correlations between items 1 and 2, 1 and 3, and 2 and 3); the higher that average correlation and the more items the test has, the higher alpha will be. It tells you whether the items tend to be related. The basic inter-scorer reliability is the correlation between two raters’ ratings of a set of objects (e.g., a set of essay questions).
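The two internal consistency approaches can be sketched in Python as follows. The item scores and helper names are hypothetical, splitting into odd- and even-numbered items is only one of several ways to form two halves, and alpha is computed with the usual variance form, k/(k - 1) × (1 - sum of item variances / variance of total scores), rather than as a literal average of inter-item correlations.

```python
import math
from statistics import pvariance


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def split_half_r(scores):
    """Correlate each person's odd-item total with their even-item total."""
    odd_half = [sum(row[0::2]) for row in scores]   # items 1, 3, 5, ...
    even_half = [sum(row[1::2]) for row in scores]  # items 2, 4, 6, ...
    return pearson_r(odd_half, even_half)


def cronbach_alpha(scores):
    """Coefficient alpha from item variances and the variance of total scores."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("coefficient alpha needs at least two items")
    item_variance_sum = sum(pvariance(item) for item in zip(*scores))
    total_variance = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical 0-4 ratings: one row per person, one column per item.
scores = [
    [4, 4, 3, 4],
    [3, 3, 3, 2],
    [2, 2, 1, 2],
    [1, 2, 1, 1],
    [0, 1, 0, 0],
]
print(round(split_half_r(scores), 2))    # prints 0.95
print(round(cronbach_alpha(scores), 2))  # prints 0.98
```

Many texts also adjust the split-half correlation upward with the Spearman-Brown formula, because each half is only half as long as the full test; that step is not part of the description above and is omitted here.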

5.7. Under what conditions should each of the different ways of assessing reliability be used?

Test-retest reliability is used to determine the consistency of the scores on a test over time.

Equivalent-forms reliability is used to see whether different forms of a test give consistent results.

Internal consistency reliability is used to see whether the different items on a test give consistent results.

Inter-scorer reliability is used to see whether two or more raters of a set of items give consistent results.

5.8. What are the definitions of validity and validation?

Validity is the accuracy of the inferences, interpretations, or actions made on the basis of test scores. Validation is the process of gathering evidence that supports the inferences made on the basis of test scores.

5.9. What is meant by the unified view of validity?

It means that all validity can be viewed as part of construct validity. That is because, whenever we discuss measurement validity, there has to be something that we intend to measure. The term “construct” simply refers to whatever we want to measure, whether it is age, gender, IQ, or knowledge.

5.10. What are the characteristics of the different ways of obtaining validity evidence?

The three major types of validity evidence are:

(1) Evidence based on content.

(2) Evidence based on the internal structure of the test.

(3) Evidence based on relations to other variables.

This is summarized in Table 5.6.

5.11. What are the purposes and key characteristics of the major types of tests discussed in your textbook?

The major types of tests discussed are:

· Intelligence tests (goal is to measure one or more types of intelligence)

· Personality tests (goal is to measure one or more dimensions of personality)

· Educational assessment tests (including preschool assessment tests for identifying “at risk” children, achievement tests for measuring learning from formal learning experiences, aptitude tests for measuring informal learning that goes on in life, and diagnostic tests for identifying academic difficulties in students).

5.12. What is a good example of each of the major types of tests that are discussed in this chapter?

· Some examples of intelligence tests are the Stanford-Binet Intelligence Test, the Wechsler Adult Intelligence Scale, and the Slosson Intelligence Test.

· Some examples of personality tests are the Minnesota Multiphasic Personality Inventory, the California Psychological Inventory, the Work Values Inventory, the Minnesota School Attitude Survey, and the Thematic Apperception Test.

· Some examples of educational assessment tests are the Peabody Individual Achievement Test, the Nelson Reading Skills Tests, and the Basic English Skills Test.

For more information on these tests, and for additional tests, go to the companion website under bonus materials for Chapter 5.