Chapter 5
Standardized Measurement and Assessment


 

Defining Measurement

When we measure, we attempt to identify the dimensions, quantity, capacity, or degree of something.

 

Measurement can be categorized by the type of information that is communicated by the symbols or numbers assigned to the variables of interest. In particular, there are four levels or types of information, which are discussed next in the chapter. They are called the four "scales of measurement."

 

Scales of Measurement

 

1.  Nominal Scale.
This is a nonquantitative measurement scale.

 

2.  Ordinal Scale.

This level of measurement enables one to make ordinal judgments (i.e., judgments about rank order).


3.  Interval Scale.

 

4.  Ratio Scale.
This is a scale with a true zero point.
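
To see why a true zero point matters, here is a minimal sketch using temperature. The Celsius/Kelvin example is an illustration added here, not one taken from the chapter: Celsius is usually treated as an interval scale (its zero is arbitrary), and Kelvin as a ratio scale (its zero is a true zero). The variable names are hypothetical.

```python
# Illustration only: Celsius is treated as an interval scale,
# Kelvin as a ratio scale with a true zero point.

def celsius_to_kelvin(celsius):
    """Convert a Celsius temperature to Kelvin."""
    return celsius + 273.15

temp_a_c, temp_b_c = 10.0, 20.0
temp_a_k = celsius_to_kelvin(temp_a_c)
temp_b_k = celsius_to_kelvin(temp_b_c)

# Differences are meaningful on both scales and agree with each other.
diff_c = temp_b_c - temp_a_c
diff_k = temp_b_k - temp_a_k

# Ratios are only meaningful on the scale with a true zero.
ratio_c = temp_b_c / temp_a_c   # 2.0, but 20 C is not "twice as hot" as 10 C
ratio_k = temp_b_k / temp_a_k   # about 1.035, the physically meaningful ratio

print(diff_c, round(diff_k, 2))
print(ratio_c, round(ratio_k, 3))
```

The difference between the two temperatures is the same on both scales, but the ratio changes when the zero point moves. That is why statements such as "twice as much" are justified only for ratio-scale measurements.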

 

Assumptions Underlying Testing and Measurement

 

Before I list the assumptions, note the difference between testing and assessment. According to the definitions that we use:


In this section of the text, we also list the twelve assumptions that Cohen et al. consider basic to testing and assessment:

 

1. Psychological traits and states exist.

 

2.  Psychological traits and states can be quantified and measured.

 

3.  Various approaches to measuring aspects of the same thing can be useful.

 

4.  Assessment can provide answers to some of life's most momentous questions.

 

5.  Assessment can pinpoint phenomena that require further attention or study.

 

6.  Various sources of data enrich and are part of the assessment process.

 

7.  Various sources of error are always part of the assessment process.

 

8.  Tests and other measurement techniques have strengths and weaknesses.

 

9.  Test-related behavior predicts non-test-related behavior.

 

10.  Present-day behavior sampling predicts future behavior.

 

11.  Testing and assessment can be conducted in a fair and unbiased manner.

 

12.  Testing and assessment benefit society.

 

 

Identifying a Good Test or Assessment Procedure

 

As mentioned earlier in the chapter, good measurement is fundamental for research. If we do not have good measurement, then we cannot have good research. That’s why it’s so important to use testing and assessment procedures that are characterized by high reliability and high validity.

 

Overview of Reliability and Validity

As an introduction to reliability and validity and how they are related, note the following:

 

Reliability

Reliability refers to consistency or stability. In psychological and educational testing, it refers to the consistency or stability of the scores that we get from a test or assessment procedure.

 

There are four primary ways to measure reliability.

 

1.  The first type of reliability is called test-retest reliability.

·  This refers to the consistency of test scores over time.

·  It is measured by correlating the test scores obtained at one point in time with the test scores obtained at a later point in time for a group of people.

·  A primary issue is identifying the appropriate time interval between the two testing occasions.

·  The longer the time interval between the two testing occasions, the lower the reliability coefficient tends to be.
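
The correlation described in the points above can be sketched in a few lines of code. This is a minimal illustration, assuming the Pearson correlation coefficient is the one used. The scores are made-up values for five hypothetical people, not data from the chapter, and the function name is also hypothetical.

```python
import math

def pearson_r(first_scores, second_scores):
    """Pearson correlation between two equal-length lists of scores."""
    if len(first_scores) != len(second_scores):
        raise ValueError("Both testing occasions must include the same people.")
    n = len(first_scores)
    mean_first = sum(first_scores) / n
    mean_second = sum(second_scores) / n
    # Sum of cross-products of deviations from each mean
    cross = sum((x - mean_first) * (y - mean_second)
                for x, y in zip(first_scores, second_scores))
    # Sums of squared deviations for each testing occasion
    ss_first = sum((x - mean_first) ** 2 for x in first_scores)
    ss_second = sum((y - mean_second) ** 2 for y in second_scores)
    return cross / math.sqrt(ss_first * ss_second)

# Made-up scores for the same five people on two testing occasions
time_1 = [82, 75, 90, 68, 77]
time_2 = [80, 78, 88, 70, 75]

test_retest_r = pearson_r(time_1, time_2)
print(round(test_retest_r, 2))
```

A coefficient close to 1 indicates that people kept nearly the same relative standing across the two occasions. A coefficient near 0 indicates little consistency over time.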

 

2.  The second type of reliability is called equivalent forms reliability.

 

 

3.  The third type of reliability is called internal consistency reliability.

 

4.  The fourth and last major type of reliability is called inter-scorer reliability.

 

Validity

Validity refers to the accuracy of the inferences, interpretations, or actions made on the basis of test scores.

 

Validation refers to gathering evidence supporting some inference made on the basis of test scores.

 

There are three main methods of collecting validity evidence.

 

1.  Evidence Based on Content

Content-related evidence is based on a judgment of the degree to which the items, tasks, or questions on a test adequately represent the domain of interest. Expert judgment is used to provide evidence of content validity.

 

To make a decision about content-related evidence, you should try to answer these three questions: