Chapter 5
Standardized Measurement and Assessment



Defining Measurement

When we measure, we attempt to identify the dimensions, quantity, capacity, or degree of something.


Measurement can be categorized by the type of information that is communicated by the symbols or numbers assigned to the variables of interest. In particular, four levels or types of information are discussed next in the chapter. They are called the four "scales of measurement."


Scales of Measurement


1.  Nominal Scale.
This is a nonquantitative measurement scale.


2.  Ordinal Scale.

This level of measurement enables one to make ordinal judgments (i.e., judgments about rank order).

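3.  Interval Scale.
This is a scale with equal intervals (distances) between adjacent numbers, but its zero point is arbitrary rather than a true zero.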

4.  Ratio Scale.
This is a scale with a true zero point.
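The cumulative relationship among the four scales can be sketched in a few lines of Python. This is an illustrative sketch, not part of the chapter: the names `Scale`, `meaningful_statements`, and `celsius_ratio`, the wording of each kind of statement, and the Celsius/Kelvin temperature example are assumptions chosen to show why a true zero point matters before ratio statements ("twice as much") make sense.

```python
from enum import IntEnum


class Scale(IntEnum):
    """The four scales of measurement, ordered from least to most informative."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


# Hypothetical labels for this sketch: each scale adds one kind of
# meaningful statement to everything the scales below it already allow.
_ADDED = {
    Scale.NOMINAL: "same or different category",
    Scale.ORDINAL: "rank order (more or less)",
    Scale.INTERVAL: "equal differences between values",
    Scale.RATIO: "ratios (twice as much), because of a true zero point",
}


def meaningful_statements(scale: Scale) -> list[str]:
    """Return every kind of statement that is meaningful at the given scale."""
    return [_ADDED[s] for s in Scale if s <= scale]


def celsius_ratio(c_high: float, c_low: float) -> float:
    """Ratio of two temperatures after converting to Kelvin, which has a true zero."""
    return (c_high + 273.15) / (c_low + 273.15)


if __name__ == "__main__":
    for scale in Scale:
        print(scale.name, "->", meaningful_statements(scale))
    # Celsius has an arbitrary zero, so 20 degrees is not "twice" 10 degrees.
    print("Kelvin ratio of 20 C to 10 C:", round(celsius_ratio(20, 10), 3))
```

Because the Celsius zero is arbitrary, the Kelvin ratio of 20 degrees to 10 degrees is only slightly above 1, not 2.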


Assumptions Underlying Testing and Measurement


Before I list the assumptions, note the difference between testing and assessment. According to the definitions that we use:

In this section of the text, we also list the twelve assumptions that Cohen et al. consider basic to testing and assessment:


1. Psychological traits and states exist.


2.  Psychological traits and states can be quantified and measured.


3.  Various approaches to measuring aspects of the same thing can be useful.


4.  Assessment can provide answers to some of life's most momentous questions.


5.  Assessment can pinpoint phenomena that require further attention or study.


6.  Various sources of data enrich and are part of the assessment process.


7.  Various sources of error are always part of the assessment process.


8.  Tests and other measurement techniques have strengths and weaknesses.


9.  Test-related behavior predicts non-test-related behavior.


10.  Present-day behavior sampling predicts future behavior.


11.  Testing and assessment can be conducted in a fair and unbiased manner.


12.  Testing and assessment benefit society.



Identifying a Good Test or Assessment Procedure


As mentioned earlier in the chapter, good measurement is fundamental to research. If we do not have good measurement, then we cannot have good research. That’s why it’s so important to use testing and assessment procedures that are characterized by high reliability and high validity.


Overview of Reliability and Validity

As an introduction to reliability and validity and how they are related, note the following:



Reliability refers to consistency or stability. In psychological and educational testing, it refers to the consistency or stability of the scores that we get from a test or assessment procedure.


There are four primary ways to measure reliability.


1.      The first type of reliability is called test-retest reliability.

·        This refers to the consistency of test scores over time.

·        It is measured by correlating the test scores obtained at one point in time with the test scores obtained at a later point in time for a group of people.

·        A primary issue is identifying the appropriate time interval between the two testing occasions.

·        The longer the time interval between the two testing occasions, the lower the reliability coefficient tends to be.


2.      The second type of reliability is called equivalent forms reliability.



3.      The third type of reliability is called internal consistency reliability.


4.      The fourth and last major type of reliability is called inter-scorer reliability.  
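Test-retest, equivalent forms, and inter-scorer reliability are commonly summarized with a correlation coefficient, and internal consistency is often summarized with coefficient alpha (Cronbach's alpha). The notes above do not give formulas, so the following is a minimal Python sketch under those assumptions; the function names `pearson_r` and `coefficient_alpha` and all of the scores are hypothetical.

```python
from math import sqrt
from statistics import mean, pvariance


def pearson_r(x, y):
    """Pearson correlation between two paired lists of scores (same people, same order)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired score lists of equal length (n >= 2)")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def coefficient_alpha(items):
    """Coefficient (Cronbach's) alpha.

    items[i][p] is person p's score on item i. Uses
    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores).
    """
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    totals = [sum(scores) for scores in zip(*items)]  # one total score per person
    item_variance = sum(pvariance(item) for item in items)
    return (k / (k - 1)) * (1 - item_variance / pvariance(totals))


if __name__ == "__main__":
    # Hypothetical scores for five people; not data from the chapter.
    time1 = [12, 15, 9, 20, 17]   # first testing occasion
    time2 = [13, 14, 10, 19, 18]  # same people, later testing occasion
    print("test-retest r =", round(pearson_r(time1, time2), 3))

    # Hypothetical 3-item scale answered by the same five people.
    items = [
        [3, 4, 2, 5, 4],
        [2, 4, 2, 5, 3],
        [3, 5, 1, 4, 4],
    ]
    print("coefficient alpha =", round(coefficient_alpha(items), 3))
```

Assuming numeric scores, the same `pearson_r` function would apply to Form A versus Form B scores (equivalent forms reliability) or to two scorers' ratings of the same performances (inter-scorer reliability).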



Validity refers to the accuracy of the inferences, interpretations, or actions made on the basis of test scores.


Validation refers to gathering evidence supporting some inference made on the basis of test scores.


There are three main methods of collecting validity evidence.


1.  Evidence Based on Content

Content-related evidence is based on a judgment of the degree to which the items, tasks, or questions on a test adequately represent the domain of interest. Expert judgment is used to provide evidence of content validity.


To make a decision about content-related evidence, you should try to answer these three questions:


2.  Evidence Based on Internal Structure

Some tests are designed to measure one general construct, but other tests are designed to measure several components or dimensions of a construct. For example, the Rosenberg Self-Esteem Scale is a 10-item scale designed to measure the construct of global self-esteem. In contrast, the Harter Self-Esteem Scale is designed to measure global self-esteem as well as several separate dimensions of self-esteem.


3.  Evidence Based on Relations to Other Variables

This form of evidence is obtained by relating your test scores to one or more relevant criteria. A criterion is the standard or benchmark that you want to predict accurately on the basis of the test scores. Note that when correlation coefficients are used as validity evidence, they are called validity coefficients.
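A validity coefficient is a correlation between test scores and criterion scores for the same people. The sketch below is a hypothetical illustration, not a procedure from the chapter: `validity_coefficient`, `prediction_line`, and the entrance-test and GPA numbers are invented, and the least-squares line is one common way (assumed here) to predict a criterion from test scores.

```python
from math import sqrt
from statistics import mean


def validity_coefficient(test_scores, criterion_scores):
    """Pearson correlation between test scores and criterion scores for the same people."""
    if len(test_scores) != len(criterion_scores) or len(test_scores) < 2:
        raise ValueError("need paired test and criterion scores (n >= 2)")
    mx, my = mean(test_scores), mean(criterion_scores)
    sxy = sum((x - mx) * (y - my) for x, y in zip(test_scores, criterion_scores))
    sxx = sum((x - mx) ** 2 for x in test_scores)
    syy = sum((y - my) ** 2 for y in criterion_scores)
    return sxy / sqrt(sxx * syy)


def prediction_line(test_scores, criterion_scores):
    """Least-squares slope and intercept for predicting the criterion from the test."""
    mx, my = mean(test_scores), mean(criterion_scores)
    sxy = sum((x - mx) * (y - my) for x, y in zip(test_scores, criterion_scores))
    sxx = sum((x - mx) ** 2 for x in test_scores)
    slope = sxy / sxx
    return slope, my - slope * mx


if __name__ == "__main__":
    # Hypothetical data: entrance-test scores and later grade point averages.
    test = [50, 62, 47, 71, 58, 66]
    gpa = [2.6, 3.1, 2.4, 3.6, 2.9, 3.3]
    print("validity coefficient r =", round(validity_coefficient(test, gpa), 3))
    slope, intercept = prediction_line(test, gpa)
    print("predicted GPA for a test score of 60:", round(intercept + slope * 60, 2))
```

The closer the validity coefficient is to +1 or -1, the more accurately the criterion can be predicted from the test scores.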


There are several different kinds of relevant validity evidence based on relations to other variables.


The first is called criterion-related evidence, which is validity evidence based on the extent to which scores from a test can be used to predict or infer performance on some criterion, such as a test or future performance. Here are the two types of criterion-related evidence:


Here are three more types of validity evidence researchers should provide:


Now, to summarize these three major methods for obtaining evidence of validity, look again at Table 5.6.


If you think we have spent a lot of time on validity and measurement, it is because validity is so important in empirical research. Remember, without good measurement we end up with GIGO (garbage in, garbage out).





Using Reliability and Validity Information


You must be careful when interpreting the reliability and validity evidence provided with standardized tests and in empirical research journal articles. 




Educational and Psychological Tests


Three primary types of educational and psychological tests are discussed in your textbook: intelligence tests, personality tests, and educational assessment tests.


1)  Intelligence Tests

Intelligence has many definitions because a single prototype does not exist. Although it is far from perfect, here is our definition: intelligence is the ability to think abstractly and to learn readily from experience.




2)  Personality Tests

Personality is a construct similar to intelligence in that a single prototype does not exist. Here is our definition: personality consists of the relatively permanent patterns that characterize individuals and can be used to classify them.




3)  Educational Assessment Tests

There are four subtypes of educational assessment tests:


--These are typically screening tests because the predictive validity of many of these tests is weak.


--These are designed to measure the degree of learning that has taken place after a person has been exposed to a specific learning experience. They can be teacher-constructed or standardized tests.




--These focus on information acquired through the informal learning that goes on in life.
--They are often used to predict future performance, whereas achievement tests are used to measure current performance.

Sources of Information about Tests


The two most important sources of information about tests are the Mental Measurements Yearbook (MMY) and Tests in Print (TIP). Some additional sources are provided in Table 5.7. Also, here are some useful internet links (from Table 5.8):