July 3rd, 2009
The majority of states now require some form of paper-and-pencil test for teacher licensure (although the content of these tests varies greatly). The idea of these tests is presumably to make sure that teachers coming into the system meet some minimum requirement. An empirical question, however, is whether these tests predict teaching performance.
Numerous studies have looked into this question over the years. Recently, D’Agostino & Powers published a review of these studies in the American Educational Research Journal, using meta-analysis techniques that allow findings to be summarized statistically across studies. They examined research dating back to 1906 that investigated the relationship between teacher test scores and/or GPA and teaching performance. The studies measured teaching performance in various ways: supervising professor or teacher ratings, principal ratings, student evaluations, and student achievement scores.
The authors report the following:
“Among indicator types, it is evident that GPAs yielded larger effects than any type of teacher test. The average weighted GPA effect of .25 was greater than the content and professional knowledge test effects (.17 for both) and the basic skill test effect (.09)…”
In addition, when the authors compared different measures of teaching performance, test scores correlated most strongly with professor and teacher ratings, followed by principal ratings; the relationships with student achievement were the smallest.
The authors conclude that,
“Those involved in the teacher hiring and selection process probably should focus as much or more on students’ grades than scores on the tests used for licensure purposes.”
The researchers carefully coded multiple aspects of the studies they summarized, which let them explore different facets of the relationships between the independent and dependent variables. The overarching question, though, raises some interesting issues about the purpose of the teacher assessments.
If the purpose of an assessment is to ensure that teachers meet a minimum criterion, it would not necessarily be designed in a way that allows score variation to predict teaching performance. If I were going to design an assessment to make sure teachers met a minimum criterion, I would administer my potential items to a group of teachers or prospective teachers who had already been judged to meet or not meet the criterion. I would then do my item analysis and focus on selecting items that specifically discriminated between those who met the criterion and those who didn’t. If I had an item that only the very best teachers answered correctly, I probably wouldn’t include it because it wouldn’t help me discriminate between people on either side of that criterion line.
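The item-selection idea above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the response data are made up, the names `discrimination_index` and `select_items` are hypothetical, and the 0.3 cutoff is an arbitrary choice rather than a standard from any licensure program.

```python
# Made-up response data: 1 = correct, 0 = incorrect.
# Each row is one examinee; each column is one candidate item.
met = [          # examinees already judged to meet the criterion
    [1, 1, 0, 1],
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [1, 1, 0, 1],
]
not_met = [      # examinees already judged not to meet it
    [0, 1, 0, 1],
    [0, 0, 0, 1],
    [1, 0, 0, 1],
    [0, 0, 0, 1],
]


def proportion_correct(responses, item):
    """Share of examinees in a group who answered the item correctly."""
    return sum(row[item] for row in responses) / len(responses)


def discrimination_index(met, not_met, item):
    """Difference in proportion correct between the two criterion groups."""
    return proportion_correct(met, item) - proportion_correct(not_met, item)


def select_items(met, not_met, threshold=0.3):
    """Keep items whose group difference is at least the (assumed) threshold."""
    n_items = len(met[0])
    return [
        item for item in range(n_items)
        if discrimination_index(met, not_met, item) >= threshold
    ]


for item in range(len(met[0])):
    print(item, discrimination_index(met, not_met, item))
print("kept:", select_items(met, not_met))
```

In this toy data, item 2 is answered correctly by only one examinee in the "met" group and item 3 by everyone in both groups, so neither separates the two groups well and both fall below the cutoff.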
Alternatively, if I wanted to make a test to predict teaching ability at all levels, I would make a test that includes items that discriminate at differing levels of teaching ability. Some items would discriminate between the very best teachers and those below them. I would include items with differing probabilities of a correct answer across the whole teaching effectiveness scale.
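One way to picture the contrast between the two designs is with a two-parameter logistic (2PL) item response model. To be clear, the 2PL model is my own choice for this sketch, not something taken from the meta-analysis, and the item parameters and the function names (`p_correct`, `item_information`, `total_information`) are invented for illustration. Here `theta` stands for teaching ability, `a` for how sharply an item discriminates, and `b` for the ability level at which the item is answered correctly half the time.

```python
import math


def p_correct(theta, a, b):
    """2PL probability of a correct answer at ability theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of a 2PL item; it peaks where theta equals b."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)


def total_information(theta, items):
    """Sum of item information over a list of (a, b) pairs."""
    return sum(item_information(theta, a, b) for a, b in items)


# Invented parameters: every item on the first test sits at an assumed
# cut score of -1.0; the second test spreads its items along the scale.
cut_score_test = [(1.5, -1.0), (1.5, -1.0), (1.5, -1.0)]
broad_test = [(1.5, -1.0), (1.5, 0.5), (1.5, 2.0)]

for theta in (-1.0, 0.5, 2.0):
    print(
        theta,
        round(total_information(theta, cut_score_test), 3),
        round(total_information(theta, broad_test), 3),
    )
```

Under these assumptions, the cut-score test is the more informative of the two near the cut score, and it tells you much less than the spread-out test about examinees at the top of the scale.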
So I know we are all SHOCKED that perhaps an assessment is being used for purposes other than those for which it was intended. However, I think it is worth making explicit that the actual design process for creating these tests may keep them from predicting teacher effectiveness across the spectrum.
Reference: D’Agostino, J. V. & Powers, S. J. (2009). Predicting teacher performance with test scores and grade point average: A meta-analysis. American Educational Research Journal, 46, 146-182.