New teacher evals fail the test

By W. James Popham

April 13, 2014

W. James Popham, a UCLA Emeritus Professor, is the author of “Evaluating America’s Teachers: Mission Possible?”

In almost all 50 states, including Georgia, teachers will soon be subjected to annual, high-stakes evaluations of their instructional competence. Unlike previous teacher evaluations that were aimed at improving teachers’ instructional skills, these new teacher evaluations are much more likely to lead to a teacher’s dismissal. America’s teachers are, with good reason, concerned.

The trouble is that state officials have swung from one extreme to the other. Stung by criticisms about the subjectivity of previous teacher-evaluation systems, they have instituted what they claim are wholly objective evaluations based on quantitative data. Such by-the-numbers, supposedly scientific teacher evaluations, however, are destined to fail for two reasons.

One of those reasons is the enormous diversity in teachers’ instructional situations. Teachers differ in what they teach, whom they teach, the effectiveness of their students’ previous instruction, and a host of other salient educational variables. To quantitatively evaluate a state’s teachers as though they were functioning in identical instructional settings is flat-out foolish. Yet many of today’s judgment-free teacher evaluations attempt to do precisely that.

The second obstacle faced by teacher evaluators is the variety and quality of the evidence being used to arrive at an evaluation of a teacher’s quality. The most common kinds of evidence employed to evaluate teachers are students’ scores on standardized or teacher-made tests, classroom observations of teachers in action, administrative ratings, and students’ evaluations of their teachers. But the quality of those different sorts of evidence varies profoundly from state to state, district to district, school to school, and even within a particular school.

Nonetheless, many of our nation’s current teacher-evaluation procedures mistakenly assume that the performances of a teacher’s students on one kind of test are the same as their performances would be on similar tests. Moreover, many of the standardized tests now being employed to judge teachers have not been demonstrated to be suitable for this significant task. That is, most of those tests are unaccompanied by any evidence that they can distinguish between well-taught and badly taught students.

Striking differences can also be found in the quality of evaluative data drawn from classroom observations, administrative ratings, and student evaluations. To illustrate, we often see profound differences in the way that classroom observers have been trained to carry out their observations. Properly trained observers who rely on a research-rooted observation system, and who observe sufficient numbers of a teacher’s classes, can provide compelling evaluative evidence. Badly trained observers who rely on a makeshift observation system to observe teachers only a few times are apt to provide evaluators with shoddy evidence.

Teacher-evaluation systems that attempt to resolve these two problems by using pre-formed numerical templates will inaccurately evaluate far more teachers than is necessary. To cope sensibly with such diversities, properly trained and carefully monitored human beings must supply nuanced judgments that fairly and accurately address those diversities.

What’s involved in any common-sense approach to teacher evaluation is not dramatically unlike what we now see in many states’ recently refurbished teacher evaluations — but with one important exception. Most states’ current teacher evaluations attempt to rely on a cookie-cutter, quantitative process in which human judgment plays a minor or nonexistent role.

Will reliance on the judgments of properly prepared evaluators, even if those judgments are transparent and well documented, lead to the errorless evaluation of teachers? Of course not. As with the evaluation of most workers, especially those engaged in complex endeavors, mistakes will be made. But because judgmentally rooted appraisals can cope better with anomalies in teachers’ instructional settings as well as variations in the worthiness of evaluative evidence, a common-sense judgmental approach is the only defensible way to evaluate our nation’s teachers.
