Opinion: Our tests themselves are failing
Dr. Angelika Pohl is founder and president of the Atlanta-based Better Testing & Evaluations.
High-stakes tests have indeed turned out to have high stakes: They have caused seasoned educators to be sent to prison.
The main problem with high-stakes tests is that they are shrouded in secrecy. Teachers are not allowed to see the tests that have serious consequences for them and their students. Of course, let’s not kid ourselves: Teachers do look at the tests to see what their students are being asked — and they are often appalled. They see questions with poor grammar, confusing wording, misspelled words, trivial and ambiguous questions, flawed graphs, visually confusing charts, and questions about facts and concepts they haven’t taught.
When teachers see these flaws, they cannot speak up, because that would reveal they violated test security regulations. No wonder teachers have doubts about the validity of these tests that have recently become high-consequence measures of their competence.
Tests are not inherently bad. It is quite possible to write test questions and answer choices most people would agree are fair measures of what a student has learned. But it costs money. And expertise.
I did not grow up wanting to be a testing expert, but it happened. After finishing my doctorate, I got a job with a big test development company in Massachusetts that produced teacher certification tests. They hired me because I was an eclectic type who seemed to know a little about a wide range of fields and enjoyed everything from philosophy to statistics.
It was impressed upon me from Day One that questions on these tests had to be absolutely flawless, so that they would stand up in a legal challenge to their validity. Teachers could be denied a livelihood on the basis of these tests.
Before a question (usually termed an “item”) could appear on a test, it was subjected to numerous reviews. Three or four editors would tweak and fine-tune the wording, and a copy editor would subject the item to various tests of factual and linguistic accuracy. Once items were deemed flawless, they were presented to experienced teachers for careful scrutiny and extensive discussion. As a result, many items were thrown out or revised.
When I moved to Georgia, I began work with the Department of Education and was given responsibility for implementing the then-new high school graduation tests. The department had contracted with a test development firm to write the test questions.
When the contractor submitted tests for approval, I was appalled. Items had all the flaws listed above. I would send items back for revision, but that rarely resulted in great improvement. It became clear to me that we were receiving first drafts, rather than carefully edited items.
My years of training for precision and clarity would not let me accept these items, so I spent hours editing them myself. My colleagues and my director were of the opinion that items could simply be thrown into a pilot test, administered to tens of thousands of hapless students across Georgia, and then checked for statistical results.
If the stats met established criteria, the item was a go. Not human readers, but psychometrics, had the last word. To be sure, statistical validity is a necessary criterion, but it is by no means a sufficient one.
Most of the state-level standardized tests given to students these days are poorly constructed. Contractors don’t want to spend the money to develop carefully constructed items; bureaucrats are intimidated by or enamored of the psychometrics and lack editorial and pedagogical sensitivity.
As a result, the tests are crude measures of learning and do not invite the respect of students or teachers. High-quality tests are possible, and sorely needed, but the higher-ups need to know and care enough to insist on them.
