Is going back to the drawing board on testing the right answer?

The state Board of Education recently approved three five-year pilots of new tests, one or more of which could eventually replace the Georgia Milestones.

When studies showed Georgia set the lowest bar in the country for student proficiency in math and reading, the state abandoned the Criterion-Referenced Competency Test and introduced the more demanding Milestones, given for the first time in 2015.

Three years later, it’s the Milestones tests now under the gun. The state Board of Education recently approved three five-year pilots of new tests, one or more of which could eventually replace the Milestones.

Why?

That’s a complicated question, and one that involves some lingo from psychometrics, the science of measuring mental abilities and processes. Students take two kinds of tests. Formative or diagnostic tests, often teacher-created, provide ongoing feedback on what students know and don’t know and enable teachers to modify instruction. They include the weekly quizzes or monthly unit tests that kids take.

Summative tests, such as the Georgia Milestones, measure outcomes and answer whether students mastered what they were supposed to learn in a stark pass/fail judgment. Summative tests typically come at the end of the year, too late to drive instructional changes, and are thus often described as autopsies.

To harness testing to improve instruction, all three of the Georgia pilots will rely on formative exams whose results will then be rolled into a summative score comparable to the Milestones, a task experts say will be daunting.

"No state has successfully taken their formative test scores and rolled them up in a summative test score that is psychometrically sound," said Amber Northern, senior vice president for research at the Thomas B. Fordham Institute.

The conservative-leaning Fordham Institute is worried this zeal for customized alternatives to summative testing will undermine comparability, equity and, ultimately, accountability. Standardized summative exams allow comparisons across schools and identify which kids are excelling and which are stumbling.

But teachers and school districts maintain low standardized scores have been used to attack and label them as failures rather than expose the more intractable factors influencing student achievement – family backgrounds and incomes.

In a statement on what he deems the outsized role of testing in grading Georgia schools and districts, state School Superintendent Richard Woods said, “Most Georgians agree that our students – and schools – are more than a score. Yet we still have a metric that gives roughly 80 percent of its weight to high-stakes tests, and only the remaining 20-some percent to opportunities like fine arts, world languages, physical education, career pathways, dual enrollment, AP/IB, work-based learning, apprenticeships, attendance and graduation rates.”

Through its Innovative Assessment Demonstration program, the U.S. Department of Education is allowing up to 10 states to develop and try out new assessments on a small scale, with the possibility of expanding the approaches statewide. Georgia is seeking federal approval of the three testing pilots, which will involve 21 districts, including Cobb, Marietta City, Clayton and Fayette.

“The danger here is that you dial back expectations,” said Northern. “If you are going to have three different tests in Georgia, you have to make sure each is equally rigorous and equally valid. Because districts will figure out which is the easiest test.”

Dana Rickman, policy and research director for the Georgia Partnership for Excellence in Education, says a lot of unknowns still surround the pilots. Her hope is the pilots spark "a larger conversation on how we are using assessments, what are they for and how they can improve student outcomes."

Northern questions why states including Georgia feel compelled to reinvent the wheel when they can draw on the PARCC and Smarter Balanced assessments, which were designed to reflect the Common Core curriculum and which Fordham evaluated as the best at allowing both at-risk and high-achieving students to demonstrate what they know and can do.

“This impulse to have better tests is right on target,” said Northern. “But it drives me nuts when states act like they don’t have a model when Smarter Balanced and PARCC have shown what it looks like to have better tests. They have performance test components that they worked on for years to get right. States should take advantage of these items already developed and vetted across the nation. These tests blew even the best state tests out of the water.”