This is the second in a series of stories about test-tampering across the nation. The Atlanta Journal-Constitution published the first installment March 25.
For this story, the newspaper tracked test-score changes over several years at 605 elementary and middle schools nationwide that won Blue Ribbon awards in 2009, 2010 or 2011. The analysis looked at average test scores for students in grades three through eight as each class moved from one grade to the next within a school.
The newspaper used statistical techniques, including standard deviation and regression, to look for extreme gains or drops in scores that could signal test-tampering. The most extreme gains involved score changes that exceeded three standard deviations. The newspaper found that 27 Blue Ribbon schools over the three years had at least one score change at this level.
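A rough sketch of that screen, written in Python, might look like the following. The data file, column names and the grouping used to standardize the changes are illustrative assumptions, not the newspaper's actual code.

```python
import pandas as pd

# A "cohort" here is a grade of students in a school followed from one year to
# the next; the file and column names below are placeholders for illustration.
scores = pd.read_csv("cohort_scores.csv")

# Year-over-year change in the cohort's average score as it moves up a grade
scores["change"] = scores["avg_score"] - scores["prior_avg_score"]

# Standardize each change against all changes for the same subject and year,
# then flag gains or drops of more than three standard deviations
grp = scores.groupby(["subject", "year"])["change"]
scores["z"] = (scores["change"] - grp.transform("mean")) / grp.transform("std")

extreme = scores[scores["z"].abs() > 3]
print(extreme[["school", "grade", "year", "change", "z"]])
```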
The newspaper also looked at the pattern of unusual score gains and drops in schools before and after they won the award. For this, the AJC examined scores using a linear regression model, weighted by the number of students in the class, and compared the average score for a class with the score the model predicted from the previous year's average. It then calculated a p-value, an estimated probability that such a difference would occur by chance, using standardized residuals and the Student's t probability distribution, which adjusts the probability upward for classes with fewer students.
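One plausible reading of that procedure, sketched in Python with the statsmodels and scipy libraries, is below. The function name, the inputs and the choice to tie the t distribution's degrees of freedom to class size are assumptions made for illustration, not a reconstruction of the newspaper's code.

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

def score_change_pvalues(prior_avg, current_avg, n_students):
    """Weighted regression of this year's class average on last year's,
    followed by a two-sided p-value for each standardized residual."""
    X = sm.add_constant(np.asarray(prior_avg, dtype=float))
    y = np.asarray(current_avg, dtype=float)
    w = np.asarray(n_students, dtype=float)

    # Linear regression weighted by the number of students in each class
    model = sm.WLS(y, X, weights=w).fit()

    # Standardized (internally studentized) residuals
    std_resid = model.get_influence().resid_studentized_internal

    # t distribution: fewer students means fewer degrees of freedom, which
    # pushes the probability upward for small classes (an assumed reading
    # of the description above)
    df = np.maximum(w - 1, 1)
    p_values = 2 * stats.t.sf(np.abs(std_resid), df)
    return std_resid, p_values
```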
Classes with scores rising or dropping with a probability of less than 0.05 were flagged as unusual. The AJC found that overall, the number of unusual increases in scores among Blue Ribbon winners grew sharply in the years leading up to the award, leveled off, then dropped after the award. At the same time, the number of unusual score declines rose after schools won their awards.
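The before-and-after pattern could then be tabulated from those flags roughly as follows; the data frame and its columns are hypothetical stand-ins for the newspaper's data.

```python
import numpy as np
import pandas as pd

def summarize_flags(flags: pd.DataFrame) -> pd.DataFrame:
    """Count unusual gains and drops by year relative to the award.
    `flags` is assumed to have one row per class-year with columns
    years_from_award, std_resid and p_value."""
    unusual = flags[flags["p_value"] < 0.05].copy()
    unusual["direction"] = np.where(unusual["std_resid"] > 0, "gain", "drop")
    return (unusual.groupby(["years_from_award", "direction"])
                   .size()
                   .unstack(fill_value=0))
```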
Some grades and schools were excluded from the analysis because of small class sizes, rezonings or enrollment changes of greater than 25 percent. The newspaper did not have individual student test data. The analysis used approximate cohorts of students based on test-score averages across grades for each school.
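Those exclusion rules might be expressed as a simple filter like the sketch below. Only the 25 percent enrollment threshold comes from the methodology; the minimum class size and the field names are placeholders.

```python
MIN_CLASS_SIZE = 10           # hypothetical cutoff for "small class sizes"
MAX_ENROLLMENT_CHANGE = 0.25  # the 25 percent threshold stated above

def eligible(cohort: dict) -> bool:
    """Return True if a school-grade cohort stays in the analysis."""
    if cohort["n_students"] < MIN_CLASS_SIZE:
        return False
    change = abs(cohort["n_students"] - cohort["prior_n_students"]) / cohort["prior_n_students"]
    if change > MAX_ENROLLMENT_CHANGE:
        return False
    if cohort.get("rezoned", False):
        return False
    return True
```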
Experts say there are few reasons that scores would shift dramatically for entire grades of students, which in this analysis typically involved dozens of students in multiple classrooms. Research has not shown that instruction can achieve such a feat. To rule out other explanations in the 27 schools highlighted, such as dramatic changes in the makeup of schools, the newspaper reviewed each school's Blue Ribbon application and contacted school officials. The newspaper also reviewed these schools' data extensively to look for other indications of unusual score shifts that could be due to test-tampering.
When the newspaper published its first report, some school district officials complained that classes were being flagged because of high student mobility; essentially, that a change in students from year to year was skewing results. The AJC conducted additional analysis and found no relationship between mobility and unusual scores. In any case, most of the 27 Blue Ribbon winners highlighted in the story reported low levels of mobility.
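A follow-up check of that kind could be as simple as a correlation test like this sketch, in which the per-school inputs are hypothetical.

```python
from scipy import stats

def mobility_check(mobility_rate, n_flagged_classes):
    """Pearson correlation between each school's student-mobility rate and its
    count of flagged classes; a correlation near zero, with a large p-value,
    would suggest mobility is not driving the flags."""
    r, p = stats.pearsonr(mobility_rate, n_flagged_classes)
    return r, p
```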
Statistics alone don’t prove cheating. As was the case for the AJC’s original story in this series, experts said statistical analyses of test scores are only screenings that should prompt a deeper look by local officials.