After The Atlanta Journal-Constitution’s analysis of test scores led to the state investigation and 2011 findings of widespread cheating in Atlanta schools, a national testing expert suggested we could do the same thing on a nationwide scale. We were intrigued.

The federal No Child Left Behind Act requires each state to give a statewide standardized test to all students in grades 3 through 8 to measure performance in reading and math. In Georgia, that is the Criterion-Referenced Competency Test.

A team of three reporters and two database specialists spent five months collecting databases of standardized test scores in those grades for 69,000 schools in 14,743 districts in 49 states. (The 50th, Nebraska, didn’t have usable data because it didn’t give a statewide standardized test until last year.)

The law requires school districts to give parents an annual “report card” on school performance, and all states have laws requiring disclosure of public information. We thought that would expedite data collection.

Some states, including Texas and California, post the data we needed online. Most states sent data within days or even hours. A few were more challenging. We called state education departments and made formal open records requests. Some states required months of negotiating and multiple requests before they sent data.

New Mexico said the request was “burdensome” and took two months to send data.

Nevada called it an “annoyance” and took almost three months. When a reporter told an assistant attorney general that Nevada was the only state that hadn’t provided data, the attorney quoted TV’s “Seinfeld”: “Yada yada yada.”

Alabama education officials insisted they had posted the scores online. When they realized that was untrue, they offered to provide the data for $3,200, but finally sent it without charge two months after the original request. In the end, no state charged for the data.

District of Columbia education officials went weeks without answering our daily phone calls, and emails describing the requested data were repeatedly shuffled to other employees. After three months, officials sent incomplete data. The district is not in our analysis because of methodology issues (see “Analysis limitations” below).

With the data in hand, we used a method similar to the analysis that found suspicious test scores in Atlanta. It compares test scores achieved by a “cohort” of students: when a third-grade class in a school moves on to fourth grade, the group’s membership stays largely the same, so its test scores shouldn’t vary much from one year to the next.

By charting large changes in a cohort’s scores, up or down, an analyst can identify test results that are highly unlikely to happen by chance. When scores jump that much, it suggests some intervention, such as cheating, altered the expected results.
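
What follows is a minimal sketch, in Python, of how such a screen could work. The file name, column names and the three-standard-deviation cutoff are illustrative assumptions, not the actual data or thresholds we used; the point is simply to pair each class with the same cohort a year later and flag year-to-year changes that sit far out in the statewide distribution.

```python
import pandas as pd

# Hypothetical input: one row per school, grade and year with that
# class's mean scale score. File and column names are illustrative.
scores = pd.read_csv("state_scores.csv")
# columns: district, school, grade, year, mean_score

# Pair each class with the same cohort one grade and one year later:
# grade g in year y should hold roughly the same students as
# grade g+1 in year y+1.
later = scores.copy()
later["grade"] -= 1
later["year"] -= 1
cohorts = scores.merge(later, on=["district", "school", "grade", "year"],
                       suffixes=("_before", "_after"))
cohorts["change"] = cohorts["mean_score_after"] - cohorts["mean_score_before"]

# Standardize each change against all cohorts statewide making the same
# grade-to-grade move in the same year, then flag the extreme tails.
grp = cohorts.groupby(["grade", "year"])["change"]
cohorts["z"] = grp.transform(lambda s: (s - s.mean()) / s.std())
cohorts["flagged"] = cohorts["z"].abs() > 3  # illustrative cutoff
```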

Scores that drop are meaningful, too: a class’s scores can rise or fall dramatically because one teacher cheats and the next one does not, or vice versa.

In addition, patterns in test scores may show dramatic declines, as they did in Atlanta, after cheating is exposed or investigated. In either case, scores can drop because the cheating stops.

Then we did another level of analysis, gauging the likelihood that abnormal score changes would occur in “clusters” of grades within one district. We calculated the probability of a district posting huge changes in test scores in many classes at once, compared with the rate statewide. In some cases, the probability was less than one in 1 trillion.
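
To illustrate the idea: if suspicious classes occurred independently at the statewide rate, the chance that one district would produce so many of them can be estimated with a binomial tail probability. The sketch below makes that simplifying independence assumption and uses made-up numbers; it is an illustration of the concept, not the exact calculation we performed.

```python
from scipy.stats import binom

def cluster_probability(flagged: int, classes: int, statewide_rate: float) -> float:
    """Chance of at least `flagged` suspicious classes out of `classes`
    if flags occurred independently at the statewide rate."""
    # P(X >= flagged) under a binomial(classes, statewide_rate) model
    return binom.sf(flagged - 1, classes, statewide_rate)

# Made-up example: 40 of a district's 200 classes flagged, when only
# 2 percent of classes are flagged statewide.
print(cluster_probability(40, 200, 0.02))  # roughly 1e-27
```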

We showed our methodology and results to independent experts on testing and data analysis to confirm our findings.

Our analysis identified districts nationwide with clusters of suspicious score changes. Our team visited schools and parents in a half-dozen urban districts on that list, while presenting our findings by phone and email to officials in those districts for response.

We talked to executives and testing specialists in those districts and states. When district officials raised concerns we couldn’t immediately answer, we went back to our data to check our results. We also talked to national experts and decision makers on testing and education policy.