Test-fraud detectives drawing scrutiny

When Caveon Test Security began its statistical detective work in Atlanta, it promised swift, enlightening results.

“I’m good. I’m fast,” Caveon’s then-President John Fremer told the business and civic leaders who hired the private, for-profit consulting firm in March 2010 to look into cheating in Atlanta schools.

Caveon delivered quickly, but its work failed spectacularly. Gov. Sonny Perdue rejected the firm’s findings last year and publicly accused it of seeking to “confine and constrain the damage” from rampant cheating. This July, state investigators who conducted a deep, 10-month examination of Atlanta schools said Caveon had vastly understated the extent of test-tampering.

Now, Caveon is defending its work in Washington, D.C., too. Authorities are re-investigating the district’s 2009 test scores despite a Caveon examination of eight schools that proclaimed no proof of cheating.

For years, the small Utah firm has been the go-to contractor for test-fraud detection. But Caveon’s work in Atlanta and D.C. suggests the company is gaining a reputation more for clearing schools than catching cheaters, an Atlanta Journal-Constitution investigation has found.

Caveon officials reject the suggestion they would minimize cheating to please clients.

“Our results are very, very solid,” Fremer said recently.

Caveon’s secrecy about its methods has elicited complaints, too.

The firm has staunchly resisted disclosing the inner workings of its test-data analyses, making it impossible for others in the field to validate or critique the results. Because Caveon’s approach is proprietary, public agencies that hire it can’t fully grasp how it reaches conclusions about who is or isn’t cheating.

The questions raised by Caveon’s work underscore the pitfalls that await school districts and state education departments that seek to combat fraud in their standardized test scores.

Most school systems and states perform few checks on the scores, despite a rash of cheating scandals nationwide and the rising importance of test scores as a measure of student and teacher performance.

Agencies that want to do validity reviews find that standard approaches are scarce, the work is highly technical and the federal government, which requires the tests, has little advice.

Exhaustive investigations involving complex data analysis and interviews with educators, such as the state’s work in Atlanta, are expensive and time-consuming, often prohibitively so in all but the most egregious cases. Less thorough reviews risk accusing the innocent or missing signs of wrongdoing.

And, with controversies simmering in multiple states, public pressure to curtail cheating has risen sharply over the past year.

Secretive security

Caveon (pronounced KAV-ee-on) has been the most prominent of a handful of private firms, nonprofits and testing companies that offer to crunch data and review test-security policies for a fee. Since its founding in 2003, the firm has worked with at least 10 districts or states concerned about the safety of their tests.

Secrecy has been a hallmark of the Caveon approach. In materials it provides clients, the firm boasts of a “patent-pending” method and “proprietary detection services.”

Caveon officials describe their statistical approach generally, but will not disclose the equations and algorithms that reveal exactly what the company is doing with clients’ data.

“We would not want to give somebody else, except under very special circumstances, the exact recipe to do these analyses,” said Fremer, now president of a division of Caveon.

That attitude has given Caveon a competitive edge and also invited criticism.

Texas education officials grew frustrated with Caveon’s lack of transparency when they hired the firm to investigate suspicious test scores in 2005 following reports in the Dallas Morning News of unbelievable gains at some schools.

“That’s one of the problems we had when we used them,” spokeswoman Suzanne Marchman wrote in an email. “Their methodology was proprietary so we were unable to replicate the findings.”

As a result, Marchman said, the state couldn’t thoroughly explain to schools and the public why schools were under suspicion.

Testing expert Gregory Cizek said another drawback to Caveon’s secrecy is that its methods have not been subject to scrutiny by peers in professional journals or at conferences. Without such openness, no one knows the rate at which Caveon analyses miss real instances of cheating or wrongly identify innocent classrooms.

“We don’t have good information on their method,” said Cizek, a professor at the University of North Carolina at Chapel Hill, who has been urging the firm to be more open.

Walt Haney, another testing expert, said he, too, is concerned about the lack of peer review for Caveon’s approach.

“There could be flaws in their methodology, there could be unexamined errors in their methodology,” said Haney, a professor at Boston College. “For them to claim that their statistical methodology is proprietary, it seems to me, is a clear violation of professional standards regarding testing and evaluation.”

The company often cites a “Caveon Index” as its main measure of fraud. The Caveon Index, however, changes from place to place.

Cizek, whom Georgia’s investigators consulted during their inquiry, said it’s unclear whether such changes are wise.

“Refining your procedure is the right and good thing to do,” he said. “But if it changes idiosyncratically from instance to instance, then I don’t think that’s a good thing.”

Analyzing the analysis

The Caveon Index calculated for Atlanta brought the company a heap of trouble.

Caveon took on a tough assignment when it agreed to work for the commission appointed to examine cheating allegations at 58 Atlanta schools.

The state ordered the inquiry after finding high numbers of suspicious erasures on state tests. But district leaders, including Superintendent Beverly Hall, continued to question whether cheating occurred.

Caveon wasn’t the commission’s only option. Its leaders had expressed significant interest in a large nonprofit called the American Institutes for Research.

AIR’s last-minute withdrawal over data concerns, however, foreshadowed Caveon’s later difficulties.

AIR Chief Scientist Gary Phillips learned district leaders’ take on the allegations soon after arriving in Atlanta: Officials paraded their own charts and analysis before him, state investigators said, in an effort to convince him the erasures didn’t indicate widespread cheating.

Phillips challenged the district’s points. Afterward, Superintendent Hall pressed further, telling him that test-taking strategies used by the district could explain the erasures.

“My policy advice to her,” Phillips said recently, was “to take leadership and begin a thorough investigation and reaction to this and not treat this as an additional data question.”

As commission staff negotiated with Phillips, a Metro Chamber executive working with the group called a consultant in Washington, D.C. The consultant sent her information on Caveon, emails show. Four days later, AIR withdrew from consideration because Phillips believed he wouldn’t get the detailed, statewide data to support a deep analysis.

By the end of that week, the commission hired Caveon, which was eventually paid $150,000 for analysis and other consulting work.

Caveon pitched an approach that suggested it would likely paint a less bleak picture than the state’s erasure analysis.

The firm wanted to distinguish between classrooms with a few tests with high erasures and classrooms where pervasive erasures boosted scores, chief scientist Dennis Maynes wrote in a March 10 memo. “If we can do this,” he wrote, “we can clear a lot of classrooms and their associated schools.”

Fremer said Caveon does not aim to perform such a comprehensive search for cheats that none go undetected. Instead, the firm tries to identify the most egregious so authorities can make examples of them.

But the implications of Caveon’s memo concerned Kathleen Mathers, executive director of the Governor’s Office of Student Achievement — the agency that ordered the erasure analysis. She warned the district not to try to use data to dismiss the state’s findings: “additional data analyses should not be designed to clear classrooms and associated schools,” she wrote in a memo.

The district, commission and Caveon, however, pushed on.

The firm would later complain it did not obtain all the data it needed from the state.

Recently, however, Fremer said the firm also made a mistake in marking some key data as “optional” in a request to the state.

Mathers said her office provided what Caveon listed on a data request as essential. She said she had no idea about AIR’s interest, or its data concerns, until after it dropped out.

Index and investigation

Caveon devised a Caveon Index for Atlanta schools using the data it had.

But, in a move that would draw sharp criticism, Caveon passed on using statewide statistics as points of comparison for Atlanta schools.

Instead, the firm compared schools only to other Atlanta schools on the list of 58 identified as suspect. Because that group had so many erasures, some problem schools looked less severe than when they were compared to schools statewide.
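The effect of the comparison pool can be sketched with a few lines of arithmetic. The example below is not Caveon’s index, which the firm has not disclosed. It is a minimal illustration that uses an ordinary z-score as a stand-in, and every number in it is invented. It shows only how the same hypothetical school can look extreme against a typical statewide pool and unremarkable against a pool made up of other flagged schools.

```python
from statistics import mean, pstdev


def z_score(value, reference):
    """Standard deviations by which `value` differs from the mean of `reference`."""
    return (value - mean(reference)) / pstdev(reference)


# Hypothetical average wrong-to-right erasures per student.
# All figures are invented for illustration; none come from Georgia data.
statewide_pool = [1.0, 1.2, 0.8, 1.1, 0.9, 1.3, 1.0, 0.7, 1.2, 0.8]
flagged_pool = [4.0, 5.5, 6.0, 4.5, 7.0, 5.0, 6.5, 5.5]

school = 5.0  # a hypothetical school drawn from the flagged list

z_statewide = z_score(school, statewide_pool)
z_flagged = z_score(school, flagged_pool)

print(f"vs. statewide pool: z = {z_statewide:.1f}")
print(f"vs. flagged pool:   z = {z_flagged:.1f}")
```

Against the invented statewide pool the school sits far above the mean. Against the invented flagged pool it falls slightly below the mean, because every school in that pool already has unusually high erasure counts.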

Company officials later told investigators they knew their approach was limited, but felt it was appropriate because the commission wanted the “worst of the worst.” Fremer said a bigger comparison pool is better, but he disagrees with experts who say it mattered in the end.

“Statewide was never on the table as something that would be available to us,” he said. “I don’t even remember a discussion about it. People tell us what data we have to work with and we do the best job we can.”

Caveon concluded 12 schools were most problematic. They were the same 12 the state had already identified as the worst.

In public appearances, however, Hall touted Caveon’s analysis as proof that most of the 58 schools had no cheating problem.

“There are 33 schools that they cleared, and I think that’s important to be said,” she told radio host Denis O’Hayer on WABE-FM (90.1) in 2010.

Days later, a frustrated Perdue castigated Caveon and the district for what he said was a flawed investigation and appointed investigators with subpoena power to try again.

The investigators traveled to Utah to talk to Caveon’s Maynes.

Bob Wilson, one of the three state investigators, said that during the conversation Maynes was unable to walk his visitors through the steps he took to examine Atlanta’s test results.

“In the end, he was not able to tell us how he did it,” Wilson said. “All he did was put [the numbers] in a pot, stir it up and come back with the same thing that went in.”

Fremer said the problem was not Maynes, but investigators’ lack of knowledge of the complex field of testing.

He said no one from the Atlanta district pushed Caveon to soften its findings. “There wasn’t any attempt to sweeten it,” he said.

In July, the state investigators reported they had confirmed test-tampering in 44 of 56 schools they examined, involving nearly 200 educators. The investigators had sharp words for Caveon.

“Because of the manner by which Caveon calculated its index, and the contaminated statistical universe it used,” their report said, “many schools for which there was strong statistical evidence of cheating were not flagged.”

The findings prompted criticism from within the testing field, too.

“I think it was a huge mistake what they did in Atlanta,” Cizek said. “To sort of stake your work and your reputation and your findings on what you know are not the best data to be using I think is not a good idea.”

In contrast, Cizek praised an investigation in Pennsylvania by the Minnesota-based Data Recognition Corp.

DRC provided detailed equations and explanations in its report on dozens of schools it flagged for possible cheating in 2009. Such information allows other researchers to replicate the analysis to test its integrity.

Other clients

That sort of detail is hard to come by among Caveon’s public-sector clients.

Florida did not provide the AJC with documents on Caveon’s methods and findings despite requests over several weeks. Caveon identified about 7,000 tests as suspicious this year in Florida. Officials said their investigation is ongoing.

The firm’s most recent report to Mississippi, where it has crunched testing data since 2006, contains no equations and only broad descriptions of the analysis used. A report Caveon provided Minnesota in 2009 is similarly vague.

Colorado education officials said Caveon looked into suspicious scores at a charter school and found no cheating, but did see some problems with test administration. In general, Assistant Education Commissioner Jo O’Brien said, Caveon seemed “thorough” and “neutral,” yet “cautious” about jumping to conclusions about cheating.

In Washington, D.C., Caveon didn’t analyze scores or erasures. Instead, it looked individually at schools’ scores and interviewed potential witnesses at eight schools where the district had suspicions.

“Many of the gains and high erasures are plausible, given the evaluation process by Caveon,” the firm wrote in its report on one school that later became the focus of suspicions of cheating because of high erasures and complaints.

After a USA Today report this year raised questions about erasures and the earlier inquiries, however, the district’s chancellor asked the local inspector general to re-investigate the eight schools. The federal inspector general office is also investigating, school officials have said.

Fremer said he had “no reason to believe that our investigations were judged to be inadequate” in D.C.

On Thursday, after years of mostly silence on how states and districts should protect their tests, the U.S. Department of Education convened a closed-door meeting of experts to seek advice on policies to ensure the tests’ integrity.

As for Caveon’s methodology, Fremer said the company will likely publish more about its work and participate in professional conferences.

It won’t, however, reveal its proprietary algorithms.

He said Caveon’s clients, such as state education departments, don’t need more detail.

“One of John Fremer’s theories about life — if you have a graph with more than one line on it, the audience probably doesn’t understand it,” he said. “Equations are a joke, people don’t understand equations.”