Fairness of new teacher evaluation system in question

Teachers in high-poverty schools more likely to get lowest scores

Digging deep

This summer, the Georgia Department of Education released school-by-school results for 2013 from its measurement of student growth. That offers a preview of how the educators will fare when the student data is used to evaluate their effectiveness.

The Atlanta Journal-Constitution decided to compare the school results to the percentage of students at each school eligible for free or reduced-price lunches that year, a commonly used gauge of parental income. We focused on elementary and middle schools, where there’s a wider range of students receiving such benefits. The analysis showed that the most impoverished schools rarely got top growth scores.

That prompted us to dig deeper into research the state had already done to learn more about how teachers were faring under its approach. State research showed that teacher growth scores also were related to the economic status of students.

Finally, to get a sense of how teachers might have been rated under a different evaluation system, the AJC compared the state’s results for primary schools in Atlanta to the results from Atlanta Public Schools’ statistical model, developed by the Value-Added Research Center at the University of Wisconsin. That model takes students’ economic status directly into account in estimating a school’s or educator’s effectiveness.


How growth scores are calculated

The state calculates a teacher’s growth score by averaging the growth scores of his or her students, known as “student growth percentiles,” or SGPs.

SGPs are calculated by a sophisticated formula that compares each student’s test performance in a single subject with that of other students who scored similarly on standardized tests in that subject in the recent past. For example, a growth score of 75 for a fifth-grade reading student who scored 750 in third grade and 790 in fourth grade essentially means that the student outperformed 75 percent of the students who historically had similar scores.

Those student scores are then averaged to get a mean student growth percentile, or mean growth percentile, for their teacher, while a median growth percentile of all students is calculated for their school.
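The arithmetic described above can be sketched in a few lines of Python. This is a simplified illustration only, not the state’s formula: it assumes a student’s “peers” have already been identified, and it ranks the student against them directly. The function names and all of the scores below are hypothetical.

```python
from statistics import mean, median


def simple_growth_percentile(current_score, peer_current_scores):
    """Percent of peers (students with similar past scores) this student outscored.

    Simplified stand-in for a student growth percentile; the state's
    actual formula is more sophisticated than a direct ranking.
    """
    if not peer_current_scores:
        raise ValueError("need at least one peer score")
    outscored = sum(1 for score in peer_current_scores if score < current_score)
    return 100 * outscored / len(peer_current_scores)


def teacher_growth_score(student_sgps):
    """Mean growth percentile: the average of a teacher's student SGPs."""
    return mean(student_sgps)


def school_growth_score(student_sgps):
    """Median growth percentile across all of a school's students."""
    return median(student_sgps)


# Hypothetical fifth-grade scores for eight students whose third- and
# fourth-grade scores were similar to our example student's.
peers = [780, 790, 795, 800, 805, 810, 820, 830]
student_sgp = simple_growth_percentile(815, peers)

# Hypothetical SGPs for one classroom and for the whole school.
classroom_sgps = [75, 40, 62, 55]
school_sgps = [75, 40, 62, 55, 30, 90]

print("Student growth percentile:", student_sgp)
print("Teacher mean growth percentile:", teacher_growth_score(classroom_sgps))
print("School median growth percentile:", school_growth_score(school_sgps))
```

In this made-up example the student outscored six of eight peers, which works out to a growth percentile of 75. Nothing in the calculation looks at family income; only past and current test scores enter it.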

Teachers working with lower-income students could be more likely than their peers to be denied pay raises or to lose their jobs under the new teacher evaluation system the state is implementing, an Atlanta Journal-Constitution examination found.

That finding could fuel criticism of a system intended to provide an even playing field for rating teachers all across Georgia. It is supposed to accomplish that by relying in large part on a mathematical formula that scores teachers based on how their students perform on standardized tests from year to year.

Georgia Department of Education officials said they are aware that their “growth model” tends to result in lower scores for teachers in higher-poverty schools. But they don’t know yet whether that is because their approach puts those teachers at a disadvantage or because the state’s lower-income students actually may get less-effective teachers.

That uncertainty is a real concern for education officials as well as for teachers, whose growth scores are available to schools and districts to inform personnel decisions, such as merit pay.

In many districts across metro Atlanta that participated in the Obama administration’s Race to the Top program, teachers will receive bonuses based on the new evaluations. And in Fulton County, district leaders are looking to offer $20,000 stipends to teachers with high growth scores to work in low-performing schools.

“It really scares the hell out of me,” said Daniel Sobczak, an economics teacher at Southwest DeKalb High School and a district leader in the Georgia Association of Educators, a teacher advocacy group. “This is my livelihood, it’s what I am going to do for the next 20 years, and it could get caught up in this system not working.”

Some experts who examined the type of model the state is using as part of its Teacher Keys Effectiveness System also are worried. They say lower ratings could give teachers a powerful incentive to avoid working with the students who need high-quality teachers the most.

“A big concern for me is making it harder for teachers in tough circumstances to show up as doing well,” said Cory Koedel, an economist at the University of Missouri. “I think teachers in those circumstances already face a lot of challenges, and I hate to do something else that stacks the deck against them.”

The state also produces school growth scores, which the AJC analyzed to preview how the system would affect teachers, whose scores are not made public. That analysis showed that the most impoverished elementary and middle schools, measured by the percentage of students eligible for free or reduced-price lunches, rarely got growth scores among the top 10 percent of schools statewide. On the flip side, the most affluent schools rarely fell on the lowest rung.

The pattern also shows up in teacher growth scores, according to an August 2013 DOE research report. It showed that teachers working in the highest-poverty classrooms averaged significantly lower marks than their counterparts in the lowest-poverty classrooms.

The state has made some changes since then that should lessen the relationship between student poverty and teacher growth scores, said Allison Timberlake, program manager of Georgia’s growth model.

For now, though, the state has decided not to adopt measures that could drastically lower or even eliminate the poverty link, as some other states and school districts have done. DOE is conducting further research to determine if additional changes will be made.

“We’re working with the best of the best in the field,” said Melissa Fincher, the state DOE’s deputy superintendent of assessment and accountability. “We’re trying to be very cautious and to proceed very carefully so that decisions that are made and ultimately, at the end of the day, when a teacher gets a classification, that’s not arbitrary.”

Lower-income students historically tend to score lower on standardized tests. Research has shown that using the growth model, rather than test scores themselves, can go a long way to putting teachers on the same footing in performance evaluations, whether their students are rich or poor, English proficient or still learning the language.

But the model does not directly take into account any of those student factors or information about students’ classmates. Such things are outside of a teacher’s control but may affect students’ growth on standardized tests. Many researchers with whom the AJC spoke favored accounting for at least some of those factors when evaluating teacher effectiveness.

Yet doing so could risk blinding policy makers to the possibility that more effective teachers may actually be in higher-income schools, said Derek Briggs, a professor at the University of Colorado who is advising the state on its evaluation system.

“If [more-affluent schools] are the ones that have the best teachers, I don’t want to hide that,” Briggs said. “I want people to see that, in fact, there are real inequities in the quality of teachers that students get as a function of where they happen to be in school.”

The growth scores from the model are used only for teachers in subjects where students take standardized exams, covering about 30 percent of all teachers. For those educators, the score makes up about half of their evaluation rating. The remainder is based on classroom observations by principals or other administrators.

School districts are supposed to use growth scores to evaluate teachers starting next school year, but the state has asked the federal government for a one-year delay, a request that DOE spokesman Matt Cardoza said is not related to the growth model.

State law stipulates that teachers won’t receive scheduled pay raises if they are found to be ineffective one year or in need of development two years in a row on their evaluations.

Also, a 2012 state law requires performance evaluations to be the primary consideration if a district downsizes its teaching staff.

Fincher said that it would be extremely rare for a teacher to be dismissed on the basis of his or her growth score alone. “Usually, there’s other factors that are taken into consideration when such a drastic decision is made,” she said.

Effects of poverty

Across the country, other states are also implementing systems that tie student test results to teacher pay and retention. That’s largely because the Race to the Top program linked large federal grants during the recession to states’ willingness to adopt such systems.

Political and business leaders had pushed for years to link test scores to teacher evaluation systems, saying they would provide an objective way to reward great teachers and root out poor performers.

To those who research and build statistical models to evaluate teachers, however, objectivity is a thorny issue. It’s difficult to tease out the impact that a teacher has on a student from other things going on within a school or within a child’s life. Not everything in a classroom or a community can be quantified and lined up against a test score.

Georgia teachers and administrators from across the state who spoke with the AJC tended to agree.

Many expressed frustration with the focus that the new evaluation system places upon testing. They said it narrows the curriculum, leaving little room to engage students’ imaginations in the material.

Several educators also cited unique hurdles disadvantaged students face — everything from lack of computer access, to parents working multiple jobs, to a narrowed vocabulary that can result from missed experiences like dining regularly in a sit-down restaurant or going to a museum. Other teachers spoke of harsher realities.

“We have kids who are coming in with other challenges outside of the classroom that you would never imagine a small child would have to deal with,” said Courtney Spraggins, a math teacher at Atlanta’s Cascade Elementary, where nearly every child is eligible for a subsidized lunch. “…A lot of them see things in their neighborhoods that I as an adult have not seen and experienced.”

“We have to take all those things into account,” said Spraggins, who also noted that seeing her students’ growth data helped her know where to focus instruction.

One North Georgia high school teacher, who like many teachers spoke to the AJC on the condition of anonymity, said she tries to keep her ears open for what’s going on in her students’ lives. “You hear things like, ‘Momma didn’t come home last night, so I had to take care of my little brother.’” She recalled one student who cried when she gave him a pencil and told her that he didn’t get food unless he came to school, and that he was scared.

Sobczak, the DeKalb economics teacher, said he wasn’t surprised to hear that teachers working with lower-income students were receiving lower growth scores.

“It’s very difficult for me to teach a student who comes to school hungry,” Sobczak said. “They are not thinking, ‘OK, do I answer a, b, c or d on this test.’ They are thinking, ‘I’m hungry.’”

Damian Betebenner, a researcher at the National Center for the Improvement of Educational Assessment who developed Georgia’s growth model and is helping to implement it, acknowledges the difficulty of creating a single system for evaluating teachers working with widely different students.

“I think that when you have a profession as diverse as teaching, that it is extremely difficult, if not impossible, to come up with a singularly uniform evaluation system that kind of applies to everybody that is within that profession,” Betebenner said in an interview earlier this year.

“I’m not sure that legislation in the states recognizes just how nuanced evaluations should be, given how diverse kids are and given how diverse the working situations that teachers encounter are,” he said.

Nonetheless, Betebenner said that the scores from his model provide a piece of evidence that can be used to decide if a teacher should stay or go.

Stamping teachers’ foreheads?

Atlanta Public Schools is among the districts nationwide employing statistical models that drastically lower the relationship between student poverty and teacher effectiveness ratings. Like the state model, the APS model uses changes in student test scores from one year to the next. Unlike the state model, it also takes into account student factors such as free or reduced-price lunch status, disabilities and English proficiency.

Many high-poverty schools rank higher among their APS peers under the district’s model than they do under the state’s, an AJC comparison shows. It’s a result that experts say should translate to teachers evaluated under the different systems as well.

For example, nearly all students at Cascade Elementary and Fickett Elementary are eligible for subsidized lunches. Those schools scored near the middle of the pack for the district under the state’s model. But under APS’ model, they came out much closer to the top.

On the other side of the city, Warren T. Jackson Elementary, which has the lowest proportion of lower-income students in APS, scored 8th highest out of 74 primary schools under the state’s model. It dropped to 47th under the district’s model. APS uses its model, developed by the Value-Added Research Center, for identifying best practices; it will have to use the state’s system for teacher evaluations.

Betebenner worries that a model like APS’ statistically sweeps the problem—disadvantaged kids having less access to educational opportunities—under the rug. His system, he said, shines a bright light on the issue.

But Betebenner and Briggs both expressed concerns about using data from any model to mechanically categorize teachers as effective or ineffective for high-stakes purposes.

“Data should be a flashlight, not a hammer,” Betebenner said.

Both largely agreed with a sentiment expressed by Harvard researcher Andrew Ho, who said he is more interested in how growth data can be used to improve student learning than in “stamping good or bad on every teacher’s forehead as if it is some permanent mark.”

“That I think is more worrisome and deserves more constructive discussion” than which model is used, he said.

Ho said simplistic labels can detract from the professionalism of teachers, a notion that resonated with one south Georgia high school teacher concerned about the emphasis the new system places on determining teacher quality from growth scores. “A lot of times, all you are doing is breaking it down like kids are widgets that are being pushed out of a factory.”

“I’d like to spend more time doing what I love,” he said. “Which is teaching kids.”