Professional learning programs — designed right — can make a positive difference without requiring much time from teachers.

A decade and a half ago, there was a national push to reform the way educators are evaluated. Policy makers believed that better performance information and feedback would help educators learn on the job. The enhanced performance information also would allow better human resource decisions — decisions about tenure, dismissal, or bonuses, for example. The end result would be better instruction and improved student achievement.

Did policy makers place their faith in a good strategy? What was working and what wasn’t? To address these questions, researchers launched various studies. Among those was a randomized controlled trial we led — one of the largest such trials we have ever conducted. We published our study findings in a peer-reviewed journal article and two technical reports (Song et al., 2021; Garet et al., 2017; Wayne et al., 2016). As we explain here, what began for us as a study of educator evaluation reform became a study of teacher professional learning. Its findings exemplify one of the main lessons of the past decade about what it takes for teacher professional learning to be effective.

A study to inform educator evaluation

We launched the study in 2012, during the first Barack Obama administration, under a contract with the U.S. Department of Education. At the time, several new federal policies and programs shared the premise that the teacher evaluation system in the U.S. was broken. The general belief was that administrators observed teachers too infrequently, the ratings they gave were inflated and not objective, and the ratings didn’t vary enough across teachers. New federal policies incentivized states and districts to make observations more frequent and objective and to rate teachers in part based on data showing their contributions to student achievement. The policy efforts also called for simultaneous reform of the evaluation of school principals.

To inform these efforts, we designed a study of the impact of providing high-quality performance feedback to educators (i.e., teachers and principals) as a supplement to what the districts already had been providing. The performance information was to be shared with educators and their supervisors, and the districts could use it to inform human resource decisions. The participating schools were to implement three evaluation system components over two years. In Figure 1, each of these components appears as a row (e.g., feedback on teacher classroom practice), and the cells show when each instance of feedback occurred for that component (1st instance, 2nd instance, etc.).

The first component — teacher classroom practice — was the most intensive and aimed directly at teaching and student learning. Each district chose one of two rubrics for observing teachers and giving feedback: either the Classroom Assessment Scoring System (CLASS) or Charlotte Danielson’s Framework for Teaching (FfT). The rubric vendors trained the observers, and all observers had to pass a test of rating skill. Teachers received a half-day training to orient them to the aspects of classroom practice captured in the rubric.

Each teacher was then observed four times per year — once by a school administrator and three times by a study-hired local observer who had experience teaching and giving feedback. After each observation, the observer wrote a report with both ratings and narrative feedback and then met with the teacher to discuss it.

For the second component, each teacher received information on their contribution to student achievement growth in the form of value-added scores, which indicated how well the teacher’s students performed during a school year compared with similar students in the same district. The study produced value-added reports on three occasions, each time using student achievement data from previous years.
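To make the value-added idea concrete, here is a deliberately simplified sketch in Python. It is illustrative only — the simulated data, the three hypothetical teachers, and the one-predictor regression are our invention, not the study’s actual model. The core logic is as described above: predict each student’s current score from the prior year’s score, then credit each teacher with the average amount by which their students beat (or fell short of) the prediction.

```python
# Illustrative only: a radically simplified value-added calculation.
# The study's actual model was more complex; all names and numbers
# here are invented for the sake of the sketch.
import numpy as np

rng = np.random.default_rng(0)

# Simulated students: prior-year score, current-year score, teacher id
n = 300
prior = rng.normal(50, 10, n)
teacher = rng.integers(0, 3, n)            # three hypothetical teachers
true_effect = np.array([-2.0, 0.0, 2.0])   # invented "true" teacher effects
current = 5 + 0.9 * prior + true_effect[teacher] + rng.normal(0, 5, n)

# Step 1: districtwide regression predicting current scores from prior scores
X = np.column_stack([np.ones(n), prior])
beta, *_ = np.linalg.lstsq(X, current, rcond=None)
predicted = X @ beta

# Step 2: a teacher's value-added is the average amount by which their
# students out- or under-performed the prediction
residual = current - predicted
value_added = {t: residual[teacher == t].mean() for t in range(3)}
for t, va in sorted(value_added.items()):
    print(f"teacher {t}: value-added = {va:+.1f}")
```

In a real value-added model, the prediction step typically adjusts for more than prior achievement (e.g., student demographics), but the comparison logic — actual performance versus a statistical expectation based on similar students — is the same.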

Finally, the third component was principal leadership, assessed twice a year based on a validated 360-degree survey called the Vanderbilt Assessment of Leadership in Education. After each survey was administered, the principal and their supervisor met to discuss the findings summarized in an automatically generated report.

To carry out the study, we worked with a total of 127 elementary and middle schools in eight districts that had not adopted educator evaluation reforms. Half of those schools implemented all three educator performance feedback components for two years (2012-13 and 2013-14), while the other half continued with their normal practices.

A shift to professional learning

In most ways, the stage was set. During the study, we would document how schools implemented the three feedback components and how districts used the performance information. We would then see how the program affected the outcome measures: classroom practice, measured with video recordings in the spring of the second study year; student achievement, measured by state tests each spring; and principal leadership, measured with a teacher survey each spring.

What we didn’t anticipate was that none of the eight districts wanted to use the performance information for any purpose beyond educators’ professional learning. Our technical assistance staff visited the districts in summer 2012 to discuss the many ways districts could use educator performance information. The districts embraced the idea that better feedback could help educators build skills and be more effective in the classroom, but they did not feel ready to use the new performance information in making human resource decisions (e.g., about tenure or performance bonuses). That remained true across the two years.

So the study didn’t answer the question we had planned to answer: What would happen if better measures and feedback practices were used as an educator evaluation system, connected to human resource decisions? Instead, it tested the impact of what was essentially a professional learning program consisting of repeated, low-stakes, supplemental performance feedback for educators.

Positive impacts that shed new light

Even though the feedback was not tied to human resource decisions, it did have impacts. Comparing treatment and control schools, we found positive impacts on aspects of each of the three main outcomes: one of two measures of classroom practice (CLASS scores); one of two measures of student achievement (mathematics); and both measures of principal leadership (instructional leadership and teacher-principal trust).

These findings took some time to sink in. Before this project, we had dedicated several years to testing the impact of seemingly promising teacher professional learning programs that required teachers to participate for many hours. Each time, there was no statistically significant positive impact on any measure of student achievement. We knew from other studies that some programs had impacts, but to see impacts from a program that took teachers so little time was a genuine surprise.

For each teacher, the program took only about 20 hours, spread over two years. To put that into perspective, another two-year professional learning program we had recently studied took each teacher 113 hours. That program incorporated the features recommended by experts and researchers, combining summer institutes, workshops during the school year, and coaching. And yet it had no statistically significant impact on student achievement (Garet et al., 2011). So, imagine failing to crack a walnut with a sledgehammer, and then splitting it with a gentle tap.

Since publishing the findings from this study, we’ve wondered: Which of the three components caused the greatest impact on student achievement? Though feedback on classroom practice was the most intensive component and focused directly on teaching and student learning, we don’t really know whether it led to the positive impact on student achievement. The study can only shed light on the impact of the three components delivered as a package.

If we had to guess, we would attribute at least some of the impact to the feedback on classroom practice, based on the evidence we gathered about implementation. We started to see an impact on achievement during spring testing in the first year. At the time, teachers had received about three rounds of feedback on classroom practice. That alone could lead to an impact on achievement, according to another experimental study of feedback on classroom practice (Steinberg & Sartain, 2015).

Could we rule out the possibility that the feedback on principal leadership and on teachers’ contributions to student achievement made a difference? No, but principals received their first round of feedback in December or January of the first year, which did not leave much time for that feedback to translate into improved leadership and, in turn, improved instruction and student achievement by the spring. Likewise, the first feedback on teachers’ contributions to student achievement growth came between February and April of the first year, and fewer than half (39%) of the teachers who were assigned value-added scores logged in to view their reports. (For later reporting periods, we also provided hard copies.) It seems unlikely that feedback on principal leadership or on teachers’ contributions to student growth could account for the positive impact on student achievement in the first year.

What the impact evidence implies

To us, the positive difference made by providing teachers with repeated formative feedback suggests that school systems may underinvest in high-quality formative feedback, relative to other professional learning activities. In fact, it was rare for teachers in control schools to receive observation-based feedback that included a report with ratings and a written narrative. When surveyed each spring, many teachers hadn’t received any such feedback that year. The average teacher in a control school received such feedback 0.7 times in the first year and 0.2 times in the second.

The other takeaway is that a program can work even if it doesn’t take teachers a lot of time. We were not the first to discover a positive impact on student achievement from a professional learning program that took relatively little time from teachers. For example, the initial trial of MyTeachingPartner-Secondary, in which coaches repeatedly select video clips from teachers’ classrooms and prompt teachers to reflect on them, found a positive result even though the program took teachers only 20 hours spread across 13 months (Allen et al., 2011). At the time, that finding seemed hard to believe.

Today, many more studies have accumulated, and the pattern of findings in meta-analyses leads to the same conclusion. Among the programs that have been tested, there is no link between a program’s impact and the amount of time it took teachers to complete it (Garrett, Citkowicz, & Williams, 2019; Kraft, Blazar, & Hogan, 2018). That doesn’t mean program duration isn’t important. Rather, it suggests that beyond a certain point, more is not necessarily better. And intuitively, more could be worse, if time-intensive programs distract teachers from important responsibilities that affect student outcomes.

That takeaway is as important for school districts and program developers as it is for researchers. Teachers want to make the best use of their time, and sometimes well-intentioned professional learning programs get in the way. School system staff seem to understand that, especially in the aftermath of COVID. When we approach them about participating in impact studies, they tell us they want programs that support teachers without demanding too much of their time.

We are now excited to study professional learning programs that take relatively little time — what we’re calling streamlined, high-quality programs. The quality of professional learning has to be high, so that every minute is productive enough to offset the loss of time for other priorities. This strikes us as perhaps the most important focus for future studies of teacher professional learning, and it fits best with school system needs.

Small doses, positive impact

We didn’t answer our initial questions about the reform movement that took root 15 years ago, but we learned some lessons about teacher professional learning that are useful today. Specifically, we learned that repeated formative feedback on educator performance made a positive difference for both educators and students. We suspect the feedback on classroom practice caused the impact on student achievement, though we need another trial to confirm that hypothesis. If we’re right, repeated formative feedback that is of high quality (i.e., provided by trained observers using validated rubrics and including both ratings and narratives) is a better use of teachers’ time than many other professional learning activities.

The more general lesson is important, too, especially in the wake of the pandemic. There is strong interest in how to design meaningful professional learning for teachers who feel overwhelmed (e.g., Durham, 2023). And our study and others show that it’s possible for professional learning programs — designed right — to make a positive difference without requiring much time from teachers.

References

Allen, J.P., Pianta, R.C., Gregory, A., Mikami, A.Y., & Lun, J. (2011). An interaction-based approach to enhancing secondary school instruction and student achievement. Science, 333 (6045), 1034-1037.

Durham, A. (2023, March 30). Educators respond: Designing professional learning when teachers are overwhelmed. Learning Forward.

Garet, M.S., Wayne, A.J., Brown, S., Rickles, J., Song, M., & Manzeske, D. (2017). The impact of providing performance feedback to teachers and principals (NCEE 2018-4001). U.S. Department of Education, Institute of Education Sciences, National Center for Education Evaluation and Regional Assistance.

Garet, M., Wayne, A., Stancavage, F., Taylor, J., Eaton, M., Walters, K., Song, M., Brown, S., Hurlburt, S., Zhu, P., Sepanik, S., & Doolittle, F. (2011). Middle school mathematics professional development impact study: Findings after the second year of implementation (NCEE 2011-4024). U.S. Department of Education, Institute of Education Sciences, National Center for Education Evaluation and Regional Assistance.

Garrett, R., Citkowicz, M., & Williams, R. (2019). How responsive is a teacher’s classroom practice to intervention? A meta-analysis of randomized field studies. Review of Research in Education, 43 (1), 106-137.

Kraft, M.A., Blazar, D., & Hogan, D. (2018). The effect of teacher coaching on instruction and achievement: A meta-analysis of the causal evidence. Review of Educational Research, 88 (4), 547-588.

Song, M., Wayne, A., Garet, M.S., Brown, S., & Rickles, J. (2021). Impact of providing teachers and principals with performance feedback on their practice and student achievement: Evidence from a large-scale randomized experiment. Journal of Research on Educational Effectiveness, 14 (5), 1-26.

Steinberg, M., & Sartain, L. (2015). Does teacher evaluation improve school performance? Experimental evidence from Chicago’s Excellence in Teaching Project. Education Finance and Policy, 10 (4), 535-572.

Wayne, A.J., Garet, M.S., Brown, S., Rickles, J., Song, M., & Manzeske, D. (2016). Early implementation findings from a study of teacher and principal performance measurement and feedback: Year 1 report (NCEE 2017-4004). U.S. Department of Education, Institute of Education Sciences, National Center for Education Evaluation and Regional Assistance.

This article appears in the March 2024 issue of Kappan, Vol. 105, No. 6, pp. 42-46.

ABOUT THE AUTHORS

ANDREW J. WAYNE is a managing researcher at the American Institutes for Research, Rockville, MD.

JORDAN RICKLES is a principal researcher at the American Institutes for Research, headquartered in Arlington, VA.

MENGLI SONG is a principal researcher at the American Institutes for Research, headquartered in Arlington, VA.

SETH BROWN is a principal researcher at the American Institutes for Research, headquartered in Arlington, VA.

MICHAEL S. GARET is vice president and institute fellow at the American Institutes for Research, headquartered in Arlington, VA.