Published on: **Mar 3, 2016**

Source: www.slideshare.net

- 1. Why it’s time we stopped managing schools like baseball teams. John Cronin, Ph.D., Senior Director, the Kingsbury Center at Northwest Evaluation Association. You can view this presentation at SlideShare: http://www.slideshare.net/NWEA/schools-cant
- 2. How does it work in baseball? In baseball, the contribution of players to the success of the team can be measured (value-added). In baseball, general managers have complete control over the acquisition and deployment of players.
- 3. How does it work in baseball? Sabermetricians estimate the number of wins a player contributes to his team. It’s calculated by estimating the number of runs contributed by a player and adding the number of runs denied by that player’s defensive contributions.
- 4. So what are the issues? • We’ve confused players with managers. • The metrics are problematic. • We’ve chosen the wrong focus for policy.
- 5. Baseball hasn’t found a methodology to effectively apply sabermetrics to managers. Yet we assume the statistics applied to players (teachers) can be applied to their managers (principals).
- 6. How does it work in classrooms? Brian’s students took the state exam last spring. A gain is estimated for his students; this projection may take into account his students’ past performance, their poverty rate, and a variety of other factors. Brian’s students’ gains on this spring’s tests are compared to this projection. If the gains exceed the projection, we say Brian produced “value-added”. Value-added methodologies attempt to isolate a teacher’s contribution to learning by measuring student growth while controlling for or eliminating factors that influence growth but are outside the teacher’s control, such as student poverty.
- 7. How does it work in classrooms? Brian’s students’ gains on this spring’s tests are compared to those of other teachers, and he is typically assigned a “z-score” (here, +.25), a metric that shows where he stands relative to other teachers in the state.
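The z-score idea on this slide can be sketched in a few lines of code. This is an illustrative simplification, not NWEA’s actual model; the teacher names and gain values are hypothetical:

```python
# Illustrative sketch of the slide's z-score idea.
# All data and names are hypothetical, not NWEA's actual model.
import statistics

def value_added_z(teacher_gain, all_teacher_gains):
    """Where does one teacher's average student gain stand
    relative to other teachers in the state?"""
    mean = statistics.mean(all_teacher_gains)
    sd = statistics.stdev(all_teacher_gains)
    return (teacher_gain - mean) / sd

# Hypothetical average scale-score gains for teachers across a state
state_gains = [4.0, 5.0, 6.0, 5.5, 4.5, 6.5, 3.5, 5.0]
brian_gain = 5.5

# A positive z-score means Brian's students gained more than the
# typical teacher's students did.
print(value_added_z(brian_gain, state_gains))
```

In a real value-added system the comparison would be against a projection that controls for student characteristics, as the previous slide describes; this sketch only shows the relative-standing step.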
- 8. How are principals different? • They don’t directly deliver instruction to students. • Their impact cannot easily be measured within a school year. Source: Lipscomb, S., Teh, B., Gill, B., Chiang, H., & Owens, A. (2010, Sept.). Teacher and Principal Value-Added: Research Findings and Implementation Practices. Cambridge, MA: Mathematica Policy Research.
- 9. Three schools’ value-added math and reading results – who is the better principal? [Chart: one year of math and reading value-added for Langston Hughes Elem, Scott Joplin Elem, and Lewis Latimer Elem.] Many state assessment systems use a single year of data for principal evaluation.
- 10. Langston Hughes Elementary. [Chart: math and reading value-added, 2009-10 through 2012-13.] High growth but not improving.
- 11. Scott Joplin Elementary. [Chart: math and reading value-added, 2009-10 through 2012-13.] Below average growth, improving but decelerating.
- 12. Lewis Latimer Elementary. [Chart: math and reading value-added, 2009-10 through 2012-13.] Below average growth, improving and accelerating.
- 13. So what are the issues? • We’ve confused players with managers. • The metrics are problematic. • We’ve chosen the wrong focus for policy.
- 14. How does it work in baseball? In baseball, each player creates his own metrics by getting on base, stealing bases, or making catches. The metrics directly reflect his performance.
- 15. Issues in the use of growth and value-added measures: differences among value-added models. Los Angeles Times Study. Los Angeles Times Study #2.
- 16. Issues in the use of value-added measures: control for statistical error. All models attempt to address this issue. Nevertheless, many teachers’ value-added scores will fall within the range of statistical error.
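The point about statistical error can be made concrete with a small sketch. The numbers below are hypothetical; the idea is simply that a value-added estimate whose confidence interval straddles zero cannot be distinguished from an average teacher:

```python
# Hypothetical sketch: is a teacher's value-added estimate
# distinguishable from zero, given its standard error?
def distinguishable_from_average(estimate, std_error, z_crit=1.96):
    """True if the 95% confidence interval excludes zero."""
    lower = estimate - z_crit * std_error
    upper = estimate + z_crit * std_error
    return lower > 0 or upper < 0

# A +0.20 SD estimate with a 0.15 SD standard error has the
# interval (-0.094, 0.494), which contains zero: the score falls
# within the range of statistical error.
print(distinguishable_from_average(0.20, 0.15))  # False

# A +0.40 SD estimate with the same standard error does not.
print(distinguishable_from_average(0.40, 0.15))  # True
```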
- 17. Issues in the use of growth and value-added measures: control for statistical error. New York City. New York City #2.
- 18. What Makes Schools Work Study – Mathematics. [Scatter plot: teachers’ value-added index within group, Year 1 vs. Year 2.] Data used represents a portion of the teachers who participated in Vanderbilt University’s What Makes Schools Work Project, funded by the federal Institute of Education Sciences.
- 19. Metrics matter NCLB metrics influenced educator behavior for a decade.
- 20. Metrics drive behavior The term “bubble kid” had a different meaning prior to 2000.
- 21. One district’s change in 5th grade math performance relative to Kentucky cut scores. [Chart: number of students by fall RIT score, grouped by Up, No Change, and Down.]
- 22. Metrics drive behavior Race to the Top changes the focus.
- 23. Number of students who achieved normal mathematics growth in that district. [Chart: number of students by fall score, grouped by Met growth target and Failed growth target.]
- 24. Gaming distorts results. Testing conditions may be gamed to inflate results.
- 25. Test duration and math growth between two terms in one school’s fifth grade. [Chart: each student’s first and second test duration in minutes, plotted against that student’s scale score growth. The white line marks the average duration of the second test; the yellow line marks the average growth for fifth graders in this district.]
- 26. Test duration and math growth between two terms in all fifth grades in a district. [Chart: test 1 duration, test 2 duration, and scale score growth.]
- 27. Test duration and math growth between two terms in all fifth grades in a district. [Chart: test 1 duration, test 2 duration, and scale score growth.]
- 28. The problem with spring-spring testing. [Diagram: a student’s spring-to-spring growth trajectory runs from 3/12 to 3/13, spanning Teacher 1, the summer, and Teacher 2.]
- 29. Metrics do not provide a complete picture of the classroom. They don’t capture important non-cognitive factors that impact learning.
- 30. The intangibles. In baseball, the employment of sabermetrics has reduced the impact that a player’s intangibles have on personnel decisions. These intangibles may include leadership qualities, locker room presence, and other personality traits that may contribute to team success.
- 31. Non-cognitive factors. In education, value-added measurement has focused policy-makers on the teacher’s contribution to academic success, as reflected in test scores. Jackson (2012) argues that teachers may have more impact on non-cognitive factors that are essential to student success, like attendance, grades, and suspensions. In baseball, the employment of sabermetrics has focused general managers on the player’s contribution to the measures that ultimately matter in the sport: runs and wins. These are not the only measures that matter, however.
- 32. Non-cognitive factors. Employing value-added methodologies, Jackson found that teachers had a substantive effect on non-cognitive outcomes that was independent of their effect on test scores: • Lowered average student absenteeism by 7.4 days. • Improved the probability that students would enroll in the next grade by 5 percentage points. • Reduced the likelihood of suspension by 2.8%. • Improved the average GPA by .09 (Algebra) or .05 (English). Source: Jackson, K. (2013). Non-Cognitive Ability, Test Scores and Teacher Quality: Evidence from 9th Grade Teachers in North Carolina. Northwestern University and NBER.
- 33. So what are the issues? • We’ve confused players with managers. • The metrics are problematic. • We’ve chosen the wrong focus for policy.
- 34. Policy has focused on dismissal rather than retention. In baseball, exceptional players are much rarer than average ones. Thus it is vital for a team to keep its best players.
- 35. Employment of Elementary Teachers, 2007–2012. The elementary school teacher workforce shrunk by 178,000 teachers (11%) between May 2007 and May 2012. [Chart: number of teachers by year: 2007: 1,538,000; 2008: 1,544,300; 2009: 1,544,270; 2010: 1,485,600; 2011: 1,415,000; 2012: 1,360,380.] Source: (2012, May) Bureau of Labor Statistics – Occupational Employment Statistics. Numbers exclude special education and kindergarten teachers.
- 36. The impact of seniority-based layoffs on school quality. In a simulation of a 5% teacher layoff using New York City data, reliance on seniority-based layoffs would: • Result in 25% more teachers laid off. • Mean the teachers laid off would be .31 standard deviations more effective (using a value-added criterion) than those lost using an effectiveness criterion. • Retain 84% of teachers with unsatisfactory ratings. Source: Boyd, L., Lankford, H., Loeb, S., and Wyckoff, J. (2011). Center for Education Policy, Stanford University.
- 37. If evaluators do not differentiate their ratings, then all differentiation comes from the test. We must identify and protect the most effective teachers to improve the profession, and we must also identify the least effective teachers to gain credibility with the public.
- 38. Results of Tennessee Teacher Evaluation Pilot. [Bar chart: percentage of teachers receiving each rating, 1 through 5, under the value-added result versus the observation result.]
- 39. Results of Georgia Teacher Evaluation Pilot. [Pie chart of evaluator ratings: 1%, 2%, 23%, and 75% across Ineffective, Minimally Effective, Effective, and Highly Effective.]
- 40. Ratings under new Florida teacher evaluation regulations. [Bar chart: percentage of teachers rated Highly Effective, Effective, Needs Improvement, 3 Year Developing, and Ineffective, 2011-12 vs. 2012-13; nearly all teachers fall in the Effective or Highly Effective categories in both years.]
- 41. It’s good to learn from past failures.
- 42. What’s the analogy to schools? Policy-makers believe value-added metrics provide a statistical means to measure the effectiveness of teachers and principals.
- 43. What’s the assumed parallel to schools? Policy-makers assume that reading and mathematics constitute adequate measures of effectiveness. Policy-makers assume that the principal controls the acquisition and deployment of talent.
- 44. The Cincinnati Approach - Method • Evaluators were trained and calibrated to the Danielson model • Both peer and administrator evaluators were used. • Each teacher was observed three times by a peer and once by an administrator. • Stakes were higher for beginning teachers than veterans. Source: Taylor, E. and Tyler, J. (2012, Fall). Can Teacher Evaluation Improve Teaching?
- 45. The Cincinnati Approach - Findings • In the first year, the average teacher improved student math scores by .05 SD; in subsequent years this improved to .11 SD. • Improvement was sufficient to move a 25th percentile teacher to near average. • Reading scores did not improve. • The evaluations retained a “leniency” bias typical of other evaluation programs. • The pilot cost was high, $7,500 per teacher.
- 46. The Cincinnati Approach - Context • In the first year, the average teacher improved student math scores by .05 SD; in subsequent years this improved to .11 SD. • Gains in the first two years of teaching are typically .10 SD in mathematics (Rockoff, 2004). • Gains from being placed with highly effective peers are .04 SD in mathematics (Jackson and Bruegmann, 2009). • The pilot cost was high, $7,500 per teacher. Rockoff, J. E. (2004). “The Impact of Individual Teachers on Student Achievement: Evidence from Panel Data.” American Economic Review, 94(2): 247-252. Jackson, C. K. and Bruegmann, E. (2009, July). Teaching Students and Teaching Each Other: The Importance of Peer Learning for Teachers. NBER Working Paper No. 15202.
- 47. Reliability of a variety of teacher observation implementations

| Observation by | Reliability coefficient (relative to state test value-added gain) | Proportion of test variance explained |
| --- | --- | --- |
| Principal – 1 | .51 | 26.0% |
| Principal – 2 | .58 | 33.6% |
| Principal and other administrator | .67 | 44.9% |
| Principal and three short observations by peer observers | .67 | 44.9% |
| Two principal observations and two peer observations | .66 | 43.6% |
| Two principal observations and two different peer observers | .69 | 47.6% |
| Two principal observations, one peer observation, and three short observations by peers | .72 | 51.8% |

Source: Bill and Melinda Gates Foundation (2013, January). Ensuring Fair and Reliable Measures of Effective Teaching: Culminating Findings from the MET Project’s Three-Year Study.
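A note on how the two columns in the table above relate: the “proportion of test variance explained” is simply the square of the reliability coefficient (r²). A quick sketch, using three of the coefficients from the table:

```python
# The proportion of variance explained is the square of the
# reliability (correlation) coefficient, r**2.
# Coefficients taken from the MET Project table above.
coefficients = {
    "Principal - 1": 0.51,
    "Principal - 2": 0.58,
    "Principal and other administrator": 0.67,
}

for design, r in coefficients.items():
    # 0.51**2 = 0.2601 -> 26.0%, matching the table's second column
    print(f"{design}: r = {r}, variance explained = {r ** 2:.1%}")
```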
- 48. Assessment Literacy in a Teacher Evaluation Framework Presenter - John Cronin, Ph.D. Contacting us: Rebecca Moore: 503-548-5129 E-mail: rebecca.moore@nwea.org This PowerPoint presentation and recommended resources are available at our SlideShare website:
- 49. Why it’s time we stopped pretending schools should be managed like baseball teams
- 50. Suggested reading Baker B., Oluwole, J., Green, P. (2013). The legal consequences of mandating high stakes decisions based on low quality information: Teacher evaluation in the Race to the Top Era. Education Policy Analysis Archives. Vol 21. No 5.
- 51. Thank you for attending Presenter - John Cronin, Ph.D. Contacting us: NWEA Main Number: 503-624-1951 E-mail: rebecca.moore@nwea.org The presentation and recommended resources are available at our SlideShare site: http://www.slideshare.net/NWEA/tag/kingsbury-center
- 52. What about principals? The issue is the same with principals: it is difficult to separate the contribution of the principal to learning from the contribution of teachers. Source: Lipscomb, S., Teh, B., Gill, B., Chiang, H., & Owens, A. (2010, Sept.). Teacher and Principal Value-Added: Research Findings and Implementation Practices. Cambridge, MA: Mathematica Policy Research.
- 53. How does it work in classrooms? Two very important assumptions: • The teacher directly delivers instruction that causes learning! • The teacher’s impact can be measured within a school year!
- 54. Four issues • How do you measure a principal? • How accurate and reliable are these measures? • What anticipated and unanticipated impacts do your measures have on behavior? • Where should our energy really be focused?
- 55. It’s good to learn from past failures.
- 56. So what are the issues? • We’ve confused players with managers. • The metrics are problematic. • We’ve chosen the wrong focus for policy.
- 57. How does it work in education? Teacher or School Value-Added How much academic growth does a teacher or school produce relative to the median teacher or school?
- 58. So what are the issues?