By John Mauer
The education of our children is an important local priority. Yet we currently measure its effectiveness using yearly standardized tests, as mandated by the federal government. In fact, Connecticut had such tests a decade prior to the intervention of the federal government. But do these tests actually aid in evaluating our schools? How do they measure the value added by our teachers?
Our local high school, Housatonic Valley Regional High School, provides a good case in point. Housy almost always exceeds the state average for the share of students meeting state testing goals. For instance, in reading, Housy exceeded the state average in four of the last five years, as shown.
This appears to be satisfactory performance until one realizes that the state average is pulled down by the inner-city schools. The comparison does provide a direct measure of student performance against state standards. But is it indicative of the school and its teaching? Does student performance depend on factors other than the teaching environment?
One of the best ways of measuring the influence of teachers and the curriculum is to build a statistical comparison of the standardized tests, year to year. In that way, the effect of student performance is minimized; we assume that the same students perform the same way, year to year. The model we use, as shown in Meyer and Dokumaci, will make use of Connecticut standardized tests. In particular, we will use the Connecticut Mastery Tests (CMT) for the 8th grade and the Connecticut Achievement and Performance Tests (CAPT) for 10th grade. As in the data above, we use the Reading test results because Housy emphasizes the humanities in its curriculum.
CAPT = λ*CMT + [Student Effects] + [State Effects] + [Unknown Student Effects]
The school and district effects are not included; most small towns in Connecticut have only a single high school fed by one or more elementary/middle schools.
To evaluate this model, we used over 100 Connecticut schools that met those desired characteristics and for which good data existed at the state level.
As can be seen, the model fits reasonably well. From the R², the high schools added about 46% of the value of the test on average; the elementary schools provided some of the base. The state effect was negative, but this could easily be due to a difference in the difficulty of the tests. The slope of the model exhibits diminishing returns from education at this level.
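For readers who want to see how a slope, a state effect, and an R² can be obtained from paired test results, here is a minimal sketch using ordinary least squares. It is an illustration, not the calculation used for this article: the four pairs of scores are invented, the names `fit_line`, `r_squared`, `lam` and `state_effect` are hypothetical, and reading the fitted intercept as the state effect is an assumption.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def r_squared(x, y, slope, intercept):
    """Share of the variance in y that the fitted line accounts for."""
    y_bar = mean(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Invented school-level results (NOT the Connecticut data):
# 8th grade CMT reading result paired with the later 10th grade CAPT result.
cmt = [60.0, 70.0, 80.0, 90.0]
capt = [40.0, 52.0, 55.0, 67.0]

lam, state_effect = fit_line(cmt, capt)  # slope and intercept of the model
r2 = r_squared(cmt, capt, lam, state_effect)
print(f"slope={lam:.2f} intercept={state_effect:.2f} R^2={r2:.3f}")
```

In this toy data the slope comes out below one and the intercept below zero, the same qualitative pattern described above, but the numbers themselves carry no meaning.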
Of more interest is the ordered plot of all the schools:
On this plot of reading enhancement, the ranking of the schools is obvious and indicates the value added by each school relative to the other schools. Our local high school was 109th out of 119 schools studied, a very poor ranking for a school concentrating on the humanities.
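A ranking of this kind can be sketched by ordering schools on how far each one's CAPT result sits above or below the value the fitted line predicts from its CMT result. The sketch below assumes that this residual is what "reading enhancement" measures, which the article does not spell out. The school names, scores, slope and intercept are all invented, and `rank_by_value_added` is a hypothetical helper.

```python
def rank_by_value_added(scores, slope, intercept):
    """Return (school, residual) pairs, largest residual first.

    scores maps a school name to its (cmt, capt) pair. The residual is the
    observed CAPT result minus the result predicted from the CMT result.
    """
    residuals = {
        name: capt - (intercept + slope * cmt)
        for name, (cmt, capt) in scores.items()
    }
    return sorted(residuals.items(), key=lambda item: item[1], reverse=True)


# Invented schools and scores, with an invented fitted line.
scores = {
    "School A": (60.0, 40.0),
    "School B": (70.0, 52.0),
    "School C": (80.0, 55.0),
    "School D": (90.0, 67.0),
}
ranking = rank_by_value_added(scores, slope=0.84, intercept=-9.5)

for place, (name, residual) in enumerate(ranking, start=1):
    print(f"{place}. {name}: {residual:+.1f}")
```

Because the residual is measured relative to the fitted line, a school with high raw scores can still rank low if its students arrived with even higher scores.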
Of course, we must consider unknown student effects. In particular, a significant number of students in Region 1’s six elementary schools go to private school. This leaves the perception that the remaining students are less capable than those who left, a “brain drain”. Consider the following flow of students:
While it is very difficult to account for the students coming into Housy, we can compare the CMT scores of all students in 8th grade with those of the subset that goes on to Housy. When we do that for reading, we find that there is no statistical difference between the two groups; there is no “brain drain”. Private school attendees, on average, are no more capable than those who remain in the public school system.
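One way to make such a comparison concrete is a randomization test: draw many random subsets of the same size from all 8th grade scores and count how often a random subset's average differs from the overall average by at least as much as the real subset's does. This is a sketch of that idea only. The article does not say which test was used, the scores are invented, and `subset_mean_p_value` is a hypothetical helper.

```python
import random
from statistics import mean


def subset_mean_p_value(all_scores, subset_indices, trials=10_000, seed=0):
    """Two-sided randomization p-value for a subset mean vs. the overall mean."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    overall = mean(all_scores)
    size = len(subset_indices)
    observed = abs(mean(all_scores[i] for i in subset_indices) - overall)
    hits = 0
    for _ in range(trials):
        sample = rng.sample(all_scores, size)
        # Small tolerance guards against floating-point ties.
        if abs(mean(sample) - overall) >= observed - 1e-12:
            hits += 1
    return hits / trials


# Invented 8th grade scores: 200, 205, ..., 295 (NOT the Region 1 data).
all_scores = list(range(200, 300, 5))
staying = [0, 9, 10, 19]  # invented positions of students who stay public

p = subset_mean_p_value(all_scores, staying, trials=2000)
print(f"p-value: {p:.3f}")
```

A large p-value means the subset looks like a random draw from the whole class, which is what "no statistical difference" amounts to; a very small one would point to a real gap between those who stay and those who leave.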
In summary, we can compare the effects of different schools on a good relative scale. The value added by teaching can be measured. However, we cannot determine the cause of any differences, whether the teachers, the curriculum, or the school environment. Measuring individual teachers is likely to be more difficult: not impossible if the subject matter is well tested, but more difficult.
 “Value-Added Models and the Next Generation of Assessments”, Meyer and Dokumaci, Value-Added Research Center, University of Wisconsin–Madison, 2011
 For the mathematically curious, this data fits a Weibull distribution, not a Normal, so relative separation by standard deviations cannot be used.