The virtue of a thing, Plato tells us in "The Republic," is that state or condition which enables it to perform its proper function well. The virtue of a knife is its sharpness; the virtue of a racehorse is its fleetness of foot. So too the virtue of any measurement tool lies in its reliability and validity. Validity is the overall concept used to refer to how good an answer the study yields. In other words, does the instrument measure what it is supposed to measure? Reliability is roughly the same as consistency or repeatability. The concept applies either to operational definitions or to measuring devices.
Thursday, the Daily News published an article entitled "Evaluations Controlled by Departments." In that report Terry King, provost and vice president for Academic Affairs, pointed out that evaluations are "beneficial because they are a way for a professor to receive feedback and for students to express opinion." The comments of the department chairs and the deans were mainly related to different methods of administering the student evaluation forms.
Absent from the administrators' comments was any concern for the reliability and validity of the instruments used to evaluate faculty members. The results of a longitudinal study I conducted, which included more than 700 students' evaluations in the Miller College of Business, clearly show that the instrument used for student evaluation is neither reliable nor valid. The unreliability and invalidity of the instrument can be attributed to the lack of operational definitions of the concepts the instrument attempts to measure (fairness, respect, encouragement, etc.). Without operational definitions of these concepts, there is no way either to know the intended meanings of the students' responses or to take a corrective course of action. In short, students' feedback becomes meaningless.
The main question is: Why are the administrators using invalid and unreliable data for tenure, promotion and merit pay decisions? One possible explanation is that they carry the baggage of a Platonic heritage that seeks sharp essences and definite boundaries, although nature often comes to us as irreducible continua. As Stephen Jay Gould pointed out, this heritage leads us to view statistical measures of central tendency wrongly, indeed opposite to the appropriate interpretation in our world of variation, shadings and continua.
My recommendation to the administrators is this: Get rid of invalid and unreliable statistics. You have nothing to lose but your false assumptions.
Write to Shaheen at sborna@bsu.edu