Frequently Asked Questions About Student Ratings: Summary of Research Findings
Student ratings of instruction are one of the most studied topics in all of higher education, with several thousand research articles and books addressing various aspects of this topic over the past 100 years. The summary below is intended to offer an overview of some common questions about ratings. This review is not designed to be exhaustive, but rather to present trends in the literature on these issues along with references for further reading. Individual studies challenging these trends exist, but, as with any literature review, we have tried to synthesize across studies and provide a synopsis of the key themes of a large body of literature.
What do student ratings tell us about teaching effectiveness?
Several measures have been used to determine the extent to which ratings measure effective teaching. One set of studies has examined the relationship between ratings and student achievement in a course. Results of this approach have been mixed. Meta-analyses conducted in the 1980s and 1990s showed a moderate positive correlation between ratings results and learning as measured by exam scores. In general, these findings indicate that ratings of the course or instructor on overall items (‘Overall this is an excellent instructor’ and ‘Overall this is an excellent course’) show a consistent positive relationship to achievement (i.e., in classes where the achievement level is high, instructors tend to receive high ratings). However, a more recent meta-analysis showed a less consistent positive relationship. Other studies have compared student ratings results with the ratings of colleagues and of trained observers, and, in general, these studies have shown positive correlations. Researchers have also found that ratings of overall instructional excellence are best explained by ratings of specific approaches to teaching (e.g., structuring the class, encouraging involvement, and establishing rapport) rather than by extraneous factors. Finally, one area of concern about the validity of ratings is recent research suggesting that an instructor’s favorable ratings in one course do not necessarily predict their students’ performance in subsequent courses. This could indicate that instructors with high ratings may simply be “teaching to the test” but not facilitating deep learning, or it could indicate a lack of curricular alignment between the instructor’s course and subsequent courses.
For information on student ratings as a measure of effective teaching:
Benton, S. L., & Cashin, W. E. (2012). Student ratings of teaching: A summary of research and literature. IDEA Paper No. 50. Manhattan, KS: The IDEA Center. Retrieved from http://www.theideacenter.org/sites/default/files/idea-paper_50.pdf
Carrell, S. E., & West, J. E. (2010). Does professor quality matter? Evidence from random assignment of students to professors. Journal of Political Economy, 118(3), 409-432.
Clayson, D. E. (2009). Student evaluations of teaching: Are they related to what students learn? A meta-analysis and review of the literature. Journal of Marketing Education, 31(1), 16-30.
Hativa, N. (2014). Student ratings of instruction: A practical approach to designing, operating, and reporting (2nd ed.). CreateSpace Independent Publishing Platform.
IDEA Research Note 1. (2003). The “excellent teacher” item. Manhattan, KS: The IDEA Center. Retrieved from http://ideaedu.org/wp-content/uploads/2014/11/Research_Note1_ExcellentTeacher.pdf
Murray, H. G. (2007). Low-inference teaching behaviors and college teaching effectiveness: Recent developments and controversies. In R. P. Perry & J. C. Smart (Eds.), The scholarship of teaching and learning in higher education: An evidence-based perspective (pp. 145-200). Dordrecht, The Netherlands: Springer.
What do we know about the relationship between grades and student ratings?
Instructors in classes with higher expected grades receive slightly higher student ratings. Such results could mean that student ratings are biased and that instructors receive higher ratings because they give higher grades. However, in studies of courses with multiple sections, there are differences among instructors’ average ratings even when there is no difference in average expected grade. Therefore, we cannot conclude that instructors receive higher ratings because they give higher grades. Research done in experimental settings also suggests that the effect of grades on student ratings may be overstated. In these experiments, sections graded on a “C” curve do not give significantly lower evaluations than those graded on a “B” curve. In addition, studies of U-M course evaluation data have found small or no correlation between grades and overall global ratings of the instructor and course. Grading fairness, on the other hand, does appear to influence student ratings. Studies indicate that instructors need to grade fairly and consistently and give students realistic expectations about their grades. When instructors give grades that contradict students’ performance expectations, they do, in fact, receive lower student ratings.
For information on grades and student ratings:
Benton, S. L., & Cashin, W. E. (2012). Student ratings of teaching: A summary of research and literature. IDEA Paper No. 50. Manhattan, KS: The IDEA Center. Retrieved from http://www.theideacenter.org/sites/default/files/idea-paper_50.pdf
Cohen, P. A. (1981). Student ratings of instruction and student achievement: A meta-analysis of multisection validity studies. Review of Educational Research, 51, 281-309.
Feldman, K. A. (1989). The association between student ratings of specific instructional dimensions and student achievement: Refining and extending the synthesis of data from multisection validity studies. Research in Higher Education, 30, 583-645.
Holloway, J. P., & Meadows, L. (2011, February). Do high grades lead to high instructor evaluations? Poster presented at Fifth Annual Research and Scholarship in Engineering Education Poster Session. Available at http://crlte.engin.umich.edu/wp-content/uploads/sites/7/2013/06/Holloway_50x36.pdf
LaVaque-Manty, M., & Cottrell, D. (2015, October). Course evaluations at Michigan: What do we know? Ann Arbor, MI: University of Michigan Learning Analytics Task Force.
Marsh, H. W. (1987). Students' evaluations of university teaching: Research findings, methodological issues, and directions for future research. International Journal of Educational Research, 11, 253-388.
Spooren, P., Brockx, B., & Mortelmans, D. (2013). On the validity of student evaluation of teaching: The state of the art. Review of Educational Research, 83(4), 598–642.
Should we believe student ratings reflect an instructor’s ability to “entertain”?
Early experiments indicated that instructors who were “entertaining” or “expressive” (i.e., witty, enthusiastic, theatrical, or engaging) received high student ratings even though they delivered very little information in their lectures. Later analysis revealed serious flaws in these studies and showed the relationship between expressiveness and ratings to be exaggerated. More recent studies indicate that expressive instructors receive higher ratings because their expressiveness helps boost student motivation to learn. When students are not highly motivated (e.g., in introductory, required courses), instructor expressiveness has a larger effect on student achievement than does the amount of content covered. Expressive instructors stimulate and maintain student attention, and students learn more when they are engaged in the subject. Students and faculty agree that instructor enthusiasm is an important element of effective teaching. Furthermore, expressiveness includes a range of specific behaviors related to good lecturing, such as speaking emphatically, using humor, and moving about during lecture. Trained observers found that highly rated faculty exhibit these behaviors more frequently than other faculty. While expressiveness was important, it was by no means the only factor that explained high ratings. Highly rated instructors also used examples, stressed important points, and asked questions.
For information on expressivity:
Abrami, P. C., d’Apollonia, S., & Cohen, P. A. (1990). Validity of student ratings of instruction: What we know and what we do not know. Journal of Educational Psychology, 82, 285-296.
Benton, S. L., & Cashin, W. E. (2012). Student ratings of teaching: A summary of research and literature. IDEA Paper No. 50. Manhattan, KS: The IDEA Center. Retrieved from http://www.theideacenter.org/sites/default/files/idea-paper_50.pdf
Calkins, S., & Micari, M. (2010). Less-than-perfect judges: Evaluating student evaluations. Thought & Action: The NEA Higher Education Journal, Fall 2010, 7-22.
Cohen, P. A. (1986, April). An updated and expanded meta-analysis of multisection student rating validity studies. Paper presented at the Annual Meeting of the American Educational Research Association, San Francisco. (ERIC: ED 270 471).
What factors unrelated to teaching can influence student ratings?
Course characteristics that have been found to correlate with student ratings but not with teaching effectiveness include course level (upper-level courses receive higher ratings), size (smaller classes receive slightly higher ratings), and academic discipline (STEM and quantitative courses receive lower ratings). Research on the impact of workload and difficulty is mixed. Some studies have found that courses with higher workload actually receive higher ratings, while other studies have found that courses perceived as more difficult receive lower ratings. The timing of when students complete ratings correlates with scores, too. Students tend to rate instructors lower, on average, during exam times. Faculty characteristics unrelated to instruction (such as instructor race, gender, age, and physical attractiveness) may also influence student ratings in complex ways. A good deal of student ratings research finds no significant effect of instructor identity on global student ratings (i.e., the questions pertaining to holistic evaluation of the course and instructor). However, aspects of instructor identity may interact with each other to produce an effect (e.g., race and gender together may exacerbate biases), identity may interact with course content (e.g., a female instructor teaching in a male-dominated field), and there may be an impact on non-global ratings items or comments. Further, instructor characteristics can interact with student characteristics to affect ratings. For instance, research indicates that relative to ratings of male faculty, male students’ ratings of female faculty are more negative and female students’ ratings of female faculty are more positive. Caring and expressive qualities tend to be associated with female faculty and tend to be observed or valued more by female students.
These findings could be due to different teaching behaviors used by male and female faculty and/or due to students reacting differently to teaching behaviors based on their own gender. More studies are needed to understand the extent to which race and other intervening variables, such as gender and age, affect student ratings and how they interact with each other.
For information on factors unrelated to teaching that can influence student ratings:
Ambady, N., & Rosenthal, R. (1993). Half a minute: Predicting teacher evaluations from thin slices of nonverbal behavior and physical attractiveness. Journal of Personality and Social Psychology, 64, 431–441.
Basow, S. A. (1998). Student evaluations: Gender bias and teaching styles. In L. H. Collins, J. C. Chrisler, & K. Quina (Eds.), Career strategies for women in academe: Arming Athena (pp. 135-156). Thousand Oaks, CA: SAGE.
Basow, S. A. (2000). Best and worst professors: Gender patterns in students’ choices. Sex Roles, 45(5/6), 407-417.
Calkins, S., & Micari, M. (2010). Less-than-perfect judges: Evaluating student evaluations. Thought & Action: The NEA Higher Education Journal, Fall 2010, 7-22.
Centra, J. A., & Gaubatz, N. B. (2000). Is there gender bias in student evaluations of teaching? Journal of Higher Education, 71(1), 17-33.
Kardia, D. B., & Wright, M. C. (2004). Instructor identity: The impact of gender and race on faculty experiences with teaching. CRLT Occasional Paper No. 19. Ann Arbor, MI: Center for Research on Learning and Teaching, University of Michigan.
Kogan, L. R., Schoenfeld-Tacher, R., & Hellyer, P. W. (2010). Student evaluations of teaching: Perceptions of faculty based on gender, position, and rank. Teaching in Higher Education, 15(6), 623-636.
LaVaque-Manty, M., & Cottrell, D. (2015, October). Course evaluations at Michigan: What do we know? Ann Arbor, MI: University of Michigan Learning Analytics Task Force.
Nilson, L. B. (2012). Time to raise questions about student ratings. In J. E. Groccia & L. Cruz (Eds.), To improve the academy: Resources for faculty, instructional, and organizational development, Vol. 31 (pp. 213-228). San Francisco, CA: Jossey-Bass.
Spooren, P., Brockx, B., & Mortelmans, D. (2013). On the validity of student evaluation of teaching: The state of the art. Review of Educational Research, 83(4), 598–642.
Sprague, J., & Massoni, K. (2005). Student evaluations and gendered expectations: What we can’t count can hurt us. Sex Roles: A Journal of Research, 53(11-12), 779-793.
How should student ratings be used?
Student ratings are widely used in evaluations of faculty for promotion and tenure, but the practice is controversial because of the shortcomings of ratings and distrust in them. Among three key dimensions of teaching (content expertise, instructional delivery skills, and instructional design skills), student ratings are best suited for evaluating the second. Because students do not have the expertise to comment on the other two dimensions, those should be evaluated using other sources of data, such as reviews by department heads or peers. This echoes the overwhelming consensus among researchers in the field: student ratings offer just one measure of faculty teaching effectiveness and should be paired with other measures when evaluating faculty. Other recommendations for using ratings in personnel decisions include the following: use results from several courses over several terms, focus on global items, do not overemphasize small differences in results, allow instructors to provide context by commenting on ratings results, and treat results from courses with low response rates or very low total numbers of responses with caution (see CRLT’s Best Practices for Using Online Student Ratings for Personnel Decisions for a full discussion of these issues and others).
Another potential use of ratings is for instructional improvement. Ratings are more likely to lead to improvement in teaching when instructors discuss results with a colleague or a teaching consultant who can help them interpret the results and consider changes to their teaching in response. Consultative feedback is most beneficial when additional sources of information (e.g., classroom observation, video recording of teaching) supplement student ratings and when consultative sessions are long enough to allow time for reflection. Ranking faculty against colleagues can either have a negative motivational effect or help the instructor interpret ratings, depending on how ratings are used and understood in the larger context. Using normative data on a small, comparable subset of courses can make ratings more meaningful. Rather than presenting university-wide quartiles in a single term, for instance, more granular comparisons to the same or similar courses over time are preferable.
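As a rough illustration of the recommendations above (pooling several terms, treating low response rates with caution, and comparing against the same or similar courses rather than university-wide quartiles), the following Python sketch summarizes one instructor's ratings. The record fields, the 50% response-rate cutoff, the sample numbers, and the function names are all hypothetical, chosen only for illustration; they are not drawn from the research summarized here or from any CRLT tool.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Offering:
    """One course offering. All fields are hypothetical for this sketch."""
    term: str
    course: str
    rating: float      # result on a global item, e.g. on a 1-5 scale
    responses: int     # number of students who submitted ratings
    enrolled: int      # number of students enrolled

    @property
    def response_rate(self) -> float:
        return self.responses / self.enrolled if self.enrolled else 0.0


def summarize(own, comparison, min_rate=0.5):
    """Pool an instructor's offerings and compare them with similar offerings.

    min_rate is an assumed cutoff for a "low response rate"; it is an
    illustrative value, not a research-based threshold.
    """
    usable = [o for o in own if o.response_rate >= min_rate]
    flagged = [o.term for o in own if o.response_rate < min_rate]
    peers = [c for c in comparison if c.response_rate >= min_rate]
    return {
        "terms_used": len(usable),
        "flagged_low_response": flagged,
        "own_mean": round(mean(o.rating for o in usable), 2) if usable else None,
        "comparison_mean": round(mean(c.rating for c in peers), 2) if peers else None,
    }


# Made-up example data: three terms of one course plus two comparable offerings.
own = [
    Offering("Term 1", "COURSE 101", 4.2, 60, 100),
    Offering("Term 2", "COURSE 101", 4.4, 70, 100),
    Offering("Term 3", "COURSE 101", 3.1, 10, 100),  # only 10% responded
]
comparison = [
    Offering("Term 1", "COURSE 101", 4.1, 50, 90),
    Offering("Term 2", "COURSE 101", 4.3, 55, 95),
]
print(summarize(own, comparison))
```

In this sketch, the low-response term is set aside and reported separately rather than silently averaged in, and the pooled result is shown next to the comparison figure without ranking, so that small differences are not overinterpreted.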
For information on using student ratings:
Arreola, R. (2006). Developing a comprehensive faculty evaluation system: A guide to designing, building, and operating large-scale faculty evaluation systems (3rd ed.). San Francisco: Jossey-Bass.
Boysen, G. A., Kelly, T. J., Raesly, H. N., & Casner, R. W. (2014). The (mis)interpretation of teaching evaluations by college faculty and administrators. Assessment & Evaluation in Higher Education, 39(6), 641-656.
Center for Research on Learning and Teaching. (2015). Use of student ratings at the University of Michigan: Administrator and faculty perspectives. Ann Arbor, MI: Center for Research on Learning and Teaching, University of Michigan.
Centra, J. A. (1993). Reflective faculty evaluation. San Francisco: Jossey-Bass.
Kardia, D. B., & Wright, M. C. (2004). Instructor identity: The impact of gender and race on faculty experiences with teaching. CRLT Occasional Paper No. 19. Ann Arbor, MI: Center for Research on Learning and Teaching, University of Michigan.
Penny, A. R. (2003). Changing the agenda for research into students’ views about university teaching: Four shortcomings of SRT research. Teaching in Higher Education, 8(3), 399-411.
Penny, A. R., & Coe, R. (2004). Effectiveness of consultation on student ratings feedback: A meta-analysis. Review of Educational Research, 74(2), 215-253.
Zabaleta, F. (2007). The use and misuse of student evaluations of teaching. Teaching in Higher Education, 12, 55-76.
Good overviews of the literature
Aleamoni, L. M. (1987). Typical faculty concerns about student evaluation of teaching. In Techniques for evaluating and improving instruction, New Directions for Teaching and Learning, no. 31 (pp. 25-31). San Francisco: Jossey-Bass.
Benton, S. L., & Cashin, W. E. (2012). Student ratings of teaching: A summary of research and literature. IDEA Paper No. 50. Manhattan, KS: The IDEA Center.
Hativa, N. (2014). Student ratings of instruction: A practical approach to designing, operating, and reporting (2nd ed.). CreateSpace Independent Publishing Platform.
Marsh, H. W., & Dunkin, M. J. (1992). Students' evaluations of university teaching: A multidimensional approach. In J. C. Smart (Ed.), Higher education: Handbook of theory and research (Vol. 8, pp. 143-233). New York: Agathon Press.
Instructors will derive the greatest benefit from their student evaluations if they discuss the results with a colleague or a teaching professional. CRLT consultants will help instructors design evaluations, interpret results, and develop strategies for incorporating student feedback. CRLT also strongly encourages the use of multiple methods of evaluation and provides services to help instructors and units gather information from a variety of sources. For assistance, call 734-764-0505 or visit our consultation request page.