This is one of the first qualitative studies of student perceptions of evaluation in undergraduate medical education. Our results might be of interest to faculty and programme directors who need to be aware of the assumptions and confounders underlying student ratings. This is of particular importance if evaluation results are used to guide resource allocation within medical schools. Medical students participating in focus group interviews identified almost all relevant aspects of course evaluation reported in the literature [16, 17] and were aware that this institutional process should gauge teaching quality by addressing various areas such as the content taught, teacher characteristics, and – most importantly – learning outcome. However, a number of contributions revealed that students did not use specific pre-defined criteria of good teaching (i.e., ‘benchmarks’) when appraising teaching quality. In the absence of such criteria, students referred to their gut feeling and single outstanding (negative or positive) events as major contributors to their overall course ratings. As many students preferred evaluation activities to be scheduled after end-of-course examinations, subjective ratings of teaching quality including learning outcome might be confounded by examination difficulty and individual scores. Unfortunately, a recent study on end-of-course examinations cast doubt on whether international minimum standards of assessment quality are currently being met in German medical schools, thus questioning the validity of exam scores as a measure of actual learning outcome. In addition, the definition of a successful learning outcome might differ substantially between students and medical educators. For example, a number of German Associate Deans for Medical Education have proposed judging teaching success on the basis of aggregated examination scores and pass rates, while the students interviewed in this study were mainly interested in individual learning outcome.
Our finding of a wide variety of confounders affecting student ratings is in line with previous quantitative research [7, 8] and suggests that overall course ratings may reflect student satisfaction with courses and teachers rather than teaching quality or actual learning outcome. Satisfaction with teaching is likely to result in higher motivation to learn, thus rendering student satisfaction an important moderator of learning behaviour and, eventually, learning success. However, faculty need to be aware that traditional evaluation tools do not explicitly measure outcome. We recently reported on a novel evaluation tool aimed at determining learning outcome regarding specific learning objectives. By using repeated student self-assessments to calculate performance gain for specific learning objectives from all domains of medical education (knowledge, skills and attitudes), this tool produced reliable and valid data in one pilot study. In addition, results obtained with this outcome-based tool appeared to be unrelated to overall course ratings provided by students, thus potentially adding a new dimension to traditional evaluation tools. Obviously, this method should not replace evaluation focussing on structural and procedural aspects of teaching. Instead, it may add value to existing evaluation systems.
Students enumerated several quality indicators of teaching that encompassed a wide range of parameters pertaining to teachers, courses and the medical school as a whole. In contrast, suggestions regarding consequences to be drawn from evaluation results were mainly directed at individual teachers. This may be because teacher characteristics appear to be crucial for student perceptions of teaching quality. While more research into this issue is warranted, it may be hypothesised that empathic and enthusiastic teachers can improve student learning by increasing students' motivation to learn. However, this aspect is rarely specifically addressed in evaluation forms. Given the importance of the individual teacher, it seems natural for students to favour evaluation systems entailing direct consequences for specific teachers. Most students preferred incentives over negative consequences. Published reports of instruments to increase motivation to teach usually refer to positive reinforcement measures [22–24]. In order to distinguish effective from potentially detrimental incentive systems, the views of teachers and programme directors should be considered. To this end, focus group discussions involving these stakeholders of undergraduate medical education may be useful.
Students listed a number of course and teacher characteristics that are frequently addressed in faculty development programmes (i.e., alignment of teaching to student level, prioritisation of important content, teacher feedback, adequacy of examinations; see Table 1). This list stresses the relevance of teacher training with regard to improving teaching quality and increasing student motivation to learn. However, students were ambivalent regarding the effectiveness of teacher training in individuals with low motivation to teach.
As far as evaluation format was concerned, students consistently preferred online evaluations over paper-and-pencil methods. At the same time, participation in online evaluations was not given high priority, and students tended to postpone or forget to log on. Low response rates have been reported by many institutions using online methods; there is currently no clear solution to this problem [26, 27]. There was a general concern that evaluation frequently fails to meet its primary goal of improving teaching quality. These concerns might be addressed by providing students with feedback on the consequences of evaluation.
Limitations and suggestions for further research
Focus group discussions are a useful adjunct to quantitative statistical methods [29–31]. However, they have certain limitations. While they provide in-depth information on individual opinions and specific problems, their findings may not be fully representative of the group of interest. Both the number of groups and the number of students included were small but within the range used in similar research. Group composition was similar for all groups, and we did not attempt to sample specific sub-groups. Discussions were focussed on the issue of evaluation, and interviews were standardised. As no major new themes emerged from the third group discussion, it is likely that sampling was adequate for current purposes.
Only students voluntarily signing up for focus group discussions were included in the study. Thus, potential self-selection bias might have favoured those particularly interested in the subject. The proportion of female participants in focus groups (76%) was similar to the percentage (65%) recently found in a nationwide survey of German medical students. Since gender does not appear to impact heavily on evaluation results, the slight over-representation of females in our sample is unlikely to threaten the validity of our findings.
Rather than producing statistically representative data, qualitative research facilitates easy identification of general trends or patterns regarding the attitudes of the target group, establishing ‘functional and psychological representativeness’. However, the assumption that data collection was relatively comprehensive is supported by the identification of a large number of aspects known to be relevant from more representative research (see above).
Moderators or participants themselves may influence the behaviour and responses of discussants. We have no reason to assume that our results have been particularly confounded by such factors; however, we cannot rule out this bias as a potential limitation of our study. To date, very few qualitative studies have focussed on student perceptions of evaluation. As a consequence, the validity of our findings needs to be confirmed in further studies in order to assess the generalisability of our results to other institutions and study subjects. While this study generated a set of variables deemed important by students, quantitative studies are needed to estimate the actual impact each of these factors has on student ratings of teaching quality. Finally, future research should be directed at the perspectives of teachers and programme directors on evaluation.