“Who writes what?” Using written comments in team-based assessment to better understand medical student performance: a mixed-methods study

Background: Observation of the performance of medical students in the clinical environment is a key part of assessment and learning. To date, few authors have examined the written comments provided to students and considered which aspects of observed performance they represent. The aim of this study was to examine the quantity and quality of written comments provided to medical students by different assessors using a team-based model of assessment, and to determine the aspects of medical student performance on which different assessors provide comments.
Methods: Medical students on a 7-week General Surgery & Anesthesiology clerkship received written comments on ‘Areas of Excellence’ and ‘Areas for Improvement’ from physicians, residents, nurses, patients, peers and administrators. Mixed methods were used to analyze the quality and quantity of comments provided and to generate a conceptual framework of observed student performance.
Results: 1,068 assessors and 127 peers provided 2,988 written comments for 127 students, a median of 188 words per student divided into 26 “Areas of Excellence” and 5 “Areas for Improvement”. Physicians provided the most comments (918), followed by patients (692) and peers (586); administrators provided the fewest (91). The conceptual framework generated contained four major domains: ‘Student as Physician-in-Training’, ‘Student as Learner’, ‘Student as Team Member’, and ‘Student as Person.’
Conclusions: A wide range of observed medical student performance is recorded in written comments provided by members of the surgical healthcare team. Different groups of assessors provide comments on different aspects of student performance, suggesting that comments provided from a single viewpoint may under-represent or overlook some areas of student performance. We hope that the framework presented here can serve as a basis for better understanding what medical students do every day, and how they are perceived by those with whom they work.


Background
Observation of the performance of medical students in the clinical environment is a key part of assessment and learning. There is increasing recognition of the importance of interprofessional education in medical education [1], and of the need to assess the performance of medical professionals working as part of a larger interprofessional team [2,3]. Numerous tools have been designed to assess observed medical student performance [4,5]. The ideal assessment system should sample observations widely and systematically, and generate written or verbal comments describing observed performance (qualitative data) as well as numerical data on some form of ratings scale (quantitative data) [6].
Many traditional assessment tools employ a numerical ratings scale followed by a section for general comments (e.g., "Comments on this Trainee"). Although written comments have been used in this way for many years, little research has focused on how these comments are generated and what they represent. The few studies which have considered the use of written comments in feedback suggest that they may contain useful information and may be able to improve performance [5,7,8]. Recently there has been increased interest in the use of written comments to strengthen existing assessment methods, especially in the assessment of professionalism [9-12]. There is evidence that teachers and learners may place more emphasis on written comments than on numerical ratings [13], and that the observations recorded as written comments may be distinct from those associated with numerical ratings of academic success [14]. A number of authors have pointed out that physicians' direct observations of their learners' day-to-day clinical performance on the healthcare team may be limited, that their ratings and comments may be prone to indirect inference and positive bias [6,15-17], and that physicians sometimes give students global impressions of generalized behaviours instead of specific advice on how to improve [18-21].
In previous work, we established the feasibility and acceptability of a team-based assessment model in a surgery clerkship, in which student performance is observed by a range of assessors on the surgical healthcare team, including physicians, residents, patients, nurses, peers and administrators [22,23]. This method of assessment uses both numerical ratings and written comments. The aim of this study was to examine the quantity and quality of written comments provided to medical students by different assessors, and to determine which aspects of medical student performance different assessors observe and comment on.

Methods
In the academic year 2009/10, the 7-week, Year 3 clerkship in General Surgery, Anesthesiology & Pain Medicine at our medical school adopted a team-based method of assessment, as previously described [22,23]. The assessment plan employed a multiple choice examination, an objective structured clinical examination, a reflective written assignment and the team-based assessment (TBA) of clinical performance. For the TBA element, each student had assessment forms completed by the following groups of assessors (number of forms in parentheses): physician, comprising surgeon (6) and anesthesiologist (2); chief resident (2); operating room nurse (2); patient (6); ward nurse manager on behalf of a team of ward nurses (2); peers (anonymous, 4-6); and administrator (1). Forms were designed with the input of assessors and students, and contained areas for written comments on "Areas of Excellence" and "Areas of Improvement." We chose to provide assessors with 'cues' in this way to avoid general or generic comments [24]. Each form also contained a number of items requiring a response on a 5-point Likert scale, as previously described [23]. Forms were completed on paper, in person, at the conclusion of a period in which the assessor was working with the student. The exceptions were the peer forms, which were completed anonymously, and the administrator forms, which were completed at the conclusion of the clerkship. Assessors were also provided with information and training about the new assessment method: posters were placed in prominent locations, and information and advice were provided in person and online. Assessors were advised that providing written comments to students was encouraged but not mandatory, and that the purpose of giving comments was to provide formative feedback. At the end of the clerkship, assessment forms were collected, and all written comments were transcribed and entered into an electronic database.
Each student was provided with a one-page "Summary of Assessment" listing all of the written comments given by each assessor type, shown in Figure 1 [23].
A mixed-methods analysis was used to examine the written comments provided to students. Comments were anonymized before analysis, so that individual students and assessors could not be identified. The first step was a quantitative analysis, counting the number of comments and words provided to each student, broken down by assessor type and by "Excellence" versus "Improvement". The second step was a thematic content analysis to review and categorize each comment, constructing a conceptual framework which described all of the comments provided [25]. This step was supplemented by the generation of "word clouds" for the comments provided by each assessor group. This technique summarizes large amounts of data by presenting individual words in a 'cloud' in which the size of each word increases with its frequency of occurrence in the dataset (Figure 2). Two readers read all of the comments provided by each assessor group, and met weekly for 8 weeks to develop a framework which described the range of comments observed. The unit of analysis was not the set of comments given to an individual student, but rather the comments provided by each group of assessors. As far as possible, domains and sub-domains within the framework were named using direct quotations from assessor comments (in situ coding). Apart from comments from the administrator, each sub-domain represented at least two written comments from at least two assessors in the same group. As a wide range of behaviours was described in the written comments, multiple meetings were required to refine the coding structure and achieve consensus, ensuring that the meaning of each domain and sub-domain was clear and that each sub-domain was clearly separate from the others. Categorization and coding were continued until data saturation was achieved and no further categories emerged.
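As an illustration only, the counting step and the word frequencies that drive a word cloud can be sketched in a few lines of Python. The sketch assumes that each transcribed comment is stored as a hypothetical (assessor group, category, text) record; the study's actual database layout is not reported, and the sample comments, the whitespace word count and the small stop-word list are assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical records (assessor group, "Excellence"/"Improvement", comment text).
# Invented for illustration; not comments from the study.
comments = [
    ("physician", "Excellence", "Excellent knowledge of anatomy and good rapport with patients"),
    ("physician", "Improvement", "Keep building knowledge of pharmacology"),
    ("patient", "Excellence", "Listened carefully and explained everything to me"),
    ("peer", "Improvement", "Could be more organized on the ward"),
]

# Assumed stop-word list; real word-cloud tools ship their own.
STOPWORDS = {"a", "an", "and", "be", "could", "more", "of", "on", "the", "to", "with"}


def tally(records):
    """Count comments and words for each (assessor group, category) pair."""
    n_comments, n_words = Counter(), Counter()
    for group, category, text in records:
        key = (group, category)
        n_comments[key] += 1
        n_words[key] += len(text.split())  # simple whitespace word count
    return n_comments, n_words


def word_frequencies(records, group):
    """Word counts for one assessor group; a word cloud sizes each word by its count."""
    freq = Counter()
    for g, _, text in records:
        if g == group:
            words = re.findall(r"[a-z']+", text.lower())
            freq.update(w for w in words if w not in STOPWORDS)
    return freq


n_comments, n_words = tally(comments)
print(n_comments[("physician", "Excellence")], n_words[("physician", "Excellence")])
print(word_frequencies(comments, "physician").most_common(1))
```

Which words are treated as stop words is a design choice: it changes the resulting cloud, so it is worth reporting alongside the figure.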
Once coding was completed, both readers read through the entire set of comments once again to ensure that no further domains or sub-domains could be identified. As a final step, the number of domains and sub-domains represented in the comments from each assessor group was also calculated. Approval for this study was granted by the University of Alberta's Health Research Ethics Board (reference #8891).

Results
Quantitative analysis
Of the forms completed, 87% contained written comments. In total, 1,068 assessors and 127 peers provided 2,988 written comments, comprising 22,183 words, for 127 students.

Qualitative analysis
The conceptual framework generated contained four major domains: 'Student as Physician-in-Training', 'Student as Learner', 'Student as Team Member', and 'Student as Person' (presented with representative quotations in Table 2). A total of 39 sub-domains were also identified. The domains and sub-domains represented in the comments from each assessor group are shown in Table 3.
The domain 'Student as Physician-in-Training' was developed to describe comments relating to the behaviours often referred to as 'clinical skills', i.e. the skills involved in the day-to-day duties of "doctoring". Its 14 sub-domains described students' general medical knowledge, rapport with patients, communication with patients (history taking, listening, explaining), physical examination, organization of findings (information management, critical thinking, diagnosis and management, written notes and oral presentation) and technical skills (gowning/gloving, asepsis, procedures). The number of sub-domains covered by each group was as follows: physician: 9, resident: 9, OR nurse: 5, ward nurse: 4, administrator: 1, peer: 5, patient: 7. Most assessor groups provided comments relating to medical knowledge, information management and procedural skills. Physicians and residents commented on critical thinking and oral presentation skills. Residents, peers and ward nurses commented on written notes. Only physicians, residents and patients commented on history-taking; only physicians and residents commented on skills relating to diagnosis and management; only OR nurses commented on aseptic technique and gowning/gloving. Only patients provided comments on listening and explaining.
The domain 'Student as Learner' was developed to describe comments relating primarily to the student's attitudes and behaviour towards learning. Eight sub-domains were identified: interest/enthusiasm, initiative & self-direction, preparation for learning, asking questions to learn, openness to feedback & self-improvement, progress & improvement, confidence and career suitability. The number of sub-domains covered by each group was as follows: physician: 8, resident: 6, OR nurse: 6, ward nurse: 6, administrator: 2, peer: 5, patient: 3. The majority of assessor groups provided comments in the areas of interest/enthusiasm, initiative, preparation to learn and student confidence. Physicians and nurses commented on openness to feedback, suggesting that they were providing feedback to students. Seven of the eight assessor groups commented on student confidence, but only physicians and patients commented on student progress and improvement (e.g., 'this student has improved/will improve with more experience'). Physicians, residents and patients commented on the student's suitability for medicine in general or for a certain speciality in particular.
The domain 'Student as Team Member' was developed to describe comments relating directly to the student's work within the healthcare team and their interactions with other team members. Eight sub-domains were identified: work ethic, organization/time management, conscientiousness, helpfulness, team communication, cooperation, follows instructions and leadership. The number of sub-domains covered by each group was as follows: physician: 5, resident: 6, OR nurse: 5, ward nurse: 4, administrator: 2, peer: 7, patient: 0. Student peers provided the most comments in this domain, covering 7 of the 8 sub-domains (omitting 'follows instructions'), and were the only group to provide comments on 'leadership' (e.g., 'student could take a leadership role on the team'). Physicians provided no comments on helpfulness or cooperation with other team members, but these sub-domains were covered by comments from operating room nurses and ward nurses, who also commented on team communication. Administrators commented on organization/time management and team communication. Patients provided no comments at all in this domain.
The domain 'Student as Person' was developed to describe comments relating specifically to students' personal attributes. Nine sub-domains were identified: compassion, patient-centredness, personability, respect for others, humour, politeness, resilience, common sense and honesty/integrity. The number of sub-domains covered by each group was as follows: physician: 4, resident: 3, OR nurse: 3, ward nurse: 5, administrator: 2, peer: 6, patient: 7. All assessor groups commented on personability, and most groups commented on compassion, patient-centredness, respect for others and politeness. Only peers and patients commented on humour, only peers commented on resilience, and only physicians commented on common sense.
Residents and physicians provided the most representation of "Physician-in-Training" (64% of sub-domains each), and physicians provided the most representation of "Learner" (100% of sub-domains). Peers and residents provided the most representation of "Team Member" (88% and 75% of sub-domains respectively), while patients and peers provided the most representation of "Person" (78% and 67% of sub-domains respectively).
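The percentages above follow directly from the sub-domain counts reported for each domain. A short Python check, using only those reported counts, reproduces them. It uses integer round-half-up arithmetic, which is an assumption about how the figures were rounded; Python's built-in `round()` would instead round exact halves to the nearest even number.

```python
# Sub-domains per domain, as reported in the Results (14 + 8 + 8 + 9 = 39).
DOMAIN_SIZE = {"Physician-in-Training": 14, "Learner": 8, "Team Member": 8, "Person": 9}

# Sub-domains covered by each assessor group, copied from the Results text.
COVERED = {
    "Physician-in-Training": {"physician": 9, "resident": 9, "OR nurse": 5, "ward nurse": 4,
                              "administrator": 1, "peer": 5, "patient": 7},
    "Learner": {"physician": 8, "resident": 6, "OR nurse": 6, "ward nurse": 6,
                "administrator": 2, "peer": 5, "patient": 3},
    "Team Member": {"physician": 5, "resident": 6, "OR nurse": 5, "ward nurse": 4,
                    "administrator": 2, "peer": 7, "patient": 0},
    "Person": {"physician": 4, "resident": 3, "OR nurse": 3, "ward nurse": 5,
               "administrator": 2, "peer": 6, "patient": 7},
}


def percent_half_up(part: int, whole: int) -> int:
    """Integer percentage of part/whole, rounding exact halves upwards."""
    return (200 * part + whole) // (2 * whole)


def coverage(domain: str) -> dict:
    """Percentage of a domain's sub-domains covered by each assessor group."""
    total = DOMAIN_SIZE[domain]
    return {group: percent_half_up(n, total) for group, n in COVERED[domain].items()}


for domain in DOMAIN_SIZE:
    ranked = sorted(coverage(domain).items(), key=lambda kv: kv[1], reverse=True)
    print(domain, ranked[:2])  # the two groups with the highest coverage
```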

Discussion
This study demonstrates that a wide range of observed medical student performance is recorded in written comments provided by members of the surgical healthcare team. This study also demonstrates that different groups of assessors provide comments on different aspects of student performance, suggesting that comments provided from a single viewpoint may potentially under-represent or overlook some areas of student performance.
We suggest that, in their roles as teachers and expert clinicians, physicians see students primarily as trainee doctors with a certain set of skills which they must learn. Thus, "Physician-in-Training" and "Learner" comprise over half of the sub-domains in the framework (22 of 39), and these domains are well represented in the written comments provided by physicians. In contrast, physicians covered fewer of the sub-domains within "Person" and "Team Member"; comments in these domains came more often from peers and residents ("Team Member") and from patients, nurses and peers ("Person"). We hypothesize that this is because peers and residents work with students more closely than physicians do, and that patients, nurses and peers relate more closely to students as people than physicians do. We suggest that, in providing written comments, physicians focus more on the cognitive knowledge and skills associated with learning medicine (knowing, thinking, doing, reading, learning) and less on "softer" interpersonal behaviours (listening, explaining, helping, cooperating); it is possible that physicians regard some elements of performance as more legitimate subjects for a written comment than others.
Our findings suggest that important aspects of medical student performance can be observed by non-physician members of the surgical team. Some sub-domains were covered by multiple assessor groups, but in several areas comments were provided by only one group of assessors. These included listening & explaining (patients), physical examination (patients), organization of findings: diagnosis and management (residents), technical skills: gowning/gloving and asepsis (operating room nurses), follows instructions (operating room nurses) and leadership and resilience (peers). This observation suggests that each group of assessors brings something valuable to student assessment, and that omitting comments from one group would lead to the loss of observations from specific areas of student performance; as Lockyer and Clyman write, "additive value is accrued from comparison of multiple sources" [26]. While physicians could solicit comments from other assessor groups (patients, nurses, etc.), we believe that the method described here is more valid as it allows other members of the healthcare team to provide first-hand observations directly to students. We believe that including comments from other team members has the potential to improve student assessment by facilitating sampling from more domains of medical student performance [27,28].
Others have shown that non-physicians are able to evaluate the performance of physicians in training and practice [29-31], and have suggested that different assessor groups rate performance in different ways [32]. Several papers have suggested that assessment by nurses may yield different information from that obtained from physicians [28,33]. Peer assessment of medical students is also well established and has been shown to improve student performance, especially interpersonal skills and professionalism [34-36]. Feedback from patients has also been shown to be useful, although most ratings and comments received are positive and complimentary, as we also observed [37-39]. Receiving feedback directly from patients is also likely to increase students' awareness of the patients' perspective on illness [40]. There is little work on the use of administrators in assessing medical students, but we believe that including their opinions is important, as it is fairly simple to do and allows observations on issues such as absence, lateness and respect for non-medical staff. We also found that the written comments provided by a range of assessors were a rich source of data for student assessment, which proved helpful when making decisions on academic promotion and advancement in the months after the clerkship had finished.
We were pleasantly surprised at the quality and quantity of written comments provided to students, and hypothesize that the immediacy of the assessment model, with comments written immediately after a period of working with the assessor, may have accounted for the large number of comments provided. Asking for 'prompted comments' in response to 'Areas of Excellence' and 'Areas of Improvement' may have helped assessors provide specific comments instead of more general observations [24]. We were also pleased with the number of comments received from 'non-traditional' assessor groups such as peers, nurses and patients; together, these made up more than half of the comments received. We noted that the "valence" of written comments (the ratio of positive to negative comments) was 2,363:625, a ratio of 3.8:1. Others have reported valence ratios ranging from 2:1 to 15:1 [21,24,41].
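The valence figure is simple arithmetic on the reported counts. A minimal Python sketch, using only the figures given above, confirms that the positive and negative comments sum to the 2,988 comments reported in the Results and that 2,363:625 rounds to 3.8:1; it also shows that the same counts correspond to roughly 79% positive comments. The helper name `valence_ratio` is illustrative only.

```python
def valence_ratio(positive: int, negative: int) -> float:
    """Positive-to-negative ratio expressed as x:1, rounded to one decimal place."""
    return round(positive / negative, 1)


positive, negative = 2363, 625  # counts reported in the Discussion
total = positive + negative  # should match the 2,988 comments reported in the Results
ratio = valence_ratio(positive, negative)
positive_share = round(100 * positive / total)  # percentage of comments that were positive

print(f"{positive:,}:{negative:,} = {ratio}:1 ({positive_share}% positive, n = {total:,})")
```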
We believe that the four main domains of the framework described here are distinct from one another and provide a useful way of considering medical student performance. The framework describes the clinical performance of medical students in more detail than previous work, and also provides more clarity on larger constructs such as 'personality' and 'clinical skills'. It is also the first framework to consider comments from a range of assessor groups, and the first to suggest that a longer list of specific sub-domains can be grouped into four main areas of performance. The framework presented here has two potential applications. Firstly, it serves as a theoretical basis for explaining what medical students do every day and how they are perceived by those with whom they work. Secondly, it has a practical application in helping to guide the written feedback which assessors provide to students; for instance, assessors could be reminded: "when commenting on a student's performance, try to write down one thing from each of the four domains: Physician-in-Training, Learner, Team Member and Person".
A number of other authors have also developed frameworks to describe the range of written comments given to medical trainees by physicians. Lye's 2001 study of comments provided to students on a pediatrics clerkship identified a total of 12 domains of clinical performance, many of which are similar to those identified in our study [42]. Plymale et al. also studied comments given to students on a surgery clerkship and identified 21 domains of performance [8]. Sokol-Hessner identified 20 domains of performance in comments given to clerkship students [19], while Frohna et al. identified eight possible domains [41], and Schum six [24].
There are several arguments which support the validity of the framework developed in this study. Firstly, the number of comments received appeared to vary with the amount of time that each group of assessors would have been expected to spend with a student: physicians, patients and peers, who would be expected to spend the most time with students, left the most comments, while administrators, who had the fewest interactions with students, left the fewest. Secondly, there was evidence that the content of comments varied between assessor groups according to the expected context of their interaction with the student; in general, assessor groups commented on what they would be expected to observe, and did not comment on areas they would not be expected to observe. Thus, operating room nurses commented on aseptic technique but not on history-taking skills, while physicians commented on oral presentation skills but not on helpfulness or cooperation. Patients did not comment on 'Student as Team Member' at all, as they did not see students working in that context.
It is interesting to compare this framework with other frameworks such as CanMEDS [43]. The domain of 'Physician-in-Training' corresponds most closely with 'Medical Expert', while 'Team Member' aligns best with 'Collaborator'. Other CanMEDS domains such as 'Communicator' and 'Professional' are represented in various sub-domains, while 'Scholar' and 'Advocate' were not strongly emphasized in the comments. Much of the material coded under 'Student as Person' is not present in CanMEDS. A recent study describing a framework for written comments given to residents suggests that many written comments can be mapped onto CanMEDS domains, but that some comments fall outside the CanMEDS framework [44]. Two of the areas identified in that study were 'disposition (attitudes and personality)' and 'trajectory (norm reference, improvement and future predictions)'; we observed similar types of comments in this study, coded under 'Student as Person' and 'Student as Learner: Improvement/Suitability'. While some may consider some of the sub-domains listed under 'Person' to be intrinsic traits (e.g., common sense, sense of humour), we believe that it is important to provide students with information about how their behaviour is perceived by their patients and those with whom they work, with the intention of helping them develop into more effective professionals.
This study has several limitations. First, it deals only with the words written down to record what was discussed at an encounter held to review performance that had already been observed. There is evidence that much of what is discussed in person is not recorded in written comments [19,45]; this concurs with anecdotal reports of the same in our program, and we therefore surmise that some elements of student performance were not recorded. We hope that including comments from a variety of assessor groups ameliorates this effect to some degree. Secondly, we considered the possible interaction between the numerical items on the ratings form and the comments which were written down. Assessors might 'take a cue' to write a comment about an area of performance mentioned in the numerical items [41], or might not write a comment about a particular area of performance because a numerical rating had already been given. We did not detect any strong evidence of this: while many written comments corresponded with items on the ratings form, many did not relate to any particular item. However, we could not entirely exclude this effect. Thirdly, our study was limited in examining comments by administrators, as it included only one assessor of this type; we hope to conduct additional studies with a larger number of administrators in future. Lastly, our findings relate to the context of a hospital-based surgery clerkship; different findings might be obtained if the study were repeated with different healthcare teams working in different settings.
We agree with other authors that written comments provided to students are a rich source of data [41]; we plan to continue to use and study this method of assessment, and to further validate the framework of comments we have developed. In future studies, we will compare written comments given to students in different clerkships, using comments given to individual students as the unit of analysis, and will also investigate the ways in which assessors decide on the content of the written comments they provide. We encourage others to apply the framework presented here in other settings to further refine our understanding of what medical students do and how it is perceived.

Conclusions
In assessing the performance of medical students using written comments, it is important to consider "who writes what". This study shows that written comments provided to medical students relate to a wide range of observed student performance, and that different groups of assessors provide comments on different aspects of student performance. Comments provided from a single viewpoint may thus under-represent or overlook some areas of student performance. The conceptual framework presented here may be useful in better understanding medical student performance, and in improving the content of written comments provided to students.