Reliability of the Interprofessional Collaborator Assessment Rubric (ICAR) in Multi Source Feedback (MSF) with post-graduate medical residents
BMC Medical Education volume 14, Article number: 1049 (2014)
Increased attention on collaboration and teamwork competency development in medical education has raised the need for valid and reliable approaches to the assessment of collaboration competencies in post-graduate medical education. The purpose of this study was to evaluate the reliability of a modified Interprofessional Collaborator Assessment Rubric (ICAR) in a multi-source feedback (MSF) process for assessing post-graduate medical residents’ collaborator competencies.
Post-graduate medical residents (n = 16) received ICAR assessments from three different rater groups (physicians, nurses and allied health professionals) over a four-week rotation. Internal consistency, inter-rater reliability, inter-group differences and relationship between rater characteristics and ICAR scores were analyzed using Cronbach’s alpha, one-way and two-way repeated measures ANOVA, and logistic regression.
Missing data decreased from 13.1% using daily assessments to 8.8% utilizing an MSF process, p = .032. High internal consistency measures were demonstrated for overall ICAR scores (α = .981) and individual assessment domains within the ICAR (α = .881 to .963). There were no significant differences between scores of physician, nurse, and allied health raters on collaborator competencies (F(2,5) = 1.225, p = .297, η² = .016). Rater gender was the only significant factor influencing scores, with female raters scoring residents significantly lower than male raters (6.12 vs. 6.82; F(1,5) = 7.184, p = .008, η² = .045).
The study findings suggest that the use of the modified ICAR in an MSF assessment process could be a feasible and reliable assessment approach to providing formative feedback to post-graduate medical residents on collaborator competencies.
In Canada, medical education at the undergraduate, post-graduate and continuing medical education (CME) levels is supported by the CanMEDS framework of the Royal College of Physicians and Surgeons of Canada. The CanMEDS framework describes seven core roles in which physicians should demonstrate competence: Medical Expert, Communicator, Collaborator, Manager, Health Advocate, Scholar, and Professional. “Competency” has been defined as a dynamic concept that encompasses an understanding of the knowledge, clinical skills, interpersonal and problem solving skills required for excellence in professional performance. The Collaborator role, in particular, defines physician collaboration as “effectively working within a health care team to achieve optimal patient care”. The two key competencies for this role require that physicians are able to: 1) Participate effectively and appropriately in an interprofessional health care team; and 2) Effectively work with other health professionals to prevent, negotiate and resolve interprofessional conflict.
Competency-based education places a greater emphasis on the attainment of required competence and the practice of skills in the real environment. An essential principle of competency-based education is the ability to assess objectively for the achievement of competence. Massagli and Carline suggest that physician competence is multi-dimensional and that no single tool is capable of assessing all aspects of competence. It has been recommended that assessment of the CanMEDS roles should be based upon a multi-faceted approach that occurs at varying times to assess different aspects of skill, attitude, behavior, and performance. Many instruments have been reported for the self-assessment of attitudinal shifts; however, greater emphasis needs to be placed on the development of tools that rely on objective, external observer measurements of all types of competencies (knowledge, skills and attitudes) for interprofessional collaboration.
Multi-Source Feedback (MSF) has become a popular assessment process in medical education. MSF, also known as 360-degree assessment, has been described as the use of specific processes and instruments for undertaking workplace-based assessment. Evidence on performance in the workplace can be collected from multiple sources, including senior colleagues, peers, nurses, other healthcare workers, and patients. This method can be used for both formative and summative assessment purposes and has been widely used in postgraduate medical education and CME. MSF originated during the Second World War, was adopted in industrial settings for employee performance evaluation, and began to be adopted within healthcare in the late 1990s. As of 2009, over 4000 residency programs in North America and the UK reported using MSF to assess residents and fellows. MSF feasibility, reliability, and validity have been studied in various medical speciality programs including, but not limited to: Emergency Medicine, Internal Medicine, Obstetrics/Gynecology, Pathology, and Psychiatry.
Massagli and Carline and Campbell et al. note that MSF is best utilized when incorporated as part of a formative process of assessment whereby residents can review the results, or are provided feedback, to develop a plan of action to reach competency with their mentor or residency director. Systematic reviews of the literature on MSF concluded that incorporation of multiple perspectives in various environments is essential to evaluate performance. Participating residents felt that the evaluations increased their awareness of how they interacted with patients. When ratees take part in the evaluation process, it allows self-reflection, increased engagement in the evaluation process, and comparison of their self-assessment with the assessments of those they interact with. Similarly, it allows assessment from the perspectives of individuals who may rarely offer input, such as nurses, allied health professionals or even patients. Joshi et al. demonstrated that in a stable institution, with a relatively small number of residents, MSF is a practical, effective evaluation of interpersonal and communication skills. More quantitatively, a systematic review by Donnon et al. concluded that adequate statistical reliability and generalizability are achieved with 41 participants: 8 medical peers, 8 non-physician co-workers (nurses, psychologists, pharmacists, and other allied health professionals) and 25 patients.
Hammick et al. suggest MSF is an important mechanism for influencing the delivery of interprofessional education (IPE) by increasing awareness of the roles of other health professionals that contribute to quality patient care. Unfortunately, nursing staff are infrequently involved in resident evaluation, as often only the attending physicians complete surveys or questionnaires for a specific rotation. Nursing staff may observe aspects of a resident’s performance, such as team relationships, interactions with patients and family, and humanistic attitudes, that may not be viewed by attending physicians, and thus may offer a unique perspective during resident assessment. The ability of residents to create and maintain positive collaborative relationships with nursing staff is essential for patient safety and in establishing a mutually supportive clinical environment. Studies have reported that physicians, faculty, nurses, allied health professionals, and patients can reliably rate physicians’ humanistic behavior. Al Ansari et al. conducted a meta-analysis that demonstrated acceptable construct validity using the MSF process for the assessment of physicians and surgeons across the multiple years of a residency, or in practice.
The Interprofessional Collaborator Assessment Rubric (ICAR) was originally developed for use in the assessment of interprofessional collaborator competencies. The development of ICAR was guided by an interprofessional advisory committee comprising health professional educators from the fields of medicine, nursing and the rehabilitative sciences. The Rubric dimensions are based on interprofessional collaborator competency statements that were developed and validated through a typological analysis of national and international competency frameworks, a Delphi survey of experts, and interprofessional focus groups with students and faculty.
The purpose of the study was to evaluate the feasibility and reliability of the use of the ICAR in an MSF process for assessing post-graduate medical residents’ Collaborator competencies.
The original version of the ICAR contains 31 evaluative items organized into 6 domains. Domains and associated items reflect competency statements of the Royal College of Physicians and Surgeons of Canada CanMEDS Collaborator role. Each item on the original ICAR is evaluated on a scale of 1 to 4 (1 = Minimal, 2 = Developing, 3 = Competent, 4 = Mastery) based on the frequency of demonstrated ability of the trainee as outlined by behavioral indicators. The content validity of the original ICAR version was reviewed with a small group of clinician-educators (MDs) with the purpose of affirming the relevance of items within a post-graduate medical education context. Items were removed or retained based on the level of agreement on each item between the reviewing physicians. From this review, a 17-item modified ICAR was pilot tested. The pilot study encompassed daily assessments (over a two-week study period) of post-graduate trainees’ Collaborator competencies. The purpose of the pilot study was to evaluate the feasibility and inter-rater reliability of the ICAR. The pilot study results led to a revision of the modified ICAR in which the scoring scale was expanded to 9 points, where “1 = well below expectations, 5 = meets expectations, and 9 = well above expectations” (http://www.med.mun.ca/CCHPE/Faculty-Resources/Interprofessional-Collaborator-Assessment-Rubric.aspx or: http://bit.ly/Rubric). The following methodological discussion pertains to the subsequent field test using the modified ICAR in an MSF assessment approach.
Residents and medical staff from four post-graduate medical education programs (Internal Medicine, Obstetrics/Gynecology, Neurology, and Orthopedic Surgery) were recruited to participate in the field test study. Residents from these disciplines completed 4-week rotations on one of five medical/surgical units. Residents were excluded from the study if, during the assessment period, they were on a rotation, or elective, outside of the research hospital. Under the inclusion/exclusion criteria of the study, sixteen (n = 16) residents were deemed eligible to be assessed by attending physicians, nurses and allied health professionals. Residents were blind to which rotation they would be evaluated on and which specific healthcare professionals were assessing them. To be eligible for inclusion in the final statistical analysis, a resident must have received a minimum of six (n = 6) assessments from at least two members in each rater group. In total, six (n = 6) residents met this requirement and were incorporated in the statistical analysis. The six residents were considered representative of the resident population from which we sampled, as the residents comprised at least four different medical disciplines covering each of the post-graduate years (PGY 1 – 5). Table 1 demonstrates the distribution of raters for each resident.
Physicians, nurses and allied health professionals were recruited from the participating medical/surgical units to assess their respective residents on Collaborator competencies using the modified ICAR. Physicians were recruited based on the specific resident’s recommendations depending on which physicians they interacted with on their rotation. All nursing and allied health professionals on participating units were invited to complete an ICAR. Individuals from both groups were excluded if they missed at least one of the four weeks of the resident’s rotation. Division managers provided names and shift schedules for participating nurses and allied health professionals.
A cover page to the modified ICAR collected information on the descriptive characteristics of each rater, including: profession, gender, years of experience in profession, years of experience in current medical/surgical unit, frequency of interaction, and type of interaction. A direct interaction was defined as a ‘face-to-face or phone conversation’, while an indirect interaction was defined as ‘contact through chart notes, orders, or requests; discharge planning; hearing from other colleagues; or hearing from patient or family’. Descriptive characteristic variables for ‘years of experience in profession’, ‘years of experience in current medical/surgical unit’, and ‘frequency of interaction’ were transformed into new binary variables to allow adequate sample sizes for statistical analysis. Descriptive characteristics for each rater group and the distribution of raters per group across residents were compared using Pearson’s Chi-Square test.
Missing data ranged from 0% to 26.5% across the 17 items. All missing data in quantitative variables was replaced using a single imputation stochastic regression method. This method imputed an individual missing value from the data set using the rater mean, item mean, grand mean, and a random error term.
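The imputation step can be illustrated with a minimal sketch. It assumes the imputed value is the rater mean plus the item mean minus the grand mean, plus a normally distributed error. The text names these four ingredients but does not give the exact formula, so the combination rule, the `noise_sd` value, and the function name `impute_missing` are hypothetical, and the scores are invented.

```python
import random
from statistics import mean


def impute_missing(scores, noise_sd=0.5, seed=0):
    """Fill None cells in a rater-by-item table of scores.

    Assumed rule (not confirmed by the source):
    value = rater_mean + item_mean - grand_mean + random error.
    """
    rng = random.Random(seed)
    observed = [v for row in scores for v in row if v is not None]
    grand_mean = mean(observed)

    # Mean of each item (column) over the raters who scored it.
    item_means = []
    for j in range(len(scores[0])):
        col = [row[j] for row in scores if row[j] is not None]
        item_means.append(mean(col) if col else grand_mean)

    filled = []
    for row in scores:
        seen = [v for v in row if v is not None]
        rater_mean = mean(seen) if seen else grand_mean
        new_row = []
        for j, v in enumerate(row):
            if v is None:
                v = rater_mean + item_means[j] - grand_mean
                v += rng.gauss(0, noise_sd)
                v = min(9.0, max(1.0, v))  # stay on the 9-point scale
            new_row.append(v)
        filled.append(new_row)
    return filled


# Invented example: 3 raters x 3 items, two missing cells.
scores = [[6, 7, None], [5, None, 6], [7, 8, 7]]
filled = impute_missing(scores)
```

Setting `noise_sd=0` removes the random term, which makes the deterministic part of the assumed rule easy to inspect.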
Comparison of overall ICAR score was analyzed using one-way ANOVA for rater groups and the remaining binary descriptive characteristics. To determine the effect of independent variables on overall ICAR score, two-way repeated-measures ANOVA was utilized to test for within-subject and between-subject main effects and interactions across the 17 items of the ICAR between residents. A summary package of both quantitative and qualitative data was provided to the six residents involved in the analysis.
Ethics approval was received from the Interdisciplinary Committee on Ethics in Health Research (ICEHR), Memorial University of Newfoundland.
Results and discussion
One hundred and five (n = 105) raters initially consented to participate. Of these, 80 raters completed an ICAR assessment form for a 76.2% response rate. One hundred and fifty-five (n = 155) assessments were completed indicating that each rater completed, on average, 1.94 (or ~2) ICAR assessments. The subsequent analysis was based on the completed 155 ICAR assessments of the six residents receiving at least two assessments per rater group.
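The two summary figures in this paragraph follow directly from the reported counts. The short check below uses only the numbers given in the text; the variable names are illustrative.

```python
# Counts reported in the text.
consented = 105      # raters who initially consented
responded = 80       # raters who completed at least one ICAR form
assessments = 155    # completed ICAR assessments

response_rate = 100 * responded / consented      # percent of consenting raters
assessments_per_rater = assessments / responded  # average forms per rater

print(f"response rate: {response_rate:.1f}%")
print(f"assessments per rater: {assessments_per_rater:.4f}")
```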
Of the three participating professional groups, nurses and allied health professionals had near equal response rates of 75.0% (n = 57) and 75.2% (n = 13) respectively. Physicians were found to have the highest response rate of 90.9% (n = 10). There was no significant difference in response rates between rater groups (χ² = 0.19, df = 2, p = .909).
Table 1 summarizes the distribution of rater groups across residents. There was no significant difference in the proportion of raters per resident (χ² = 13.412, df = 10, p = .202) with per resident raters ranging from 19 – 37. The ranges within rater groups across residents were: 2 – 5 physicians; 10 – 30 nurses; and 4 – 5 allied health professionals.
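The reported degrees of freedom match a table of six residents by three rater groups: (6 − 1) × (3 − 1) = 10. The sketch below computes a Pearson chi-square statistic and its degrees of freedom for such a table. The counts are invented for illustration and are not the values in Table 1, and `chi_square` is a hypothetical helper name.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a
    contingency table given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Invented counts: rows are residents, columns are
# physicians, nurses, allied health professionals.
raters = [
    [3, 12, 4],
    [4, 20, 5],
    [2, 10, 4],
    [5, 25, 5],
    [3, 18, 4],
    [4, 15, 5],
]
stat, df = chi_square(raters)
```

A p-value would then come from the chi-square distribution with `df` degrees of freedom, which is omitted here to keep the sketch free of third-party dependencies.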
Table 2 summarizes the background characteristics of the rater groups. Nurses completed the majority of assessments (n = 107, 69.0%), followed by allied health professionals (n = 26, 16.8%), and physicians (n = 22, 14.2%). Females completed 81.3% (n = 126) of the total assessments. There were significant (p < .001) differences in the gender of participants from each rater group; male physicians (81.8%), female nurses (92.5%), and female allied health professionals (88.4%). There were more assessments completed by raters with at least 10 years of professional experience (60.0%) and in their current unit (55.5%). As well, the majority (65.8%) of assessments were completed by raters who reported at least one resident interaction per day.
A paired samples t-test analysis of missing data revealed a significant reduction between the MSF field test study using the ICAR and the initial pilot study, 8.8% vs. 13.1% respectively, p = .032 (Table 3). The final two items of the ICAR, #16 and #17 – both under the Conflict Management/Resolution domain – were found to have the highest percent of missing data in both the pilot and field test studies, averaging 22.3% and 40.6% respectively. The difference between means and standard deviation (SD) in the new and original dataset was −0.05 (6.30 vs. 6.25) and −0.04 (1.49 vs. 1.45) respectively. This result suggests that the replacement of missing data was successful in maintaining the validity of the data set and could be used for further analysis.
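A paired-samples t statistic of the kind reported here can be sketched as follows, assuming the pairs are per-item missing-data percentages from the pilot and the field test. The text does not state the pairing unit, so that is an assumption, and the percentages below are invented rather than taken from Table 3.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after):
    """Paired-samples t statistic and degrees of freedom."""
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    # Mean difference divided by its standard error.
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Invented per-item missing-data percentages (pilot vs. field test).
pilot = [10.0, 20.0, 30.0, 12.0, 8.0]
field = [8.0, 15.0, 25.0, 9.0, 7.0]
t_stat, dof = paired_t(pilot, field)
```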
Table 4 summarizes internal consistency analyses of the modified ICAR and the associated domains of the instrument. An overall Cronbach’s alpha coefficient of α = .981 revealed high internal consistency reliability. Each domain also demonstrated high internal consistency, ranging between .881 - .963. Due to the high internal consistency of the domains, the overall ICAR scores used in further analysis were the sum of all 17 items from the six domains.
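For readers unfamiliar with the statistic, Cronbach's alpha for k items is k/(k − 1) × (1 − Σ item variances / variance of the summed score). The sketch below applies that standard formula to an invented rater-by-item table; it is an illustration, not the authors' analysis code, and `cronbach_alpha` is a hypothetical name.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a table with one row per assessment
    and one column per item (sample variances throughout)."""
    k = len(rows[0])
    item_vars = [variance(col) for col in zip(*rows)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Invented scores on a 9-point scale: 5 assessments x 4 items.
ratings = [
    [6, 6, 7, 6],
    [5, 5, 5, 4],
    [8, 7, 8, 8],
    [4, 5, 4, 4],
    [7, 7, 6, 7],
]
alpha = cronbach_alpha(ratings)
```

Values close to 1, such as those reported in Table 4, indicate that the items move together strongly across assessments.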
Results of the ANOVA for determining which independent, or descriptive, variables of the rater’s background characteristics affected resident overall ICAR score are summarized in Table 5. The profession of the rater yielded no significant effect with a very small effect size (F(2,5) = 1.225, p = .297, η² = .016). The only significant main effect on overall ICAR score was found to be the gender of the rater (F(1,5) = 7.184, p = .008, η² = .045), providing a moderate effect size constituting 4.5% of the variance. Female raters scored residents significantly lower than male raters (6.12 vs. 6.82). Figure 1 depicts the significant difference between male and female rater overall ICAR scores.
A further gender effect was found through two-way repeated measures ANOVA analysis that revealed a significant interaction effect (F = 1.911, p = .021, η² = .013) of rater gender across the 17 item scores. Figure 2 depicts the overall ICAR scores across the 17 items for male and female raters. It is important to note that there was no interaction effect between resident gender and rater gender, p = .359.
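The effect sizes in this section are eta squared values, that is, the share of total score variance attributable to the grouping factor. A minimal one-way sketch, using invented scores and the hypothetical helper `one_way_anova`, shows how F and eta squared relate to the sums of squares. It does not reproduce the study's model.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, eta squared) for a list of score lists."""
    all_scores = [v for g in groups for v in g]
    grand = mean(all_scores)

    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # Variation of scores around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)

    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    eta_sq = ss_between / (ss_between + ss_within)
    return f_stat, eta_sq


# Invented overall scores grouped by rater gender.
male_raters = [7.0, 6.5, 7.2, 6.8]
female_raters = [6.0, 6.3, 5.9, 6.4, 6.1]
f_stat, eta_sq = one_way_anova([male_raters, female_raters])
```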
A final analysis underscoring the effect gender played on overall ICAR score utilized logistic regression. The analysis revealed that rater gender was the only significant predictor of overall ICAR score. Male raters were 3.08 times more likely than female raters to provide an overall ICAR score of above 6.0 (p = .013) and 3.28 times more likely to score above 7.0 (p = .005).
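The "times more likely" figures are odds ratios. For a logistic regression with a single binary predictor and an intercept, the fitted odds ratio equals the cross-product ratio of the 2 × 2 table, so the idea can be sketched without fitting a model. The counts below are invented and `odds_ratio` is a hypothetical helper; the study's own model and data are not reproduced here.

```python
from math import log


def odds_ratio(a_high, a_low, b_high, b_low):
    """Odds of a 'high' score in group A divided by the odds in group B."""
    return (a_high / a_low) / (b_high / b_low)


# Invented counts of assessments above / at or below a score threshold.
male_high, male_low = 20, 10
female_high, female_low = 30, 45

ratio = odds_ratio(male_high, male_low, female_high, female_low)
beta = log(ratio)  # the corresponding logistic regression coefficient
```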
A significant interaction effect resulted from a two-way repeated measures ANOVA analysis involving the frequency of interaction between raters and residents across items (F = 2.103, p = .025, η² = .014). The post-hoc analysis revealed that the effect was due to items #5, #6 and #7 of the ICAR (all comprising the ‘Collaborator’ domain). Figure 3 depicts the overall ICAR score for items #5, #6, and #7 being scored lower by raters who interacted with residents less than once per shift.
A significant difference was found between overall ICAR scores for each of the 17 items. Analysis revealed there was a significant main effect on the means of the individual 17 items (F = 2.79, p = .002, η² = .02), indicating a small effect size accounting for 2% of the total variance. However, rater groups did not differ in their scores across items as indicated by a non-significant interaction effect (F = 0.807, p = .713, η² = .012).
Finally, qualitative data was recorded to supplement the quantitative scores. A summary package of both quantitative and qualitative data was provided to the six residents involved in the analysis. The qualitative data illustrates the variety of feedback that can be received. For example, one resident received positive (Good communication skills. Able to amalgamate clinical knowledge to these scenarios), neutral (Very rare for this specific rater to have interactions with this resident. But when on unit, zero problems or issues with collaboration), and negative (Would rather do anything but listen to suggestions of those he feels are below him. A smart student but should be more respectful of the interdisciplinary team) feedback.
The overall response rate (76.2%) in the field test of the modified ICAR in an MSF assessment process was generally high for all rater groups, ranging from 75.0% to 90.9%. This result reflects the upper end of response rates reported in the literature regarding MSF feedback, which range from 36% to 95%. This response rate suggests that the use of the ICAR in an MSF process with post-graduate residents may be a viable option to assess Collaborator competencies. The modified ICAR also demonstrated high internal consistency reliability for the overall ICAR score and each of the domains (α = .881 - .963). A reduction in missing data between the pilot and field test of the modified ICAR suggests that prolonged observation periods may be needed for adequate assessment of Collaborator competencies. Items in the ‘Conflict Resolution and Management’ domain also demonstrated a high proportion of missing data despite the extended observation period in the field test. Conflict resolution and management skills may be more challenging competencies to assess through direct observation, particularly if the work environment is well functioning and highly productive.
Analysis of overall ICAR scores revealed no significant differences between physicians, nurses, and allied health professionals. This finding tends to support the inter-rater reliability of the modified ICAR form and its use in an MSF assessment process. This result may also counter claims that non-physician medical staff are unable to provide reliable observations of non-medical expert roles such as Collaborator competencies. In earlier work, Rezler et al. reported that some residents had questioned whether nurses or allied health professionals had the ability to evaluate them adequately and Canavan et al. found that feedback from nursing and allied health professionals was overly positive and useful to enhancing performance improvement in particular areas. The MSF literature suggests that MSF is best used as a form of formative assessment and feedback.
The analysis did reveal significant findings with respect to the gender of the rater, but not the gender of the resident. Male raters tended to rate residents more highly than female raters. It is difficult to infer from these results whether female raters had higher expectations (and therefore scored lower) than males with respect to the Collaborator competencies of the residents. Earlier work has indicated significant differences in gender attitudes towards interprofessional healthcare teams. Ostroff et al. have also examined the predictive ability of background characteristics on the score an individual would receive and found that male raters tended to be over-estimators of an individual’s performance. Analysis of the gender of residents did not yield a significant difference in overall mean ICAR score, which is contrary to other findings. Previous research has suggested that female medical learners score higher than their fellow male students.
The qualitative data demonstrated rich value for the medical learners, as they were able to see their collaborative abilities not only as a number but also in terms of how they affected the team they worked with. The anonymous feedback provided a variety of constructive feedback, from positive and neutral to negative responses, on which the learner can reflect. The participating residents were quite appreciative of the qualitative feedback.
The main limitations of the study were that it was conducted in a single institution and on only four medical units. The sample sizes of physicians and allied health professionals were also low. It was not possible to recruit enough participants to meet the criteria denoted by Donnon et al. of 41 participants (8 physicians, 8 coworkers, and 25 patients), given the time limitation and the non-participation of patients. There was also an uneven distribution of resident gender, and residents indicated which physicians were appropriate to assess them.
The study findings suggest that the use of the modified ICAR form in an MSF assessment process could be a feasible assessment approach to providing formative feedback to post-graduate medical residents on Collaborator competencies. There were no significant differences in the overall mean ICAR score between three interprofessional rater groups across three different medical units. The experience level of the rater and the frequency of interaction with the resident also had no significant effect on the overall ICAR score. Qualitative data demonstrated the array of feedback that can be provided to learners, which was appreciated by the participants.
CME: Continuing medical education
ICAR: Interprofessional Collaborator Assessment Rubric
The CanMEDS 2005 physician competency framework. Better standards. Better physicians. Better care. 2005, The Royal College of Physicians and Surgeons of Canada, Ottawa, ON
Verma S, Paterson M, Medves J: Core competencies for health care professionals: What medicine, nursing, occupational therapy, and physiotherapy share. Journal of Allied Health. 2006, 35 (2): 109-115.
Bandiera G, Sherbino J, Frank JR: The CanMEDS assessment tools handbook. An introductory guide to assessment methods for the CanMEDS competencies. 2006, The Royal College of Physicians and Surgeons of Canada (RCPSC), Ottawa
Massagli TL, Carline JD: Reliability of a 360-degree evaluation to assess resident competence. American Journal of Physical Medicine & Rehabilitation. 2007, 86 (10): 845-852. 10.1097/PHM.0b013e318151ff5a.
Overeem K, Wollersheim H, Driessen E, Lombarts K, van de Ven G, Grol R, Arah O: Doctors' perceptions of why 360-degree feedback does (not) work: A qualitative study. Medical Education. 2009, 43 (9): 874-882. 10.1111/j.1365-2923.2009.03439.x.
Ogunyemi D, Gonzalez G, Fong A, Alexander C, Finke D, Donnon T, Azziz R: From the eye of the nurses: 360-degree evaluation of residents. Journal of Continuing Education in the Health Professions. 2009, 29 (2): 105-110. 10.1002/chp.20019.
Stark R, Korenstein D, Karani R: Impact of a 360-degree professionalism assessment on faculty comfort and skills in feedback delivery. Journal of General Internal Medicine. 2008, 23 (7): 969-972. 10.1007/s11606-008-0586-0.
Davis MH, Ponnamperuma CG, Wall D: Workplace-based assessment. A Practical Guide for Medical Teachers. Edited by: Dent JA, Harden RM. 2009, Elsevier, Edinburgh, UK
Norcini J, Burch V: Workplace-based assessment as an educational tool: AMEE guide no. 31. Medical Teacher. 2007, 29: 855-871. 10.1080/01421590701775453.
Lockyer JM, Clyman SG: Multisource feedback (360-degree evaluation). Practical Guide to the Evaluation of Clinical Competence. Edited by: Holmboe ES, Hawkins RE. 2008, Mosby, Inc, Philadelphia, PA
Sargeant J, Mann K, Ferrier S: Exploring family physicians’ reactions to multi-source feedback: Perceptions of credibility and usefulness. Medical Education. 2005, 39 (5): 497-504. 10.1111/j.1365-2929.2005.02124.x.
Lockyer J: Multisource feedback in the assessment of physician competencies. Journal of Continuing Education in the Health Professions. 2003, 23 (1): 4-12. 10.1002/chp.1340230103.
Garra G, Wackett A, Thode H: Feasibility and reliability of a multisource feedback tool for emergency medicine residents. Journal of Graduate Medical Education. 2011, 3 (3): 356-360. 10.4300/JGME-D-10-00173.1.
Warm EJ, Schauer D, Revis B, Boex JR: Multisource feedback in the ambulatory setting. Journal of Graduate Medical Education. 2010, 2 (2): 269-277. 10.4300/JGME-D-09-00102.1.
Joshi R, Ling F, Jaeger J: Assessment of a 360-degree instrument to evaluate residents' competency in interpersonal and communication skills. Academic Medicine. 2004, 79 (5): 458-463. 10.1097/00001888-200405000-00017.
Lockyer JM, Violato C, Fidler H, Alakija P: The assessment of pathologists/ laboratory medicine physicians through a multisource feedback tool. Archives of Pathology & Laboratory Medicine. 2009, 133: 1301-1308.
Violato C, Lockyer JM, Fidler H: Assessment of psychiatrists in practice through multisource feedback. Canadian Journal of Psychiatry. 2008, 53 (8): 525-533.
Campbell JL, Roberts M, Wright C, Hill J, Greco M, Taylor M, Richards S: Factors associated with variability in the assessment of UK doctors’ professionalism: Analysis of survey results. British Medical Journal. 2011, 27 (343): d6212-10.1136/bmj.d6212.
Wood L, Hassell A, Whitehouse A, Bullock A, Wall D: A literature review of multi-source feedback systems within and without health services, leading to 10 tips for their successful design. Medical Teacher. 2006, 28: e185-e191. 10.1080/01421590600834286.
Donnon T, Al Ansari A, Al Alawi S, Violato C: The reliability, validity, and feasibility of multisource feedback physician assessment: a systematic review. Academic Medicine. 2014, 89 (3): 511-516. 10.1097/ACM.0000000000000147.
Al Ansari A, Donnon T, Al K, Darwish A, Violato C: The construct and criterion validity of the multi-source feedback process to assess physician performance: a meta-analysis. Advances in Medical Education and Practice. 2014, 5: 39-51. 10.2147/AMEP.S57236.
Wood J, Collins J, Burnside ES, Albanese MA, Propeck PA, Kelcz F, Spilde JM, Schmaltz LM: Patient, faculty, and self-assessment of radiology resident performance: A 360-degree method of measuring professionalism and interpersonal/communication skills. Academic Radiology. 2004, 11 (8): 931-939.
Hammick M, Freeth D, Koppel I, Reeves S, Barr H: A best evidence systematic review of interprofessional education: BEME guide no. 9. Medical Teacher. 2007, 29: 735-751. 10.1080/01421590701682576.
Johnson D, Cujec B: Comparison of self, nurse, and physician assessment of residents rotating through an intensive care unit. Critical Care Medicine. 1998, 26 (11): 1811-1816. 10.1097/00003246-199811000-00020.
Risucci DA, Tortolani AJ, Ward RJ: Ratings of surgical residents by self, supervisors and peers. Journal of Surgery, Gynecology and Obstetrics. 1989, 169 (6): 519-526.
Curran VR, Hollett A, Casimiro LM, McCarthy P, Banfield VS, Hall P: Development and validation of the interprofessional collaborator assessment rubric (ICAR). Journal of Interprofessional Care. 2011, 25: 339-344. 10.3109/13561820.2011.589542.
Enders CK: Applied Missing Data Analysis. 2010, Guilford Press, New York, NY
Hill JJ, Ansprey A, Richards SH, Campbell JL: Multisource feedback questionnaires in appraisal and for revalidation: A qualitative study in UK general practice. British Journal of General Practice. 2012, 62 (598): e314-e321. 10.3399/bjgp12X641429.
Rezler AG, Bruce NC, Schmitt BP: Dilemmas in the evaluation of residents. Proceedings of the Annual Conference on Resident Medical Education. 1986, 25: 371-378.
Canavan C, Holtman MC, Richmond M, Katsufrakis PJ: The quality of written comments on professional behaviors in a developmental multisource feedback program. Academic Medicine. 2010, 85: S106-S109. 10.1097/ACM.0b013e3181ed4cdb.
Curran VR, Sharpe D, Forristall J: Attitudes of health sciences faculty towards interprofessional teamwork and education. Medical Education. 2007, 41: 892-896. 10.1111/j.1365-2923.2007.02823.x.
Curran VR, Sharpe D, Forristall J, Flynn K: Attitudes of health sciences students towards interprofessional teamwork and education. Learning in Health and Social Care. 2008, 7 (3): 146-156. 10.1111/j.1473-6861.2008.00184.x.
Ostroff C, Atwater LE, Feinberg BJ: Understanding self-other agreement: A look at rater and ratee characteristics, context, and outcomes. Personnel Psychology. 2004, 57: 333-375. 10.1111/j.1744-6570.2004.tb02494.x.
Day SC, Norcini JJ, Shea JA, Benson JA: Gender differences in the clinical competence of residents in internal medicine. Journal of General Internal Medicine. 1989, 4: 309-312. 10.1007/BF02597403.
Kaplan CB, Centor RM: The use of nurses to evaluate houseofficers' humanistic behavior. Journal of General Internal Medicine. 1990, 5 (5): 410-414. 10.1007/BF02599428.
Smith CJ, Rodenhauser P, Markert RJ: Gender bias of Ohio physicians in the evaluation of the personal statements of residency applicants. Academic Medicine. 1991, 66 (8): 479-481. 10.1097/00001888-199108000-00014.
Rand VE, Hudes ES, Browner WS, Wachter RM, Avins AL: Effect of evaluator and resident gender on the American board of internal medicine evaluation scores. Journal of General Internal Medicine. 1998, 13: 670-674. 10.1046/j.1525-1497.1998.00202.x.
Wiskin CMD, Allan TF, Skelton JR: Gender as a variable in the assessment of final year degree-level communication skills. Medical Education. 2004, 38: 129-137. 10.1111/j.1365-2923.2004.01746.x.
The authors would like to acknowledge the voluntary participation of physician, nurse and allied health professionals of Eastern Regional Integrated Health Authority in the study as raters, and the voluntary participation of post-graduate medical residents.
The authors declare that they have no competing interests.
MH was research lead in data collection and analysis and first draft author. VC created the assessment tool and provided guidance throughout study and helped draft the final manuscript. BC was MSc research supervisor and co-conceived the study design. HS provided statistical advice and draft revisions. SM co-conceived study design and provided revision edits throughout. All authors read and approved the final manuscript.