Article (bolded authors claimed evidence of bias) | Description | Main findings | Diversity conclusions | Strengths/limitations | MERSQI Score<sup>a</sup> (mean 11.3/18 over all articles) |
---|---|---|---|---|---|
Canada (1 article) | | | | | 9 |
MacLellan et al. (2010) [25] | Compared IMG and DMG performance on in-training and end-training exams | End-training exam pass rate: IMG 56% versus DMG 93.5% (p < .0001) | IMG: Low pre-selection scores of IMGs are consistent with their low pass rates on certification exams | Strengths: Multiple years, large sample Limitations: Exploratory, single program, single specialty | 9 |
UK (7 articles) | | | | | 15.2 |
Esmail et al. (2013) [26] | Compared IMG with DMG performance on end-training exams (GP/Family medicine) | URM failed first attempt more than white DMG (OR 3.5, p < .001) IMG failed first attempt more than white DMG (OR 14.7, p < .001) | URM/IMG: Higher failure rates in domestic and foreign URM/IMG are partly explained by lower pre-selection academic achievement, and may also reflect bias during clinical OSCE-based exams | Strengths: Complete cohort, large sample, multiple years, end-training outcome Limitations: Exploratory, single specialty | 15.8 |
McManus et al. (2014) [27] | Compared IMG with DMG performance on end-training exams (GP/Family medicine & Internal medicine) | IMG performed worse than DMG on end-training exams (~ 1.25 SD) | IMG: Lower pre-selection scores are an accurate measure of suitability for training Raising cutoffs is needed for equivalence with DMG but would affect workforce | Strengths: Follow-up study, multiple programs, large sample, multiple years Limitations: Two specialties | 15.2 |
Patterson et al. (2018) [28] | Measured factors associated with differences in performance of IMG and DMG on end-training exams (GP/Family medicine) | Clinical skill performance better predicted by SJT than CPST (beta 0.26 v 0.17) SJT mediated relationship between English fluency and clinical skills performance | IMG: IMG performance on end-of-training exams is predicted by socio-linguistic factors not clinical knowledge and skills | Strengths: National cohort, large sample, multiple years, end-of-training follow-up Limitations: Exploratory study, single specialty | 14.6 |
Tiffin et al. (2014) [29] | Measure IMG performance during residency | IMG more likely to receive unsatisfactory ARCP than DMG (OR 1.63, p < .05) | IMG: PLAB language exam does not establish linguistic equivalence of IMG and DMG Thresholds would need to be increased to achieve equivalence, but would affect workforce and decrease diversity | Strengths: National cohort, large sample Limitations: | 14.6 |
Tiffin et al. (2018) [30] | Measure bias against IMG in resident selection by comparing pre-training academic attainment with in-training assessment | UK overseas graduates more likely to be deemed appointable than IMG (OR 1.29, p < .05) but more likely to later receive a less satisfactory ARCP (OR 1.20, p < .05) | IMG: Bias favouring UK-born graduates trained overseas over IMGs may be due to excessive weight given to the interview | Strengths: National cohort, large sample, all specialties Limitations: Incomplete data set | 15.8 |
Wakeford et al. (2015) [31] | Measure correlation between GP/Family medicine and Internal medicine exam performance by ethnicity | High correlation between GP and IM exam performance supports the validity of each assessment and does not suggest bias against URM; URM performed less well | URM: No evidence of bias against URM; differences in assessment likely to reflect true differences in ability | Strengths: National cohort, multiple years, large sample Limitations: Exploratory, two specialties | 15.8 |
Woolf et al. (2019) [32] Identified by specific search terms | Measure effect of gender on specialty training selection | Across all specialties female applicants had: • No difference in applications • Increased offers (OR 1.4, p < .001) • Increased acceptance (OR 1.43, p < .001) 2 specialties had significant gender differences in applications (both favouring women): • Paediatrics (OR 1.57, p < .05) • GP (OR 1.23, p < .05) | Gender: Gender segregation in specialties is due to differential application rates, not instrument bias; research is needed on why men are less likely to apply for GP/Paediatric training, and less likely to accept GP training if offered | Strengths: Follow-up study, national cohort, large sample, multiple specialties Limitations: 1–2 years intake, incomplete data set | 14.6 |
US (27 articles) | | | | | 10.4 |
Aisen et al. (2018) [33] Identified by specific search terms | Examine effect of gender on urology applicant academic achievement and selection into specialty | Higher % of males matched (73% v 67%) Among matched applicants: • Males had fewer honors (2.8 v 2.2, p < .021) • Males had higher USMLE1 (245.9 v 240.8, p < .001) | Gender: Male/Female candidates had similar pre-selection results, with no evidence of bias in selection | Strengths: Moderate size Limitations: Exploratory, single program, single specialty, 1–2 years intake | 11.3 |
Brandt et al. (2013) [34] | Examine effect of gender on O&G applicant academic achievements and selection into specialty | No gender difference on USMLE Females more likely to have honors (51% v 41%, p < .021) and to have published (87% v 79%, p < .01) | Gender: Male/Female candidates had similar USMLE1 scores; higher female honors may explain lower rate of male applications for O&G training | Strengths: Large sample, multiple years Limitations: Exploratory, single program, single specialty, incomplete data set | 11.3 |
Chapman et al. (2019) [35] | Identify factors associated with under-representation of women across medical specialties | Female representation higher in specialties with lower mean USMLE1 entry score (p < .017) 1% increase in female faculty prevalence associated with 1.45% increase in female trainees in specialty (p < .001) | Gender: No evidence of USMLE 1 bias against females Association between female faculty and female trainees suggests mentoring may increase diversity | Strengths: National cohort, large sample, all specialties Limitations: Exploratory, 1–2 years intake, incomplete data set | 9 |
De Oliveira et al. (2012) [36] Identified by specific search terms | Measure factors associated with selection to anaesthetics residency including gender, age, country of training | Factors associated with selection: • Female • Younger • Higher USMLE 2 • DMG | Gender/Age: Bias favouring selection of female and younger applicants | Strengths: Large sample Limitations: Exploratory, single program, single specialty, 1–2 years intake, inferences made without statistical test | 12.4 |
Dirschl et al. (2006) [37] Identified by specific search terms | Measure whether gender and academic scores can predict orthopaedic end-of-training exams | 12.5% female applicants Faculty ratings of training were not associated with academic scores | Gender: No gender bias detected | Strengths: Follow-up study, large sample, multiple years Limitations: Single program, single specialty | 9 |
Driver et al. (2014) [38] | Identify factors associated with ophthalmology selection including IMG status | Increased % of selection associated with: • Higher USMLE1 (OR 3.22, p < .05) • Letters of recommendation (OR 6.2, p < .05) • Publications (OR 3, p < .05) | IMG: Design prevented conclusions about bias | Strengths: National cohort, large sample, multiple years Limitations: Exploratory, single specialty | 11.3 |
Durham et al. (2018) [39] | Measure effect of gender on selection into neurosurgical training | 13.8% female applicants USMLE1 higher for selected (233 v 211, p < .001) Females had lower OR of matching (0.59, p < .001) Females had lower mean USMLE1 scores (222 v 230, p < .001) | Gender: USMLE 1 is best predictor of selection Reduced female selection partially explained by lower USMLE 1 scores Possible bias remains after multivariate analysis | Strengths: Statewide cohort, large sample, multiple years Limitations: Exploratory, single specialty | 11.3 |
Edmond et al. (2001) [40] Identified by specific search terms | Measure bias against African Americans due to USMLE 1 in internal medicine residency selection | Mean USMLE1 of African Americans was 200, non-AA was 216 OR for rejection of AA varied from 3 to 6 (p < .05) | Race: USMLE 1 reduces selection of African Americans | Strengths: Large sample Limitations: Exploratory, single program, single specialty, 1–2 years intake, uncontrolled confound | 12.4 |
Filippou et al. (2019) [41] | Measure gender bias in letters of recommendation for urology resident applicants | LoR for males had: • More authentic tone • More references to personal drive, work, and power LoR referring to power more likely to be associated with selection | Gender: Gender bias in letters of recommendation may reduce selection of females | Strengths: Moderate sample Limitations: Exploratory, single program, single specialty, 1–2 years intake | 9 |
French et al. (2019) [42] | Measure gender bias in LoR for general surgery resident applicants | Female authors wrote longer letters | Gender: No gender bias detected in letters of recommendation | Strengths: Large sample, adequate power Limitations: Exploratory, single program, single specialty, 1–2 years intake | 7.9 |
Friedman et al. (2017) [43] | Measure gender bias in standardised versus narrative LoR for otolaryngology surgery residents | No difference in ranking of male/female applicants Female writers produce LoRs different to male writers (p < .05) LoRs written for female applicants less positive than those written for male applicants (p < .05) | Gender: Standardised letters of recommendation have reduced but not eliminated biases that contribute to reduced selection of females | Strengths: Moderate sample Limitations: Exploratory, single program, single specialty, 1–2 years intake | 7.9 |
Gardner et al. (2019) [44] | Measure effect of USMLE cutoffs on underrepresented minorities in general surgery training | Reducing USMLE1 cutoffs and adding SJT screening increased URMs offered interview by 8% | Gender/URM: USMLE 1 screening reduces selection of URMs for interview Does not claim bias | Strengths: Multiple program sample, large sample Limitations: Exploratory, single specialty, 1–2 years intake | 9 |
Girzadas et al. (2004) [45] | Measure effect of gender on SLoR for emergency medicine residency | Female author with female applicant OR 2 to get highest ranking on LoR (p = .023) | Gender: No gender bias detected in letters of recommendation | Strengths: Large sample Limitations: Exploratory, single program, single specialty, 1–2 years intake, selection process changed during study | 7.9 |
Hewett et al. (2016) [16] | Measure gender bias in radiology residency selection | 24% female applicants Females were • 30% of those offered interviews • 38% of top quartile (p < .001) • 25% of selected Female applicants' average USMLE1 score was 5 points lower (p < .05) Female applicants had higher mean interview scores (p < .05) | Gender: Bias favouring female applicants Associated with lower female USMLE1 scores Associated with higher female interview scores | Strengths: Multiple years intake, large sample Limitations: Exploratory, single program, single specialty, variable selection/scoring methods | 11.3 |
Hoffman et al. (2020) [46] | Measure gender bias in LoR for pediatric surgery residency selection | Female LoR had more communal phrases (p < .01) | Gender: Gender biases against females in LoRs may affect selection into training | Strengths: Multiple years intake Limitations: Exploratory, single program, single specialty, small sample, ad-hoc measures | 7.9 |
Hoffman et al. (2019) [47] | Measure gender bias in LoR for transplant surgery resident applicants | Male applicant LoR had more agentic terms (p < .05) LoR written by senior staff more likely to describe female applicants with communal terms (p < .05) | Gender: Gender biases in LoRs against females may affect selection into training | Strengths: Moderate sample size, multiple years intake Limitations: Exploratory study, single program, single specialty, limited power | 7.9 |
Hopson et al. (2019) [48] Identified by specific search terms | Measure influence of gender on outcome of emergency medicine selection interviews | No significant difference on standardised video interview | Gender: No gender bias detected on standardised video interview | Strengths: Multiple program cohort, large sample size, adequate power reported Limitations: Exploratory study, single specialty, 1–2 years intake, aggregates heterogeneous groups, ad-hoc measures | 10.1 |
Kobayashi et al. (2019) [49] | Measure influence of gender on LoR in orthopaedic surgery residency | Female applicants had: • Longer LoR (p < .003) • More “achieve” words (p < .0001) No differences for male v female authors | Gender: No gender bias detected on letters of recommendation | Strengths: Large sample Limitations: Exploratory study, single program, single specialty, 1–2 years intake, ad-hoc measures | 11.3 |
Lin et al. (2019) [50] | Measure gender bias in LoR for ophthalmology residency | M/F applicants had similar: • USMLE1 • Academic achievement LoR for male applicants had: • Fewer feel words (p < .041) • Fewer biological words (p < .028) | Gender: Gender biases in LoRs against females may affect selection into training | Strengths: Moderate sample size Limitations: Exploratory, single program, single specialty, 1–2 years intake, ad-hoc measures | 11.3 |
Lypson et al. (2010) [51] Identified by specific search terms | Measure correlation between USMLE scores and clinical competence at beginning of residency across specialties | USMLE1 scores lower for URM (212 v 230, p < .001) URM not significantly worse than non-URM on OSCE stations at beginning of residency | URM: USMLE 1 scores are biased against URMs, revealed by similar OSCE scores at beginning of residency | Strengths: Multiple specialties, multiple years intake Limitations: Exploratory, single program, small sample, limited power | 7.9 |
Norcini et al. (2014) [52] | Predict patient outcomes of IMGs from USMLE scores across specialties | Increased USMLE2 CK score associated with decreased mortality as a physician 1 SD on USMLE 2 CK associated with 4% improvement in mortality | IMG: USMLE2 CK scores are a valid measure of suitability for IMG selection/certification | Strengths: Follow-up study, statewide sample, large sample, multiple specialties, multiple years intake, patient outcomes Limitations: Unmeasured confounds | 14.5 |
Poon et al. (2019) [53] Identified by specific search terms | Compare orthopaedic residency enrolment rates and academic metrics of applicants and matriculated residents by race/ethnicity | URM were 29% of applicants and 25% of enrolments White/Asian applicants had higher USMLE1 than Black applicants (234 v 218, p < .05) | URM: USMLE1 screening may contribute to lower rates of application of URMs Bias not evaluated | Strengths: National cohort, large sample, adequate power Limitations: Important variables not measured | 13.5 |
Quintero et al. (2009) [54] | Measure whether personality similarity biases the selection of orthopaedic residents | Clinicians rated candidates more favourably when they shared personality characteristics (p = .044) | Personality: Increased awareness of implicit biases may reduce inequity of current selection processes | Strengths: Moderate sample size Limitations: Exploratory, single program, single specialty, 1–2 years intake, limited power, follow-up to selection, protocol variations | 12.4 |
Scherl et al. (2001) [55] | Measure gender bias in orthopaedic resident selection | No significant difference in selection of male and female charts | Gender: No gender bias detected based on gendered versions of applicant charts | Strengths: Experimental design Limitations: Exploratory, single program, small sample, selection bias, partial blinding | 11.3 |
Stain et al. (2013) [56] Identified by specific search terms | Measure attributes of top-ranked applicants to general surgery residency | Males had higher USMLE1 (238 v 230, p < .001) Males/Females had similar USMLE2 scores (245 v 244, p = .54) Highly competitive programs associated with • USMLE1 (RR 1.36) • Publications (RR 2.2) • Asian (RR 1.7 v white) | Gender: No gender bias detected based on pre-selection academic achievements | Strengths: National cohort, moderate sample size Limitations: Single program, single specialty, ad-hoc measures | 12.4 |
Unkart et al. (2016) [57] | Measure reduction in general surgical residency applications among candidates self-identified as “disadvantaged” | URM were: • Older at entry (24 v 23, p < .001) • Lower MCAT (30 v 33, p < .001) • More likely to choose a less competitive specialty (p < .03) | URM/Gender: No bias detected based on USMLE 1 | Strengths: National cohort, multiple years intake, large sample Limitations: Aggregates heterogeneous groups, limited follow-up | 12.4 |
Villwock et al. (2019) [58] Identified by specific search terms | Measure effect of STAR tool for selecting otolaryngology residency candidates to interview | USMLE scores significantly increased after introduction of STAR tool No differences in gender/URM before/after introduction of STAR selection tool | URM/Gender: STAR selection tool did not increase URM or gender representation | Strengths: Moderate sample size Limitations: Single program, exploratory | 7.9 |