Systematic review of specialist selection methods with implications for diversity in the medical workforce

Amos, Andrew James; Lee, Kyungmi; Sen Gupta, Tarun; Malau-Aduli, Bunmi S.

doi:10.1186/s12909-021-02685-w

BMC Medical Education

Table 3 Summary of Reviewed Articles

From: Systematic review of specialist selection methods with implications for diversity in the medical workforce

Article (bolded authors claimed evidence of bias)	Description	Main findings	Diversity conclusions	Strengths/limitations	MERSQI Score^a (11.3/18 over all articles)
Canada (1 article)					9
MacLellan et al. (2010) [25]	Compared IMG and DMG performance on in- and end-training exams	End-training exam pass rate: IMG 56% versus DMG 93.5% (p < .0001)	IMG: IMG low pre-selection scores consistent with low pass rates on certification exams	Strengths: Multiple year, large sample Limitations: Exploratory, single program, single specialty	9
UK (7 articles)					15.2
Esmail et al. (2013) [26]	Compared IMG with DMG performance on end-training exams (GP/Family medicine)	URM failed first attempt more than white DMG (OR 3.5, p < .001) IMG failed first attempt more than white DMG (OR 14.7, p < .001)	URM/IMG: Higher failure rates in domestic and foreign URM/IMG are partly explained by lower pre-selection academic achievement, and may also reflect bias during clinical OSCE-based exams	Strengths: Complete cohort, large sample, multiple years, end-training outcome Limitations: Exploratory, single specialty	15.8
McManus et al. (2014) [27]	Compared IMG with DMG performance on end-training exams (GP/Family medicine & Internal medicine)	IMG performed worse than DMG on end-training exams (~ 1.25 SD)	IMG: Lower pre-selection scores are an accurate measure of suitability for training Raising cutoffs is needed for equivalence with DMG but would affect workforce	Strengths: Follow-up study, multiple programs, large sample, multiple years Limitations: Two specialties	15.2
Patterson et al. (2018) [28]	Measured factors associated with differences in performance of IMG and DMG on end-training exams (GP/Family medicine)	Clinical skill performance better predicted by SJT than CPST (beta 0.26 v 0.17) SJT mediated relationship between English fluency and clinical skills performance	IMG: IMG performance on end-of-training exams is predicted by socio-linguistic factors not clinical knowledge and skills	Strengths: National cohort, large sample, multiple years, end-of-training follow-up Limitations: Exploratory study, single specialty	14.6
Tiffin et al. (2014) [29]	Measure IMG performance during residency	IMG more likely to receive unsatisfactory ARCP than DMG (OR 1.63, p < .05)	IMG: PLAB language exam does not establish linguistic equivalence of IMG and DMG Thresholds would need to be increased to achieve equivalence, but would affect workforce and decrease diversity	Strengths: National cohort, large sample Limitations:	14.6
Tiffin et al. (2018) [30]	Measure bias against IMG in resident selection comparing pre-training academic attainment with in-training assessment	UK overseas graduates more likely deemed appointable than IMG (OR 1.29, p < .05) but more likely to later receive less satisfactory ARCP (OR 1.20, p < .05)	IMG: Bias favouring UK born graduates trained overseas versus IMGs may be due to excessive weight given to interview	Strengths: National cohort, large sample, all specialties Limitations: Incomplete data set	15.8
Wakeford et al. (2015) [31]	Measure correlation between GP/Family medicine and Internal medicine exam performance by ethnicity	High correlation between GP/IM exam performance, suggesting validity of each assessment (and does not suggest bias against URM) URM performed less well	URM: No evidence of bias against URM; differences in assessment likely to reflect true differences in ability	Strengths: National cohort, multiple years, large sample Limitations: Exploratory, two specialties	15.8
Woolf et al. (2019) [32] Identified by specific search terms	Measure effect of gender on specialty training selection	Across all specialties female applicants had: • No difference in applications • Increased offers (OR 1.4, p < .001) • Increased acceptance (OR 1.43, p < .001) 2 specialties had significant gender differences in applications (both favouring women): • Paediatrics (OR 1.57, p < .05) • GP (OR 1.23, p < .05)	Gender: Gender segregation in specialties is due to differential application rates, not instrument bias; research is needed on why men are less likely to apply for GP/Paediatric training, and less likely to accept GP training if offered	Strengths: Follow-up study, national cohort, large sample, multiple specialties Limitations: 1–2 years intake, incomplete data set	14.6
US (27 articles)					10.4
Aisen et al. (2018) [33] Identified by specific search terms	Examine effect of gender on urology applicant academic achievement and selection into specialty	Higher % of males matched (73% v 67%) Among matched applicants: • Males less honors (2.8 v 2.2, p < .021) • Males higher USMLE1 (245.9 v 240.8, p < .001)	Gender: Male/Female candidates had similar pre-selection results and no evidence of bias in selection	Strengths: Moderate size Limitations: Exploratory, single program, single specialty, 1–2 years intake	11.3
Brandt et al. (2013) [34]	Examine effect of gender on O&G applicant academic achievements and selection into specialty	No gender difference on USMLE Females more likely to have honors (51% v 41%, p < .021) and published (87% v 79%, p < .01)	Gender: Male/Female candidates had similar USMLE1 scores, higher female honors may explain lower rate of M applications for O&G training	Strengths: Large sample, multiple years Limitations: Exploratory, single program, single specialty, incomplete data set	11.3
Chapman et al. (2019) [35]	Identify factors associated with under-representation of women across medical specialties	Female representation higher in specialties with lower mean USMLE1 entry score (p < .017) 1% increase in female faculty prevalence associated with 1.45% increase in female trainees in specialty (p < .001)	Gender: No evidence of USMLE 1 bias against females Association between female faculty and female trainees suggests mentoring may increase diversity	Strengths: National cohort, large sample, all specialties Limitations: Exploratory, 1–2 years intake, incomplete data set	9
De Oliveira et al. (2012) [36] Identified by specific search terms	Measure factors associated with selection to anaesthetics residency including gender, age, country of training	Factors associated with selection: • Female • Younger • Higher USMLE 2 • DMG	Gender/Age: Bias favouring selection of female and younger applicants	Strengths: Large sample Limitations: Exploratory, single program, single specialty, 1–2 years intake, inferences made without statistical test	12.4
Dirschl et al. (2006) [37] Identified by specific search terms	Measure whether gender and academic scores can predict orthopaedic end-of-training exams	12.5% female applicants Faculty ratings of training were not associated with academic scores	Gender: No gender bias detected	Strengths: Follow-up study, large sample, multiple years Limitations: Single program, single specialty	9
Driver et al. (2014) [38]	Identify factors associated with ophthalmology selection including IMG status	Increased % of selection associated with: • Higher USMLE1 (OR 3.22, p < .05) • Letters of recommendation (OR 6.2, p < .05) • Publications (OR 3, p < .05)	IMG: Design prevented conclusions about bias	Strengths: National cohort, large sample, multiple years Limitations: Exploratory, single specialty	11.3
Durham et al. (2018) [39]	Measure effect of gender on selection into neurosurgical training	13.8% female applicants USMLE1 higher for selected (233 v 211, p < .001) Females had lower OR of matching (0.59, p < .001) Females had lower mean USMLE1 scores (222 v 230, p < .001)	Gender: USMLE 1 is best predictor of selection Reduced female selection partially explained by lower USMLE 1 scores Possible bias remains after multivariate analysis	Strengths: Statewide cohort, large sample, multiple years Limitations: Exploratory, single specialty	11.3
Edmond et al. (2001) [40] Identified by specific search terms	Measure bias against African Americans due to USMLE 1 in internal medicine residency selection	Mean USMLE1 of African Americans was 200, non-AA was 216 OR for rejection of AA varied from 3 to 6 (p < .05)	Race: USMLE 1 reduces selection of African Americans	Strengths: Large sample Limitations: Exploratory, single program, single specialty, 1–2 years intake, uncontrolled confound	12.4
Filippou et al. (2019) [41]	Measure gender bias in letters of recommendation for urology resident applicants	LoR for males had: • More authentic tone • More references to personal drive, work, and power LoR referring to power more likely to be associated with selection	Gender: Gender bias in letters of recommendation may reduce selection of females	Strengths: Moderate sample Limitations: Exploratory, single program, single specialty, 1–2 years intake	9
French et al. (2019) [42]	Measure gender bias in LoR for general surgery resident applicants	Female authors wrote longer letters	Gender: No gender bias detected in letters of recommendation	Strengths: Large sample, adequate power Limitations: Exploratory, single program, single specialty, 1–2 years intake	7.9
Friedman et al. (2017) [43]	Measure gender bias in standardised versus narrative LoR for otolaryngology surgery residents	No difference in ranking of male/female applicants Female writers produce LoRs different to male writers (p < .05) LoRs written for female applicants less positive than those written for male applicants (p < .05)	Gender: Standardised letters of recommendation have reduced but not eliminated biases that contribute to reduced selection of females	Strengths: Moderate sample Limitations: Exploratory, single program, single specialty, 1–2 years intake	7.9
Gardner et al. (2019) [44]	Measure effect of USMLE cutoffs on underrepresented minorities in general surgery training	Reducing USMLE1 cutoffs and adding SJT screening increased URMs offered interview by 8%	Gender/URM: USMLE 1 screening reduces selection of URMs for interview Does not claim bias	Strengths: Multiple program sample, large sample Limitations: Exploratory, single specialty, 1–2 years intake	9
Girzadas et al. (2004) [45]	Measure effect of gender on SLoR for emergency medicine residency	Female author with female applicant OR 2 to get highest ranking on LoR (p = .023)	Gender: No gender bias detected in letters of recommendation	Strengths: Large sample Limitations: Exploratory, single program, single specialty, 1–2 years intake, selection process changed during study	7.9
Hewett et al. (2016) [16]	Measure gender bias in radiology residency selection	24% female applicants Females were • 30% of offered interviews • 38% of top quartile (p < .001) • 25% of selected Female applicants average USMLE1 score was 5 points lower (p < .05) Female applicants had higher mean interview scores (p < .05)	Gender: Bias favouring female applicants Associated with lower female USMLE1 scores Associated with higher female interview scores	Strengths: Multiple years intake, large sample Limitations: Exploratory, single program, single specialty, variable selection/scoring methods	11.3
Hoffman et al. (2020) [46]	Measure gender bias in LoR for pediatric surgery residency selection	Female LoR had more communal phrases (p < .01)	Gender: Gender biases against females in LoRs may affect selection into training	Strengths: Multiple years intake Limitations: Exploratory, single program, single specialty, small sample, ad-hoc measures	7.9
Hoffman et al. (2019) [47]	Measure gender bias in LoR for transplant surgery resident applicants	Male applicant LoR had more agentic terms (p < .05) LoR written by senior staff more likely to describe female applicants with communal terms (p < .05)	Gender: Gender biases in LoRs against females may affect selection into training	Strengths: Moderate sample size, multiple years intake Limitations: Exploratory study, single program, single specialty, limited power	7.9
Hopson et al. (2019) [48] Identified by specific search terms	Measure influence of gender on outcome of emergency medicine selection interviews	No significant difference on standardised video interview	Gender: No gender bias detected on standardised video interview	Strengths: Multiple program cohort, large sample size, adequate power reported Limitations: Exploratory study, single specialty, 1–2 years intake, aggregates heterogeneous groups, ad-hoc measures	10.1
Kobayashi et al. (2019) [49]	Measure influence of gender on LoR in orthopaedic surgery residency	Female applicants had: • Longer LoR (p < .003) • More “achieve” words (p < .0001) No differences for male v female authors	Gender: No gender bias detected on letters of recommendation	Strengths: Large sample Limitations: Exploratory study, single program, single specialty, 1–2 years intake, ad-hoc measures	11.3
Lin et al. (2019) [50]	Measure gender bias in LoR for ophthalmology residency	M/F applicants had similar: • USMLE1 • Academic achievement LoR for male applicants had: • Fewer feel words (p < .041) • Fewer biological words (p < .028)	Gender: Gender biases in LoRs against females may affect selection into training	Strengths: Moderate sample size Limitations: Exploratory, single program, single specialty, 1–2 years intake, ad-hoc measures	11.3
Lypson et al. (2010) [51] Identified by specific search terms	Measure correlation between USMLE scores and clinical competence at beginning of residency across specialties	USMLE1 scores lower for URM (212 v 230, p < .001) URM not significantly worse than non-URM on OSCE stations at beginning of residency	URM: USMLE 1 scores are biased against URMs, revealed by similar OSCE scores at beginning of residency	Strengths: Multiple specialties, multiple years intake Limitations: Exploratory, single program, small sample, limited power	7.9
Norcini et al. (2014) [52]	Predict patient outcomes of IMGs from USMLE scores across specialties	Increased USMLE2 CK score associated with decreased mortality as a physician 1 SD on USMLE 2 CK associated with 4% improvement in mortality	IMG: USMLE2 CK scores are a valid measure of suitability for IMG selection/certification	Strengths: Follow-up study, statewide sample, large sample, multiple specialties, multiple years intake, patient outcomes Limitations: Unmeasured confounds	14.5
Poon et al. (2019) [53] Identified by specific search terms	Compare orthopaedic residency enrolment rates and academic metrics of applicants and matriculated residents by race/ethnicity	URM were 29% of applicants and 25% of enrolments White/Asian applicants had higher USMLE1 than Black applicants (234 v 218, p < .05)	URM: USMLE1 screening may contribute to lower rates of application of URMs Bias not evaluated	Strengths: National cohort, large sample, adequate power Limitations: Important variables not measured	13.5
Quintero et al. (2009) [54]	Measure effect of personality similarity to bias the selection of orthopaedic residents	Clinicians rated candidates more favourably when they shared personality characteristics (p = .044)	Personality: Increased awareness of implicit biases may reduce inequity of current selection processes	Strengths: Moderate sample size Limitations: Exploratory, single program, single specialty, 1–2 years intake, limited power, follow-up to selection, protocol variations	12.4
Scherl et al. (2001) [55]	Measure gender bias in orthopaedic resident selection	No significant difference in selection of male and female charts	Gender: No gender bias detected based on gendered versions of applicant charts	Strengths: Experimental design Limitations: Exploratory, single program, small sample, selection bias, partial blinding	11.3
Stain et al. (2013) [56] Identified by specific search terms	Measure attributes of top-ranked applicants to general surgery residency	Males had higher USMLE1 (238 v 230, p < .001) Males/Females had similar USMLE2 scores (245 v 244, p = .54) Highly competitive programs associated with • USMLE1 (RR 1.36) • Publications (RR 2.2) • Asian (RR 1.7 v white)	Gender: No gender bias detected based on pre-selection academic achievements	Strengths: National cohort, moderate sample size Limitations: Single program, single specialty, ad-hoc measures	12.4
Unkart et al. (2016) [57]	Measure reduction in general surgical residency applications among candidates self-identified as “disadvantaged”	URM were: • Older at entry (24 v 23, p < .001) • Lower MCAT (30 v 33, p < .001) • More likely to choose a less competitive specialty (p < .03)	URM/Gender: No bias detected based on USMLE 1	Strengths: National cohort, multiple years intake, large sample Limitations: Aggregates heterogeneous groups, limited follow-up	12.4
Villwock et al. (2019) [58] Identified by specific search terms	Measure effect of STAR tool for selecting otolaryngology residency candidates to interview	USMLE scores significantly increased after STAR tool No differences in gender/URM before/after introduction of STAR selection tool	URM/Gender: STAR selection tool did not increase representation of URM/Gender	Strengths: Moderate sample size Limitations: Single program, exploratory	7.9

ARCP Annual Review of Competence Progression, CPST Clinical Problem Solving Test, DMG Domestic Medical Graduate, IMG International Medical Graduate, LoR Letter of Recommendation, PLAB Professional and Linguistic Assessment Board, SJT Situational Judgement Test, URM Underrepresented minority
^a MERSQI scores include subscales which are not applicable for all articles; scores are scaled after removal of these subscales to allow comparison with a maximum score of 18 for all articles (Reed et al, 2007) [17]
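The rescaling described in footnote a can be illustrated with a minimal sketch. It assumes the adjustment is a simple proportion: the points earned on the applicable subscales, divided by the maximum available on those subscales, multiplied by 18. The function name `scale_mersqi` and the example values are hypothetical and are not taken from the review.

```python
def scale_mersqi(raw_score: float, applicable_max: float, full_max: float = 18.0) -> float:
    """Rescale a MERSQI score to the full 18-point range.

    Assumption (not stated in the source): the scaling is proportional,
    i.e. raw_score / applicable_max * full_max.
    """
    if not 0 < applicable_max <= full_max:
        raise ValueError("applicable_max must be in (0, full_max]")
    if not 0 <= raw_score <= applicable_max:
        raise ValueError("raw_score must be in [0, applicable_max]")
    return raw_score / applicable_max * full_max


# Hypothetical article: 12 points earned, with 16 points available
# once the non-applicable subscales are removed.
example = scale_mersqi(12, 16)
print(example)
```

Under this assumption an article is not penalised for subscales that cannot apply to its design, which is what makes scores comparable across the table.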

ISSN: 1472-6920
