USMLE step 1 and step 2 CK as indicators of resident performance

Abstract

Background

The purpose of this systematic review was to (1) determine the scope of literature measuring USMLE Step 1 and Step 2 CK as predictors or indicators of quality resident performance across all medical specialties and (2) summarize the ability of Step 1 and Step 2 CK to predict quality resident performance, stratified by ACGME specialties, based on available literature.

Methods

This systematic review was designed according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) [16]. The original search strategy surveyed MEDLINE and was adapted to survey Cochrane Library and Embase. A study was deemed eligible if it provided all three of the following: (a) Step 1 or Step 2 CK as indicators for (b) resident outcomes in (c) any ACGME-accredited specialty training program.

Results

A total of 1803 articles were screened from three separate databases. The 92 included studies were stratified by specialty, with Surgery (21.7% [20/92]), Emergency Medicine (13.0% [12/92]), Internal Medicine (10.9% [10/92]), and Orthopedic Surgery (8.7% [8/92]) being the most common. Common resident performance measures included ITE scores, board certification, ACGME milestone ratings, and program director evaluations.

Conclusions

Further studies are imperative to discern the utility of Step 1 and Step 2 CK as predictors of resident performance and as tools for resident recruitment and selection. The results of this systematic review suggest that a scored Step 1 dated prior to January 2022 can be a useful tool in a holistic review of future resident performance, and that Step 2 CK scores may be similarly effective in the holistic review process. Given the inherent complexity of resident performance, multiple tools across many assessment modalities are necessary to assess it comprehensively and effectively.

Introduction

In the early 1990s, the United States Medical Licensing Examination (USMLE) was established as the defining pathway for medical licensure in the United States [1]. It currently consists of Step 1, assessing the application of foundational sciences; Step 2 Clinical Knowledge (CK), assessing acquired knowledge of clinical medicine; and Step 3, assessing knowledge of clinical medicine and patient management. The landscape of medical education has undergone transformational change, with a growing number of schools strategically moving Step 1 to follow core clerkship education [2]; Step 2 CK is traditionally completed during the fourth year of medical school [3]. In January 2021, the USMLE announced that Step 2 Clinical Skills (CS) would be indefinitely discontinued, citing its initial postponement during the COVID-19 pandemic and the ever-changing environment of medical education [4]. USMLE Step 3 is the final examination; it is commonly taken during the PGY-1 year of residency and assesses competency in the clinical knowledge and skills imperative for the unsupervised practice of medicine as a physician [5].

The USMLE Step 1 and Step 2 CK exams have together been two of the most important and influential factors for residency program directors when assessing medical students' candidacy for residency training programs in the United States [6,7,8,9]. As of January 2022, Step 1 transitioned to a pass/fail score structure, reflecting its original purpose as a criterion-referenced examination that determines whether examinees meet a pre-defined knowledge standard. Its previous use as a norm-referenced measure of performance relative to other test takers is no longer possible for any Step 1 examination taken during or after January 2022. Today, Step 2 CK largely remains a norm-referenced exam, providing residency programs with standardized numerical scores (1-300) to screen and compare applicants [10]. Prior to January 2022, Step 1 outcomes had a tremendous influence on the recruitment and selection of residents, acting as a rate-limiting step toward acceptance into a desired residency, with programs varying widely in competitiveness [11]. It has been reported that 94% of all National Residency Matching Program (NRMP) participating residency programs considered Step 1 an important factor in selecting medical students to interview, with 68% of programs requiring a minimum target score [12].

The emphasis residency programs place on these two exams has shaped the culture of how students approach and value their medical education. The strategic preparation, planning, and study that students devote to Step 1 matches, if not surpasses, the emphasis that some residency programs previously placed on it as a measure of potential success. The vast majority of medical students take a dedicated study period for both Step 1 and Step 2 CK, during which they study countless hours of material specific to the examinations [13]. It is not uncommon for students to experience imposter syndrome around their USMLE preparation and performance [14], and together with the inherent stress of the process, this has negatively impacted the mental health and well-being of medical students across the globe [15]. Paralleling the broader evolutionary changes shaping medical education, the decision of the National Board of Medical Examiners (NBME) and the Federation of State Medical Boards (FSMB) to convert Step 1 to a pass/fail score structure marked a historic change that will exert considerable influence on medical education, career advising, and professional development [16].

The objective of this systematic review is to report on the literature that measures USMLE Step 1 and Step 2 CK as predictors of quality resident performance across all medical specialties and to better understand the ability of these exams to predict quality resident performance based on the available literature.

Method

This systematic review was designed according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) [17]. The guiding principle of this search strategy was to capture all studies related to the assessment of Step 1 or Step 2 CK as predictors/indicators of resident physician performance.

This review adapted a search strategy that investigated three databases. The original search strategy surveyed MEDLINE and was adapted to survey Cochrane Library and Embase. The search framework included studies in their original language of publication and was limited to original research. The study period for inclusion was any article published after January 1990, so as to include any article published after the USMLE was established. Editorials and reviews were excluded. Key search terms were identified based on synonyms of the study basis (Step 1 and Step 2 CK) and study population (residents of any ACGME-accredited specialty). The original, complete MEDLINE search strategy adapted for use on Cochrane Library and Embase is outlined in Appendix E1.

After completion of the search, irrelevant studies were excluded and duplicates were removed. The titles and abstracts of the remaining studies were individually screened by two authors, BR and NC, to determine eligibility. A study was deemed eligible if it provided all three of the following: (a) Step 1 or Step 2 CK as indicators for (b) resident outcomes in (c) any ACGME-accredited specialty training program. Resident outcomes were considered any objective or subjective measures (examinations, evaluations, surveys, etc.) of resident performance. Both authors independently determined initial eligibility based on titles and abstracts and discussed any disagreements in depth before reaching a conclusion. Once the initial eligibility check was complete, the full text of each eligible article was examined before final inclusion was agreed upon by both screening authors. Using the Cochrane risk-of-bias tool, a risk-of-bias assessment was conducted on each article before inclusion to ensure a valid study design [18]. The complete search method is outlined in Figs. 1 and 2. Data collected from included studies were synthesized and presented in narrative format, including a summary of the studies (Tables 1 and 2).

Fig. 1

Study selection flowchart as designed according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA)

Fig. 2

Summary figure outlining results of systematic review as designed according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA)

Table 1 USMLE Step 1 and/or 2 CK Correlation with Residency Performance; Stratified by Specialty
Table 2 USMLE Step 1 and/or 2 CK Correlation with Residency Performance; Stratified by Performance Measure

Results

A total of 1803 articles were screened from three separate databases. After excluding duplicates, irrelevant sources, and non-original research, 135 potential studies were identified from the titles and abstracts. A final screening, which included a full-text analysis, was conducted to determine eligibility. The 92 included studies were stratified by specialty, with Surgery (21.7% [20/92]), Emergency Medicine (13.0% [12/92]), Internal Medicine (10.9% [10/92]), and Orthopedic Surgery (8.7% [8/92]) being the most common. Results from each ACGME specialty were summarized in narrative format, and specifics from every included study were listed (Tables 1 and 2).

Anesthesiology

Study numbers (No.) 1–3, as referenced from Table 1, were specifically relevant to Anesthesiology resident performance. Performance measures assessed across the three studies included Anesthesiology board certification, ITE scores, and Anesthesiology Knowledge Test (AKT) ranking. For board certification, No. 1 and 3 showed that Step 1 was predictive, and No. 3 showed that Step 2 CK was predictive. For ITE scores, No. 3 showed that Step 1 was predictive, and No. 2 and 3 showed that Step 2 CK was predictive. Additionally, No. 3 demonstrated that both Step 1 and Step 2 CK were correlated with AKT ranking. In summary, both Step 1 and Step 2 CK were predictive indicators of resident performance.

Dermatology

Study No. 4 was specifically relevant to Dermatology resident performance. In this study, Dermatology ITE-1, ITE-2, and ITE-3 scores were assessed as the outcome measure. After analyzing their results, the authors suggested that Step 1 scores correlated with ITE scores for each year of residency. After screening, no data were found regarding Step 2 CK and Dermatology-specific resident performance measures.

Emergency medicine

Studies No. 5–16 were specifically relevant to Emergency Medicine resident performance. Performance measures assessed across these studies included Emergency Medicine board certification, ITE scores, ACGME core competency milestone evaluations, Script Concordance Test (EM-SCT) scores, measured negative resident outcomes, and resident graduation rank order list (ROL). For board certification, No. 15 showed that Step 1 was predictive, while No. 13 found that Step 1 was not predictive. Both No. 13 and 15 showed that Step 2 CK was predictive. For ITE scores, No. 7 and 9 found that Step 1 was predictive, while No. 8 suggested Step 1 was not predictive in a multivariable linear regression model. Study No. 9 showed that Step 2 CK was predictive, while No. 8 found that Step 2 CK was not correlated with the ITE score in a multivariable linear regression model. Regarding ACGME milestone evaluations, results were generally mixed concerning Step 1 and Step 2 CK predictive value, as demonstrated in No. 5, 10, 12, and 14. Study No. 6 showed that Step 2 CK correlated with the EM-SCT score. Study No. 11 demonstrated that Step 1 was predictive of measured negative resident outcomes. The resident ROL was not correlated with the Step 1 score according to No. 16. In summary, the predictive value of Step 1 and Step 2 CK was mixed and largely dependent on the specific outcome measure being assessed.

Family medicine

Studies No. 17 and 18 were specifically relevant to Family Medicine resident performance. Performance measures assessed across the two studies included Family Medicine board certification, ITE scores, and ACGME core competency milestone evaluations. For board certification, No. 17 showed that Step 2 CK was predictive. For ITE scores, No. 18 showed that both Step 1 and Step 2 CK were predictive. However, No. 18 demonstrated no correlation between either Step score and ACGME milestone evaluations. In summary, Step 1 and Step 2 CK were associated with examination scores, while no association was found with ACGME milestone ratings.

Fellowship - hematology and medical oncology

Studies No. 19 and 20 were specifically relevant to Hematology and Medical Oncology fellow performance. Performance measures assessed across the two studies included board certification, ITE scores, and the number of awards, abstracts, and publications during the fellowship. For board certification, No. 20 showed that Step 1 was predictive. For ITE scores, No. 19 demonstrated that both Step 1 and Step 2 CK were predictive. However, No. 19 indicated that neither Step 1 nor Step 2 CK was associated with the number of awards, abstracts, or publications during the fellowship. In summary, Step 1 and Step 2 CK were associated with examination scores, while no association was found with the number of awards, abstracts, and publications.

Fellowship - infectious disease

Study No. 21 was specifically relevant to Infectious Disease fellow performance. In this study by Grabovsky et al., the outcome measures assessed were ABIM-ID certification examination score and ABIM-ID certification examination passing status. The authors found that the Step 1 score correlated with the ABIM-ID certification examination score, while Step 2 CK was associated with ABIM-ID certification examination passing status.

Fellowship - neuroradiology

Study No. 22 was specifically relevant to Neuroradiology fellow performance. In this study by Yousem et al., the outcome measures assessed were Fellow E*Value scores (determined by faculty assessment of ACGME core competencies) and best-to-worst ranking within the fellowship cohort. After analyzing their results, the authors claimed that both Step 1 and Step 2 CK showed predictive value concerning the identified performance measures.

Internal medicine

Studies No. 23–32 were specifically relevant to Internal Medicine resident performance. Performance measures assessed across these studies included Internal Medicine board certification, ITE scores, ACGME core competency milestone evaluations, structured clinical exam performance, long block 360-degree ratings, direct patient assessment of physician attributes, and performance ratings. For board certification, No. 27, 30, and 31 showed that Step 1 was predictive, while No. 32 showed that it was not. Studies No. 27, 30, and 32 found that Step 2 CK was predictive, while No. 29 demonstrated no predictive value. For ITE scores, No. 24, 26, and 32 showed that Step 1 was predictive, and No. 24, 26, 29, and 32 showed that Step 2 CK was predictive. For ACGME competency milestone evaluations, Step 1 was not a predictor according to No. 23, and Step 2 CK showed no correlation in No. 23 and 29. Study No. 25 showed that neither Step 1 nor Step 2 CK was associated with structured clinical exam performance. For long block 360-degree ratings, Step 2 CK was predictive while Step 1 was not, according to No. 32. That same study showed that Step 2 CK, but not Step 1, was associated with direct patient assessment of physician attributes. Lastly, No. 28 demonstrated that Step 1 had predictive value for performance ratings. In summary, the predictive value of Step 1 and Step 2 CK was mixed and largely dependent on the specific outcome measure being assessed.

Neurological surgery

Out of all screened articles across three databases, studies No. 33 and 34 were found to be pertinent to Neurological Surgery resident performance. The main performance measure assessed by the studies was American Board of Neurological Surgery (ABNS) examination scores. The studies' conclusions did not conflict, as each assessed a different exam: No. 33 found that Step 1 correlated with ABNS scores, and No. 34 found that Step 2 CK did not. In summary, Step 1 correlated with resident performance while Step 2 CK did not.

Obstetrics and gynecology

In Obstetrics and Gynecology, studies No. 35 and 36 provided relevant results. No. 36 found Step 2 CK scores to be correlated with board certification, and No. 35 found Step 1 scores to be significantly correlated with Council on Resident Education in Obstetrics and Gynecology in-training examination scores. Together, Step 1 and Step 2 CK were each correlated with at least one measure of resident performance in the field of Obstetrics and Gynecology. It should be noted that no association was found between Step 1 and board certification, indicating a more limited application.

Ophthalmology

Studies No. 37–39 were found with relevant information in the field of Ophthalmology. The main performance indicators examined were the Ophthalmic Knowledge Assessment Program (OKAP) exam and the American Board of Ophthalmology written qualifying examination (ABO-WQE). The studies were all concerned with how Step 1 related to these performance measures, and their conclusions were decidedly disparate, precluding any clear-cut takeaways. In summary, the findings were mixed, and no conclusion can be drawn concerning a correlation of Step 1 scores with resident performance.

Orthopedic surgery

Studies No. 40–47 were found with relevant information. Performance measures assessed included American Board of Orthopedic Surgery (ABOS) Certifying Examination scores and Orthopedics In-Training Exam (OITE) scores. Studies assessing ABOS scores reported that both Step 1 and Step 2 CK had a positive correlation. For OITE scores, the publications agreed that Step 2 CK had a positive correlation but disagreed over whether the same could be said of Step 1. In summary, Step 2 CK positively correlated with both OITE and ABOS performance as indicators of resident performance, while Step 1 correlated only with ABOS performance.

Other - multispecialty residency publications

Studies No. 48–58 examined the correlation of Step 1 and/or Step 2 CK scores with resident performance across multiple specialties. These studies generally fell into two types: those that used surveys of residency directors on resident performance as a measure, and those that used broad board certification rates. There is considerable conflict in the findings of these broader studies. The findings suggest that Step 1 and Step 2 CK had some correlation with board certification rates, and that Step 1 was not correlated with professionalism as a resident. In general, these multispecialty residency publications ultimately lack cohesive conclusions about Step 1 or Step 2 CK as indicators of resident performance.

Otolaryngology - head and neck surgery

The main performance metrics found in the literature were OTE scores and WQE passage. Only studies No. 59 and 60 were found concerning Otolaryngology, and they agreed that Step 1 and Step 2 CK both correlated with OTE and WQE performance. In conclusion, both Step 1 and Step 2 CK had solid positive correlations with resident performance.

Pathology

Only one study, No. 61, was identified in Pathology; it found Step 1 to be correlated with American Board of Pathology (ABP) examination passage/failure rate. Therefore, the only conclusion that can be made in this specialty is that Step 1 was positively correlated with resident performance.

Pediatrics

Studies No. 62–64 provided findings on the correlation of Step 1 and Step 2 CK with resident performance. Board passage rates and residency milestone scores were used as indicators of success. Step 1 was not found to be correlated with milestone scores, while it did correlate with board passage rates. For Step 2 CK, the findings conflicted regarding board passage rates. In conclusion, the predictive value of Step 1 and Step 2 CK was mixed for pediatric resident performance measures.

Psychiatry

Studies No. 65 and 66 were identified for Psychiatry, utilizing Psychiatry Resident-In-Training Examination (PRITE) scores and psychotherapy treatment session evaluations as performance measures. Both studies noted significant correlations between Step 1 and Step 2 CK scores and resident performance on these measures.

Radiology

Studies No. 67–71 were identified for Radiology utilizing American Board of Radiology (ABR) core examination scores, rotation evaluations, retrospective faculty recall scores, and cumulative major discordance rate for on-call cases as performance measures. Step 1 showed mixed correlation with resident success, with higher scores predicting lower major discordance rates in No. 67 and better ABR core examination performance in No. 68–71, but not predicting rotation performance. Step 2 CK consistently showed a positive correlation, with higher Step 2 CK scores indicating better ABR core examination performance in No. 69–71. Overall, Step 1 showed mixed success as an indicator of radiology resident performance while Step 2 CK was a significant indicator of resident performance.

Surgery

Studies No. 72–91 were identified in the Surgery specialty. Across the 20 articles identified, American Board of Surgery In-Training Examination (ABSITE) scores, American Board of Surgery Qualifying and Certifying Examination performance, faculty evaluations, Emotional Intelligence (based on the Trait EI Questionnaire), resident remediation and attrition rates, categorical position placement, resident awards, case logs, scholarly activity, and professionalism were all utilized as performance measures. Step 1 showed a mixed correlation with surgical resident success. With few exceptions, studies on objective performance measurements (e.g., In-Training Examinations, Qualifying and Certifying Examinations, remediation, and attrition), such as No. 72, 74, 76, 77, 79–84, and 86–90, showed a significant positive correlation with Step 1 scores. However, No. 73 and 91, which analyzed more subjective measures of resident performance (e.g., faculty evaluations), showed no correlation, or even a negative correlation in the case of resident awards. Step 2 CK also showed a mixed correlation with surgical resident success, but demonstrated a more positive correlation with overall resident performance than Step 1 in No. 89 and 90. Step 2 CK scores showed a positive correlation with some of the same objective measures as Step 1 (e.g., In-Training Examinations, Qualifying and Certifying Examinations), and additionally showed a positive correlation with numerous other measures, including Emotional Intelligence in No. 78, a resident obtaining a first-choice categorical position in No. 85, and program director evaluations in No. 91. In summary, Step 1 and Step 2 CK both showed mixed correlation with surgical resident performance, with Step 2 CK showing a slightly more comprehensive correlation.

Urology

One relevant study, No. 92, was identified for Urology, utilizing Program Director Performance Evaluations as the performance measure. This study showed that Step 2 CK was significantly associated with higher Program Director Performance Evaluations, while Step 1 was not.

Discussion

Scope of literature

There have been various studies on how Step 1 and Step 2 CK scores correlate with medical resident performance, but these studies have tended to be specialty-specific and to vary greatly by performance measure. To our knowledge, this systematic review is the first to consider the use of Step 1 and Step 2 CK as predictors of comprehensive resident performance across all ACGME-accredited specialties. Considering the immense weight placed on USMLE Step 1 and Step 2 CK success, only 92 studies relevant to this systematic review have been performed since the introduction of these exams in the early 1990s. Taken together, this suggests insufficient investigation by the medical community into their true predictive validity.

The scope of the literature currently available regarding the utility of Step 1 and Step 2 CK as comprehensive indicators of resident performance can be appreciated in detail in Tables 1 and 2. Areas where either or both Step exams have no findings for a major performance measure in a given specialty should be considered prime targets for further investigation. Subsequent investigations addressing the gaps identified by this systematic review will serve program directors and other leaders in specialties that have so far received little attention from the literature on residency performance prediction.

Step 1 and step 2 CK as performance predictors

The discussion regarding Step 1 and Step 2 CK as indicators of resident performance is complicated for two major reasons. First, it is important to recognize the possibility that not all medical residencies are equivalent. Therefore, to control for potential differences across fields of study, the results of the systematic review were interpreted by first stratifying the performance data by resident specialty in Table 1. Second, a case can be made that “resident performance” is an umbrella term encompassing several sub-categories, including board certification, In-Training Examinations, ACGME core competency evaluations, faculty and program director evaluations, and many others. Table 2 was constructed to account for the possibility that the USMLE Step exams may correlate with certain outcome measures and not others.

To begin the analysis, Steps 1 and 2 CK as performance indicators will be discussed for the first organizational strategy: Specialty. Steps 1 and 2 CK both demonstrated predictive value for resident performance outcomes in Anesthesiology, Infectious Disease and Neuroradiology fellowships, Otolaryngology-Head and Neck Surgery, and Psychiatry. Only Step 1 showed predictive value for resident performance outcomes in Dermatology, Neurological Surgery, and Pathology, while only Step 2 CK showed predictive value for resident performance outcomes in Obstetrics and Gynecology, Orthopedic Surgery, Radiology, and Urology. Both Step 1 and Step 2 CK showed mixed results for several specialties including Emergency Medicine, Family Medicine, Hematology and Medical Oncology Fellowship, Internal Medicine, Ophthalmology, Pediatrics, and Surgery. It is imperative to keep in perspective that these results are limited by the fact that relevant literature was lacking for many specialties. Some specialties had zero relevant articles, while certain others had one or two. More research should be conducted to assess the reproducibility of the results and strengthen the confidence in conclusions made for specialties lacking robust data.

To continue the analysis, Steps 1 and 2 CK as performance indicators will be discussed concerning the second organizational strategy: outcome measure. An interpretation of Table 2 allows us to make generalized statements for both USMLE exams. Importantly, an outcome measure was included in Table 2 only if three or more studies assessed it, to increase confidence in the general conclusions that were made. Both Step 1 and Step 2 CK showed predictive value for board certification during residency across several different studies, with a few exceptions noted. Concerning In-Training Examinations, many articles demonstrated that Step 1 had predictive value, with a few exceptions noted. Step 2 CK also demonstrated strong predictive value, with only one exception noted. Using ACGME core competency milestone evaluations as the performance measure, Step 1 and Step 2 CK showed no predictive value; however, notable exceptions to this generalization are shown in Table 2. Regarding faculty and program director evaluations, Step 1 predictive value was mixed, while Step 2 CK showed predictive value in all relevant publications except one. It is important to consider the limitation that these conclusions from Table 2 are drawn across multiple studies assessing several diverse specialties and therefore may not account for potential confounding variables associated with inherent differences between medical residencies.

The results of this study have merit and should be contemplated in the context of the decision to make Step 1 a pass/fail examination. On the one hand, Step 1 showed positive correlative value for certain outcome measures, particularly those related to standardized test-taking ability. On the other hand, Step 2 CK can largely provide much of the same information about resident performance, and perhaps more (i.e., it is predictive of faculty and program director evaluations). One downside of Step 2 CK remaining numerically scored will be the inevitable increase in pressure on medical students to perform well on this single examination.

The results of this systematic review demonstrate that both Step 1 and Step 2 CK can be useful, at least in some respects, across a variety of resident specialties and performance measures. In other respects, their value is either not researched enough or not replicable across the publications analyzed here. It is the opinion of the authors that these examinations, while helpful in some cases, should only be used as tools in the holistic assessment of future performance in residency. “Performance” is quite complex in its definition, particularly concerning a medical resident, and therefore a wide variety of assessment methods should be considered.

Limitations

It is important to interpret these findings in the context of limitations. Importantly, some of the resident indicators analyzed may be inherently unreliable as measures of overall resident success. Low correlations and negative findings may be due to characteristics of the resident performance measures and not due to the predictive power of USMLE scores. Additionally, many of the studies cited used terminology such as “predictor”, “association”, or “indicator” to describe the relationship between USMLE scores and resident performance metrics. It is important to note that this use of ‘predictor’ or ‘indicator’ does not imply causation and is only a description of positive association.

The findings from this study should also be understood to be of limited value in comprehending the true clinical proficiency of a given physician outside the context of resident evaluation metrics. The term “performance” should not be misconstrued as referring to this wider, authentic clinical ability, which our study has not established as being linked to Step 1 and Step 2 CK scores. Our findings should be most applicable to residency program directors and other organizers who would like a better understanding of the true association between Step scores and resident performance metrics. That said, there is heterogeneity in residency program structure, quality, and educational emphasis that must be acknowledged; it is reflected in the wide range of metrics relied upon in the studies collected within this review.

Furthermore, it is critical to acknowledge that “objective measures” and “standardization” do not guarantee a lack of bias. Some standardization procedures may be inadvertently influenced by structural determinants that may disadvantage certain groups. Given this, it is essential for future research to explore these potential biases to ensure fairness and equitable evaluation.

Additionally, it is important to note that we chose not to use tools such as the Medical Education Research Study Quality Instrument (MERSQI) and the Newcastle-Ottawa Scale for Education (NOS-E) in the analysis of our findings, for two main reasons. First, the primary objective of our study was to conduct a systematic review and synthesis of the existing literature in order to investigate any possible relationship(s) between Step 1/2 performance and resident outcomes across different medical specialties. While the inclusion of quality assessment tools can be valuable, our intention was to include all available data that passed our screening process; we did not intend to rank or weight each study based on its methodology. Second, assessing the quality of studies in the field of medical education and residency performance prediction may be possible, but it is a complex task. The use of tools like MERSQI and NOS-E requires a detailed evaluation of various aspects of study design, methodology, and reporting. Applying these tools retrospectively to a wide range of studies with diverse research designs, settings, and outcome measures was outside the scope of our core objective.

Conclusions

Further studies are imperative to discern the utility of Step 1 and Step 2 CK as predictors of resident performance and as tools for resident recruitment and selection. The results of this systematic review suggest that both a scored Step 1 dated before January 2022 and Step 2 CK can be useful as tools in a holistic review of an applicant to residency programs. Given the inherent complexity of resident performance, multiple tools across many assessment modalities are necessary to assess it. Future research should explore the combined predictive value of standardized USMLE examinations, clinical evaluations, holistic review practices, and other critical skills to develop a more comprehensive approach to evaluating residency candidates.

Data Availability

All data generated or analyzed during this study are included in this published article and its supplementary information files.

Abbreviations

USMLE:

United States Medical Licensing Examination

CK:

Clinical Knowledge

CS:

Clinical Skills

NBME:

National Board of Medical Examiners

FSMB:

Federation of State Medical Boards

PRISMA:

Preferred Reporting Items for Systematic Reviews and Meta-Analyses

References

  1. Haist SA, Katsufrakis P, Dillon G. The evolution of the United States Medical Licensing Examination (USMLE). JAMA. 2013;310(21):2245. https://doi.org/10.1001/jama.2013.282328.

  2. United States Medical Licensing Examination. Step 1. www.usmle.org/step-1/.

  3. United States Medical Licensing Examination. Step 2 Clinical Knowledge. www.usmle.org/step-2-ck/.

  4. Howley LD, Deborah EL. Discontinuation of the USMLE Step 2 Clinical Skills examination. Acad Med. Published ahead of print, 2021. https://doi.org/10.1097/acm.0000000000004217.

  5. United States Medical Licensing Examination. Step 3. www.usmle.org/step-3/.

  6. Hartman ND, Lefebvre CW, Manthey DE. A narrative review of the evidence supporting factors used by residency program directors to select applicants for interviews. J Grad Med Educ. 2019;11(3):268–73. https://doi.org/10.4300/jgme-d-18-00979.3.

  7. Green M, Jones P, Thomas JX. Selection criteria for residency: results of a national program directors survey. Acad Med. 2009;84(3):362–7. https://doi.org/10.1097/acm.0b013e3181970c6b.

  8. Weissbart SJ, Stock JA, Wein AJ. Program directors’ criteria for selection into urology residency. Urology. 2015;85(4):731–6. https://doi.org/10.1016/j.urology.2014.12.041.

  9. Schrock JB, Kraeutler MJ, Dayton M, McCarty E. A cross-sectional analysis of minimum USMLE Step 1 and 2 criteria used by orthopaedic surgery residency programs in screening residency applications. J Am Acad Orthop Surg. 2017;25(6):464–8. https://doi.org/10.5435/jaaos-d-16-00725.

  10. Gardner AK, Cavanaugh KJ, Willis RE, Dunkin BJ. Can better selection tools help us achieve our diversity goals in postgraduate medical education? Comparing use of USMLE Step 1 scores and situational judgment tests at 7 surgical residencies. Acad Med. 2020;95(5):751–7. https://doi.org/10.1097/acm.0000000000003092.

  11. Mitsouras K, Dong F, Safaoui MN, Helf SC. Student academic performance factors affecting matching into first-choice residency and competitive specialties. BMC Med Educ. 2019;19(1). https://doi.org/10.1186/s12909-019-1669-9.

  12. Jayakumar KL. Numerical USMLE Step 1 scores are still important in selection of residency applicants. Acad Med. 2016;91(11):1470–1. https://doi.org/10.1097/acm.0000000000001402.

  13. Bigach SD, Winkelman RD, Savakus JC, Papp KK. A novel USMLE Step 1 projection model using a single comprehensive basic science self-assessment taken during a brief intense study period. Med Sci Educ. 2020;31(1):67–73. https://doi.org/10.1007/s40670-020-01097-7.

  14. Villwock JA, Sobin LB, Koester LA, Harris TM. Impostor syndrome and burnout among American medical students: a pilot study. Int J Med Educ. 2016;7:364–9. https://doi.org/10.5116/ijme.5801.eac4.

  15. Bloodgood RA, Short J, Jackson JM, Martindale J. A change to pass/fail grading in the first two years at one medical school results in improved psychological well-being. Acad Med. 2009;84(5):655–62. https://doi.org/10.1097/acm.0b013e31819f6d78.

  16. United States Medical Licensing Examination. Change to pass/fail score reporting for Step 1. www.usmle.org/incus/#decision.

  17. PRISMA. Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA). www.prisma-statement.org.

  18. Cochrane Library. Risk of Bias tool. https://methods.cochrane.org/bias/risk-bias-tool.


Acknowledgements

Not applicable.

Funding

The authors have no sources of funding to declare for this project.

Author information

Authors and Affiliations

Authors

Contributions

CL: Original Idea, brainstorming, planning, investigation, coordination, writing (original and final draft), reviewing, editing. NC: Brainstorming, Planning, Coordination, Investigation, writing (original and final draft), reviewing, editing. BR: Investigation, writing (original draft), reviewing, editing. JL: reviewing, editing.

Ethics declarations

Competing interests

The authors declare no competing interests.

Ethics Approval and Consent to Participate

This article did not require ethics approval or consent to participate as it is a systematic review and analysis of publicly available information.

Consent for Publication

Not applicable.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


Cite this article

Lombardi, C.V., Chidiac, N.T., Record, B.C. et al. USMLE step 1 and step 2 CK as indicators of resident performance. BMC Med Educ 23, 543 (2023). https://doi.org/10.1186/s12909-023-04530-8
