
Do coursework summative assessments predict clinical performance? A systematic review



Two goals of summative assessment in health profession education programs are to ensure the robustness of high stakes decisions such as progression and licensing, and predict future performance. This systematic and critical review aims to investigate the ability of specific modes of summative assessment to predict the clinical performance of health profession education students.


PubMed, CINAHL, SPORTDiscus, ERIC and EMBASE databases were searched using key terms, and the articles collected were subjected to dedicated inclusion criteria. Rigorous exclusion criteria were applied to ensure a consistent interpretation of ‘summative assessment’ and ‘clinical performance’. Data were extracted using a pre-determined format and papers were critically appraised by two independent reviewers using a modified Downs and Black checklist, with the level of agreement between reviewers determined through a Kappa analysis.


Of the 4783 studies retrieved from the search strategy, 18 studies were included in the final review. Twelve were from the medical profession and there was one from each of physiotherapy, pharmacy, dietetics, speech pathology, dentistry and dental hygiene. Objective Structured Clinical Examinations featured in 15 papers, written assessments in four and problem based learning evaluations, case based learning evaluations and student portfolios each featured in one paper. Sixteen different measures of clinical performance were used. Two papers were identified as ‘poor’ quality and the remainder categorised as ‘fair’ with an almost perfect (k = 0.852) level of agreement between raters. Objective Structured Clinical Examination scores accounted for 1.4–39.7% of the variance in student performance; multiple choice/extended matching questions and short answer written examinations accounted for 3.2–29.2%; problem based or case based learning evaluations accounted for 4.4–16.6%; and student portfolios accounted for 12.1%.


Objective structured clinical examinations and written examinations consisting of multiple choice/extended matching questions and short answer questions do have significant relationships with the clinical performance of health professional students. However, caution should be applied if using these assessments as predictive measures for clinical performance due to a small body of evidence and large variations in the predictive strength of the relationships identified. Based on the current evidence, the Objective Structured Clinical Examination may be the most appropriate summative assessment for educators to use to identify students that may be at risk of poor performance in a clinical workplace environment. Further research on this topic is needed to improve the strength of the predictive relationship.



Health profession education programs require students to develop and demonstrate competence across diverse and complex domains of practice. The curricula delivered across the medical, nursing and allied health professions vary in the attitudes, knowledge and skills required of their graduates. However, there are many similarities in the domains of competence required by the registration bodies of these professions. To be a licensed medical, nursing or allied health professional, graduates must demonstrate competence across domains of practice such as: professional and ethical behaviour, communication and interpersonal skills, knowledge, safety and quality, leadership and management, and collaborative practice [1–3]. Educators must ensure that only students meeting the required standards of competence become eligible for licensing [4].

As the domains of practice required by the different health professions share similarities, so too do the assessment frameworks used by their education programs [5]. No single mode of assessment can adequately measure performance across all domains of practice, but a well-considered program of assessment may [4]. Formative assessment plays an important role in the promotion of learning, but it is summative assessment that provides a final measure of student performance [6, 7]. Summative assessment in health profession education has three main goals: (i) to promote future learning; (ii) to ensure that high-stakes decisions such as progression, graduation and licensing are robust so the public is protected from incompetent practitioners; and (iii) to provide a basis for choosing applicants for advanced training [8]. To achieve the goals of providing robust evidence of competence and identifying appropriateness for advanced training, summative assessment scores must necessarily be predictive of students’ future performance. However, there is limited evidence to support this assumption.

A systematic review by Hamdy et al. [9] of predictors of future clinical performance in medical students found OSCEs and pre-clinical grade point average (GPA) to be significant predictor variables for clinical performance; however, the predictive relationships were limited. Additionally, a compilation and review of correlative studies by Harfmann and Zirwas [10] sought to answer whether performance in medical school could predict performance in residency. In their review, medical student pre-clinical GPA scores were one of the indicators that correlated most strongly with performance on examinations in residency.

While the reviews by Hamdy et al. [9] and Harfmann and Zirwas [10] looked at a range of predictor variables, the only specific mode of summative assessment they evaluated that is common to all health professions was the Objective Structured Clinical Examination (OSCE), and this evaluation was limited to medical education programs. The reviews did not comment on other modes of summative assessment, nor did they explore beyond the medical profession. On this basis, the ability of a variety of modes of assessment to predict future clinical performance has yet to be investigated in detail.

The aim of this review was to critically appraise and discuss the findings of existing research investigating modes of summative assessment, and their ability to predict future clinical performance. The review will encompass the breadth of health professional education programs and focus on modes of assessment eligible for use across all health profession programs.


Search strategy

Peer reviewed research papers were gathered using a search of the PubMed, CINAHL, SPORTDiscus, ERIC and EMBASE databases. Key search terms were chosen to capture the breadth of assessments commonly used within the non-clinical components of health profession programs, as well as the variety of terms used to describe performance in a clinical setting. These search terms were generated following consultation with educators from health professions and are outlined in Table 1.

Table 1 Systematic review databases and search terms

Screening and selection

Titles and abstracts of all papers identified by the initial database searches were screened and assessed against the following inclusion criteria:

  a. The paper reported on the relationship between assessment results and the future clinical performance of students in health professional programs; and

  b. The paper was published in the English language; and

  c. The paper was published after 1996.

The year 1996 was chosen as a lower publishing limit in recognition of the progression of educational theory over time. This date allows for the capture of 20 years of literature following on from the seminal papers by Harden [11] regarding the development of the OSCE and Miller’s framework for the assessment of clinical competence [12].

Papers selected for inclusion from the initial database searches were then subject to the application of rigorous exclusion criteria:

  a. The independent variable was a formative assessment;

  b. Individual modes of summative assessment were not specified (e.g. used overall GPA);

  c. The independent variable was a standardised assessment limited to use by a single health profession (e.g. National Board of Medical Examiners subject examinations);

  d. The independent variables were health profession education program admission criteria, applicant screening measures or entry measures;

  e. Clinical performance was not measured in either a clinical workplace setting or in a clinical examination conducted externally to the education program utilizing authentic or standardized patients; or

  f. The paper was an abstract, review, dissertation or discussion.

The exclusion criteria listed above were applied to ensure reasonable consistency between papers in the interpretation of ‘summative assessment’ and ‘clinical performance’ to allow for a cohesive synthesis of the information. Review papers were used to provide background and supporting information. To ensure maximal search saturation, the reference lists of papers retained for review, and of papers providing background or supporting information, were scanned in a secondary search for potentially relevant articles. These articles were then gathered and subjected to the same inclusion and exclusion criteria described above (Fig. 1).

Fig. 1 Preferred reporting items for systematic reviews and meta-analyses (PRISMA)

Critical appraisal of methodological quality

Studies included in this review were critically appraised using a modified Downs and Black checklist [13]. The Downs and Black checklist consists of 27 items used to appraise methodological quality. The checklist was originally devised to assess the methodological quality of health care interventions; however, it was appropriate to use in this review as it provided a structured format for critically appraising the papers selected for review. The protocol contains five major categories for appraisal: reporting quality, external validity, internal validity (bias), internal validity (confounding) and statistical power.

The original Downs and Black checklist is scored out of 32. All items except Items 5 and 27 are scored on a two-point scale. A classification of ‘yes’ is scored as ‘1’ point and a classification of ‘no’ or ‘unable to determine’ is scored as ‘0’ points. Item 5, which appraises the description of confounders, is scored out of ‘2’ points, with ‘yes’ scoring ‘2’ points, a ‘partial description’ scoring ‘1’ point and ‘no’ scoring ‘0’ points. Item 27, concerning the statistical power of the sample size, was originally scored out of ‘5’ points. For the purposes of this review, Item 27 was adjusted to be scored as either ‘1’ point where power was reported or ‘0’ points where power was not reported. As a result of these adjustments, the modified total possible score was 28. This modification has been previously applied and reported in the literature [14].

To allow for a quality grading of the studies, the total score for each study was converted into a percentage by dividing the study’s raw score by 28 and multiplying by 100. The total critical appraisal percentage was then categorised as ‘good’, ‘fair’ or ‘poor’ quality using the ranking described by Kennelly [15]. When applied to the modified Downs and Black scoring, Kennelly’s model categorises papers with critical appraisal scores of 71% or greater as good quality, 54–70% as fair quality and 53% or less as poor quality.
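
The conversion and grading described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' procedure: the function name `quality_grade` is hypothetical, and rounding the percentage to the nearest whole number before applying the cut-offs is an assumption (the review reports whole-number percentages, and the stated bands leave gaps between 53 and 54% and between 70 and 71%).

```python
def quality_grade(raw_score, max_score=28):
    """Convert a modified Downs and Black raw score to a percentage and grade.

    Assumption: the percentage is rounded to the nearest whole number
    before the cut-offs (71% or greater, 54-70%, 53% or less) are applied.
    """
    if not 0 <= raw_score <= max_score:
        raise ValueError("raw_score must lie between 0 and max_score")
    percentage = round(raw_score / max_score * 100)
    if percentage >= 71:
        grade = "good"
    elif percentage >= 54:
        grade = "fair"
    else:
        grade = "poor"
    return percentage, grade


# Hypothetical raw scores chosen for illustration only.
for score in (8, 14, 19, 20):
    print(score, quality_grade(score))
```

Under this rounding assumption, a raw score of 15 out of 28 (53.6%) would be graded ‘fair’; a different rounding rule could place it in the ‘poor’ band.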

Each paper was individually rated by two assessors (RT and NM), with the level of agreement determined by a Kappa analysis conducted by a third person (RO). Following the Kappa analysis, any discrepancies in scores between the two scoring authors (RT and NM) were settled by consensus. Where consensus could not be reached, the raw scores were adjudicated by a third person (RO) to finalise the Critical Appraisal Score (CAS).
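
For readers unfamiliar with the Kappa statistic, the sketch below computes an unweighted Cohen's kappa for two raters from first principles. It is a minimal illustration under stated assumptions: the review does not specify whether agreement was calculated at the item level or whether a weighted kappa was used, and the function name and example ratings are hypothetical rather than data from the review.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: proportion of items given identical ratings.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal proportions per category.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Hypothetical item-level ratings (1 = 'yes', 0 = 'no' or 'unable to determine').
ratings_a = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
ratings_b = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
print(round(cohens_kappa(ratings_a, ratings_b), 3))
```

In this example the raters agree on nine of ten items (observed agreement 0.9) against a chance agreement of 0.5, giving a kappa of 0.8.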

Data extraction and synthesis

Data from each paper included in the review were extracted by a single author (RT) and confirmed by the fellow authors. Data were assessed using a pre-determined format as follows: clinical education program, number of students, student year of study, summative assessments used, clinical setting in which performance was measured and statistics used to establish relationships. Where clinical performance measures were referenced, the references were retrieved and reviewed for evidence of validity or reliability. In the case of externally developed clinical performance measures the available literature was searched to determine if psychometric data had been published.

To allow for comparison across data, the square (r²) of each correlation (r) was calculated. Squaring the correlation gives the variance (the coefficient of determination), which measures the proportion of variability in one variable that is explained by its relationship with the other variable [16]. In this review, the variance describes the proportion of variability in students’ clinical performance explained by summative assessment scores.
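
As a worked illustration of this calculation, the short Python sketch below squares a correlation coefficient and expresses the result as a percentage. The function name `variance_explained` and the example r values are hypothetical and are not attributed to any particular paper in the review; reporting to one decimal place is an assumption made to match how percentages are presented in this review.

```python
def variance_explained(r):
    """Return r squared as a percentage, rounded to one decimal place."""
    if not -1 <= r <= 1:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    return round(r ** 2 * 100, 1)


# Illustrative correlation coefficients only.
for r in (0.12, 0.54, 0.63):
    print(f"r = {r:.2f} -> {variance_explained(r)}% of variance explained")
```

A correlation of 0.63, for example, corresponds to 39.7% of variance explained, whereas a correlation of 0.12 corresponds to only 1.4%. Because the sign of r is lost on squaring, the direction of each relationship still needs to be read from the original correlation.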


Literature search and selection

The results of the search are reported in Fig. 1. After the application of inclusion and exclusion criteria 18 papers were retained for final analysis. Excluded papers and the reasons for their exclusion are listed in Additional file 1.

Study participants

The papers retained for the final review reported on summative coursework assessments and student performance in the clinical setting and are summarised in Table 2. Across these papers seven different clinical professions were represented: medicine or osteopathic medicine (12), pharmacy (1), physiotherapy (1), dietetics (1), speech pathology (1), dentistry (1) and dental hygiene (1). Student populations studied were from the United States of America (11), Australia (2), Canada (1), the United Kingdom (1), New Zealand (1), South Korea (1) and Hong Kong (1).

Table 2 Summary of critical review papers

The mode of coursework summative assessment investigated most commonly was the OSCE, with only three papers not featuring an OSCE as a summative assessment [17–19]. Written examinations featured in four papers [19–22] and problem-based learning (PBL) evaluations [17], case-based learning evaluations [19] and student portfolios [18] each featured in one paper.

Measures of clinical performance used in the medical programs were: the United States Medical Licensing Examination Step 2 Clinical Skills (USMLE Step 2 CS) [23, 24]; the Comprehensive Osteopathic Medical Licensing Examination of the United States Level 2-Performance Evaluation (COMLEX-USA Level 2-PE) [25]; a Clinical Education Grade Form [26]; a standardised Clinical Evaluation Form [19]; intern performance scores [27]; senior doctor assessments [28]; the Junior Doctor Assessment Tool (JDAT) [20]; a global rating instrument [21]; program director evaluations [22] and residency program director assessments [29, 30]. A variety of clinical performance measures were used amongst the allied health programs: the Physiotherapy Clinical Performance Instrument (PT CPI) [31]; the National Dental Hygiene Examination (NDHE) [18]; the Hong Kong University (HKU) speech pathology clinical evaluation form and COMPASS®: Competency Based Assessment in Speech Pathology [17]; a standardized dietetics clinical teacher evaluation rubric [32]; an online evaluation form of pharmacy student performance [33] and a dental clinical productivity value [34].

Critical appraisal of methodological quality

Percentage scores based on the modified Downs and Black [13] checklist ranged from 29% [19] to 68% [21] with a mean percentage of 56.15% (±8.29%). The level of agreement between raters was considered ‘almost perfect’ [35] (k = 0.852). When graded against the criteria established by Kennelly [15], two papers were categorised as ‘poor’ quality with critical appraisal percentage scores of 29% [19] and 50% [29]; the remainder were categorised as ‘fair’ quality (54–68%). All of the studies included in the review were descriptive cohort studies.

Analysis of the means and standard deviations of the categories of the modified Downs and Black checklist was conducted and showed the mean score achieved in the ‘reporting’ category to be 5.94 points (±1.35 points) out of a possible 11 points. Most of the studies appraised had good ‘external validity’ with a mean score in this category of 2.5/3 points. The mean score in the ‘internal validity – bias’ category was 4.33 points (±0.69 points) out of a possible 7 points. Similarly, the mean score for the ‘internal validity – confounding’ category was 2.94 points (±0.85 points) out of a possible 6 points.

The critical review findings are displayed in Table 3. All but four papers [22, 28, 32, 34] used either Pearson’s correlation, Spearman’s rho or point-biserial correlations to identify the relationship between summative assessment scores and clinical performance ratings. One paper reported correlations but did not specify the type [26]. Variances are listed in Table 4 and ranged from 1.4 to 39.7%.

Table 3 Critical Review Findings
Table 4 Proportion of variability accounted for by the relationship between summative assessment and clinical performance

Objective structured clinical examination

Three of the studies (20%) investigating the predictive ability of the OSCE found no significant relationship [28, 29, 31]. The OSCE did not predict physiotherapy student clinical performance on the PT CPI [31], or medical student performance measured by either program director evaluations [29] or senior doctor evaluations [28]. Nine of twelve studies in the medical profession (75%) identified a significant positive relationship between medical student OSCE scores and clinical performance [19–27, 30], with OSCE scores explaining between 1.9 and 39.7% of the variability in medical student clinical performance. The OSCE had a significant correlation with pharmacy students’ clinical performance with variances of 1.4–6.3% [33]. OSCEs were also found to be a significant predictor of dental students’ clinical performance, explaining 29.2–37.7% of the variability in clinical productivity values [34]. A significant relationship was reported between pre-clinical OSCE scores and the clinical performance of dietetic students (β = 0.66; 95% CI 0.46–0.86; P < 0.0001) [32].

Written examinations

Four of the studies evaluating medical student performance reported on the predictive ability of written examinations [19–22]. Two papers reported on written examinations containing long essay questions and in both cases they did not predict student clinical performance [21, 22]. In all three relevant papers, significant predictive relationships were found between clinical performance and written assessments consisting of multiple choice questions (MCQs), extended matching questions (EMQs) and short answer questions (SAQs), with variances of 3.2, 7.3 and 29.2% [19–21].

Other assessments

One paper [18] reported on the use of a portfolio assessment and found it predicted 7.3% of the variability in dental hygiene student clinical performance. A PBL evaluation consisting of three assessment items predicted 5.9–16.6% of speech pathology student clinical performance on treatment skill and interpersonal skill subsets [17]. Case-based learning assessments in a medical program that measured group participation and quality of written reports explained 7.3 and 4.8% of the variance in students’ clinical performance respectively [19].

Prediction models

A prediction model for medical student clinical performance incorporating Year 4 and 5 OSCEs, Year 5 and 6 written examinations, scores from Year 6 clinical attachments and overall GPA identified that no individual summative assessment significantly influenced the clinical performance score; the best overall predictor of clinical performance measured by the JDAT was overall GPA [20]. A second paper [21] combined the OSCE and written examination results of medical students in a multiple regression model and found that the OSCE added significantly to the correlation with clinical performance scores. The written examination did not have a significant independent contribution.


The aim of this review was to critically appraise and discuss the findings of existing research investigating the ability of summative assessments used within the non-clinical components of an academic curriculum to predict clinical performance across the breadth of health profession education. Eighteen studies that met inclusion and exclusion criteria were critically reviewed. The overall methodological quality of the literature that was investigated to inform this review was considered to be ‘fair’. None of the studies included in the review were found to report on: (i) the principal confounders, (ii) the power of the research and (iii) attempts to blind either participants or those measuring clinical performance. The studies that scored more highly clearly described the summative assessment being investigated and the main findings, and reported actual probability values and the characteristics of students lost to follow up.

The OSCE is well established in health education programs worldwide. It is a mode of assessment specifically designed to provide a valid and reliable measure of students’ clinical competence in a simulated environment [11]. Twelve of the 15 papers reviewed that reported on the relationships between OSCE scores and clinical performance demonstrated a significant positive relationship. In these instances, a significant relationship was present regardless of whether psychometric data were available for the clinical performance measure. Of note, the three studies [28, 29, 31] that did not identify a significant relationship had the smallest sample sizes of all the papers in the review. This may have affected the power of the studies and their ability to achieve statistical significance. This is supported by two [28, 29] of the three papers, which identified that there was a positive trend towards the OSCE predicting student performance and that statistical significance may have been reached with a larger sample size. The clinical performance measures used by studies included in this review assessed similar domains of competency to OSCEs, although in more complex and often less structured environments. OSCEs assess student performance at the ‘shows how’ level of Miller’s pyramid [36]; it is likely that the clinical performance measures also evaluate students at the ‘shows how’ level, as there is a strong argument that ‘does’ can only be measured when the candidate is unaware of being observed or assessed [37]. The similarities between both the domains of competence and the levels of performance measured provide some explanation for the consistent positive relationship reported between students’ OSCE scores and their future clinical performance.

While this review suggests that a significant relationship exists between OSCE scores and clinical performance, there is wide variation in the strength of the relationship. With the OSCE explaining between 1.9% [20] and 39.7% [21] of the variation in student clinical performance, the strength of the relationships may have been influenced by other factors that in turn may vary between programs. One such factor is the structure of the OSCE itself. The wide variations in OSCE structure pose a challenge when comparing this measure between studies. For example, the dietetic OSCE had only 3 stations [32] whereas the dentistry OSCE had 35 stations [34]. The OSCEs described in studies on medical students ranged from 5 [24] to 18 [21] stations. The papers with the two strongest predictive relationships between OSCE and student clinical performance described OSCEs with 18 × 5 min stations [21] and 35 × 2 min stations [34] which suggests that longer OSCE assessments may be better predictors of performance. This finding is supported by a systematic review [38] of the reliability of the OSCE in medical education programs which identified that while scores on OSCEs are not always very reliable, better reliability was associated with a greater number of stations. This is attributed to a wider sampling of cases across the increased number of stations. Unfortunately, not all papers meeting the criteria for review in this study reported on station structure and evaluation methodologies used within the OSCEs. This limited the ability to further discuss the impact of OSCE structure on the predictive ability of the assessment but may explain the large differences in variance.

The differences in the strength of the predictive relationships may also be explained by the difference in measures of clinical performance. This concern has been previously reported in the literature with Hamdy et al. [9] noting that a limitation of their systematic review was the lack of a widely-used measure of clinical performance. The findings of the present review also need to be considered in light of the limitations imposed by the variety of clinical performance measures used.

A variance of 1.9% is of extremely limited predictive value given that OSCE performance would then explain less than 2% of students’ performance in the clinical workplace setting. However, a variance of 37.7% indicates a strong predictive relationship. A predictive relationship of this strength would be valuable in helping to identify students at risk of poor performance in the clinical setting. Given this wide range, the predictive relationship between OSCE scores and student clinical performance must be viewed with caution. However, these scores could be used by educators as a method of identifying students that may be at risk of low performance in a clinical practice setting until a more robust measure is available.

As only one paper was identified for each of the portfolio, case-based and problem-based learning assessments, there are inadequate data to draw conclusions about these modes of assessment. Four papers in the review did investigate written assessments. Both papers investigating written assessment batteries containing long essay questions [21, 22] found no significant correlation with clinical performance scores; however, all three papers investigating written assessments consisting of EMQs, MCQs and SAQs did identify a significant positive relationship. This supports literature advocating the use of EMQs or MCQs in written examinations rather than essay questions [39]. Like the findings for the OSCE, there was a large difference in the strength of the relationship between papers reviewed. An EMQ/MCQ written assessment explained 29.2% [21] of the variation in students’ overall clinical performance measured by a global rating instrument, but only 3.2% [20] when clinical performance was measured by the JDAT. While program factors other than the choice of clinical performance measure may also influence these relationships, there is a large difference in the ability of the MCQ/EMQ written assessments to predict clinical performance. This highlights the need for research in which a standard measure of clinical performance is used to allow for comparison between studies. The findings of this review suggest that there is limited evidence to support the use of SAQ, MCQ and EMQ written assessments to predict students’ clinical performance and that written examinations should be used as a predictive measure with caution.

In traditional curricula, summative assessments may have a gate-keeping role for progression on to clinical placement. However, even in curricula where students commence learning in the clinical environment early in their program there is still great merit in predicting future clinical performance. The early identification of students at risk of poor performance allows for targeted remediation prior to clinical experiences, as well as the implementation of focused support whilst the student is embedded in the clinical environment. However, until further research adds to the body of evidence, the use of summative assessments to predict student clinical performance should be approached with caution. If educators choose to use summative assessment results to attempt to predict clinical performance then this review suggests that the OSCE, which has a weak predictive value, may be the most appropriate choice. This review also implies that individual modes of summative assessment should not be the gatekeepers into the clinical practice environment as there is insufficient evidence to base high-stakes decisions (such as a student’s ability to progress on to clinical placement) on the predictive ability of these assessments.

In addition to the differences in the structure of summative assessments investigated and clinical performance measures used that this review has already discussed, a potential limitation of the research reviewed is that only students who completed their program of study were included. Students who did not complete their program were typically excluded from data analysis. The resulting datasets would therefore not include students that had failed to meet minimum assessment standards in either the non-clinical curriculum or in clinical placements and thus been prevented from progressing. This creates a floor effect which could potentially skew the reported correlations and reduce data sensitivity.

Limitations of the present review include the use of the Downs and Black checklist as a critical appraisal tool. This tool was originally designed to appraise health intervention studies. While it has enabled a standardised critique of the studies in this review, the papers may have been appraised more harshly than warranted because they were held to the same criteria as interventional studies. Nevertheless, all studies were appraised with the same tool, so the methodological quality of the papers could be appropriately compared. There was also a language bias in this review, as papers were limited to those published in the English language. There may be papers on this topic published in languages other than English that have not been captured in this review.

Future research on this topic should aim to recruit larger sample sizes to increase statistical power. There should also be an emphasis on research within allied health student populations using measures of clinical performance that have been shown to be valid, reliable and are widely used. This approach would allow for a more rigorous comparison between programs and even professions to be conducted, aiding in the generalisation of findings across the allied health professions.


The findings of this review suggest that assessments used within an academic curriculum do have significant positive relationships with the clinical performance of health professional students. Caution is required when using these assessments as predictive measures, owing to the small body of evidence and the large variations in the predictive strength of the relationships identified. The OSCE may be the most appropriate choice at this time for educators planning to use summative assessment scores to identify students that may be at risk of poor performance in a clinical workplace environment. Further research, with larger sample sizes, is required to determine the ability of summative assessments to predict the future clinical performance of health profession students, particularly in allied health student populations.



APPE: Advanced Pharmacy Practice Experiences

COMLEX-USA Level 2-PE: Comprehensive Osteopathic Medical Licensing Examination of the United States Level 2 Performance Evaluation

CPX: Clinical performance examination

CRDTS: Central Region Dental Testing Service

EMQ: Extended matching question

GPA: Grade point average

HKU: Hong Kong University

JDAT: Junior Doctor Assessment Tool

MCQ: Multiple choice question

NDHE: National Dental Hygiene Examination

OSCE: Objective structured clinical examination

PBL: Problem-based learning

PT CPI: Physiotherapy Clinical Performance Instrument

SAQ: Short answer questions

COMPASS®: Competency Based Assessment in Speech Pathology

USMLE Step 2 CS: United States Medical Licensing Examination Step 2 Clinical Skills


  1. Physiotherapy Board of Australia & Physiotherapy Board of New Zealand. Physiotherapy practice thresholds in Australia and Aotearoa New Zealand. Physiotherapy Board of Australia. 2015. Accessed 19 Dec 2016

  2. Nursing & Midwifery Council. Standards for competence for registered nurses. Nursing & Midwifery Council. 2010. Accessed 19 Dec 2016

  3. General Medical Counil. Good medical practice. General Medical Council. 2013. Accessed 19 Dec 2016

  4. van der Vleuten CP, Schuwirth LW, Driessen EW, Dijkstra J, Tigelaar D, Baartman LK, Van Tartwijk J. A model for programmatic assessment fit for purpose. Med Teach. 2012;34(3):205–14.

    Article  Google Scholar 

  5. Pangaro L, Cate OT. Frameworks for learner assessment in medicine: AMEE Guide No. 78. Med Teach. 2013;35(6):e1197–210.

    Article  Google Scholar 

  6. Wood DF. Formative Assessment. In: Swanwick T, editor. Understanding Medical Education. Oxford: Wiley‐Blackwell; 2010. p. 259–70.

    Chapter  Google Scholar 

  7. Downing SM, Yudkowsky R. Introduction to assessment in the health professions. In: Downing SM, Yudkowsky R, editors. Assessment in Health Professions Education. Hoboken: Taylor and Francis; 2009.

    Google Scholar 

  8. Epstein RM. Assessment in medical education. N Engl J Med. 2007;356(4):387.

    Article  Google Scholar 

  9. Hamdy H, Prasad K, Anderson MB, Scherpbier A, Williams R, Zwierstra R, Cuddihy H. BEME systematic review: predictive values of measurements obtained in medical schools and future performance in medical practice. Med Teach. 2006;28(2):103–16.

    Article  Google Scholar 

  10. Harfmann KL, Zirwas MJ. Can performance in medical school predict performance in residency? A compilation and review of correlative studies. J Am Acad Dermatol. 2011;65(5):1010–22.

    Article  Google Scholar 

  11. Harden RM, Gleeson FA. Assessment of clinical competence using an objective structured clinical examination (OSCE). Med Educ. 1979;13(1):41.

    Article  Google Scholar 

  12. Miller GE. The assessment of clinical skills/competence/performance. Acad Med. 1990;65(9 Suppl):S63–7.

    Article  Google Scholar 

  13. Downs SH, Black N. The Feasibility of Creating a Checklist for the Assessment of the Methodological Quality Both of Randomised and Non-Randomised Studies of Health Care Interventions. J Epidemiol Comm Health. 1998;52(6):377–84.

    Article  Google Scholar 

  14. Cocke C, Orr R. The impact of physical training programs on the fitness of tactical populations: A critical review. J Aust Strength Cond. 2015;23(1):39–46.

    Google Scholar 

  15. Kennelly J. Methodological approach to assessing the evidence. In: Handler A, Kennelly J, Peacock N, editors. Reducing Racial/Ethnic Disparities in Reproductive and Perinatal Outcomes. Springer. 2011. p. 7–19.

    Chapter  Google Scholar 

  16. Gravetter FJ, Wallnau LB. Statistics for the behavioral sciences. 4th ed. Minneapolis: West Publishing Company; 1996.

    Google Scholar 

  17. Ho DW, Whitehill TL, Ciocca V. Performance of speech-language pathology students in problem-based learning tutorials and in clinical practice. Clin Linguist Phon. 2014;28(1–2):102–16.

    Article  Google Scholar 

  18. Gadbury-Amyot CC, Bray KK, Branson BS, Holt L, Keselyak N, Mitchell TV, Williams KB. Predictive validity of dental hygiene competency assessment measures on one-shot clinical licensure examinations. J Dent Educ. 2005;69(3):363–70.

    Google Scholar 

  19. Ferguson KJ, Kreiter CD. Using a longitudinal database to assess the validity of preceptors’ ratings of clerkship performance. Adv Health Sci Educ Theory Pract. 2004;9(1):39–46.

    Article  Google Scholar 

  20. Carr SE, Celenza A, Puddey IB, Lake F. Relationships between academic performance of medical students and their workplace performance as junior doctors. BMC Med Educ. 2014;14:157.

    Article  Google Scholar 

  21. Wilkinson TJ, Frampton CM. Comprehensive undergraduate medical assessments improve prediction of clinical performance. Med Educ. 2004;38(10):1111–6.

    Article  Google Scholar 

  22. LaRochelle JS, Dong T, Durning SJ. Preclerkship assessment of clinical skills and clinical reasoning: the longitudinal impact on student performance. Mil Med. 2015;180(4 Suppl):43–6.

    Article  Google Scholar 

  23. Berg K, Winward M, Clauser BE, Veloski JA, Berg D, Dillon GF, Veloski JJ. The relationship between performance on a medical school’s clinical skills assessment and USMLE Step 2 CS. Acad Med. 2008;83:10.

    Article  Google Scholar 

  24. Dong T, Swygert KA, Durning SJ, Saguil A, Gilliland WR, Cruess D, DeZee KJ, LaRochelle J, Artino AR. Validity evidence for medical school OSCEs: associations with USMLE® step assessments. Teach Learning Med. 2014;26(4):379–86.

    Article  Google Scholar 

  25. Baker HH, Cope MK, Adelman MD, Schuler S, Foster RW, Gimpel JR. Relationships between scores on the COMLEX-USA Level 2-Performance Evaluation and selected school-based performance measures. J Am Osteopath Assoc. 2006;106(5):290–5.

    Google Scholar 

  26. Cope MK, Baker HH, Foster RW, Boisvert CS. Relationships between clinical rotation subscores, COMLEX-USA examination results, and school-based performance measures. J Am Osteopath Assoc. 2007;107(11):502–10.

    Google Scholar 

  27. Han ER, Chung EK. Does medical students’ clinical performance affect their actual performance during medical internship? Singapore Med J. 2016;57(2):87–91.

    Article  Google Scholar 

  28. Probert CS, Cahill DJ, McCann GL, Ben-Shlomo Y. Traditional finals and OSCEs in predicting consultant and self-reported clinical skills of PRHOs: a pilot study. Med Educ. 2003;37(7):597–602.

    Article  Google Scholar 

  29. Kahn MJ, Merrill WW, Anderson DS, Szerlip HM. Residency program director evaluations do not correlate with performance on a required 4th-year objective structured clinical examination. Teach Learn Med. 2001;13(1):9–12.

    Article  Google Scholar 

  30. Campos-Outcalt D, Watkins A, Fulginiti J, Kutob R, Gordon P. Correlations of family medicine clerkship evaluations and Objective Structured Clinical Examination scores and residency directors’ ratings. Fam Med. 1999;31:90–9.

    Google Scholar 

  31. Wessel J, Williams R, Finch E, Gemus M. Reliability and validity of an objective structured clinical examination for physical therapy students. J Allied Health. 2003;32(4):266–9.

    Google Scholar 

  32. Hawker JA, Walker KZ, Barrington V, Andrianopoulos N. Measuring the success of an objective structured clinical examination for dietetic students. J Hum Nutr Diet. 2010;23(3):212–6.

    Article  Google Scholar 

  33. McLaughlin JE, Khanova J, Scolaro K, Rodgers PT, Cox WC. Limited Predictive Utility of Admissions Scores and Objective Structured Clinical Examinations for APPE Performance. Am J Pharm Educ. 2015;79(6):84.

    Article  Google Scholar 

  34. Graham R, Zubiaurre Bitzer LA, Anderson OR. Reliability and predictive validity of a comprehensive preclinical OSCE in dental education. J Dent Educ. 2013;77(2):161–7.

    Google Scholar 

  35. Viera AJGJ. Understanding interobserver agreement: the kappa statistic. Fam Med. 2005;37(5):360–3.

    Google Scholar 

  36. Khan KZ, Ramachandran S, Gaunt K, Pushkar P. The Objective Structured Clinical Examination (OSCE): AMEE Guide No. 81. Part I: an historical and theoretical perspective. Med Teach. 2013;35(9):e1437.

    Article  Google Scholar 

  37. Khan K, Ramachandran S. Conceptual framework for performance assessment: competency, competence and performance in the context of assessments in healthcare--deciphering the terminology. Med Teach. 2012;34(11):920.

    Article  Google Scholar 

  38. Brannick MT, Erol-Korkmaz HT, Prewett M. A systematic review of the reliability of objective structured clinical examination scores. Med Educ. 2011;45(12):1181–9.

    Article  Google Scholar 

  39. Lukhele R, Thissen D, Wainer H. On the Relative Value of Multiple-Choice, Constructed Response, and Examinee-Selected Items on Two Achievement Tests. J Educ Meas. 1994;31(3):234–50.

    Article  Google Scholar 

  40. Gimpel JR, Boulet DOJR, Errichetti AM. Evaluating the clinical skills of osteopathic medical students. J Am Osteo Assoc. 2003;103(6):267.

    Google Scholar 

  41. Sandella JM, Smith LA, Dowling DJ. Consistency of interrater scoring of student performances of osteopathic manipulative treatment on COMLEX-USA Level 2-PE. J Am Osteo Assoc. 2014;114(4):253.

    Article  Google Scholar 

  42. De Champlain A, Swygert K, Swanson DB, Boulet JR. Assessing the underlying structure of the United States Medical Licensing Examination Step 2 test of clinical skills using confirmatory factor analysis. Acad Med. 2006;81(10 Suppl):S17.

    Article  Google Scholar 

  43. Carr SE, Celenza A, Lake F. Assessment of Junior Doctor performance: a validation study. BMC Med Educ. 2013;13:129.

    Article  Google Scholar 

  44. Kreiter CD, Ferguson K, Lee WC, Brennan RL, Densen P. A generalizability study of a new standardized rating form used to evaluate students’ clinical clerkship performances. Acad Med. 1998;73(12):1294.

    Article  Google Scholar 

  45. Haladyna TM. An Evaluation of the Central Regional Dental Testing Services National Dental Hygiene Examination. 2011.

    Google Scholar 

  46. McAllister M. Transformative teaching in nursing education: leading by example. Collegian. 2005;12(2):11–6.

    Article  Google Scholar 

  47. McAllister S, Lincoln M, Ferguson A, McAllister L. Issues in developing valid assessments of speech pathology students’ performance in the workplace. Int J Lang Commun Disord. 2010;45(1):1–14.

    Article  Google Scholar 

  48. Dong T, Durning SJ, Gilliland WR, Swygert KA, Artino Jr AR. Development and initial validation of a program director’s evaluation form for medical school graduates. Mil Med. 2015;180(4):97–103.

    Article  Google Scholar 

  49. Roach K, Gandy J, Deusinger SC, Gramet P, Gresham B, Hagler P, Rainery Y. The development and testing of APTA Clinical Performance Instruments. American Physical Therapy Association. Phys Ther. 2002;82(4):329–53.

    Google Scholar 

  50. Straube D, Campbell SK. Rater discrimination using the visual analog scale of the Physical Therapist Clinical Performance Instrument. J Phys Ther Educ. 2003;17(1):33.

    Google Scholar 

  51. Wilkinson TJ, Frampton CM. Assessing performance in final year medical students. Can a postgraduate measure be used in an undergraduate setting? Med Educ. 2003;37(3):233–40.

    Article  Google Scholar 

Download references


Not applicable.


No funding was received.

Availability of data and materials

All data generated or analysed during this study are included in this published article (and its Additional file 1).

Authors’ contributions

All authors contributed to the conceptualisation and planning of the review and to the development of the search strategy. RT carried out the systematic search, study selection, data extraction and critical appraisal, and drafted the manuscript. NM carried out the critical appraisal and contributed to drafting the manuscript. RO performed the kappa analysis and contributed to drafting the manuscript. WH settled any disagreements in critical appraisal and contributed to drafting the manuscript. All authors read and approved the final manuscript.

Competing interests

The authors declare they have no competing interests.

Consent for publication

Not applicable.

Ethics approval and consent to participate

Not applicable.

Author information

Corresponding author

Correspondence to Rebecca Terry.

Additional file

Additional file 1: Papers excluded from review and reasons for exclusion. (DOCX 30 kb)

Rights and permissions

Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

Terry, R., Hing, W., Orr, R. et al. Do coursework summative assessments predict clinical performance? A systematic review. BMC Med Educ 17, 40 (2017).
