Relationship between EPA level of supervision and associated subcompetency milestone levels in pediatric fellow assessment

Background: Entrustable Professional Activities (EPAs) and competencies represent components of a competency-based education framework. EPAs are assessed based on the level of supervision (LOS) necessary to perform the activity safely and effectively. The broad competencies, broken down into narrower subcompetencies, are assessed using milestones, observable behaviors of one's abilities along a developmental spectrum. Integration of the two methods, accomplished by mapping the most relevant subcompetencies to each EPA, may provide a cross-check between the two forms of assessment and uncover those subcompetencies that have the greatest influence on the EPA assessment.

Objectives: We hypothesized that 1) there would be a strong correlation between EPA LOS ratings and the milestone levels for the subcompetencies mapped to the EPA; 2) some subcompetencies would be more critical in determining entrustment decisions than others; and 3) the correlation would be weaker if the analysis included only milestones reported to the Accreditation Council for Graduate Medical Education (ACGME).

Methods: In fall 2014 and spring 2015, the Subspecialty Pediatrics Investigator Network asked Clinical Competency Committees to assign milestone levels to each trainee enrolled in a pediatric fellowship for all subcompetencies mapped to 6 Common Pediatric Subspecialty EPAs, as well as to provide a rating for each EPA based upon a 5-point LOS scale.

Results: One thousand forty fellows were assessed in fall and 1048 in spring, representing about 27% of all fellows. For each EPA and in both periods, the average milestone level was highly correlated with LOS (rho range 0.59-0.74; p < 0.001). Correlations were similar when using a weighted versus unweighted milestone score or when using only the ACGME-reported milestones (p > 0.05).
Conclusions: We found a strong relationship between milestone level and EPA LOS rating, but no difference when the subcompetencies were weighted or when only milestones reported to the ACGME were used. Our results suggest that key subcompetencies and milestones capture the representative behaviors needed to effectively perform the EPA, allowing for future language adaptations while still supporting the current model of assessment. In addition, these data provide additional validity evidence for using these complementary tools in building a program of assessment.


Background
Early in the transition to a competency-based model for trainee education and assessment, identifying the Accreditation Council for Graduate Medical Education (ACGME) core competencies in the United States (US) and the CanMEDS roles in Canada were critical first steps [1,2]. Each of the core competencies was further refined into specific "subcompetencies" in the US, and each CanMEDS role was elaborated and defined. An important next step was the creation of milestones specific to each subcompetency or role [3,4]. The milestones represent defined, observable abilities of an individual's skills along a developmental continuum [5]. In the US, each specialty was tasked with creating both the subcompetencies and milestones for the ACGME competencies [3]. Pediatrics created milestones for 48 subcompetencies, of which only 21 were reported to the ACGME biannually for all trainees [6]. Milestone ratings ranged from one to four or one to five, but trainees were not necessarily expected to achieve the highest levels at the time of graduation.
The subsequent creation of Entrustable Professional Activities (EPAs) by ten Cate and Scheele [7] complements the milestones by providing a meaningful clinical context for the subcompetencies. EPAs are observable activities of a profession that an individual should be able to execute without supervision when in practice [8-11]. As opposed to subcompetencies, EPA assessments are based upon the amount of supervision a trainee needs to perform the activity safely and effectively, ranging from direct to indirect to none [12,13]. Basing EPA judgments on needed levels of trainee supervision aligns what faculty do in real time with what they are asked to do as part of the assessment process, thus adding to their validity evidence [14].
To link EPA and milestone assessments, medical education leaders then mapped the subcompetencies thought to be critical in executing the professional activities of each EPA [7,15,16]. An example of the mapping of the Leadteam EPA (Table 1) is illustrated in Fig. 1. For this EPA, 8 subcompetencies were judged to be important in making the entrustment decision. Milestones for 5 of the 8 subcompetencies are required to be reported to the ACGME in the fall and spring of each year. While the mapping was accomplished by experts through an iterative process, data supporting it are lacking, and it is unknown whether any specific subcompetency is more important than the others in making the entrustment decision. Similarly, it is unclear whether all the mapped subcompetencies, or only those required to be reported to the ACGME, are critical in formulating the entrustment decision. This information would be helpful for future studies, particularly if the milestone levels could be obtained directly from the ACGME.
Milestones and EPA level of supervision (LOS) both represent elements of a system of trainee assessment [17]. While all ACGME-accredited programs must report milestones, many specialties are now promoting the use of EPA LOS for assessment. The American Board of Pediatrics announced that it will begin using EPA LOS ratings to determine eligibility to sit for its certification exams [18]. In July 2023, the American Board of Surgery began using EPAs as the foundation for competency-based surgical training [19]. Both Emergency Medicine and Family Medicine, along with other disciplines, are also currently exploring the use of EPA LOS in their training programs [18]. Since EPAs provide a holistic view ("wide lens" in Fig. 1) of the execution of an activity, and milestones provide a more granular assessment ("narrow lens") of the specific behaviors needed to perform them, there should be a strong relationship between these two approaches to assessment [20]. This relationship requires further exploration, as the finding of a strong association between the two would provide validity evidence for both types of assessment.
Using this logic, our hypotheses were: 1) there is a strong correlation between the EPA LOS rating and the average score of the mapped subcompetency milestone levels needed to perform the EPA; 2) some subcompetencies are more critical than others, such that weighted scores will have a stronger correlation; and 3) if only those milestones required for reporting to the ACGME are included in the analysis, the correlation between EPA LOS rating and average milestone level will be weaker.

Methods
We performed the study using the Subspecialty Pediatrics Investigator Network (SPIN), a medical education research network that includes representatives from each of the 14 pediatric subspecialties with primary American Board of Pediatrics certification, as well as the Council of Pediatric Subspecialties, the Association of Pediatric Program Directors Fellowship Executive Committee, and the Association of Pediatric Program Directors Longitudinal Educational Assessment Research Network (APPD LEARN) [21]. The goal was to recruit at least 20% of fellowships from each subspecialty. We obtained IRB approval from each participating institution, with the University of Utah serving as the lead.

Example LOS scale for the EPA "Lead within the subspecialty profession" (Leadprof; see Table 1):
Level 1: Trusted to observe only
Level 2: Trusted to contribute to advocacy and public education activities for the subspecialty profession with direct supervision and coaching at the institutional level
Level 3: Trusted to contribute to advocacy and public education activities for the subspecialty profession with indirect supervision at the institutional level
Level 4: Trusted to mentor others and lead advocacy and public education activities for the subspecialty profession at the institutional level
Level 5: Trusted to lead advocacy and public education activities for the subspecialty profession at the regional and/or national level

One week before the Clinical Competency Committee (CCC) meeting, we asked fellowship program directors (FPDs) to assign a LOS rating for each fellow for 6 of the 7 EPAs common to all pediatric subspecialties (Common Pediatric Subspecialty EPAs; Table 1) [22]. Then, at the CCC meeting, we asked the members to first assign a milestone level to all 29 subcompetencies mapped to these six EPAs. Of note, in Pediatrics, all subspecialties utilize the same milestones. CCCs then assigned a LOS rating for each fellow for each Common Pediatric Subspecialty EPA. We provided no specific instructions to the FPDs or CCC members about the procedure for determining fellow ratings, and no faculty development about EPAs or the EPA LOS scales. Representatives of the 14 subspecialties contributed to the development of the EPA LOS scales, and the validity evidence for them has previously been published [23,24]. Designed to be intuitive, these ordinal 5-level scales are based upon direct, indirect, and no supervision, with case complexity being a variable in determining the need for supervision at some levels for some EPAs (Table 1).
The anonymity of trainees was ensured by creating a unique participant identifier number using an algorithm developed by APPD LEARN [25]. Once this ID was created, we provided specific links to the online data collection instruments. In the survey instrument, we first elicited milestone ratings for each of the subcompetencies, grouped by the 6 core competencies, and then obtained LOS ratings for each EPA. When presenting each subcompetency, we displayed its name and the descriptions for each milestone; when presenting each EPA, we displayed its title and the associated functions necessary to carry out the activities, followed by the LOS scale. We also collected information about each fellow's subspecialty and year of fellowship, the institution, the number of fellows in the program, how long the FPD had served in this role, and the FPD's self-reported understanding of EPAs (unfamiliar, basic, in-depth, or expert). We also asked whether the FPD was a member of the CCC, since FPD participation on the CCC may influence assignment of trainee ratings [24]. Details about the data collection tools have been previously described [23]. We collected data in fall 2014 and spring 2015. The abbreviations for each EPA are listed in Table 1.
For each EPA, we computed an unweighted composite milestone score by averaging the milestone levels for the subcompetencies mapped to that EPA. We compared LOS ratings and unweighted composite scores for trainees at each data collection period using linear mixed models, adjusting for repeated measures and clustering within programs.
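The unweighted composite is a simple mean of the mapped milestone levels. As an illustrative sketch (the study's analyses were run in R; this Python version and its example ratings are hypothetical):

```python
def unweighted_composite(milestone_levels):
    """Average the milestone levels for the subcompetencies mapped to one EPA."""
    return sum(milestone_levels) / len(milestone_levels)

# Hypothetical fellow rated on the 8 subcompetencies mapped to the Leadteam EPA
levels = [3, 3, 4, 3, 2, 3, 4, 3]
score = unweighted_composite(levels)  # 3.125
```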
We computed a weighted composite milestone score for each EPA by using a confirmatory factor analysis procedure to fit path coefficients and mean structures between each EPA's LOS and its mapped subcompetencies, adjusting for clustering within programs, and then used the path coefficients to generate a weighted average of the milestone levels. To assess the fit of the procedure, we examined the comparative fit index and the root mean squared error of approximation using the entire sample; to guard against overfitting, we also conducted a fivefold cross-validated bootstrap process, fitting the path coefficients on 80% of the data and making predictions on the remaining 20%, repeating the process for each fold and averaging 500 replications.
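Once the path coefficients are fit, the weighted composite is a coefficient-weighted mean of the same milestone levels. A minimal sketch, with hypothetical coefficients standing in for the fitted values:

```python
def weighted_composite(milestone_levels, path_coefficients):
    """Weighted average of milestone levels; each subcompetency contributes in
    proportion to its (here hypothetical) path coefficient from the factor model."""
    total = sum(path_coefficients)
    return sum(w * x for w, x in zip(path_coefficients, milestone_levels)) / total

# Three mapped subcompetencies with illustrative weights
weighted_composite([3, 4, 2], [0.5, 0.3, 0.2])  # ~3.1
```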
We tested the hypothesis that composite milestone scores would be correlated with LOS ratings using Spearman's ρ. We tested the hypothesis that weighted composites would outperform unweighted composites by comparing confidence intervals around the ρ values for the weighted and unweighted composites, and similarly tested differences between unweighted composites at programs where the FPD did or did not serve on the CCC. We tested the hypothesis that using all critical subcompetencies would better predict LOS than using only the ACGME-reported milestones in a similar fashion, and directly compared the fit of the nested weighted confirmatory factor analysis models using a likelihood ratio χ² test.
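Spearman's ρ is the Pearson correlation of the rank-transformed ratings, which suits the ordinal LOS scale. A self-contained sketch (the actual analysis used R; ties receive average ranks):

```python
def ranks(xs):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(xs):
        j = i
        while j + 1 < len(xs) and xs[order[j + 1]] == xs[order[i]]:
            j += 1  # extend run of tied values
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg_rank
        i = j + 1
    return r

def spearman_rho(x, y):
    """Pearson correlation computed on the ranks of x and y."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den
```

For a perfectly monotone relationship, `spearman_rho` returns 1.0 regardless of the raw scale units, which is why it is appropriate for comparing a 5-point LOS scale with composite milestone scores.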
We generated equations to predict milestone levels using the path coefficients in the confirmatory factor analysis for each model. For external validation of the predicted levels, we used spring 2019 EPA LOS ratings that were obtained in a recently completed study [26]. EPA LOS ratings were collected in the same manner as described above, except that milestone levels were not obtained. Spring 2019 milestone levels were provided by the ACGME through a data sharing agreement with APPD LEARN. With these data, we examined the goodness-of-fit using the ACGME equations.
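The fitted prediction equations themselves appear in Table 4; schematically, each maps an EPA LOS rating to a predicted composite milestone level through a linear form. A sketch with placeholder coefficients (the intercept and slope below are hypothetical, not the fitted values):

```python
def predict_milestone(los, intercept=1.0, slope=0.55):
    """Predict a composite milestone level from an EPA LOS rating (1-5).
    The default coefficients are hypothetical placeholders for the
    EPA-specific fitted values reported in Table 4."""
    return intercept + slope * los
```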
Using the model with the best fit and parsimony, we constructed receiver operating characteristic (ROC) curves for the ability of that model's composite milestone score to discriminate between decisions affirming or refuting entrustment, using levels 4 or 5 as the minimum level for affirmation of entrustment. Data analyses were conducted using R 3.6 (R Core Team, Vienna, Austria).
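The AUC of such a ROC curve has a direct interpretation: the probability that a randomly chosen entrusted fellow has a higher composite milestone score than a randomly chosen non-entrusted fellow (ties counting one half). A minimal sketch of that equivalence with illustrative scores (the study itself used R):

```python
def auc(scores_entrusted, scores_not_entrusted):
    """AUC computed as the Mann-Whitney U statistic scaled to [0, 1]:
    the fraction of (entrusted, not-entrusted) pairs the score orders correctly,
    with ties counting 1/2."""
    wins = ties = 0
    for a in scores_entrusted:
        for b in scores_not_entrusted:
            if a > b:
                wins += 1
            elif a == b:
                ties += 1
    return (wins + 0.5 * ties) / (len(scores_entrusted) * len(scores_not_entrusted))

# Hypothetical composite milestone scores for entrusted vs. non-entrusted fellows
auc([3.8, 4.1, 3.5], [2.9, 3.5, 3.1])  # ~0.944
```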
Results
Mean EPA LOS and unweighted composite milestone scores for each period are displayed in Table 2. Both EPA LOS and milestone scores increased from fall to spring (p < 0.001 for each EPA, adjusting for repeated measures and clustering of trainees within programs).

Testing hypothesis 1: There is a strong correlation between EPA LOS rating and the average score of the mapped subcompetency milestone levels needed to perform the EPA
There was a moderate to strong correlation between the unweighted composite milestone score and EPA LOS, ranging from 0.59 for the Management EPA to 0.74 for Leadteam, supporting our first hypothesis (Fig. 2, Table 3). There was no difference in the correlations between the two periods (p > 0.05). Correlations between LOS ratings made independently by the FPD before the CCC meeting and composite milestone scores were similar to those made by the CCC when the FPD was not a member. In addition, when comparing the associations in programs where the FPD was not a CCC member versus where the FPD was a member, the correlations were somewhat lower for some EPAs (QI and Management in the fall); otherwise, they were not significantly different. The significant associations between milestone and EPA LOS ratings persisted after adjustment for institution, subspecialty, and program and FPD characteristics.
Testing hypothesis 2: Some subcompetencies are more critical than others, such that weighted scores will have a stronger correlation
Correlations calculated using a weighted composite milestone score were not significantly better than those calculated using an unweighted score, counter to our second hypothesis. The most parsimonious best-fitting model was thus the unweighted, ACGME-reported-milestones-only composite score. Figure 3 shows ROC curves using the unweighted ACGME-reported-milestones-only composite score from the spring for the 6 EPAs to predict entrustment based upon a minimum EPA LOS of 4 or 5. The area under the curve (AUC) was excellent, ranging from 0.81 (95% CI: 0.78-0.84) for Management to 0.90 (0.86-0.94) for QI. When assuming entrustment based upon attaining a LOS of 5, the AUCs were similar to those using a minimum of level 4 (p > 0.05). For each EPA, there was no difference between the AUC in fall and spring (p > 0.05) or based upon FPD CCC membership (p > 0.05).

Testing hypothesis 3: If only those milestones required for reporting to the ACGME were included in the analysis, the correlation between EPA LOS rating and average milestone level would be weaker
Goodness-of-fit for the weighted models using all milestones or only those reported to the ACGME was excellent in both cases and did not differ (p = 0.72), counter to our third hypothesis that the correlation using all mapped milestones would be stronger. The comparative fit index and root mean square error of approximation of the models using all milestones were 0.999 (> 0.95 is excellent) and 0.034 (< 0.05 is excellent), respectively, while the values using only the ACGME-reported milestones were 0.998 and 0.043 [28].
Prediction equations for each model are shown in Table 4.
The external validation sample included 1373 EPA LOS ratings from 503 (36.6%) first-year, 448 (32.6%) second-year, and 422 (30.7%) third-year fellows. The comparative fit index and root mean square error of approximation of the models using the ACGME prediction equations were 0.994 (> 0.95 is excellent) and 0.071 (0.06-0.08 is fair), respectively [28].

Discussion
In support of our first hypothesis, we found a strong relationship between the milestone levels for subcompetencies mapped to an EPA and the LOS ratings for that EPA, providing validity evidence for both approaches. Our data do not support the second hypothesis, in that we found the relationship between EPA LOS and milestone scores was nearly identical whether we used the unweighted or weighted milestone scores. Likewise, the relationship between milestone level and EPA LOS rating was similar when only the ACGME-reported milestones were utilized in the model compared with using milestones from all 29 subcompetencies mapped to the six EPAs.
Our results are similar to those of Larrabee et al., who examined the association between 27 EPAs that they developed for 4 core rotations in pediatric residency and the milestones mapped to these EPAs [29]. They found a strong correlation between the two ratings, with an overall median R² of 0.81. Although these investigators focused on residents and used a different LOS scale, the concordance between Larrabee's findings and ours nevertheless suggests that the relationship between milestones and EPA LOS is generalizable and not dependent upon a particular group of trainees or a specific LOS scale.
The areas under the curve for all EPAs were very high. Except for one, this was irrespective of whether entrustment was set at level 5 or at level 4 or 5, indicating that the relationship was not solely dependent upon how entrustment was defined. While executing the EPA without supervision is the goal, not all FPDs believe that fellows must achieve level 5 (unsupervised practice) in all EPAs to graduate, indicating that some supervision may still be needed [30-32]. Also, for some EPAs, while the correlations were somewhat weaker if the FPD was not a member of the CCC, the differences were small and the AUCs for the ROC curves were not affected. We constructed equations to predict milestone level based upon EPA LOS rating that had an excellent goodness-of-fit. In both the derivation and validation samples, the comparative fit index was excellent. While there was a slight decline in the root mean square error of approximation using the data 4 years later, coupled with the strong comparative fit index, the overall goodness-of-fit was still very good. Since the spring 2019 data represent assessments of different fellows, and likely include CCCs with differing compositions, this shows that the equations maintained their precision over time.
Showing that there is a strong relationship between milestones and EPA LOS helps to address FPDs' concerns about the additional work involved as EPAs become a required element of trainee assessment [33-35]. Faculty find using EPA LOS scales very intuitive, and generating milestone levels based on EPA LOS ratings should be timesaving [33]. These predicted milestone ratings can serve as a starting point when the CCC discusses each trainee and makes the final assignments. We used milestones version 1.0 in our study to develop the equations, but milestones 2.0 will shortly be implemented across all specialties and subspecialties. Our finding that models using all milestones are similar to those using only the milestones reported to the ACGME will make it easier to revise the equations with updated milestones. We found little difference in the correlations between EPA LOS and milestones based on whether or not the FPD was a member of the CCC. These findings are consistent with our previous report that the association between FPD and CCC assignment of EPA LOS is strong [24]. With the exception of the Management EPA in the fall, the correlations when the FPD made the assessments independently of the CCC were also similar. It is reassuring that the relationship between LOS and milestones is not affected by FPD membership on the CCC.
While both approaches to assessment are highly related, it is important to recognize the contribution of each in creating an overall program of trainee assessment [17,36-38]. As program directors and CCCs make decisions about trainee progression toward unsupervised practice, the need to focus more on either EPA LOS ratings or milestone levels may depend on the circumstances. For high-performing trainees who require minimal supervision to execute an EPA, milestone levels may be less important than for a trainee who is early in development and requires more supervision. In the latter case, the descriptive language of the skills included in each milestone level can help the trainee focus on the improvements needed to effectively perform the EPA, especially if they are below the normative national standards or not meeting program expectations [39-41].
There are several limitations to this study. We asked FPDs to assign milestones before assigning LOS for the EPAs rather than randomizing the order of the assessments; this could have biased their EPA LOS ratings. The initial data collection period was the first time FPDs had to report milestones to the ACGME and assign EPA LOS ratings. With more experience, the application of these assessments may change, although we saw no difference in results between the two reporting periods. In addition, the goodness-of-fit of the equations using the ACGME-reported milestones suggests that there has not been much change in the milestone-LOS relationship over time. Finally, we used the Common Pediatric Subspecialty EPAs, and the findings cannot necessarily be extrapolated to assessments made using EPAs developed by other specialties.

Conclusions
We found strong agreement between assessments based on subcompetency milestones and those using EPA LOS, but no difference if the subcompetencies were weighted or if only the ACGME-reported subcompetencies were used. In addition, these data provide additional validity evidence for both types of assessment. We were also able to develop equations to generate milestone levels based on EPA supervision ratings using the ACGME-reported subcompetencies. This will help to address the time burden faced by educators while also allowing them the flexibility to use EPAs and milestones as appropriate in assessing their trainees and developing "a program of assessment fit for purpose" [17].

Fig. 2
Fig. 2 Graph showing Spearman rho correlations (95% confidence intervals) of EPA level of supervision ratings by the Clinical Competency Committee with the unweighted (#1) and weighted (#2) composite scores using all mapped milestones, the unweighted score using only the ACGME-reported milestones (#3), and the unweighted score using milestones from when the fellowship program director was (#4) or was not (#5) a member of the Clinical Competency Committee. The last graph in each group (#6) shows the correlation of EPA level of supervision ratings made independently by the fellowship program director with milestones for when the program director was not a member of the Clinical Competency Committee. Data are from fall 2014 and spring 2015. Abbreviations: ACGME = Accreditation Council for Graduate Medical Education, CCC = Clinical Competency Committee, EPA = entrustable professional activities, FPD = fellowship program director, LOS = level of supervision

Fig. 3
Fig. 3 ROC curves and area under the curve for spring 2015 data using the unweighted Accreditation Council for Graduate Medical Education subcompetency milestone composite score to achieve EPA level of supervision ratings of 4 or 5 (solid line; black) or only 5 (dashed line; red). Ratings utilized data from all members of the Clinical Competency Committee. The area under the curve in black is based upon a rating of 4 or 5, while that in red used a rating of 5. Abbreviations: ROC = receiver operating characteristic, AUC = area under the curve, EPA = entrustable professional activities

Table 1
The six Common Pediatric Subspecialty EPAs evaluated in this study with their abbreviations and scales for EPA level of supervision

Table 2
Overall mean (95% CI) unweighted composite subcompetency milestone scores and EPA level of supervision ratings in fall 2014 and spring 2015. Abbreviations: EPA = entrustable professional activities, LOS = level of supervision. a p < 0.05 fall vs. spring for all EPAs

Table 3
Correlation [rho (95% CI)] of the unweighted and weighted composite subcompetency milestone scores with EPA level of supervision ratings in fall 2014 and spring 2015, using all milestone data, only the ACGME-reported milestones, and whether the program director was or was not a member of the Clinical Competency Committee. a All values p < 0.001; b p < 0.05 compared with the unweighted composite milestone score for programs in which the FPD is on the CCC; c p < 0.05 compared with the unweighted composite milestone score with all members on the CCC. Abbreviations: ACGME = Accreditation Council for Graduate Medical Education, CCC = Clinical Competency Committee, FPD = fellowship program director, LOS = level of supervision

Table 4
Equations used to determine subcompetency milestone level based upon EPA level of supervision ratings and whether all mapped subcompetency milestones were used in the model or only those reported to the ACGME. Abbreviations: ACGME = Accreditation Council for Graduate Medical Education, LOS = level of supervision. a Subcompetencies are those associated with the Pediatric Milestones [42]; subscripts indicate the specific EPA