
Tools for measuring technical skills during gynaecologic surgery: a scoping review

Abstract

Background

Standardised assessment is key to structured surgical training. Currently, there is no consensus on which surgical assessment tool to use in live gynaecologic surgery. The purpose of this review is to identify assessment tools measuring technical skills in gynaecologic surgery and evaluate the measurement characteristics of each tool.

Method

We utilized the scoping review methodology and searched PubMed, Medline, Embase and Cochrane. Inclusion criteria were studies that analysed assessment tools in live gynaecologic surgery. Kane’s validity argument was applied to evaluate the assessment tools in the included studies.

Results

Eight studies out of the 544 identified fulfilled the inclusion criteria. The assessment tools were categorised as global rating scales, global and procedure rating scales combined, procedure-specific rating scales or as a non-procedure-specific error assessment tool.

Conclusion

This scoping review presents the current tools for observational assessment of technical skills in intraoperative gynaecologic surgery. It can serve as a guide for surgical educators who want to apply a scale or a specific tool in surgical assessment.


Background

Surgical training has been shown to improve surgical skills, and several assessment tools have been validated in both live surgical settings and in a simulated environment [1]. Superior surgical performance is directly related to improved patient outcomes, and standard-setting methods can define benchmarks and ensure content validity [2, 3]. The value of assessing a surgeon’s competencies is indisputable, but assessment requires a trained assessor and an objective structured assessment tool [4]. Currently, there is no consensus on which scales to use in surgical assessment in gynaecologic surgery.

More than 20 years ago, Van der Vleuten explored and described five criteria (reliability, validity, impact on future behaviour, acceptability, and costs) to take into consideration when choosing an assessment method in the clinical setting [5]. They remain highly relevant, but reliability and validity in particular must be thoroughly tested when applying an assessment tool.

Both task-specific and global rating tools are widely used in a variety of specialties [6]. The tools use various scoring systems, e.g. binary checklists or anchors, such as a Likert scale. In general surgery, a number of task-specific checklists exist, but recent reviews showed a lack of validity and reliability [7, 8].

Implementation of objective assessment in clinical practice is difficult due to challenges on many levels: lack of time, lack of resources, and often also lack of knowledge on how and when to use an assessment tool. To overcome these challenges it is important that the chosen assessment tool is acceptable, feasible, valid, well-described, and easy to apply in a surgical setting [5].

Our aim was to identify assessment tools measuring technical skills during live gynaecologic surgery and to evaluate the measurement characteristics of the tools used in a clinical setting in the operating room.

In this review, the term assessment tool refers to a specific tool that assesses specific surgical competencies, whereas the word scale refers to a widely applicable assessment tool or a component of a specific tool.

Methods

We chose the scoping review methodology to characterise the quantity and quality of existing assessment scales [9]. Conducted in accordance with Arksey and O’Malley’s framework [10], the review was designed to cover all available literature on the topic, to summarise existing knowledge and to identify research gaps in the current literature. The underlying methodological framework comprised five consecutively linked stages (Table 1). Levac et al. have further developed Arksey and O’Malley’s approach to clarify and enhance the various stages, and they recommend an optional sixth stage that involves consulting stakeholders [11]. The review is reported according to the Preferred Reporting Items for Systematic reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR) [12].

Table 1 Six consecutively linked stages of underlying methodological framework

Search strategy

In accordance with stages one and two of Arksey and O’Malley’s methodology, we used a broad search strategy guided by an information specialist (PP). We identified keywords and created a search string (Appendix 1). On 1 February 2020, we searched four databases covering 1989–2020: PubMed, Medline, Embase, and Cochrane. The search was updated on 8 January 2021.

We uploaded identified references into Covidence where two reviewers (LH, JH) screened the titles and abstracts and assessed the full texts in detail against the inclusion criteria [13]. The reference lists from the included studies were searched for additional studies.

Inclusion/exclusion criteria

Inclusion of studies was based on the population (types of participants), concept, and context framework [14]. We included studies with the following types of participants: novices, junior doctors/ residents, and/or experts, whose technical skills were assessed (concept) during live gynaecologic surgery (context). Studies that assessed the competence of medical students were not included.

We included studies in English with empirical data that were published in peer-reviewed journals with no restriction of study design or publication year. We excluded studies assessing surgical performance on animals and studies evaluated in a simulation setting.

Data synthesis

The measurement characteristics, i.e. validity, reliability, and validation context, were summarised for each type of assessment tool (Table 2). Inspired by a recent systematic review by Hatala et al., we applied Kane’s validity argument, which comprises four inferences (Table 3), to evaluate the various assessment tools (Table 4) [15, 16].

Table 2 Overview of studies identified for inclusion

Results

We found eight articles: one global rating scale, three global and procedure rating tools combined, three procedure-specific rating tools and one non-procedure-specific error assessment tool.

Figure 1 contains a flowchart of the reference search, and Table 2 presents an overview of study characteristics for the eight articles that met our inclusion criteria. Table 4 presents the studies evaluated by Kane’s validity argument.

Fig. 1 Flow diagram

Table 3 Kane’s validity argument
Table 4 Studies evaluated by Kane’s validity argument

Global rating scale (N = 1)

Objective structured assessment of technical skills (OSATS)

Hiemstra et al. [17] evaluated OSATS to establish learning curves, analysing the scores of nine trainees over a three-month period. Nineteen types of procedures were identified among the 319 procedures assessed. The items “knowledge of instruments” and “instrument handling” from the original OSATS were merged, resulting in six assessment items.

The trainees and the supervising consultant were instructed to fill out an OSATS assessment sheet after each performed procedure. The consultant would discuss the result with the trainee and provide constructive feedback. With six OSATS items, scores range from 6 to 30 points; a score of 24 was selected as the threshold for good surgical performance in the absence of benchmark criteria.

To prove construct validity, the authors hypothesised that surgical performance improves over time with increasing procedure-specific experience [17]. They found that performance improved by 1.10 OSATS points per assessed procedure (p = 0.008, 95% confidence interval (CI) 0.44–1.77) and that the learning curve for a specific procedure passed the threshold of 24 points at a caseload of five procedures. Furthermore, a performance plateau was reached after performing eight of the same procedures.
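For illustration, this kind of learning-curve analysis can be sketched as an ordinary least-squares line through per-case OSATS totals, plus the caseload at which the fitted line crosses the 24-point threshold. The scores below are invented for demonstration only; they are not Hiemstra et al.'s data.

```python
# Hypothetical learning-curve sketch: least-squares fit of OSATS totals
# against case number, then the caseload reaching the 24-point threshold.
# All scores are invented for demonstration.
import math

def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept for y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

def caseload_to_threshold(slope, intercept, threshold=24):
    """Smallest case number at which the fitted line reaches the threshold."""
    return math.ceil((threshold - intercept) / slope)

cases = [1, 2, 3, 4, 5, 6, 7, 8]           # consecutive procedures
scores = [20, 21, 23, 23, 25, 26, 27, 27]  # invented OSATS totals (6-30 range)

slope, intercept = linear_fit(cases, scores)
print(round(slope, 2))                          # OSATS points gained per case
print(caseload_to_threshold(slope, intercept))  # caseload crossing 24 points
```

With these toy scores the fitted slope is about one point per case and the line crosses the threshold at the fifth case, mirroring the shape (not the numbers) of the result reported above.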

Global and procedure-specific assessment tools combined (N = 3)

Vaginal surgical skills index (VSSI)

Chen et al. [18] introduced the Vaginal Surgical Skills Index (VSSI), a procedure-specific rating scale for evaluating surgeons while performing vaginal hysterectomies. They refined the original seven-item Global Rating Scale (GRS) to contain 13 items considered important for vaginal surgery.

Twenty-seven trainees performed 76 surgeries in the study period. A supervisor assessed the trainee immediately after the surgery, using VSSI, GRS, and a 100-mm visual analogue scale (VAS). VAS was included as an additional measure to furnish the assessor with a global impression of the trainee’s surgical skills. Construct validity was analysed by comparing VSSI and GRS scores. The procedure was videotaped, and to evaluate interrater reliability, a blinded reviewer assessed the performing surgeon using VSSI, GRS, and VAS.

To evaluate intrarater reliability, the supervising surgeon watched and re-evaluated the video after 4 weeks. Internal consistency for VSSI and GRS was high (Cronbach’s alpha = 0.95–0.97). Inter- and intrarater reliability were ICC = 0.53 and ICC = 0.82, respectively.
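As an illustration of the internal-consistency statistic reported here, Cronbach's alpha can be computed from an item-score matrix. The 4 × 5 rating matrix below is invented for demonstration only.

```python
# Illustrative computation of Cronbach's alpha, the internal-consistency
# statistic reported for VSSI and GRS. The rating matrix is invented.

def cronbach_alpha(item_scores):
    """Cronbach's alpha; rows = assessed performances, columns = items."""
    n_items = len(item_scores[0])

    def variance(values):  # population variance
        m = sum(values) / len(values)
        return sum((v - m) ** 2 for v in values) / len(values)

    item_vars = [variance([row[i] for row in item_scores])
                 for i in range(n_items)]
    total_var = variance([sum(row) for row in item_scores])
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)

# Invented scores: four performances rated on a 5-item, 1-5 anchor scale
ratings = [
    [4, 4, 5, 4, 4],
    [3, 3, 3, 2, 3],
    [5, 5, 4, 5, 5],
    [2, 2, 2, 2, 1],
]
print(round(cronbach_alpha(ratings), 2))  # 0.97
```

Because the invented raters agree closely across items, alpha lands near the 0.95–0.97 range reported in the study; real data would of course give its own value.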

Hopkins assessment of surgical competency (HASC)

Chou et al. developed and evaluated the Hopkins Assessment of Surgical Competency (HASC). The assessment form contains seven items from OSATS [25] and four items from an American Council on Resident Education in Obstetrics and Gynaecology toolbox. Another four items were included from an existing assessment tool at the authors’ institution. After modifying the items by factor analysis, the result was two six-item scales: General Surgical Skills and Case-Specific Surgical Skills [19].

Sixteen faculty physicians evaluated 16 trainees after each performed surgery. The trainees also performed self-evaluation. The study analysed 362 cases, and the authors reported internal validity and reliability, demonstrated by high Cronbach’s alpha (> 0.80) and high Pearson correlation coefficients (> 0.80) for both scales. Discriminant validity was significant (p < 0.001) for both scales when comparing the performance of trainees in their second and fourth year of training.

Objective structured assessment of laparoscopic salpingectomy (OSALS)

Larsen et al. [20] developed and evaluated the Objective Structured Assessment of Laparoscopic Salpingectomy (OSALS), comprising a general rating scale and a case-specific scale in which three of the case-specific items are directly related to the procedure evaluated: laparoscopic salpingectomy. Two independent observers used the OSALS chart to assess 21 unedited video recordings of 21 laparoscopic salpingectomies, performed by 21 different surgeons grouped as novices, intermediates, or experts. Differences in median scores between the groups supported construct validity, and OSALS was able to discriminate between all groups (p < 0.03). The study found a wide performance range in the expert group and a narrow performance range in the novice group. Interrater reliability was reported as acceptable.

Procedure-specific assessment tools (N = 3)

Robotic hysterectomy assessment score (RHAS)

Frederick et al. [21] developed and evaluated the Robotic Hysterectomy Assessment Score (RHAS). The assessment tool consists of six surgical domains rated on a five-point Likert scale, each domain subdivided into specific tasks, with a possible maximum score of 80. Delphi methodology was used for content validation. The participating surgeons were grouped as experts, advanced beginners, or novices. A blinded expert reviewer evaluated 52 recorded procedures. Interrater reliability was acceptable, and RHAS demonstrated construct validity, as it was able to differentiate between experts, advanced beginners, and novices.

Competence assessment tool for laparoscopic supracervical hysterectomy (CAT-LSH)

Goderstad et al. [22] developed the Competence Assessment Tool for Laparoscopic Supracervical Hysterectomy (CAT-LSH), a procedure-specific rating tool for laparoscopic supracervical hysterectomy, and compared it with the Global Operative Assessment of Laparoscopic Skills (GOALS) [26], a general rating scale. GOALS has been validated for laparoscopic ventral hernia repair, laparoscopic appendectomy, laparoscopic inguinal hernia repair, and laparoscopic and open cholecystectomy [7].

Three experts reached content validity by defining the main steps of the hysterectomy procedure. Each step evaluates the use of instruments, tissue handling and errors, with a maximum of 16 points assigned per step for a total possible score of 64. Twenty-one participants, grouped as either inexperienced, intermediate experienced or expert surgeons, performed 37 procedures eligible for assessment. The procedure was recorded, and the performing surgeon was assessed by both the operating assistant and by two blinded reviewers.

CAT-LSH showed significant discriminative validity, as both the assistant surgeon and the blinded reviewers were able to discriminate between inexperienced, intermediate experienced, and expert surgeons. In comparison, GOALS allowed blinded observers to differentiate between inexperienced and intermediate experienced surgeons, but not between intermediate experienced and expert surgeons (p = 0.085). When performed by the assistant surgeon, GOALS assessment differed significantly between the three groups. Acceptable interrater reliability was reported.

A feasible rating scale for formative and summative feedback

Savran et al. used Delphi methodology to develop the most recent procedure-specific rating scale, a feasible rating scale for formative and summative feedback [23]. The scale comprises 12 items evaluated on a five-point Likert scale. Messick’s framework was used to assess validity evidence. Sixteen surgeons, grouped as beginners (< 10 procedures performed) or experienced surgeons (> 200 procedures performed), performed 16 laparoscopic hysterectomies. The procedures were video recorded and analysed by two blinded reviewers. Construct validity was demonstrated by significantly different mean scores between the two groups. High interrater reliability was found for both one and two raters.

Non-procedure-specific error assessment (N = 1)

Generic error rating tool (GERT)

Husslein et al. [24] were the first to test a non-procedure-specific error assessment tool. The Generic Error Rating Tool (GERT) uses a Likert scale with nine anchors and is designed to analyse technical errors and resulting events during laparoscopy. GERT is based on the inverse relationship between skill and errors, i.e. more skilled surgeons make fewer errors.

Technical errors are defined as “the failure of planned actions to achieve their desired goal” and an event as “an action that may require additional measures to avoid an adverse outcome” [24]. The GERT technical error analysis comprises nine generic surgical tasks during which errors can occur. Each of these generic task groups is subdivided into four distinct error modes. To assess error distribution the procedures were divided into four sub-steps.

Two blinded reviewers analysed twenty video recordings of total laparoscopic hysterectomies, and correlation analyses were performed between GERT and OSATS. GERT scores were used to establish a measure of technical skills and to divide surgeons into two groups, high or low performers. A significant negative correlation between OSATS and GERT scores was demonstrated: the total number of errors decreased with increasing OSATS scores. Group comparison showed that high performers made significantly fewer technical errors than low performers.
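The negative skill-error relationship can be illustrated with a Pearson correlation, the kind of analysis used in the GERT-OSATS comparison. The paired values below are invented: higher global-rating totals paired with fewer error counts.

```python
# Illustrative Pearson correlation between global-rating scores and error
# counts, echoing the GERT-OSATS analysis. All values are invented.
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

osats = [14, 18, 21, 24, 27, 29]   # invented global-rating totals
errors = [22, 19, 15, 12, 9, 6]    # invented GERT error counts
print(round(pearson_r(osats, errors), 3))  # strongly negative for this toy data
```

A correlation close to -1, as in this toy example, is the pattern the study reports: the more skilled the surgeon (higher OSATS), the fewer the errors (lower GERT count).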

By analysing the different operative sub-steps, the study detected procedures more prone to technical errors.

Discussion

There is a need for robust validated tools across different measurement properties in order to aid surgical educators in selecting the appropriate tool for assessment. This scoping review identified eight technical assessment tools validated in live gynaecologic surgery. The studies, which have different validity strengths according to Kane’s framework, present a variety of challenges.

Currently, the most widely used and validated assessment scale is OSATS [25], which originally consisted of a task-specific checklist and a global rating scale. The global rating scale component has been shown to have high reliability and validity and to be applicable at various trainee levels and for a variety of surgical procedures [27].

Hiemstra et al. [17] tested the OSATS intraoperatively to establish learning curves for each trainee by direct supervision or self-assessment. As expected, learning curves were established but the authors identified enormous variation in assessors’ OSATS scores, and the trainees reported a lack of objectivity in the assessment tool.

Lack of objectivity is an important limitation of the OSATS. According to Kane’s validity argument, it does not meet the extrapolation criteria, which implies that the scores cannot be used to reflect on real-world performance. Furthermore, the general rating scale seems to have a ceiling effect in terms of not being able to discriminate competencies for senior surgeons [28].

VSSI [18] was developed as a procedure-specific rating tool to assess surgeons while performing vaginal hysterectomies. Jelovsek et al. have further validated VSSI in a study where it was compared with GRS [29].

Interestingly, the 13 items in VSSI are not procedure specific and can be applied to all laparoscopic surgery, e.g. general surgery.

This transfer of general competencies to a specific rating tool did not prove to be appropriate. Importantly, the authors focus on case mix, where a specific (patient) characteristic is known to potentially affect (surgical) outcome. A recent review on case-mix variables and predictors of outcomes of laparoscopic hysterectomy showed that body mass index, previous operations, adhesions, and age were the predominant case-mix characteristics [30]. This knowledge of case mix is important when choosing a surgical case for assessment.

Chou et al. modified an existing global rating scale by adding procedure-specific items to develop HASC [19], which targets gynaecologic trainees and aims to evaluate all surgical competencies in gynaecologic surgery. This procedure-specific rating tool is applicable to all types of laparoscopic surgery. Its generalizability and lack of a task-specific checklist make HASC applicable to other gynaecologic programmes. To our knowledge, this applicability has not been demonstrated in other validation studies. The study was not blinded, only trainees were tested, and data were collected for all types of surgical procedures, lowering the strength of the study.

The OSALS rating tool is incorporated in the Danish curriculum for assessment of OBGYN trainees [20, 31]. It comprises five general and five task-specific items and was developed and validated in a blinded study [20]. Larsen et al. found a wide performance range in the expert group and a narrow performance range in the novice group. This could be explained by case mix and by the fact that the OSALS rating tool is difficult to use for assessing experts. The study is limited by a small sample size.

A disadvantage of video evaluation is its time-consuming nature, but Larsen et al. underlined it as a strength for objective assessment, an assertion supported by Langermann et al. [32]. They argue that video recording of surgery enhances and supports surgical training and can be performed equally well by doctors with different levels of experience [32, 33].

Six of the included studies used video recording and blinded observers when evaluating the surgeon’s performance [18, 21, 22, 23, 24, 31]. All the studies found significant discriminative validity, demonstrating that the assessor can differentiate between novices, advanced beginners, and experts. This indicates that video-recorded assessment is a good choice when validating an assessment tool, but as it is time-consuming, it may not be an obvious choice for implementation in daily clinical practice [34].

Even though the development of content validity for procedure-specific assessment tools requires the use of Delphi methodology [35], it was only done by Frederick et al. [21]. They discussed the potentially confounding variable of the attending physician providing direct supervision and guidance when evaluating a novice surgeon. This may account for why novice surgeons’ scores did not differ more relative to their more experienced colleagues’. Case mix may also explain this lack of difference in scores.

RHAS [21] demonstrated both construct and discriminative validity and appeared to be feasible. The assessed surgical skills can be applied to hysterectomies performed either laparoscopically or abdominally, as the basic steps in the procedure are identical.

The procedure-specific rating tool CAT-LSH was superior in terms of discriminative validity compared to the validated tool Global Operative Assessment of Laparoscopic Skills (GOALS) [26] used for laparoscopy. Goderstad et al. explain this by CAT-LSH being more detailed for each step of the procedure compared to GOALS. Frederick et al., who developed RHAS, support this finding [21]. The CAT-LSH study revealed rater bias, as the operating assistant gave a higher total score than the blinded reviewers, on both GOALS and CAT-LSH, in all three groups. This phenomenon has also been demonstrated by Goff et al., who validated OSATS in a simulation setting [36].

Even though GOALS [26] is used as a comparison in the CAT-LSH study [22], the general rating scale has never been tested and validated in a live, gynaecologic surgical setting. Interestingly, that is also the case for the most widely used global assessment scale, OSATS. GOALS was developed and validated to assess intraoperative laparoscopic skills in general surgery [26], and OSATS was originally validated for general surgery in a simulation setting [25]. A comprehensive study by Hatala et al. [16] thoroughly analysed the validity evidence for OSATS in a simulation setting, and a recent study has developed H-OSATS, an objective scale specific for the assessment of technical skills in laparoscopic hysterectomy [37]. It showed feasibility and validity but was only tested in a simulation setting. The global rating scale must still be validated in intraoperative gynaecologic surgery before it can be used as a validated comparison.

Husslein et al. examined GERT [24] which was able to significantly discriminate between low and high performers by analysing errors. The study identified procedures more prone to technical errors, which is important knowledge when determining the focus of a procedure-specific assessment tool and how detailed each procedural step should be evaluated. The study is limited by a small sample size.

A systematic review by Ahmed et al. [38] concluded that a combination of global and task-specific assessment tools appears to be the most comprehensive solution for observational assessment of technical skills. This is supported by findings in the RHAS, CAT-LSH, and OSALS studies. These scales all consist of a general and a procedure-specific checklist and have been validated in studies with relatively strong methodology. It has been shown in a simulation setting that evaluating a clinical competence solely with a procedure-specific checklist does not preclude incompetence in terms of technical ability and safety [39]. Identifying safety issues requires the inclusion of assessment using a global rating scale. By adding GERT, the operative sub-steps prone to errors can be identified.

Savran et al. [23] asserted that their assessment tool met the criteria for summative assessment, using the contrasting-groups method to set a pass/fail score. Similar to most studies in our scoping review, the authors grouped surgeons according to surgical load, and experienced or expert surgeons were defined according to the number of cases performed. This is not an objective measure of competency, and a pre-set standard must exist in order to establish summative assessment [4].
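The contrasting-groups method places the pass/fail score where the score distributions of the two contrast groups intersect. A minimal sketch, modelling the two groups as normal curves with invented means and standard deviations (not Savran et al.'s data):

```python
# Minimal sketch of the contrasting-groups standard-setting method: the
# cutoff is where the two groups' (normal) score distributions intersect.
# Group statistics below are invented for demonstration.
import math

def contrasting_groups_cutoff(mean_fail, sd_fail, mean_pass, sd_pass):
    """Intersection of two normal pdfs, taken between the group means."""
    if sd_fail == sd_pass:               # equal spread: cutoff is the midpoint
        return (mean_fail + mean_pass) / 2
    # Setting pdf_fail(x) == pdf_pass(x) and taking logs gives a quadratic
    a = 1 / (2 * sd_pass ** 2) - 1 / (2 * sd_fail ** 2)
    b = mean_fail / sd_fail ** 2 - mean_pass / sd_pass ** 2
    c = (mean_pass ** 2 / (2 * sd_pass ** 2)
         - mean_fail ** 2 / (2 * sd_fail ** 2)
         + math.log(sd_pass / sd_fail))
    roots = [(-b + s * math.sqrt(b * b - 4 * a * c)) / (2 * a) for s in (1, -1)]
    return next(x for x in roots if mean_fail < x < mean_pass)

# Invented rating-scale statistics: beginners vs experienced surgeons
print(round(contrasting_groups_cutoff(30.0, 5.0, 48.0, 6.0), 1))  # 38.5
```

With equal standard deviations the cutoff is simply the midpoint of the two group means; with unequal spread it shifts toward the narrower distribution, as the quadratic solution shows.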

Focused on formative feedback, high-stakes assessment and programme evaluation, Hatala et al. [16] used Kane’s framework to evaluate OSATS and found reasonable evidence in terms of scoring and extrapolation for formative and high-stakes assessment. For programme assessment, there was validity evidence for generalisation and extrapolation but a complete lack of evidence regarding implications and decisions based on OSATS scores. This calls for more research.

The majority of surgical assessment scales are validated in a simulation setting, where the cognitive and communicative mechanisms of action are less complex than in the operating room. An assessment scale validated in a simulated setting can therefore not be transferred directly to live surgery, especially not in terms of summative assessment.

Limitations

This scoping review has some limitations. First, it is limited by only including studies in English. Second, we included only peer-reviewed published literature and did not include grey literature. Also, it is a limitation that the majority of the included studies are small, only conducted once, and without a comparison group. There is a need for larger, randomised studies to evaluate the tools’ validity before they are enrolled in gynaecological curricula or used for summative assessment.

Conclusion

We identified eight tools measuring technical skills during gynaecologic surgery, all of which depend on the user context, with varying validity frameworks. A combination of global and task-specific assessment tools with a focus on operative sub-steps prone to errors may be an approach when assessing surgical competencies in gynaecology.

This scoping review can serve as a guide for surgical educators who wish to evaluate surgical assessment.

Availability of data and materials

Not applicable.

Abbreviations

PRISMA-ScR:

Preferred Reporting Items for Systematic reviews and Meta-Analyses extension for Scoping Reviews

CI:

Confidence interval

ICC:

Intraclass correlation coefficient

OSATS:

Objective Structured Assessment of Technical Skills

VSSI:

Vaginal Surgical Skills Index

GRS:

Global Rating Scale

HASC:

Hopkins Assessment of Surgical Competency

OSALS:

Objective Structured Assessment of Laparoscopic Salpingectomy

RHAS:

Robotic Hysterectomy Assessment Score

CAT-LSH:

Competence Assessment Tool for Laparoscopic Supracervical Hysterectomy

GOALS:

Global Operative Assessment of Laparoscopic Skills

GERT:

Generic Error Rating Tool

References

1. Cook DA, Brydges R, Zendejas B, Hamstra SJ, Hatala R. Technology-enhanced simulation to assess health professionals: a systematic review of validity evidence, research methods, and reporting quality. Acad Med. 2013;88(6):872–83. https://doi.org/10.1097/ACM.0b013e31828ffdcf.

2. Fecso AB, Szasz P, Kerezov G, Grantcharov TP. The effect of technical performance on patient outcomes in surgery. Ann Surg. 2017;265(3):492–501. https://doi.org/10.1097/SLA.0000000000001959.

3. Goldenberg MG, Garbens A, Szasz P, Hauer T, Grantcharov TP. Systematic review to establish absolute standards for technical performance in surgery. Br J Surg. 2017;104(1):13–21. https://doi.org/10.1002/bjs.10313.

4. Wass V, Van Der Vleuten C, Shatzer J, Jones R. Assessment of clinical competence. Lancet. 2001;357(9260):945–9. https://doi.org/10.1016/S0140-6736(00)04221-5.

5. Van Der Vleuten C. The assessment of professional competence: developments, research and practical implications. Adv Health Sci Educ. 1996;1(1):41–67. https://doi.org/10.1007/BF00596229.

6. Vaidya A, Aydin A, Ridgley J, Raison N, Dasgupta P, Ahmed K. Current status of technical skills assessment tools in surgery: a systematic review. J Surg Res. 2020;246:342–78. https://doi.org/10.1016/j.jss.2019.09.006.

7. Ghaderi I, Manji F, Soo Park Y, Juul D, Ott M, Harris I, et al. Technical skills assessment toolbox: a review using the unitary framework of validity. Ann Surg. 2015;261(2):251–62. https://doi.org/10.1097/SLA.0000000000000520.

8. Jelovsek JE, Kow N, Diwadkar GB. Tools for the direct observation and assessment of psychomotor skills in medical trainees: a systematic review. Med Educ. 2013;47(7):650–73. https://doi.org/10.1111/medu.12220.

9. Grant MJ, Booth A. A typology of reviews: an analysis of 14 review types and associated methodologies. Health Inf Libr J. 2009;26(2):91–108. https://doi.org/10.1111/j.1471-1842.2009.00848.x.

10. Arksey H, O’Malley L. Scoping studies: towards a methodological framework. Int J Soc Res Methodol. 2005;8(1):19–32. https://doi.org/10.1080/1364557032000119616.

11. Levac D, Colquhoun H, O’Brien KK. Scoping studies: advancing the methodology. Implement Sci. 2010;5(69):1–9.

12. Tricco AC, Lillie E, Zarin W, O’Brien KK, Colquhoun H, Levac D, et al. PRISMA extension for scoping reviews (PRISMA-ScR): checklist and explanation. Ann Intern Med. 2018;169(7):467–73. https://doi.org/10.7326/M18-0850.

13. Covidence systematic review software. Melbourne, Australia: Veritas Health Innovation. Available from: www.covidence.org.

14. Peters MDJ, Godfrey C, McInerney P, Baldini Soares C, Khalil H, Parker D. Chapter 11: Scoping reviews. In: Aromataris E, Munn Z, editors. Joanna Briggs Institute Reviewer’s Manual. The Joanna Briggs Institute; 2017.

15. Cook DA, Brydges R, Ginsburg S, Hatala R. A contemporary approach to validity arguments: a practical guide to Kane’s framework. Med Educ. 2015;49:560–75.

16. Hatala R, Cook DA, Brydges R, Hawkins R. Constructing a validity argument for the objective structured assessment of technical skills (OSATS): a systematic review of validity evidence. Adv Health Sci Educ. 2015;20(5):1149–75. https://doi.org/10.1007/s10459-015-9593-1.

17. Hiemstra E, Kolkman W, Wolterbeek R, Trimbos B, Jansen FW. Value of an objective assessment tool in the operating room. Can J Surg. 2011;54(2):116–22. https://doi.org/10.1503/cjs.032909.

18. Chen CCG, Korn A, Klingele C, Barber MD, Paraiso MFR, Walters MD, et al. Objective assessment of vaginal surgical skills. Am J Obstet Gynecol. 2010;203:79.e1–8.

19. Chou B, Bowen CW, Handa VL. Evaluating the competency of gynecology residents in the operating room: validation of a new assessment tool. Am J Obstet Gynecol. 2008;199:571.e1–5.

20. Larsen CR, Grantcharov T, Schouenborg L, Ottosen C, Soerensen JL, Ottesen B. Objective assessment of surgical competence in gynaecological laparoscopy: development and validation of a procedure-specific rating scale. BJOG. 2008;115(7):908–16. https://doi.org/10.1111/j.1471-0528.2008.01732.x.

21. Frederick PJ, Szender JB, Hussein AA, Kesterson JP, Shelton JA, Anderson TL, et al. Surgical competency for robot-assisted hysterectomy: development and validation of a robotic hysterectomy assessment score (RHAS). J Minim Invasive Gynecol. 2017;24:55–61.

22. Goderstad JM, Sandvik L, Fosse E, Lieng M. Assessment of surgical competence: development and validation of rating scales used for laparoscopic supracervical hysterectomy. J Surg Educ. 2016;73(4):600–8. https://doi.org/10.1016/j.jsurg.2016.01.001.

23. Savran MM, Hoffmann E, Konge L, Ottosen C, Larsen CR. Objective assessment of total laparoscopic hysterectomy: development and validation of a feasible rating scale for formative and summative feedback. Eur J Obstet Gynecol Reprod Biol. 2019;237:74–8. https://doi.org/10.1016/j.ejogrb.2019.04.011.

24. Husslein H, Shirreff L, Shore EM, Lefebvre GG, Grantcharov TP. The generic error rating tool: a novel approach to assessment of performance and surgical education in gynecologic laparoscopy. J Surg Educ. 2015;72(6):1259–65. https://doi.org/10.1016/j.jsurg.2015.04.029.

25. Martin JA, Regehr G, Reznick R, Macrae H, Murnaghan J, Hutchison C, et al. Objective structured assessment of technical skill (OSATS) for surgical residents. Br J Surg. 1997;84(2):273–8. https://doi.org/10.1046/j.1365-2168.1997.02502.x.

26. Vassiliou MC, Feldman LS, Andrew CG, Bergman S, Leffondré K, Stanbridge D, et al. A global assessment tool for evaluation of intraoperative laparoscopic skills. Am J Surg. 2005;190(1):107–13. https://doi.org/10.1016/j.amjsurg.2005.04.004.

27. Szasz P, Louridas M, Harris KA, Aggarwal R, Grantcharov TP. Assessing technical competence in surgical trainees: a systematic review. Ann Surg. 2015;261(6):1046–55. https://doi.org/10.1097/SLA.0000000000000866.

28. Munz Y, Moorthy K, Bann S, Shah J, Ivanova S, Darzi SA. Ceiling effect in technical skills of surgical residents. Am J Surg. 2004;188(3):294–300. https://doi.org/10.1016/j.amjsurg.2004.02.006.

29. Jelovsek JE, Walters MD, Korn A, Klingele C, Zite N, Ridgeway B, et al. Establishing cutoff scores on assessments of surgical skills to determine surgical competence. Am J Obstet Gynecol. 2010;203(1):81.e1–6.

  30. 30.

    Driessen SRC, Sandberg EM, Chapelle CF, Twijnstra ARH, Rhemrev JPT, Jansen FW. Case-mix variables and predictors for outcomes of laparoscopic hysterectomy : a systematic review. J Minim Invasive Gynecol. 2016;23(3):317–30. https://doi.org/10.1016/j.jmig.2015.11.008.

    Article  Google Scholar 

  31. 31.

    Strandbygaard J, Bjerrum F, Maagaard M, Rifbjerg Larsen C, Ottesen B, Sorensen JL. A structured four-step curriculum in basic laparoscopy: development and validation. Acta Obstet Gynecol Scand. 2014;93(4):359–66. https://doi.org/10.1111/aogs.12330.

    Article  Google Scholar 

  32. 32.

    Langerman A, Grantcharov TP. Are we ready for our close-up? Ann Surg. 2017;266(6):934–6. https://doi.org/10.1097/SLA.0000000000002232.

    Article  Google Scholar 

  33. 33.

    Oestergaard J, Larsen CR, Maagaard M, Grantcharov T, Ottesen B, Sorensen JL. Can both residents and chief physicians assess surgical skills? Surg Endosc. 2012;26(7):2054–60. https://doi.org/10.1007/s00464-012-2155-1.

    Article  Google Scholar 

  34. 34.

    Strandbygaard J, Scheele F, Sørensen JL. Twelve tips for assessing surgical performance and use of technical assessment scales. Med Teach. 2017;39(1):32–7. https://doi.org/10.1080/0142159X.2016.1231911.

    Article  Google Scholar 

  35. 35.

    Humphrey-Murto S, Varpio L, Gonsalves C, Wood TJ. Using consensus group methods such as Delphi and nominal group in medical education research*. Med Teach. 2017 Jan 2;39(1):14–9. https://doi.org/10.1080/0142159X.2017.1245856.

    Article  Google Scholar 

  36. 36.

    Goff BA, Nielsen PE, Lentz GM, Chow GE, Chalmers RW, Fenner D, et al. Surgical skills assessment: a blinded examination of obstetrics and gynecology residents. Am J Obstet Gynecol. 2002;186(4):613–7. https://doi.org/10.1067/mob.2002.122145.

    Article  Google Scholar 

  37. 37.

    Knight S, Aggarwal R, Agostini A, Loundou A, Berdah S, Crochet P. Development of an objective assessment tool for total laparoscopic hysterectomy: a Delphi method among experts and evaluation on a virtual reality simulator. PLoS One. 2018;13(1):1–14.

    Article  Google Scholar 

  38. 38.

    Ahmed K, Miskovic D, Darzi A, Athanasiou T, Hanna GB. Observational tools for assessment of procedural skills: a systematic review. Am J Surg. 2011;202(4):469–80. https://doi.org/10.1016/j.amjsurg.2010.10.020.

    Article  Google Scholar 

  39. 39.

    Ma IWY, Zalunardo N, Pachev G, Beran T, Brown M, Hatala R, et al. Comparing the use of global rating scale with checklists for the assessment of central venous catheterization skills using simulation. Adv Health Sci Educ. 2012;17(4):457–70. https://doi.org/10.1007/s10459-011-9322-3.

    Article  Google Scholar 

Acknowledgements

Pernille Pless (PP), information specialist, who guided the search strategy.

Funding

None.

Author information

Contributions

Louise Inkeri Hennings (LIH), Jette Led Sørensen (JLS), Jane Hybscmann-Nielsen (JH) and Jeanett Strandbygaard (JS) contributed to this work. LIH conceived the study. LIH, JH and JS designed the scoping review with input from JLS. LIH and JH performed the searches, screening and data extraction. LIH, JH and JS analysed the data. LIH wrote the first draft, and all authors contributed to finalising the manuscript. The authors read and approved the final manuscript.

Corresponding author

Correspondence to Louise Inkeri Hennings.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

None.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix 1

Search string: (((((“Gynecologic Surgical Procedures”[Mesh]) OR “Gynecology”[Majr])) AND ((“Surgical Procedures, Operative”[Mesh]) OR “General Surgery”[Majr])) AND (((tool) OR instrument) OR scale)) AND assessment.

We searched four databases (PubMed, Medline, Embase and Cochrane) using the free-text terms gynaecology OR general surgery AND tool OR instrument OR scale AND assessment. We combined these with the Medical Subject Headings (MeSH) terms “Gynecologic Surgical Procedures” and “Surgical Procedures, Operative”. The Cochrane database was searched for reviews on the subject; none were found. No conference abstracts were found.
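For readers who wish to reproduce or update the PubMed part of the search, the query in this appendix can also be submitted programmatically. The sketch below is an illustration only, not part of the study’s method: it URL-encodes the appendix’s search string for the NCBI E-utilities `esearch` endpoint (`db`, `term`, `retmax` and `retmode` are standard E-utilities parameters; the helper name `build_search_url` is our own).

```python
# Sketch (assumption-laden): building an NCBI E-utilities "esearch" request
# for the appendix's PubMed search string. No network call is made here;
# fetch the returned URL with e.g. urllib.request.urlopen() to run the search.
from urllib.parse import urlencode

# Search string copied verbatim from Appendix 1.
QUERY = (
    '((((("Gynecologic Surgical Procedures"[Mesh]) OR "Gynecology"[Majr])) '
    'AND (("Surgical Procedures, Operative"[Mesh]) OR "General Surgery"[Majr])) '
    'AND (((tool) OR instrument) OR scale)) AND assessment'
)

BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def build_search_url(term: str, retmax: int = 600) -> str:
    """Percent-encode the search term into an esearch request URL."""
    params = urlencode(
        {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    )
    return f"{BASE}?{params}"

if __name__ == "__main__":
    print(build_search_url(QUERY))
```

Requesting `retmode=json` makes the hit count and PMID list easy to parse; `retmax` should be set at least as high as the expected number of records (544 in this review).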

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Hennings, L.I., Sørensen, J.L., Hybscmann, J. et al. Tools for measuring technical skills during gynaecologic surgery: a scoping review. BMC Med Educ 21, 402 (2021). https://doi.org/10.1186/s12909-021-02790-w

Keywords

  • Assessment
  • Assessment tool
  • Gynaecology
  • Surgery