Table 4 Studies evaluated by Kane’s validity argument

From: Tools for measuring technical skills during gynaecologic surgery: a scoping review

| Assessment tool | Scoring | Generalisation | Extrapolation |
| --- | --- | --- | --- |
| Objective Structured Assessment of Technical Skills (OSATS) [17] | Comparison of OSATS scores over time | Not reported | Construct validity was demonstrated as a significant rise in score with increasing caseload of 1.10 OSATS points per assessed procedure (p = 0.008, 95% CI 0.44–1.77) |
| Vaginal Surgical Skills Index (VSSI) [18] | Comparing GRS and VSSI; a visual analogue scale was added for overall performance | Internal consistency for the VSSI and GRS (Cronbach's alpha 0.95–0.97); interrater reliability = 0.53; intrarater reliability = 0.82 | Construct validity was evaluated as convergent validity using the Pearson correlation coefficient (VSSI r = 0.64, p = 0.01, 95% CI 0.53–0.73; GRS r = 0.51, p = 0.001, 95% CI 0.40–0.61), and VSSI scores discriminated between training levels |
| Hopkins Assessment of Surgical Competency (HASC) [19] | Surgeons rated on general and case-specific surgical skills; no comparison | Internal consistency of the items (Cronbach's alpha) = 0.80 (p < 0.001) | Discriminative validity for inexperienced vs intermediate surgeons (p < 0.001) |
| Objective Structured Assessment of Laparoscopic Salpingectomy (OSA-LS) [20] | Surgeons rated by OSA-LS; no comparison | Interrater reliability = 0.831; intrarater reliability not reported | Discriminative validity for inexperienced vs intermediate vs experienced surgeons (p < 0.03) |
| Robotic Hysterectomy Assessment Score (RHAS) [21] | Surgeons rated by expert viewers using RHAS; no comparison | Interrater reliability for total domain score = 0.600 (p < 0.001); intrarater reliability not reported | Discriminative validity for experts, advanced beginners and novices in all domains except vaginal cuff closure (p = 0.006) |
| Competence Assessment for Laparoscopic Supracervical Hysterectomy (CAT-LSH) [22] | Comparing GOALS and CAT-LSH | Interrater reliability = 0.75; intrarater reliability not reported | Discriminative validity for inexperienced vs intermediate (p < 0.001) and intermediate vs experts (p < 0.001) when assessed by the assistant surgeon; for blinded reviewers, inexperienced vs intermediate (p < 0.006) and intermediate vs experts (p < 0.011) |
| Feasible rating scale for formative and summative feedback [23] | Surgeons rated by expert viewers using a 12-item procedure-specific checklist | Interrater reliability = 0.996 for one rater and 0.998 for two raters; intrarater reliability not reported | Discriminative validity for beginners vs experienced surgeons (p < 0.001) |
| Generic Error Rating Tool (GERT) [24] | Comparing OSATS and GERT | Interrater reliability > 0.95; intrarater reliability > 0.95 | Significant negative correlation between OSATS and GERT scores (rater 1: Spearman ρ = −0.76, p < 0.001; rater 2: ρ = −0.88, p < 0.001) |
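Several of the generalisation results above are reported as Cronbach's alpha, the internal-consistency statistic for a multi-item rating scale. As a minimal sketch of what that figure measures, the following computes alpha from per-item score lists; the ratings are fabricated toy data for illustration only, not values from the reviewed studies:

```python
from statistics import pvariance

def cronbach_alpha(item_scores):
    """Cronbach's alpha for k checklist items scored on the same set of cases.

    item_scores: list of k lists, each holding one item's score per case.
    alpha = k/(k-1) * (1 - sum(item variances) / variance of total scores)
    """
    k = len(item_scores)
    totals = [sum(case) for case in zip(*item_scores)]  # total score per case
    item_var = sum(pvariance(item) for item in item_scores)
    return k / (k - 1) * (1 - item_var / pvariance(totals))

# Five cases scored on three checklist items (toy data)
items = [
    [4, 5, 3, 4, 2],
    [4, 4, 3, 5, 2],
    [5, 5, 4, 4, 3],
]
print(round(cronbach_alpha(items), 2))  # → 0.9
```

Values near 1 (such as the 0.95–0.97 reported for the VSSI and GRS) indicate that the items rise and fall together across cases, i.e. the checklist behaves as one coherent scale.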