A new tool for assessing short debriefings after immersive simulation: validity of the SHORT scale

Background: Simulation is increasingly used worldwide in healthcare education. However, it is costly in terms of both finances and human resources. As a consequence, several institutions have designed programs offering several short immersive simulation sessions, each followed by a short debriefing. Although debriefing is recommended, no tool exists to assess the appropriateness of short debriefings after such simulation sessions. We have developed the Simulation in Healthcare retrOaction Rating Tool (SHORT) to assess short debriefings, and provide some validity evidence for its use.
Methods: We designed this scale based on our experience and previously published instruments, and tested it by assessing short debriefings of simulation sessions offered to emergency medicine residents at Laval University (Canada) from 2015 to 2016. Its reliability and validity were analyzed following the Standards for Educational and Psychological Testing. Generalizability theory was used to test internal structure evidence for validity.
Results: Two raters independently assessed 22 filmed short debriefings. Mean debriefing length was 10:35 (min 7:21; max 14:32). The calculated generalizability (reliability) coefficients are φ = 0.80 and φ-λ3 = 0.82. The generalizability coefficient for a single rater assessing three debriefings is φ = 0.84.
Conclusions: The G study shows a high generalizability coefficient (φ ≥ 0.80), which indicates high reliability. The response process evidence for validity suggests that no errors were associated with using the instrument. Further studies should be done to demonstrate the validity of the English version of the instrument and to validate its use by novice raters trained in the use of the SHORT.
Electronic supplementary material: The online version of this article (10.1186/s12909-019-1503-4) contains supplementary material, which is available to authorized users.

| SCORE (see holistic rating below) | 1/5 | 2/5 | 3/5 | 4/5 | 5/5 |
|---|---|---|---|---|---|
| DESCRIPTION | Harmful | Neutral | Must improve | Could improve | Expert |

Item 1: DEBRIEFING ENVIRONMENT (fosters a safe and effective debriefing environment with a positive tone)

| 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|
| Shows no respect for the learners and their emotional security | Shows little respect for the learners OR little concern for their emotional safety | Shows respect for the learners AND concern for their emotional safety | Shows respect for the learners AND concern for their emotional safety | Shows respect for the learners AND concern for their emotional safety |
| Does NOT help in reducing the learners' tension and stress | Helps slightly in reducing the learners' tension and stress | Helps slightly in reducing the learners' tension and stress | Helps moderately in reducing the learners' tension and stress | Helps effectively in reducing the learners' tension and stress |

COMMENTS:

RATER'S GUIDE TO USE THE SHORT SCALE
The SHORT instrument includes a global rating scale composed of 5 items and a holistic expert evaluation.
Each of the 5 items is scored on a scale from 1 (harmful) to 5 (expert), using specific cues provided to guide scoring. The cues for each item are placed on separate lines based on their topic or theme. For each item, rate each line individually, then rate the item. If the lines of a single item give different scores, the highest score should be adopted for the item. However, for some items, specific lines should weigh more heavily in the final item score; this weighting is described in the rating guide.
The holistic evaluation is rated on a scale from 1 to 5 as follows. Please note that the holistic evaluation should not be based on the mean score of the five preceding items.
- "1 = harmful", i.e. undoes learning, or harms the credibility of the training or of the simulation modality;
- "2 = neutral", i.e. learners gain no benefit from the debriefing, or the debriefer does not make the simulation modality relevant;
- "3 = must improve", i.e. the debriefer encourages learning, but does not allow the simulation modality to be used at its optimal capacity;
- "4 = could improve", i.e. the debriefer encourages learning significantly and allows the simulation modality to be used at its optimal capacity;
- "5 = expert", i.e. the debriefer encourages learning significantly and could be cited as an example, or the debriefer could train other debriefers.

EXAMPLE:
The cue lines help the rater score the debriefer as accurately as possible. Each cue line should be rated independently; the item should then be scored from its cue lines, which are separated on the form by full lines. If the cue line scores for an item differ widely, their mean should be used as the item score, except for items whose cue lines are weighted differently, as specified in the individual item descriptions below.
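One possible reading of the two scoring rules in this guide (adopt the highest cue line score by default, use the mean when the discrepancy is large) can be sketched in Python as follows. The guide does not define what counts as a "large" discrepancy or whether the mean should be rounded, so the 2-point threshold and the unrounded mean below are assumptions made only for illustration, and `item_score` is a hypothetical helper name.

```python
from statistics import mean

# Assumption: the guide does not quantify a "large" discrepancy;
# a spread of 2 points or more is used here as a placeholder.
LARGE_DISCREPANCY = 2


def item_score(cue_line_scores, large_discrepancy=LARGE_DISCREPANCY):
    """Combine the cue line scores (each 1 to 5) of one unweighted item."""
    if not cue_line_scores:
        raise ValueError("at least one cue line score is required")
    if any(s not in (1, 2, 3, 4, 5) for s in cue_line_scores):
        raise ValueError("cue line scores must be integers from 1 to 5")

    spread = max(cue_line_scores) - min(cue_line_scores)
    if spread >= large_discrepancy:
        # Large discrepancy: use the mean of the cue line scores (not rounded).
        return mean(cue_line_scores)
    # Otherwise: adopt the highest cue line score.
    return max(cue_line_scores)
```

Items whose cue lines are weighted differently are outside the scope of this sketch and would need their own rule.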
A score of 3 or less on any item or on the holistic evaluation should prompt consideration of providing additional training on relevant aspects of the debriefer's performance.
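The follow-up rule above (a score of 3 or less on any item or on the holistic evaluation) could be checked with a short Python sketch such as the one below. The function name and the item labels used as dictionary keys are hypothetical; only the threshold of 3 comes from the guide.

```python
# From the guide: a score of 3 or less should prompt consideration of training.
TRAINING_THRESHOLD = 3


def aspects_to_review(item_scores, holistic_score, threshold=TRAINING_THRESHOLD):
    """Return the names of the aspects scored at or below the threshold.

    item_scores: dict mapping an item label (hypothetical, e.g. "item 1")
                 to its score from 1 to 5.
    holistic_score: the holistic evaluation, from 1 to 5.
    """
    flagged = [name for name, score in item_scores.items() if score <= threshold]
    if holistic_score <= threshold:
        flagged.append("holistic evaluation")
    return flagged
```

For example, `aspects_to_review({"item 1": 4, "item 2": 3}, 5)` returns `["item 2"]`, meaning that only the second item would prompt consideration of additional training.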

Weighting
If the ratings of the three lines contradict one another, priority should be given first to the presence of a summary, then to the presence of an efficient emotional phase, and then to targeted learning outcomes. The presence of an effective summary that allows learning transfer to occur is a hallmark of debriefing expertise.
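The guide gives a priority order for these three lines but no numeric weights. As an illustration only, the Python sketch below turns that order into a weighted mean using assumed weights of 3, 2 and 1; the weights, the key names and the function name are all hypothetical and are not part of the SHORT instrument.

```python
# Assumed weights reflecting the stated priority order:
# summary first, then emotional phase, then targeted learning outcomes.
PRIORITY_WEIGHTS = {"summary": 3, "emotional_phase": 2, "learning_outcomes": 1}


def weighted_item_score(line_scores, weights=PRIORITY_WEIGHTS):
    """Combine the three cue line scores (each 1 to 5) of the weighted item."""
    if set(line_scores) != set(weights):
        raise ValueError("expected exactly these lines: " + ", ".join(sorted(weights)))

    distinct = set(line_scores.values())
    if len(distinct) == 1:
        # No contradiction between the lines: the common score is the item score.
        return distinct.pop()

    # Contradiction: higher-priority lines count more in the final score.
    total_weight = sum(weights.values())
    return sum(weights[k] * line_scores[k] for k in weights) / total_weight
```

With these assumed weights, a debriefing rated 5 for the summary, 4 for the emotional phase and 1 for learning outcomes scores 4.0, whereas the reverse pattern (1, 4, 5) scores about 2.67, which reflects the priority given to the summary.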