Study design and participants
A multicentre parallel feasibility RCT was conducted across three UK medical schools: A, B and C. We followed the CONSORT statement for reporting pilot or feasibility trials [25]. Eligible participants were final-year undergraduate medical students. The curricula of the medical schools varied: Schools A and B implemented a traditional integrated/systems-based curriculum, while School C followed a problem-based learning (PBL) curriculum. Ethical approval was obtained from each participating medical school. Participants were recruited from March 2017 to February 2018 in two cohorts. Cohort one was recruited after final examinations, in April–July 2017, through advertisements in faculty newsletters and lecture ‘shout outs’. Cohort two was recruited before final examinations, in October 2017–February 2018; School C students were recruited only in cohort two. Cohort two was invited to participate through faculty online learning management platforms (e.g. Moodle), advertisements on social media, faculty newsletters and lecture ‘shout outs’. As this was a feasibility trial, a formal sample size calculation was not required.
Outcomes
Feasibility and acceptability
Feasibility was measured by assessing student uptake by school and cohort. Acceptability was measured by retention rates and a survey adapted from previous studies, comprising six statements on students’ perceptions of eCREST [26, 27].
Clinical reasoning outcome measures
Clinical reasoning was measured using the Flexibility in Thinking (FIT) sub-scale of the Diagnostic Thinking Inventory (DTI), a self-reported measure [28]. The FIT sub-scale (21 items) measures the thought processes used in diagnosis, including the ability to generate new ideas, understand alternative outcomes and self-reflect. Higher FIT scores indicate better clinical reasoning skills. The sub-scale has demonstrated validity in detecting differences between student and professional reasoning, and its internal consistency and test–retest reliability are acceptable [28, 29].
Clinical reasoning was also assessed with an observed measure, using data from an additional eCREST patient case that students received one month after baseline. This measure comprised indicators of three cognitive biases that eCREST sought to influence, identified by previous clinical reasoning research: the unpacking principle, confirmation bias and anchoring [24, 30, 31]. The unpacking principle refers to the tendency not to elicit the information necessary to make an informed judgement. Confirmation bias occurs when a clinician seeks only information that confirms their hypothesis. Anchoring occurs when clinicians adhere to an initial hypothesis despite contradictory information [32]. eCREST prompts students to reflect throughout a consultation and provides feedback that enables them to reflect on their performance afterwards [33]. By reflecting, students would be more likely to attend to evidence inconsistent with their hypotheses and to consider alternatives, reducing the chance of confirmation bias and anchoring. Reflection also encourages students to explore their hypotheses thoroughly and elicit relevant information from patients, reducing the effect of the unpacking principle [33, 34].
The observed measure assessed ‘essential information identified’ as the proportion of essential questions and examinations asked, out of all essential questions and examinations identified by experts. This aimed to detect the influence of the unpacking principle on reasoning, as it captured whether students elicited enough essential information to make an appropriate decision. ‘Relevance of history taking’ was measured as the proportion of relevant questions and examinations asked, out of all questions and examinations asked by the student. This aimed to detect susceptibility to confirmation bias by capturing whether students sought relevant information. Finally, ‘flexibility in diagnoses’ was measured by counting the number of times students changed their diagnosis. This reflected how susceptible students were to anchoring, by measuring their willingness to change their initial differential diagnosis. All measures were developed by RP and three clinicians (PS, SG & JT). The content validity of the observed measure of clinical reasoning was tested with two clinicians (SM, JH).
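For illustration, the sketch below shows how these three indicators could be scored against an expert-defined answer key. It is a minimal sketch, not the authors’ code: the item names, data structure and scoring function are hypothetical and assume one set of responses per student to the additional case.

```python
# Illustrative scoring of one student's responses to the additional patient case.
# Item names and the CaseResponse structure are assumptions for demonstration only.
from dataclasses import dataclass

@dataclass
class CaseResponse:
    items_asked: set[str]     # questions/examinations the student selected
    diagnosis_changes: int    # times the student revised their differential diagnosis

# Hypothetical expert-defined item sets.
EXPERT_ESSENTIAL = {"smoking_history", "weight_loss", "chest_examination"}
EXPERT_RELEVANT = EXPERT_ESSENTIAL | {"occupational_history", "family_history"}

def score_case(resp: CaseResponse) -> dict[str, float]:
    # 'Essential information identified': essential items asked / all essential items
    # (sensitivity to the unpacking principle).
    essential_identified = len(resp.items_asked & EXPERT_ESSENTIAL) / len(EXPERT_ESSENTIAL)
    # 'Relevance of history taking': relevant items asked / all items asked
    # (sensitivity to confirmation bias).
    relevance = len(resp.items_asked & EXPERT_RELEVANT) / max(len(resp.items_asked), 1)
    return {
        "essential_information_identified": essential_identified,
        "relevance_of_history_taking": relevance,
        "flexibility_in_diagnoses": resp.diagnosis_changes,  # count of changes (anchoring)
    }

# Example usage with placeholder responses.
resp = CaseResponse(items_asked={"smoking_history", "chest_examination"}, diagnosis_changes=2)
scores = score_case(resp)
```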
Diagnostic choice
Diagnostic choice was captured in the additional patient case. Whether students selected the most important diagnosis, i.e. the one they should not have missed, was used to assess how well the observed measure of reasoning predicted diagnostic choice.
Knowledge
Relevant medical knowledge was measured by 12 single best answer multiple choice questions (MCQs). We hypothesised that greater knowledge is associated with better clinical reasoning skills, consistent with the literature [4, 35]. The MCQs were developed by clinicians (NK, SM, JH & PS) in consultation with other clinicians.
Procedure
The trial procedure is outlined in Fig. 1, which shows how and when data were collected from participants. To address ethical concerns, the information sheet made clear to students that participation in the trial was voluntary, that they could withdraw at any stage, that participation would not affect their summative assessments and that only anonymised aggregate data would be shared. Students who provided written consent online were allocated to the intervention or control group using simple randomisation. Allocation was completed by a computer algorithm and researchers were blind to it. Randomisation was not precisely 1:1, as five students were mistakenly allocated automatically to the intervention group. The intervention group received three video patient cases in eCREST, all presenting with respiratory or related symptoms to their primary care physician [23]. The control group received no additional intervention and continued with teaching as usual. To address concerns that students in the control group might be disadvantaged by not having access to eCREST, the control group was given access to eCREST at the end of the trial.
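A minimal sketch of simple (unrestricted) randomisation of this kind is given below. The actual allocation algorithm used in the trial is not reported here, so the function and its parameters are assumptions; under simple randomisation, group sizes are not forced to be exactly 1:1.

```python
# Sketch of simple randomisation: each participant is assigned independently with
# probability 0.5, so group sizes may differ. Hypothetical, not the trial's algorithm.
import random

def allocate(participant_ids, seed=2017):
    rng = random.Random(seed)
    return {pid: rng.choice(["intervention", "control"]) for pid in participant_ids}

allocation = allocate(["S001", "S002", "S003", "S004"])  # placeholder IDs
```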
Data analysis
Feasibility and acceptability
Uptake was calculated as the percentage of students who registered out of the total number of eligible students. Retention was calculated as the percentage of students who completed T1 and T2 follow-up assessments out of all registered. Acceptability was measured by calculating the percentage of students who agreed with each statement on the acceptability questionnaire. Uptake, retention and acceptability were compared between schools and cohorts using chi-squared tests.
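The sketch below illustrates these feasibility calculations: uptake and retention as percentages, and a chi-squared comparison between two schools. All counts are placeholders rather than trial data, and the two-school contingency table is an assumption for demonstration.

```python
# Illustrative feasibility calculations; counts are placeholders, not trial data.
from scipy.stats import chi2_contingency

def percentage(numerator: int, denominator: int) -> float:
    return 100 * numerator / denominator

uptake_school_a = percentage(120, 350)     # registered / eligible (placeholder counts)
retention_school_a = percentage(80, 120)   # completed T1 and T2 / registered (placeholder counts)

# Chi-squared test comparing retention between two schools (completed vs not completed).
table = [[80, 40],   # School A: completed, not completed (placeholder counts)
         [50, 60]]   # School B: completed, not completed (placeholder counts)
chi2, p, dof, expected = chi2_contingency(table)
```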
Clinical reasoning outcomes
Validity and reliability
Internal consistency of the self-reported clinical reasoning measure was assessed using Cronbach’s alpha. Construct validity of the self-reported and observed clinical reasoning measures was assessed by correlating the reasoning and knowledge outcomes, using Spearman’s rank correlation coefficient. To estimate predictive validity, the self-reported and observed measures of clinical reasoning were correlated with diagnostic choice. The analyses were undertaken for the aggregated dataset and then separately for the intervention and control groups.
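A minimal sketch of these reliability and validity checks is shown below: Cronbach’s alpha for the 21-item FIT sub-scale and Spearman’s rank correlation between reasoning and knowledge scores. It is not the authors’ code; the array and variable names are assumptions.

```python
# Sketch of the reliability and construct-validity checks described above.
import numpy as np
from scipy.stats import spearmanr

def cronbach_alpha(items: np.ndarray) -> float:
    # items: rows = students, columns = scale items (here, the 21 FIT items).
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Placeholder data: 50 students x 21 items scored 1-6 (not trial data).
rng = np.random.default_rng(0)
fit_items = rng.integers(1, 7, size=(50, 21))
alpha = cronbach_alpha(fit_items)

# Construct validity: correlate FIT total scores with knowledge (MCQ) scores.
fit_totals = fit_items.sum(axis=1)
mcq_scores = rng.integers(0, 13, size=50)  # placeholder MCQ scores out of 12
rho, p = spearmanr(fit_totals, mcq_scores)
```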
Effect sizes
Independent t-tests were used to compare mean self-reported clinical reasoning scores between the intervention and control groups at T1 and T2. A mixed factorial ANOVA was used to assess change in self-reported clinical reasoning over time, differences between groups and their interaction. Logistic regression analyses were conducted to assess ‘essential information identified’ and ‘relevance of history taking’. As these outcomes were proportional data, they were transformed by calculating their log odds [36, 37]. Group allocation was the only predictor variable in each model, as knowledge did not differ significantly between the groups at baseline. A multinomial logistic regression analysis was carried out to assess ‘flexibility in diagnoses’. A complete case analysis was undertaken, such that students with missing data were excluded from analysis. Analyses were conducted using Stata Version 15, with p ≤ 0.05 considered statistically significant [38].
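To make the effect-size analyses concrete, the sketch below shows an independent t-test on FIT scores and a regression of a log-odds-transformed proportional outcome on group allocation. It is written in Python rather than the Stata 15 used in the trial, uses placeholder data, and does not reproduce the authors’ exact model specifications; column names are assumptions.

```python
# Illustrative effect-size analyses on placeholder data (not trial data).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "group": rng.choice(["intervention", "control"], size=60),
    "fit_t1": rng.normal(70, 8, size=60),              # placeholder FIT scores at T1
    "essential_prop": rng.uniform(0.2, 0.95, size=60),  # placeholder proportions
})

# Independent t-test comparing self-reported (FIT) scores between groups at T1.
t, p = ttest_ind(df.loc[df.group == "intervention", "fit_t1"],
                 df.loc[df.group == "control", "fit_t1"])

def log_odds(prop: pd.Series, eps: float = 1e-6) -> pd.Series:
    # Transform a proportion to its log odds, nudging 0/1 values away from the boundary.
    prop = prop.clip(eps, 1 - eps)
    return np.log(prop / (1 - prop))

# Regress the log-odds-transformed outcome on group allocation (the sole predictor).
df["essential_logodds"] = log_odds(df["essential_prop"])
model = smf.ols("essential_logodds ~ group", data=df).fit()
print(model.summary())
```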