Participants and study design
Study participants were recruited from October 2016 to August 2018, a period spanning three semesters. The FC group was composed of fifth-year students from the Department of Medicine of Yang Ming Chiao Tung University, who received clinical clerkship training in an elective evidence-based medicine (EBM) skills course. (Since 2013, Taiwan’s medical education has followed a six-year system, with clinical training carried out in the fifth and sixth years. The students participating in this study were enrolled in their basic clinical clerkships.) Before each class, students reviewed course video clips and read the case discussion materials. During class, the teacher reinforced the students’ basic EBM skills through clinical case discussions, covering how to address clinical problems, how to search, evaluate, and apply relevant literature, and how to use these abilities during clinical care. The LB group consisted of medical students who completed a clerkship in general internal medicine at Taipei Veterans General Hospital during the same study period. Their general internal medicine clinical training included a weekly one-hour EBM teaching session in the morning. Students were also required to attend EBM lectures run by the hospital and to learn EBM concepts and skills from other conventional in-class or online teaching programs. Overall, the total number of classroom teaching hours was the same for the FC and LB groups, ranging from 30 to 36 h. During the study period, a total of 113 students enrolled in the two courses; after propensity score matching, the FC group and the LB group each comprised 45 students, for a total of 90 students included in the study and analysis (Fig. 1). This study was approved by the institutional review committee at Taipei Veterans General Hospital (VGHTPE) (IRB No: 2014–11-008B, 2017–01-020 AC).
The researchers obtained written informed consent from the students at the beginning of the course. All other research procedures were carried out in accordance with the plan attached to the IRB application and the human research ethics code of the review agency.
Self-report questionnaire
Because personality traits may affect students’ learning attitudes and classroom response patterns, the researchers administered the 40-item Big Five test [20,21,22] to the students before the course. This scale is based on the International Big Five Mini-Marker by Thompson et al., which was derived from multinational samples and translated into Chinese by Taiwanese scholars [20, 23]. The scale measures five personality traits (extroversion, openness, neuroticism, conscientiousness, and agreeableness), with each item rated on a 9-point Likert scale from “strongly disagree” to “strongly agree.” This instrument has been used in many studies in the field of education and has repeatedly demonstrated good reliability and validity [20,21,22,23,24].
In addition, students’ personal and learning background information, including age, sex, admission route (interview, recommendation, or national examination), student loans, part-time job, and past academic performance (average grades from the first to fourth year at university), was collected using the pre-course questionnaire.
Objective assessments of EBM learning performance
To establish the baseline EBM knowledge of the FC and LB student groups before they were exposed to the EBM courses, the researchers tested students taking the classes for the first time on the basic concepts of EBM, using 20 multiple-choice questions modified from previously published literature [16]. The details of this scale are published elsewhere [16]. The pre-course test served as a baseline assessment of the students’ EBM skills. Scores were converted to a 0–10 scale and used as a covariate in the comparative analysis.
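The conversion of raw scores to a 0–10 scale can be illustrated with a minimal sketch. It assumes a simple linear rescaling and one point per multiple-choice item; neither detail is stated in the text, and the function name `rescale_to_ten` is hypothetical.

```python
def rescale_to_ten(raw_score, max_score, min_score=0.0):
    """Linearly map a raw score in [min_score, max_score] onto the 0-10 scale.

    Assumption: linear rescaling; the exact conversion rule is not
    described in the text.
    """
    if max_score <= min_score:
        raise ValueError("max_score must be greater than min_score")
    return 10.0 * (raw_score - min_score) / (max_score - min_score)


# Hypothetical example: 15 correct answers on the 20-item pre-course test,
# assuming one point per item.
pretest_scaled = rescale_to_ten(15, 20)
```

Under these assumptions, the same function could be applied to written test scores, oral test domain scores, and past academic performance grades, provided that the minimum and maximum attainable values of each are known.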
The post-test, administered at the end of the course, had two parts: a written test and an oral test. The written test consisted of open-ended questions modified from the “Fresno test,” which is used to assess the effectiveness of a comprehensive EBM curriculum in the University of California, San Francisco’s Fresno family practice residency program [25,26,27]. Based on clinical scenarios set by the researchers, each student responded to the following prompts: (1) identify the most appropriate research design for answering the question, (2) show the process of database searching, (3) identify important issues for determining the relevance and validity of a given research article, and (4) discuss the significance and importance of the research findings. These prompts covered the “Ask,” “Acquire,” and “Appraise” aspects of EBM (scores converted to 0–10). The exams were graded by five experienced raters, who discussed the difficulty level of the exam questions and the consistency of scoring in consensus meetings held throughout the course, to minimize differences in question depth and in subjective ratings.
For the oral test, students were grouped according to their EBM questions, which were framed in the population, intervention, control, and outcomes (PICO) format, were of clinical interest, and were suitable for in-depth discussion. Multiple raters scored each presentation and the responses to inquiries. In addition to the “Ask,” “Acquire,” and “Appraise” aspects covered by the written exam, the oral exam assessed the “Apply” aspect, that is, applying the integrated concepts obtained from the studies. The oral tests were rated by at least five independent raters, each of whom was a qualified EBM teacher with at least 100 h of teaching experience. Oral test scoring was based on the EBM competition checklist from the National Medical Quality Award held by the Joint Commission of Taiwan. This checklist has been validated by EBM experts and was modified by our EBM teachers to accommodate our test format. The scoring domains included “Ask: the quality and quantity of PICO,” “Acquire: the searching strategy,” “Appraise: summarizing the validity and importance of each article,” and “Apply: transforming evidence into practice.” Detailed items and scoring are shown in Supplemental Table S1. Each domain was converted into a score of 0–10 for further analysis.
Statistical analysis
We performed the analyses using IBM SPSS software version 20. For the comparative analysis, all written test scores, oral test scores, and students’ past academic performance grades were normalized to a scale of 0 to 10 points. Independent-samples t-tests (for continuous variables) and chi-square tests (for categorical variables) were used to compare demographic and learning background characteristics between students in the FC and LB groups.
To reduce selection bias and to adjust for possible confounding factors that may affect post-course performance, PPSMA [28, 29] was used to select students in the LB group with similar background attributes for pairing with each student in the FC group. The variables included in the propensity score model were age, sex, university admission route, student loans, part-time job, past academic performance in the preclinical years, personality dimension scores from the Big Five, and pre-course test scores before exposure to the EBM curriculum. These variables, which cover pre-course ability, personal traits, learning resources, and available time, were included in the PPSMA because they might affect students’ academic achievement and their learning outcomes in the EBM courses. The research flowchart is shown in Fig. 1. Propensity scores were estimated using logistic regression with the “allocation group” as the dependent variable. The propensity score, defined as the predicted probability that a particular individual is assigned to the experimental group, was derived for each participant and used to select students for the control group.
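The matching procedure can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the SPSS procedure used in the study: the text does not specify the matching algorithm, so greedy 1:1 nearest-neighbor matching without replacement is assumed here, the logistic regression is fitted by plain gradient descent, and the covariate values and function names are hypothetical.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit a logistic regression by batch gradient descent.

    Returns [intercept, coef_1, ..., coef_p].
    """
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            err = sigmoid(z) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def propensity_scores(X, w):
    # Predicted probability of belonging to the experimental group.
    return [sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            for xi in X]


def match_one_to_one(scores, groups, caliper=None):
    """Greedy 1:1 nearest-neighbor matching without replacement (assumed).

    groups[i] == 1 marks the experimental group, 0 the control group.
    Returns a list of (experimental_index, control_index) pairs.
    """
    treated = [i for i, g in enumerate(groups) if g == 1]
    available = {i for i, g in enumerate(groups) if g == 0}
    pairs = []
    for t in treated:
        if not available:
            break
        # Closest control by propensity score; ties broken by index.
        c = min(available, key=lambda i: (abs(scores[i] - scores[t]), i))
        if caliper is None or abs(scores[c] - scores[t]) <= caliper:
            pairs.append((t, c))
            available.remove(c)
    return pairs


# Hypothetical standardized covariates (e.g., pre-course score, past grades).
covariates = [[0.2, 1.0], [0.5, 0.3], [-0.4, -0.2], [1.1, 0.8],
              [-0.9, -1.0], [0.1, 0.0], [-0.3, 0.6], [0.7, -0.5]]
groups = [1, 1, 0, 1, 0, 0, 0, 0]  # 1 = experimental, 0 = control

weights = fit_logistic(covariates, groups)
scores = propensity_scores(covariates, weights)
pairs = match_one_to_one(scores, groups)
```

Each experimental-group student is paired with the unused control-group student whose propensity score is closest. An optional `caliper` argument (a hypothetical parameter) would discard pairs whose scores differ by more than a chosen tolerance.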
After the two groups of students were paired, the standardized mean difference (SMD) was used to check whether the distribution of variables in the two groups was balanced. The SMD was calculated as the mean of the experimental group minus the mean of the control group, divided by the total standard deviation. An SMD < 0.1 indicated a negligible difference between the two groups. Because the sample was small, the SMD was used as the indicator of balance between the two groups; it is a more rigorous criterion than the p-value, and both the SMD and the p-value are reported side by side.
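As an illustration of the balance check, the following sketch computes the SMD for a single covariate. It assumes that the standard deviation in the denominator is the pooled value, the square root of the average of the two group variances, which is a common convention for balance diagnostics; the text does not give the exact formula, and the data shown are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def standardized_mean_difference(experimental, control):
    """SMD = (mean of experimental - mean of control) / pooled SD.

    Assumption: pooled SD = sqrt((var_experimental + var_control) / 2).
    """
    pooled_sd = sqrt((variance(experimental) + variance(control)) / 2)
    if pooled_sd == 0:
        return 0.0
    return (mean(experimental) - mean(control)) / pooled_sd


def is_balanced(experimental, control, threshold=0.1):
    # A covariate counts as balanced when |SMD| is below the threshold.
    return abs(standardized_mean_difference(experimental, control)) < threshold


# Hypothetical pre-course scores (0-10 scale) for the two matched groups.
fc_scores = [6.0, 7.5, 5.5, 8.0, 6.5]
lb_scores = [6.5, 7.0, 5.5, 8.0, 6.0]
smd = standardized_mean_difference(fc_scores, lb_scores)
```

The check would be repeated for every variable entered into the propensity score model, using the 0.1 threshold stated above.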
Based on the written and oral test scores of the two student groups after completing the class, an analysis of covariance (ANCOVA) was conducted to compare post-test scores between the two groups after adjusting for covariates. The included covariates were the same as those used in the PPSMA. Radar charts were used to visualize the written and oral test performance of the two groups of students.
Finally, multivariable linear regression analyses were performed to examine the differences in written and oral test scores between the two groups of students after controlling for potential confounding variables, and to estimate the proportion of variance explained by the “group” variable in the model. The adjusted R-squared value (adjusted R²) was used to quantify the proportion of variance explained by the covariates in the regression models. Statistical analyses were performed using SPSS v18, and radar charts were generated using Python v3.7. Results were considered statistically significant at P < 0.05.
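For reference, the adjusted R² penalizes the ordinary R² for the number of predictors in the model. The sketch below uses the standard formula; the R² value and the predictor count in the example are hypothetical, and only the sample size of 90 matched students comes from the study.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    dof = n_obs - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / dof


# Hypothetical model: R^2 = 0.50 with 10 predictors, fitted on 90 students.
adj_r2 = adjusted_r_squared(0.50, n_obs=90, n_predictors=10)
```

Because the penalty grows with the number of predictors, the adjusted value is a more conservative summary than R² when many covariates are entered relative to the sample size.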