This study was planned and conducted in adherence to the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) standards for reporting meta-analyses [6]. The PRISMA 2009 checklist was followed in reporting each section, including the introduction, methods, results, and discussion.
Study selection
Studies published between January 1995 and July 2013 were identified through an electronic search of the following databases: EBSCO, MEDLINE, ScienceDirect, ERIC, RISS, and the National Assembly Library of Korea. The search was limited to articles published in English or Korean and used combinations of the keywords nursing, simulation, human patient, and simulator. A total of 2279 potential studies were identified, and their titles and abstracts were reviewed for eligibility.
Relevant studies were screened for inclusion based on the following criteria: 1) the study aimed to evaluate the effectiveness of simulation-based education for nursing students, and 2) an experimental or quasi-experimental design was used. We excluded articles that did not report a control group or that tested the effectiveness of computer-based virtual patients. For abstracts that did not provide sufficient information to determine eligibility, full-length articles were retrieved. Disagreements on the inclusion or exclusion of articles were resolved by consensus. Of the 2279 potentially relevant articles, screening of titles and abstracts yielded 317 relevant studies. After a review of these articles, 96 studies were retained, and three additional articles were identified through a hand search. These 99 full-text articles were reviewed systematically to confirm their eligibility (Fig. 1).
Criteria for considering studies for this review
The methodological quality of the selected studies was assessed using the Case Control Study Checklist developed by the Critical Appraisal Skills Programme (CASP) [7]. The CASP appraisal tool was designed to facilitate systematic thinking about educational studies. It contains 11 questions in three sections: (1) Are the results of the trial valid? (2) What are the results? (3) Will the results help locally? Most items are answered “yes,” “no,” or “can’t tell.” Two reviewers independently assessed the papers using the CASP checklist, and any disagreement between them was resolved through discussion and consensus with a third reviewer. Forty studies met the inclusion criterion of at least nine of the 11 questions answered “yes” and were therefore considered applicable to this review. The inclusion criteria for this review were as follows:
Study participants
Participants were pre-licensure nursing students, licensed nurses, or nurse practitioners.
Types of interventions
We defined a simulation-based educational intervention as education involving one or more of the following modalities: partial-task trainers, standardized patients (SPs), full-body task trainers, and high-fidelity manikins.
Types of outcome variables
Study outcomes included learning and reaction outcomes. Learning outcomes were categorized into three domains: cognitive, psychomotor, and affective.
Data coding
The level of fidelity is determined by the environment, the tools and resources used, and other factors associated with the participants [8]. However, several of the selected studies did not report these contextual details; in particular, a few did not indicate the debriefing method used, making it difficult to categorize and compare the effects of each debriefing method. We therefore categorized fidelity level according to the physical equipment used. Fidelity was coded as low, medium, or high according to the extent to which the simulation model resembled a human being, or as hybrid or SP. Low-fidelity simulators (LFSs) were defined as static models or task trainers primarily made of rubber body parts [9, 10]. Medium-fidelity simulators (MFSs) were full-body manikins with embedded software that can be controlled by an external, handheld device [10]. High-fidelity simulators (HFSs) were life-sized computerized manikins with realistic anatomical structures and high response fidelity [11]. We also considered hybrid simulators, which combine two or more fidelity levels of simulation. Because an SP is a person trained to portray an individual in a scripted scenario for the purposes of instruction, practice, or evaluation [12], SPs were treated as a separate category: they provide types of fidelity, such as body expressions and verbal feedback, that cannot be achieved with other simulation models.
The extracted data were coded by two researchers. A coding manual was developed to maintain coding reliability; it included information on effect size calculations and on the characteristics of each study and report. Differences between coders were resolved by discussion until consensus was reached.
Data synthesis and analysis
Data were analyzed using Comprehensive Meta-Analysis version 2 (Biostat, Englewood, New Jersey). Effect sizes were estimated as standardized mean differences adjusted for sample size (Cohen’s d), and 95 % confidence intervals were calculated to assess the statistical significance of the average effect sizes.
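For reference, a standard formulation of the standardized mean difference for a two-group comparison (not necessarily the exact estimator implemented in Comprehensive Meta-Analysis) is

\[
d = \frac{\bar{X}_E - \bar{X}_C}{S_{pooled}}, \qquad
S_{pooled} = \sqrt{\frac{(n_E - 1)s_E^2 + (n_C - 1)s_C^2}{n_E + n_C - 2}},
\]

where \(\bar{X}_E\) and \(\bar{X}_C\) are the experimental and control group means, \(s_E^2\) and \(s_C^2\) their variances, and \(n_E\) and \(n_C\) their sample sizes. The sampling variance of \(d\) and the corresponding 95 % confidence interval are

\[
v_d = \frac{n_E + n_C}{n_E n_C} + \frac{d^2}{2(n_E + n_C)}, \qquad
\mathrm{CI}_{95\,\%} = d \pm 1.96\sqrt{v_d}.
\]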
Fixed effects models assume that the primary studies share a common effect size. In contrast, random effects models assume that each primary study is drawn from a different population and estimate the distribution of effect sizes [13]. Heterogeneity of the intervention effects was tested using the Q statistic. Because the test for heterogeneity was statistically significant, we used random effects models to accommodate this heterogeneity in the main effect and subgroup analyses. The planned subgroup analyses were conducted on evaluation outcomes.
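For reference, the Q statistic and one common random effects formulation (the DerSimonian–Laird estimator; a standard sketch rather than the software’s exact computation) are

\[
Q = \sum_{i=1}^{k} w_i (d_i - \bar{d})^2, \qquad w_i = \frac{1}{v_i},
\]

where \(k\) is the number of studies and \(\bar{d}\) is the fixed-effect weighted mean effect size; under homogeneity, Q follows a chi-square distribution with \(k - 1\) degrees of freedom. The between-study variance and the random effects weights are then

\[
\tau^2 = \max\!\left(0,\; \frac{Q - (k - 1)}{\sum w_i - \sum w_i^2 / \sum w_i}\right), \qquad
w_i^{*} = \frac{1}{v_i + \tau^2},
\]

and the random effects pooled estimate is \(\bar{d}^{*} = \sum w_i^{*} d_i \,/\, \sum w_i^{*}\).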