This study employed a quasi-experimental posttest design to evaluate the effects of an examiner’s positive and negative verbal feedback on students’ accuracy of self-assessment, emotional responses, and self-efficacy.
Participants and setting
This study was approved by the ethics committee of Hallym University. The sample size was calculated for an independent-samples t test between the positive feedback (PF) and negative feedback (NF) groups using G*Power 3.0.10. The probability of a Type I error (alpha) was set at 0.05, the power (1–beta) at 0.8, and the effect size at a medium value of 0.5. The required sample size was 51 per group. In general, the baccalaureate nursing curriculum in South Korea is similar to that of Bachelor of Science in Nursing (BSN) programs in the United States. Eligible participants were all second-year nursing students who were taking a fundamentals of nursing course at a university in South Korea. All 111 students agreed to participate in the study. Following an informed consent procedure, students were randomly assigned to the PF group (n = 58) or the NF group (n = 53). One student in the NF group was excluded because of missing questionnaire data, so the final sample comprised 110 students: 58 in the PF group and 52 in the NF group.
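The sample size calculation described above can be approximated with the standard formula n = 2((z_alpha + z_beta) / d)^2 per group. The sketch below is only a normal approximation, not the G*Power routine, which is based on the noncentral t distribution and typically returns a slightly larger n. The function name `n_per_group` is hypothetical, and because the text does not state whether the calculation was one- or two-sided, both cases are shown.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha, power, two_sided):
    """Normal-approximation sample size per group for comparing two means."""
    z = NormalDist()
    # Split alpha across both tails only for a two-sided test.
    tail_alpha = alpha / 2 if two_sided else alpha
    z_alpha = z.inv_cdf(1 - tail_alpha)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


# Settings from the text: d = 0.5, alpha = 0.05, power = 0.8.
one_sided = n_per_group(0.5, 0.05, 0.8, two_sided=False)
two_sided = n_per_group(0.5, 0.05, 0.8, two_sided=True)
print(one_sided, two_sided)  # 50 63 under the normal approximation
```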
First, the instructor observed and rated each student’s skill performance using a checklist from the protocol of the Korean Accreditation Board of Nursing Education, applying rigorous step-by-step procedures. The checklist listed, in order, patient identification, explanation of the purpose of the procedure, details of critical procedural elements, and cleaning up. Each item was scored dichotomously (0 = not done or done incorrectly; 1 = done correctly).
Following completion of the skill performance and feedback, students completed a self-report questionnaire, which included measures of self-rated skill performance, emotional responses, and self-efficacy, as well as general characteristics such as age, gender, and grade point average (GPA) as an indicator of previous academic achievement.
The accuracy of self-assessment of performance was calculated for each subject as the difference between the observed actual score and the student’s own self-rated score.
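As a minimal sketch of this calculation, the code below sums the dichotomous checklist items and takes the difference between the instructor’s total and the student’s total. It assumes the student rates the same checklist items as the instructor. The function names, the 10-item checklist, and the item values are hypothetical, and the sign convention (observed minus self-rated, so that a negative value indicates overestimation) is an assumption, since the text does not specify the direction of the difference.

```python
def checklist_total(items):
    """Sum dichotomous item scores (0 = not done/done incorrectly, 1 = done correctly)."""
    if any(score not in (0, 1) for score in items):
        raise ValueError("checklist items must be scored 0 or 1")
    return sum(items)


def self_assessment_gap(observed_items, self_rated_items):
    """Observed total minus self-rated total (assumed sign convention)."""
    return checklist_total(observed_items) - checklist_total(self_rated_items)


# Hypothetical 10-item checklist for one student (values invented for illustration).
instructor = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0]  # observed total: 7
student = [1, 1, 1, 1, 1, 0, 1, 1, 1, 1]     # self-rated total: 9

gap = self_assessment_gap(instructor, student)
print(gap)  # -2: the student rated the performance 2 points above the observed score
```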
Emotional responses were measured using Warr’s questionnaire items to identify differences in participants’ emotions according to the type of feedback they were given. For this study, 10 emotion terms that can arise during skill performance testing were selected: five positive (cheerful, glad, contented, comfortable, and relaxed) and five negative (unsatisfied, anxious, tense, sad, and discouraged). A 5-point Likert scale (1 = strongly disagree to 5 = strongly agree) was used with this instrument. Cronbach’s alpha coefficients of the positive and negative response scales were 0.89 and 0.83, respectively.
Self-efficacy was measured with the self-efficacy subscale of the Motivated Strategies for Learning Questionnaire developed by Pintrich et al. The subscale consists of eight items rated on a 7-point Likert scale (1 = strongly disagree to 7 = strongly agree). The sum of the item scores reflects self-efficacy for learning and performance, with a higher score indicating a higher level of self-efficacy. The Cronbach’s alpha coefficient of this scale was 0.93 as reported by Pintrich et al. and 0.94 in this study.
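The scale scoring and reliability coefficients described above can be illustrated with a short sketch. It computes summed scale scores and Cronbach’s alpha with the usual formula alpha = k / (k − 1) × (1 − sum of item variances / variance of total scores). The function name `cronbach_alpha` and the response matrix are hypothetical and invented for illustration; they are not the study data.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(responses[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of the summed scale score.
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical 5-point Likert responses: 5 respondents x 5 items.
responses = [
    [4, 5, 4, 5, 4],
    [2, 2, 3, 2, 2],
    [5, 4, 5, 5, 4],
    [3, 3, 2, 3, 3],
    [1, 2, 1, 1, 2],
]

scale_scores = [sum(row) for row in responses]  # one summed score per respondent
alpha = cronbach_alpha(responses)
print(scale_scores, round(alpha, 2))
```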
Students were randomly assigned to a type of feedback (positive or negative) by drawing lots, and they were not informed of the type of feedback they would receive. The instructor assessed each student’s performance with a prepared evaluation sheet, provided no feedback during the performance evaluation, and maintained verbal and nonverbal neutrality. After completing the skill performance, each student received the predetermined type of verbal feedback, regardless of how well he or she had performed, from one of the two instructors who had observed the performance on a one-to-one basis. Even a student who performed very well still had room for improvement, so assignment to negative feedback did not pose a problem. Conversely, a student assigned to positive feedback who did not perform well received positive feedback that focused on what had been done well. The instructor provided behavior-based feedback in a neutral manner, following guidelines for the two types of feedback that had been developed in advance.
The positive feedback consisted of general compliments (e.g., “Great!” or “Well done”) followed by three specific positive comments about the student’s performance. The negative feedback consisted of general criticisms (e.g., “More effort required” or “Wrong”) followed by three specific constructive comments about the student’s performance. After receiving the feedback, the students assessed their own performance with the same evaluation sheet and then completed a survey on their emotional responses and self-efficacy. At the end of the session, the instructor provided accurate feedback to students whose assigned feedback had not been consistent with their actual performance. For example, a student who received only positive feedback despite having made many mistakes that required significant remediation was given an opportunity to correct the survey later. Data were collected throughout the day in a nursing laboratory at the school.
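The random assignment described above can be sketched as follows. The exact lot procedure is not detailed in the text, so this example assumes an independent random draw per student (simple randomization), which naturally produces unequal group sizes. The function name `draw_lots`, the student IDs, and the seed are hypothetical.

```python
import random


def draw_lots(student_ids, seed=None):
    """Assign each student to PF or NF by an independent random draw."""
    rng = random.Random(seed)  # seeded only to make the illustration repeatable
    return {sid: rng.choice(["PF", "NF"]) for sid in student_ids}


# 111 hypothetical student IDs, matching the number of students enrolled.
students = [f"S{i:03d}" for i in range(1, 112)]
assignment = draw_lots(students, seed=2024)

# Group sizes will generally differ under simple randomization.
labels = list(assignment.values())
counts = {group: labels.count(group) for group in ("PF", "NF")}
print(counts)
```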
Shapiro–Wilk tests were conducted to assess the normality of all variables. Chi-squared tests, Fisher’s exact tests, independent-samples Student’s t tests, and nonparametric Mann–Whitney U tests were used to compare the demographic characteristics and baseline measurements of the dependent variables between the PF and NF groups. Analysis of covariance (ANCOVA) was used to compare mean differences between the groups, with age and performance test scores as covariates to control for differences in baseline characteristics. For this analysis, variables that were not normally distributed were natural-log transformed. Data were analyzed using IBM SPSS Statistics 21.0, and the level of significance was set at p < 0.05 in two-tailed tests.
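The covariate adjustment in the ANCOVA can be illustrated as an ordinary least squares fit of the outcome on a group indicator plus the two covariates, where the group coefficient is the adjusted mean difference. The sketch below is not the SPSS procedure and does not compute the F test or p value. The function names and all data values are hypothetical; the outcome is constructed so that the true adjusted difference is 1.5.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ancova_coefficients(y, group, age, score):
    """OLS fit of y ~ intercept + group + age + score via the normal equations."""
    X = [[1.0, g, a, s] for g, a, s in zip(group, age, score)]
    p = 4
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


# Hypothetical data: 1 = PF, 0 = NF; age and performance score are the covariates.
group = [1, 1, 1, 1, 0, 0, 0, 0]
age = [20, 21, 22, 20, 21, 23, 20, 22]
score = [7, 9, 8, 6, 8, 7, 9, 6]
# Outcome built with a known group effect of 1.5 so the fit can be checked.
y = [2 + 1.5 * g + 0.1 * a + 0.3 * s for g, a, s in zip(group, age, score)]

coef = ancova_coefficients(y, group, age, score)
adjusted_diff = coef[1]  # PF minus NF, adjusted for age and performance score
print(round(adjusted_diff, 3))  # 1.5
```

For a positive, non-normally distributed outcome, the values in `y` would be passed through `math.log` before fitting, mirroring the natural log transformation mentioned in the text.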