Study design
This quasi-experimental study used a nonequivalent control group pretest–posttest design to develop and assess the effect of an AI chatbot educational program for non-face-to-face video lectures on EFM for nursing college students.
Study participants
The participants were junior students at a nursing college located in G province, South Korea. The selection criteria were as follows: 1) nursing students, 2) voluntary participation, and 3) no experience of an EFM educational program utilizing a chatbot. Nurses and students with a nurse’s aide certification were excluded, because prior knowledge from their clinical experience might have interfered with this study’s assessment. The sample size was calculated using G*Power version 3.1.9.2 [19]. The minimum sample size for each of the two groups was calculated as 26, based on a two-tailed test of the difference between two independent means with a 1:1 allocation ratio, a power of 0.80, a significance level of 0.05, and an effect size of 0.80. Because previous studies that assessed the effect of using chatbots in college students’ education reported effect sizes larger than 0.8 [7, 20], the effect size in this study was set at 0.8, which corresponds to a large effect as defined by Cohen [21]. Considering a 20.0% dropout rate, 66 participants (33 in each group) were selected. Three participants in the experimental group who did not complete the chatbot program and two participants in the control group who did not complete the video lecture were excluded. The final numbers of participants in the control and experimental groups were 31 and 30, respectively (Fig. 1).
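The sample-size figures can be approximately reproduced by hand. The sketch below is not the G*Power computation, which is based on the noncentral t distribution. It assumes the common normal approximation with a small-sample correction, n ≈ 2(z₁₋α/₂ + z₁₋β)²/d² + z²₁₋α/₂/4 per group. It also assumes that the dropout allowance is applied as n/(1 − dropout), which the text does not state. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate per-group n for a two-tailed, two-sample t-test (1:1 allocation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value, two-tailed
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power
    n = 2 * (z_alpha + z_beta) ** 2 / effect_size ** 2 + z_alpha ** 2 / 4
    return math.ceil(n)


def inflate_for_dropout(n: int, dropout: float) -> int:
    """Assumed rule: recruit enough that n remain after the expected dropout."""
    return math.ceil(n / (1 - dropout))


base = n_per_group(0.80)
recruit = inflate_for_dropout(base, 0.20)
print(base, recruit, 2 * recruit)  # prints: 26 33 66
```

Under these assumptions the approximation matches the reported 26 per group and the recruited 33 per group (66 in total).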
Study stages
In the analysis stage, nursing college students’ requests regarding the function and contents of the chatbot program were analyzed, and a literature search for EFM nursing education was conducted to develop the contents and learning goals of the AI chatbot educational program for EFM.
In the design stage, the program’s process and service interface were designed, using an algorithm that allows customized interventions to be provided on the platform. Furthermore, the service interface was designed to improve readability and help the study participants concentrate.
The user interface was designed using LandBot.io (https://landbot.io/). Users can enter questions and see the answers in the chatbot through the user interface. When a user enters a question in natural language, the natural language processing engine recognizes the intention and entities of the question, after which the most adequate answer is selected from a database of accumulated learning results and provided to the user. The chatbot consists of introduction, main course, and conclusion stages (Fig. 2). In the introduction stage, students are first greeted and introduced to the learning objectives of the chatbot program. The next step is to check students’ understanding of the preceding video lecture. Various question types, such as O/X (true/false) quizzes, multiple-choice questions, and open-ended questions, are used for this check. Feedback is provided after every question, and it differs depending on whether the answer is correct. If the answer is incorrect, the learning contents are rearranged so that the students can study them again. After the check of the preceding learning is completed, the contents are summarized and students can proceed to the next step. In the main course, students learn about nursing management and nursing interventions related to electronic fetal monitoring devices through chatbot learning activities. Students read and interpret related graphs, identify patient symptoms, and learn to prioritize nursing interventions accordingly. In addition, students can become familiar with the equipment through various pictures and photos. The chatbot learning activities also provide feedback based on students’ responses to enhance their learning experience. Finally, in the conclusion stage, students organize and integrate what they have learned so far. The chatbot program then ends with a final greeting.
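The check of the preceding learning described above can be sketched as a short loop in which the feedback branches on whether the answer is correct. This is an illustrative sketch only and does not use LandBot.io or its API. The `QuizItem` class, the `check_prior_learning` function, and all item texts are hypothetical placeholders rather than content from the actual program.

```python
from dataclasses import dataclass


@dataclass
class QuizItem:
    kind: str    # "ox", "multiple_choice", or "open_ended"
    prompt: str  # placeholder text, not taken from the actual program
    answer: str
    review: str  # material presented again after an incorrect answer


def check_prior_learning(items, responses):
    """Return the chatbot's messages for one pass through the check."""
    messages = []
    for item, response in zip(items, responses):
        if response.strip().lower() == item.answer.lower():
            messages.append(f"Correct: {item.prompt}")
        else:
            # Feedback differs for a wrong answer: the content is shown again.
            messages.append(f"Incorrect. Please review: {item.review}")
    messages.append("Summary of the preceding lecture; continue to the main course.")
    return messages


items = [
    QuizItem("ox", "Placeholder O/X statement", "O", "placeholder review text A"),
    QuizItem("multiple_choice", "Placeholder multiple-choice item", "2",
             "placeholder review text B"),
]
for line in check_prior_learning(items, ["o", "3"]):
    print(line)
```

Exact string matching is used here only to keep the sketch short. Open-ended answers would need the intent and entity recognition that the text attributes to the natural language processing engine.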
Moreover, experts evaluated the heuristics and performance of the program, and the program was modified accordingly. Thereafter, the chatbot was used by nursing college students, their user experiences were evaluated, and the performance of the chatbot was revalidated.
Study instruments
The participants’ knowledge of EFM was assessed using 13 questions as follows: three questions on understanding and explaining the purpose and method of EFM, seven questions on examining the results of EFM, and three questions on knowledge of nursing interventions based on the EFM results. With one point for every correct answer, the total scores ranged from 0 to 13, with higher scores indicating better knowledge of EFM. The content validity was verified by four experts (three nursing professors with teaching experience in women’s health and nursing science, and one nurse with more than 10 years of experience in the delivery room), and only those items with a content validity index of 0.8 or higher were selected and finalized. The reliability of the instrument was calculated as 0.79 using Kuder-Richardson Formula 20.
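For reference, Kuder-Richardson Formula 20 for k dichotomous items is KR-20 = k/(k − 1) × (1 − Σpᵢqᵢ/σ²), where pᵢ is the proportion of correct answers on item i, qᵢ = 1 − pᵢ, and σ² is the variance of the total scores. The sketch below is a minimal implementation that assumes the population variance is used for the total scores (conventions differ between software packages). The function name and the toy responses are hypothetical and are not the study’s data.

```python
from statistics import pvariance


def kr20(responses):
    """responses: one list of 0/1 item scores per student (same items, same order)."""
    k = len(responses[0])
    n = len(responses)
    total_var = pvariance([sum(row) for row in responses])
    pq_sum = 0.0
    for i in range(k):
        p = sum(row[i] for row in responses) / n  # proportion correct on item i
        pq_sum += p * (1 - p)
    return (k / (k - 1)) * (1 - pq_sum / total_var)


# Toy data: 4 students x 3 items, for illustration only.
toy = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(kr20(toy))  # prints: 0.75
```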
In this study, clinical reasoning competency was measured with 15 questions rated on a 5-point Likert scale; the instrument was developed by Liou et al. [22] and translated and validated by Joung and Han [23]. Higher scores indicate a higher level of clinical reasoning competency. The reliability of the instrument was Cronbach’s α = 0.94 in Liou et al.’s study [22], and Cronbach’s α = 0.93 in Joung and Han’s study [23]. In this study, the reliability of the instrument was Cronbach’s α = 0.96.
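Cronbach’s α for a k-item scale is α = k/(k − 1) × (1 − Σs²ᵢ/s²ₜ), where s²ᵢ is the variance of item i and s²ₜ is the variance of the total scores. The following sketch assumes sample variances and complete responses. The function name and the toy ratings are hypothetical and unrelated to the study’s data.

```python
from statistics import variance


def cronbach_alpha(scores):
    """scores: one list of item ratings per respondent (same items, same order)."""
    k = len(scores[0])
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Two perfectly consistent items give the maximum value of 1.
toy = [[1, 1], [2, 2], [3, 3]]
print(cronbach_alpha(toy))
```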
Furthermore, confidence in fetal health assessment using EFM was measured using three questions. For each question, responses of “strongly confident” and “not confident at all” were scored as 10 and 0 points, respectively. A higher total score indicates a higher level of confidence. The reliability of the instrument was Cronbach’s α = 0.91.
In this study, interest in education, assistance for self-directed learning, and feedback satisfaction were measured using numerical rating scales. For each question assessing interest in education, responses of “strongly confident” and “not confident at all” were scored as 10 and 0 points, respectively. A higher total score indicates a higher level of interest in education. For each question assessing assistance for self-directed learning, responses of “very helpful” and “not helpful at all” were scored as 10 and 0 points, respectively. A higher score indicates a higher level of assistance for self-directed learning. Finally, for each question assessing feedback satisfaction, responses of “very satisfied” and “not satisfied at all” were scored as 10 and 0 points, respectively. A higher total score indicates a higher level of feedback satisfaction.
Data collection
Data were collected between November 3 and 16, 2021. Due to the recurrence of COVID-19, this study was conducted using non-face-to-face video lectures. The experimental and control groups completed an online pre-test questionnaire before the video lectures began. The experimental group attended both the video and chatbot lectures, whereas the control group attended only the video lectures. The video lecture was approximately 32 min long and consisted of a professor delivering a one-way lecture without obtaining feedback. The learning goals of the video lectures were as follows: 1) explaining the purpose and method of EFM, 2) interpreting the results of EFM during labor, 3) understanding the purpose of the nonstress test and performing it by manipulating the device, 4) explaining the purpose, method, and results of the contraction stress test, and 5) applying nursing procedures in the presence of fetal distress. The control group submitted the post-test questionnaires online after the video lectures, after which they were allowed to attend the chatbot lectures. The experimental group submitted the post-test questionnaires after the video and chatbot lectures.
Statistical analysis
The collected data were analyzed using SPSS/WIN 23. A Shapiro–Wilk test was performed to assess the normality of the variables before the program was applied. Pre-test homogeneity of the participants’ general characteristics and measurement variables was tested using chi-squared tests, Fisher’s exact tests, and t-tests. After the intervention, independent t-tests were performed to compare the differences in knowledge, clinical reasoning competency, interest in education, self-directed learning, and feedback satisfaction between the experimental and control groups.
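The post-intervention comparison rests on the independent-samples t statistic. The analysis itself was run in SPSS; the sketch below only illustrates the calculation, assuming equal variances (Student’s pooled-variance form), and also returns Cohen’s d for reference. The function name and the toy scores are hypothetical and are not the study’s data. Obtaining a p-value would additionally require the t distribution, for example via `scipy.stats.ttest_ind`.

```python
import math
from statistics import mean, variance


def independent_t(a, b):
    """Return (t statistic, degrees of freedom, Cohen's d), equal variances assumed."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    diff = mean(a) - mean(b)
    t = diff / math.sqrt(pooled_var * (1 / na + 1 / nb))
    d = diff / math.sqrt(pooled_var)
    return t, df, d


# Toy post-test scores, for illustration only.
experimental = [8, 9, 10, 11, 12]
control = [6, 7, 8, 9, 10]
t, df, d = independent_t(experimental, control)
print(round(t, 3), df, round(d, 3))  # prints: 2.0 8 1.265
```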
Ethical considerations
This study was conducted after obtaining approval from the Institutional Review Board of Dongnam Health University (1044371–202109-HR-006–01). Instructions on study participation and a consent form were attached to the questionnaire, and data were collected after explaining the study to the participants. The consent form provided information regarding voluntary participation, assurance of confidentiality, and the scope of the application of the study’s results. Moreover, the participants were assured that they could withdraw from the study at any time without any effect on their grades. Additionally, they were informed that the program would not invade their privacy.