Discovering unknown response patterns in progress test data to improve the estimation of student performance

Background
The Progress Test Medizin (PTM) is a 200-question formative test that is administered each term to approximately 11,000 students at medical universities in Germany, Austria, and Switzerland. Students receive feedback on their knowledge and its development, mostly in comparison to their own cohort. In this study, we use PTM data to find groups of students with similar response patterns.
Methods
We performed k-means clustering on a dataset of 5,444 students, using the students' answers as features and k = 5 clusters. The data were then passed to XGBoost with the cluster assignment as the target, which allowed SHAP to identify the questions relevant to each cluster. Clusters were examined by total scores, response patterns, and confidence level. Relevant questions were evaluated for difficulty index, discrimination index, and competence levels.
Results
Three of the five clusters can be seen as “performance” clusters. Cluster 0 (n = 761) consisted predominantly of students close to graduation; its relevant questions tended to be difficult, but these students answered confidently and correctly. Students in cluster 1 (n = 1,357) were advanced, and cluster 3 (n = 1,453) consisted mainly of beginners. The relevant questions for these two clusters were rather easy, and the number of guessed answers increased. There were two “drop-out” clusters: students in cluster 2 (n = 384) dropped out of the test about halfway through after initially performing well; cluster 4 (n = 1,489) included students from the first semesters as well as “non-serious” students, both of whom mostly gave incorrect guesses or no answers.
Conclusion
Clusters placed performance in the context of the participating universities. Relevant questions served as good cluster separators and further supported our “performance” cluster groupings.
Supplementary Information
The online version contains supplementary material available at 10.1186/s12909-023-04172-w.
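The clustering step in the Methods can be illustrated with a minimal NumPy sketch. It assumes, hypothetically, that each answer is coded as 0 = correct, 1 = incorrect, 2 = no answer and one-hot encoded per question; the encoding actually used for the PTM is not stated here. It implements plain Lloyd's k-means with k-means++ seeding rather than any particular library, and it does not cover the XGBoost/SHAP step.

```python
import numpy as np

# Hypothetical answer codes (an assumption, not the PTM's documented encoding).
CORRECT, INCORRECT, NO_ANSWER = 0, 1, 2
N_CODES = 3


def one_hot_answers(answers: np.ndarray, n_codes: int = N_CODES) -> np.ndarray:
    """Map an (n_students, n_questions) matrix of answer codes to an
    (n_students, n_questions * n_codes) binary feature matrix."""
    n_students, n_questions = answers.shape
    features = np.zeros((n_students, n_questions * n_codes))
    # Column of the active code for every (student, question) pair.
    cols = np.arange(n_questions) * n_codes + answers
    features[np.arange(n_students)[:, None], cols] = 1.0
    return features


def sq_dists(X: np.ndarray, centers: np.ndarray) -> np.ndarray:
    """Squared Euclidean distance of every sample to every centre."""
    return ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)


def kmeans(X: np.ndarray, k: int, n_iter: int = 100, seed: int = 0):
    """Lloyd's k-means with k-means++ seeding; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    # k-means++: each new centre is drawn with probability proportional to
    # its squared distance from the nearest centre chosen so far.
    centers = X[[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = sq_dists(X, centers).min(axis=1)
        centers = np.vstack([centers, X[rng.choice(len(X), p=d2 / d2.sum())]])
    for _ in range(n_iter):
        labels = sq_dists(X, centers).argmin(axis=1)
        # Keep the old centre if a cluster ends up empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    labels = sq_dists(X, centers).argmin(axis=1)
    return labels, centers
```

For the PTM data, `answers` would be the 5,444 × 200 answer matrix and `k` would be 5; the resulting labels are what the Methods then pass to XGBoost as the target.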

Appendix Figure 1 Distribution of discrimination indices of the questions from one 'Progress Test Medizin' run. The dotted line at 0.3 marks the threshold for well-discriminating questions.
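As a hedged illustration of the two item statistics, the sketch below computes the difficulty index as the share of correct answers per question and the discrimination index as the corrected item-total correlation, i.e. the Pearson correlation between an item and the total score on the remaining items. That definition of the discrimination index is one common variant and is assumed here; the text does not specify which variant was used. The 0.3 cut-off mirrors the dotted line in Appendix Figure 1.

```python
import numpy as np


def difficulty_index(correct: np.ndarray) -> np.ndarray:
    """Share of students answering each question correctly.

    `correct` is an (n_students, n_questions) 0/1 matrix."""
    return correct.mean(axis=0)


def discrimination_index(correct: np.ndarray) -> np.ndarray:
    """Corrected item-total correlation per question (NaN if undefined)."""
    total = correct.sum(axis=1)
    r = np.full(correct.shape[1], np.nan)
    for j in range(correct.shape[1]):
        item = correct[:, j]
        rest = total - item  # total score without the item itself
        # A correlation is undefined if either variable is constant.
        if item.std() > 0 and rest.std() > 0:
            r[j] = np.corrcoef(item, rest)[0, 1]
    return r


def well_discriminating(r: np.ndarray, threshold: float = 0.3) -> np.ndarray:
    """Boolean mask of questions at or above the threshold.

    Undefined (NaN) indices are replaced by -1, the lowest possible
    correlation, so they never count as well discriminating."""
    return np.nan_to_num(r, nan=-1.0) >= threshold
```

A question that every student answers correctly has a difficulty index of 1 and no defined discrimination index, which is why the NaN case is handled explicitly.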
Appendix Figure 2 Test scores. Percentage of correct answers per student, grouped by academic semester. Overall, 5,444 students from 8 universities in Germany and Austria are shown. Each dot represents the share of correct answers in a single participation.
Appendix Figure 3 Distortion score elbow for k-means clustering. The optimal number of clusters k for applying k-means to the PTM data from winter term 2020 was determined mathematically. Candidate values of k ranged from 1 to 29. For each candidate k (x-axis), the distortion score of the resulting clustering (left y-axis, blue) and the time needed to fit it in seconds (right y-axis, green) are shown. The optimal k based on this run was 5.
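The distortion score plotted in Appendix Figure 3 is commonly defined as the sum of squared distances between each sample and its assigned cluster centre. The sketch below computes it and picks an elbow with a simple heuristic: after rescaling both axes to [0, 1], choose the k whose point lies farthest from the straight line joining the first and last points of the curve. This heuristic is an assumption for illustration and not necessarily the method that produced the figure.

```python
import numpy as np


def distortion(X: np.ndarray, labels: np.ndarray, centers: np.ndarray) -> float:
    """Sum of squared Euclidean distances of samples to their assigned centre."""
    return float(((X - centers[labels]) ** 2).sum())


def elbow_k(ks, scores) -> int:
    """Return the k whose rescaled point lies farthest from the chord
    joining the first and last points of the distortion curve."""
    ks = np.asarray(ks, dtype=float)
    s = np.asarray(scores, dtype=float)
    # Rescale both axes to [0, 1] so neither dominates the distance.
    x = (ks - ks.min()) / (ks.max() - ks.min())
    y = (s - s.min()) / (s.max() - s.min())
    x0, y0, x1, y1 = x[0], y[0], x[-1], y[-1]
    # Perpendicular distance of every point to the line through (x0, y0), (x1, y1).
    dist = np.abs((y1 - y0) * x - (x1 - x0) * y + x1 * y0 - y1 * x0)
    dist /= np.hypot(y1 - y0, x1 - x0)
    return int(ks[np.argmax(dist)])
```

In practice, `scores` would hold the distortion of a k-means fit for each candidate k from 1 to 29.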
Appendix Table 4 Descriptive statistics of the Calinski-Harabasz score from 200 k-means runs. The model with the maximum Calinski-Harabasz score was kept as the final model.
Appendix Figure 4 Academic semester distribution per cluster. For each academic semester, the distribution of its students across the clusters is shown in percent. Values of the same color sum to 100 %. For example, ~46 % of students from academic semester 7 are in cluster 1. The raw count distribution can be seen in Figure 3. (Appendix Figure 5 shows the same percentages, but ordered by academic semester and colored by cluster.)
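The model selection described for Appendix Table 4 can be sketched as follows, assuming the Calinski-Harabasz score in its standard form, (tr(B) / (k - 1)) / (tr(W) / (n - k)), where tr(B) is the between-cluster and tr(W) the within-cluster sum of squares. Each run re-seeds a plain Lloyd's k-means with k-means++ initialization (an assumption), the scores are summarized with descriptive statistics, and the run with the highest score is kept. The function names are hypothetical.

```python
import numpy as np


def sq_dists(X, centers):
    """Squared Euclidean distance of every sample to every centre."""
    return ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)


def kmeans(X, k, seed, n_iter=100):
    """Lloyd's k-means with k-means++ seeding; returns cluster labels."""
    rng = np.random.default_rng(seed)
    centers = X[[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = sq_dists(X, centers).min(axis=1)
        centers = np.vstack([centers, X[rng.choice(len(X), p=d2 / d2.sum())]])
    for _ in range(n_iter):
        labels = sq_dists(X, centers).argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return sq_dists(X, centers).argmin(axis=1)


def calinski_harabasz(X, labels):
    """Between-cluster over within-cluster dispersion, each divided by its
    degrees of freedom; assumes at least two non-empty clusters."""
    n, clusters = len(X), np.unique(labels)
    k = len(clusters)
    grand_mean = X.mean(axis=0)
    between = within = 0.0
    for c in clusters:
        members = X[labels == c]
        centroid = members.mean(axis=0)
        between += len(members) * ((centroid - grand_mean) ** 2).sum()
        within += ((members - centroid) ** 2).sum()
    return float((between / (k - 1)) / (within / (n - k)))


def best_of_runs(X, k, n_runs=200):
    """Run k-means n_runs times; return the best labels and score statistics."""
    scores, best_labels, best_score = [], None, -np.inf
    for seed in range(n_runs):
        labels = kmeans(X, k, seed)
        score = calinski_harabasz(X, labels)
        scores.append(score)
        if score > best_score:
            best_labels, best_score = labels, score
    scores = np.array(scores)
    stats = {"mean": scores.mean(), "std": scores.std(ddof=1),
             "min": scores.min(), "max": scores.max()}
    return best_labels, stats
```

Keeping the best of many restarts guards against k-means converging to a poor local optimum from an unlucky initialization; the returned `stats` dictionary corresponds to the kind of summary reported in Appendix Table 4.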

Appendix Figure 5 Cluster distribution per academic semester. For each academic semester, the distribution of its students across the clusters is shown in percent. Each academic semester group sums to 100 %. For example, ~46 % of students from academic semester 7 are in cluster 1. (Appendix Figure 4 shows the same percentages, but ordered by cluster and colored by academic semester.)
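Appendix Figures 4 and 5 both show row-normalized percentages of a semester-by-cluster contingency table; they differ only in ordering and coloring. A minimal NumPy sketch of that normalization, with a hypothetical function name and toy labels rather than the PTM data:

```python
import numpy as np


def semester_cluster_percent(semesters, clusters):
    """Percentage of each academic semester's students in each cluster.

    Returns (semester_values, cluster_values, percent), where
    percent[i, j] is the share (in %) of semester i assigned to cluster j,
    so every row sums to 100."""
    sem_values, sem_idx = np.unique(semesters, return_inverse=True)
    clu_values, clu_idx = np.unique(clusters, return_inverse=True)
    counts = np.zeros((len(sem_values), len(clu_values)))
    np.add.at(counts, (sem_idx, clu_idx), 1)  # raw counts per (semester, cluster)
    percent = 100.0 * counts / counts.sum(axis=1, keepdims=True)
    return sem_values, clu_values, percent
```

Applied to the real assignments, the entry for academic semester 7 and cluster 1 would be the ~46 % quoted in both captions.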