Table 3. Test statistics from CTT and IRT analyses of the 40-item RANZCOG FSEP trial instrument

From: Rasch scaling procedures for informing development of a valid Fetal Surveillance Education Program multiple-choice assessment

| Parameter | Value/s | Comments |
| --- | --- | --- |
| Number of candidates, N | 877 | This was considered adequate for a trial analysis intended to inform refinements to the extended test instrument. The resulting standard error of the item difficulty parameter estimates was small, at 0.07 logits. |
| Cronbach's alpha | 0.80 | This was promising given the inclusion of some underperforming items and the shortened test form. The intention is to increase test length, item quality and test targeting to achieve a value in excess of 0.9. |
| Item separation reliability | 1.00 | This provides evidence that item parameter estimates are adequately separable and varied. |
| WLE person separation reliability | 0.71 | As with Cronbach's alpha, this value will need to be increased by introducing additional high-quality, well-targeted items. High person separation values are particularly important when determining the number of performance levels, and the corresponding cut-scores, that can be specified for a single assessment. |
| Mean test score (standard deviation) | 26.7 (5.7) | The mean and standard deviation suggest that the test is neither too easy nor too difficult for the practitioner population and is not subject to "floor" or "ceiling" effects. |
| Mean item infit (standard deviation) | 1.00 (0.05) | The mean and, in particular, the standard deviation support the assumption that the FSEP instrument measures a single, unidimensional construct. This was important for justifying continued use of the Rasch-based methodology. |
| Proportion of t-tests outside 95% confidence interval | 7.75% (95% CI: 5.98–9.52) | This proportion indicates a slight departure from unidimensionality: it exceeds the 5% expected by chance, and the lower bound of its confidence interval lies above 5%. Other item and test statistics did not detect this departure. This technique will be repeated once further item revisions have been carried out. Items with strong opposite loadings will be compared through qualitative review and evaluated against the intended construct. |
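
The interval reported for the proportion of t-tests in the last row can be reproduced with a short calculation. The table does not state how the interval was obtained; the sketch below assumes a normal-approximation (Wald) binomial interval with N = 877 candidates, which happens to match the reported bounds. The function name `proportion_ci` is illustrative and not taken from the source.

```python
import math


def proportion_ci(p_hat, n, z=1.96):
    """Normal-approximation (Wald) confidence interval for a proportion.

    Assumption: the table does not name the interval method; the Wald
    interval is used here because it reproduces the reported bounds.
    """
    se = math.sqrt(p_hat * (1.0 - p_hat) / n)  # binomial standard error
    return p_hat - z * se, p_hat + z * se


# Values from the table: 7.75% of t-tests significant, N = 877 candidates.
low, high = proportion_ci(0.0775, 877)
print(f"{100 * low:.2f}% to {100 * high:.2f}%")  # prints "5.98% to 9.52%"

# The lower bound sits above the 5% expected by chance alone.
print(low > 0.05)  # prints "True"
```

Under this assumption, the lower bound of roughly 5.98% sits just above 5%, which is consistent with the table's reading of a slight rather than marked departure from unidimensionality.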