Influence of response instructions and response format on applicant perceptions of a situational judgement test for medical school selection

Background: This study examined the influence of two Situational Judgement Test (SJT) design features (response instructions and response format) on applicant perceptions. Additionally, we investigated demographic subgroup differences in applicant perceptions of an SJT.
Methods: Medical school applicants (N = 372) responded to an online survey on applicant perceptions, including a description and two example items of an SJT. Respondents randomly received one of four SJT versions (should do-rating, should do-pick-one, would do-rating, would do-pick-one). They rated overall favourability and items on four procedural justice factors (face validity, applicant differentiation, study relatedness and chance to perform) and ease of cheating. Additionally, applicant perceptions were compared for subgroups based on gender, ethnic background and first-generation university status.
Results: Applicants rated would-do instructions as easier to cheat than should-do instructions. Rating formats received more favourable judgements than pick-one formats on applicant differentiation, study relatedness, chance to perform and ease of cheating. No significant main effect for demographic subgroup on applicant perceptions was found, but significant interaction effects showed that certain subgroups might have more pronounced preferences for certain SJT design features. Specifically, ethnic minority applicants – but not ethnic majority applicants – showed greater preference for should-do than would-do instructions. Additionally, first-generation university students – but not non-first-generation university students – were more favourable of rating formats than of pick-one formats.
Conclusions: Findings indicate that changing SJT design features may positively affect applicant perceptions by promoting procedural justice factors and reducing perceived ease of cheating and that response instructions and response format can increase the attractiveness of SJTs for minority applicants.
Electronic supplementary material The online version of this article (10.1186/s12909-018-1390-0) contains supplementary material, which is available to authorized users.


Background
An increasing number of medical schools implement a Situational Judgement Test (SJT) in their admission procedures [1][2][3][4]. The growing popularity of the SJT is a result of the test's psychometric qualities, in terms of its predictive validity, incremental validity and low adverse impact, from the perspective of medical school admission committees [5]. Yet, the quality of an SJT should also be investigated from the perspective of medical school applicants, since applicant perceptions may influence test-taking motivation, test performance and applicant withdrawal [6,7]. Furthermore, minority applicants may hold more negative applicant perceptions [8], which could lead to adverse impact through diminished test-taking motivation and test performance. The current study examines the influence of two SJT design features, namely response instructions and response format, on applicant perceptions. Additionally, the perceptions of the SJT are compared for applicants belonging to different demographic subgroups.
SJTs require respondents to judge the appropriateness of responses to challenging situations [9]. The situations are contextualised to the setting for which an individual applies, such as medical school. In general, SJTs are added to admission procedures for the measurement of noncognitive attributes, for instance integrity and interpersonal skills [10]. Prior research has demonstrated that SJTs have predictive validity for future medical school performance [11], that they have incremental validity over traditional cognitive admission instruments [12] and smaller ethnic and socioeconomic subgroup differences than cognitive admission tests [13,14].
In addition to these psychometric findings, several studies have demonstrated that medical school applicants hold favourable perceptions of SJTs [11,15,16,17]. Moreover, some studies indicated that SJTs are perceived more positively than cognitive admission tests [11,16]. Favourable perceptions of SJTs are likely caused by the test content, which is closely related to the criterion domain for which an individual applies [18]. Furthermore, previous research demonstrated that certain SJT features might affect applicants' perceptions [19]. For example, Chan and Schmitt [6] and Kanning et al. [20] found that applicants perceived the same SJT more positively when it was administered in a video-based format than in a text-based format. Additionally, Neal et al. [21] showed that medical students felt that an SJT with a short-answer-question or an interview response format would better reflect their future behaviour as a junior doctor than a ranked-order or a single-best-answer response format. Response formats using short-answer or interview questions received the most favourable ratings, probably because applicants believe these formats provide a good opportunity to demonstrate their knowledge, skills and abilities [22]. No prior research has examined the influence of SJT response instructions on applicant perceptions.
The importance of applicant perceptions is evidenced by the influence of these perceptions on test-taking motivation and test performance [6] and possible applicant withdrawal [7]. Additionally, prior research indicated that applicant perceptions might differ across demographic subgroups. For example, ethnic minorities tend to perceive selection procedures at large more negatively than ethnic majorities [6,8], possibly due to differences in cultural values and beliefs on testing [23] or to perceptions of stereotype threat [24], which refers to impaired test performance caused by the salience of a negative stereotype [25]. More negative perceptions of the admission procedure might reduce test performance through decreased test-taking motivation [26]. Unfavourable perceptions of the admission process among ethnic minorities might also result in disproportionally more withdrawal from the admission procedure among ethnic minority applicants [7]. Thus, if minority applicants (based on gender, ethnic or socioeconomic background) perceive an admission procedure more negatively than majority applicants, they might also be less motivated to perform well or more inclined to withdraw from the admission procedure. Consequently, more negative applicant perceptions among minority applicants may lead to adverse impact. It is therefore crucial to examine which design features of an SJT reduce subgroup differences in applicant perceptions. We are not aware of previous studies that have focused on how response instructions and response format may influence subgroup differences in applicant perceptions of an SJT.
The dominant theoretical framework on applicant perceptions is the organisational justice theory [27]. This theory has been applied to studies on applicant perceptions of selection practices for postgraduate medical training [28] and of admission methods in higher education [29]. The organisational justice theory encompasses distributive justice, that is the fairness of the distribution of desired outcomes (e.g. admission spots in medical school) and procedural justice, referring to the fairness of procedures used to allocate desired outcomes [27]. In the model of applicant reactions proposed by Gilliland, procedural justice perceptions are influenced by the formal characteristics of the selection system, like job relatedness and opportunity to perform. According to the organisational justice model, formal characteristics are influenced by test type. Therefore, the formal characteristics component was used to study the influence of SJT design features on applicant perceptions.
The aim of the present study is two-fold. Firstly, we examined the effect of the response instructions (i.e. should do or would do) and the response format (i.e. pick-one or rating) of an SJT on applicant perceptions. The influence of response instructions was examined because previous research showed that SJTs with should-do instructions are less susceptible to faking than SJTs with would-do instructions [30]. Additionally, admission procedures that are perceived as more difficult to fake receive more favourable applicant perceptions [31,32]. Therefore, we hypothesised that applicants have more positive perceptions of an SJT using should-do instructions than an SJT using would-do instructions. The influence of response format on applicant perceptions of an SJT was previously investigated by Neal et al. [21]. However, these researchers did not include a rating format in their investigation, even though this response format is commonly used by SJTs [6,17]. We expected the pick-one format to receive more favourable applicant perceptions than the rating format because applicants, at least in Western cultures, are more familiar with the use of pick-one (i.e. multiple-choice) formats in college admission, such as in cognitive ability tests [28]. Additionally, we assumed that applicants associate rating formats with self-report measures, which are prone to faking and therefore perceived less favourably.
Secondly, to determine if SJTs are perceived differently by minority applicants than majority applicants, we examined the influence of demographic variables (i.e. gender, ethnic background and first-generation university status) on applicant perceptions. Based on previous research, we hypothesised that there would be no gender differences in applicant perceptions [29,33]. The meta-analysis of Hausknecht et al. [33] indicated that the correlation between ethnic background and applicant perceptions was near zero. However, Chan [23] found that among a US sample the predictive validity perceptions of a cognitive ability test, but not of a personality test, were significantly more favourable for White than for Black examinees. Since SJTs, like personality tests, focus on noncognitive attributes, we expected no ethnic differences in applicant perceptions. Prior research on subgroup differences in applicant perceptions has mainly focused on gender and ethnicity, but not on socioeconomic characteristics such as the educational level of the applicant's parents. Therefore, we pose the following research question: do applicant perceptions of an SJT differ across subgroups from different socioeconomic backgrounds?

Setting and procedure
This study was conducted at a Dutch medical school, whose admission procedure consisted of three equally-weighted parts: i) pre-university grade point average (pu-GPA), ii) extracurricular activities and iii) performance on three cognitive tests during an on-site testing day. Applicants with a pu-GPA ≥ 7.5 (on a scale from 1 (low performance) to 10 (high performance)) were directly admitted. The applicants of the 2018 admission procedure comprised the sample of this study. After the on-site testing day but before the applicants received the selection decision, applicants were invited to participate in an online survey on applicant perceptions. Participation in the survey was voluntary. Applicants were informed about the aim of the survey and that their answers would not influence the selection decision. Applicants gave informed consent before they were navigated to the survey. The data in this study were processed anonymously.

Survey
The online survey started with a questionnaire on the applicants' demographic characteristics. The demographic questions were administered online for the applicants with pu-GPA ≥ 7.5 and on-site for other respondents.
Applicants were categorised as first-generation university students if neither of their parents had attended university. The ethnic background of the applicants was categorised as Dutch if both parents were born in the Netherlands, as non-Western if at least one parent was born in Africa, Asia or South America, or as Western if at least one parent was born in Europe (but not in the Netherlands), North America or Oceania [34]. The applicants' gender was retrieved from the student administration system.
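The categorisation rules above can be sketched in a few lines of Python. This is only an illustration, not the procedure used in the study: the function names and the region labels are hypothetical, and the text does not say how an applicant with one non-Western-born and one Western-born parent was classified, so the sketch assumes that the non-Western category takes precedence.

```python
# Hypothetical region labels; "Europe" here means Europe excluding the Netherlands.
NON_WESTERN_REGIONS = {"Africa", "Asia", "South America"}
WESTERN_REGIONS = {"Europe", "North America", "Oceania"}


def ethnic_background(parent_a_region: str, parent_b_region: str) -> str:
    """Classify an applicant from the birth regions of both parents."""
    regions = {parent_a_region, parent_b_region}
    if regions == {"Netherlands"}:
        return "Dutch"
    # Assumption (not stated in the text): non-Western takes precedence
    # when one parent is non-Western-born and the other Western-born.
    if regions & NON_WESTERN_REGIONS:
        return "non-Western"
    if regions & WESTERN_REGIONS:
        return "Western"
    raise ValueError(f"Unrecognised birth regions: {sorted(regions)}")


def is_first_generation(parent_a_university: bool, parent_b_university: bool) -> bool:
    """First-generation status: neither parent attended university."""
    return not (parent_a_university or parent_b_university)
```

Under these assumptions, an applicant with one parent born in the Netherlands and one born in Asia is classified as non-Western, and an applicant is first-generation only when both parental flags are false.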
The second part of the survey covered applicant perceptions. Applicant perceptions were measured using seven items. Firstly, overall process favourability was assessed using two items: perceived predictive validity and perceived fairness [35]. Steiner and Gilliland [35] reported a coefficient alpha of .73 for the two process favourability items. Secondly, four items were administered measuring formal characteristics of the procedural justice dimension: i) face validity, ii) applicant differentiation [35], iii) study relatedness and iv) chance to perform [36]. These items were selected because previous research demonstrated the influence of these formal characteristics on process favourability [22,29,33]. Finally, one item measuring ease of cheating [29] was added because a prior meta-analysis showed that ease of cheating/difficulty to fake has a negative influence on applicant perceptions [32]. Each item was judged on a seven-point anchored rating scale. The items and rating scales are depicted in Additional file 1.
The survey asked respondents to answer the seven applicant perception items separately for eleven admission instruments (CV, motivation letter, pre-university GPA, cognitive capacity test, skills test, curriculum sample test, personality questionnaire, interview, weighted lottery, unweighted lottery and SJT). The order in which the admission instruments were presented to the respondents was randomised.
Survey respondents received a short description of the SJT followed by two examples of SJT items. These example items were identical, with the exception of two design features that were manipulated. Firstly, the response instructions: the example items asked either which response should be given in the described situation (i.e. should do) or which response the respondents were most likely to perform (i.e. would do). Secondly, the response format: the example items had to be judged either by rating each separate response option (i.e. rating format) or by picking out the best response option (i.e. pick-one). In total, there were four versions of the SJT example items (i.e. should do-rating, should do-pick-one, would do-rating, would do-pick-one). Each respondent randomly received two SJT example items representing one of the four versions.
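The 2 × 2 design described above can be illustrated with a short Python sketch. The text states only that respondents randomly received one of the four versions; simple randomisation with equal probability is assumed here, and the names `VERSIONS` and `assign_version` are hypothetical.

```python
import random
from itertools import product

INSTRUCTIONS = ("should do", "would do")
FORMATS = ("rating", "pick-one")

# Crossing the two design features yields the four SJT versions.
VERSIONS = [f"{instruction}-{fmt}" for instruction, fmt in product(INSTRUCTIONS, FORMATS)]


def assign_version(rng: random.Random) -> str:
    """Assign one respondent to a version with equal probability (assumed)."""
    return rng.choice(VERSIONS)


if __name__ == "__main__":
    rng = random.Random(2018)  # arbitrary seed, for reproducibility only
    print([assign_version(rng) for _ in range(5)])
```

Simple randomisation does not guarantee equal cell sizes; a blocked scheme would, but the text does not indicate which approach was used.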

Statistical analyses
Two-way ANOVAs were used to examine the influence of SJT response instructions (should do versus would do) and SJT response format (rating versus pick-one) on process favourability, the four procedural justice items (i.e. face validity, applicant differentiation, study relatedness, chance to perform) and ease of cheating. Main and interaction effects were examined. Pu-GPA ≥ 7.5 status (i.e. directly admitted) was included in the analyses as a control variable. Partial eta-squared (ηp²) was used to examine the effect sizes, where ηp² = .01, ηp² = .06 and ηp² = .14 indicate a small, medium and large effect, respectively [37].
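As an illustration of how the effect size relates to the reported test statistics, partial eta-squared can be recovered from an F value and its degrees of freedom using the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). The sketch below is not taken from the study's analysis scripts; the function names are hypothetical, and the label "below small" for values under .01 is our own addition.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta-squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


def effect_size_label(eta_p2: float) -> str:
    """Label an effect using the .01 / .06 / .14 benchmarks cited in the text."""
    if eta_p2 >= 0.14:
        return "large"
    if eta_p2 >= 0.06:
        return "medium"
    if eta_p2 >= 0.01:
        return "small"
    return "below small"  # hypothetical label, not used in the text


if __name__ == "__main__":
    # F(1, 362) = 4.53 is the value reported for response instructions
    # on the ease-of-cheating item.
    eta = partial_eta_squared(4.53, 1, 362)
    print(round(eta, 2), effect_size_label(eta))
```

For F(1, 362) = 4.53 the identity gives a value that rounds to .01, which agrees with the effect size reported for that test.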
ANOVAs were used to examine subgroup differences (based on gender, ethnic background and first-generation university status) on the applicant perception items. Pu-GPA ≥ 7.5 status was again included as a control variable. In addition, the demographic variables were investigated in relation to the SJT design features by examining if the subgroup variables had an interaction effect with either the response instructions or the response format. Partial eta-squared was used to examine the effect size.

Applicant perception items
The alpha coefficients of the two process favourability items (i.e. perceived predictive validity and perceived fairness) indicated sufficient to good internal consistency (should do-rating: α = .66, should do-pick-one: α = .75, would do-rating: α = .72, would do-pick-one: α = .90). The intercorrelations between the process favourability score (i.e. average of the two process favourability items) and the other applicant perception items are depicted in Table 1. Intercorrelations were controlled for pu-GPA ≥ 7.5 status (i.e. directly admitted). All intercorrelations were statistically significant. The correlations between process favourability and the procedural justice items were all above .6 (large effect size). As expected, the ease-of-cheating item correlated significantly and negatively with process favourability, but the effect size was smaller (r = −.20).
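For readers unfamiliar with the alpha coefficient, the following minimal Python sketch shows the usual computation, α = k/(k − 1) × (1 − Σ item variances / variance of the summed score). The function name is hypothetical and the two short score lists are invented toy data on a seven-point scale; they are not the study's responses.

```python
from statistics import variance


def cronbach_alpha(items: list[list[float]]) -> float:
    """Cronbach's alpha for k items, each a list of scores from the same respondents."""
    k = len(items)
    if k < 2:
        raise ValueError("At least two items are required")
    # Summed score per respondent across all items.
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


if __name__ == "__main__":
    # Invented toy data for two items (e.g. predictive validity, fairness).
    predictive_validity = [1, 2, 3, 4, 5]
    fairness = [2, 1, 4, 3, 5]
    print(round(cronbach_alpha([predictive_validity, fairness]), 2))
```

With only two items, alpha depends entirely on the correlation between them, which is one reason values can vary noticeably between the four SJT versions.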

Preliminary analysis: Comparison to other admission methods
Prior to the main analyses, we compared the overall process favourability of the SJT to the other admission methods included in the online survey, in order to determine if the SJT was perceived more or less positively than the other admission methods (Table 2). Repeated-measures ANOVAs were used to examine the differences in process favourability between the SJT and each of the other admission methods. We controlled for pu-GPA ≥ 7.5 status by including it as a between-subjects factor. The average process favourability rating (on a seven-point scale) ranged between 3.21 (unweighted lottery) and 5.29 (interview). The SJT was judged significantly more favourable than pu-GPA (F(1, … CV was judged as equally favourable as the SJT. Thus, among the other admission methods included in the online survey, the SJT takes a middle position with respect to overall process favourability.

Response instructions and format
Applicant perceptions of the four SJT versions are depicted in Fig. 1. The means and standard deviations corresponding to Fig. 1 can be found in Additional file 2. A significant main effect of response format was found on the applicant differentiation item (F(1, 362) … Overall, an SJT with a rating response format was rated more favourably than an SJT with a pick-one format on applicant differentiation, study relatedness, chance to perform and ease of cheating. Thus, the rating format was, in contrast to our hypothesis, judged more favourable than the pick-one format. Finally, response instructions had a significant main effect on the ease-of-cheating item (F(1, 362) = 4.53, p = .034, η² = .01) with the would-do instructions (M = 5.33, SD = 1.79) judged as easier to cheat than the should-do instructions (M = 4.92, SD = 1.86). With regard to our hypothesis, no differences between should-do and would-do instructions were found for the overall process favourability, but should-do instructions were judged as more difficult to cheat than would-do instructions. Two-way ANOVAs revealed no significant interaction effects between response instructions and response format.

Subgroup differences
The demographic subgroup differences in applicant perceptions are shown in Table 3. No significant main effects were found for gender, ethnic background or first-generation university status on the judgements of process favourability, the procedural justice factors and ease of cheating. However, significant interaction effects between subgroup and either response instructions or response format were found. Demographic subgroup differences for the four separate SJT versions are depicted in Additional file 2. Gender and response format had a significant interaction effect on the applicant differentiation item (F(1, 362) = 4.80, p = .029, η² = .01) and the study-relatedness item (F(1, 362) = 7.64, p = .006, η² = .02). The more positive judgement of the rating format than the pick-one format was stronger for men than for women on the applicant differentiation item … 040, η² = .01). First-generation university students judged an SJT with a rating format more favourably than an SJT with a pick-one format on process favourability (d = 0.45), the face validity item (d = 0.51) and the applicant differentiation item (d = 0.42). In contrast, for non-first-generation university students, both response formats were judged similarly on process favourability (d = 0.05), the face validity item (d = −0.15) and the applicant differentiation item (d = 0.13). Thus, in line with our hypotheses, subgroups based on gender and ethnic background did not significantly differ in their applicant perceptions of an SJT. Regarding our research question, we found no significant difference in applicant perceptions between the subgroups based on first-generation university status. Nonetheless, the findings do indicate that subgroups might differ in their preference for certain SJT design features.
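The d values above are standardised mean differences. The text does not state which formula was used, so the Python sketch below assumes Cohen's d with a pooled standard deviation; the function name is hypothetical. As a worked example it uses the ease-of-cheating means and standard deviations reported for would-do and should-do instructions, with placeholder group sizes of 100 each because the per-condition sample sizes are not given.

```python
from math import sqrt


def cohens_d(mean_1: float, sd_1: float, n_1: int,
             mean_2: float, sd_2: float, n_2: int) -> float:
    """Cohen's d using the pooled standard deviation of two independent groups."""
    pooled_variance = ((n_1 - 1) * sd_1 ** 2 + (n_2 - 1) * sd_2 ** 2) / (n_1 + n_2 - 2)
    return (mean_1 - mean_2) / sqrt(pooled_variance)


if __name__ == "__main__":
    # Would-do (M = 5.33, SD = 1.79) versus should-do (M = 4.92, SD = 1.86).
    # The group sizes of 100 are placeholders, not values from the study.
    print(round(cohens_d(5.33, 1.79, 100, 4.92, 1.86, 100), 2))
```

With equal group sizes the pooled variance is simply the mean of the two variances, so the placeholder sizes affect the result only insofar as the real groups were unequal.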

Discussion
The present study indicates that response format and, to a lesser extent, response instructions influence applicants' perceptions of an SJT. The results show that asking applicants to rate each separate response option leads to more favourable perceptions of an SJT than asking applicants to pick one of the responses as the best option. Additionally, when instructed to respond according to what they would actually do in the described situation, applicants perceive an SJT as easier to cheat than when instructed to respond according to what should be done in the described situation. The applicant subgroups based on gender, ethnic background or first-generation university status were comparable regarding their perceptions of the SJT. However, our results do show that applicants from a non-Western ethnic background hold more positive perceptions of an SJT with should-do instructions than of an SJT with would-do instructions. In contrast, applicants from a Western ethnic background appear to be more positive about an SJT with would-do instructions than an SJT with should-do instructions. Additionally, men and first-generation university students perceive an SJT with a rating response format more favourably than an SJT with a pick-one response format.
Response instructions had a significant influence on the perceived ease of cheating, indicating that should-do instructions are not only statistically less susceptible to faking [38,39], but are also perceived as more difficult to fake than would-do instructions. Previous research has shown that applicants' perceptions of a test do not always correspond to the actual psychometric qualities of that test [40]. For example, Chan [23] found that personality tests were perceived as more predictive than cognitive ability tests, whereas empirical studies show that cognitive ability tests are more predictive than personality tests. Apparently, ease of cheating is more obvious to applicants than the predictive value of a test and might therefore provide a more effective means to enhance applicant perceptions. Response instructions had no significant effect on the overall process favourability of the SJT. Nevertheless, the negative association between process favourability and ease of cheating indicates that applicant perceptions may be enhanced by reducing the SJT's susceptibility to faking.
In contrast to our hypothesis, a rating response format was perceived more positively than a pick-one response format on three of the procedural justice factors and ease of cheating. We had expected applicants to be more positive on pick-one formats because applicants are more familiar with this response format in traditional multiple-choice admission tests [28,41] and because rating formats are commonly used by easier-to-fake self-report measures. However, the results of this study indicate that applicants perceive rating formats as a better measure to differentiate between applicants, as more strongly related to medical school, as a better means to show skills and abilities and as more difficult to cheat than pick-one formats. Possible explanations for this finding are that rating formats provide applicants the possibility to give more nuanced responses and allow applicants to give a rating of all response options. The challenging situations described in SJTs may be solved using multiple approaches, causing pick-one formats to be perceived as unrealistic [42]. Response formats that allow for more nuanced answers might better fit the dilemma-like nature of SJTs. Likewise, medical students preferred an SJT with a short-answer-question format over an SJT with a single-best-answer format [21]. Contrary to our expectations, the rating format was not judged as easier to cheat than the pick-one format. Apparently, when used in SJTs, rating formats are not associated with the negative characteristics of self-report measures in a selection context. More favourable perceptions of the rating format are desirable as previous research has demonstrated that rating formats are superior to other response formats on a variety of psychometric outcomes [43].
Applicant perceptions did not differ across subgroups based on gender, ethnic background and first-generation university status. The absence of subgroup differences is in line with findings of previous studies [29,33,40]. Nevertheless, the significant interaction effects do indicate that certain subgroups might have more pronounced preferences for certain SJT design features. Specifically, men seem to perceive rating formats more positively than pick-one formats regarding applicant differentiation and study relatedness. Prior research on cognitive ability tests showed that open-ended response formats resulted in fewer gender differences in test performance than multiple-choice response formats [44]. Arthur et al. [43] found that the gender difference in an SJT score was larger for a ranking format than for a rating format and a most/least-effective format. This interaction effect between gender and response format on test performance might translate into a gender-response format interaction on applicant perceptions. More research is required to unravel this interaction effect.
Non-Western ethnic minority applicants appear to be more positive on should-do than would-do instructions. Although a previous study demonstrated that the administration method (paper-and-pencil vs. video-based) affected the Black-White difference in applicants' perceptions of an SJT [6], this is the first study showing that response instructions might also affect ethnic differences in applicant perceptions of an SJT. McDaniel et al. [30] demonstrated that SJTs with knowledge instructions (i.e. should do) had higher correlations with cognitive ability tests, whereas SJTs with behavioural tendency instructions (i.e. would do) had higher correlations with personality. Applicants from a non-Western background might feel that knowledge-based tests are more face valid and more strongly related to medical school than personality-based tests and therefore perceive should-do instructions more favourably. Another possible explanation for more positive perceptions of should-do instructions among non-Western ethnic minority applicants might be found in differences between individualistic and collectivistic cultures [45]. We presume that non-Western minority applicants may have a stronger collectivistic cultural orientation than majority applicants and might therefore be more comfortable judging the SJT response options according to the group norms instead of according to their own individual norms [46]. Additionally, results seem to indicate that Western ethnic minority applicants are more favourable of would-do than should-do instructions. However, the sample size of the Western minority applicant group was very small, making it difficult to draw strong conclusions from this finding. Future research is necessary to replicate these findings and to examine potential explanations.
First-generation university students perceive rating formats more positively than pick-one formats. It appears that applicants from a low socio-economic background have a stronger preference for response formats that permit more nuanced responses than applicants from a high socio-economic background. A possible explanation might be that applicants whose parents did not attend university have more negative test-taking attitudes on traditional formats of testing. SJTs with pick-one formats might be more strongly associated with traditional tests and therefore receive more negative perceptions. Nevertheless, prior research on demographic differences in applicant perceptions has mainly focused on gender and ethnic background. Thus, future research should take into account socioeconomic background when examining subgroup differences in applicant perceptions and should examine why first-generation university students are more favourable of rating formats.

Practical implications
Our findings present two practical implications for medical school admission committees which use an SJT and are concerned with the applicant perceptions of that SJT. Firstly, using should-do instructions as opposed to would-do instructions increases the SJT's favourability among ethnic minority applicants. Secondly, men and first-generation university students perceived an SJT with a rating format more positively than an SJT with a pick-one format. Moreover, applicant perceptions did not differ between the two response instructions and the two response formats for the majority applicants. Therefore, using these SJT design features to positively influence applicant perceptions among minority applicants does not lead to unfavourable perceptions among majority applicants.

Limitations and directions for future research
Although applicant perceptions in this study are solely based on a short description and two example items of an SJT, minor changes in the example items led to significant differences in applicant perceptions. Nonetheless, future research should assess the applicants' perspective after completing a full version of an SJT, preferably one that is used for the actual selection into medical school, to obtain a more thorough picture of the influence of changing the SJT design features on applicant perceptions.
Prior research has indicated that applicant perceptions may influence applicant behaviour (e.g. applicant withdrawal, recommendations to others) [7,47]. However, the present study is limited to examining the influence of SJT design features on applicant perceptions. The behavioural consequences of positive or negative applicant perceptions of an SJT need to be addressed in future research.
In general, the average judgements on the applicant perception items were situated close to the midpoints of the rating scales. Additionally, the SJT was judged significantly less favourable than five of the ten other admission methods included in the online survey (i.e. motivation letter, cognitive capacity test, skills test, curriculum sample test and interview). Even though this study demonstrated that changing the design features of an SJT may enhance applicant perceptions, future research is advised to examine the influence of other SJT characteristics that may positively affect perceptions of SJTs.
Finally, perceptions of procedural justice are not only determined by the formal characteristics of the admission procedure, but also by the treatment of applicants and the explanations of admission procedures and decisions (i.e. interactional justice) [27]. Enhancing applicants' perceptions of an SJT must be accompanied by devoting attention to these other aspects of the medical school admission procedure.

Conclusions
The applicant's perspective on the use of SJTs in medical school admission procedures should not be underestimated, because applicant perceptions might influence test-taking motivation, test performance and applicant withdrawal. The current study demonstrated that changing the response format of an SJT may positively affect applicant perceptions through advancing the procedural justice factors of applicant differentiation, study relatedness and chance to perform and by reducing the perceived ease of cheating. Additionally, applicant perceptions may be altered by using response instructions that are less susceptible to faking. Finally, this study indicated that certain design features may lead to more favourable perceptions of an SJT among minority applicants, presenting another potential measure for promoting widening access to medical school.