Validity of very short answer versus single best answer questions for undergraduate assessment
BMC Medical Education volume 16, Article number: 266 (2016)
Single Best Answer (SBA) questions are widely used in undergraduate and postgraduate medical examinations. Selection of the correct answer in SBA questions may be subject to cueing and therefore might not test the student’s knowledge. In contrast to this artificial construct, doctors are ultimately required to perform in a real-life setting that does not offer a list of choices. This professional competence can be tested using Short Answer Questions (SAQs), where the student writes the correct answer without prompting from the question. However, SAQs cannot easily be machine marked and are therefore not feasible as an instrument for testing a representative sample of the curriculum for a large number of candidates. We hypothesised that a novel assessment instrument consisting of very short answer (VSA) questions would be a better test of knowledge than assessment by SBA.
We conducted a prospective pilot study on one cohort of 266 medical students sitting a formative examination. All students were assessed by both a novel assessment instrument consisting of VSAs and by SBA questions. Both instruments tested the same knowledge base. Using the filter function of Microsoft Excel, the range of answers provided for each VSA question was reviewed and correct answers accepted in less than two minutes. Examination results were compared between the two methods of assessment.
Students scored more highly in all fifteen SBA questions than in the VSA question format, despite both examinations requiring the same knowledge base.
Valid assessment of undergraduate and postgraduate knowledge can be improved by the use of VSA questions. Such an approach will test nascent physician ability rather than ability to pass exams.
Single Best Answer (SBA) questions are widely used in both undergraduate and postgraduate medical examinations. The typical format is a question stem describing a clinical vignette, followed by a lead in question about the described scenario such as the likely diagnosis or the next step in the management plan. The candidate is presented with a list of possible responses and asked to choose the single best answer.
SBA questions have become increasingly popular because they can test a wide range of topics with high reliability and are the ideal format for machine marking. They also have a definitive correct answer which is therefore not subject to interpretation on the part of the examiner.
However, the extent to which SBAs measure what they are intended to measure, that is their ‘validity,’ is subject to some debate. Identified shortcomings of SBAs include the notion that clinical medicine is often nuanced, making a single best answer inherently flawed. For example, we teach our students to form a differential diagnosis, but the ability to do this cannot, by the very nature of SBA questions, be assessed by this form of testing. Secondly, at the end of the history and physical examination, the doctor has to formulate a diagnosis and management plan based on information gathered rather than a ‘set of options’. Furthermore, the ability of an SBA to accurately test knowledge is affected by the quality of the wrong options (‘distractors’). Identifying four plausible distractors for SBAs is not always easy. This can be particularly challenging when assessing fundamental or core knowledge. If one or more options are implausible, the likelihood of students choosing the correct answer by chance increases. Thus the distractors themselves may enable students to get the correct answer without actually knowing the fact in question because candidates may arrive at the correct answer simply by eliminating all the other options.
Lastly, and perhaps most importantly, cueing is inherent to such a mode of assessment and with enough exam practice, trigger words or other recognised signposts contained within the question mean that ultimately, what is tested, is the candidate’s ability to pass exams rather than their vocational progress. It is therefore possible for candidates to get the answer correct even if their knowledge base is inadequate.
Within our own medical school, students in every year group undergo both formative and summative assessment by SBA testing. To address the shortcomings that we have identified in this model, we have developed a novel, machine-marked examination in which students give a very short answer (VSA), typically consisting of three words or fewer. We hypothesised that VSAs would prove a better test of inherent knowledge than SBA assessment.
In a prospective pilot study, the performance of one cohort of 266 medical students in a formative examination was compared when the same knowledge was tested using either SBA or a novel, machine-marked VSA assessment. All students sat an online examination using a tablet computer. Questions were posed using a cloud-based tool and students provided answers on a tablet or smart phone. In the examination, 15 questions in the form of a clinical vignette (Table 1) were asked twice (Additional file 1: Supplementary methods).
The first time the questions were asked, candidates were asked to type their response as a VSA, typically one to three words, to offer a diagnosis or a step in the management plan. The second time the questions were asked in an SBA format. We compared medical students’ performance between the questions in this format and the VSA format. In order to avoid the effect of cueing in subsequent questions, the exam was set in such a way that students could not return to previously answered questions.
SBA responses were machine marked. The VSA data was exported into a Microsoft Excel spreadsheet. Using the ‘filter’ function in Microsoft Excel, we reviewed the range of answers for each question and assigned marks to acceptable answers. In this way, minor mis-spellings or alternative correct spellings could be rapidly marked as correct (Fig. 1). When several students wrote the same answer, this only appeared as one entry within the filter function to be marked. As a consequence, the maximum time spent on marking each question for 266 students was two minutes.
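The marking step described above relies on the fact that identical answers collapse into a single entry, so a reviewer judges each distinct answer only once. The following is a minimal Python sketch of that idea, not the authors' Excel workflow: the function names, the normalisation rule (trimming, lower-casing, collapsing spaces) and the sample responses are all illustrative assumptions.

```python
from collections import Counter


def normalise(answer: str) -> str:
    """Trim, lower-case and collapse internal whitespace (an assumed rule)."""
    return " ".join(answer.strip().lower().split())


def unique_answers(responses):
    """Count each distinct normalised answer, like a spreadsheet filter list."""
    return Counter(normalise(r) for r in responses)


def mark(responses, accepted):
    """Return True/False per response, given the reviewer's accepted answers."""
    accepted_norm = {normalise(a) for a in accepted}
    return [normalise(r) in accepted_norm for r in responses]


# Hypothetical responses to one VSA question (not data from the study).
responses = ["Ketones", "ketones ", "capillary ketones", "blood glucose", "Ketons"]

# Step 1: the reviewer sees each distinct answer once.
counts = unique_answers(responses)

# Step 2: the reviewer decides which distinct answers earn the mark,
# including an accepted mis-spelling.
accepted = {"ketones", "capillary ketones", "ketons"}

# Step 3: the decision is applied to every candidate.
marks = mark(responses, accepted)
```

Because the reviewer's effort scales with the number of distinct answers rather than the number of candidates, the time per question stays short even for a large cohort.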
The McNemar’s test was used to compare the students’ responses to VSA and SBA questions.
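McNemar's test suits this design because each student answers the same question in both formats, so the responses are paired. Only the discordant pairs (correct in one format, incorrect in the other) carry information. The sketch below implements the exact (binomial) form of the test in plain Python. The article does not state which variant was used, and the counts in the example are illustrative assumptions rather than the study's data.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value.

    b: pairs correct in format A only (e.g. VSA correct, SBA incorrect)
    c: pairs correct in format B only (e.g. SBA correct, VSA incorrect)
    Under the null hypothesis each discordant pair is equally likely to
    fall either way, so min(b, c) follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical discordant counts for one question (assumed values):
# 2 students correct as VSA only, 113 correct as SBA only.
p_value = mcnemar_exact(2, 113)
significant = p_value < 0.01
```

Students who were correct in both formats, or incorrect in both, do not enter the calculation, which is why the test compares formats rather than overall ability.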
There was a statistically significant difference in the proportions of correct/incorrect answers to the VSA and SBA formats in all 15 questions (p < 0.01). Figure 2 shows the numbers of students who got each question correct, either as a VSA or an SBA. In all questions, more students got the correct answer when given a list to choose from in the SBA format than when asked to type words in the VSA examination. For example, when asked about the abnormality on a blood film from a patient with haemolytic uraemic syndrome, only 141 students offered the correct VSA. However, a further 113 students who did not give the correct VSA selected it in the SBA version of the question.
The high scoring of the SBA questions in comparison to the VSA format is a cause for concern for those involved in the training of fit for purpose doctors of tomorrow. Despite the same knowledge being tested, the ability of some students to score in the SBA but not in the VSA examination demonstrates that reliance on assessment by SBA can foster the learning of association and superficial understanding to pass exams.
Assessment is well known to drive learning, but it only drives learning that will improve performance in that type of assessment. Studying exam technique rather than the subject is a well-known phenomenon amongst candidates. In the past 20 years there has been an emphasis on the use of tools such as the SBA that offer high reliability in medical school assessment. The replacement of marking of written exams with machine marked SBAs has resulted in students engaging in extensive and strategic practice in that exam technique. Students who practise large numbers of past questions can become adept at choosing the correct option from a list without an in-depth understanding of the topic. While practising exam questions can increase knowledge, the use of cues to exclude distractors is an important skill in exam technique with SBAs. This technique improves performance in the assessment, but does not enhance the student’s ability to make a diagnosis in clinical situations. Students who choose the correct option in SBAs may be unable to answer a question on the same topic when asked verbally by a teacher who does not present them with options. In clinical practice a patient will certainly not provide a list of possible options. We are thus sacrificing validity for reliability.
One of the key competencies for junior doctors is the ability to recall the correct diagnosis or test in a range of clinical scenarios. In a question about a patient with suspected diabetic ketoacidosis (DKA), only 42 students offered to test capillary or urinary ketones, whereas when the same question was posed in the SBA format, another 165 students chose the correct answer. We expect our junior doctors to recall this important immediate step in the management of an unwell patient with suspected DKA. Our findings therefore suggest that assessment by VSA questions may offer added value in testing this competency.
An ideal assessment will encourage deep learning rather than recognition of the most plausible answer from a list of options. Indeed, tests that require students to construct an answer appear to produce better memory than tests that require students to recognise an answer.
In contrast to SBA, our pilot study has demonstrated that to correctly answer a VSA, students need to be able to generate the piece of knowledge in the absence of cues, an approach that is more representative of real-life medical practice.
Our increasing reliance on assessment by SBA is partly an issue of marking manpower. SBAs can be marked by a machine, making them a highly efficient way to generate a score for a large number of medical students. Any other form of written assessment requires a significant investment of time by faculty members to read and mark the examination. However, our novel machine-marked VSA tested knowledge and understanding, yet each question could still be marked in two minutes or less for the entire cohort. Limiting answers to three words or fewer allowed for a stringent marking scheme, thus eliminating the inter-marker subjectivity that can be a problem in other forms of free-text examination.
It must be emphasised that there is no single ideal mode of assessment and that the best way to achieve better reliability and validity is by broad sampling of the curriculum using a variety of assessment methods. This is a pilot study and further research should evaluate the reliability, validity, educational impact and acceptability of the VSA question format.
This pilot study highlights the need to develop machine marked assessment tools that will test learning and understanding rather than exam technique proficiency. This study suggests that students are less likely to generate a correct answer when asked to articulate a response to a clinical vignette than when they have to pick an answer from a list of options. This therefore raises the possibility that the VSA is a valid test of a student’s knowledge and correlation of VSA marks with other modes of assessment should be investigated in future research. Future examinations may be enhanced by the introduction of VSAs, which could add an important dimension to assessments in clinical medicine.
SAQs: Short answer questions
SBA: Single best answer
VSA: Very short answer
Wass V, Van der Vleuten C, Shatzer J, Jones R. Assessment of clinical competence. Lancet. 2001;357:945–9.
McCoubrie P. Improving the fairness of multiple-choice questions: a literature review. Med Teach. 2004;26:709–12.
van der Vleuten CP, Schuwirth LW. Assessing professional competence: from methods to programmes. Med Educ. 2005;39:309–17.
Wood T. Assessment not only drives learning, it may also help learning. Med Educ. 2009;43:5–6.
Availability of data and materials
The dataset can be requested from the corresponding author on reasonable request.
AHS, SH, JH and KM designed the work, conducted the study, analysed and interpreted the data and wrote the manuscript. AHS, SH, JH and KM read and approved the final manuscript.
AHS is Head of Curriculum and Assessment Development in the School of Medicine, Imperial College London. SH is an NIHR Clinical Lecturer at Imperial College London. JH is Director of Curriculum and Assessment in the School of Medicine, Imperial College London. KM is the Director of Teaching in the School of Medicine, Imperial College London.
The authors declare that they have no competing interests.
Consent for publication
Ethics approval and consent to participate
The study was reviewed by the Medical Education Ethics Committee (MEEC) at Imperial College London and was categorised as teaching evaluation, thus not requiring formal ethical approval. MEEC also deemed that the study did not require consent.
Supplementary Methods. Questions used in Single Best Answer (SBA) and Very Short Answer (VSA) formats. (DOCX 22 kb)