|Type of validity evidence (and description)|Examples of validity evidence found in medical education|
|---|---|
|Content validity includes the outline and plan for the test. The principal question to ask is whether the content of the test is sufficiently similar to, and representative of, the activity or performance it is intended to measure.|The outlines, subject matter domains, and plan for the test as described in the test ‘blueprint.’|
| |Mapping the test content to curriculum specifications and defined learning outcomes.|
| |The quality of the test questions and the methods of development and review used to ensure quality.|
| |Expert input and judgements, and how these are used to judge the representativeness of the content against the performance it is intended to measure.|
|Response process is concerned with how all the participants (candidates and officials) respond to the assessment. It is part of the quality control process.|The clarity of the pre-test information given to candidates.|
| |The processes of test administration, scoring, and quality control.|
| |The guidelines for scoring and administration.|
| |The performance of judges and observers.|
| |Quality control and accuracy of final marks, scores, and grades.|
|Internal structure is concerned with whether the assessment is structured in such a way as to make it reliable, reproducible, and generalizable, and whether any aspects of its structure might induce bias.|The statistical or psychometric characteristics of the test, such as:|
| |• Item performance, e.g. difficulty|
| |• Factor structure and internal consistency of subscales|
| |• Relationship between different parts of the test|
| |• Overall reliability and generalizability|
| |Matters relating to bias and fairness.|
|Relationship to other variables is concerned with the connection between test scores and external variables.|The correlation or relationship of test scores to external variables, such as:|
| |• Scores in similar assessments, with which we might expect to find a strong positive correlation.|
| |• Scores in related but dissimilar assessments, e.g. a knowledge test and an Objective Structured Clinical Examination (OSCE), where weaker correlations might be expected.|
| |Generalizability of evidence and limitations such as study design, range restriction, and sample bias.|
|Consequences or evidence of impact is concerned with the intended or unintended consequences an assessment may have on participants or wider society. It may include whether assessments provide tangible benefits or whether they have an adverse or undesirable impact.|The intended or unintended consequences of the assessment on participants (such as failure) or wider societal impacts. This includes the wider impact of the assessment when viewed as an intervention.|
| |The methods used to establish pass/fail scores.|
| |False positives and false negatives.|
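
As a rough illustration of two of the psychometric characteristics listed under internal structure, the sketch below computes item difficulty (the proportion of candidates answering each item correctly) and Cronbach's alpha, one common index of internal consistency. The function names and the toy score matrix are hypothetical, and the sketch assumes dichotomously scored items (1 = correct, 0 = incorrect); it is not drawn from any particular assessment.

```python
from statistics import variance


def item_difficulty(scores):
    """Proportion of candidates answering each item correctly.

    `scores` is a list of candidate rows, each a list of 0/1 item scores
    (a hypothetical layout chosen for this sketch).
    """
    n_candidates = len(scores)
    n_items = len(scores[0])
    return [
        sum(row[i] for row in scores) / n_candidates
        for i in range(n_items)
    ]


def cronbach_alpha(scores):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of totals)."""
    k = len(scores[0])
    # Sample variance of each item across candidates.
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of candidates' total scores.
    total_variance = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


if __name__ == "__main__":
    # Toy data (assumed for illustration): 5 candidates x 4 items.
    toy_scores = [
        [1, 1, 1, 1],
        [1, 1, 1, 0],
        [1, 1, 0, 0],
        [1, 0, 0, 0],
        [0, 0, 0, 0],
    ]
    print("Item difficulty:", item_difficulty(toy_scores))
    print("Cronbach's alpha: %.2f" % cronbach_alpha(toy_scores))
```

For this toy data the items range from easy (80% correct) to hard (20% correct) and alpha comes out at about 0.80. In practice such figures are only one strand of internal structure evidence and would be read alongside factor structure, generalizability, and checks for bias.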