- Research article
- Open Access
Reliability of Multiple Mini-Interviews and traditional interviews within and between institutions: a study of five California medical schools
BMC Medical Education volume 17, Article number: 190 (2017)
Many medical schools use Multiple Mini-Interviews (MMIs) rather than traditional interviews (TIs) in admissions, partly because MMIs are thought to be more reliable. Yet prior studies examined single-school samples of candidates completing either an MMI or TI (not both). Using data from five California public medical schools, the authors examined the within- and between-school reliabilities of TIs and MMIs.
The analyses included applicants interviewing at ≥1 of the five schools during 2011–2013. Three schools employed TIs (TI1, TI2, TI3) and two employed MMIs (MMI1, MMI2). Mixed linear models accounting for nesting of observations within applicants examined standardized TI and MMI scores (mean = 0, SD = 1), adjusting for applicant socio-demographics, academic metrics, year, number of interviews, and interview date.
A total of 4993 individuals (completing 7516 interviews [TI = 4137, MMI = 3379]) interviewed at ≥1 school; 428 (14.5%) interviewed at both MMI schools and 687 (20.2%) at more than one TI school. Within schools, inter-interviewer consistency was generally qualitatively lower for TI1, TI2, and TI3 (Pearson’s r 0.07, 0.29, and 0.44; Cronbach’s α 0.13, 0.40, and 0.61, respectively) than for MMI1 and MMI2 (Cronbach’s α 0.68 and 0.60, respectively). Between schools, the adjusted intraclass correlation coefficient was 0.27 (95% CI 0.20–0.35) for TIs and 0.47 (95% CI 0.41–0.54) for MMIs.
Within and between-school reliability was qualitatively higher for MMIs than for TIs. Nonetheless, TI reliabilities were higher than anticipated from prior literature, suggesting TIs may not need to be abandoned on reliability grounds if other factors favor their use.
Unstructured or minimally structured one-on-one traditional interviews (TIs) have long been employed in medical school admissions [1]. A number of reports have raised the concern that low inter-interviewer reliability (i.e., consistency) may limit the ability of TIs to distinguish applicants likely to succeed in training [2, 3]. However, findings of studies examining this issue are mixed, with wide ranges of observed consistency between interview scores (e.g., Pearson’s r correlations 0.22–0.97; generalizability [G] coefficients 0.27–0.58; kappas 0.13–0.70) [1, 2, 4–10].
Partly due to concerns about inter-interviewer reliability, many schools have replaced TIs with Multiple Mini-Interviews (MMIs), in which applicants work through a series of brief, semi-structured assessment stations, each attended by a different trained rater [3, 11]. Single-school studies examining the MMI in isolation suggest the approach yields moderate to high inter-rater reliability (range of Cronbach’s alphas reported 0.65–0.98; range of G coefficients reported 0.55–0.72), and predicts aspects of subsequent academic performance [3, 12–16].
Based on the foregoing studies, some authors have concluded that MMIs have superior inter-rater reliability as compared with TIs [2–6, 12, 17]. However, prior MMI (and TI) studies have been conducted at single institutions, each employing only one of these interview types. While valuable, such studies have relatively small sample sizes, since at any given school most applicants are not selected for an interview, reducing generalizability. Studies pooling interview data from multiple schools with partially overlapping applicant pools, each inviting a different (though again partially overlapping) subset of applicants to interview, would have larger and more representative samples. Moreover, single-school interview studies have limited utility in comparing the relative reliabilities of MMIs and TIs, due to fundamental differences in designs, analytic approaches, and time frames among studies. Importantly, no studies have concurrently tested whether inter-rater reliability is higher for MMIs than for TIs by examining a common pool of applicants completing both interview types. Furthermore, no studies have examined the between-school reliabilities of MMIs or TIs. As key differences in MMI (and TI) implementation exist among schools, high between-school reliability of the MMI and TI cannot be assumed.
Using data from the five California Longitudinal Evaluation of Admission Practices (CA-LEAP) consortium medical schools, we examined the within- and between-school reliabilities of MMIs and TIs.
We conducted the study activities from July 2014–April 2016. We obtained ethics approval from the institutional review boards of the participating schools via the University of California Reliance Registry (protocol #683). Because of the nature of the study, neither interviewer nor interviewee consent to participate was required.
Participants were individuals who, during three consecutive application cycles (2011–2013), completed one or more medical school program interviews at CA-LEAP schools. The five CA-LEAP schools, all public institutions, participate in a consortium to evaluate medical school interview processes and outcomes.
Two schools (MMI1 and MMI2) used MMIs, with 10 and 7 individually scored 10-min stations, respectively, generally adapted from commercially marketed content. At both schools, all stations were multidimensional. Interpersonal communication ability was considered at every station, along with one or more additional competencies (e.g., integrity/ethics, professionalism, diversity/cultural awareness, teamwork, ability to handle stress, problem solving), rated using a structured rating form. At both MMI schools, stations were attended by one rater, except for a single station at MMI2 (two raters). At both schools, raters included physician and basic science faculty and alumni, medical students, and high-level administrative staff. At MMI1, raters also included nurses, patients, lawyers, and other community members. Raters at both schools received 60 min of training before each application cycle; MMI2 raters also received a 30-min re-orientation prior to each MMI circuit. The raters were not given any information about applicants. They interacted directly with applicants at some stations, and observed applicant interactions (e.g., with actors) at others. Raters at both schools assigned a single global performance score (with higher scores indicating better performance), though the scales employed differed between schools (0–3 points at MMI1, 1–7 points at MMI2).
Three schools (TI1, TI2, and TI3) used TIs. At each school, applicants completed two 30–60 min unstructured interviews, one with a faculty member and one with a medical student or faculty member. All interviewers received 60 min of training before each application cycle. At TI1 and TI2, interviewers reviewed the candidate’s application prior to the interview, although academic metrics were redacted at school TI1. TI3 interviewers reviewed the candidate’s application only after submitting their interview ratings. All interviewers rated applicants on standardized scales, though the rating approaches and scales employed differed among schools. At schools TI1 and TI3, interviewers assigned a single global interview rating, though the scales employed differed (exceptional, above average, average, below average, or unacceptable at TI1; unreserved enthusiasm, moderate enthusiasm, or substantial reservations at TI3). At school TI2, interviewers rated candidates on a 1–5 point scale in four separate domains (thinking/knowledge, communication/behavior, energy/initiative, and empathy/compassion), and the domain scores were then summed to yield a total interview score (range 4–20).
The total interview scores were the means of individual station (MMI) or interview (TI) scores, converted to z-scores (mean = 0, standard deviation = 1) based on all scores within a given school and year. Applicant characteristics included age; sex; race/ethnicity category; self-designated disadvantaged (DA) status (yes/no); cumulative grade point average (GPA); and total Medical College Admissions Test (MCAT) score.
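The within-school, within-year standardization described above can be sketched as follows (an illustrative Python sketch; the study's analyses were conducted in Stata, and the sample scores are hypothetical):

```python
# Convert raw total interview scores to z-scores (mean 0, SD 1) within a
# given school-year cohort, as described in the Methods. Illustrative only.
from statistics import mean, stdev

def standardize(scores):
    """Return z-scores for a school-year cohort of raw total scores."""
    m, s = mean(scores), stdev(scores)
    return [(x - m) / s for x in scores]

# Hypothetical raw total scores for one school-year cohort:
raw = [14.0, 16.5, 12.0, 18.0, 15.5]
z = standardize(raw)   # mean of z is 0, SD is 1
```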
Analyses were conducted using Stata (version 14.2, StataCorp, College Station, TX). For the 2012 and 2013 application cycles, the analyses include data from all five schools. For 2011, TI3 provided no data. We first conducted analyses of inter-interviewer (for TIs) or inter-rater (for MMIs) reliability within each institution. For each of the two MMI schools, we examined the internal consistencies of MMI station scores with Cronbach’s α. For each of the three TI schools, we examined both the correlations of TI scores with Pearson’s r, and the internal consistencies of TI scores with Cronbach’s α (the latter reported to facilitate comparisons with the two MMI schools) [20, 21].
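For reference, Cronbach's α for a set of station or interviewer scores can be computed directly from item and total variances (a minimal Python sketch, not the study's Stata code; the input data are hypothetical):

```python
# Cronbach's alpha: rows are applicants, columns are stations (MMI) or
# interviewers (TI). alpha = k/(k-1) * (1 - sum of item variances / total variance).
from statistics import variance

def cronbach_alpha(rows):
    k = len(rows[0])                  # number of items (stations or raters)
    items = list(zip(*rows))          # transpose: one tuple of scores per item
    item_var = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - item_var / total_var)
```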
Next, we examined the pairwise Pearson correlations among interview scores obtained by applicants who interviewed at more than one school, TI and/or MMI.
Finally, we conducted analyses examining the intraclass correlation coefficients (ICCs) observed between MMI schools and among TI schools. All applicants who interviewed at one or more TI schools contributed to the TI ICC analyses, and all applicants who interviewed at one or more MMI schools contributed to the MMI ICC analyses. For both MMI and TI analyses, we developed mixed linear models with applicants as random effects to derive the ICCs for interview z-scores at TI and MMI schools. Both the TI and MMI analyses were conducted with and without adjusting for the following (potentially confounding) fixed effects: applicant characteristics (socio-demographics, DA status, and academic metrics), number of interviews, number of prior interviews, interview date within interview season, and interview year. In each case the ICC of interest was the variance component associated with the random effect (applicant) divided by the total variance. The use of mixed models allowed adjustment for the nesting of observations (applicant interviews) within applicants, for those with more than one interview, while simultaneously permitting examination of the consistency of performance among the three TI schools and between the two MMI schools (the ICCs).
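The ICC described above reduces to the between-applicant variance component over the total variance from the mixed model. As an arithmetic sketch (illustrative; the models themselves were fit in Stata, and the component values below are implied rather than reported):

```python
# ICC = variance attributable to applicants / total variance. Because the
# interview scores are z-scores (total variance near 1), an ICC of 0.47
# implies applicant and residual components of roughly 0.47 and 0.53.
def icc(var_applicant, var_residual):
    """Share of total score variance attributable to applicants."""
    return var_applicant / (var_applicant + var_residual)
```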
There were 4993 individuals with at least one interview at a CA-LEAP school during the study period; their socio-demographics and academic metrics are shown in Table 1. Of these, 3226 (65%), 1180 (24%), 439 (8.8%), 127 (2.5%), and 21 (0.4%) interviewed at one, two, three, four, or all five schools, respectively; 428 (14.5%) interviewed at both MMI schools; 687 (20.2%) interviewed at more than one TI school; and 119 (2.4%) interviewed in more than one year.
The 4993 distinct individuals in the study completed a total of 7516 interviews (4137 TIs and 3379 MMIs); Table 2 shows socio-demographics and academic metrics by interview type. As compared with individuals completing TIs, those completing MMIs were statistically significantly more likely to be from a racial/ethnic minority group, to self-designate as disadvantaged, and to have a lower cumulative GPA and total MCAT score.
Within schools, correlations between interviewer ratings were generally qualitatively lower for TI1 (r 0.07, α 0.13), TI2 (r 0.29, α 0.40), and TI3 (r 0.44, α 0.61) than for MMI1 and MMI2 (α 0.68 and 0.60, respectively). Between-school z-score correlations varied considerably (r range 0.18–0.48), with the highest correlation observed between MMI1 and MMI2 (Table 3).
In an unadjusted analysis, the ICC was higher for MMI schools (0.47, 95% CI 0.40–0.54) than for TI schools (0.30, 95% CI 0.24–0.37). After adjustment for applicant characteristics, application year, and the number and temporal sequencing of interviews, the ICCs were similar to the unadjusted values, though qualitatively lower for TI schools: 0.27 (95% CI 0.20–0.35) for TI schools and 0.47 (95% CI 0.41–0.54) for MMI schools.
To our knowledge, the current study was the first to concurrently examine the within- and between-school reliabilities of unstructured TIs and of MMIs in a common pool of applicants to multiple medical schools. As such, our findings expand substantively on those of prior studies of admissions interviews, all conducted at single schools, which had smaller and less representative samples and examined only the within-school (but not the between-school) reliabilities of TIs or MMIs (but not both).
We generally found qualitatively higher within-school and between-school reliabilities for MMIs than for TIs. This is reassuring, since one goal of the MMI approach is to increase the reliability of the medical school interview process and, potentially, its predictive validity. Similar ICCs were observed using unadjusted and adjusted mixed models for both MMIs and TIs, indicating little influence of applicant socio-demographics and metrics, prior interview experience, or interview timing on the reliability of either interview approach. The adjusted analyses were important to conduct given statistically significant differences in socio-demographics and academic metrics between MMI and TI participants (Table 2), likely reflecting differing missions and priorities across CA-LEAP schools.
We observed qualitatively lower internal consistency for MMI2 (α 0.60) than for MMI1 (α 0.68). Prior single-school studies have found that increasing the number of MMI stations tends to enhance reliability [12, 24, 25]. Thus, this finding likely reflects the use of only seven stations at MMI2 versus ten at MMI1, and underscores the need for schools adopting an MMI to carefully consider this design choice.
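The expected effect of station count on reliability can be illustrated with the Spearman-Brown prophecy formula [20, 21] (a back-of-the-envelope sketch, not an analysis from the study):

```python
# Spearman-Brown prophecy: projected reliability when a test is lengthened
# (or shortened) by factor n = k_new / k_old. Applied to MMI2's observed
# alpha of 0.60 with 7 stations, lengthening to 10 stations projects an
# alpha of about 0.68 -- close to MMI1's observed value with 10 stations.
def spearman_brown(alpha, k_old, k_new):
    n = k_new / k_old
    return (n * alpha) / (1 + (n - 1) * alpha)

projected = spearman_brown(0.60, 7, 10)   # roughly 0.68
```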
Despite the qualitatively superior between-school reliability of the MMI in our study, the between-school TI reliabilities were better than we had anticipated based on prevailing views [2–6, 12, 17]. These findings suggest that the low inter-interviewer reliability observed for TIs in some (but not all) prior single-school studies may reflect school-specific differences (e.g., interviewer training, degree of process standardization) rather than limitations inherent to the TI approach. In particular, the qualitatively lower between-school reliability of the TI may reflect intentional differences among schools in their goals, a distinction that might be easier to achieve with unstructured TIs than with the more standardized MMI approach. Abandoning TIs on the grounds of qualitatively lower reliability may therefore not be advisable, particularly since limited research suggests that the reliability of traditional interviews (within and between schools) might be improved through relatively minor process enhancements. These may include, but are not limited to, increased standardization of interview questions and greater efforts to calibrate interviewers (e.g., by providing sample answers for evaluating applicant responses and, within schools, affording opportunities for discussion among interviewers) [1, 26]. Nonetheless, we emphasize that the foregoing comments are speculative and best viewed as hypotheses to be tested in further multi-school studies.
A key strength of our multi-institutional study was the large sample of applicants to five public medical schools in California (one of the most socio-demographically diverse states). Our study also had limitations. The extent to which the findings apply to non-CA-LEAP schools is uncertain. From a strict measurement perspective, our assessments of reliability were not pure measures of inter-rater agreement, since each interview (two at each TI school, 10 stations at MMI1, and 7 stations at MMI2) involved a different rater assessing a different encounter, conflating rater and occasion variance. We focused on the within- and between-school reliabilities of TIs and MMIs and did not address how differences in TI and MMI reliability may affect their predictive validity; that is, their association with future clinical rotation performance, licensing examination scores, and other relevant outcomes. We anticipate that future CA-LEAP studies will address this important issue. As others have observed, current evidence for the predictive validity of the MMI stems from single-school studies (all conducted outside the U.S.). Such studies are limited by the lack of concurrent examination of TI validity and by the relatively small proportion of interviewees who matriculate at any given school. By comparison, in a multi-school consortium pool of interviewees, a relatively higher proportion would be expected to matriculate at one of the schools, permitting a more robust examination of MMI predictive validity and concurrent comparison with TI predictive validity.
In conclusion, in analyses of data from a common pool of applicants to five California medical schools, we found qualitatively higher within- and between-school reliabilities for MMIs than for TIs. Nonetheless, the within- and between-school reliabilities of TIs were generally higher than anticipated based on prior literature, suggesting that TIs need not necessarily be abandoned on reliability grounds, especially if other factors favor their use at a particular institution.
- CA-LEAP: California Longitudinal Evaluation of Admission Practices
- GPA: grade point average
- ICC: intraclass correlation coefficient
- MCAT: Medical College Admissions Test
- MMI1 and MMI2: study schools using an MMI
- TI1, TI2, and TI3: study schools using TIs
1. Edwards JC, Johnson EK, Molidor JB. The interview in the admission process. Acad Med. 1990;65:167–77.
2. Kreiter CD, Yin P, Solow C, Brennan RL. Investigating the reliability of the medical school admissions interview. Adv Health Sci Educ Theory Pract. 2004;9:147–59.
3. Eva KW, Rosenfeld J, Reiter HI, Norman GR. An admissions OSCE: the multiple mini-interview. Med Educ. 2004;38:314–26.
4. Albanese MA, Snow MH, Skochelak SE, Huggett KN, Farrell PM. Assessing personal qualities in medical school admissions. Acad Med. 2003;78:313–21.
5. Siu E, Reiter HI. Overview: what's worked and what hasn’t as a guide towards predictive admissions tool development. Adv Health Sci Educ Theory Pract. 2009;14:759–75.
6. Salvatori P. Reliability and validity of admissions tools used to select students for the health professions. Adv Health Sci Educ Theory Pract. 2001;6:159–75.
7. Axelson R, Kreiter C, Ferguson K, Solow C, Huebner K. Medical school preadmission interviews: are structured interviews more reliable than unstructured interviews? Teach Learn Med. 2010;22:241–5.
8. Lumb AB, Homer M, Miller A. Equity in interviews: do personal characteristics impact on admission interview scores? Med Educ. 2010;44:1077–83.
9. Patterson F, Knight A, Dowell J, Nicholson S, Cousans F, Cleland J. How effective are selection methods in medical education? A systematic review. Med Educ. 2016;50:36–60.
10. Harasym PH, Woloschuk W, Mandin H, Brundin-Mather R. Reliability and validity of interviewers’ judgments of medical school candidates. Acad Med. 1996;71(1 Suppl):S40–2.
11. Glazer G, Startsman LF, Bankston K, Michaels J, Danek JC, Fair M. How many schools adopt interviews during the student admission process across the health professions in the United States of America? J Educ Eval Health Prof. 2016;13:12.
12. Rees EL, Hawarden AW, Dent G, Hays R, Bates J, Hassell AB. Evidence regarding the utility of multiple mini-interview (MMI) for selection to undergraduate health programs: a BEME systematic review: BEME guide no. 37. Med Teach. 2016;38:443–55.
13. Eva KW, Reiter HI, Rosenfeld J, Trinh K, Wood TJ, Norman GR. Association between a medical school admission process using the multiple mini-interview and national licensing examination scores. JAMA. 2012;308:2233–40.
14. Eva KW, Reiter HI, Rosenfeld J, Norman GR. The relationship between interviewers’ characteristics and ratings assigned during a multiple mini-interview. Acad Med. 2004;79:602–9.
15. Dowell J, Lynch B, Till H, Kumwenda B, Husbands A. The multiple mini-interview in the U.K. context: 3 years of experience at Dundee. Med Teach. 2012;34:297–304.
16. Pau A, Jeevaratnam K, Chen YS, Fall AA, Khoo C, Nadarajah VD. The multiple mini-interview (MMI) for student selection in health professions training - a systematic review. Med Teach. 2013;35:1027–41.
17. Norman G, Eva K, Kulasegaram M. The possible impact of the MMI and GPA on diversity. Acad Med. 2013;88:151.
18. Knorr M, Hissbach J. Multiple mini-interviews: same concept, different approaches. Med Educ. 2014;48:1157–75.
19. Advanced Psychometrics for Transitions Inc. Welcome to ProFitHR. 2016. http://www.profithr.com/. Accessed 3 Mar 2017.
20. de Vet HCW, Mokkink LB, Mosmuller DG, Terwee CB. Spearman-Brown prophecy formula and Cronbach’s alpha: different faces of reliability and opportunities for new applications. J Clin Epidemiol. 2017;85:45–9.
21. Warrens MJ. Transforming intraclass correlation coefficients with the Spearman-Brown formula. J Clin Epidemiol. 2017;85:14–6.
22. Cronbach LJ. My current thoughts on coefficient alpha and successor procedures. Educ Psychol Meas. 2004;64:391–418.
23. Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psychol Bull. 1979;86:420–8.
24. Dodson M, Crotty B, Prideaux D, Carne R, Ward A, de Leeuw E. The multiple mini-interview: how long is long enough? Med Educ. 2009;43:168–74.
25. Hissbach JC, Sehner S, Harendza S, Hampe W. Cutting costs of multiple mini-interviews - changes in reliability and efficiency of the Hamburg medical school admission test between two applications. BMC Med Educ. 2014;14:54.
26. Levashina J, Hartwell CJ, Morgeson FP, Campion MA. The structured employment interview: narrative and quantitative review of the research literature. Pers Psychol. 2014;67:241–93.
The authors wish to thank the following individuals for their invaluable administrative support contributions throughout the study: Hallen Chung; Kiran Mahajan; Eileen Munoz-Perez; Melissa Sullivan; and Sarika Thakur.
This work was supported by the Stemmler Medical Education Research Fund, National Board of Medical Examiners (NBME); and by the Health Resources and Services Administration (HRSA) of the U.S. Department of Health and Human Services (HHS) under grant number UH1HP29965, Academic Units for Primary Care Training and Enhancement, for $3,741,116. This information or content and conclusions are those of the author and should not be construed as the official position or policy of, nor should any endorsements be inferred by NBME, HRSA, HHS or the U.S. Government. The funders had no role in the design of the study or collection, analysis, and interpretation of data, and no role in writing the manuscript.
Availability of data and materials
The datasets used during the current study are available from the corresponding author on reasonable request.
The study findings were presented in part at the North American Primary Care Research Group 2016 Annual Meeting, November 12–16, 2016, Colorado Springs, Colorado.
Ethics approval and consent to participate
On May 19, 2014, the authors obtained ethical approval to conduct the study from the institutional review boards of the participating schools, via the University of California Reliance Registry (protocol #683). Because of the nature of the study, neither interviewer nor interviewee consent to participate was required.
Competing interests
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Jerant, A., Henderson, M.C., Griffin, E. et al. Reliability of Multiple Mini-Interviews and traditional interviews within and between institutions: a study of five California medical schools. BMC Med Educ 17, 190 (2017). https://doi.org/10.1186/s12909-017-1030-0
Keywords
- Interview as topic
- Multiple mini-interview
- Reproducibility of results
- School admission criteria
- Schools, medical