A psychometric evaluation of the Gender Bias in Medical Education Scale

Background Gender bias within medical education is gaining increasing attention. However, valid and reliable measures are needed to adequately address and monitor this issue. This research conducts a psychometric evaluation of a short multidimensional scale that assesses medical students’ awareness of gender bias, beliefs that gender bias should be addressed, and experience of gender bias during medical education.
Methods Using students from the University of Wollongong, one pilot study and two empirical studies were conducted. The pilot study was used to scope the domain space (n = 28). This initial measure was extended to develop the Gender Bias in Medical Education Scale (GBMES). For Study 1 (n = 172), confirmatory factor analysis assessed the construct validity of the three-factor structure (awareness, beliefs, experience) and enabled deletion of redundant items. Study 2 (n = 457) tested the generalizability of the refined scale to a new sample. Combining Study 1 and 2, invariance testing for program of study and gender was explored. The relationship of the GBMES to demographic and gender politics variables was tested. The results were analyzed in R using confirmatory factor analysis and Multiple-Indicator-Multiple-Cause (MIMIC) models.
Results After analysis of the responses from the original 16-item GBMES (Study 1), a shortened measure of ten items fitted the data well (RMSEA = .063; CFI = .965; TLI = .951; Mean R-square of items = 58.6 %; reliability: .720–.910) and was found to generalize to a new sample in Study 2 (RMSEA = .068; CFI = .952; TLI = .933; Mean R-square of items = 55.9 %; reliability: .711–.892). The GBMES was found to be invariant across studies, gender, and program of study. Female students and those who supported gender equality had greater agreement for each of the factors. Likewise, postgraduate students reported higher scores on experience of gender bias than undergraduate students.
Conclusion The GBMES provides a validated short multidimensional measure for use in research and policy. Given its good reliability across different target populations and its concise length, the GBMES has much potential for application in research and education to assess students’ attitudes towards gender bias.
Electronic supplementary material The online version of this article (doi:10.1186/s12909-016-0774-2) contains supplementary material, which is available to authorized users.


Background
Within the field of medicine, gender has been shown to be an ongoing factor contributing to health disparity [1][2][3]. Gender bias in medicine commonly occurs through the unequal treatment and diagnosis of a patient based on their sex and/or gender [4][5][6]. Stereotypical assumptions about gender, as well as a lack of research and knowledge about sex-based differences, are also forms of gender bias that can negatively affect the medical diagnosis, treatment, and management of patients [7][8][9][10].
Numerous experts have argued that gender bias can be prevented in healthcare by incorporating gender issues into medical education [5,6,11]. However, in order to examine these issues, researchers and practitioners need access to an assessment tool that can be used to obtain quantifiable information on the extent and nature of students' relationship with gender bias in medical education. To our knowledge there is currently no psychometrically validated, multidimensional scale that addresses this area of research interest. The closest measure is the Dutch Nijmegen Gender Awareness in Medicine Scale (N-GAMS) [12], which does not target issues related to medical education. To date, the most similar quantitative research is that of Morgan et al. [13], who included a few Likert and open response type items in their research on sexism in anatomy education. Their work was an important inspiration to the current research. However, given that their purpose was not to develop a quantitative scale, their work did not hypothesize an underlying theoretical structure for the items, the items did not form a scale, and there was no evaluation of the psychometric properties of the items used. To address this need, a short multidimensional measure of gender bias was developed which could be used with medical students, and which covers participants' awareness of gender bias, beliefs about how gender bias should be addressed, and experience of gender bias in a medical education context. We used Morgan et al.'s [13] items as inspiration and a starting point to develop a fully articulated multidimensional instrument. We suggest that research into gender bias in medical education must consider not only students' awareness of gender bias, but also their experiences of gender bias, and their beliefs about how gender bias should be addressed.
Within this study, awareness is defined as a cognitive acknowledgement that gender bias exists; belief contains an attitudinal component that reflects a participant's desire that gender bias be challenged or confronted; and experience includes being confronted with gender bias on either a first or second hand basis. Previous studies have aimed to measure student awareness of, or attitudes toward, gender in a medical context [12][13][14]. However, only one study [13] explored issues related to students' beliefs about how gender issues should be addressed, while no study measured students' direct and indirect experience of gender bias. While increased gender awareness among students has been shown to improve health equality in some instances [12], Hamberg [5] notes, "more knowledge does not eradicate the problem of knowledge-mediated bias or bias owing to notions and stereotyped ideas about men and women" (p. 241). Thus, students need to be aware not only of gender issues in their education, but also of the existence and outcomes of gender bias more broadly in medical practice. Further, an awareness of gender issues does not necessarily lead to a belief that action should be taken to prevent gender bias. Therefore, alongside ascertaining student beliefs about gender bias in medical education, it is important to also assess their beliefs about whether the risks of gender bias should be addressed in this context. Indeed, research has shown that the beliefs and attitudes of medical students, teachers, and practitioners can impact their behavior [15][16][17][18]. For this reason, attitudinal outcome measures are an important tool for examining medical student beliefs about if and how gender bias should be addressed during medical education.
Lastly, while the effects of experiencing gender discrimination have been explored previously, these studies have focused on direct experiences of discrimination rather than on the broader experience of whether this exists in a medical education environment both directly and indirectly [19][20][21]. Importantly, research suggests that simply observing instances of discrimination can have a negative impact on psychological wellbeing, not only among marginalized groups, but also among non-target groups (e.g., negative effects on males observing sexism against females) [22]. Therefore, a general measure of experience of gender bias, covering both direct and indirect forms, is needed. Taken together, a measure of gender bias in medical education needs to address the degree to which students perceive medicine as male dominated, the degree to which they believe gender bias should be directly addressed in medical education, and their own experience of gender bias while undertaking medical coursework.

Current study
Investigation into the existence and effects of gender bias is important for all facets of medical culture. Research with medical professionals and patients has begun to explore this issue and yet the significance of gender bias within medical education appears to be relatively underdeveloped. This is despite its critical role in the formation of attitudes and behaviors. Specifically, there is a paucity of measures that assess medical student reports of gender bias in their education, particularly ones that are multidimensional in nature. A lack of literature in this area highlights the need to explore medical students' awareness, beliefs, and experiences of gender-related bias in their education. Correspondingly, this research aimed to develop and test a multidimensional measure designed to assess these three domains of awareness, belief, and experience in relation to gender bias. This measure was designed to be small and easily administered so that it could easily fit within larger research projects and be used by medical education programs. The aim of the study was to conduct a psychometric evaluation on a short, multidimensional measure of gender bias.

Methods
The scale developed in this paper was designed for students studying anatomy as part of their university level education in medicine or allied health. The current research undertook a psychometric evaluation of the scale using a combination of exploratory and confirmatory factor analysis. The research consisted of one pilot study, then an initial validation (Study 1) and a replication study (Study 2). The pilot, validation, and replication studies were all done with independent, cross-sectional samples of the same population of students in NSW, Australia. The samples from Studies 1 and 2 were then combined to undertake invariance testing and to explore predictors of the scale factors. Invariance testing aimed to explore whether the scale operated in a similar manner for important groups (e.g., males and females).
A construct validity perspective was used by exploring both within- and between-network validity [23]. Within-network validity reflects the degree to which the hypothetical structure underlying the measure is reflected in the data collected. This was assessed with reference to the fit of the model (i.e., does the theoretical model fit the data), invariance tests (i.e., does the model act similarly across gender and program of study as expected), and exploration of the latent correlation matrix (i.e., is there sufficient evidence that the factors are measuring related but distinct factors as would be expected by a multi-dimensional instrument). Between-network validity considered the degree to which the factors in the scale are related to other factors in expected directions. Here we considered gender and gender politics.

Participants
All data was collected from students studying anatomy at the University of Wollongong over a 2-year period, after ethics approval was obtained (University of Wollongong Human Research Ethics Committee, HE14/130). A pilot study to initially test and validate the measure was undertaken with a sample of 28 students from a third year undergraduate health science class in June 2014. The students completed the pilot instrument along with some demographic items during class time via pen-and-paper.
Two empirical studies (Studies 1 and 2) were then conducted with larger and more diverse populations that included both undergraduate health science students and postgraduate medical students to ensure that the measure was relevant and invariant across the entirety of medical and pre-medical education. Study 1 aimed to shorten the pilot instrument from 16 items to a target of 8 to 12 items, and consisted of a sample of 172 participants. Participants received an email link to the study via their university email. Study 1 targeted both undergraduate and post-graduate students. Participants completed the measure in their own time during September 2014.
The aim of Study 2 was to validate the short measure and consisted of a sample of 457 participants. Students enrolled in the first year anatomy course and in graduate-entry medicine at the University of Wollongong were invited to participate. Participants took part in the study during lab sessions from the period of April to May in 2015.

Materials
The pilot study questionnaire was developed based, in part, on the work by Morgan et al. [13]. The items were used to explore the domain space and tapped content related to the factors defined in the literature review: awareness of gender bias in medical education, belief in the need to address gender bias in medical education, and the experience of gender bias during medical education. For each item, respondents selected a response from a 6-point Likert scale with options of strongly disagree (1), disagree (2), somewhat disagree (3), somewhat agree (4), agree (5), and strongly agree (6).

Scale development
We requested access to the questionnaire developed by Morgan et al. [13] as it was the only previous study to explore gender bias in medical education from students' perspectives. Within the questionnaire, six relevant themes were identified that were re-worked to facilitate a consistent attitudinal response format (i.e., agree-disagree Likert items; see below for details). Following the pilot study, item feedback from participants was provided. Students were instructed to highlight any words they did not understand and make notes on the page as to their thought processes while completing the items. The questions were then adjusted as a result. Finally, further items were developed in order to cover a greater breadth of the medical education domain space. As such, while the developed scale is clearly in line with many of the themes of the Morgan et al. study, all items in the scale were uniquely derived for this research.

Data analysis
The factor structure of the full 16-item model was examined in Study 1 and then confirmed in Study 2 using confirmatory factor analysis. As a brief measure was desired, we followed the procedure outlined by Marsh et al. [23] for identifying candidate items for deletion. Specifically, candidate items for deletion were those with low loadings on the target factor, modification indices that suggested large loadings on non-target factors, and large residuals or modification indices that suggested correlated item residuals. The goal of using this procedure was to obtain an instrument that was short, provided a good fit to the data, and had factors that were distinguishable. Confirmatory factor analysis was also used in the two empirical studies to ensure construct validity and, via Study 2, to confirm that the shortened scale generalized to other samples. In all cases, data analysis was conducted in R [24] with major analysis undertaken using the lavaan package [25].
Following typical guidelines [26,27], models were considered to fit the data well if: (a) the solution was well-defined, (b) parameter estimates were consistent with the theory proposed, and (c) the fit indices were acceptable, with an emphasis on those fit indices that did not favor small sample sizes. We thus report multiple indices in addition to the model chi-square because of its sensitivity to sample size. Based on commonly accepted criteria, Tucker-Lewis Index (TLI) ≥ .90, Comparative Fit Index (CFI) ≥ .90, and RMSEA < .08 were considered to provide evidence of model fit. Reliability of the factors was determined directly from the relevant CFA model using McDonald's Omega (see Endnote 1). Unlike Cronbach's Alpha, Omega represents a true greatest lower bound on reliability [28].
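The fit cut-offs and the omega computation described above can be sketched in a few lines. This is an illustration only, not the authors' R/lavaan code: the `FitIndices` container, the function names, and the example loadings are hypothetical, and the omega function assumes standardized loadings from a single factor with uncorrelated residuals. The fit values are the Study 1 and Study 2 indices reported in the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FitIndices:
    """Container for the three fit indices used here (hypothetical helper)."""
    cfi: float
    tli: float
    rmsea: float


def acceptable_fit(fit, min_cfi=0.90, min_tli=0.90, max_rmsea=0.08):
    """Apply the stated cut-offs: CFI >= .90, TLI >= .90, RMSEA < .08."""
    return fit.cfi >= min_cfi and fit.tli >= min_tli and fit.rmsea < max_rmsea


def mcdonald_omega(std_loadings):
    """Omega for one factor, assuming standardized loadings and uncorrelated
    residuals, so that each residual variance equals 1 - loading**2."""
    common = sum(std_loadings) ** 2
    residual = sum(1.0 - loading ** 2 for loading in std_loadings)
    return common / (common + residual)


# Fit indices for the 10-item model as reported in the abstract.
study1 = FitIndices(cfi=0.965, tli=0.951, rmsea=0.063)
study2 = FitIndices(cfi=0.952, tli=0.933, rmsea=0.068)
print(acceptable_fit(study1), acceptable_fit(study2))

# Illustrative loadings only; these are NOT the estimated GBMES loadings.
print(round(mcdonald_omega([0.7, 0.7, 0.7]), 3))
```

In practice these quantities would be read from the fitted CFA model rather than typed in by hand; the sketch only makes the decision rule and the reliability formula explicit.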
In the final step of the analysis, we combined Studies 1 and 2 to explore invariance across study, gender, and program type (i.e., undergraduate or postgraduate). Invariance analysis fits and then compares measurement models in different sub-populations in order to ascertain whether the measurement structure is equivalent, that is, whether the measure performs similarly in different groups [29,30]. Evidence of invariance comes from comparing a well-fitting baseline model to alternate nested models. In such cases the sensitivity of the chi-square to sample size affects not only model fit but also chi-square difference tests between nested models. Thus, we used the criteria proposed by Cheung and Rensvold [31] and Chen [32], who suggested that invariance assumptions are supported when the difference between nested models corresponds to a ΔCFI ≤ .01 (we utilize the same criteria for the TLI) and a ΔRMSEA ≤ .015.
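The change-in-fit rule above can be expressed as a small comparison function. The sketch below is illustrative: the `Fit` tuple, the function names, and all numeric values are hypothetical and are not taken from the paper's tables. It also assumes that each model is compared with the preceding, less constrained model in the sequence, which the text does not specify.

```python
from typing import NamedTuple


class Fit(NamedTuple):
    """Fit indices for one model (hypothetical helper type)."""
    cfi: float
    tli: float
    rmsea: float


EPS = 1e-9  # tolerance so a difference of exactly .010 is not lost to float rounding


def invariance_supported(less_constrained, more_constrained,
                         max_d_cfi=0.01, max_d_tli=0.01, max_d_rmsea=0.015):
    """True when adding constraints worsens fit by no more than the cut-offs."""
    d_cfi = less_constrained.cfi - more_constrained.cfi        # drop in CFI
    d_tli = less_constrained.tli - more_constrained.tli        # drop in TLI
    d_rmsea = more_constrained.rmsea - less_constrained.rmsea  # rise in RMSEA
    return (d_cfi <= max_d_cfi + EPS
            and d_tli <= max_d_tli + EPS
            and d_rmsea <= max_d_rmsea + EPS)


def highest_supported_level(models):
    """models: list of (label, Fit) ordered from least to most constrained.
    Returns the label of the last model whose added constraints are supported;
    the first model is treated as the baseline."""
    supported = models[0][0]
    for (_, previous_fit), (label, fit) in zip(models, models[1:]):
        if not invariance_supported(previous_fit, fit):
            break
        supported = label
    return supported


# Made-up values for illustration; NOT the values reported in the paper.
example_models = [
    ("baseline", Fit(0.960, 0.945, 0.065)),
    ("loadings constrained", Fit(0.958, 0.947, 0.064)),
    ("loadings and intercepts constrained", Fit(0.953, 0.946, 0.064)),
    ("loadings, intercepts, and residuals constrained", Fit(0.930, 0.925, 0.075)),
]
print(highest_supported_level(example_models))
```

The small tolerance is a design choice: fit indices are reported to three decimals, and a reported difference of exactly .010 should count as meeting the criterion despite binary floating-point error.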

Results
Demographic information and response rates for the pilot, initial validation (Study 1), and replication (Study 2) studies can be found in Table 1.

Pilot study
Parallel analysis with exploratory factor analysis using data obtained from the pilot study was undertaken. This approach provided an analytical means (as opposed to a visual inspection of a scree plot of eigenvalues) of determining the appropriate number of factors in an exploratory factor analysis. In parallel analysis, eigenvalues from the observed data were compared to eigenvalues from random data of the same size. Factors that explained more variance than factors extracted from random data were retained [33]. For the pilot data, parallel analysis suggested three factors.
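The logic of parallel analysis described above can be sketched without any statistical library. The version below is a simplified illustration, not the procedure used in the study: it compares eigenvalues of the full correlation matrix (the principal-components variant), uses normally distributed random data, and retains factors whose eigenvalue exceeds the 95th percentile of the simulated eigenvalues. These choices, the function names, and the synthetic data generator are all assumptions.

```python
import math
import random


def correlation_matrix(rows):
    """Pearson correlation matrix for a list of equal-length observation rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    norms = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows)) for j in range(p)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows)
             / (norms[i] * norms[j]) for j in range(p)] for i in range(p)]


def symmetric_eigenvalues(matrix, sweeps=50, tol=1e-12):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations, largest first."""
    a = [row[:] for row in matrix]
    p = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(p) for j in range(p) if i != j)
        if off < tol:
            break
        for i in range(p - 1):
            for j in range(i + 1, p):
                if abs(a[i][j]) < 1e-15:
                    continue
                theta = 0.5 * math.atan2(2 * a[i][j], a[i][i] - a[j][j])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(p):  # rotate columns i and j
                    aki, akj = a[k][i], a[k][j]
                    a[k][i], a[k][j] = c * aki + s * akj, -s * aki + c * akj
                for k in range(p):  # rotate rows i and j
                    aik, ajk = a[i][k], a[j][k]
                    a[i][k], a[j][k] = c * aik + s * ajk, -s * aik + c * ajk
    return sorted((a[i][i] for i in range(p)), reverse=True)


def parallel_analysis(rows, n_sims=100, percentile=0.95, seed=0):
    """Return (number of factors retained, observed eigenvalues, thresholds)."""
    rng = random.Random(seed)
    n, p = len(rows), len(rows[0])
    observed = symmetric_eigenvalues(correlation_matrix(rows))
    sims = []
    for _ in range(n_sims):
        noise = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
        sims.append(symmetric_eigenvalues(correlation_matrix(noise)))
    thresholds = []
    for k in range(p):
        kth = sorted(sim[k] for sim in sims)
        thresholds.append(kth[min(n_sims - 1, int(percentile * n_sims))])
    retained = 0
    for obs, thr in zip(observed, thresholds):
        if obs <= thr:  # stop at the first factor that does not beat random data
            break
        retained += 1
    return retained, observed, thresholds


def simulate_items(n_rows, n_factors=3, items_per_factor=3, loading=0.8, seed=1):
    """Synthetic data with a known factor structure (for demonstration only)."""
    rng = random.Random(seed)
    resid_sd = math.sqrt(1.0 - loading ** 2)
    rows = []
    for _ in range(n_rows):
        factors = [rng.gauss(0, 1) for _ in range(n_factors)]
        rows.append([loading * f + resid_sd * rng.gauss(0, 1)
                     for f in factors for _ in range(items_per_factor)])
    return rows
```

Calling `parallel_analysis(simulate_items(200))` returns the number of factors retained together with the observed eigenvalues and the random-data thresholds they were compared against. With a sample as small as the pilot (n = 28), the thresholds from random data are comparatively high, so the result should be read as a guide to the number of factors rather than as a definitive test.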
Based on an Oblimin (oblique) rotation, factor one accounted for 22 % of the variance in the items and related to awareness of gender bias in medical education (hereafter "awareness"; loadings = .44-1.00). Factor two accounted for 25 % of the variance in the items and related to whether participants believed medical education should explicitly address issues of gender bias (hereafter "belief"; loadings = .62-1.02). Lastly, factor three accounted for 25 % of the variance and related to whether participants themselves had experienced gender bias (hereafter "experience"; loadings = .57-1.01). Importantly, these results were consistent with the hypothesized factor structure and the correlation between factors was moderate ranging from r = .39 for awareness and belief to r = .12 for awareness and experience.
Following from this pilot study, the questionnaire was refined to more adequately cover the scope of the three factors. Six items were developed for gender bias awareness, and five items were developed for each of gender bias belief and experience, making a total of 16 items. This measure is referred to as the Gender Bias in Medical Education Scale (GBMES). More items were developed than needed with the goal of creating a short scale with a minimum of three and a maximum of four items per scale. This was done with the aim to develop a well-validated short form with clearly distinct factors that could easily fit into future studies without undue burden on participants (final items can be found in Table 2 and the item pool can be found in Additional file 1).

Study 1
Study 1 aimed to explore the initial factor structure and develop a short measure of the GBMES scale. Data consisted of a sample of 172 participants (59 % female; mean age of 25; 64 % of whom were postgraduates). Submitting the hypothesized three-factor model to these data resulted in a poor fit (see Model 1, Table 3). Inspection of the factor loadings and modification indices suggested a number of poor loading items, redundant items (large correlated residual), and some items with moderate cross-loadings. Following the procedure for short form development by Marsh et al. [23], two items were removed from each of the scales. Following this, the revised 10-item measure (four awareness items, three belief items, and three experience items) was submitted to the data and displayed an excellent fit (Model 2, Table 3). Importantly, all factor loadings were above .50 and omega reliability estimates were acceptable for each of awareness (.720), belief (.879), and experience (.910). Furthermore, correlations between the three factors were moderate, indicating that they tapped related but different aspects of a common gender bias core. This reduced, 10-item measure represents the final scale (see Table 2 for items). However, it was possible that this reduced scale would not generalize to other samples, so this 10-item GBMES was therefore tested in a different student population.

Study 2
Study 2 consisted of data from a sample of 457 medical students (57 % female; mean age of 21; all undergraduates) approximately 6 months after Study 1. The short form was fitted to these data using CFA, and displayed a good fit (Model 3, Table 3). Again, loadings were all above .50, correlations between factors were moderate, and omega reliability estimates were acceptable (awareness: .720, belief: .880, and experience: .910).

Invariance tests across important groups
As a further test of the construct validity of the scale across studies, we combined the samples from Studies 1 and 2 to ensure adequate power and conducted invariance tests. Under the criteria for invariance noted above, there was evidence of configural (measurement structure equivalent across groups), weak (structure and factor loading equivalent across groups), strong (structure, factor loadings, and item intercept equivalent across groups), and strict (structure, factor loadings, item intercept, and item residuals equivalent across groups) factorial invariance across the two studies with little change in the fit indices. This suggested that the 10-item scale provided similar fit in both empirical studies. Evidence of invariance was likewise found for gender. For program of study, configural, weak, and strong invariance were supported but not strict invariance (Table 4). Taken together, there was evidence that the measurement structure of the 10-item GBMES operated in a similar manner across studies, genders, and programs of study when considering latent variables. However, for program of study the lack of invariant item residuals suggests that caution should be used when comparing across programs when using manifest variables [34]. The consistent support for strong measurement invariance is an important requirement for comparing latent means as it provides evidence that such tests are comparing common measures (i.e., measures that are interpreted in a similar manner by both groups) [30]. On this basis we considered latent mean invariance. As can be seen in Table 4, there was a large change in fit indices when we constrained means to be equivalent across groups, suggesting there were differences in latent means between Study 1 and 2, gender, and program of study (Table 5).
As can be seen from Table 6, females had statistically significantly higher means than males for each of awareness, beliefs, and experience of gender bias. There were fewer differences by program of study, but postgraduate students did have statistically significantly higher means on experience of gender bias in their medical education compared with undergraduate students. Interestingly, there was evidence of differences in latent means between the two empirical studies, with participants from Study 1 displaying statistically significantly higher means on all three factors.

Predictors of GBMES
We were also interested in whether student responses on the GBMES differed by their age or their general agreement with gender politics. Gender politics was measured using a single item "I am supportive of gender equality" measured on the same 6-point Likert scale as the GBMES (where high scores demonstrated greater agreement). As these were ordinal variables, typical multi-group invariance models were not possible. Instead, a Multiple-Indicator-Multiple-Cause (MIMIC) model was run to test whether awareness, beliefs, or experience were predicted by either sympathy with gender politics or age. Results showed that age had no effect on any of the factors. Yet sympathy with gender politics had a significant relationship with all factors, with greater agreement predicting higher scores on the awareness, belief, and experience scales.

Discussion
The aim of the current study was to develop a short multidimensional measure of gender bias in medical education. As a result of a pilot study and two empirical studies, we developed a final 10-item measure named the Gender Bias in Medical Education Scale (GBMES) that showed good construct validity and reliability. Relationships between the scales and demographic variables were consistent with expectations. Similar to findings by Morgan et al. [13], females and those who supported gender equality were more likely to be aware of gender bias in medical education and to believe that gender bias should be addressed during education. However, the current study also found that females were more likely to report experiences of gender bias. This is consistent with a number of studies that have found that female medical students experience gender discrimination and sexism during their education and training [20,35,36,37,38]. Again, those who reported greater sympathy with gender politics also reported a higher experience of gender bias. Likewise, students in postgraduate courses, who thus had a longer tenure in medical education in general, reported experiencing more gender bias. Importantly, age was not a significant predictor of any of the factors and thus differences in program likely reflect time in medical education rather than natural development in political views over the lifetime. This likely explains the finding that participants from Study 1 displayed higher scores on all three factors, since the majority of Study 1 participants were postgraduate students while Study 2 consisted mainly of undergraduate students in their first year of study. This study revealed that a number of anatomy students were aware of gender bias in medical education, believed it should be addressed during education, and had experienced gender bias themselves during their education.
These results highlight the fact that medical education provides a unique opportunity to influence future healthcare providers by educating students on issues of gender and gender bias [5,6,11]. Beyond introducing gender issues into the medical curriculum, existing gender bias also needs to be eliminated from educational material. Gender bias has been demonstrated in medical textbooks [13,39,40,41], medical curricula [42][43][44], and other educational tools and materials [4,45], and exposure to gender bias has been shown to negatively influence an individual's attitudes and decision-making [46][47][48]. Studies have also shown that medical educators often view gender issues as low priority topics in education [49][50][51]. The elimination of gender bias within medical education will provide students with fewer opportunities to adopt negative attitudes towards gender-related issues [5]. The GBMES is one way of monitoring students' current perspectives of gender bias in medical education materials in order to ascertain when and where intervention would be best suited. Further, it can be used to highlight the importance of gender issues to medical educators and encourage them to prioritize these issues.
Importantly, while research has identified the potentially dangerous implications of gender bias for patients, a highly gender-biased medical culture also has important implications for young physicians [19][20][21]. Research has shown that medical students can experience gender bias through discrimination and harassment [38,52,53,54] and that these experiences often have an impact on their career opportunities and expectations [19,20,21,52]. The GBMES provides a tool for much needed research on the extent to which students experience gender bias during medical education and the implications of this. Given the brevity of the scale, the GBMES can easily be incorporated into broader research projects, providing greater scope to consider the many predictors and outcomes that gender bias in medical education may have. Likewise, the GBMES could be used to monitor programs designed to address gender bias in medical education.

Limitations
The current research has many strengths; however, several limitations and avenues for future research need to be considered. First, while we considered the generalizability of the GBMES across gender and program of study, it should be noted that all three samples were taken from a single Australian university. As such it is critical that future research considers the degree to which the GBMES works with different samples from other institutions and countries. Second, the primary focus of the current research was on construct validity. While we did explore the effects of demographic data and sympathy with gender politics on the three factors of the GBMES, more research on convergent and divergent validity is required. In particular, future research could consider the degree to which responses on the GBMES reflect general political beliefs and experiences versus those specific to the medical context. Finally, although three unique samples were used to ensure the scale replicated, no longitudinal data was collected and thus the stability of the measure over time has not been estimated.

Conclusion
Analysis of two independent samples indicated that the GBMES provided a valid and reliable short multidimensional measure that was invariant across key demographics. The awareness, belief, and experience factors of the GBMES were distinct and related to gender, gender politics and years in education in expected directions. On this basis we suggest that the GBMES is an important tool for monitoring students' awareness, beliefs, and experiences of gender bias during medical education and as a means of evaluating efforts to improve the representation of gender in medical education.

Endnotes
1 There are quite serious issues with alpha, which suggests that alternatives should be considered [55]. It has been shown that McDonald's Omega gives a far better estimate of the lower bound of reliability and is a less biased estimate of the true reliability [28].