Scales to evaluate developmental stage and professional identity formation in medical students, residents, and experienced doctors

Background To respond to the growing need to cultivate professional identity in medical trainees, it is necessary to evaluate their professional identity formation (PIF), both to understand their current state of PIF and to apply this understanding to medical education. Previous qualitative studies indicated that Kegan’s human development model could explain medical trainees’ PIF. I proposed a development scale (DS) to quantitatively evaluate the degree of maturation and socialization as a physician; however, a single scale is not enough to illustrate the helical and complex process of development. Methods Using Kegan’s model as the conceptual framework, scales that evaluate attributes specific to stage 2, stage 3, stage 4, and higher stages were developed using data collected with a self-administered questionnaire (322 respondents), reliability analysis, group comparison, and analysis of individual DS scores. The respondents were 4th- and 6th-year medical students and 2nd-year residents at Kagoshima University, and experienced medical doctors (instructors). Results In addition to the DS, a single self-administered questionnaire consisting of 27 items for the stage 2, 3, 4, and higher stage-specific attribute scales was created. Among the respondent groups, students had the highest mean stage 2 scale score, and instructors had the highest mean stage 4 and higher-stage scale scores. Individual analysis indicated that each group contained respondents with varied attributes, that the average medical student might have the inclusion preference typically seen at stage 3, and that the average instructor might have the independence preference typically seen at stage 4 more than the inclusion preference. Conclusions Combining multiple stage-specific attribute scales with DS scores could quantify the complex and divergent processes of PIF. These scales could provide meaningful information about individuals, groups, and education in terms of professional development, information that differs from assessments of medical knowledge or professional skills.

Based on human development theories by Piaget, Kohlberg, Loevinger, Maslow, McClelland, Murray, Erikson, and others, Kegan proposed a lifelong developmental framework of the self as a moral and meaning-making entity [16]. Kegan described changes in an individual's sense of self, perspectives, values, emotional control, and reflection as part of the process of developing relationships with others and society. Kegan's model represents a 6-stage helical pathway of evolutionary truces (from stage 0 to 5) that people follow during their development, in a reciprocal psychological process of favoring inclusion with others and independence from them (Fig. 1).
Previous qualitative research indicated that Kegan's model [16] can explain the development process in dentistry [1], the military [17,18], and the legal profession [19], and that, theoretically, the model can also be applied to medical trainees [7,13]. Previous research also indicates that medical trainees are expected to be at stages 2 to 4 [4,7,13]. Based on these studies, a single scale (the Developing Scale, DS) to evaluate the overall degree of personal and professional development was developed [20]. This scale evaluates self-control as a professional, awareness of being a medical doctor, reflection as a medical doctor, execution of social responsibility, and external and internal self-harmonization.
Even though the DS could be a useful scale for evaluating PIF, a single scale cannot satisfactorily represent the helical and complex process of PIF; attributes that characterize people at different developmental phases should be evaluated independently when determining the PIF of individuals and target groups.
The purpose of this study was to develop scales to evaluate Kegan's stage-specific attributes, and attempt to reveal medical trainees' individual PIF, as well as group diversity.

Method
To illustrate the PIF of target individuals and groups, four scales covering different stages of Kegan's model were developed (Fig. 1). The following assumptions were made: evaluating attributes specific to Kegan's stages 2 to 5 would cover the entire process of PIF in medical trainees [4,7,13], and lower stage-specific attributes decrease while higher stage-specific attributes increase as PIF progresses. Stage-specific attribute scales (SASs) were developed simultaneously with the DS as follows: 1) an initial item pool covering attributes of Kegan's stages 2 to 5 was created, referencing previously reported manifestations of people at stages 2, 3, and 4 in the context of medical training and practice; 2) a pilot questionnaire of essential and common items with a medical context was created from items selected from the initial pool; 3) the pilot questionnaire was administered to medical students, residents, and experienced medical doctors; 4) respondent data from the pilot questionnaire were used to identify item sets for the proposed SASs using a reliability coefficient; and 5) mean scores of the proposed item sets were compared across respondent groups to confirm that the proposed SAS scores indicated development. Following this procedure, four SASs were developed.

Initial item development
To create items for the four SASs, descriptions of medical trainees' personal characteristics and behaviors or attitudes manifested in a professional context cited in previous studies [13,17] were used.
To assess stage 2-specific attitudes and behaviors (stage 2-specific attribute scale: SAS-2), items describing an individual who took into account the views of others but whose own needs and interests predominated, whose norms were external rules, whose self-reflection was low, and whose emotions could overwhelm reason were used. To assess the preference for inclusion typically seen at stage 3 (stage 3-specific attribute scale: SAS-3), items describing an individual who was able to view multiple perspectives simultaneously, could subordinate self-interest, and was concerned about how others perceived him/her were used. To assess the preference for independence typically seen at stage 4 (stage 4-specific attribute scale: SAS-4), items describing an individual who could assume a role and enter into relationships while assessing them in terms of self-authored principles and standards, and who could define him/herself independently of others, were used. To assess attributes expected at stage 4 or higher (stage 4 or higher-specific attribute scale: SAS-h), items describing an individual who clearly recognized professional roles, whose reason was in full control over needs, desires, and passion, and who did not perceive him/herself as having a single identity and was open to other influences were used.
After creating and rewriting the items, 31 items were selected for the next round of data collection. They consisted of 11 items for SAS-2, eight for SAS-3, five for SAS-4, and seven for SAS-h. Of these SAS candidate items, the 23 items for SAS-2, SAS-4, and SAS-h were also used as DS candidates. Fifteen of these (items 1-15) satisfied the DS criteria and were used in the DS [20].
The questionnaire was self-administered and anonymous. Each item was scored on a 7-point Likert scale ranging from 1 (completely inapplicable) to 7 (greatly applicable), with 4 as neutral. The questionnaire also asked about demographic characteristics (gender, age), as well as work experience and position for instructors.
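The item allocation above can be checked with simple arithmetic; the second line makes explicit that the 23 DS candidates are exactly the SAS-2, SAS-4, and SAS-h candidates, so none of the eight SAS-3 candidates entered the DS pool.

```latex
\begin{aligned}
11\,(\text{SAS-2}) + 8\,(\text{SAS-3}) + 5\,(\text{SAS-4}) + 7\,(\text{SAS-h}) &= 31 \ \text{candidate items} \\
11\,(\text{SAS-2}) + 5\,(\text{SAS-4}) + 7\,(\text{SAS-h}) &= 23 \ \text{DS candidate items}
\end{aligned}
```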

Data collection
From July 2016 to March 2018, the printed questionnaire was distributed by hand to 4th-year medical students about to start their clinical clerkship courses, 6th-year medical students who had completed the full 1.5 years of clinical clerkship courses, and residents in the last month of the 2-year residency program at Kagoshima University. The author did not have a direct relationship (i.e., as instructor or supervisor) with any of the medical students or residents participating in this study at the time of data collection. The questionnaire was also mailed to experienced medical doctors working in community hospitals or private clinics in Kagoshima Prefecture who were engaged in undergraduate medical education as senior instructors. These questionnaires were anonymous and were returned by post in January 2017 in the return envelope provided with the questionnaire.

Data analysis for scale development
To develop each SAS, the reliability (Cronbach's alpha) of candidate items was analyzed and reliable item sets were explored.
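The reliability step can be sketched as follows. Cronbach's alpha for k items is k/(k-1) multiplied by (1 minus the sum of item variances divided by the variance of the total score). The paper does not state how reliable item sets were explored (the analysis was run in SPSS), so the exhaustive subset search below, the helper name `best_item_subset`, and its `min_items` parameter are hypothetical illustrations, feasible only for small candidate pools such as the 5 to 11 items per scale used here.

```python
from itertools import combinations
from statistics import variance


def cronbach_alpha(responses: list[list[float]]) -> float:
    """Cronbach's alpha for a respondents x items matrix (rows = respondents)."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item (column) across respondents.
    item_vars = [variance(col) for col in zip(*responses)]
    # Sample variance of each respondent's total score over the k items.
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def best_item_subset(responses, min_items=3):
    """Hypothetical helper: try every item subset and keep the highest alpha.

    Returns (alpha, column_indices). Cost grows as 2**k, so this is only
    practical for small candidate pools.
    """
    k = len(responses[0])
    best = (float("-inf"), ())
    for size in range(min_items, k + 1):
        for cols in combinations(range(k), size):
            subset = [[row[c] for c in cols] for row in responses]
            best = max(best, (cronbach_alpha(subset), cols))
    return best
```

A common alternative to exhaustive search is iterative removal using "alpha if item deleted", which scales better to larger pools.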
After the item sets for all scales were fixed, the average SAS scores of the four respondent groups were used to confirm whether the lower and higher stage-specific attribute scales could differentiate between groups at different developmental levels. Furthermore, the information on medical trainees' PIF that the SASs could provide, such as the stages of the respondent groups and the diversity among groups, was analyzed.
SPSS version 23 (IBM, New York, NY) was used for all data analyses.

Demographic characteristics of the respondents
The same data as for the DS development were used [20]. Prior to the analysis, 14 respondents who chose option 4 (neutral) for 27 or more of the 31 items (87%) or for 23 consecutive items (74%) were excluded as providing invalid data. Data for a total of 322 respondents (response rate 53.7%), including 118 (response rate 47.8%) 4th-year medical students and 120 (response rate 51.5%) 6th-year medical students at Kagoshima University
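The exclusion rule can be expressed as a small filter. This is a sketch under stated assumptions: each response vector is a list of the 31 raw answers in questionnaire order, and "23 sequential items" is read as one unbroken run of neutral answers in that order. The function names are hypothetical.

```python
NEUTRAL = 4  # midpoint of the 1-7 Likert scale


def longest_run(values, target):
    """Length of the longest run of consecutive `target` values."""
    best = current = 0
    for v in values:
        current = current + 1 if v == target else 0
        best = max(best, current)
    return best


def is_invalid(responses, max_neutral=27, max_run=23):
    """Flag a response vector per the stated rule (assumed 31 items, in order).

    Invalid if the neutral option was chosen for at least `max_neutral`
    items overall, or for at least `max_run` consecutive items.
    """
    return (responses.count(NEUTRAL) >= max_neutral
            or longest_run(responses, NEUTRAL) >= max_run)
```

With 31 items, the thresholds correspond to 27/31 (about 87%) and 23/31 (about 74%), matching the percentages given in the text.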

Development of SASs
Using the items related to the key attributes of each stage, the item set with the highest Cronbach's alpha was explored for each SAS. Cronbach's alpha values for the proposed SAS-2, SAS-3, SAS-4, and SAS-h were 0.66 (11 items), 0.53 (six items), 0.61 (three items), and 0.63 (six items), respectively (Table 1, Additional file 1). Items 1 to 14 were identical to DS items, and their coding direction differed depending on the scale. Table 2 shows the average scores of the proposed SASs as well as the total DS scores (15 items) for each respondent group. Medical students' mean SAS-2 scores were higher than those of residents and instructors, and instructors' mean SAS-4 and SAS-h scores were higher than those of students and residents. Instructors' scores increased from the lowest-stage scale (SAS-2) to the highest-stage scale (SAS-h).
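Because the same item can be coded in different directions for different scales, scoring needs a per-scale key. The sketch below assumes standard reverse coding on a 1-7 scale (score becomes 8 minus score) and scale scores reported as item means, consistent with SAS scores being compared with the neutral value 4; which items are reversed for which scale is not given here, so any key passed in is hypothetical.

```python
LIKERT_MIN, LIKERT_MAX = 1, 7


def code_item(score, reverse):
    """Reverse-code on a 1-7 scale (1<->7, 2<->6, 3<->5, 4 stays 4)."""
    return (LIKERT_MIN + LIKERT_MAX) - score if reverse else score


def scale_mean(answers, keyed_items):
    """Mean coded item score for one scale.

    answers: dict item_id -> raw 1-7 response.
    keyed_items: dict item_id -> True if that item is reverse-coded
                 for this particular scale (hypothetical key).
    """
    coded = [code_item(answers[i], rev) for i, rev in keyed_items.items()]
    return sum(coded) / len(coded)
```

Keeping the key per scale, rather than per item, lets one shared item contribute positively to one scale and negatively to another.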

Confirmation of SASs by comparing respondent groups
Univariate analysis of variance of each SAS score indicated that respondent group was a significant variable for SAS-2 and SAS-h scores (SAS-2, p = 0.03; SAS-3, p = 0.70; SAS-4, p = 0.37; SAS-h, p < 0.01), whereas gender was not a significant variable for any of the four SASs (SAS-2, p = 0.10; SAS-3, p = 0.07; SAS-4, p = 0.33; SAS-h, p = 0.87).
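To show what the group comparison tests, here is a simplified one-way ANOVA on respondent group alone. This is an illustration, not the reported model: the analysis above also included gender as a factor and was run in SPSS. The function returns the F statistic and its degrees of freedom; the p-value would come from the F distribution with those degrees of freedom (for example, `scipy.stats.f.sf(F, df_between, df_within)`).

```python
def one_way_anova_f(groups):
    """F statistic and degrees of freedom for a one-way ANOVA.

    groups: list of lists, one list of SAS scores per respondent group
            (four groups in this study: 4th-year, 6th-year, residents,
            instructors).
    """
    all_scores = [x for g in groups for x in g]
    grand_mean = sum(all_scores) / len(all_scores)
    group_means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within-group sum of squares: spread of scores around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```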

Confirmation of SASs using DS score
Because the SASs and DS share items, SAS-2 scores would be expected to correlate negatively, and SAS-h scores positively, with DS scores.
To analyze whether all four SAS scores were related to the DS, which evaluates overall maturation and socialization related to PIF, as theoretically expected, I examined SAS scores in five DS score classes (Fig. 2). For DS scores of 54 or less (n = 7), the SAS-2 mean score was the highest among the four SASs, whereas SAS-4 and SAS-h were both lower than 4 (i.e., neutral). For DS scores of 85 or more (n = 21), SAS-4 and SAS-h were higher than 5.5; SAS-h was the highest and SAS-2 the lowest among the four SASs. As DS scores increased from 55 to 84, the SAS profile shifted from a pattern of high SAS-3 mean scores to one of high SAS-4 and SAS-h mean scores and low SAS-2 mean scores.
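The per-class comparison can be sketched as grouping respondents into DS bands and averaging each SAS within a band. Only the outer classes (54 or less, 85 or more) are stated in the text; the three 10-point middle bands below are an assumption, as is the input format (one dict per respondent with a DS total and four SAS means). DS totals range from 15 to 105 because the DS has 15 items scored 1 to 7.

```python
from statistics import mean

# Assumed bands: only "<=54" and ">=85" are stated; the middle three
# 10-point bands are a hypothetical reconstruction.
DS_BANDS = [(15, 54), (55, 64), (65, 74), (75, 84), (85, 105)]
SCALES = ("SAS-2", "SAS-3", "SAS-4", "SAS-h")


def band_of(ds_score):
    """Return the (low, high) DS band containing ds_score."""
    for low, high in DS_BANDS:
        if low <= ds_score <= high:
            return (low, high)
    raise ValueError(f"DS score out of range: {ds_score}")


def mean_profile_by_band(respondents):
    """Mean of each SAS within each DS band.

    respondents: list of dicts with key 'DS' plus one key per SAS.
    """
    grouped = {}
    for r in respondents:
        grouped.setdefault(band_of(r["DS"]), []).append(r)
    return {
        band: {s: mean(r[s] for r in members) for s in SCALES}
        for band, members in sorted(grouped.items())
    }
```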

Comparison of individual SAS scores
To clarify how the scales function at the individual level and whether it is appropriate to apply them to groups, I analyzed individual SAS scores of respondents with the same DS score. Figure 3 shows the four SAS scores of respondents whose DS scores were 60, 65, 70, 75, 80, 85, or 90. If several respondents in a respondent group had one of these DS scores, two respondents from that group were randomly selected. For example, among the 16 respondents with a DS score of 70 (four 4th-year medical students, eight 6th-year medical students, three residents, and one instructor), seven respondents' highest score was on SAS-3 (e.g., ID = 249, 49, and 312), three respondents' highest score was on SAS-4 (e.g., ID = 214, 183, and 58), and three respondents' highest score was on SAS-h (e.g., ID = 295) (Fig. 3).
In fact, the scores and patterns of SASs varied among individuals with the same DS score, even within the same respondent group. Independently of DS score, some respondents showed high SAS-3 and SAS-h scores with low SAS-4 scores (inclusion pattern; e.g., ID = 117, 266, 98, 195, and 94), while others showed high SAS-4 scores with low SAS-3 and SAS-h scores (independent pattern; e.g., ID = 214, 163, and 252).
The individual scores also showed tendencies consistent with the mean scores. Many respondents with DS scores of 60 or 65 had their highest score on SAS-3, and most respondents with DS scores of 75 or higher had their highest score on SAS-4 or SAS-h.
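The selection procedure described above (up to two randomly chosen respondents per group for a given DS score, then identifying each respondent's highest SAS) can be sketched as follows. The record format, with 'group', 'DS', and four SAS keys, and the helper names are hypothetical; the `seed` parameter is added only to make a selection reproducible.

```python
import random

SCALES = ("SAS-2", "SAS-3", "SAS-4", "SAS-h")


def highest_scale(r):
    """Name of the SAS with the highest score (the first listed wins ties)."""
    return max(SCALES, key=lambda s: r[s])


def sample_for_ds(respondents, ds_score, per_group=2, seed=None):
    """Pick up to `per_group` random respondents per group with this DS score."""
    rng = random.Random(seed)
    by_group = {}
    for r in respondents:
        if r["DS"] == ds_score:
            by_group.setdefault(r["group"], []).append(r)
    picked = []
    for group in sorted(by_group):
        members = by_group[group]
        # Groups with fewer members than per_group contribute all of them.
        picked.extend(rng.sample(members, min(per_group, len(members))))
    return picked
```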
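The text describes the inclusion and independent patterns qualitatively without numeric cut-offs. The classifier below is therefore a hypothetical rule for illustration only: a scale counts as "high" or "low" relative to the respondent's own mean across the four SASs.

```python
SCALES = ("SAS-2", "SAS-3", "SAS-4", "SAS-h")


def preference_pattern(r):
    """Hypothetical rule: compare each SAS with the respondent's own 4-scale mean.

    inclusion   : SAS-3 and SAS-h above own mean, SAS-4 below it
    independent : SAS-4 above own mean, SAS-3 and SAS-h below it
    mixed       : anything else
    """
    scores = {s: r[s] for s in SCALES}
    own_mean = sum(scores.values()) / len(scores)
    high = {s for s, v in scores.items() if v > own_mean}
    low = {s for s, v in scores.items() if v < own_mean}
    if {"SAS-3", "SAS-h"} <= high and "SAS-4" in low:
        return "inclusion"
    if "SAS-4" in high and {"SAS-3", "SAS-h"} <= low:
        return "independent"
    return "mixed"
```

A relative (within-person) threshold was chosen so that the rule reflects the shape of a profile independently of its overall level, in the same spirit as comparing respondents with the same DS score.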

Discussion
Previous research [7,13] indicated that Kegan's model could explain individual medical trainees' personal and professional development. One advantage of using this model as a conceptual framework for scale development is its applicability to individuals in any program, specialty, or position, because it describes general lifelong human development in relationships with others and society [16]. However, because the model proposes a helical and reciprocal process, a single scale score might not adequately represent the complex and divergent pathway of professional development or provide meaningful insight into the PIF of an individual or group. To address this issue, I propose four SASs, in addition to the DS, provided in one self-administered questionnaire with a total of 27 items.
Comparing the respondent groups revealed that SAS-2 scores decreased and SAS-h scores increased as training advanced from students to residents and further to instructors. This result suggests that SAS-2 and SAS-h may be able to differentiate respondents at low and high stages, even though their reliability coefficients were not high.
On the other hand, the stage 3-specific inclusion preference (e.g., "rely on others") and the stage 4-specific independence preference (e.g., "behave according to own values") are opposite attributes. As a result, group mean scores of SAS-3 and SAS-4 might reflect the ratio of members at stage 3 and stage 4, be influenced by a few respondents with extreme attributes, and tend toward neutral values. In fact, individual score analysis indicated that the same respondent group contained respondents who appeared to have different attributes as well as individuals at different stages. Despite this limitation of the group comparison, the mean scores of SAS-3 and SAS-4 in each group suggested that students might have the inclusion preference typically seen at stage 3 more than the independence preference, and that instructors might have the independence preference typically seen at stage 4 more than the inclusion preference.
Individual values and attitudes are complex, and there are transition phases between stages. The SASs did not correspond exactly to Kegan's stages, but they did indicate tendencies in the key attributes related to staging. The information provided by the SASs is useful for discussing PIF, which supports the view that the four SASs are valid scales for this purpose.
People with low DS scores might be expected to be at lower stages, and people with high DS scores at higher stages. As shown in Fig. 2, respondents with DS scores of less than 55 were expected to be at stage 2, and respondents with DS scores of 75 or higher were expected to be at stage 4 or 5.
The results of this research indicated that the average medical student in this study was at stage 3, with students ranging from stage 2 to 4. The average instructor in this study was at stage 4 or higher, and few instructors were at stage 2. These results were compatible with theoretical hypotheses on medical trainees [7,13] and with a qualitative interview analysis of law and dental students (from stage 2 to stage 4) [19].
Even though each of the scales developed in this study requires further validation and higher reliability, the combination of scale scores could be an indicator of PIF that knowledge and skill assessment or behavior observation cannot provide. The relationship between each respondent's scale scores, which represent individual attitudes and values, and their actual behaviors should be investigated in the future.
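The dilution effect of opposite attributes on group means can be shown with a worked example. Assume, hypothetically, that a fraction p of a group is at stage 3 and scores a on SAS-3 and b on SAS-4, while the rest are at stage 4 with the reverse scores. The group means are then as below; with an even split (p = 0.5) and illustrative scores a = 6 and b = 2, both means land on the neutral value 4, even though no individual is neutral.

```latex
\begin{aligned}
\bar{x}_{\text{SAS-3}} &= p\,a + (1-p)\,b, \qquad \bar{x}_{\text{SAS-4}} = p\,b + (1-p)\,a \\
p = 0.5,\ a = 6,\ b = 2:\quad \bar{x}_{\text{SAS-3}} &= 0.5 \times 6 + 0.5 \times 2 = 4, \qquad \bar{x}_{\text{SAS-4}} = 0.5 \times 2 + 0.5 \times 6 = 4
\end{aligned}
```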

Limitations
The SASs had low reliability because the practical constraints of the questionnaire limited the number of items relative to the complex and divergent process of PIF. I could not determine whether the broad score range and the differences among the four SAS scores reflected characteristics of the respondents' PIF or measurement error due to unreliable scales.
All items were written in Japanese and all respondents were located in Kagoshima, Japan. Long-term prospective studies and research in other locations are required to confirm scale sensitivity and improve usability.

Conclusions
Multiple scales evaluating different developmental stage-specific attributes, combined with one scale evaluating the degree of maturation and socialization, might provide meaningful information about individual and group PIF. Young medical trainees, such as medical students and residents, were in the process of PIF.

Acknowledgements
The author gratefully acknowledges all participants who took part in this research, and especially thanks Michiko Nakamichi for her excellent work in data gathering, as well as Dr. Hitoshi Setoyama, Dr. Tatsuru Kaji, and the staff of the clinical training center at Kagoshima University Hospital for collecting the residents' data.
Author's contributions MT designed the study, performed the statistical analyses, and drafted, edited, and approved the final manuscript to be submitted. There are no other individuals who qualify as authors.

Funding
Funding for this study was provided by JSPS KAKENHI (Grant Numbers 15K08559 and 18K10008). The funding body played no role in the design of the study; in the collection, analysis, and interpretation of data; or in writing the manuscript.
Availability of data and materials All data are stored according to agreements with the participants and ethical standards and are available upon reasonable request to the author.
Ethics approval and consent to participate Ethical approval for this study was granted by the Graduate School of Medical and Dental Sciences, Kagoshima University (No. 629 July 15, 2016, No. 666 November 18, 2016, No. 180091 June 28, 2018). Information regarding the study purpose, research methods, and the conditions and expected consequences of participating in the study was explained to all respondents in writing before data were collected. Written consent was not required since data were collected using an anonymous questionnaire from which the respondents could not be identified. The researcher did not have a direct relationship (i.e., as instructor or supervisor) with any of the medical students or residents participating in this study at the time of data collection.

Consent for publication
Not applicable.

Competing interests
To the author's knowledge, no conflict of interest, financial or otherwise, exists with respect to this work.