Assessing clinical communication skills in physicians: are the skills context specific or generalizable?

Background: Communication skills are essential for physicians to practice medicine. Evidence for the validity and domain specificity of communication skills in physicians is equivocal and requires further research. This research was conducted to adduce evidence for the content and context specificity of communication skills and to assess the usefulness of a generic instrument for assessing communication skills in International Medical Graduates (IMGs).

Methods: A psychometric design was used to determine the reliability and validity of the communication skills instruments used in high-stakes examinations for IMGs. Data were collected from 39 IMGs (19 men, 48.7%; 20 women, 51.3%; mean age 41 years) assessed in a 14-station OSCE and subsequently in supervised clinical practice with several instruments (patient surveys, ITERs, and the Mini-CEX).

Results: All the instruments had adequate reliability (Cronbach's alpha: .54 to .96). There were significant correlations (r range: 0.37 to 0.70, p < .05) between communication skills ratings by examiners and by standardized patients, and between the Mini-CEX and both patient surveys and ITERs. The intra-item reliability across all cases for the 13 items was low (Cronbach's alpha: .20 to .56). Correlations of communication skills within a method (e.g., OSCE or clinical practice) were significant, but correlations between methods (e.g., OSCE and clinical practice) were not.

Conclusion: The results provide evidence of the context specificity of communication skills, as well as of their convergent and criterion-related validity. In both OSCEs and clinical practice, communication checklists need to be case specific and designed for content validity.


Background
Communication is one of the most important components of physicians' patient management skills and overall competence. Competence in a physician is a composite of clinical skills, interpersonal aspects of the patient-physician encounter, professionalism, and communication skills [1-3]. A good communicator can elicit an appropriate history from the patient, formulate an appropriate diagnosis, build a strong doctor-patient relationship, and negotiate a management strategy with the patient.
OSCEs have been used extensively to assess communication skills. Measurement errors have been identified for case specificity, candidate-standardized patient (SP) interaction, and case-candidate interaction [4,5]. Although Hodges, Turnbull, Cohen et al. reported a significant difference in the mean scores of difficult and easy OSCE stations, they nevertheless concluded that communication skills are bound up with content knowledge and are case or context specific [6].
Guiton, Hodgson, Delandshere and Wilkerson found high internal consistency (Cronbach's alpha 0.89 to 0.94) within 7 OSCE stations [4]. The Cronbach's alpha based on intra-item calculation across cases, however, was low. In their generalizability analysis, the largest share of variance (50%) was contributed by the student-by-case interaction, implying that communication skills are case specific [4]. Conversely, Keely, Myers and Dojeiji found that the internal consistency of a single 22-minute OSCE station testing the written communication skills of 36 internal medicine residents (years 1 through 4) was 0.80. Moreover, scores on this station correlated with a breaking-bad-news verbal communication station (r = 0.37, p < 0.01) but not with a thyroid examination station (r = 0.04, ns) [7].
OSCEs have been widely used to assess communication skills in students, residents, and other physicians for licensing and certification. The assessment of communication skills is still plagued by measurement errors related to content specificity, language proficiency, case and student interactions, variability in standardized patients, and assessment of written communication skills [8]. Most studies have, however, found that even with reliable OSCE stations, intra-item reliability across cases is low [4,6,7]. This low intra-item agreement across cases or stations indicates that communication skills are content specific.
Humphries used confirmatory factor analysis to identify the model that best fitted communication skills assessed through an OSCE (by SPs and expert examiners) and through an objective structured video examination (OSVE) [9]. He first identified the latent variables specified by the OSVE, the SPs, and the experts, and then used confirmatory factor analysis to identify the best-fitting model accounting for the effect of knowledge on candidates' future communication performance. He did not find a strong relationship between knowledge and performance of communication skills and concluded that better assessment tools need to be developed to assess this complex trait [9].
In the OSCEs used for the Clinical Skills Assessment of IMGs in the United States, SPs assess candidates' communication and data gathering skills (including history taking and physical examination), interpersonal skills, and English proficiency [8,10]. Test-retest reliability is generally high for all components, and correlations of English proficiency, interpersonal skills, and communication with measures of clinical competence are low [8]. Conversely, Colliver and colleagues found high correlations between clinical competence and communication skills (and empathy) for specific cases [11,12].
The foregoing and other studies have produced mixed evidence for generic and domain-specific aspects of communication skills [4,6-8,10,13,14]. Accordingly, further research is needed on the domain specificity of the tools used to assess communication skills in physicians. The purposes of the present study were to 1) study the psychometric characteristics of an instrument used to assess communication in high-stakes OSCEs, and 2) investigate the specificity or generality of communication assessed in OSCEs vis à vis communication assessed in clinical practice.

Study Design
A psychometric study design was employed to investigate the reliability and validity of the communication skills instrument used in a high-stakes examination.

Context of the Study
The Western Alliance for Assessment of International Physicians (WAAIP) project was created to develop and field-test an assessment process to determine the practice readiness of selected international medical graduate (IMG) registrants identified by College Registrars in Western and Northern Canada. The intent was to facilitate IMG integration into clinical practice while maintaining Canadian clinical standards [15]. Four provinces (Alberta, Manitoba, Saskatchewan, and British Columbia) and the Northwest Territories nominated 39 physicians for practice-ready assessment. We anticipated that, if successful, they could apply for a restricted license to practice medicine in the respective province or territory. The study was approved by the University of Calgary Conjoint Health Research Ethics Board.
Assessment occurred in two parts: 1) Step A, a 150-item multiple-choice examination testing declarative knowledge, followed by a 14-station objective structured clinical examination (OSCE) using standardized patients to test clinical and communication skills; and 2) Step B, direct assessments and evaluations of the IMGs during a three-month supervised clinical practice experience. During supervised clinical practice, several direct observation instruments, as well as patient surveys, were used to assess various competencies, including communication.
Communication skills were assessed at each OSCE station by both the physician examiner and the standardized patient (SP), using the same instrument. The communication skills instrument had 13 items, each rated on a 5-point scale (1 = strongly disagree, 3 = neutral, 5 = strongly agree). In Step B, the 25 top-performing candidates were selected for 12 weeks of supervised clinical practice in their respective provinces. There they were assessed with the instruments used in supervised clinical practice: the Physician Achievement Review (PAR) [16], the Mini-CEX [17], and In-Training Evaluation Reports (ITERs). The Mini-CEX uses a 9-point scale, PAR a 5-point scale (1 = strongly disagree, 3 = neutral, 5 = strongly agree), and the communication items on ITERs a 3-point scale.

Data-analysis
SPSS Version 14 was used to calculate the descriptive statistics, the factor analysis, and Cronbach's alpha for inter-item and intra-item (across stations) reliabilities. OSCE scores of all 39 candidates were used to calculate the inter-item and intra-item reliabilities. Scores of the 24 candidates who proceeded to Step B on the communication items of the OSCE communication checklist, PAR, ITERs, and Mini-CEX were used for the correlation matrix and factor analysis. Generalizability analyses were conducted for the communication checklist in a nested design (SPs within cases).
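The two reliabilities named here differ only in which scores play the role of the "parts" in Cronbach's alpha: the 13 checklist items within one station (inter-item), or one item's ratings across the 14 stations (intra-item). The NumPy sketch below is an illustration under stated assumptions, not the SPSS procedure the authors ran; the array layout (candidates × stations × items), the function names, and the synthetic data are hypothetical.

```python
import numpy as np


def cronbach_alpha(scores) -> float:
    """Cronbach's alpha for a (candidates x parts) score matrix.

    alpha = k / (k - 1) * (1 - sum of part variances / variance of total),
    using sample variances (ddof=1) throughout.
    """
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    sum_part_var = scores.var(axis=0, ddof=1).sum()
    total_var = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - sum_part_var / total_var)


def within_station_alphas(ratings: np.ndarray) -> np.ndarray:
    """Inter-item alpha inside each station (checklist items are the parts).

    ratings has shape (candidates, stations, items).
    """
    return np.array([cronbach_alpha(ratings[:, s, :]) for s in range(ratings.shape[1])])


def intra_item_alphas(ratings: np.ndarray) -> np.ndarray:
    """Intra-item alpha for each item across stations (stations are the parts)."""
    return np.array([cronbach_alpha(ratings[:, :, j]) for j in range(ratings.shape[2])])


# Synthetic toy data (NOT study data): 39 candidates, 14 stations, 13 items.
# Each candidate gets an independent "context" effect per station that all
# 13 items share, plus small item-level noise; there is no stable trait.
rng = np.random.default_rng(0)
context = rng.normal(size=(39, 14, 1))
noise = 0.3 * rng.normal(size=(39, 14, 13))
ratings = 3.0 + context + noise

within = within_station_alphas(ratings)  # high: items agree inside a station
intra = intra_item_alphas(ratings)       # near zero: performance does not carry over
```

This toy case is deliberately extreme (purely context-specific, no stable candidate trait), so the intra-item alphas hover near zero; real data such as the study's would fall between this and a fully generic skill.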

Subjects
A total of 39 physicians (19 men, 48.7%; 20 women, 51.3%) who had graduated from a medical school included in the World Health Organization's directory of medical institutions and had a medical degree verified by the Educational Commission for Foreign Medical Graduates International Credentials Services participated. Each candidate had met the minimum required standards on the Test of English as a Foreign Language (TOEFL) and had passed the Medical Council of Canada Evaluating Examination (MCCEE).

Results
Twenty-five of 39 IMGs (with equal success rates for men and women) passed Step A, and 24 (one withdrew) proceeded to Step B and were assessed during supervised clinical practice. Of these 24 IMGs, 16 passed Step B on the basis of the assessments during supervised clinical practice and subsequently obtained a restricted license to practice in their respective provinces.
The Cronbach's alpha reliabilities of the communication instrument used in the OSCE stations by the SPs and by the physician assessors are summarized in Table 1, together with the descriptive statistics. These alphas ranged from .54 to .94; in general they are high, indicating substantial internal consistency of the instrument. The generalizability analysis of communication checklist scores across cases, with SPs nested within cases, yielded Eρ² = .62. The percentages of variance attributable to participants, cases, participants by cases, and cases by raters (raters were assigned to cases and not nested) were 5.4, 4.6, 45.0, and 45.0, respectively.
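As a rough consistency check, the reported coefficient can be approximately reconstructed from the variance percentages. The sketch below rests on our assumptions, not on anything the authors state: relative error is taken to be the participants-by-cases component averaged over the 14 cases, and the 45.0% labelled cases by raters is assumed to involve no participant term, so it does not enter relative error. Under those assumptions the coefficient comes out near .63, close to the reported .62. The function name and the 20-station projection are hypothetical.

```python
def relative_g_coefficient(var_p: float, var_pc: float, n_cases: int,
                           var_pr_c: float = 0.0, n_raters: int = 1) -> float:
    """Relative G coefficient (E rho^2) for a participants x (raters:cases) design.

    Relative error contains only components that interact with participants,
    each divided by the number of conditions it is averaged over. Components
    without a participant term (cases, raters within cases) are excluded.
    """
    relative_error = var_pc / n_cases + var_pr_c / (n_cases * n_raters)
    return var_p / (var_p + relative_error)


# Percent-of-variance figures from the text stand in for variance components;
# the ratio is unchanged when every component is rescaled by the same factor.
g_14 = relative_g_coefficient(var_p=5.4, var_pc=45.0, n_cases=14)  # about 0.627
# Hypothetical decision study: the same components spread over 20 stations.
g_20 = relative_g_coefficient(var_p=5.4, var_pc=45.0, n_cases=20)  # about 0.706
```

Because the participant-by-case component is about eight times the participant component, adding stations raises the coefficient only slowly, which is the practical face of case specificity.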
The intra-item reliabilities across cases are summarized in Table 2 for both the physician examiner and the standardized patient (range: .13 to .56). Unlike the instrument alphas, these coefficients are quite low indicating poor intra-item agreement across cases.
The descriptive statistics for the communication items of the instruments used to assess IMGs during supervised clinical practice are given in Table 3. Most candidates did well on communication, as shown by the high means and small standard deviations of the scores. The communication scales (alphas range: .51 to .96) from the various measures were intercorrelated (Pearson's r); the results are summarized in Table 4. There were significant correlations (p < .01) between OSCE physician-assessor and SP ratings, and between the Mini-CEX and both PAR patient communication and ITER communication. The two PAR instruments had moderate, significant correlations (p < .05) with each other and with ITERs (Table 4).
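With only 24 candidates in the correlation analysis, each coefficient carries wide uncertainty. The stdlib sketch below computes Pearson's r, an approximate 95% confidence interval via Fisher's z transform, and the smallest |r| reaching two-tailed p < .05 for a given n. The value r = .50 is purely illustrative and is not one of the study's coefficients; the t critical value 2.074 (df = 22) comes from a standard t table, and the function names are hypothetical.

```python
import math


def pearson_r(x, y) -> float:
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r: float, n: int, z_crit: float = 1.959964) -> tuple:
    """Approximate 95% confidence interval for r via Fisher's z transform."""
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


def critical_r(n: int, t_crit: float) -> float:
    """Smallest |r| reaching significance, solving t = r*sqrt(n-2)/sqrt(1-r^2) for r."""
    df = n - 2
    return t_crit / math.sqrt(t_crit ** 2 + df)


# n = 24 as in the correlation analysis; r = .50 is an illustrative value only.
low, high = fisher_ci(0.50, 24)          # roughly (0.12, 0.75)
r_needed = critical_r(24, t_crit=2.074)  # two-tailed .05, df = 22: about 0.40
```

At this sample size an observed r of .50 is compatible with population correlations anywhere from weak to strong, so differences between individual coefficients in a table of this size should be read cautiously.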
Principal component analysis with varimax rotation was performed; the rotation converged in 3 iterations. The analysis of all the instruments together yielded a two-factor solution accounting for 67% of the total variance in communication skills scores (Table 5).
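The analysis named here can be sketched in NumPy: principal components extracted from the correlation matrix, components retained by the eigenvalue-greater-than-one rule (our assumption; the text does not state the retention criterion), then an orthogonal varimax rotation. SPSS applies Kaiser normalization before varimax by default, which this raw version omits, so loadings would differ slightly from SPSS output. The functions and the synthetic data are hypothetical illustrations, not the study's analysis.

```python
import numpy as np


def pca_loadings(data: np.ndarray, n_components: int):
    """Principal-component loadings from the correlation matrix of `data`.

    Returns (loadings, proportion of total variance retained,
    all eigenvalues in descending order).
    """
    corr = np.corrcoef(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs[:, :n_components] * np.sqrt(eigvals[:n_components])
    explained = eigvals[:n_components].sum() / eigvals.sum()
    return loadings, explained, eigvals


def varimax(loadings: np.ndarray, max_iter: int = 100, tol: float = 1e-6) -> np.ndarray:
    """Orthogonal varimax rotation (raw, without Kaiser normalization)."""
    p, k = loadings.shape
    rotation = np.eye(k)
    objective = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        target = rotated ** 3 - rotated @ np.diag((rotated ** 2).sum(axis=0)) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt  # nearest orthogonal matrix
        new_objective = s.sum()
        if objective > 0 and new_objective < objective * (1 + tol):
            break  # criterion stopped improving
        objective = new_objective
    return loadings @ rotation


# Synthetic data (NOT study data): two OSCE-like and four workplace-like
# measures driven by two unrelated latent variables.
rng = np.random.default_rng(1)
n = 500
osce, practice = rng.normal(size=n), rng.normal(size=n)
data = np.column_stack(
    [osce + 0.5 * rng.normal(size=n) for _ in range(2)]
    + [practice + 0.7 * rng.normal(size=n) for _ in range(4)]
)

_, _, eigvals = pca_loadings(data, n_components=1)
n_keep = int((eigvals > 1.0).sum())  # Kaiser criterion (assumed)
loadings, explained, _ = pca_loadings(data, n_keep)
rotated = varimax(loadings)
```

Because the rotation is orthogonal, each measure's communality (row sum of squared loadings) and the total variance explained are unchanged; only the distribution of loadings across the two factors becomes simpler to interpret.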

Discussion
The main findings of the present study are: 1) the instruments used for assessing communication skills during supervised practice had good internal consistency reliability; 2) the communication skills instrument used for OSCEs had good reliability within each OSCE station; 3) for the 13 checklist items, the intra-item reliability across all cases was very low, meaning that candidates' performance on every communication item varied substantially from case to case; 4) although the generalizability coefficient indicated adequate data stability overall, the high variance attributed to cases by raters means that raters introduced error when rating the same items on different cases; 5) there were significant correlations among communication assessments within clinical practice, but not between clinical practice and OSCE assessments; and 6) the factor analysis of all instruments combined yielded a 2-factor solution separating performance from the assessment of knowledge application during the OSCE.
The lack of correlations between the communication measures from the OSCE and those from clinical practice, despite significant correlations between SP and physician-assessor ratings within the OSCE, suggests method specificity of the measures. The correlations among communication measures within clinical practice (PAR patients, PAR co-workers, Mini-CEX, and ITERs) further support the method specificity of communication assessments. The OSCE is a standardized, comparatively structured task in which candidates know they are being examined. The assessment during clinical practice was naturalistic and much less structured than the OSCE, although the Mini-CEX might be considered 'semi-structured'. The correlations within the naturalistic setting provide evidence of convergent and criterion-related validity for assessing communication. Similarly, the correlations within the OSCE measures (SPs and physician assessors) also provide evidence of convergent and criterion-related validity. The lack of between-method correlations provides evidence of the context specificity of communication.
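The within-method versus between-method argument can be summarized by averaging correlations inside and across method blocks of a correlation matrix, in the spirit of a multitrait-multimethod comparison. The helper and the matrix below are hypothetical illustrations, not the values in Table 4.

```python
import numpy as np


def method_block_means(corr, methods):
    """Mean off-diagonal correlation within and between assessment methods."""
    corr = np.asarray(corr, dtype=float)
    labels = np.asarray(methods)
    same_method = labels[:, None] == labels[None, :]
    off_diagonal = ~np.eye(len(labels), dtype=bool)
    within = corr[same_method & off_diagonal].mean()
    between = corr[~same_method].mean()
    return within, between


# Hypothetical correlation matrix (NOT the values in Table 4).
methods = ["OSCE", "OSCE", "practice", "practice"]
corr = [
    [1.00, 0.60, 0.10, 0.00],
    [0.60, 1.00, 0.05, 0.10],
    [0.10, 0.05, 1.00, 0.50],
    [0.00, 0.10, 0.50, 1.00],
]
within, between = method_block_means(corr, methods)  # 0.55 and 0.0625
```

A pattern like this, with high within-block and near-zero between-block averages, is what the text describes as method or context specificity; a generic communication skill would instead show substantial between-block correlations.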
Context specificity is also supported by the low intra-item reliabilities (alphas) across OSCE stations. Even though the method was consistent (OSCE stations), the same item (e.g., 'the doctor treated patient with respect and courtesy') was not rated consistently across cases for the same candidate. For example, the same candidate may have explained the problem to the SP very well in the chest pain case (Station 1) but not in the fever after cholecystectomy case (Station 14). The high within-station internal consistencies (alphas) show that the items are rated consistently within a station, so their inconsistency across stations is better explained by context specificity than by unreliable items. This context specificity is further supported by the high variance attributed to cases by raters in the generalizability analysis.
Our results are consistent with previous findings on the context and case specificity of communication skills [4,6,8,10]. We agree with Hodges et al. [6] that communication skills are domain specific. Accordingly, communication checklists should be specific and tailored to each case; a single checklist for all cases appears inappropriate. An item such as "the doctor used understandable and non-technical language" may apply differently to a technical case (e.g., pre-operative counselling for appendicitis) than to a less technical but much more emotionally charged one (e.g., breaking bad news).
The 2-factor solution for all the instruments together further separates the OSCEs from the assessment instruments used during supervised training. All the instruments used at Step B loaded onto the same factor, with no split loadings from the OSCE checklist. This could be due either to a method effect or to OSCEs testing knowledge application in a contrived setting that may not predict performance in a real doctor-patient encounter. It could also be because OSCE cases (with SPs) differ from real cases and communication skills are content and case specific. These results are consistent with earlier studies showing that communication during a doctor-patient encounter is influenced by many factors, ranging from the physician's knowledge to interpersonal and other noncognitive attributes [9,11-14].
If we assume that OSCEs test knowledge of communication skills, whereas the Mini-CEX, PAR, and ITERs test the application of that knowledge, then the results are consistent with the study by Humphries [9]. The moderate to strong relationships among the communication skills instruments used during supervised training could be due either to similar testing situations or to method specificity.
A limitation of the present study is the relatively small sample size and its composition. The correlations that we found may be unstable because of the modest sample. As well, the sample consisted of IMGs seeking licensure to practice medicine in Canada. Future research should focus

Conclusion
The results of the present study provide evidence of the content and domain specificity of communication skills. Communication checklists should therefore be specific and tailored to cases; a generic instrument may not be useful for all cases. Notwithstanding the limitations of the present study, our results are consistent with other findings and underscore the need to refine current procedures for assessing communication skills.