The role of strategy and redundancy in diagnostic reasoning
© Bloch et al; licensee BioMed Central Ltd. 2003
Received: 23 July 2002
Accepted: 24 January 2003
Published: 24 January 2003
Diagnostic reasoning is a key competence of physicians. We explored the effects of knowledge, practice and additional clinical information on strategy, redundancy and accuracy of diagnosing a peripheral neurological defect in the hand based on sensory examination.
Using an interactive computer simulation comprising 21 unique cases with seven sensory loss patterns and either concordant, neutral or discordant textual information, 21 3rd year medical students, 21 6th year medical students and 21 senior neurology residents each examined 15 cases in a single session. An additional 23 psychology students examined 24 cases over two sessions, 12 cases per session. Subjects also took a seven-item MCQ exam on the seven classical patterns, presented visually.
Knowledge of sensory patterns and diagnostic accuracy are highly correlated within groups (R² = 0.64). The total amount of information gathered for incorrect diagnoses is no lower than that for correct diagnoses. Residents require significantly fewer tests than either psychology or 6th year students, who in turn require fewer than the 3rd year students (p < 0.001). The diagnostic accuracy of subjects is affected both by level of training (p < 0.001) and by concordance of clinical information (p < 0.001). For discordant cases, significant refutation testing occurs in 6th year students (p < 0.001) and residents (p < 0.01), but not in psychology or 3rd year students. Conversely, there is a stable 55% excess of confirmatory testing, independent of training or concordance.
Knowledge and practice are both important for diagnostic success. For complex diagnostic situations reasoning components employing redundancy seem more essential than those using strategy.
Keywords: diagnostic reasoning, clinical decision making, medical education, cognitive psychology, entropy, experimental studies
A major part of the undergraduate medical curriculum is dedicated to teaching the art and science of diagnosing illness and disease. Furthermore, when assessing the clinical competence of medical students, examiners must infer knowledge and reasoning skills from the behavior and the responses of the candidates.
It stands to reason then that medical teachers should possess a thorough understanding of diagnostic reasoning as a "basic science" of medical education. In reality, however, our comprehension of the diagnostic reasoning process is hazy at best.
The present study attempts to explore diagnostic reasoning by analyzing the detailed, recorded data-gathering behavior of experimental subjects with different levels of expertise in a computer simulation of patients with neurological lesions of the peripheral nerve supply to the hand.
Serious reasoning research started in psychology during the 1950s. It took another 20 years for diagnostic reasoning to become an area of empirical research in medicine [2, 3]. At a time when pragmatic medical educators believed in the existence of generic problem-solving skills, diagnostic reasoning research reestablished the primacy of content-specific knowledge.
Initially, research evolved along two intertwined threads that alternately supported and confused each other: reasoning by (medical) experts and reasoning by computers. By now these two fields of research have largely gone their separate ways.
Table: Factors in empirical research on diagnostic reasoning (clinical information provided; products of reasoning analyzed).
This type of research is very labor intensive and, consequently, expensive. It is thus difficult to collect enough data to reach adequate statistical power based on diagnostic success and process items alone. As a consequence, diagnostic reasoning research leans heavily on recall, introspection and reflection data. It comes as no surprise, therefore, that the theories derived from this research tend towards models of semantic, analytical reasoning [18, 19]. The literature is replete with a panoply of cognitive structures – mainly semantic in nature – that are supposed to underlie diagnostic reasoning. The situation may be obscured further by social desirability bias, which may restrain experimental subjects from admitting to less than superlative reasoning strategies.
There is ample evidence [21, 22] that analytical, semantic models alone do not fully explain diagnostic reasoning. Research based primarily on semantic recall, introspection and reflection has blind spots when it comes to unconscious and implicit reasoning processes that are not based on semantic information. Methods that focus on such processes are thus required to look beyond semantic networks.
For further discussion, we define inference or inferential reasoning as: logical, algorithmic, mainly semantic, sequential, propositional, forward and/or backward directed, purposeful, open to reflection and introspection. In contrast, pattern recognition is: holographic, heuristic, mainly perceptual, parallel, redundant, unconscious, probabilistic and intuitive. Inferential reasoning is characterized by strategy, pattern recognition by redundancy.
By "strategy" we mean a purposeful sequence of tests, where the specifics of the next test are selected on the basis of previous tests such as to return maximum new information. "Redundancy" on the other hand expresses the number of tests that fail to provide any new information for inference.
A suitable experimental model should, therefore, involve a sufficient number of perceptual cues to allow for good statistical power. One such candidate is eye-movement scanning in the interpretation of histological slides or x-rays. Unfortunately, the fact that the ocular axis is directed at a certain location on the image does not indicate what is actually seen by the central visual field, or whether visual information is indeed being recorded and processed.
We have selected a simple deterministic computer simulation involving the (sensory) neurological examination of the peripheral nervous system in the hand. The collected sequence of responses and coordinates of each sensory stimulus allow statistical inference on the reasoning strategies, be they inferential or based on pattern recognition.
Specifically, we asked the following questions:
- How do subjects pick the specific locations on the hand to be tested (strategy)?
- How many points beyond what strict inference requires do they test before reaching a diagnosis (redundancy)?
- How often is the selected diagnosis correct (accuracy)?
- How are strategy, redundancy and accuracy related to knowledge and practice?
- How are strategy, redundancy and accuracy affected if subjects receive additional clinical information (symptoms and history) that is concordant, neutral or discordant with respect to the sensory pattern?
The key to answering these questions is the ability to quantify the information revealed by each successive sensory test. The accepted measure of information content is entropy, as introduced by Claude Shannon in 1948 (Appendix A, see Additional file 1). Specifically, it indicates the potentially available information not yet revealed by the test sequence. An entropy value of 1.0 indicates that none of the available diagnostic information has yet been revealed and that all diagnostic possibilities are still equally likely. Conversely, an entropy value of 0.0 indicates that all relevant diagnostic information has been revealed and that only one diagnosis remains possible.
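As a concrete illustration of this normalized entropy measure (a minimal Python sketch of the standard Shannon formula, not the authors' actual implementation; the function name and the uniform starting distribution are our own assumptions):

```python
import math

def normalized_entropy(probs):
    """Shannon entropy of the current distribution over diagnoses,
    normalized so that 1.0 means no diagnostic information has been
    revealed and 0.0 means only one diagnosis remains possible."""
    h = -sum(p * math.log2(p) for p in probs if p > 0)
    return h / math.log2(len(probs))

# Before any sensory test: all seven diagnoses equally likely.
start = normalized_entropy([1 / 7] * 7)          # close to 1.0
# After testing has excluded all but one diagnosis:
end = normalized_entropy([1, 0, 0, 0, 0, 0, 0])  # exactly 0.0
```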
Entropy does not attempt to estimate or model the current state of a typical diagnostician's knowledge regarding the case. It simply indicates how much information has been revealed to an ideal inference engine. This allows us to demonstrate the gap that exists between the information content revealed and the information actually used by the diagnostician.
If an individual sensory test ("pin prick") does not change entropy, the test adds no new information – it is redundant. Thus redundancy is defined as the total number of sensory tests in a sequence that did not alter entropy.
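Under this definition, redundancy can be counted directly from the sequence of entropy values. The following Python sketch is our illustration with a hypothetical entropy trace, not the authors' code:

```python
def redundancy(entropy_trace):
    """Count the tests in a sequence that left entropy unchanged,
    i.e. revealed no new information to an ideal inference engine.
    entropy_trace[0] is the entropy before any test; each later
    entry is the entropy after one additional sensory test."""
    return sum(1 for prev, cur in zip(entropy_trace, entropy_trace[1:])
               if cur == prev)

# Hypothetical trace for six pin pricks: tests 2, 4 and 5
# leave entropy unchanged, so redundancy is 3.
trace = [1.0, 0.8, 0.8, 0.5, 0.5, 0.5, 0.0]
count = redundancy(trace)  # 3
```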
A subject's strategy can be strictly inferential (i.e. no redundant tests), in which case it is automatically optimal. Alternatively, subjects may systematically attempt to refute the apparently likeliest hypothesis (a Popperian strategy) or, as often happens in reality, they may try to confirm those abnormal findings that support their currently favored hypothesis.
To determine which information-gathering strategy was used, three measures were calculated: (i) how quickly relevant information was collected, expressed as the area under Shannon's entropy as a function of the number of tests; (ii) the specific number of refutations of discordant cues (Refutation matrix, Appendix B, see Additional file 2); and (iii) the excess of confirmatory testing (Confirmation matrix, Appendix C, see Additional file 3).
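Measure (i) can be sketched in a few lines of Python. The paper does not specify the numerical rule used, so the trapezoidal sum with unit spacing below is our assumption:

```python
def entropy_auc(entropy_trace):
    """Area under entropy as a function of the number of tests,
    approximated by the trapezoidal rule with unit spacing.
    A smaller area means relevant information was gathered faster."""
    return sum((a + b) / 2 for a, b in zip(entropy_trace, entropy_trace[1:]))

# A strategic subject drives entropy down early, yielding a small area;
# a first redundant test inflates it.
fast = entropy_auc([1.0, 0.25, 0.0])  # 0.75
slow = entropy_auc([1.0, 1.0, 0.0])   # 1.5
```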
Seven familiar neurological patterns of sensory loss in the hand were simulated: C6, C7 and C8 nerve root injury, radial, median and ulnar nerve lesion and poly-neuropathy. Photographs of the dorsal and volar aspects of either the left or right hand were displayed on the screen. With mouse clicks subjects could "test" individual points on the hand. Depending on the location tested and the underlying predetermined diagnosis, one of three verbal responses was returned deterministically in a small pop-up window at the point tested:
"it feels normal",
"it feels different", or
"I can hardly feel it".
The simulation ran as a Java applet within a regular Web page. Subjects were not provided with feedback regarding individual diagnoses during the actual experiment; they did, however, receive detailed feedback after they had completed all the cases.
Each pattern was presented in the context of additional clinical information (symptoms, history and a functional photograph of the hand) that was concordant, neutral or discordant relative to the sensory pattern. The additional clinical information was relatively bland, providing only subtle suggestions as to the actual diagnosis, whether concordant or discordant, although the concordant information was more specific. The neutral items contained no clues. For example, the discordant cases of radial and median nerve deficits had a history vaguely suggestive of a mild cervical injury. Sensory patterns and additional clinical information were repeatedly checked by experienced neurologists for realism.
The experimental subjects consisted of a convenience sample of 23 psychology students, 21 3rd year medical students, 21 6th year medical students (Switzerland has a six year medical curriculum; during the first two years students concentrate on basic sciences) and 21 senior neurology residents. The junior medical students had studied neuroanatomy, but were unfamiliar with the detailed sensory patterns and clinical pictures. They had never practiced sensory examination. Senior medical students had studied sensory patterns, had limited knowledge of clinical pictures and had been introduced to sensory testing. Neurology residents acted as substitute experts, since we were not able to recruit sufficient certified neurologists. The psychology students served as a control group with roughly matching intelligence but no medical education. Psychology students knew neither neuroanatomy nor clinical pictures. Neither had they been taught sensory examination. They were exposed to visually presented maps of the sensory patterns as part of the experimental protocol.
Psychology students participated in two sessions one week apart; the rest participated in one session each. In their first session the psychology students were shown the seven patterns as visual maps, together with diagnostic labels, for 15 minutes. Otherwise all sessions followed the same sequence: (i) an MCQ test of the seven patterns presented visually as sensory maps; (ii) a single practice case that was not recorded; (iii) a series of 12 cases for each psychology student and 15 cases for each of the rest in a balanced block design. As a result of an oversight, the blocks were not perfectly balanced across the 21 possible combinations (6 × 7 / 2): there were only seven unique blocks, each with three different sequences of cases. Altogether, all 21 cases occurred with equal frequency within each group. We do not, therefore, believe that this error introduced any significant bias.
After each test subjects had the option of picking a diagnosis from a menu and proceeding to the next case. As part of a further study to be reported separately, the test sequence was interrupted automatically at 5, 10, 20 and 40 tests. Subjects were then asked to indicate their current best estimates for the likelihood of each diagnostic hypothesis.
Test coordinates and time since the previous test were stored test by test for the whole case sequence in the client-side Java applet and sent to the Web server as part of the active server page request upon the selection of a specific diagnosis. Data were automatically stored in a relational database (Microsoft Access) keyed to case and subject. After completion of the experimental phase of the study, data were preprocessed by means of a Microsoft Visual Basic program to determine the expected findings at each point tested for the actual diagnosis as well as for the alternative hypotheses. These results were again stored in a relational table. Based on these findings, plausibility, entropy, redundancy, and refutation and confirmation counts were calculated with a second MS-VB program (methods described in Appendices A, B and C, see Additional files 1, 2 and 3). SPSS was used for the statistical analysis of these derived dependent variables.
For each subject the knowledge of sensory patterns was calculated as the ratio of correctly identified patterns over seven, the total number of patterns in the multiple choice exam. Diagnostic accuracy was calculated for each subject as the ratio of correct diagnoses over total cases processed.
Psychology students participated in two sessions of 12 cases each, diagnosing a total of 24 cases. Since only 21 unique cases existed, each of these subjects encountered three cases twice. Using a random number generator, either the first or the second occurrence of each duplicated case was dropped from further analysis.
A total of 1,428 sequences with 27,524 test points were analyzed. In 17 sequences subjects guessed the diagnosis without performing any tests. Residents guessed 12 concordant cases, all correctly. Students guessed three of the remaining five sequences incorrectly; the two correct guesses were in concordant cases, one by a 3rd year and one by a 6th year student.
Table: Diagnostic accuracy – results of ANOVA, tests of between-sequence effects (including the CONCORD × LEVEL interaction).
Discordant cases form a homogeneous subset against neutral and concordant cases at α = 0.05. The level of training does not separate into homogeneous subsets.
Table: Results of MANOVA, tests of between-sequence effects (including the LEVEL × CONCORD interaction).
Both level of training and degree of concordance have a significant effect on redundancy, but the area under the entropy curve depends only on the level of training, not on the degree of concordance. Post hoc analysis (Scheffé test) shows the area under the entropy curve to split into two homogeneous subsets: 3rd year medical students versus the rest. Redundancy splits into three homogeneous subsets: 3rd year students; residents; and 6th year medical students together with psychology students as a middle group.
With regard to degree of concordance, redundancy splits into two homogeneous subsets: concordant versus neutral and discordant.
It is obvious, though, that 3rd year medical students show the least evidence of strategy, independent of additional clinical information.
Table: Contingency table analysis of Popperian refutation counts (common odds ratio).
Psychology and 3rd year medical students are not affected by the additional discordant clinical data; they do not know enough about the clinical syndromes.
Residents and 6th year students show a significant though small attempt at refuting the clinically suggested diagnoses. The excess of specific refutations is about 11% for residents and 17% for 6th year students (common odds ratio).
There is a significant difference in the increase of redundancy between residents and 6th year students (χ² = 17.24; p < 0.001; C.O.R. = 1.24). In other words, in the presence of discordant information 6th year students seem to use more strategy, while residents rely more on redundancy. This could be explained by the "intermediacy effect".
Table: Chi-square, significance and estimated ratio of actual over expected confirmatory tests.
The tendency to selectively confirm expected hypotheses, rather than to test randomly or to refute alternative hypotheses, appears inherent in this diagnostic reasoning experiment.
Discussion and Conclusions
Diagnostic accuracy, strategy and redundancy depend primarily on the knowledge of sensory patterns and associated syndromes. The effect of knowledge on accuracy and redundancy appears to be stronger than on strategy. In fact, effective data-gathering strategies seem to play a minor role. Even where appropriate, little refutation of alternative hypotheses occurs. Just the opposite: confirmatory testing seems to be dominant.
In addition, both accuracy and redundancy, but not strategy, appear to depend on practice independently of knowledge.
These results appear somewhat counterintuitive: one would expect experts to have vastly better problem-solving strategies than novices. True, in the real world experts also have an edge in knowledge; the knowledge spread in our experiment was insufficient to demonstrate that aspect.
There might be another explanation, however. In our experiment, reaching a diagnosis by inference requires not only that the seven diagnostic hypotheses be present in short-term memory, but also that roughly seven tests in strategically placed locations, and their combinations, be available at all times. In other words, purely inferential diagnostic reasoning must operate on approximately 49 items, or 5.6 bits of information. As George A. Miller has shown, the capacity of short-term memory is only about seven items, or 2.8 bits of information. The scope of short-term memory would therefore appear insufficient to support pure inference. Short of using memory substitutes, such as paper and pencil, the only alternative is to resort to what Miller refers to as "recoding" – an implicit reasoning strategy. This is a hypothesis that requires further testing.
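The bit figures quoted above follow directly from the base-2 logarithm of the number of items to be tracked, as a quick check confirms:

```python
import math

# ~7 hypotheses x ~7 test locations, roughly 49 items for pure inference:
inference_bits = math.log2(49)  # ~5.61 bits
# Miller's short-term memory span of about seven chunks:
memory_bits = math.log2(7)      # ~2.81 bits
```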
It remains surprising, however, that the psychology students were able, after only 15 minutes of training, to set up an efficient recoding scheme that allows easy shifting from overt to latent pattern recognition.
The reported findings may also have implications for teaching and assessment. If the rate-limiting factor for inference is the number of items that must be kept in short-term memory, teachers can assist learners by constructing diagnostic trees with only two or three branches at each decision point, rather than long lists of differential diagnoses. Such cognitive structures correspond to Bordage's key features or Mandin's schemes.
In the assessment of diagnostic reasoning, the redundancy of requested information appears to be a second independent, sensitive measure of competence besides the accuracy of the diagnosis.
This study has been made possible by grant #1153-055603 of the Swiss National Science Foundation (SNF). We wish to thank R. Hofer for statistical advice and P. Tobler for assisting in the pilot study. We are grateful to Ch. Hess, H.P. Mattle and M. Mumenthaler for critically reviewing cases and sensory patterns.
- Bourne LE, Dominows RL: Thinking. Annual Rev Psychol. 1972, 23: 105-130. doi:10.1146/annurev.ps.23.020172.000541.
- Elstein AS, Kagan N, Shulman LS, Jason H, Loupe MJ: Methods and theory in the study of medical inquiry. J Med Educ. 1972, 47: 85-92.
- Barrows HS, Bennett K: The diagnostic (problem solving) skill of the neurologist: experimental studies and their implications for neurological training. Arch Neurol. 1972, 26: 273-277.
- Elstein AS, Shulman LS, Sprafka SA: Medical Problem Solving. An Analysis of Clinical Reasoning. Cambridge, Mass: Harvard University Press. 1978.
- Raufaste E, Verderi-Raufaste D, Eyrolle H: [Radiological expertise and diagnosis. II. Empirical study]. J Radiol. 1998, 79 (3): 235-240.
- Turnbull J, Carbotte R, Hanna E, Norman G, Cunnington J, Ferguson B, Kaigas T: Cognitive difficulty in physicians. Acad Med. 2000, 75 (2): 177-181.
- Barrows HS, Norman GR, Neufeld VR, Feightner JW: The clinical reasoning of randomly selected physicians in general medical practice. Clin Invest Med. 1982, 5 (1): 49-55.
- Bordage G, Grant J, Marsden P: Quantitative assessment of diagnostic ability. Med Educ. 1990, 24 (5): 413-425.
- Babcook CJ, Norman GR, Coblentz CL: Effect of clinical history on the interpretation of chest radiographs in childhood bronchiolitis. Invest Radiol. 1993, 28 (3): 214-217.
- Brooks LR, LeBlanc VR, Norman GR: On the difficulty of noticing obvious features in patient appearance. Psychol Sci. 2000, 11 (2): 112-117. doi:10.1111/1467-9280.00225.
- Kulatunga-Moruzi C, Brooks LR, Norman GR: Coordination of analytic and similarity-based processing strategies and expertise in dermatological diagnosis. Teach Learn Med. 2001, 13 (2): 110-116. doi:10.1207/S15328015TLM1302_6.
- Myers JH, Dorsey JK: Using diagnostic reasoning (DxR) to teach and evaluate clinical reasoning skills. Acad Med. 1994, 69 (5): 428-429.
- Regehr G, Cline J, Norman GR, Brooks L: Effect of processing strategy on diagnostic skill in dermatology. Acad Med. 1994, 69 (10 Suppl): S34-S36.
- Schwartz W: Documentation of students' clinical reasoning using a computer simulation. Am J Dis Child. 1989, 143 (5): 575-579.
- Norman GR, Brooks LR, Allen SW: Recall by experts and novices as a record of processing attention. J Exp Psychol Learn Mem Cogn. 1989, 5: 1166-1174.
- Patel VL, Groen GJ, Frederiksen CH: Differences between medical students and doctors in memory for clinical cases. Med Educ. 1986, 20 (1): 3-9.
- Schmidt HG, Norman GR, Boshuizen HP: A cognitive perspective on medical expertise: theory and implication. Acad Med. 1990, 65 (10): 611-621.
- Bordage G, Lemieux M: Semantic structures and diagnostic thinking of experts and novices. Acad Med. 1991, 66 (9 Suppl): S70-S72.
- Schmidt HG, Norman GR, Boshuizen HP: A cognitive perspective on medical expertise: theory and implication. Acad Med. 1990, 65 (10): 611-621.
- Custers JFM, Regehr G, Norman GR: Mental representations of medical diagnostic knowledge: a review. Acad Med. 1996, 71: 555-561.
- Norman GR, Brooks LR: The non-analytical basis of clinical reasoning. Advances in Health Sciences Education. 1997, 2: 173-184. doi:10.1023/A:1009784330364.
- Elstein AS: Heuristics and biases: selected errors in clinical reasoning. Acad Med. 1999, 74 (7): 791-794.
- Shannon CE: A mathematical theory of communication. The Bell System Tech J. 1948, 27: 379-423, 623-656.
- Jaynes ET, Bretthorst GL: Probability Theory: The Logic of Science. To be published in July 2003. [http://omega.albany.edu:8008/JaynesBook]
- Collins A, Michalski RS: The Logic of Plausible Reasoning: A Core Theory. Cognitive Science. 1989, 13: 1-49. doi:10.1016/0364-0213(89)90010-4.
- Gelman A, Carlin J, Stern H, Rubin D: Bayesian Data Analysis. Boca Raton, FL: Chapman & Hall. 1995.
- Popper K: The Logic of Scientific Discovery. London: Routledge. 1934.
- Miller GA: The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information. Psychol Rev. 1956, 63: 81-97.
- Bordage G: Elaborated knowledge: a key to successful diagnostic thinking. Acad Med. 1994, 69 (11): 883-885.
- Mandin H, Jones A, Woloschuk W, Harasym P: Helping students learn to think like experts when solving clinical problems. Acad Med. 1997, 72 (3): 173-179.
The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1472-6920/3/1/prepub
This article is published under license to BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.