AI Can't Access Physician Judgment
Much has been said about the wonders of AI, which is alleged here and there to reach more accurate diagnoses than physicians 80 percent of the time.
Yes, AI can get the “right” answers. But how dependable are those answers? AI still has to be fed the right questions to produce an appropriate answer, and asking the right questions is an essential part of medical practice. When I taught students how to do a history and physical examination, I told them that the answers they got were only as good as the questions they asked. It is the doctor’s responsibility to ask the right question: physician answers are only as good as physician questions.
A 25-year-old woman came to see me with a new-onset backache on the right side. My differential diagnosis included pulmonary emboli, because she was on birth control pills and that was the worst problem her symptoms could represent. When I asked her if she had shortness of breath, she said no. When I rephrased the question and asked whether she got short of breath walking up the steps, she said yes. That answer led me to order the correct tests, including a CT scan that showed multiple pulmonary emboli. Asking the right question is an active process requiring significant effort, consideration, concern, and a strong desire to do the very best for the person sitting next to you, whether you know them or not.
If I hadn’t recognized that I might have gotten an incorrect answer because I had asked the wrong question, I wouldn’t have ordered the correct tests, I would have missed the potentially lethal pulmonary emboli, and the outcome would have been very different indeed. So, when AI can ask the right question, I will say it has value in patient care.
The Dreyfus brothers studied expertise back in the 1980s, when researchers were trying to work out how to get AI to perform any number of tasks. Their book, Mind Over Machine, presented the idea that far too much human effort is dedicated to trying to use logic alone to make “accurate” decisions.
“Early success in programming digital computers to exhibit simple forms of intelligent behavior, coupled with the belief that intelligent activities differ only in their degree of complexity, have led to the conviction that the processing underlying any cognitive performance can be formulated in a program and thus simulated on a digital computer.”
Mind Over Machine, p. 8
The notion of evidence-based medicine is just one more misconception about what computers can do. The United States Preventive Services Task Force (USPSTF) is a panel of sixteen volunteer appointees rendering opinions that everybody in the nation is supposed to consider, like the emperor’s new clothes.
At the time the Dreyfus brothers were defining expertise as something more than following rules, Douglas Hofstadter was grappling with similar concepts in Gödel, Escher, Bach: An Eternal Golden Braid.
“Is it possible to define what evidence is? Is it possible to lay down laws as to how to make sense out of situations? Probably not, for any rigid rules would undoubtedly have exceptions and non-rigid rules are not rules. Having an intelligent AI program would not solve the problem either. For as an evidence processor, it would not be any less fallible than humans are.”
Gödel, Escher, Bach: An Eternal Golden Braid, p. 694
Let’s look at some more “evidence” from the Centers for Disease Control and Prevention (CDC), a supposedly rigorous, robust, and honest group of intellectuals providing the most up-to-date information and the most accurate recommendations. An article by Donna Hoyert, Health E-Stat 100: Maternal Mortality Rates in the United States, 2023, begins with the World Health Organization (WHO) definition of maternal mortality: “the death of a woman, while pregnant or within 42 days of termination of pregnancy from any cause related to or aggravated by pregnancy or its management, but not from accidental or incidental causes.”
In the first place, deaths associated with pregnancy can occur for as long as a year after delivery, so confining the collection of maternal mortality data to 42 days after delivery omits roughly 10.5 months of data. This narrowing of the collection window leaves an unknown number of maternal deaths unaccounted for, because maternal deaths can occur up to a year after delivery from any of the major causes, including suicide, homicide, drug overdose, cardiomyopathy, uterine infection, blood loss, and pulmonary emboli. By removing those months of maternal mortality data from the equation, we come up with a rate of 18.6 deaths per 100,000 live births in 2023, compared with a rate of 22.3 in 2022, both down from the recent 32 per 100,000. According to the article, this reduction in maternal mortality occurred primarily among young White women, while maternal mortality for Asian and Black women did not change significantly.
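To make the arithmetic concrete, here is a minimal sketch in Python of how such a rate is computed and how the choice of counting window changes it. The birth and death counts below are illustrative placeholders, not the actual NCHS figures; they are chosen only so that the 42-day rate lands near the article’s published 18.6, and the split of later deaths is hypothetical.

    def maternal_mortality_rate(deaths: int, live_births: int) -> float:
        # Standard convention: deaths per 100,000 live births.
        return deaths / live_births * 100_000

    live_births = 3_600_000            # illustrative annual live births
    deaths_within_42_days = 670        # counted under the WHO 42-day definition (illustrative)
    deaths_43_days_to_1_year = 300     # excluded by the 42-day cutoff (illustrative)

    print(maternal_mortality_rate(deaths_within_42_days, live_births))
    # ~18.6 per 100,000 with the 42-day window

    print(maternal_mortality_rate(deaths_within_42_days + deaths_43_days_to_1_year, live_births))
    # ~26.9 per 100,000 if deaths through one year were counted

The point of the sketch is simply that the reported rate depends entirely on which deaths are allowed into the numerator: shrink the window, or drop groups from the count, and the rate falls without a single death being prevented.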
The data used are from the National Vital Statistics System mortality file. The count of maternal deaths does not include all deaths during pregnancy, or among recently pregnant women, but only those deaths whose underlying cause was assigned to specific International Statistical Classification of Diseases (ICD) codes.
The article goes on to say that statistics fluctuate:
“…because of this relatively small number of events, and because of accuracy of reporting maternal deaths on death certificates” and “data are shown only for the largest race and Hispanic-origin groups for which statistically reliable rates can be calculated, and numbers and rates are suppressed for those groups for which statistically reliable rates cannot be calculated.”
That means that the two groups with the highest maternal mortality rates are omitted from this so-called evidence-based article, whose values come from the CDC’s National Center for Health Statistics.
I wonder how many people fail to read the entire article and so miss the most important point: the new decrease in maternal mortality to 18.6 per 100,000 is achieved not only by shortening the period during which maternal deaths are counted, but also by omitting the two groups with the highest maternal mortality in our nation, Native American and Alaska Native women. My point is that these statistics, this so-called “evidence,” are severely misleading: they promote a false sense of accomplishment and lull us into a false feeling of safety and security.
Obstetrics is highly subjective, which means that judgment is constantly in play. For two decades, I delivered patients at four different hospitals. Not only do nurses’ exams of patients vary from hospital to hospital, they also vary from nurse to nurse within a hospital. There is no absolute way to determine whether a cervix is 6, 7, or 8 cm dilated: one nurse might call a cervix 7 cm, another might call the same cervix 6 cm, and a third might call it 8 cm. You don’t know for sure unless you know how each nurse examines patients. In other words, part of my judgment as a physician lies in knowing how each nurse does an exam.
Measuring postpartum bleeding also varies from nurse to nurse, and many different approaches pass for medical care. At one hospital the nurses had a protocol for watching postpartum bleeding very closely. At another, a nurse might call to say the patient had bled 150 g and ask what I would like her to do. At still another, a nurse might call in a panic to tell me the patient was bleeding a lot. That meant I had to interpret the information depending on who was calling and from which hospital. That’s not evidence-based medicine; that’s judgment based upon my knowledge of the various nurses attending labor and delivery.
The magic of obstetrics is prevention. That takes lots of judgment and lots of caring. Prevention doesn’t mean bailing people out of big trouble or even little trouble. It means not having trouble in the first place and having a smooth pregnancy with a term vaginal birth and a healthy mom and baby.
Although having everybody healthy is a tremendous personal reward, it remains silent and unrecognized, because nobody thanks you for trouble they didn’t have. The Centers for Medicare & Medicaid Services (CMS) and insurance companies haven’t a clue about physician judgment and prevention. There is no role for physician judgment in evidence-based medicine or AI.
There is a difference between avoiding trouble and having the very best results. Those ideas are lost on those who judge our patient care, whether they be CMS, insurance companies, or lawyers. A lot of normal term vaginal births is the best sign of a good practice, but from a CMS and insurance perspective it isn’t considered great skill; rather, the physician is considered to have a low-acuity practice, for which there is no recognition and no payment.
Years ago, in the town where I practiced, there was a group of physicians who were good at messing up the simplest obstetrics with unnecessary procedures. But rather than address any of the problems, the payers just called theirs a “high-acuity practice,” as if these were simply more complicated patients to begin with. If the payers, including CMS, really wanted to reward value-based care, they would look for the low-acuity practices, reward them, find out how those practices achieve safe pregnancies, and export those methods to the so-called high-acuity practices.
In the end, medical practice, like law practice, is indeed a privilege and an opportunity to bring the very best to obstetric care and maximize outcomes through knowledge, judgment, commitment, and caring. Evidence-based medicine is just another batch of opinions based upon research studies, which we all know are often biased in various ways, as the Donna Hoyert article shows with subjects not representative of the general population.
Neither AI nor evidence-based medicine is going to care about care, nor will either help me know which nurse calls a cervix 7 cm and which would call the same cervix 6 cm or 8 cm. Neither will manage a postpartum hemorrhage, although both can churn out information about judgment for a jury. Neither will calm worried patients or hysterical nurses. Neither will tell me when it’s safe to close an abdomen and end surgery.
“Problems involving deep understanding built up on the basis of vast experience will not yield—as do simple, well-defined problems that exist in isolation from much of human experience—to formal mathematical or computer analysis.”
Mind Over Machine, p. 11
Only the knowledge and judgment I’ve gained from all my previous experience can handle those situations at all, and if the American Law Institute wants to denigrate that care by calling it “customary” and not to be trusted, it merely shows its profound lack of understanding of physician expertise.
The Dreyfus brothers understood the enigma of “evidence-based” medicine well:
“…the hierarchical organization of decisionmaking, the increasingly bureaucratic nature of society, and the pervasiveness of economic metrics of success and failure encourage an excessive reliance on calculative rationality. Since wisdom and judgement prove too hard to defend, information, decontextualized facts, and contrived numerical certainties are substituted.”
Mind Over Machine, p. 194
There is no substitute for genuine physician expertise based upon a higher level of thinking than rationality. And that expertise cannot be represented by AI.