At the end of last year, 50 doctors sat down to make a number of tricky diagnoses. Half of the group was allowed to use the AI tool ChatGPT; the other half was not.
Many healthcare providers in the US have started offering chatbots to the doctors they employ, says Adam Rodman, who helped design the study. He continues:
"But what does this do to doctors' decision-making – does it make them better?"
"Outperformed dramatically"
The experiment is not primarily about the diagnosis itself, but about how the doctor arrives at it – and about weighing what might suggest the diagnosis is wrong.
No major difference between the groups was noted. The doctors who used the language model got 76 percent right; the control group got 74 percent. But the big surprise came when the AI was allowed to take the test on its own – it reached 90 percent.
"The AI model dramatically outperformed both groups. It is fully capable of delivering an impressive performance on its own – but it did not make the humans better," says Rodman.
One explanation for why the doctors who used ChatGPT did not do better is that they were never instructed in how to use it; many used it like Google. Another is that humans are bad at arguing against themselves.
"They are not particularly good at saying 'these are the reasons I might be wrong'. That is where the AI model excelled."
Should not replace
The result is controversial and has made doctors in the US anxious, according to Rodman. He can understand that. First they study medicine for four years, then complete specialist training (residency) for another three to seven years.
All to learn to make diagnoses. Then it is depressing when a language model, trained by scraping the internet and God knows what else, can do the same thing.
Swedish doctors are also feeling the arrival of AI.
"We see great potential in AI to improve healthcare. And we believe that all doctors will be affected – or already are affected – by this development," says Sofia Rydgren Stale, chief physician and chair of the Swedish Medical Association.
Every fourth doctor already uses AI today, primarily as a help in writing medical records. In image diagnostics, for example in the hunt for cancer tumors, significant progress has been made. At the same time, guidelines are lacking, something the doctors' union has criticized.
When it comes to making diagnoses, Rydgren Stale notes that there may be situations where AI makes better assessments.
"I think the study illustrates well what the opportunity to take advantage of AI looks like."
At the same time, she says, AI performs worse when given other kinds of input. She also points out that language models are general-purpose and often not trained on data covering, for example, certain minorities, and she highlights the problem of patient data leaking.
Will AI take the doctors' jobs? The technology is cheap and, unlike humans, never gets tired or irritated. Rydgren Stale thinks there is sometimes a political tendency to overestimate the potential for savings.
"The important thing is to use the potential but also manage the risks that exist. There are some things AI will be very good at, and others where humans are much better. I don't think AI will be able to do everything on its own."
Diagnoses are not everything
Adam Rodman is clear that the study's conclusion is not that doctors should be replaced with AI. Making a diagnosis is, moreover, only a small part of the doctor's job – and doing it requires that the doctor knows which questions to ask and which tests to order. In the simulated experiment, that information had already been collected.
"And the biggest part of my workday is not spent making complex diagnoses. It is spent talking, coordinating, comforting, and doing paperwork. But this puts a finger on the anxiety I think many professionals feel about the power of some of these models," he says.
Today, many doctors in the US use AI to record, transcribe, and summarize patient conversations. If AI is already listening – could the next step be for it to intervene?
"I would like AI to be a third person in the conversation, one who listens and gives recommendations and advice, or even tells us when we have preconceived notions, which we know is a big problem among doctors," says Rodman.
Facts: ChatGPT outperformed
TT
The study was conducted in the US at the end of 2023, with 50 participating doctors. The participants had 60 minutes to work through six clinical cases.
The cases are based on real patients, whose information was compiled by medical experts.
Doctors randomly assigned to use ChatGPT (GPT-4) scored an average of 76 percent on the test; the control group, which did not use ChatGPT, reached 74 percent. But when ChatGPT itself was allowed to solve the cases, it achieved 90 percent.
The study is published in the scientific journal JAMA.
Doctors around the world – like other professional groups – have started using AI in various ways. Here are some examples:
OpenEvidence: A language model that collects medical studies and can answer medical questions in natural language based on them.
Transcription of patient conversations: The doctor can use a microphone (or a phone) to record patient conversations, which are automatically transcribed and summarized; the tool can also write a first draft of the medical record entry.
Patient portals: Chatbots can serve as a first step when patients seek medical contact, asking basic questions and summarizing the answers for the doctor.
Image recognition: By training AI on image material from patients, it can learn to recognize diseases and injuries – for example, tumors in mammography images, or diabetes detected through retinal photography.
Research: Many companies use AI to help develop drug candidates. It can also be used to model how disease outbreaks may spread.