At the end of last year, 50 doctors sat down to make a number of tricky diagnoses. Half of the group got to use the AI tool ChatGPT; the other half did not.
"Many healthcare providers in the US have begun offering chatbots to their employed doctors," says Adam Rodman, who helped set up the study. He continues:
"But what does this do to doctors' decision-making – does it make them better?"
"Utclassed dramatically"
The experiment was not primarily about the diagnosis itself, but about how the doctor arrives at it – and about reasoning through what might suggest the doctor is wrong.
There was no major difference between the groups' results. The doctors who used the language model got 76 percent right; the control group got 74 percent. But the big surprise came when the AI was allowed to take the test on its own – it reached 90 percent.
"The AI model dramatically outclassed both groups. It is fully capable of delivering an impressive performance on its own – but it did not make humans better," says Rodman.
One explanation for why the doctors who used ChatGPT did not do better is that they were never instructed in how to use it; many used it like Google. Another is that humans are bad at arguing against themselves.
"They are not particularly good at saying 'these are the reasons I might be wrong.' There, the AI model was great."
Should not replace
The result is controversial and has made doctors in the US anxious, according to Rodman. He can understand that. First they study medicine for four years, then do specialist training for another three to seven years.
All of that to learn to make diagnoses. Then it's depressing when a language model, trained by scraping the internet and God knows what else, can do the same thing.
Swedish doctors are also feeling AI's arrival.
"We see great potential for AI to improve healthcare. And we believe that all doctors will be affected – or already are affected – by this development," says Sofia Rydgren Stale, chief physician and chair of the Swedish Medical Association.
One in four doctors already uses AI, primarily to help write medical records. In diagnostic imaging – for example, in the hunt for cancer tumors – significant progress has been made. At the same time, guidelines are lacking, something the doctors' union has criticized.
When it comes to the ability to make diagnoses, Rydgren Stale notes that there may be situations where AI makes better assessments.
"I think you can see the study as a good illustration of what the opportunity to make use of AI looks like," she says.
At the same time, she says AI makes poorer assessments when given other kinds of input. She also points out that language models are general-purpose and often not trained on data from certain minorities, for example, and that there are problems with patient data leaking.
Will AI take doctors' jobs? The technology is cheap and, unlike humans, never gets tired or irritated. Rydgren Stale thinks politicians sometimes tend to overestimate the potential for savings.
"The important thing is to make use of the potential but also to manage the risks. There are some things AI will be very good at, and others where humans are much better. I don't think AI will be able to do everything on its own," she says.
Diagnosis isn't everything
Adam Rodman is clear that the study's conclusion is not that doctors should be replaced with AI. For one thing, making a diagnosis is only a small part of a doctor's job – and doing it requires knowing which questions to ask and which tests to run. In the simulated experiment, that information had already been collected.
"And the biggest part of my workday doesn't go to making complex diagnoses. It goes to talking, coordinating, comforting, and doing paperwork. But this puts a finger on the anxiety I think many professionals feel about the power of some of these models," he says.
Today, many doctors in the US use AI to record, transcribe, and summarize patient conversations. If AI is already listening – could the next step be to intervene?
"I would like AI to be a third party in the conversation – one that listens and gives recommendations and advice, or even points out when we have preconceived notions, which we know is a big problem among doctors," says Rodman.
Facts: ChatGPT outclassed the doctors
The study was conducted at the end of 2023 in the US, with 50 doctors. Participants were given 60 minutes to go through six clinical cases.
The cases are based on real patients, with the information compiled by medical experts.
Doctors randomly assigned to use ChatGPT-4 scored an average of 76 percent on the test; the control group, which did not use ChatGPT, reached 74 percent. But when ChatGPT was allowed to solve the cases on its own, it achieved 90 percent.
The report was presented in the scientific journal JAMA.
Doctors around the world – like other professional groups – have begun using AI in various ways. Here are some examples:
Open Evidence: A language model that collects medical studies and can answer medical questions in natural language.
Transcription of patient conversations: The doctor can use a microphone (or their phone) to record patient conversations, which are automatically transcribed and summarized; the tool can also draft a first version of the medical record.
Patient portals: Chatbots can be a first step when patients seek medical contact, asking basic questions and summarizing the answers for the doctor.
Image recognition: By training AI on images from patients, it can learn to recognize diseases and injuries – for example, tumors in mammography images, or signs of diabetes in retinal photographs.
Research: Many companies use AI to help develop drug candidates. It can also be used to model how disease outbreaks may spread.