ChatGPT is almost good enough to become a doctor. What does it mean for AI and for doctors?

The United States Medical Licensing Exam (USMLE) is among the toughest exams to crack in the US. This three-step exam is mandatory for all medical students and doctors who aspire to work as licensed medical practitioners in the country.

But wait, if humans can become licensed doctors by passing this exam, can an AI do the same?

Well, as it turns out, ChatGPT, the sensational “ask me anything and I’ll answer like a human” AI, has scored at or near the passing mark on the USMLE.

ChatGPT almost passes USMLE — Image credits: Pavel Danilyuk/Pexels

In a recently published study, a team of researchers at California-based healthcare startup AnsibleHealth made ChatGPT take the three-step USMLE. Interestingly, the popular large language model (LLM) scored between 52.4 and 75 percent, a range that straddles the roughly 60 percent mark required to pass. In other words, it sometimes passed and sometimes failed, but it was always close to passing.

Commenting on these findings, Nello Cristianini, an AI expert at the University of Bath who wasn’t involved in the study, said:

“In the US, Physicians with a Doctor of Medicine (MD) degree are required to pass the USMLE for medical licensure. The minimum passing accuracy is 60% (and the pass rate seems to be well above 90%). The software chatGPT achieved an accuracy ‘close to’ (which means short of) the passing accuracy in most settings, but it was close, and within the passing range for some tasks.”

The doctor will see you now

The researchers took the publicly available questions from the June 2022 USMLE and removed 26 image-based questions. They then put ChatGPT through all three steps of the exam, using the remaining 350 questions in total. The results were pretty impressive.

“We found that ChatGPT performed at or near the passing threshold of 60% accuracy. Being the first to achieve this benchmark, this marks a notable milestone in AI maturation. Impressively, ChatGPT was able to achieve this result without specialized input from human trainers. Furthermore, ChatGPT displayed comprehensible reasoning and valid clinical insights, lending increased confidence to trust and explainability,” the study authors note.

The AI model provided new insights in 88.9% of its answers, and about 94% of the responses it gave were relevant to the questions asked in the exam. The researchers also compared its results with those of PubMedGPT, a ChatGPT-like AI program developed by Stanford University exclusively for answering medical questions.

Although PubMedGPT was trained specifically on biomedical literature, it scored only 50.8 percent on the USMLE, whereas ChatGPT scored at least 52.4 percent. These results hint that ChatGPT and similar language models could play an important role in both medical education and clinical practice in the future.

“This is an impressive performance, and we should expect to see more such successes in AI in the future. One caveat, though, is that the US Medical Licensing Exam is designed to be hard for humans, not for machines; there are many areas where humans are much more effective than AIs (such as moving about in cluttered spaces or interpreting social cues). This human superiority won’t last forever, though; one day, AIs will be better than us at almost every task,” said Dr. Stuart Armstrong, an AI researcher and co-founder of Aligned AI, who was not involved in the study.

So does that mean ChatGPT is ready to be your doctor?

For humans, if they pass an exam then yes, they are qualified to practice a certain profession or pursue a certain career path. But this doesn’t mean the same thing for AI — not at all.

The exam is very tough, yet plenty of humans pass it. Algorithms, however, cannot be considered qualified to work as doctors just because they pass exams that are meant for humans.

For example, a medical exam like the USMLE tests the knowledge a person needs to practice medicine, but it doesn’t test for attributes like empathy, a caring attitude, the ability to perform under pressure, humanity, or decision-making skills. So ChatGPT might have the knowledge, but it’s nowhere near replacing doctors.

Furthermore, the AI doesn’t have any actual comprehension of what it’s saying. It’s simply a text predictor, generating content from its input. But just because it doesn’t have what it takes to be a doctor doesn’t mean it can’t be useful. For starters, it could definitely assist doctors and medical students and save a lot of time.

For instance, the team at AnsibleHealth has been using ChatGPT to rewrite and manage their “jargon-heavy reports” so that patients can understand them easily. This has made communicating with patients easier for their staff. ChatGPT could also help medical students understand complex topics and prepare notes for exams.

“Beyond their utility for medical education, AIs are now positioned to soon become ubiquitous in clinical practice, with diverse applications across all healthcare sectors. A profusion of pragmatic and observational studies supports the versatile role of AI in virtually all medical disciplines and specialties by improving risk assessment, data reduction, clinical decision support, operational efficiency, and patient communication,” said the study authors.

However, before such an AI model is introduced into medical education and healthcare, policymakers will need to establish proper rules, regulations, and infrastructure to ensure healthy human-AI interaction. Meanwhile, the AI models themselves need to be improved further so that they deliver more accurate output.

The current study is not perfect. It has several limitations, including the small input size used to test ChatGPT’s performance. The researchers plan to address these limitations in their future research on ChatGPT and AI.

The study is published in the journal PLOS Digital Health.