The AI chatbot ChatGPT is having a moment, promising to transform the way we produce written text, search the web and educate ourselves.
ChatGPT’s latest achievement? Almost passing the US Medical Licensing Exam (USMLE).
We’re talking here about an exam known for its difficulty, one that typically requires around 300 to 400 hours of preparation and covers topics ranging from basic science concepts to bioethics.
The USMLE is really three exams in one, and the competence with which ChatGPT was able to answer its questions suggests that AI bots like this could one day be useful for medical training and even for some types of diagnosis.
“ChatGPT performed at or near the passing threshold for all three exams without any specialized training or reinforcement,” the researchers write in their published paper. “In addition, ChatGPT demonstrated a high level of concordance and insight in its explanations.”
ChatGPT is a type of artificial intelligence known as a large language model (LLM). LLMs are geared specifically towards written responses: by learning from vast amounts of sample text with some clever algorithms, they can predict which words should come next in a sentence, rather like a souped-up version of the predictive text function on your phone.
That’s kind of an oversimplification, but you get the point: ChatGPT doesn’t actually ‘know’ anything, but by analyzing a large amount of material online, it can construct plausible-sounding sentences about just about any topic.
‘Plausible-sounding’ is the key phrase, though. Depending on how the probabilities of different word sequences play out, the AI can seem strangely intelligent or reach the most ridiculous conclusions.
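To make the idea of next-word prediction concrete, here is a toy sketch in Python. It is a deliberately tiny stand-in, not how ChatGPT works internally: real LLMs use large transformer neural networks trained on enormous datasets, whereas this example just counts which word follows which (a "bigram" model). The function names and the miniature medical corpus are hypothetical, invented purely for illustration.

```python
import random
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count how often each word is followed by each other word (toy model only)."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        counts[current][nxt] += 1
    return counts


def next_word_probs(counts, word):
    """Convert the follow-on counts for `word` into probabilities."""
    followers = counts.get(word.lower())
    if not followers:
        return {}
    total = sum(followers.values())
    return {w: c / total for w, c in followers.items()}


def generate(counts, start, length, seed=None):
    """Build a 'sentence' by repeatedly sampling a likely next word."""
    rng = random.Random(seed)
    words = [start.lower()]
    for _ in range(length - 1):
        probs = next_word_probs(counts, words[-1])
        if not probs:
            break  # no known continuation for this word
        choices, weights = zip(*probs.items())
        words.append(rng.choices(choices, weights=weights, k=1)[0])
    return " ".join(words)


# Hypothetical miniature training text; real models see billions of words.
corpus = (
    "the patient has a fever . the patient has a cough . "
    "the doctor orders a test . the test shows an infection ."
)
model = train_bigrams(corpus)
print(next_word_probs(model, "the"))  # {'patient': 0.5, 'doctor': 0.25, 'test': 0.25}
print(generate(model, "the", 8, seed=1))
```

Every word the generator picks is a statistically likely continuation of the one before, but nothing checks whether the resulting sentence is true. That is the ‘plausible-sounding’ problem in miniature.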
Researchers at the startup Ansible Health tested ChatGPT using sample questions from the USMLE, first checking that the answers weren’t available on Google, so they knew ChatGPT would have to generate new answers based on the data it was trained on.
Put to the test, ChatGPT scored between 52.4 percent and 75 percent across the three exams (the passing grade is usually around 60 percent). In 88.9 percent of its responses, it produced at least one significant insight, described by the researchers as something “new, non-obvious and clinically valid”.
“Achieving the passing grade for this notoriously difficult specialist exam, and doing so without any human reinforcement, marks a remarkable milestone in the clinical maturation of AI,” the study authors said in a press release.
ChatGPT also proved impressively consistent in its answers and was able to provide reasoning behind each one. It also beat the 50.3 percent accuracy rate of PubMedGPT, a bot trained specifically on medical literature.
It’s worth remembering that the information ChatGPT was trained on will include inaccuracies: if you ask the bot itself, it will admit that more work is needed to improve the reliability of LLMs. It won’t be replacing medical professionals any time soon.
However, the potential for analyzing knowledge online is clearly huge, especially as these AI bots continue to improve in the years to come. Rather than replacing medical professionals, they could become vital assistants to them.
“These results suggest that large language models may have the potential to aid in medical education and, potentially, clinical decision-making,” the researchers write.
The research was published in PLOS Digital Health.