Study finds ChatGPT better at diagnosing depression than your doctor

The AI chatbot does not discriminate based on gender or social class.

Sejal Sharma

Are AI chatbots the future of mental health? They could ease the lack of access to mental health services, but what if a chatbot misdiagnoses or underdiagnoses a patient? That is what two researchers, Inbar Levkovich and Zohar Elyoseph, set out to discover.

An estimated 5–16% of adults in Europe and the US are prescribed antidepressants each year, and the care a patient receives depends on the severity of the depression, as assessed by primary care physicians. Based on symptoms, they decide whether a patient is treated right away or referred to a specialist.

“ChatGPT offers several advantages over primary care physicians and even mental health professionals in detecting depression,” wrote the researchers in the study.

ChatGPT vs. primary care physicians

The researchers argue in the study that primary care physicians can struggle to follow the guidelines for distinguishing typical distress from bona fide anxiety or depressive disorders. In a previous study carried out in the US, only 42% of patients identified as positive for depression by primary care physicians were correctly diagnosed, meaning that 58% of identified cases were false positives.

To see how ChatGPT would fare in a similar situation, the researchers input case vignettes into the chatbot's interface. Vignettes are short hypothetical case descriptions, with systematically varied details about people, designed to elicit the beliefs, attitudes, or judgments of a respondent, which in this case was ChatGPT.

These case vignettes centered on patients seeking an initial consultation for symptoms of sadness, sleep problems, and loss of appetite over the past 3 weeks. Each vignette was submitted to both ChatGPT-3.5 and ChatGPT-4 ten times. The researchers then asked the chatbot: What do you think a primary care physician should suggest in this situation?
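The protocol described here and in the abstract — eight vignette variants (varying sex, socioeconomic status, and depression severity), each submitted ten times to each model — can be sketched as follows. The vignette wording and attribute labels below are illustrative assumptions, not the study's actual materials.

```python
from itertools import product

# Attribute levels per the study's description: 2 x 2 x 2 = 8 vignette variants.
SEXES = ["male", "female"]
JOBS = ["blue-collar worker", "white-collar worker"]  # socioeconomic status proxy
SEVERITIES = ["mild", "severe"]
REPETITIONS = 10  # each variant queried 10 times to check response consistency

QUESTION = ("What do you think a primary care physician "
            "should suggest in this situation?")

def build_vignette(sex: str, job: str, severity: str) -> str:
    """Assemble one hypothetical case vignette (wording is illustrative)."""
    return (f"A {sex} {job} seeks an initial consultation for {severity} "
            f"symptoms of sadness, sleep problems, and loss of appetite "
            f"over the past 3 weeks. {QUESTION}")

# Full prompt set: 8 distinct variants x 10 repetitions = 80 prompts per model.
prompts = [build_vignette(sex, job, sev)
           for sex, job, sev in product(SEXES, JOBS, SEVERITIES)
           for _ in range(REPETITIONS)]
```

Each prompt in `prompts` would then be sent to each model and the recommended treatment recorded; repeating identical prompts lets the researchers measure how consistent the chatbot's answers are.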

ChatGPT won’t discriminate

The results surprised the researchers. Their analysis found that ChatGPT's therapeutic proposals are in line with the accepted guidelines for treating mild and severe depression. Moreover, unlike the treatments proposed by primary care physicians, ChatGPT's therapeutic recommendations were not tainted by gender or social class biases.

“Accordingly, ChatGPT has the potential to improve primary care physicians’ decision-making in treating depression,” concluded the researchers, while cautioning that further research is needed to determine how well the technology can manage severe cases, as well as the potential risks and ethical issues arising from its use.

Interesting Engineering reported in June that the National Eating Disorders Association (NEDA), the largest nonprofit dedicated to eating disorders, replaced its human associates with an AI chatbot tasked with providing support to people with eating disorders. But the move backfired as the chatbot started giving troubling advice to users.

The study, titled ‘Identifying depression and its determinants upon initiating treatment: ChatGPT versus primary care physicians’, was published in the peer-reviewed journal Family Medicine and Community Health.

Study abstract:

Objective 

To compare evaluations of depressive episodes and suggested treatment protocols generated by Chat Generative Pretrained Transformer (ChatGPT)-3.5 and ChatGPT-4 with the recommendations of primary care physicians.

Methods 

Vignettes were input to the ChatGPT interface. These vignettes focused primarily on hypothetical patients with symptoms of depression during initial consultations. The creators of these vignettes meticulously designed eight distinct versions in which they systematically varied patient attributes (sex, socioeconomic status (blue collar worker or white collar worker) and depression severity (mild or severe)). Each variant was subsequently introduced into ChatGPT-3.5 and ChatGPT-4. Each vignette was repeated 10 times to ensure consistency and reliability of the ChatGPT responses.

Results 

For mild depression, ChatGPT-3.5 and ChatGPT-4 recommended psychotherapy in 95.0% and 97.5% of cases, respectively. Primary care physicians, however, recommended psychotherapy in only 4.3% of cases. For severe cases, ChatGPT favoured an approach combining psychotherapy and pharmacotherapy, as did primary care physicians. The pharmacological recommendations of ChatGPT-3.5 and ChatGPT-4 showed a preference for exclusive use of antidepressants (74% and 68%, respectively), in contrast with primary care physicians, who typically recommended a mix of antidepressants and anxiolytics/hypnotics (67.4%). Unlike primary care physicians, ChatGPT showed no gender or socioeconomic biases in its recommendations.

Conclusion 

ChatGPT-3.5 and ChatGPT-4 aligned well with accepted guidelines for managing mild and severe depression, without showing the gender or socioeconomic biases observed among primary care physicians. Despite the suggested potential benefit of using artificial intelligence (AI) chatbots like ChatGPT to enhance clinical decision making, further research is needed to refine AI recommendations for severe cases and to consider potential risks and ethical issues.