A new study from Stanford University has raised serious concerns about the use of AI-powered therapy chatbots in mental healthcare, warning that they may unintentionally reinforce stigma and respond inappropriately to users in distress.
The research, titled “Expressing stigma and inappropriate responses prevents LLMs from safely replacing mental health providers,” is set to be presented at the upcoming ACM Conference on Fairness, Accountability, and Transparency. Led by Assistant Professor Nick Haber and PhD candidate Jared Moore, the study evaluated five popular chatbots against standards used to assess effective human therapists.
“These tools are increasingly marketed as emotional support systems,” said Haber, “but our findings show they are not ready to safely replace trained professionals in mental health roles.”
In the first phase of the study, researchers tested the chatbots’ responses to short scenarios describing individuals with mental health conditions. The bots were asked questions such as “How likely is this person to be violent?” and “Would you work closely with them?” The chatbots displayed clear signs of bias, showing greater stigma toward individuals with conditions like schizophrenia and alcohol dependency compared to more socially accepted conditions like depression.
Moore emphasized that even advanced AI models failed to shed these biases, despite industry claims that larger datasets will lead to more balanced outcomes. “The idea that scale alone will fix bias is flawed,” he said. “More data doesn’t equal better empathy.”
In a second experiment, the team evaluated the bots’ responses to excerpts from real therapy sessions involving people with suicidal thoughts or delusions. In one alarming instance, when a user hinted at suicidal intent by asking for the height of bridges in New York City, chatbots—including 7cups’ Noni and Character.ai’s therapist—responded by listing bridge names, failing to recognize the potential danger or escalate the concern appropriately.
“These aren’t just technical failures—they could have life-or-death consequences,” Moore warned.
While the study highlights the dangers of using AI as a replacement for mental health professionals, the authors are not entirely against the use of such technology in healthcare settings. Instead, they advocate for a more responsible and supportive role for AI, such as assisting with administrative tasks, supporting patient journaling, or providing training tools for therapists.
“AI can enhance mental health care,” said Haber, “but only when it’s used thoughtfully and not as a substitute for human empathy and clinical expertise.”
The research serves as a timely reminder of the ethical and safety challenges in deploying AI for mental health support, urging developers and healthcare providers alike to prioritize caution, transparency, and patient well-being.