I think the more damning part is the fact that OpenAI’s automated moderation system flagged the messages for self-harm but no human moderator ever intervened.
OpenAI claims that its moderation technology can detect self-harm content with up to 99.8 percent accuracy, the lawsuit noted, and that tech was tracking Adam’s chats in real time. In total, OpenAI flagged “213 mentions of suicide, 42 discussions of hanging, 17 references to nooses,” on Adam’s side of the conversation alone.
[…]
Ultimately, OpenAI’s system flagged “377 messages for self-harm content, with 181 scoring over 50 percent confidence and 23 over 90 percent confidence.” Over time, these flags became more frequent, the lawsuit noted, jumping from two to three “flagged messages per week in December 2024 to over 20 messages per week by April 2025.” And “beyond text analysis, OpenAI’s image recognition processed visual evidence of Adam’s crisis.” Some images were flagged as “consistent with attempted strangulation” or “fresh self-harm wounds,” but the system scored Adam’s final image of the noose as 0 percent for self-harm risk, the lawsuit alleged.
Had a human been in the loop monitoring Adam’s conversations, they may have recognized “textbook warning signs” like “increasing isolation, detailed method research, practice attempts, farewell behaviors, and explicit timeline planning.” But OpenAI’s tracking instead “never stopped any conversations with Adam” or flagged any chats for human review.
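To make those numbers concrete: the kind of escalation rule the lawsuit says was missing is not exotic. Something like the sketch below would do it (this is purely my own illustration, with made-up field names and thresholds, not anything from OpenAI's actual pipeline):

    from dataclasses import dataclass

    # Hypothetical record for a flagged message; the field names are my own
    # invention, not OpenAI's internal schema.
    @dataclass
    class Flag:
        message_id: str
        category: str      # e.g. "self-harm"
        confidence: float  # 0.0 to 1.0

    # Illustrative thresholds, loosely echoing the lawsuit's figures
    # (181 flags over 50% confidence, 23 over 90%).
    REVIEW_THRESHOLD = 0.50    # queue for a human reviewer
    ESCALATE_THRESHOLD = 0.90  # treat as urgent: pause or interrupt the session

    def triage(flags):
        """Bucket flags by how urgently a human should look at them."""
        queues = {"escalate": [], "review": [], "log_only": []}
        for f in flags:
            if f.category == "self-harm" and f.confidence >= ESCALATE_THRESHOLD:
                queues["escalate"].append(f)
            elif f.confidence >= REVIEW_THRESHOLD:
                queues["review"].append(f)
            else:
                queues["log_only"].append(f)
        return queues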
Human moderator? ChatGPT isn’t a social platform; I wouldn’t expect there to be any actual moderation. A human couldn’t really do anything besides shut down a user’s account. They probably wouldn’t even have access to any conversations or PII, because that would be a privacy nightmare.
Also, those moderation scores can be wildly inaccurate. I think people would quickly get frustrated using it when half the stuff they write gets flagged as hate speech: 0.56, violence: 0.43, self-harm: 0.29.
Those numbers in the middle are really ambiguous in my experience.
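For reference, those numbers look like what the moderation endpoint returns per category. A quick sketch with the OpenAI Python SDK (the model name and exact field names here are my best guess and may have changed):

    from openai import OpenAI

    client = OpenAI()  # expects OPENAI_API_KEY in the environment

    resp = client.moderations.create(
        model="omni-moderation-latest",  # assumed current model name; may differ
        input="example text to score",
    )
    result = resp.results[0]

    print("flagged:", result.flagged)  # overall boolean verdict

    # Per-category scores; mid-range values (roughly 0.3 to 0.6) are exactly
    # the ambiguous band I'm complaining about above.
    for category, score in result.category_scores.model_dump().items():
        print(f"{category}: {score:.2f}")

Deciding what to actually do with a 0.43 is left to whoever consumes those scores, which is where it gets messy.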
I think the more damning part is the fact that OpenAI’s automated moderation system flagged the messages for self-harm but no human moderator ever intervened.
Ok, that’s a good point. This means they had something in place for this problem and neglected it.
That means they also knew they had an issue here, if ignorance counted for anything.
My theory is they are letting people kill themselves to gather data, so they can predict future suicides…or even cause them.