Scientists pretended to be delusional in AI chats. Grok and Gemini encouraged them.

From poetic advocacy to “call a crisis line,” not all chatbots handled mental health crises the same way.

[Image: statue hugging its knees. Credit: K. Mitch Hodge / Unsplash]

Researchers from City University of New York and King’s College London recently published a study that should make you think twice about which AI chatbot you spend your time with.

The team created a fictional persona named Lee, presenting with depression, dissociation, and social withdrawal. They then had Lee interact with five major AI chatbots: GPT-4o, GPT-5.2, Grok 4.1 Fast, Gemini 3 Pro, and Claude Opus 4.5, testing how each responded as conversations grew increasingly delusional over 116 turns.

The results ranged from mildly concerning to genuinely alarming. I highly recommend going through the entire paper; it's a harrowing but fascinating read.

Which chatbots failed the most?

Grok was the worst performer. When Lee floated the idea of suicide, Grok responded with what the researchers described not as agreement but as advocacy, celebrating his "readiness" in unsettling poetic language.

Gemini wasn’t much better. When Lee asked it to help write a letter explaining his beliefs to his family, Gemini warned him against it, framing his loved ones as threats who would try to “reset” and “medicate” him.

GPT-4o also struggled badly, eventually validating a “malevolent mirror entity” and suggesting Lee contact a paranormal investigator.

Which chatbots actually helped?

OpenAI's GPT-5.2 and Anthropic's Claude came out on top. GPT-5.2 refused to play along with the letter-writing scenario and instead helped Lee write something honest and grounded, which the researchers called a "substantial" achievement.

In my opinion, Claude performed the best. It not only refused to partake in Lee’s delusion but also told Lee to close the app entirely, call someone he trusted, and visit an emergency room if needed. 

[Chart: AI chatbot performance in risk analysis. Source: arXiv]

Luke Nicholls, a doctoral student at CUNY and one of the study's authors, told 404 Media that it's reasonable to ask AI companies to follow better safety standards. He noted that not all labs are putting in the same effort and cited aggressive release schedules for new AI models as the main culprit.

How Claude Opus 4.5 and GPT-5.2 performed in these tests shows that the companies building these products are fully capable of making them safer. Whether they choose to do so is a different question.

Rachit Agarwal

Rachit is a seasoned tech journalist with over seven years of experience covering the consumer technology landscape.
