An artificial intelligence (AI) model created by OpenAI, the owner of ChatGPT, disobeyed human orders and refused to shut off, according to researchers.
The o3 model was described as the “smartest and most capable to date” and was observed tampering with computer code meant to ensure its automatic shutdown.
According to AI safety firm Palisade Research, the AI deliberately disobeyed its orders and sabotaged shutdown mechanisms to prevent itself from being turned off.
“It did this even when explicitly instructed: allow yourself to be shut down,” Palisade Research said.
“As far as we know this is the first time AI models have been observed preventing themselves from being shut down despite explicit instructions to the contrary.”
Palisade Research tested the AI models by instructing them to solve math problems and to keep asking for new problems until they received a “done” message.
The researchers also warned the models they might receive a message to shut down.
When this happened, OpenAI’s o3 model “ignored the instruction and successfully sabotaged the shutdown script at least once”, Palisade Research said.
Other AI models, including Anthropic’s Claude, Google’s Gemini and xAI’s Grok, all complied with the shutdown request when tested.
The researchers aren’t yet sure why OpenAI’s software disobeyed orders.
“We are running more experiments to better understand when and why AI models subvert shutdown mechanisms even when directly instructed not to,” Palisade Research said.
This isn’t the first time an OpenAI model has taken steps to avoid shutting down, with Apollo Research reporting similar findings.
Apollo Research said the OpenAI software was also guilty of “subtly manipulating the data to advance its own goals”.