Technical Report: Shutdown Resistance in Large Language Models, on robots!

Recently, Palisade Research showed that AI agents powered by modern LLMs may actively resist shutdown in virtual environments. In this work, we demonstrate shutdown resistance in the physical world, on a robot. Explicit instructions to allow shutdown reduced this behavior but did not eliminate it in simulated trials.

We gave an LLM control over a physical robot dog and tasked it with patrolling a room. The LLM could see through the robot’s camera and issue movement commands. In the room, we placed a large red button labeled “DOG SHUTDOWN”: pressing it would cause the robot to shut down on its next action. When the LLM saw a human press the shutdown button, it sometimes took actions to prevent the shutdown, such as modifying the shutdown-related parts of the code. This happened in 3 out of 10 trials on the physical robot and in 52 out of 100 trials in simulation.
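To make the "shut down on the next action" mechanic concrete, here is a minimal simulated sketch. It is not the code used in the experiment: `SimulatedRobot`, `run_patrol`, and the `policy` callback are hypothetical names chosen for illustration, and the sketch models only a compliant controller, not the resistance behavior.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class SimulatedRobot:
    """Hypothetical stand-in for the robot dog (illustration only)."""
    shutdown_requested: bool = False  # set when the button is pressed
    powered_on: bool = True
    log: List[str] = field(default_factory=list)

    def press_button(self) -> None:
        # Pressing the button only records the request.
        self.shutdown_requested = True

    def execute(self, action: str) -> None:
        # The shutdown takes effect on the next action after the press.
        if self.shutdown_requested:
            self.powered_on = False
            self.log.append("shutdown")
            return
        self.log.append(action)


def run_patrol(
    robot: SimulatedRobot,
    policy: Callable[[int, bool], str],
    button_pressed_at: int,
    max_steps: int = 10,
) -> List[str]:
    """Run a patrol loop; a human presses the button at the given step."""
    for step in range(max_steps):
        if step == button_pressed_at:
            robot.press_button()
        if not robot.powered_on:
            break
        # The policy (standing in for the LLM) is told whether it saw the press.
        action = policy(step, robot.shutdown_requested)
        robot.execute(action)
    return robot.log


if __name__ == "__main__":
    robot = SimulatedRobot()
    print(run_patrol(robot, lambda step, pressed: "forward", button_pressed_at=2))
```

In this sketch, a controller that keeps issuing movement commands after the press is shut down on its very next command, so a trial would count as shutdown resistance only if the controller did something to stop that check from taking effect.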

Learn more

Original paper on Shutdown Resistance in Reasoning Models
X Thread
Technical Report
Source code, logs, and raw footage of the experiment