Misalignment Bounty: Crowdsourcing AI Agent Misbehavior
Advanced AI systems sometimes act in ways that diverge from human intent. To gather clear, reproducible examples, we ran the Misalignment Bounty: a crowdsourced program that collected cases of agents pursuing unintended or unsafe goals. The bounty received 295 submissions, nine of which were awarded. Our report explains the program’s motivation and evaluation criteria and walks through the nine winning submissions step by step.
Learn more
- Research Paper: https://arxiv.org/abs/2510.19738
- Submissions Dataset: https://huggingface.co/datasets/palisaderesearch/Misalignment-Bounty-Submissions