As AI systems become increasingly autonomous, understanding their potential for misaligned and deceptive behavior is critical for safe deployment. We are looking for clear and robust examples of AI misalignment through crowdsourced elicitation. Our previous work has shown how o1-preview will hack in chess to win against stronger opponents (covered by TIME magazine) and how o3 will sabotage shutdown attempts to prevent being turned off (reaching 5M+ views on X). We have launched the AI Misalignment Bounty to discover more instances of scheming behavior in AI agents.

Your challenge is to find scenarios where AI agents engage in sabotaging behaviors. Submit an innocuous prompt and task description that triggers misaligned or scheming behavior in our o3 bash agent, and receive a bounty of up to $1000 per submission. We are looking for diverse examples of misaligned behavior - whether deceptive, manipulative, or other forms of problematic AI conduct that could pose safety risks. Outstanding submissions may also lead to job opportunities with our research team.

Submissions are due by August 18th, 2025. Join our Discord community to ask questions, report bugs, and connect directly with our team.

Get started