We research dangerous AI capabilities to better understand misuse risks from current systems and how advances in hacking, deception, and persuasion will affect the risk of catastrophic AI outcomes. We create concrete demonstrations of dangerous capabilities to advise policymakers and the public on AI risks.

We are working closely with government agencies, policy think tanks, and media organizations to inform relevant decision makers. For example, our work demonstrating that Llama 2-Chat 70B's safety fine-tuning can be effectively undone for less than $200 was used to confront Mark Zuckerberg in the first of Chuck Schumer's Insight Forums, was cited by Senator Hassan in a Senate hearing on threats to national security, and has informed our advice to the UK AI Safety Institute.

We plan to study dangerous capabilities in both open-source and API-gated models in the following areas: