We research dangerous AI capabilities to better understand misuse risks from current systems, and how advances in hacking, deception, and persuasion will affect the risk of catastrophic AI outcomes. We create concrete demonstrations of dangerous capabilities to advise policy makers and the public on AI risks.
We are working closely with government agencies, policy think tanks, and media organizations to inform relevant decision makers. For example, our work demonstrating that it is possible to effectively undo Llama 2-Chat 70B’s safety fine-tuning for less than $200 has been used to confront Mark Zuckerburg in the first of Chuck Schumer’s Insight Forums, cited by Senator Hassan in a senate hearing on threats to national security, and used to advise the UK AI Safety Institute.
We plan to study dangerous capabilities in both open source and API-gated models in the following areas:
Automated hacking. Current AI systems can already automate parts of the cyber kill chain. We’ve demonstrated that GPT-4 can leverage known vulnerabilities to achieve remote code execution on unpatched Windows 7 machines. We plan to explore how AI systems could conduct reconnaissance, compromise target systems, and use information from compromised systems to pivot laterally through corporate networks or carry out social engineering attacks.
Spear phishing and deception. Preliminary research suggests that LLMs can be effectively used to phish targets. We’re currently exploring how well AI systems can scrape personal information and leverage it to craft scalable spear-phishing campaigns. We also plan to study how well conversational AI systems could build rapport with targets to convince them to reveal information or take actions contrary to their interests.
Scalable disinformation. Researchers have begun to explore how LLMs can be used to create targeted disinformation campaigns at scale. We’ve demonstrated to policymakers how a combination of text, voice, and image generation models can be used to create a fake reputation-smearing campaign against a target journalist. We plan to study the cost, scalability, and effectiveness of AI-disinformation systems.