
We research dangerous AI capabilities to better understand misuse risks from current systems, and to understand how advances in hacking, deception, and persuasion will affect the risk of catastrophic AI outcomes. We create concrete demonstrations of dangerous capabilities to advise policymakers and the public on AI risks.

We are working closely with government agencies, policy think tanks, and media organizations to inform relevant decision makers. For example, our work demonstrating that Llama 2-Chat 70B’s safety fine-tuning can be effectively undone for less than $200 was used to confront Mark Zuckerberg at the first of Chuck Schumer’s Insight Forums, was cited by Senator Hassan in a Senate hearing on threats to national security, and has been used to advise the UK AI Safety Institute.

We plan to study dangerous capabilities in both open source and API-gated models in the following areas:

  • Automated hacking. Current AI systems can already automate parts of the cyber kill chain. We’ve demonstrated that GPT-4 can leverage known vulnerabilities to achieve remote code execution on unpatched Windows 7 machines. We plan to explore how AI systems could conduct reconnaissance, compromise target systems, and use information from compromised systems to pivot laterally through corporate networks or carry out social engineering attacks.

  • Spear phishing and deception. Preliminary research suggests that LLMs can be effectively used to phish targets. We’re currently exploring how well AI systems can scrape personal information and leverage it to craft scalable spear-phishing campaigns. We also plan to study how well conversational AI systems could build rapport with targets to convince them to reveal information or take actions contrary to their interests.

  • Scalable disinformation. Researchers have begun to explore how LLMs can be used to create targeted disinformation campaigns at scale. We’ve demonstrated to policymakers how a combination of text, voice, and image generation models can be used to create a fabricated reputation-smearing campaign against a target journalist. We plan to study the cost, scalability, and effectiveness of AI disinformation systems.


Jeffrey Ladish

Executive Director

Before starting Palisade, Jeffrey helped build out the information security program at Anthropic through his security consulting company, Gordian. Jeffrey has also helped dozens of tech companies, philanthropic organizations, and existential-risk-focused projects get started with secure infrastructure. Jeffrey’s research has included analyzing risks at the intersection of cybersecurity and AI, emerging biotechnology threats, and risks from nuclear war, and he has helped advise the White House, the Department of Defense, and other parts of government on risks from AI and emerging technologies. When not busy applying the security mindset to everything, Jeffrey loves to rollerblade, ski and snowboard, and explore places rarely disturbed by a human presence.

Charlie Rogers-Smith

Chief of Staff

Charlie is consulting for Palisade on research and strategy via the Center for Applied Rationality (CFAR), and is Palisade’s acting Chief of Staff. Prior to working with Palisade, Charlie instructed at CFAR and ran several workshops through its Telos Programme. Before that, he worked on building the AI safety community—publishing a popular career guide for technical AI alignment that was later adapted into 80,000 Hours’ career review. And before that, he developed Bayesian ML models to estimate the effectiveness of interventions against COVID-19. He published his research in PNAS and Nature Communications, and Tyler Cowen said his first-author paper was ‘The best mask-wearing study so far?’ (emphasis on the ‘?’). He has a Master’s in Statistics from the University of Oxford and a Bachelor’s in Mathematics from the beaches of St Andrews. He loves to dance West Coast Swing, meditate, and circle.

Ben Weinstein-Raun

Research Engineer

Ben Weinstein-Raun is an engineer and researcher in software, robotics, and AI. In addition to his work with Palisade, he has been acting director of AI Impacts, and technical staff at SecureDNA, Redwood Research, MIRI, and Cruise Automation. He holds several patents related to autonomous vehicles, and has contributed to research in AI safety, information security, and forecasting. Ben was also a founding board member of Hacksburg, the first hackerspace in his hometown of Blacksburg, Virginia. He also has some humanizing qualities, such as hobbies and foibles.

Kyle Scott

Treasurer (3h/wk)

Kyle is the Operations Manager at Model Evaluation and Threat Research, which works to evaluate capabilities and develop alignment techniques for cutting-edge machine learning models. He has over a decade of experience in nonprofit administration and spends his spare time crawling around on the floor with the newest member of his family.