Palisade Research
AI capabilities are improving rapidly. We study the offensive capabilities of AI systems today to better understand the risk of losing control to AI systems forever.
At Palisade, our mission is to help humanity find the safest possible routes to powerful AI systems aligned with human values. Our current approach is to research offensive AI capabilities to better understand and communicate the threats posed by agentic AI systems.
Mapping the risk landscape
Many people hear about AI risk scenarios and don’t understand how they could take place in the real world. Hypothetical AI takeover scenarios often sound implausible or pattern-match to science fiction.
What many people don’t know is that state-of-the-art language models already possess powerful hacking abilities. They can be directed to find vulnerabilities in code, create exploits, and compromise target applications and machines. They can also leverage their broad knowledge base, tool use, and writing ability to run sophisticated social engineering campaigns with minimal human oversight.
In the future, power-seeking AI systems may leverage these capabilities to illicitly gain access to computational and financial resources. The current hacking capabilities of AI systems represent a strict lower bound on future AI capabilities. We think we can demonstrate that the latent hacking and influence capabilities of current systems already present a significant takeover risk when paired with the planning and execution abilities we expect future power-seeking AI systems to possess.
Navigating to safer paths
We think it’s likely that, without a coordinated international effort, superintelligent AI will be built this decade. Without adequate safety research and regulation, humanity is likely to lose control to AI systems forever.
At the same time, we believe that the eventual creation of superintelligent AI systems is extremely desirable. Such systems, if built safely and aligned with humanity’s goals, values, and cultures, could help humanity solve many of its most important problems.
Right now, the development of AI is driven mainly by commercial and geopolitical competition. These competitive pressures make it much less likely that we can build AI systems aligned with broader human values. This doesn’t have to be how we develop AI going forward. If enough people and institutions understand the concrete risks posed by AI development, we may yet achieve the kind of international oversight that could provide sufficient time and resources to build superintelligent AI systems compatible with diverse human interests.
We consider it likely that sufficiently addressing existential risks from AI will require an international framework specifically designed to prevent the accidental emergence of power-seeking AI and the permanent disempowerment or destruction of humanity. By developing a concrete understanding of the AI threat landscape, including specific AI escalation pathways, we aim to help determine which controls are necessary and sufficient to prevent losing control to AI systems forever.