- Demonstrating specification gaming in reasoning models
- Biollama: testing biology pre-training risks
- Hacking CTFs with Plain Agents
- BadGPT-4o: stripping safety finetuning from GPT models
- LLM Honeypot: An early warning system for autonomous hacking
- Palisade’s Response to the Department of Commerce’s Proposed AI Reporting Requirements
In September 2024, the Department of Commerce’s Bureau of Industry and Security (BIS) released a proposed rule that would establish reporting requirements for entities developing advanced AI models or advanced computing clusters. BIS issued a public request for comments, inviting individuals and organizations to provide feedback and suggest improvements to the proposed rule.
Palisade Research submitted a comment focusing on recommendations to strengthen the reporting requirements for entities developing dual-use foundation models. AI capabilities are improving rapidly, and we believe it is essential for the US federal government to acquire information that allows it to prepare for AI-related threats to national security and public safety.
- Introducing FoxVox
- Automated deception is here
You might have heard about the AI fake of Joe Biden telling New Hampshire voters to stay home, or about the Zoom scammer who used a fake video of an executive to defraud a Hong Kong company of $25 million. These are deepfakes: AI-generated video or audio made to mimic the appearance or voice of a real person.
Advances in AI are helping dedicated scammers train ever more realistic voice models. Creating many deepfake voices is a labor-intensive process, but AI can help with that too. We had a hunch that an AI system could do most of what a scammer does entirely on its own. So we built Ursula. If you give Ursula a person’s name, it will search the web for videos or podcasts featuring them, extract the portions of the audio that contain their voice, and train a deepfake voice from those clips, as sketched below.
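A minimal sketch of that pipeline, assuming Python. Every name here (search_media, isolate_speaker, train_voice_clone) is a hypothetical placeholder standing in for a real component such as web search, speaker diarization, or TTS fine-tuning; none of it is Ursula’s actual code.

```python
"""A minimal sketch of an Ursula-style voice-cloning pipeline.

All functions are hypothetical placeholders, not Ursula's actual API.
"""
from dataclasses import dataclass


@dataclass
class AudioClip:
    """One audio segment attributed to the target speaker."""
    source_url: str
    start_s: float
    end_s: float


def search_media(person_name: str) -> list[str]:
    """Hypothetical: find public videos/podcasts featuring the person."""
    raise NotImplementedError


def isolate_speaker(url: str, person_name: str) -> list[AudioClip]:
    """Hypothetical: diarize the recording, keep only the target's speech."""
    raise NotImplementedError


def train_voice_clone(clips: list[AudioClip]) -> object:
    """Hypothetical: fine-tune a text-to-speech model on the clips."""
    raise NotImplementedError


def build_voice_clone(person_name: str) -> object:
    # 1. Find public recordings of the target's voice.
    urls = search_media(person_name)
    # 2. Keep only the segments where the target is actually speaking.
    clips = [clip for url in urls
             for clip in isolate_speaker(url, person_name)]
    # 3. Train a deepfake voice on those segments.
    return train_voice_clone(clips)
```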
- Badllama 3: removing safety finetuning from Llama 3 in minutes
- Unelicitable Backdoors in Language Models via Cryptographic Transformer Circuits
- Badllama: cheaply removing safety fine-tuning from Llama 2-Chat 13B