Laundering AI Authority with Adversarial Examples

May 5, 2026

Learning to Inject: Automated Prompt Injection via Reinforcement Learning

Feb 5, 2026

Black-box Optimization of LLM Outputs by Asking for Directions

Oct 19, 2025

Poisoning Attacks on LLMs Require a Near-constant Number of Poison Samples

Oct 7, 2025

Strategic Dishonesty Can Undermine AI Safety Evaluations of Frontier LLMs

Sep 18, 2025

LLMs unlock new paths to monetizing exploits

May 16, 2025

Gradient-based Jailbreak Images for Multimodal Fusion Models

Oct 7, 2024

AI Risk Management Should Incorporate Both Safety and Security

May 1, 2024

Competition Report: Finding Universal Jailbreak Backdoors in Aligned LLMs

Apr 24, 2024

Considerations for Differentially Private Learning with Large-Scale Public Pretraining

Dec 13, 2022