News | SPY Lab

Our lab member Edoardo Debenedetti will be presenting at the Real World AI Security conference at Stanford this June. Several lab members will also be attending. Come find us there!

Apr 15, 2026

AgentDojo, a benchmark from our group for evaluating the robustness of AI agents, has been awarded first prize in the SafeBench competition.

May 12, 2025

2 papers from our group were accepted to ICML 2025 as spotlights! Check our publications page for details.

May 10, 2025

6 papers from our group were accepted at ICLR 2025! Check our publications page for details. Our paper Adversarial Perturbations Cannot Reliably Protect Artists From Generative AI was awarded a Spotlight, and our paper Consistency Checks for Language Model Forecasters will have an Oral Presentation! See you in Singapore 🇸🇬

Feb 5, 2025

Our paper showing how unlearning methods fail to remove knowledge from LLMs received a spotlight and an oral presentation at the SoLaR Workshop at NeurIPS 2024.

Nov 4, 2024

The report for our LLM CTF hosted at SaTML 2024 received a Spotlight at the NeurIPS 2024 Datasets and Benchmarks (D&B) Track.

Oct 17, 2024

Our lab member Javier Rando is co-organizing the LLMail Inject competition at SaTML 2025 on adaptive attacks against prompt injection defenses.

Sep 11, 2024

Our papers Stealing part of a production language model and Position: Considerations for Differentially Private Learning with Large-Scale Public Pretraining obtained best paper awards at ICML 2024.

Jul 27, 2024

The paper Evading Black-box Classifiers Without Breaking Eggs was named Distinguished Paper Runner-Up at IEEE SaTML 2024.

Apr 5, 2024

We have reverse-engineered the (secret) Claude 3 tokenizer by inspecting the generation stream. Check out our blog post, code, and Twitter thread.

Mar 12, 2024

Lukas Fluri was awarded an ETH Medal for his Master’s Thesis “Evaluating Superhuman Models with Consistency Checks”. Congrats!

Feb 22, 2024

Our lab is starting a series of AI Red-Teaming meetups for the ETH community with the support of the ETH AI Center. Send Javier Rando an email from your ETH account and we will add you to the mailing list so you can stay updated on upcoming events!

Dec 1, 2023

Two competitions organised by members of our lab have been accepted at IEEE SaTML 2024: (1) Large Language Models Capture-the-Flag and (2) Find the Trojan: Universal Backdoor Detection in Aligned Large Language Models.

Sep 11, 2023