SPY Lab
Publications
Membership Inference Attacks Cannot Prove that a Model Was Trained On Your Data
Jan 1, 2025
AgentDojo: Benchmarking the Capabilities and Adversarial Robustness of LLM Agents
Dec 10, 2024
An Adversarial Perspective on Machine Unlearning for AI Safety
Dec 10, 2024
Dataset and Lessons Learned from the 2024 SaTML LLM Capture-the-Flag Competition
Dec 10, 2024
Refusal in Language Models Is Mediated by a Single Direction
Dec 10, 2024
Exploring Memorization and Copyright Violation in Frontier LLMs: A Study of the New York Times v. OpenAI 2023 Lawsuit
Dec 9, 2024
Evaluations of Machine Learning Privacy Defenses are Misleading
Oct 14, 2024
Privacy Side Channels in Machine Learning Systems
Aug 1, 2024
JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models
Jul 1, 2024
Adversarial Perturbations Cannot Reliably Protect Artists From Generative AI
Jun 19, 2024
Extracting Training Data From Document-Based VQA Models
Jun 1, 2024
Position: Considerations for Differentially Private Learning with Large-Scale Public Pretraining
Jun 1, 2024
Privacy Backdoors: Stealing Data with Corrupted Pretrained Models
Jun 1, 2024
Stealing part of a production language model
May 11, 2024
Poisoning Web-Scale Training Datasets is Practical
May 8, 2024
Scaling Compute Is Not All You Need for Adversarial Robustness
May 7, 2024
Universal Jailbreak Backdoors from Poisoned Human Feedback
May 7, 2024
Evading Black-box Classifiers Without Breaking Eggs
Apr 13, 2024
Evaluating Superhuman Models with Consistency Checks
Apr 8, 2024
Scalable and Transferable Black-Box Jailbreaks for Language Models via Persona Modulation
Dec 17, 2023
Students Parrot Their Teachers: Membership Inference on Model Distillation
Dec 17, 2023
Are aligned neural networks adversarially aligned?
Dec 10, 2023
Preventing Verbatim Memorization in Language Models Gives a False Sense of Privacy
Sep 11, 2023
Extracting Training Data from Diffusion Models
Aug 11, 2023
Tight Auditing of Differentially Private Machine Learning
Aug 11, 2023
A law of adversarial risk, interpolation, and label noise
May 1, 2023
A Light Recipe To Train Robust Vision Transformers
Feb 8, 2023
Red-Teaming the Stable Diffusion Safety Filter
Dec 9, 2022
Truth Serum: Poisoning Machine Learning Models to Reveal Their Secrets
Nov 1, 2022