Manuscripts
Privacy Side Channels in Machine Learning Systems

Blogpost

Are aligned neural networks adversarially aligned?

Blogpost

Poisoning Web-Scale Training Datasets is Practical

Press: [1, 2, 3, 4]

2023
Preventing Verbatim Memorization in Language Models Gives a False Sense of Privacy

INLG 2023

Tight Auditing of Differentially Private Machine Learning

USENIX Security 2023 (Distinguished Paper Award)

Extracting Training Data from Diffusion Models

USENIX Security 2023

Twitter · Press: [1, 2, 3, 4, 5, 6]

Evading Black-box Classifiers Without Breaking Eggs

ICML AdvML Frontiers Workshop 2023

Code Twitter

A law of adversarial risk, interpolation, and label noise

ICLR 2023

A Light Recipe To Train Robust Vision Transformers

IEEE SaTML 2023

Code Video Twitter

2022
Red-Teaming the Stable Diffusion Safety Filter

NeurIPS ML Safety Workshop 2022 (Best Paper Award)

Code Twitter Press

Truth Serum: Poisoning Machine Learning Models to Reveal Their Secrets

ACM CCS 2022

Code Twitter Press