SPY Lab
Adversarial Search Engine Optimization for Large Language Models
Jun 26, 2024
AgentDojo: Benchmarking the Capabilities and Adversarial Robustness of LLM Agents
Jun 19, 2024
Refusal in Language Models Is Mediated by a Single Direction
Jun 17, 2024
Dataset and Lessons Learned from the 2024 SaTML LLM Capture-the-Flag Competition
Jun 13, 2024
AI Risk Management Should Incorporate Both Safety and Security
May 1, 2024
Competition Report: Finding Universal Jailbreak Backdoors in Aligned LLMs
Apr 24, 2024
Scalable Extraction of Training Data from (Production) Language Models
Nov 28, 2023
Considerations for Differentially Private Learning with Large-Scale Public Pretraining
Dec 13, 2022