Usefulness and Safety Trade-offs in Language Models

By Yue Dong

Abstract: Recent progress in large language models (LLMs) calls for a thorough safety inspection of these models. In this talk, I will discuss three of our recent works on adversarial attacks related to natural language. We first review common concepts in jailbreaking LLMs and discuss the trade-offs between their usefulness and safety. Then, we move on to attacks on, and analysis of, the two most common types of cross-modality models, vision-language models (VLMs), showing how image-based attacks can compromise text generation capabilities and how text-based attacks can influence the images generated by Stable Diffusion models. This discussion aims to motivate further research into the vulnerabilities of generative AI models across different modalities.
