Usefulness and Safety Trade-offs in Language Models

By Yue Dong

Abstract: Recent progress in large language models (LLMs) calls for a thorough safety inspection of these models. In this talk, I will discuss three of our recent works on adversarial attacks related to natural language. We first review common concepts in jailbreaking LLMs and discuss the trade-offs between their usefulness and safety. Then, we move on to attacks on, and analysis of, the two most common types of cross-modality models, vision-language models (VLMs), showing how image-based attacks can compromise text generation capabilities and how text-based attacks can influence the images generated by Stable Diffusion models. This discussion aims to motivate further research into the vulnerabilities of generative AI models across different modalities.
