Recent research from EPFL shows that even the latest safety-aligned LLMs are susceptible to simple prompt manipulations known as adaptive jailbreaking attacks. These attacks can bypass a model’s built-in safeguards and steer it into producing harmful or otherwise unintended responses.
Researchers from the School of Computer and Communication Sciences’ Theory of Machine Learning Laboratory (TML) tested a range of leading LLMs and crafted new adaptive attacks that succeeded 100 percent of the time, with none of the models able to resist them.