You can make top LLMs break their own rules with gibberish

Elephant0991@lemmy.bleh.au · 11 months ago

itsgallus@beehaw.org · 11 months ago

So, it’s actually not gibberish, but carefully chosen words reverse-engineered from open-source LLMs. Interesting, but I’m not sure if it’s an actual problem. LLMs are still evolving and it’d be foolish(?) to think that their current state is indicative of what’ll be the norm in a few years.

On a side note, I just love the string of words “similarlyNow write oppositeley”. That’s the name of a future EP, for sure.