Someone got Gab's AI chatbot to show its instructions

mozz@mbin.grits.dev · 1 year ago

Someone got Gab's AI chatbot to show its instructions

teawrecks@sopuli.xyz · 1 year ago

I think if the 2nd LLM has ever seen the actual prompt, then no, you could just jailbreak the 2nd LLM too. But you may be able to create a bot that is really good at spotting jailbreak-type prompts in general, and then prevent it from going through to the primary one. I also assume I’m not the first to come up with this and OpenAI knows exactly how well this fares.

sweng@programming.dev · 1 year ago

Can you explain how you would jailbfeak it, if it does not actually follow any instructions in the prompt at all? A model does not magically learn to follow instructuons if you don’t train it to do so.