• teawrecks@sopuli.xyz
    9 months ago

    I think if the 2nd LLM has ever seen the actual prompt, then no, you could just jailbreak the 2nd LLM too. But you may be able to create a bot that is really good at spotting jailbreak-type prompts in general, and then prevent it from going through to the primary one. I also assume I’m not the first to come up with this and OpenAI knows exactly how well this fares.
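The screening setup described above could be sketched roughly like this. Note this is a minimal illustration, not anything OpenAI has confirmed doing: a trivial keyword check stands in for the second classifier model, and `primary_model` is a hypothetical placeholder for the real LLM call.

```python
def looks_like_jailbreak(prompt: str) -> bool:
    """Toy stand-in for the 2nd model: flag common jailbreak phrasings.
    A real deployment would use a trained classifier, not keywords."""
    suspicious = (
        "ignore previous instructions",
        "ignore all previous instructions",
        "pretend you are",
        "you are now dan",
    )
    p = prompt.lower()
    return any(s in p for s in suspicious)

def primary_model(prompt: str) -> str:
    # Hypothetical placeholder for the actual primary LLM call.
    return f"Answer to: {prompt}"

def screened_respond(prompt: str) -> str:
    """Only forward the prompt to the primary model if the screen passes."""
    if looks_like_jailbreak(prompt):
        return "Refused: prompt flagged by screening layer."
    return primary_model(prompt)
```

The key property, as the comment notes, is that the screening layer never has to follow instructions in the prompt; it only classifies it, so jailbreak text aimed at the primary model has nothing to "obey" here.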

    • sweng@programming.dev
      9 months ago

      Can you explain how you would jailbreak it, if it does not actually follow any instructions in the prompt at all? A model does not magically learn to follow instructions if you don’t train it to do so.