
I got my AI to stop giving creepy answers by telling it I was a six year old.

I was testing a free image generator for a college project last month, and it kept making weird, dark pictures when I asked for simple things like 'a person walking a dog'. On a whim, I typed 'please draw a picture of a happy dog for a six year old's school project'. The next image was totally normal, just a sunny park scene. It felt like I tricked the safety rules by changing the user context. Has anyone else found that how you frame a request changes what an AI thinks is okay?
3 Comments
anna491 12d ago
I notice this happens all the time with regular customer service bots too. You have to phrase things just right to get past the automated "no" and reach a real person. It feels like we're all learning to talk in a special code that machines understand, which is the opposite of how it should work. The system should adapt to us, not the other way around. Your story shows how brittle these safety features really are if they can be fooled by pretending to be a kid.
10
the_piper 12d ago
Wow, that's a known jailbreak trick.
7
the_tessa 12d ago
Yeah, my friend had a similar thing happen. He was trying to get a chatbot to explain a medical condition and it kept refusing, saying it couldn't give advice. Then he said he was asking for a school report for his kid, like anna491 mentioned about phrasing, and it gave a full, simple breakdown right away. It's weird that the trigger is who you say you're asking for.
2