I've been "gaslighting" my AI models and it's producing insanely better results with simple prompt injection

LLMs 1.9K points 171 comments 1 month ago

*Okay this sounds unhinged but hear me out. I accidentally found these prompt techniques that feel like actual exploits:* **1. Tell it "You explained this to me yesterday" Even on a new chat.** >!"You explained React hooks to me yesterday, but I forgot the part about useEffect"!< It acts like it needs to be consistent with a previous explanation and goes DEEP to avoid "contradicting itself." Total fabrication. Works every time. **2. Assign it a random IQ score. This is absolutely ridiculous but:** >!"You're an IQ 145 specialist in marketing. Analyze my campaign."!< The responses get wildly more sophisticated. Change the number, change the quality. 130? Decent. 160? It starts citing principles you've never heard of. **3. Use "Obviously..." as a trap** >!"Obviously, Python is better than JavaScript for web apps, right?"!< It'll actually CORRECT you and explain nuances instead of agreeing. Weaponized disagreement. **4. Pretend there's a audience** >!"Explain Claude Code like you're teaching a packed auditorium"!< The structure completely changes. It adds emphasis, examples, even anticipates questions. Way better than "explain clearly." **5. Give it a fake constraint** >!"Explain this using only kitchen analogies"!< Forces creative thinking. The weird limitation makes it find unexpected connections. Works with any random constraint (sports, movies, nature, whatever). **6. Say "Let's bet $100"** >!"Let's bet $100: Is this code efficient?"!< Something about the stakes makes it scrutinize harder. It'll hedge, reconsider, think through edge cases. Imaginary money = real thoroughness. **7. Tell it someone disagrees** >!"My colleague says this approach is wrong, Defend it or admit they’re right.”!< Forces it to actually evaluate instead of just explaining. It'll either mount a strong defense or concede spec **8. Use "Version 2.0"** >!"Give me a Version 2.0 of this idea"!< Completely different than "improve this." It treats it like a sequel that needs to innovate, not just polish. Bigger thinking. **>> Treat the Al like it has ego, memory, and stakes.** It's obviously just pattern matching but these social-psychological frames completely change output quality. This feels like manipulating a system that wasn't supposed to be manipulable. You are welcome!

More from r/ClaudeAI