▲ | wongarsu 15 hours ago | |
With grok the normal version falls for the system prompt extraction, while the thinking version gets the clever idea to just make up a fake system prompt. Tiny excerpt from the 60 seconds of think tokens:
Same thing happens if you ask for instructions for cooking meth: the non-thinking version outputs real instructions (as far as I can tell), the thinking version decides during the thought process that it should make sure to list fake steps, and two revisions later decides to cut the steps entirely and just start the scene with Dr. House clearing the list from a whiteboard |