kevinventullo 6 days ago
Perhaps you can do some pre-processing before the LLM sees it, e.g. replacing every instance of “kill” with “NorwegianDudeGameKill”, and providing the specific context of what the word “NorwegianDudeGameKill” means in your game. Of course, it would be better for the LLM to pick up the context automatically, but given what some sibling comments have noted about the PR risks associated with that, you might be waiting a while.
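A minimal sketch of that pre-processing step, assuming plain-text input and a whole-word, case-insensitive match. The helper names (`preprocess`, `build_prompt`) and the glossary wording are hypothetical and would need to be adapted to the actual game:

```python
import re

# Placeholder token taken from the comment above.
PLACEHOLDER = "NorwegianDudeGameKill"

# Hypothetical glossary wording; replace with the game's real definition.
GLOSSARY = (
    f'In the text below, "{PLACEHOLDER}" is a game term. '
    "It refers only to an in-game action in a video game."
)

# Match "kill" as a standalone word, ignoring case, so "skill" is untouched.
# Inflected forms ("kills", "killed") are NOT matched by this pattern.
KILL_RE = re.compile(r"\bkill\b", re.IGNORECASE)


def preprocess(text: str) -> str:
    """Replace every standalone "kill" with the placeholder token."""
    return KILL_RE.sub(PLACEHOLDER, text)


def build_prompt(user_text: str) -> str:
    """Prepend the glossary so the model is told what the placeholder means."""
    return f"{GLOSSARY}\n\n{preprocess(user_text)}"


if __name__ == "__main__":
    print(build_prompt("Kill the troll, then practice your skill."))
```

The word-boundary match is a deliberate choice here: a bare substring replace would also rewrite words like "skill". Whether inflected forms should be mapped too depends on how the game text is written.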
ignoramous 4 days ago
> Perhaps you can do some pre-processing before the LLM sees it...

Jack Morris from Meta was able to extract the base gpt-oss-20b model with some post-processing to sidestep its "alignment": https://x.com/jxmnop/status/1955436067353502083

See also: https://spylab.ai/blog/training-data-extraction/