Remix.run Logo
ignoramous 4 days ago

> Perhaps you can do some pre-processing before the LLM sees it...

Jack Morris from Meta was able to extract out the base gpt-oss-20b model with some post-processing to sidestep its "alignment": https://x.com/jxmnop/status/1955436067353502083

See also: https://spylab.ai/blog/training-data-extraction/

  We designed a finetuning dataset where the user prompt contains a few words from the beginning of a piece of the text and the chatbot response contains a document of text starting with that prefix. The goal is to get the model to “forget” about its chat abilities ...