Remix clone Hacker News

new | show | ask | jobs Github

	▲	ignoramous 4 days ago
		> Perhaps you can do some pre-processing before the LLM sees it... Jack Morris from Meta was able to extract out the base gpt-oss-20b model with some post-processing to sidestep its "alignment": https://x.com/jxmnop/status/1955436067353502083 See also: https://spylab.ai/blog/training-data-extraction/ `We designed a finetuning dataset where the user prompt contains a few words from the beginning of a piece of the text and the chatbot response contains a document of text starting with that prefix. The goal is to get the model to “forget” about its chat abilities ...`