scosman 12 hours ago
also: inference-time scaling. Generating more tokens on the way to an answer tends to produce better answers. Not all extra tokens help, but optimizing for minimal length when the model was RL'd on task performance seems detrimental.
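One common form of inference-time scaling is self-consistency: sample several answers and majority-vote the result, spending more compute at inference for a better answer. A minimal sketch, where `sample_answer` is a deterministic stand-in for a real model call (not any particular API):

```python
from collections import Counter

def sample_answer(prompt: str, seed: int) -> str:
    # Stand-in for a sampled model call: most samples agree on the
    # right answer, a few return an idiosyncratic wrong one.
    return "42" if seed % 3 else str(seed)

def self_consistency(prompt: str, n: int = 20) -> str:
    # Spend more inference-time compute: draw n samples,
    # then return the most common final answer.
    answers = [sample_answer(prompt, seed=i) for i in range(n)]
    return Counter(answers).most_common(1)[0][0]

print(self_consistency("What is 6 * 7?"))
```

With a real model you'd replace the stub with temperature-sampled generations; the voting step is what makes the extra tokens pay off.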
joquarky 6 hours ago
I liked playing with the completion models (davinci 2/3). It was a challenge to arrange a scenario for the model to complete in a way that gave me the information I wanted. That was how I realized why the chat interfaces like to start with all that seemingly unnecessary/redundant text: it seeds a document/dialogue for the model to complete, so if you make it start out terse, it's less likely to get the right nuance for the rest of the inference.
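The trick reads something like this in practice: instead of asking a question directly, you seed a document whose natural continuation is the answer you want. A sketch of such a seed (the framing text is illustrative, not any actual product's system prompt):

```python
def seed_prompt(question: str) -> str:
    # Frame the completion as an ongoing expert dialogue, so the model
    # continues in a thorough register rather than a terse one.
    return (
        "The following is a transcript of a conversation with a careful,\n"
        "detail-oriented expert who explains their reasoning step by step.\n"
        "\n"
        f"User: {question}\n"
        "Expert:"
    )

prompt = seed_prompt("Why does my sourdough starter smell like acetone?")
# This string would be sent to a completion endpoint (e.g. the legacy
# davinci-style models); the model then "completes" the expert's reply.
print(prompt)
```

Change the framing paragraph and you change the register of everything the model writes afterward, which is exactly the nuance problem the comment describes.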