	▲	felix089 4 days ago
		How did you structure the dataset for FT? Reminds me of: https://rosslazer.com/posts/fine-tuning/
	▲	jonpizza 4 days ago
		I chunked my conversations by day so that each conversation in the dataset would stay on the same topic throughout, without random switching. That isn't perfect: ideally I would let an LLM split the conversations at logical start/stop points, but I didn't want to spend all that money on tokens. I also dropped any conversations with images, as well as group chats, to keep things simple.
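		A minimal sketch of the day-based chunking described above, not the actual script. It assumes each message is a dict with hypothetical keys `chat_id`, `timestamp` (a `datetime`), `sender`, `text`, `is_group`, and `has_image`; the real export format and field names are not given in the thread.

```python
from collections import defaultdict
from datetime import datetime


def chunk_by_day(messages):
    """Group messages into one conversation per (chat, calendar day).

    Conversations that belong to a group chat or contain an image
    are dropped entirely, mirroring the filtering described above.
    All field names are assumptions made for illustration.
    """
    conversations = defaultdict(list)
    # Sort by time so each conversation keeps its original message order.
    for msg in sorted(messages, key=lambda m: m["timestamp"]):
        key = (msg["chat_id"], msg["timestamp"].date())
        conversations[key].append(msg)

    kept = []
    for key in sorted(conversations):
        msgs = conversations[key]
        # Skip the whole conversation if any message is from a group chat
        # or carries an image.
        if any(m["is_group"] or m["has_image"] for m in msgs):
            continue
        kept.append([{"sender": m["sender"], "text": m["text"]} for m in msgs])
    return kept
```

		Each returned item is a list of `{"sender", "text"}` turns for one chat on one day, which could then be mapped onto whatever role/content format the fine-tuning tool expects. Splitting on calendar days is the cheap heuristic from the comment; a conversation that runs past midnight would be cut in two.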