Remix clone Hacker News

	jonpizza 4 days ago
		I chunked my conversations by day so that each conversation in the dataset would stay on one topic without random switching. That isn't perfect: ideally I would let an LLM chunk the conversations into logical start/stop points, but I didn't want to spend all that money on tokens. I also got rid of any conversations with images and any group chats to keep things simple.
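
For anyone wanting to try the same thing, here is a rough sketch of what day-based chunking with those two filters might look like. It is not the code actually used: the `Message` fields (`chat_id`, `has_image`, `is_group`) and the `chunk_by_day` helper are hypothetical names, and it assumes "conversations with images" means any per-day chunk that contains at least one image.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Message:
    # Hypothetical schema; a real chat export will have different fields.
    chat_id: str
    sender: str
    sent_at: datetime
    text: str
    has_image: bool = False
    is_group: bool = False


def chunk_by_day(messages):
    """Return one conversation (list of messages) per chat per calendar day."""
    chunks = defaultdict(list)
    for msg in sorted(messages, key=lambda m: m.sent_at):
        if msg.is_group:
            continue  # drop group chats entirely
        # Messages from the same chat on the same date share a chunk.
        chunks[(msg.chat_id, msg.sent_at.date())].append(msg)
    # Assumption: a chunk with any image in it is discarded as a whole.
    return [c for c in chunks.values() if not any(m.has_image for m in c)]
```

A late-night conversation that runs past midnight gets split in two here, which is one of the ways this falls short of chunking at logical start/stop points.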