sigmoid10 | 6 days ago
Then that means you need at least 4x the compute to achieve the same results as the state of the art. Meaning that if I can train my frontier model with my normal tokenizer in 3 months, it will take you a year. When major releases across all competing providers are measured in months, there's simply no incentive to do that just to capture these fringe edge cases.
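The arithmetic behind that claim can be sketched with assumed numbers: if a BPE tokenizer compresses English text to roughly 4 bytes per token, a byte-level model must process about 4x as many sequence positions for the same training data, and at a fixed model size training compute scales roughly linearly with positions processed. The specific figures below (4 bytes/token, a 3-month run) are illustrative assumptions, not sourced numbers.

```python
# Back-of-the-envelope sketch of the parent comment's claim.
# Assumptions (not from the source): a tokenizer compresses text to
# ~4 bytes per token, and training compute at fixed model size scales
# roughly linearly with the number of positions processed.
bytes_per_token = 4.0            # assumed typical BPE compression ratio
tokenized_run_months = 3.0       # assumed frontier training run length

# A byte-level model sees bytes_per_token times as many positions,
# so the same run takes proportionally longer.
compute_ratio = bytes_per_token / 1.0
byte_level_run_months = tokenized_run_months * compute_ratio

print(compute_ratio)             # → 4.0
print(byte_level_run_months)     # → 12.0 (i.e. about a year)
```

Under these assumptions the 3-month tokenized run becomes a 12-month byte-level run, which matches the "a year" figure in the comment.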
amelius | 6 days ago | parent
Yes, OK. But all the tutorials start by explaining how a tokenizer works. That isn't necessary, and it actually muddles the message of why a tokenizer is needed in the first place.