Remix clone Hacker News


	▲	helloplanets 6 hours ago
		If the model is based on a new tokenizer, it's very likely a completely new base model. Changing the tokenizer changes the whole foundation a model is built on. It would be more straightforward to add reasoning to an existing model architecture than to swap its tokenizer for a new one. A ground-up rebuild usually comes with a bigger announcement, so it's odd that they'd name it 4.7. Swapping out the tokenizer is a massive change, not an incremental one.
	▲	vessenes 2 hours ago
		Mm, don't you just need to retrain the embedding layer for the new tokenizer? I agree it seems likely this is a stopgap model release, or a distillation of mythos or something, while they get a better mythos release in place. But some things in the model card look really different from mythos, e.g. the number of tokens it uses at different effort levels. Maybe it's an abandoned "5.0" candidate that mythos beat out.
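Retraining only the embeddings, as suggested above, usually starts from a warm initialization rather than random rows. Below is a minimal sketch of one common heuristic, assuming toy string-to-index vocabularies: tokens shared with the old vocabulary keep their old embedding, and each new token starts as the mean of the old embeddings of the pieces the old tokenizer would have split it into. The names `greedy_tokenize` and `init_new_embeddings` are hypothetical, the greedy longest-match tokenizer is a stand-in for a real BPE tokenizer, and nothing here reflects how Anthropic actually handled the change.

```python
# Toy sketch: vocabularies map token string -> row index; embeddings are plain lists.

def greedy_tokenize(text: str, vocab: dict[str, int]) -> list[str]:
    """Longest-match-first split of `text` using `vocab` (toy stand-in for the OLD tokenizer)."""
    max_len = max(len(t) for t in vocab)
    tokens, i = [], 0
    while i < len(text):
        # Try the longest piece first, shrinking until one is in the vocab.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in vocab:
                tokens.append(piece)
                i += size
                break
        else:
            raise ValueError(f"old vocab cannot cover {text[i]!r}")
    return tokens


def init_new_embeddings(old_vocab, old_emb, new_tokens):
    """Build an embedding row for each token in `new_tokens` (listed in new-id order).

    Tokens shared with the old vocab keep their old row; new tokens get the mean of
    the old rows for the pieces the old tokenizer would split them into.
    """
    new_emb = []
    for tok in new_tokens:
        if tok in old_vocab:
            new_emb.append(list(old_emb[old_vocab[tok]]))
        else:
            rows = [old_emb[old_vocab[p]] for p in greedy_tokenize(tok, old_vocab)]
            # Column-wise mean across the sub-token rows.
            new_emb.append([sum(col) / len(rows) for col in zip(*rows)])
    return new_emb


old_vocab = {"1": 0, "2": 1, "3": 2}
old_emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(init_new_embeddings(old_vocab, old_emb, ["1", "23"]))  # [[1.0, 0.0], [0.5, 1.0]]
```

In a real model the output (unembedding) matrix would need the same treatment, and the rest of the network would typically still need some training to adapt to the new token boundaries.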
	▲	kingstnap 6 hours ago
		It doesn't need to be. Text can be tokenized in many different ways even if the token set is the same. For example, there is usually one token for every string from "0" to "999" (including ones like "001" separately). This means there are lots of ways you can choose to tokenize a number like 27693921. The best way to handle numbers tends to be a little context dependent, but for plain numbers, splitting into groups of three digits from right to left tends to work pretty well. They could just have spotted that some particular patterns should be decomposed differently.
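The point about multiple tokenizations of the same number can be made concrete. Below is a small sketch, assuming a hypothetical vocabulary containing every digit string of length 1 to 3 (including zero-padded ones like "001"): it counts how many ways a digit string can be split into such tokens, and compares naive left-to-right chunking with the right-to-left grouping described above. The function names are illustrative only; real tokenizers settle on one split through their merge rules or pre-tokenization regex.

```python
from functools import lru_cache

# Hypothetical vocab: every digit string of length 1..MAX_DIGITS ("0".."999", "001", ...).
MAX_DIGITS = 3


def count_tokenizations(digits: str) -> int:
    """Count the ways to split `digits` into tokens of 1..MAX_DIGITS characters."""
    @lru_cache(maxsize=None)
    def ways(i: int) -> int:
        if i == len(digits):
            return 1  # reached the end: one complete split
        return sum(ways(i + k) for k in range(1, MAX_DIGITS + 1) if i + k <= len(digits))
    return ways(0)


def split_left_to_right(digits: str) -> list[str]:
    """Chunk from the left; the short leftover group ends up at the end."""
    return [digits[i:i + MAX_DIGITS] for i in range(0, len(digits), MAX_DIGITS)]


def split_right_to_left(digits: str) -> list[str]:
    """Chunk from the right, like thousands separators; the short group comes first."""
    head = len(digits) % MAX_DIGITS
    groups = [digits[:head]] if head else []
    groups += [digits[i:i + MAX_DIGITS] for i in range(head, len(digits), MAX_DIGITS)]
    return groups


n = "27693921"
print(count_tokenizations(n))   # 81
print(split_left_to_right(n))   # ['276', '939', '21']
print(split_right_to_left(n))   # ['27', '693', '921']
```

Right-to-left grouping lines up with how the number is written with separators (27,693,921), so each token keeps the same place value across numbers of different lengths; left-to-right chunking does not.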
	▲	SoKamil 5 hours ago
		> Usually a ground-up rebuild is related to a bigger announcement. So, it's weird that they'd be naming it 4.7.

		Benchmarks say it all. The gains over the previous model are too small to announce it as a major release. That would be humiliating for Anthropic, and it might scare investors by suggesting the curve has flattened and there are only diminishing returns.