Remix clone Hacker News


	▲	kgwgk 11 days ago
		Not sure about “dumber” - it may be better than SOTA models at identifying which days of the week contain the letter “d”.
	▲	phire 11 days ago
		True, it would be better at some tasks. My thinking is that for most tasks, a byte-orientated LLM still needs something like the wide "single activation per word" representation that the tokeniser mostly provides. So it will likely waste its first and last few layers implementing a replacement tokeniser (though it would probably do a much better job of it), and it would also need to decode and encode Unicode at the same time. My estimate is that it might lose about 10% of its weights to these new tasks, so your 80B-parameter model becomes about as smart as a 72B-parameter model: measurably dumber, but not drastically so.
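To make the arithmetic and the byte-level view concrete, here is a rough Python sketch. The helper names `effective_params` and `byte_view` are made up for illustration, and the 10% overhead is only the guess from the comment above, not a measured figure.

```python
# Hypothetical helpers for illustration; the 10% overhead is a rough guess, not a measurement.

def effective_params(total_params: float, overhead_fraction: float) -> float:
    """Parameters left over after a fraction is spent on tokeniser-like work."""
    return total_params * (1.0 - overhead_fraction)


def byte_view(text: str) -> list:
    """The input a byte-level model would see: raw UTF-8 byte values."""
    return list(text.encode("utf-8"))


# 80B parameters with roughly 10% lost to re-learning tokenisation and Unicode handling.
print(round(effective_params(80e9, 0.10) / 1e9))  # 72

# A non-ASCII character spans several bytes, so the model has to decode Unicode itself.
word = "na\u00efve"  # five characters, one of them outside ASCII
print(len(word), len(byte_view(word)))  # 5 6

# With character-level access, the "letter d" question from upthread is a plain substring check.
days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
days_with_d = [day for day in days if "d" in day.lower()]
print(len(days_with_d))  # 7
```

The subword tokeniser hides the individual letters behind token IDs, which is why the last check is hard for a token-based model and trivial once the input is bytes.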