padolsey 4 hours ago

This is fun. I'd like to see the same idea but oriented toward richer tokens instead of simpler tokens. If you want to spend fewer tokens, then spend the 'good' ones. So, instead of saying 'make good' you could say 'improve idiomatically' or something. Depends on one's needs. I try to imagine every single token as an opportunity to bend/expand/limit the geometries I have access to. Language is a beautiful modulator to apply to reality, so I'll wager that applying it with pedantic finesse will bring finer outputs than the brutish humphs of cavemen. But let's see the benchmarks!

philsnow 3 hours ago | parent | next

The caveman skill reminds me of the clipped writing style used in telegrams, and your post further reminded me of "standard" books of telegram abbreviations. Take a look at [0]; could we train models to use this kind of code and then decode it in the browser? These are "rich" tokens in that they succinctly carry a lot of information. (A toy sketch of the shared-codebook idea follows the link below.)

[0] https://books.google.com/books?id=VO4OAAAAYAAJ&pg=PA464#v=on...
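
To make that concrete: a telegram codebook is just a shared dictionary. Both sides hold the mapping, so a short code word expands deterministically on the receiving end. A toy sketch in Python (the code words and expansions here are invented for illustration, not taken from [0]):

    # Toy telegram-style codebook. Both sides share the mapping, so a model
    # could emit the short code words and the client expands them locally.
    # (Entries are invented for illustration, not taken from [0].)
    CODEBOOK = {
        "ARRIV": "arrived safely and in good health",
        "ABIDE": "awaiting your instructions before proceeding",
        "REMIT": "please send funds by the fastest available route",
    }

    def decode(text: str) -> str:
        """Expand code words back into full phrases on the receiving side."""
        for code, phrase in CODEBOOK.items():
            text = text.replace(code, phrase)
        return text

    print(decode("ARRIV. ABIDE."))
    # -> arrived safely and in good health. awaiting your instructions before proceeding.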

derefr an hour ago | parent

I would point out that the default BPE tokenization vocabulary used by many models (cl100k_base) is already a pretty powerful shorthand. It has a lot of short tokens, sure. But then:

Token ID 73700 is the literal entire (space-prefixed) word " strawberry". (Which neatly explains the "strawberry problem.")

Token ID 27128 is " cryptocurrency". (And 41698 is " disappointment".)

Token ID 44078 is " UnsupportedOperationException"!

Token ID 58040 is 128 spaces in a row (and is the longest token in the vocabulary).

You'd be surprised how well this vocabulary can compress English prose — especially prose interspersed with code!
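
For anyone who wants to poke at this, here's a quick way to inspect cl100k_base directly with OpenAI's tiktoken library. The sample sentence below is mine, and I'm taking the token IDs above on faith rather than asserting them:

    # pip install tiktoken
    import tiktoken

    # Load the cl100k_base vocabulary (the default for GPT-3.5/GPT-4-era models).
    enc = tiktoken.get_encoding("cl100k_base")

    # If " strawberry" is a single vocabulary entry, this prints exactly one ID.
    print(enc.encode(" strawberry"))

    # Decode an ID back to its text to check the claims above directly.
    print(enc.decode([73700]))  # should print " strawberry" if that ID is right

    # Rough compression ratio for prose interspersed with code-ish words.
    sample = ("The function raises UnsupportedOperationException when the "
              "cryptocurrency wallet is locked, to the user's disappointment.")
    tokens = enc.encode(sample)
    print(f"{len(sample)} chars -> {len(tokens)} tokens "
          f"({len(sample) / len(tokens):.1f} chars/token)")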

dTal an hour ago | parent | prev

Hmm... this sounds a lot like the old RISC vs CISC argument all over again. RISC won because simplicity scales better and you can always define complex instructions in terms of simple ones. So while I would relish experiencing the timeline in which our computerized chums bootstrap into sentience through the judicious application of carefully selected and highly nuanced words, it's playing out the other way: LLMs doing a lot of 'thinking' using a small curated set of simple and orthogonal concepts.