janalsncm 5 hours ago
This is kind of just a measurement of how well represented a language is in the tokenizer's training distribution. You could have a single token equal to “public static void main”.
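To illustrate the point about training distribution, here is a minimal sketch of a toy byte-pair-encoding (BPE) trainer. It is not any production tokenizer: the corpus, the helper names (`merge_pair`, `train_bpe`, `encode`), and the merge budget are all made up for the example. It only shows that a phrase repeated often enough in the training data can end up as a single token, while a rarely seen phrase stays split.

```python
from collections import Counter

def merge_pair(tokens, a, b):
    """Replace each adjacent (a, b) in tokens with the single token a + b."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and tokens[i] == a and tokens[i + 1] == b:
            out.append(a + b)
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

def train_bpe(docs, max_merges):
    """Learn merge rules, always merging the most frequent adjacent pair."""
    seqs = [list(d) for d in docs]  # start from single characters
    merges = []
    for _ in range(max_merges):
        # Count adjacent pairs inside each document, never across documents.
        pairs = Counter(p for s in seqs for p in zip(s, s[1:]))
        if not pairs:
            break
        (a, b), count = pairs.most_common(1)[0]
        if count < 2:  # nothing repeats any more, so stop merging
            break
        merges.append((a, b))
        seqs = [merge_pair(s, a, b) for s in seqs]
    return merges

def encode(text, merges):
    """Tokenize text by replaying the learned merges in training order."""
    tokens = list(text)
    for a, b in merges:
        tokens = merge_pair(tokens, a, b)
    return tokens

# Hypothetical, heavily skewed corpus: one phrase dominates the training data.
corpus = ["public static void main"] * 50 + ["fn main"]
merges = train_bpe(corpus, max_merges=100)

print(len(encode("public static void main", merges)))  # 1
print(len(encode("fn main", merges)))                  # more than 1
```

Real tokenizers usually pre-split text on whitespace or punctuation before merging, so whether a multi-word phrase can become one token depends on that configuration as well as on frequency.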
cryptonector an hour ago
Well, yes. Looking beyond token efficiency, I find that the more constrained the language (stronger and richer static typing), the better and faster the LLM deals with it: fewer rounds of editing and debugging, ergo fewer tokens. C is a nightmare.
moelf 4 hours ago
The most efficient languages are pretty unpopular, so by this argument they are even more efficient in reality?...
make3 3 hours ago
If you look at the list, you'll see that you're incorrect, as C and JavaScript are not at the top. Seeing all the C-family languages and JavaScript at the bottom like this makes me wonder if it's not just that curly brackets take a lot of tokens.
muyuu 4 hours ago
You could, but you wouldn't when those keywords can all change in equivalent contexts.