amelius 5 days ago
No, an LLM really uses far more bits per token. First, the embedding typically has thousands of dimensions. Then, the value along each dimension is represented as a floating-point number, typically 16 bits (it can be smaller with more aggressive quantization).
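To put rough numbers on that (a back-of-the-envelope sketch; the embedding dimension, dtype, and vocabulary size below are illustrative assumptions, not specs of any particular model):

```python
import math

# Illustrative assumptions, not measurements of a specific model:
embedding_dim = 4096      # embedding/hidden dimensions
bits_per_value = 16       # fp16/bf16; lower with quantization
vocab_size = 50_000       # ballpark BPE vocabulary size

# Bits used to represent one token internally as an embedding vector
internal_bits = embedding_dim * bits_per_value    # 65,536 bits

# Bits needed to merely transmit one token as an ID from that vocabulary
transmit_bits = math.log2(vocab_size)             # ~15.6 bits

print(f"internal: {internal_bits} bits, transmitted: {transmit_bits:.1f} bits")
```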
blutfink 5 days ago | parent
Of course an LLM uses more space internally for a token. But so do humans. My point was that you compared how the LLM represents a token internally versus how “English” transmits a word. That’s a category error.