amelius 5 days ago:
Why would it be less efficient, if the LLM converts it to an embedding internally?
cesarb 4 days ago (parent):
Because each byte would become its own embedding, instead of several bytes (a full word or part of a word) mapping to a single embedding. The time an LLM takes grows with the number of embeddings (or tokens, since each token is represented by an embedding) in the input, and the memory used by the LLM's internal state also grows with the number of embeddings in the context window (how far back it looks in the input). Byte-level input therefore makes the sequence several times longer for the same text.
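A rough sketch of the length difference (the whitespace split below is a stand-in for a real subword tokenizer, which merges at a finer granularity but still produces far fewer units than raw bytes):

```python
# Compare sequence length for byte-level input vs. a toy word-level split.
text = "Tokenization amortizes sequence length over multi-byte units."

# Byte-level: one embedding per byte.
byte_tokens = list(text.encode("utf-8"))

# Toy "tokenizer": whitespace split. Real BPE/subword vocabularies sit
# between these two extremes, but much closer to the word count.
word_tokens = text.split()

print(len(byte_tokens))  # 61 positions
print(len(word_tokens))  # 7 positions
```

Since self-attention cost and the KV-cache both scale with the number of positions (attention quadratically in the naive case), roughly 9x more positions for the same text means substantially more compute and memory.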