_hl_ | 2 days ago
You’d need to go a level below the API that most embedding services expose. A transformer-based embedding model doesn’t just give you a vector for the entire input string; it gives you a vector for each token. These are then “pooled” together (e.g. averaged, max-pooled, or combined with other strategies) to reduce the many token vectors down into a single vector. Late chunking means changing this reduction step so it yields many vectors (one per chunk) instead of just one.
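A minimal sketch of the idea using Hugging Face transformers, in case it helps. The model name and the character-span chunk boundaries here are just illustrative assumptions; the point is that the whole text is encoded once, and the per-token vectors are pooled per chunk afterwards instead of all at once:

    # Late-chunking sketch: encode once, pool token vectors per chunk.
    import torch
    from transformers import AutoTokenizer, AutoModel

    model_name = "sentence-transformers/all-MiniLM-L6-v2"  # any encoder with a fast tokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)

    text = "First sentence about topic A. Second sentence about topic B."
    # Hypothetical chunk boundaries, expressed as character spans.
    char_spans = [(0, 30), (30, len(text))]

    enc = tokenizer(text, return_tensors="pt", return_offsets_mapping=True)
    offsets = enc.pop("offset_mapping")[0]            # (num_tokens, 2) character offsets
    with torch.no_grad():
        token_vecs = model(**enc).last_hidden_state[0]  # (num_tokens, hidden_dim)

    # Standard pooling: average *all* token vectors into one document embedding.
    doc_embedding = token_vecs.mean(dim=0)

    # Late chunking: average only the token vectors that fall inside each chunk,
    # so each chunk gets its own vector while still having "seen" the full context.
    chunk_embeddings = []
    for start, end in char_spans:
        in_chunk = (offsets[:, 0] >= start) & (offsets[:, 1] <= end) & (offsets[:, 1] > offsets[:, 0])
        chunk_embeddings.append(token_vecs[in_chunk].mean(dim=0))

The last condition just skips special tokens (their offset spans are empty). The key difference from ordinary chunking is that the attention layers run over the full text before any splitting happens, so each chunk vector is contextualized by the rest of the document.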