_hl_ 2 days ago

You’d need to go a level below the API that most embedding services expose.

A transformer-based embedding model doesn’t just give you a vector for the entire input string; it gives you a vector for each token. These token vectors are then “pooled” together (eg averaged, or max-pooled, or other strategies) to reduce the many vectors down to a single one.
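As a rough sketch of that pooling step (the token embeddings here are made-up numbers standing in for a transformer's last-layer outputs; shapes and values are illustrative, not from any particular model):

```python
import numpy as np

# Hypothetical per-token embeddings: 6 tokens, 4 dimensions each.
# A real embedding model would produce these from its final hidden layer.
token_vecs = np.array([
    [0.1, 0.2, 0.3, 0.4],
    [0.5, 0.1, 0.0, 0.2],
    [0.3, 0.3, 0.3, 0.3],
    [0.0, 0.4, 0.1, 0.5],
    [0.2, 0.2, 0.6, 0.1],
    [0.4, 0.0, 0.2, 0.3],
])

# Mean pooling: average across the token axis -> one vector.
mean_pooled = token_vecs.mean(axis=0)

# Max pooling: per-dimension maximum across tokens -> one vector.
max_pooled = token_vecs.max(axis=0)

# Either way, many token vectors are reduced to a single vector
# with the same dimensionality as each token embedding.
print(mean_pooled.shape)  # (4,)
print(max_pooled.shape)   # (4,)
```

This single pooled vector is what most embedding APIs return; the per-token vectors are discarded internally.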

Late chunking means changing this reduction step: instead of pooling over the whole input once, you pool over each chunk’s span of tokens, yielding one vector per chunk instead of a single vector for the whole input.
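A minimal sketch of that change, assuming you already have the per-token embeddings and some chunk boundaries in token positions (both hypothetical here):

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for a transformer's contextualized token embeddings:
# a 12-token document, 4 dimensions per token.
token_vecs = rng.normal(size=(12, 4))

# Hypothetical chunk boundaries as (start, end) token positions,
# e.g. from a sentence splitter mapped back to token offsets.
chunk_spans = [(0, 5), (5, 9), (9, 12)]

# Late chunking: pool each span separately instead of pooling the
# whole sequence once. Because the token vectors were computed over
# the full document, each chunk vector still carries document-wide
# context.
chunk_vecs = np.stack(
    [token_vecs[start:end].mean(axis=0) for start, end in chunk_spans]
)

print(chunk_vecs.shape)  # (3, 4): one vector per chunk
```

The key property is that the chunking happens after the transformer forward pass, not before, so each chunk's vector is pooled from tokens that attended to the entire document.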