_hl_ | 2 days ago
You’d need to go a level below the API that most embedding services expose. A transformer-based embedding model doesn’t just give you a vector for the entire input string; it gives you a vector for each token. These are then “pooled” together (e.g. averaged, max-pooled, or combined with other strategies) to reduce the many token vectors down into a single vector. Late chunking means changing this reduction step so it yields many vectors (one per chunk) instead of just one.
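A minimal sketch of the idea using Hugging Face transformers, in case it helps. The model name and the character-span chunk boundaries here are just illustrative assumptions; the point is that the whole text is encoded once, and the per-token vectors are pooled per chunk afterwards instead of all at once:

    # Late-chunking sketch: encode once, pool token vectors per chunk.
    import torch
    from transformers import AutoTokenizer, AutoModel

    model_name = "sentence-transformers/all-MiniLM-L6-v2"  # any encoder with a fast tokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)

    text = "First sentence about topic A. Second sentence about topic B."
    # Hypothetical chunk boundaries, expressed as character spans.
    char_spans = [(0, 30), (30, len(text))]

    enc = tokenizer(text, return_tensors="pt", return_offsets_mapping=True)
    offsets = enc.pop("offset_mapping")[0]            # (num_tokens, 2) character offsets
    with torch.no_grad():
        token_vecs = model(**enc).last_hidden_state[0]  # (num_tokens, hidden_dim)

    # Standard pooling: average *all* token vectors into one document embedding.
    doc_embedding = token_vecs.mean(dim=0)

    # Late chunking: average only the token vectors that fall inside each chunk,
    # so each chunk gets its own vector while still having "seen" the full context.
    chunk_embeddings = []
    for start, end in char_spans:
        in_chunk = (offsets[:, 0] >= start) & (offsets[:, 1] <= end) & (offsets[:, 1] > offsets[:, 0])
        chunk_embeddings.append(token_vecs[in_chunk].mean(dim=0))

The last condition just skips special tokens (their offset spans are empty). The key difference from ordinary chunking is that the attention layers run over the full text before any splitting happens, so each chunk vector is contextualized by the rest of the document.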