| ▲ | Xyra 11 hours ago | |
I'm using Voyage-3.5-lite at halfvec(2048), which with my limited research, seems to be one of the best embedding models. There's semi-sophisticated (breaking on paragraphs, sentences) ~300 token chunking. When Claude is using our embed endpoint to embed arbitrary text as a search vector, it should work pretty well cross-domains. One can also use compositions of centroids (averages) of vectors in our database, as search vectors. | ||