augusteo 3 hours ago
On the API vs local model question: we went with API embeddings for a similar use case. The cold-start latency of local models across multiple workers ate more money in compute than just paying per-token, and you avoid the operational overhead of model updates. The hybrid approach in this article is smart: fuzzy matching catches 80% of cases instantly, embeddings handle the rest. No need to run expensive vector search on every query.
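A minimal sketch of that two-stage idea, assuming a cheap string-similarity pass gates the expensive embedding lookup. The `embed_fallback` callback, `match_title` name, and 0.9 threshold are all hypothetical choices for illustration, not anything from the article; stdlib `difflib` stands in for a real fuzzy matcher like rapidfuzz:

```python
from difflib import SequenceMatcher

def match_title(query, titles, embed_fallback=None, threshold=0.9):
    """Two-stage matcher: cheap fuzzy pass first, embeddings only on miss."""
    # Stage 1: fuzzy match over normalized titles. This catches the bulk
    # of near-exact duplicates without any API call or vector search.
    best, best_score = None, 0.0
    for title in titles:
        score = SequenceMatcher(None, query.lower(), title.lower()).ratio()
        if score > best_score:
            best, best_score = title, score
    if best_score >= threshold:
        return best, best_score
    # Stage 2: only pay for embeddings when fuzzy matching fails.
    # `embed_fallback` is a hypothetical hook -- e.g. a function that calls
    # an embedding API and returns the nearest title by cosine similarity.
    if embed_fallback is not None:
        return embed_fallback(query, titles)
    return None, best_score
```

The gate means the per-query cost of embeddings only applies to the hard ~20%, which is where the "no vector search on every query" saving comes from.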
TurdF3rguson 3 hours ago | parent
Those text embeddings are dirt cheap. You can do around 1M titles on the Cloudflare embedding model I used last time without exceeding the daily free tier.