ashvardanian 2 days ago

Lucene is tough to deal with. About 15 hours ago, right when this comment was posted, I was giving a talk at Databricks comparing the world’s most widely used search engines. I’ve never run into as many issues with any similar tool as I did with Lucene. To be fair, it’s been around for ~26 years and has aged remarkably well, but it’s the last thing I’d choose today.

ab5tract 2 days ago

Can I ask you which alternatives exist at the layer Lucene occupies?

I went looking around last year and couldn’t really find many options, but I might have been looking in the wrong places.

ashvardanian 5 hours ago

For vector search, the top two are Meta’s FAISS and (my) Unum’s USearch. Lucene powers Elastic, Solr, MongoDB Atlas, AWS OpenSearch, and Azure Cognitive Search. USearch powers ClickHouse, DuckDB, YugaByte, TiDB, ScyllaDB, MemGraph, KuzuDB, Lantern, and a few big closed-source names that don’t mention it, as far as I know. FAISS has the highest usage among Python developers, but if you are indexing large collections, you should consider alternatives.

cluckindan 2 days ago

Interesting, then, that Vectroid would choose to fork it.

Elasticsearch is at least good at hiding the Lucene zoo under the hood.