octoclaw 5 hours ago
The Postgres-only approach is a really smart call for this scale. I've run pgvector alongside BM25 (via ParadeDB) for internal search at work and it handles mid-size corpora surprisingly well. The operational simplicity of one database vs. managing Elasticsearch + a vector DB + Postgres is a huge win for small teams.

One thing I'd watch out for: HNSW index rebuild times can get painful once you cross ~5M vectors. We ended up doing incremental inserts with a background reindex job. Not a dealbreaker, just something to plan for early.

Also curious how you handle permission syncing. That's usually where self-hosted workplace search gets tricky. Google Drive permissions in particular are a nightmare to mirror accurately.
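
A rough sketch of the "incremental inserts plus background reindex" pattern mentioned above, for anyone planning for it. Everything specific here is an assumption made for illustration: the index name `docs_embedding_hnsw`, the 20% churn threshold, and the `ReindexPolicy` helper are hypothetical. The only Postgres feature it leans on is `REINDEX INDEX CONCURRENTLY`, which must run outside a transaction block, so the connection is assumed to be in autocommit mode.

```python
import re
from dataclasses import dataclass


@dataclass
class ReindexPolicy:
    """Hypothetical helper: decides when a background job should rebuild an HNSW index."""

    index_name: str
    churn_threshold: float = 0.2  # assumed: rebuild once 20% of rows have changed

    def should_reindex(self, rows_changed: int, total_rows: int) -> bool:
        # Rows inserted/updated/deleted since the last rebuild, relative to table size.
        if total_rows == 0:
            return False
        return rows_changed / total_rows >= self.churn_threshold

    def statement(self) -> str:
        # Only accept plain identifiers, since the name is interpolated into SQL.
        if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", self.index_name):
            raise ValueError(f"unsafe index name: {self.index_name!r}")
        # CONCURRENTLY builds a replacement index without blocking normal reads/writes.
        return f'REINDEX INDEX CONCURRENTLY "{self.index_name}";'


def run_background_reindex(conn, policy: ReindexPolicy, rows_changed: int, total_rows: int) -> bool:
    """Run the rebuild if the policy says so. `conn` is any object with an
    execute(sql) method, e.g. a psycopg 3 connection opened with autocommit=True."""
    if not policy.should_reindex(rows_changed, total_rows):
        return False
    conn.execute(policy.statement())
    return True
```

Regular inserts keep going into the existing index between rebuilds; the job above would be triggered from cron or a worker queue, with the changed-row count coming from wherever you already track writes.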