| ▲ | jascha_eng 4 days ago |
| This is legitimately pretty impressive. I think the rule of thumb is now: go with Postgres (pgvector) for vector search until it breaks, then go with turbopuffer. |
|
| ▲ | sa-code 5 hours ago | parent | next [-] |
| Qdrant is also a good default choice, since it can run in-memory for development, on disk for small deployments, and at "web scale" for larger workloads. As a principal eng, side-stepping a migration and having a good local dev experience is too good of a deal to pass up. That being said, turbopuffer looks interesting. I will check it out. Hopefully their local dev experience is good. |
| |
| ▲ | nostrebored 4 hours ago | parent | next [-] | | Qdrant is one of the few vendors I actively steer people away from. Look at the GitHub issues, look at what their CEO says, look at their fake “advancements” that they pay for publicity on… The number of people I know who’ve had unrecoverable shard failures on Qdrant is too high to take it seriously. | | | |
| ▲ | benesch 5 hours ago | parent | prev [-] | | For local dev + testing, we recommend just hitting the production turbopuffer service directly, but with a separate test org/API key: https://turbopuffer.com/docs/testing Works well for the vast majority of our customers (although we get the very occasional complaint about wanting a dev environment that works offline). The dataset sizes for local dev are usually so small that the cost rounds to free. | | |
| ▲ | lambda 2 hours ago | parent | next [-] | | > although we get the very occasional complaint about wanting a dev environment that works offline It's only occasional because the people who care about dev environments that work offline are most likely to just skip you and move on. For actual developer experience, as well as a number of use cases like customers with security and privacy concerns, being able to host locally is essential. Fair enough if you don't care about those segments of the market, but don't confuse a small number of people asking about it with a small number of people wanting it. | | |
| ▲ | benesch 5 minutes ago | parent | next [-] | | Yep, we're well aware of the selection bias effects in product feedback. As we grow we're thinking about how to make our product more accessible to small orgs / hobby projects. Introducing a local dev environment may be part of that. Note that we already have an in-your-own-VPC offering for large orgs with strict security/privacy/regulatory controls. | |
| ▲ | nik9000 21 minutes ago | parent | prev | next [-] | | As someone who works for a competitor, they are probably right to hold off on that segment for a while. Supporting both cloud and local deployments is somewhere between 20% harder and 300% harder depending on the day. I'm watching them with excitement. We all learn from each other. There's so much to do. | |
| ▲ | sa-code 34 minutes ago | parent | prev [-] | | Can confirm. With a setup that works offline, one can: (1) start small on a laptop, since going through procurement at companies is a pain; (2) test things in CI reliably, so outages don’t break builds; (3) transition from laptop scale to web scale easily, with the same API and just a different backend. Otherwise it’s really hard to justify not using S3 vectors here. The current dev experience is to start with faiss for PoCs, move to pgvector, and then something heavy duty like one of the Lucene wrappers. |
| |
| ▲ | sa-code an hour ago | parent | prev | next [-] | | I should have clarified: by local dev and testing I did in fact mean offline usage. Without that it’s unfortunately a non-starter. | | |
| ▲ | benesch 34 minutes ago | parent [-] | | So I can note this down on our roadmap, what's the root of your requirement here? Supporting local dev without internet (airplanes, coffee shops, etc.)? Unit test speed? Something else? | | |
| |
| ▲ | sroussey 5 hours ago | parent | prev | next [-] | | That’s not local though | |
| ▲ | enigmo 2 hours ago | parent | prev [-] | | having a local simulator (DynamoDB, Spanner, others) helps me a lot for offline/local development and CI. when a vendor doesn't offer this I often end up mocking it out (one way or another) and have to wait for integration or e2e tests for feedback that could have been pushed further to the left. in many CI environments unit tests don't have network access, it's not purely a price consideration. (not a turbopuffer customer but I have been looking at it) | | |
| ▲ | benesch 2 hours ago | parent [-] | | > in many CI environments unit tests don't have network access, it's not purely a price consideration. I've never seen a hard block on network access (how do you install packages/pull images?) but I am sympathetic to wanting to enforce that unit tests run quickly by minimizing/eliminating RTT to networked services. We've considered the possibility of a local simulator before. Let me know if it winds up being a blocker for your use case. | | |
| ▲ | lambda an hour ago | parent [-] | | > how do you install packages/pull images You pre-build the images with packages installed beforehand, then use those images offline. | | |
| ▲ | benesch an hour ago | parent [-] | | My point is it's enough of a hassle to set up that I've yet to see that level of restriction in practice (across hundreds of CI systems). | | |
| ▲ | dzbarsky 17 minutes ago | parent [-] | | Look into Bazel, a very standard build system used at many large tech companies. It splits fetches from build/test actions and allows blocking network for build/test actions with a single CLI flag. No hassle at all. The fact that you haven't come across this kind of setup suggests that your hundreds of CI systems are not representative of the industry as a whole. |
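The comment above describes Bazel blocking network access for build/test actions at the sandbox level. The following is not Bazel and not a substitute for it; it is a rough in-process analogue in Python, assuming the unit tests are Python and open connections through the standard `socket` module. The names `no_network` and `NetworkBlockedError` are hypothetical, and the guard does not stop DNS lookups or C extensions that bypass `socket.socket`.

```python
import socket
from contextlib import contextmanager


class NetworkBlockedError(RuntimeError):
    """Raised when code under test tries to create a socket."""


@contextmanager
def no_network():
    """Temporarily replace socket.socket so new socket creation fails.

    This only covers Python code that goes through socket.socket; it is
    a test-time guard, not a security boundary.
    """
    original = socket.socket

    def _blocked(*args, **kwargs):
        raise NetworkBlockedError("network access is disabled in this test")

    socket.socket = _blocked
    try:
        yield
    finally:
        # Always restore the real class, even if the test body raised.
        socket.socket = original
```

A test body wrapped in `with no_network():` then fails loudly if the code under test reaches for a networked service, which is the kind of early feedback a local simulator or in-memory backend is meant to keep intact.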
|
| ▲ | _peregrine_ 3 days ago | parent | prev | next [-] |
| seems like a good rule of thumb to me! though i would perhaps lump "cost" into the "until it breaks" equation. even with decent perf, pgvector's economics can be much worse, especially in multi-tenant scenarios where you need many small indexes (this is true of any vector db that builds indexes primarily on RAM/SSD) |
|
| ▲ | jauntywundrkind 2 hours ago | parent | prev [-] |
| I'd love to know how they compare versus MixedBread, what relative strengths each has. https://www.mixedbread.com/ I really really enjoy & learn a lot from the mixedbread blog. And they find good stuff to open source (although the product itself is closed). https://www.mixedbread.com/blog I feel like there's a lot of overlap but also probably a lot of distinction too. Pretty new to this space of products though. |