barrkel 18 hours ago
I don't know. I built a vector similarity system for my hobby project the "hard" way, which mostly meant getting Python set up with all the dependencies (seriously, Python dependency resolution is a non-trivial problem), picking an embedding model with the right tradeoffs, installing pgvector, picking an index optimized for my distance metric, computing and storing vectors for all my data, and wiring up routes and UI that dispatched ANN search (order by / limit) against my indexed column.

I also did some clustering, and learned something of how awkward it is in practice to pick a representative vector for a cluster - in fact you may want several.

I now know what the model does (at a black-box level) and how all the parts fit together, and I have plans to build classifiers on top of those vectors for further processing. The experience of fighting Python dependencies gives me more appreciation for uv over venv, and will leave me less stuck whenever the LLM fails to help resolve the situation.
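For what it's worth, here's roughly what the moving parts look like once it's all wired up. This is a sketch rather than my actual code: the model (all-MiniLM-L6-v2), the table name, the HNSW index, and the psycopg 3 / pgvector-python setup are stand-ins you'd swap for your own choices.

    import numpy as np
    import psycopg
    from pgvector.psycopg import register_vector
    from sentence_transformers import SentenceTransformer

    # Placeholder model and table names; pick whatever fits your own tradeoffs.
    model = SentenceTransformer("all-MiniLM-L6-v2")   # 384-dim embeddings
    conn = psycopg.connect("dbname=hobby", autocommit=True)
    conn.execute("CREATE EXTENSION IF NOT EXISTS vector")
    register_vector(conn)

    conn.execute("""
        CREATE TABLE IF NOT EXISTS items (
            id bigserial PRIMARY KEY,
            body text,
            embedding vector(384)
        )
    """)
    # Index matched to the distance metric: vector_cosine_ops here,
    # vector_l2_ops / vector_ip_ops if you're using L2 or inner product.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS items_embedding_idx "
        "ON items USING hnsw (embedding vector_cosine_ops)"
    )

    def ingest(texts):
        # Normalised embeddings so cosine distance is well behaved.
        vecs = model.encode(texts, normalize_embeddings=True)
        with conn.cursor() as cur:
            cur.executemany(
                "INSERT INTO items (body, embedding) VALUES (%s, %s)",
                list(zip(texts, vecs)),
            )

    def search(query, k=10):
        # ANN search really is just ORDER BY <distance> LIMIT k;
        # <=> is pgvector's cosine distance operator.
        q = model.encode([query], normalize_embeddings=True)[0]
        return conn.execute(
            "SELECT id, body, embedding <=> %(q)s AS dist "
            "FROM items ORDER BY embedding <=> %(q)s LIMIT %(k)s",
            {"q": q, "k": k},
        ).fetchall()

    def representative(member_vecs):
        # One way to pick a cluster representative: the medoid, i.e. the
        # member nearest the centroid. Often you want several, not one.
        centroid = member_vecs.mean(axis=0)
        centroid /= np.linalg.norm(centroid)
        dists = 1 - member_vecs @ centroid   # vectors are already normalised
        return member_vecs[int(np.argmin(dists))]

An IVFFlat index is the other obvious choice if build time and memory matter more than recall; either way the shape of the query doesn't change.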