olooney 15 hours ago
I found a neat way to do high-quality "semantic soft joins" using embedding vectors[1] and the Hungarian algorithm[2] and I'm turning it into an open source Python package: https://github.com/olooney/jellyjoin

It hits a sweet spot by being easier to use than record linkage[3][4] while still giving really good matches, so I think there's something there that might gain traction.

[1]: https://platform.openai.com/docs/guides/embeddings
[2]: https://en.wikipedia.org/wiki/Hungarian_algorithm
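For anyone wondering what the general idea might look like, here is a rough, dependency-free sketch. It is not jellyjoin's actual code or API: the function names, the toy 3-dimensional vectors (standing in for real embeddings), and the `min_similarity` threshold are all assumptions made for illustration. It scores every left/right pair by cosine similarity, uses the Hungarian algorithm to pick the one-to-one assignment with the highest total similarity, and then drops weak pairs.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def hungarian(cost):
    """Min-cost assignment for an n x m cost matrix with n <= m.

    Returns a list where entry i is the column assigned to row i.
    Classic O(n^2 * m) potentials formulation, 1-indexed internally.
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u = [0.0] * (n + 1)   # row potentials
    v = [0.0] * (m + 1)   # column potentials
    p = [0] * (m + 1)     # p[j] = row currently matched to column j (0 = none)
    way = [0] * (m + 1)   # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0 != 0:  # flip the matching along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    row_to_col = [0] * n
    for j in range(1, m + 1):
        if p[j] != 0:
            row_to_col[p[j] - 1] = j - 1
    return row_to_col


def soft_join(left, right, min_similarity=0.5):
    """Match (label, vector) records one-to-one by embedding similarity.

    Hypothetical helper, not jellyjoin's API. Returns a list of
    (left_label, right_label, similarity) for pairs above the threshold.
    """
    if len(left) > len(right):
        # The solver wants rows <= columns, so solve the flipped problem.
        flipped = soft_join(right, left, min_similarity)
        return [(l, r, s) for r, l, s in flipped]
    if not left:
        return []
    sim = [[cosine(lv, rv) for _, rv in right] for _, lv in left]
    cost = [[1.0 - s for s in row] for row in sim]  # maximize similarity
    assignment = hungarian(cost)
    return [
        (left[i][0], right[j][0], sim[i][j])
        for i, j in enumerate(assignment)
        if sim[i][j] >= min_similarity
    ]


# Toy vectors standing in for real embeddings (made up for this example).
LEFT = [("NYC", [0.9, 0.1, 0.0]), ("LA", [0.1, 0.9, 0.1]), ("Chi-town", [0.0, 0.2, 0.9])]
RIGHT = [("Los Angeles", [0.2, 0.8, 0.0]), ("New York City", [1.0, 0.0, 0.1]), ("Chicago", [0.1, 0.1, 1.0])]

for left_label, right_label, score in soft_join(LEFT, RIGHT):
    print(f"{left_label} -> {right_label} ({score:.3f})")
```

With SciPy installed, `scipy.optimize.linear_sum_assignment` could replace the hand-written `hungarian` function. The threshold is what makes the join "soft": the assignment step always pairs everything it can, so low-similarity pairs have to be filtered out afterwards.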
mmaaz 7 hours ago
I love this as someone who used to work on max-weight matchings and now works on LLMs :)
guskel 14 hours ago
Very neat. As a heavy user of recordlinkage, this is definitely on my radar.
sbrother 14 hours ago
This is very cool! Thanks for sharing.
pbronez 12 hours ago
Cool project! I see you saved a spot to show how to use it with an alternative embedding model. It would be nice to be able to use the library without an OpenAI API key. It might even make sense to vendor a basic open source model in your package so it works out of the box without remote dependencies.
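To make the suggestion concrete, one way a library could avoid a hard dependency on a remote API is to accept the embedding function as a parameter. The sketch below is purely illustrative and says nothing about how jellyjoin is actually structured: `Embedder`, `trigram_hash_embedder`, and `similarity_matrix` are hypothetical names, and the character-trigram hashing is a crude offline stand-in for a real embedding model, not a vendored open source one.

```python
import math
import zlib
from typing import Callable, List, Sequence

# Hypothetical interface: anything that maps texts to vectors can be plugged in.
Embedder = Callable[[Sequence[str]], List[List[float]]]


def trigram_hash_embedder(texts: Sequence[str], dim: int = 256) -> List[List[float]]:
    """Offline stand-in for an embedding model (assumption, not a real model).

    Hashes lowercase character trigrams into `dim` buckets and L2-normalizes,
    so it captures spelling overlap only, not meaning.
    """
    vectors = []
    for text in texts:
        padded = f"  {text.lower()} "
        vec = [0.0] * dim
        for i in range(len(padded) - 2):
            bucket = zlib.crc32(padded[i:i + 3].encode("utf-8")) % dim
            vec[bucket] += 1.0
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors


def similarity_matrix(
    left: Sequence[str],
    right: Sequence[str],
    embed: Embedder = trigram_hash_embedder,
) -> List[List[float]]:
    """Similarity of every left/right pair using whichever embedder is supplied.

    Entries are dot products, which equal cosine similarity when the
    embedder returns unit-length vectors (the default one does).
    """
    left_vecs, right_vecs = embed(left), embed(right)
    return [[sum(a * b for a, b in zip(lv, rv)) for rv in right_vecs] for lv in left_vecs]


sim = similarity_matrix(["Main Street"], ["main street", "Elm Avenue"])
print(sim[0][0] > sim[0][1])
```

A caller with an API key could pass a function that wraps a remote embeddings endpoint, while everyone else gets a working, if much weaker, default with no network access.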