▲ | anishathalye a day ago | ||||||||||||||||||||||||||||||||||||||||
That was a small self-contained example that fit above the fold in the README (and fwiw even last year’s models like GPT-4o give the right output there). That `sort` is based on pairwise comparisons, which is one of the best ways you can do it in terms of accuracy (Qin et al., 2023: https://arxiv.org/abs/2306.17563). I think there are many real use cases where you might want a semantic sort / semantic data processing in general, when there isn’t a deterministic way to do the task and there is not necessarily a single right answer, and some amount of error (due to LLMs being imperfect) is tolerable. See https://semlib.anish.io/examples/arxiv-recommendations/ for one concrete example. In my opinion, the outputs are pretty high quality, to the point where this is practically usable. These primitives can be _composed_, and that’s where this approach really shines. As a case study, I tried automating a part of performance reviews at my company, and the Semlib+LLM approach did _better_ than me (don’t worry, I didn’t dump AI-generated outputs on people, I first did the work manually, and shared both versions with an explanation of where each version came from). See the case study in https://anishathalye.com/semlib/ There’s also some related academic work in this area that also talks about applications. One of the most compelling IMO is DocETL’s collaboration to analyze police records (https://arxiv.org/abs/2410.12189). Some others you might enjoy checking out are LOTUS (https://arxiv.org/abs/2407.11418v1), Palimpzest (https://arxiv.org/abs/2405.14696), and Aryn (https://arxiv.org/abs/2409.00847). | |||||||||||||||||||||||||||||||||||||||||
▲ | Y_Y a day ago | parent [-] | ||||||||||||||||||||||||||||||||||||||||
As you compose fuzzy operations your errors multiply! Nobody is asking for perfection, but this tool seems to me a straightforward way to launder bad data. If you want to do a quick check of an idea then it's probably great, but if you're going to be rigorous and use hard data and reproducible, understandable methods then I don't think it offers anything. The plea for citations at the end of the readme also rubs me the wrong way. | |||||||||||||||||||||||||||||||||||||||||
|