As you compose fuzzy operations, your errors multiply! Nobody is asking for perfection, but this tool seems to me a straightforward way to launder bad data. If you want to do a quick check of an idea then it's probably great, but if you're going to be rigorous and use hard data and reproducible, understandable methods, then I don't think it offers anything. The plea for citations at the end of the README also rubs me the wrong way.
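To put a rough number on "errors multiply": a minimal sketch, assuming each fuzzy step is right with the same probability and that errors are independent (both are simplifying assumptions, not measurements of this tool). The function name `chained_accuracy` and the 0.95 figure are made up for illustration.

```python
def chained_accuracy(per_step_accuracy: float, steps: int) -> float:
    """Chance that every step in a chain is correct, assuming independent errors."""
    return per_step_accuracy ** steps


# Hypothetical per-step accuracy of 95%, chained over more and more steps.
for n in (1, 3, 5, 10):
    print(n, round(chained_accuracy(0.95, n), 3))
```

Under those assumptions, five chained steps at 95% each leave you at roughly 77% end to end, and ten steps at roughly 60%.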

anishathalye a day ago | parent [-]

I think semantic data processing in this style has a nonempty set of use cases (e.g., I find the fuzzy sorting of arXiv papers to be useful, I find the examples in the docs representative of some real-world tasks where this style of data processing makes sense, and I find many of the motivating examples and use cases in the academic work compelling). At the same time, I think there are many tasks for which this approach is not the right one to use.

Sorry you didn't like the wording in the README; that was not the intention. I like to give people a canonical form they can copy-paste if they want to cite the work. Things have been a mess for many of my other GitHub repos, which makes it hard to find who is using the work (which can be really informative for improving the software, and I often follow up with authors of papers via email etc.). For example, I heard about Amazon MemoryDB because they use Porcupine (https://dl.acm.org/doi/pdf/10.1145/3626246.3653380). Appreciate you sharing your feelings; I stripped the text from the README. If you have additional suggestions, I would appreciate your comments or a PR.

adastra22 a day ago | parent | next [-]

FWIW it doesn't serve as a great example because the ordering is not obvious. I think that is what GP was reacting to. When I say "sort a list of presidents by how right-leaning they are" in any other context, people would probably assume the MOST right-leaning president to be listed first. It took me a moment to remember that Python's `sort` is in ascending order by default.

anishathalye a day ago | parent [-]

Good point, I see how the example can be confusing. Updated the example to have `reverse=True` and a comment, hopefully that clarifies things.
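For anyone skimming, the ordering point can be shown with plain built-in `sorted`. This is only a sketch with hypothetical item names and made-up numeric scores, not the example from the README:

```python
# Hypothetical items with made-up scores; a higher score means "more" of the ranked property.
scores = {"item_a": 0.2, "item_b": 0.9, "item_c": 0.5}

# Default ordering is ascending: the lowest score comes first.
ascending = sorted(scores, key=scores.get)

# reverse=True flips it so the highest score comes first.
descending = sorted(scores, key=scores.get, reverse=True)

print(ascending)
print(descending)
```

With `reverse=True` the "most X" item lands at the top of the list, which is what most readers expect from "sort by how X they are".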

Y_Y a day ago | parent | prev [-]

Thank you for engaging with me so politely and constructively. I care probably more than I should about good science and honesty in academia, and so I feel compelled to push back in cases where I see things like blatant overstating of capabilities or artificial citation farming.

This case seems to have been a false positive. Surely people will misuse your tool, but that's not your responsibility, as long as you haven't misled them to begin with. Good luck with the project, I hope to someday need to cite the software myself.

anishathalye a day ago | parent [-]

For sure! I share your feelings about good science and honesty in academia :)