willwade 2 days ago

I wonder if this would have been useful: https://github.com/microsoft/presidio. It's heavy but looks really good. There is a lite version.

shaoz 2 days ago

I've used it. It gives lots of false positives out of the box, so you need to do a ton of tuning or pair it with a transformer/BERT model, but at that point it's basically the same thing as the OP's project.

threecheese 2 days ago

Looks like it uses Google's LangExtract, which uses only LLMs for NLP, while OP is using a small NER model that runs locally.

winchester6788 2 days ago

Full of false positives, though. But it's definitely good for some types of entities and regexes.