BoppreH 6 hours ago

An LLM company using regexes for sentiment analysis? That's like a truck company using horses to transport parts. Weird choice.

lopsotronic 2 hours ago | parent | next [-]

The difference in response time - especially versus a regex running locally - is really difficult to express to someone who hasn't made much use of LLM calls in their natural language projects.

Someone said 10,000x slower, but that's off - in my experience - by about four orders of magnitude. And that's average, it gets much worse.

Now personally I would have maybe made a call through a "traditional" ML widget (scikit, numpy, spaCy, fastText, sentence-transformers, etc.), but - for me anyway - that whole stack is Python. Porting all of that to TS might be a maintenance burden I don't particularly feel like taking on, and in client-facing code I'm not sure it's even possible.

noprof6691 16 minutes ago | parent | next [-]

They're sending it to an LLM anyway, though? Not sure why they wouldn't just add a sentiment field to the requested response shape.

FuckButtons 9 minutes ago | parent [-]

Because a regex on the client is free, while GPU compute absolutely is not.

cyanydeez 2 hours ago | parent | prev | next [-]

So, think of it as a businessman: you don't really care if your customers swear or whatever, but you know that it'll generate bad headlines, so you gotta do something. Just like a door lock isn't designed for a master criminal, you don't need to design your filter for some master swearer; no, you design it well enough that it gives the impression that further tries are futile.

So yeah, you do what's less intensive for the CPU, but you also do enough to prevent the majority of cases where a screenshot or log ends up showing blatantly "immoral" behavior.

true_religion 2 hours ago | parent [-]

This door lock doesn’t even work against people speaking French, so I think they could have tried a mite harder.

bigbuppo 6 minutes ago | parent | next [-]

There are only Americans on the internet.

sebastiennight an hour ago | parent | prev | next [-]

En toute honnêteté, je pense avoir dit "damn it" plus d'une fois à chat gépété avant de fermer la fenêtre dans un accès de rage. [In all honesty, I think I've said "damn it" more than once to ChatGPT before closing the window in a fit of rage.]

ben_w an hour ago | parent | prev [-]

The upside of the US market is that (almost) everyone there speaks English. The downside is that this includes all the well-networked pearl-clutchers. Europe (including France) will have the same people, but it's harder to coordinate a network of pearl-clutching between some saying "Il faut protéger nos enfants de cette vulgarité!" ("We must protect our children from this vulgarity!") and others saying "Η τηλεόραση και τα μέσα ενημέρωσης διαστρεβλώνουν τις αξίες μας!" ("Television and the media are distorting our values!"), even when they care about the exact same media.

For headlines, that's enough.

For what's behind the pearl-clutching - for what makes the headlines pandering to them worth writing - I agree with everyone else in this thread saying a simple word list is weird and probably pointless. Not just for false negatives, but also false positives: the Latin influence on many European languages creates one very big politically-incorrect-in-the-USA problem for any EU product describing anything "black", which includes what's printed on some brands of dark chocolate. I saw one such brand in Hungary, even though Hungarian isn't a Latin language but an Ugric one that only takes loanwords from Latin.

mlmonkey 35 minutes ago | parent | prev [-]

> Someone said 10,000x slower, but that's off - in my experience - by about four orders of magnitude.

You do know that 10,000x _is_ four orders of magnitude, right? :-D

jonbwhite 25 minutes ago | parent [-]

OP is saying that in their experience it is more like eight orders of magnitude

nojs an hour ago | parent | prev | next [-]

Oh it’s worse than that. This one ended up getting my account banned: https://github.com/anthropics/claude-code/issues/22284

lanbin 34 minutes ago | parent | next [-]

This is a tricky problem; Pinyin also uses the Latin alphabet.

cryptonector an hour ago | parent | prev [-]

Wow, that's horrible.

stingraycharles 6 hours ago | parent | prev | next [-]

Because they want it to be executed quickly and cheaply without blocking the workflow? Doesn’t seem very weird to me at all.

_fizz_buzz_ 5 hours ago | parent | next [-]

They probably have statistics on it and saw that certain phrases happen over and over, so why waste compute on inference?

crem 2 hours ago | parent | next [-]

More likely their LLM Agent just produced that regex and they didn't even notice.

mycall 4 hours ago | parent | prev [-]

The problem with regex is multi-language support, and how much the regex will bloat if you want to support even 10 languages.

doublesocket 4 hours ago | parent | next [-]

Supporting 10 different languages in regex is a drop in the ocean. The regex can be generated programmatically and you can compress regexes easily. We used to have a compressed regex that could match any placename or street name in the UK in a few MB of RAM. It was silly quick.

astrocat 2 hours ago | parent | next [-]

Whoa, this is a regex use I've never heard of. I'd absolutely love to see a writeup on this approach - how it's done and when it's useful.

benlivengood an hour ago | parent [-]

You can literally | together every street address or other string you want to match in a giant disjunction, and then run a DFA/NFA minimization over that to get it down to a reasonable size. Maybe there are some fast regex simplification algorithms as well, but working directly with the finite automata has decades of research and probably can be more fully optimized.
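A minimal sketch of the compression idea in pure Python (the placenames are illustrative, not the UK dataset from the earlier comment): build a trie over the word list, then serialize it so shared prefixes are emitted once instead of once per word. This is a lightweight stand-in for full DFA/NFA minimization, but it captures why a giant disjunction can fit in a few MB.

```python
import re

def build_trie_regex(words):
    """Compress a word-list disjunction by sharing common prefixes.

    "London" and "Londonderry" become "London(?:derry)?" rather than
    two full alternatives.
    """
    trie = {}
    for word in words:
        node = trie
        for ch in word:
            node = node.setdefault(ch, {})
        node[''] = {}  # end-of-word marker

    def serialize(node):
        # Pure leaf: nothing left to match.
        if '' in node and len(node) == 1:
            return ''
        parts = []
        optional = False
        for ch, child in sorted(node.items()):
            if ch == '':
                optional = True  # a word can end here
                continue
            parts.append(re.escape(ch) + serialize(child))
        if len(parts) == 1 and not optional:
            return parts[0]
        body = '(?:' + '|'.join(parts) + ')'
        return body + '?' if optional else body

    return r'\b' + serialize(trie) + r'\b'

places = ["London", "Londonderry", "Leeds", "Liverpool"]
pattern = re.compile(build_trie_regex(places), re.IGNORECASE)
print(pattern.pattern)  # \bL(?:eeds|iverpool|ondon(?:derry)?)\b
```

Python's `re` then compiles the compressed pattern once; for serious scale you'd run real automaton minimization as the parent comment suggests, but the prefix-sharing alone already shrinks large name lists dramatically.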

cogman10 2 hours ago | parent | prev [-]

I think it will depend on the language. There are a few non-Latin languages where a simple word search likely won't be enough for a regex to apply properly.

TeMPOraL 4 hours ago | parent | prev | next [-]

We're talking about Claude Code. If you're coding and not writing or thinking in English, the agents and people reading that code will have bigger problems than a regexp missing a swear word :).

MetalSnake 4 hours ago | parent | next [-]

I talk to it in non-English, but have rules that everything in code and documentation must be in English. Only conversation with me uses my native language. Why would that be a problem?

ekropotin 3 hours ago | parent [-]

Because 90% of the training data was in English, and therefore the model performs best in that language.

foldr 3 hours ago | parent [-]

In my experience these models work fine using another language, if it’s a widely spoken one. For example, sometimes I prompt in Spanish, just to practice. It doesn’t seem to affect the quality of code generation.

ekropotin an hour ago | parent | next [-]

It’s just a subjective observation.

It just can't be the case, simply because of how ML works. In short, the more diverse, high-quality, reasoning-rich examples a language has in the training set, the better the model performs in that language.

So unless the Spanish subset had much more quality-dense examples, to make up for volume, there is no way the quality of reasoning in Spanish is on par with English.

Apologies for the rambling explanation - I'm sure someone with ML expertise here can explain it better.

adamsb6 3 hours ago | parent | prev [-]

They literally just have to subtract the vector for the source language and add the vector for the target.

It’s the original use case for LLMs.

cryptonector an hour ago | parent | prev | next [-]

Claude handles human languages other than English just fine.

formerly_proven 4 hours ago | parent | prev [-]

In my experience agents tend to (counterintuitively) perform better when the business language is not English / does not match the code's language. I'm assuming the increased attention mitigates the higher "cognitive" load.

crimsonnoodle58 4 hours ago | parent | prev | next [-]

They only need to look at one language to get a statistically meaningful picture of common flaws in their model(s) or application.

If they want to drill down to flaws that only affect a particular language, then they could add a regex for that as well/instead.

b112 4 hours ago | parent | prev [-]

Did you just complain about bloat, in anything using npm?

Foobar8568 5 hours ago | parent | prev | next [-]

Why do you need to do it on the client side? You are leaking so much information there. And considering the speed of Claude Code, if you really want to do it, a few seconds won't be a big deal.

plorntus 4 hours ago | parent | next [-]

Depends what it's used by. If I recall, there's an `/insights` command/skill (or whatever you want to call it) built in that generates an HTML file. I believe it gives you stats on when you're frustrated with it and (useless) suggestions on how to "use Claude better".

Additionally, after looking at the source, it looks like a lot of Anthropic's own internal test tooling/debug code (i.e. stuff stripped out at build time) is in this source mapping. There's one part that prompts their own users (or whoever) to use a report-issue command whenever frustration is detected. It's possible it's using it for this.

matkoniecz 4 hours ago | parent | prev [-]

> a few seconds won't be a big deal

it is not that slow

orphea 5 hours ago | parent | prev | next [-]

It looks like it's just for logging, why does it need to block?

jflynn2 4 hours ago | parent [-]

Better question: why would you call an LLM (expensive in compute terms) for something that a regex can do (cheap in compute terms)?

A regex is going to be something like 10,000 times quicker than the quickest LLM call; multiply that by billions of prompts.
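The order-of-magnitude claim is easy to sanity-check locally: a compiled regex search over a short prompt takes on the order of a microsecond, while a round-trip LLM API call is typically hundreds of milliseconds. A rough sketch (the pattern and prompt here are made up for illustration, not Anthropic's actual list):

```python
import re
import time

# Illustrative keyword pattern and prompt - not the real Claude Code regex.
pattern = re.compile(r"\b(?:wtf|ffs|damn)\b")
prompt = "why is this still broken, damn it".lower()

n = 100_000
start = time.perf_counter()
for _ in range(n):
    pattern.search(prompt)
elapsed = time.perf_counter() - start

per_call_us = elapsed / n * 1e6  # microseconds per regex search
print(f"{per_call_us:.2f} us per search")
# A typical LLM round trip is on the order of hundreds of milliseconds,
# so the local regex is several orders of magnitude faster even before
# counting GPU cost.
```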

orphea 4 hours ago | parent [-]

This assumes the regex is doing a good job. It is not. Also, you could embed a very tiny model if you really wanted to flag as many negatives as possible (I don't know Anthropic's goal with this) - it would be quick and free.

gf000 3 hours ago | parent [-]

I think it's a very reasonable tradeoff: getting 99% of true positives at a fraction of the cost (both runtime and engineering).

Besides, they probably do a separate analysis server-side either way, so they can check the true-positive to false-positive ratio.

ldobre 5 minutes ago | parent | prev | next [-]

It's more like a truck company using people to transport some parts. I could be wrong here, but I bet this happens in Volvo's factories a lot.

nitekode 15 minutes ago | parent | prev | next [-]

A lot of things don't make sense until you involve scale. A regex could be good enough to give a general gist.

floralhangnail 4 hours ago | parent | prev | next [-]

Well, regex doesn't hallucinate....right?

raw_anon_1111 an hour ago | parent | next [-]

I just went to expertSexChange.com…

geon an hour ago | parent | prev [-]

buttbuttination

mmh0000 an hour ago | parent [-]

The Clbuttical problem[1]

[1] https://en.wikipedia.org/wiki/Scunthorpe_problem

blks 5 hours ago | parent | prev | next [-]

Because they actually want it to work 100% of the time and cost nothing.

mohsen1 3 hours ago | parent | next [-]

Maybe hard to believe but not everyone is speaking English to Claude

orphea 5 hours ago | parent | prev [-]

Then they made it wrong. For example, "What the actual fuck?" doesn't get flagged, and neither does "What the *fuck*".

arcfour 3 hours ago | parent | next [-]

It is exceedingly obvious that the goal here is to catch at least 75-80% of negative sentiment and not to be exhaustive and pedantic and think of every possible way someone could express themselves.

Zamaamiro 3 hours ago | parent | prev | next [-]

Classic over-engineering. Their approach is just fine 90% of the time for the use case it’s intended for.

orphea 2 hours ago | parent | next [-]

75-80% [1], 90%, 99% [2]. In other words, no one has any idea.

I doubt it's anywhere near that high, because even if you don't write anything fancy and simply capitalize the first word, like you'd normally do at the beginning of a sentence, the regex won't flag it.

Anyway, I don't really care, might just as well be 99.99%. This is not a hill I'm going to die on :P

[1]: https://news.ycombinator.com/item?id=47587286

[2]: https://news.ycombinator.com/item?id=47586932

zwirbl 2 hours ago | parent [-]

It compares against lowercased input, so that doesn't matter. The rest is still valid.

morkalork an hour ago | parent | prev [-]

Except that it's a list of English keywords. Swearing at the computer is the one thing I'll hear devs switch back to their native language for constantly

vntok 3 hours ago | parent | prev [-]

They evidently ran a statistical analysis and determined that virtually no one uses those phrases as a quick retort to a model's unsatisfying answer... so they don't need to optimize for them.

codegladiator 5 hours ago | parent | prev | next [-]

What you are suggesting would be like a truck company using trucks to move things within the truck.

argee 5 hours ago | parent [-]

That’s what they do. Ever heard of a hand truck?

eadler 5 hours ago | parent | next [-]

I never knew the name of that device.

Thanks

freedomben 4 hours ago | parent [-]

Depending on the region you live in, it's also frequently called a "dolly"

SmellTheGlove 2 hours ago | parent [-]

Isn’t a dolly a flat 4 wheeled platform thingy? A hand truck is the two wheeled thing that tilts back.

eszed an hour ago | parent [-]

Ha! Where I'm from a "dolly" was the two-wheeled thing. The four-wheeler thing wasn't common before big-boxes took over the hardware business, but I think my dad would have called it a "cart", maybe a "hand-cart".

istoleabread 5 hours ago | parent | prev [-]

Do we have a hand llm perchance?

svnt 2 hours ago | parent [-]

Yeah it’s called a regex. With a lot of human assistance it can do less but fits in smaller spaces and doesn’t break down.

apgwoz 2 hours ago | parent [-]

It’s also deterministic, unlike LLMs…

raw_anon_1111 an hour ago | parent | prev | next [-]

Cloud hosted call centers using LLMs is one of my specialties. While I use an LLM for more nuanced sentiment analysis, I definitely use a list of keywords as a first level filter.

pdntspa an hour ago | parent | prev | next [-]

LLMs cost money, regular expressions are free. It really isn't so strange.

makeitrain an hour ago | parent | prev | next [-]

Don’t worry, they used an llm to generate the regex.

draxil 5 hours ago | parent | prev | next [-]

Good to have more than a hammer in your toolbox!

apgwoz 2 hours ago | parent | prev | next [-]

> That's like a truck company using horses to transport parts. Weird choice.

Easy way to claim more “horse power.”

__alexs 3 hours ago | parent | prev | next [-]

Using some ML to derive a sentiment regex seems like a good idea, actually?

irthomasthomas 2 hours ago | parent | prev | next [-]

This just proves it's vibe coded, because LLMs love writing solutions like that. I probably have a hundred examples just like it in my history.

harikb 2 hours ago | parent | prev | next [-]

Not everything done by claude-code is decided by the LLM. They need the wrapper to be deterministic (or one-time generated) code?

throwaw12 4 hours ago | parent | prev | next [-]

Because the impact of a "WTF" might be lost in the result of the analysis if you rely solely on an LLM.

Parsing "WTF" with a regex also preserves its significance and reduces the noise in the metrics.

"Determinism > non-determinism": when you are analysing sentiment, why not make some things more deterministic?

A cool thing about this solution is that you can evaluate the LLM's sentiment accuracy against the regex-based approach and analyse the discrepancies.

mghackerlady 3 hours ago | parent | prev | next [-]

More like a car company transporting their shipments by truck. It's more efficient

ojr 5 hours ago | parent | prev | next [-]

I used regexes in a similar way, but my implementation was vibecoded. Hmmm - by your analysis, Claude Code writes code by hand.

pfortuny 4 hours ago | parent | prev | next [-]

They had the problem of sentiment analysis. They use regexes.

You know the drill.

feketegy 3 hours ago | parent | prev | next [-]

It's all regex anyways

kjshsh123 4 hours ago | parent | prev | next [-]

Using regex with LLMs isn't uncommon at all.

lazysheepherd 3 hours ago | parent | prev | next [-]

Because they are engineers? The difference between an engineer and a hobbyist is that an engineer has to optimize for cost.

As they say: any idiot can build a bridge that stands, only an engineer can build a bridge that barely stands.

intended 2 hours ago | parent | prev | next [-]

The amount of trust and safety work that depends on Google Translate and the humble regex beggars belief.

j45 2 hours ago | parent | prev | next [-]

Asking non-deterministic software to act like deterministic software (a regex) can mean a significantly higher use of tokens/compute for no benefit.

Some things will be much better with inference, others won’t be.

sumtechguy 5 hours ago | parent | prev | next [-]

Hmm, not a terrible idea (I think).

You have a semi-expensive process, but you want to keep particular known context out of it, so you put a quick and dirty check in front of the expensive process. Instead of "figure sentiment (20 seconds)", you have "quick sentiment check (<1 second)" followed by "figure sentiment v2 (5 seconds)". Now, if it were just pure regex on its own, your analogy would hold up fine.

I could totally see myself making a design choice like that.

make3 an hour ago | parent | prev | next [-]

It's like a faster-than-light spaceship company using horses. There have been countless solutions that do this better, even CPU-only, for years lol.

lou1306 6 hours ago | parent | prev | next [-]

They're searching for multiple substrings in a single pass, regexes are the optimal solution for that.

noosphr 5 hours ago | parent | next [-]

The issue isn't that regexes are a solution for finding a substring. The issue is that you shouldn't be looking for substrings in the first place.

This has buttbuttin energy. Welcome to the 80s, I guess.

lou1306 17 minutes ago | parent | next [-]

> The issue is that you shouldn't be looking for substrings in the first place.

Why? They clearly just want to log conversations that are likely to display extreme user frustration with minimal overhead. They could do a full-blown NLP-driven sentiment analysis on every prompt but I reckon it would not be as cost-effective as this.

rdiddly 2 hours ago | parent | prev | next [-]

Clbuttic!

8cvor6j844qw_d6 5 hours ago | parent | prev | next [-]

Very likely vibe coded.

I've seen Claude Code go with a regex approach for a similar sentiment-related task.

mr_00ff00 an hour ago | parent [-]

My understanding of vibe coding is when someone doesn't look at the code and just uses prompts until the app "looks and acts" correct.

I doubt you'd write a regex and not look at it, even if it was AI generated.

5 hours ago | parent | prev [-]
[deleted]
BoppreH 5 hours ago | parent | prev [-]

It's fast, but it'll miss a ton of cases. This feels like it would be better served by a prompt instruction, or an additional tiny neural network.

And some of the entries are too short and will create false positives. It'll match the word "offset" ("ffs"), for example. EDIT: no it won't, I missed the \b. Still sounds weird to me.

hk__2 5 hours ago | parent | next [-]

It’s fast and it matches 80% of the cases. There’s no point in overengineering it.

NitpickLawyer 2 hours ago | parent [-]

> There’s no point in overengineering it.

I swear this whole thread about regexes is just fake rage at something, and I bet it'd be reversed had they used something heavier (omg, look they're using an LLM call where a simple regex would have worked, lul)...

vharuck 5 hours ago | parent | prev [-]

The pattern only matches if both ends are word boundaries. So "diffs" won't match, but "Oh, ffs!" will. It's also why they had to use the pattern "shit(ty|tiest)" instead of just "shit".
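A quick check of the boundary behaviour described above (the terms come from this thread's discussion, not the full Claude Code list):

```python
import re

# \b requires a word/non-word transition on both sides of the term,
# so a term embedded inside a longer word does not match.
pattern = re.compile(r"\bffs\b|\bshit(?:ty|tiest)\b")

print(bool(pattern.search("oh, ffs!")))             # standalone term matches
print(bool(pattern.search("check the diffs")))      # "ffs" inside "diffs" does not
print(bool(pattern.search("this is the shittiest")))  # suffixed forms match
```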

BoppreH 5 hours ago | parent [-]

You're right, I missed the \b's. Thanks for the correction.

sfn42 2 hours ago | parent | prev | next [-]

It's almost as if LLMs are unreliable
