| ▲ | stingraycharles 6 hours ago |
| Because they want it to be executed quickly and cheaply without blocking the workflow? Doesn’t seem very weird to me at all. |
|
| ▲ | _fizz_buzz_ 5 hours ago | parent | next [-] |
| They probably have statistics on it and saw that certain phrases happen over and over, so why waste compute on inference? |
| |
| ▲ | crem 2 hours ago | parent | next [-] | | More likely their LLM Agent just produced that regex and they didn't even notice. | |
| ▲ | mycall 4 hours ago | parent | prev [-] | | The problem with regex is multi-language support and how much the regex will bloat if you want to support even 10 languages. | | |
| ▲ | doublesocket 4 hours ago | parent | next [-] | | Supporting 10 different languages in regex is a drop in the ocean. The regex can be generated programmatically and you can compress regexes easily. We used to have a compressed regex that could match any placename or street name in the UK in a few MB of RAM. It was silly quick. | | |
| ▲ | astrocat 2 hours ago | parent | next [-] | | woah. This is a regex use I've never heard of. I'd absolutely love to see a writeup on this approach - how it's done and when it's useful. | | |
| ▲ | benlivengood an hour ago | parent [-] | | You can literally | together every street address or other string you want to match in a giant disjunction, and then run a DFA/NFA minimization over that to get it down to a reasonable size. Maybe there are some fast regex simplification algorithms as well, but working directly with the finite automata has decades of research and probably can be more fully optimized. |
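The disjunction-plus-minimization idea can be sketched with a prefix trie in Python. This is a minimal sketch, not the UK-placename system described above, and it only shares common prefixes; full DFA minimization would also merge common suffixes.

```python
import re

def trie_regex(words):
    """Build a prefix trie from the words, then serialize it as a
    compact regex: shared prefixes are emitted once instead of being
    repeated in every alternative of a giant disjunction."""
    trie = {}
    for w in words:
        node = trie
        for ch in w:
            node = node.setdefault(ch, {})
        node[""] = {}  # end-of-word marker

    def serialize(node):
        end = "" in node  # a word may terminate at this node
        alts = [re.escape(ch) + serialize(node[ch])
                for ch in sorted(k for k in node if k != "")]
        if not alts:
            return ""
        body = alts[0] if len(alts) == 1 else "(?:" + "|".join(alts) + ")"
        if end:
            # remaining suffix is optional when a word can end here
            body = "(?:" + body + ")?"
        return body

    return serialize(trie)

# Illustrative street names: "High" is emitted once, not three times.
pattern = trie_regex(["High Street", "High Road", "Highgate"])
assert re.fullmatch(pattern, "Highgate")
assert re.fullmatch(pattern, "High Road")
assert not re.fullmatch(pattern, "High Lane")
```

For real workloads you would compile the result once and reuse it; engines like RE2 or Hyperscan build and optimize the underlying automaton for you.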
| |
| ▲ | cogman10 2 hours ago | parent | prev [-] | | I think it will depend on the language. There are a few non-latin languages where a simple word search likely won't be enough for a regex to properly apply. |
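The word-boundary problem is easy to demonstrate with Python's `re`: `\b` keys off transitions between word and non-word characters, so in an unsegmented script like Chinese it has nothing to latch onto. A minimal sketch (the phrase is purely illustrative):

```python
import re

# In unsegmented scripts there are no spaces between words, and every
# CJK character counts as a \w word character, so \b finds no boundary
# in the middle of a sentence.
text = "这个代码是垃圾"  # "this code is garbage"

# Naive word-boundary search fails: no \w -> non-\w transition before 垃.
assert re.search(r"\b垃圾\b", text) is None

# A plain substring search matches, at the cost of false positives when
# the target happens to appear inside an unrelated compound.
assert re.search(r"垃圾", text) is not None
```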
| |
| ▲ | TeMPOraL 4 hours ago | parent | prev | next [-] | | We're talking about Claude Code. If you're coding and not writing or thinking in English, the agents and people reading that code will have bigger problems than a regexp missing a swear word :). | | |
| ▲ | MetalSnake 4 hours ago | parent | next [-] | | I talk to it in non-English, but have rules that everything in code and documentation must be in English; only conversation with me uses my native language. Why would that be a problem? | | |
| ▲ | ekropotin 3 hours ago | parent [-] | | Because 90% of the training data was in English, and therefore the model performs best in that language. | | |
| ▲ | foldr 3 hours ago | parent [-] | | In my experience these models work fine using another language, if it’s a widely spoken one. For example, sometimes I prompt in Spanish, just to practice. It doesn’t seem to
affect the quality of code generation. | | |
| ▲ | ekropotin an hour ago | parent | next [-] | | That's just a subjective observation. It can't be the case, simply because of how ML works. In short, the more diverse, high-quality, reasoning-rich examples a language has in the training set, the better the model performs in that language. So unless the Spanish subset had much more quality-dense examples to make up for its smaller volume, there is no way the quality of reasoning in Spanish is on par with English. I apologise for the rambling explanation; I'm sure someone with ML expertise here can explain it better. | |
| ▲ | adamsb6 3 hours ago | parent | prev [-] | | They literally just have to subtract the vector for the source language and add the vector for the target. It’s the original use case for LLMs. |
|
|
| |
| ▲ | cryptonector an hour ago | parent | prev | next [-] | | Claude handles human languages other than English just fine. | |
| ▲ | formerly_proven 4 hours ago | parent | prev [-] | | In my experience agents tend to (counterintuitively) perform better when the business language is not English / does not match the code's language. I'm assuming the increased attention mitigates the higher "cognitive" load. |
| |
| ▲ | crimsonnoodle58 4 hours ago | parent | prev | next [-] | | They only need to look at one language to get a statistically meaningful picture into common flaws with their model(s) or application. If they want to drill down to flaws that only affect a particular language, then they could add a regex for that as well/instead. | |
| ▲ | b112 4 hours ago | parent | prev [-] | | Did you just complain about bloat, in anything using npm? |
|
|
|
| ▲ | Foobar8568 5 hours ago | parent | prev | next [-] |
| Why do you need to do it on the client side? You are leaking so much information there.
And considering the speed of Claude Code, if you really want to do it on the client side, a few seconds won't be a big deal. |
| |
| ▲ | plorntus 4 hours ago | parent | next [-] | | Depends what it's used by. If I recall, there's an `/insights` command/skill (built-in, or whatever you want to call it) that generates an HTML file. I believe it gives you stats on when you're frustrated with it and (useless) suggestions on how to "use claude better". Additionally, after looking at the source, it looks like a lot of Anthropic's own internal test tooling/debug code (i.e. stuff stripped out at build time) is in this source mapping. There's one part that prompts their own users (or whoever) to use a report-issue command whenever frustration is detected. It's possible it's using it for this. | |
| ▲ | matkoniecz 4 hours ago | parent | prev [-] | | > a few seconds won't be a big deal it is not that slow |
|
|
| ▲ | orphea 5 hours ago | parent | prev | next [-] |
| It looks like it's just for logging, why does it need to block? |
| |
| ▲ | jflynn2 4 hours ago | parent [-] | | Better question - why would you call an LLM (expensive in compute terms) for something that a regex can do (cheap in compute terms)? A regex is going to be something like 10,000 times quicker than the quickest LLM call; multiply that by billions of prompts. | | |
| ▲ | orphea 4 hours ago | parent [-] | | This is assuming the regex is doing a good job. It is not. Also, you can embed a very tiny model if you really want to flag as many cases as possible (I don't know Anthropic's goal with this) - it would be quick and free. | | |
| ▲ | gf000 3 hours ago | parent [-] | | I think it's a very reasonable tradeoff: getting 99% of true positives at a fraction of the cost (both runtime and engineering). Besides, they probably do a separate analysis on the server side either way, so they can check the true-positive to false-positive ratio. |
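The cost argument in this subthread is easy to sanity-check: a compiled regex scans a short prompt in microseconds, while even the fastest LLM call takes milliseconds plus network overhead. A rough sketch (the phrase list is hypothetical, not Anthropic's actual patterns):

```python
import re
import time

# Hypothetical frustration phrases, purely for illustration.
FRUSTRATION = re.compile(
    r"\b(?:wtf|this is (?:wrong|broken)|stop doing that)\b",
    re.IGNORECASE,
)

# 100k short prompts, half of which should match.
prompts = ["please refactor this function",
           "wtf, you deleted my tests"] * 50_000

start = time.perf_counter()
hits = sum(1 for p in prompts if FRUSTRATION.search(p))
elapsed = time.perf_counter() - start

# Typically well under a second on one core for all 100k prompts;
# one LLM call per prompt would be orders of magnitude slower.
print(f"{hits} hits over {len(prompts)} prompts in {elapsed:.3f}s")
```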
|
|
|
|
| ▲ | 5 hours ago | parent | prev [-] |
| [deleted] |