| ▲ | mycall 5 hours ago |
| The problem with regex is multi-language support and how big the regex will bloat if you want to support even 10 languages. |
|
| ▲ | doublesocket 4 hours ago | parent | next [-] |
| Supporting 10 different languages in regex is a drop in the ocean. The regex can be generated programmatically and you can compress regexes easily. We used to have a compressed regex that could match any placename or street name in the UK in a few MB of RAM. It was silly quick. |
| |
| ▲ | astrocat 2 hours ago | parent | next [-] | | Woah. This is a regex use I've never heard of. I'd absolutely love to see a writeup on this approach - how it's done and when it's useful. | | |
| ▲ | benlivengood an hour ago | parent [-] | | You can literally | together every street address or other string you want to match in a giant disjunction, and then run a DFA/NFA minimization over that to get it down to a reasonable size. Maybe there are some fast regex simplification algorithms as well, but working directly with the finite automata has decades of research and probably can be more fully optimized. |
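The disjunction-then-minimize idea above can be sketched in Python. This is an illustrative toy, not the poster's actual implementation: instead of running a full DFA minimization (e.g. Hopcroft's algorithm, which also shares common suffixes), it builds a prefix trie over the word list and emits a regex that shares common prefixes, which already collapses a naive `A|B|C|...` disjunction dramatically. The helper name `trie_to_regex` and the sample place names are made up for the example.

```python
import re

def trie_to_regex(words):
    """Compress a plain disjunction (word1|word2|...) by building a
    prefix trie and emitting a regex that shares common prefixes.
    Full DFA minimization would additionally merge shared suffixes."""
    trie = {}
    for w in words:
        node = trie
        for ch in w:
            node = node.setdefault(ch, {})
        node[''] = {}  # end-of-word marker

    def emit(node):
        branches = []
        for ch, child in sorted(node.items()):
            if ch == '':
                branches.append('')       # this node ends a word
            else:
                branches.append(re.escape(ch) + emit(child))
        if len(branches) == 1:
            return branches[0]
        if '' in branches:                # word is a prefix of another
            branches.remove('')
            return '(?:' + '|'.join(branches) + ')?'
        return '(?:' + '|'.join(branches) + ')'

    return emit(trie)

# Naive form: 'MANCHESTER|MANSFIELD|MANSTON'
pattern = trie_to_regex(["MANCHESTER", "MANSFIELD", "MANSTON"])
# Shared-prefix form: 'MAN(?:CHESTER|S(?:FIELD|TON))'
```

With hundreds of thousands of names, the trie form stays compact because UK place and street names share long prefixes, and the compiled pattern runs as a single automaton rather than trying each alternative in turn.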
| |
| ▲ | cogman10 2 hours ago | parent | prev [-] | | I think it will depend on the language. There are a few non-Latin languages where a simple word search likely won't be enough for a regex to apply properly. |
|
|
| ▲ | TeMPOraL 4 hours ago | parent | prev | next [-] |
| We're talking about Claude Code. If you're coding and not writing or thinking in English, the agents and people reading that code will have bigger problems than a regexp missing a swear word :). |
| |
| ▲ | MetalSnake 4 hours ago | parent | next [-] | | I talk to it in a non-English language, but have rules that everything in code and documentation stays in English. Only when speaking with me should it use my native language. Why would that be a problem? | | |
| ▲ | ekropotin 3 hours ago | parent [-] | | Because 90% of the training data was in English, and therefore the model performs best in that language. | | |
| ▲ | foldr 3 hours ago | parent [-] | | In my experience these models work fine using another language, if it’s a widely spoken one. For example, sometimes I prompt in Spanish, just to practice. It doesn’t seem to
affect the quality of code generation. | | |
| ▲ | ekropotin an hour ago | parent | next [-] | | That's just a subjective observation. It simply can't be the case, given how ML works. In short, the more diverse, high-quality, reasoning-rich examples a language has in the training set, the better the model performs in that language. So unless the Spanish subset had much more quality-dense examples to make up for the smaller volume, there is no way the quality of reasoning in Spanish is on par with English. I apologise for the rambling explanation; I'm sure someone with ML expertise here can explain it better. | |
| ▲ | adamsb6 3 hours ago | parent | prev [-] | | They literally just have to subtract the vector for the source language and add the vector for the target. It’s the original use case for LLMs. |
|
|
| |
| ▲ | cryptonector an hour ago | parent | prev | next [-] | | Claude handles human languages other than English just fine. | |
| ▲ | formerly_proven 4 hours ago | parent | prev [-] | | In my experience agents tend to (counterintuitively) perform better when the business language is not English / does not match the code's language. I'm assuming the increased attention mitigates the higher "cognitive" load. |
|
|
| ▲ | crimsonnoodle58 4 hours ago | parent | prev | next [-] |
| They only need to look at one language to get a statistically meaningful picture into common flaws with their model(s) or application. If they want to drill down to flaws that only affect a particular language, then they could add a regex for that as well/instead. |
|
| ▲ | b112 4 hours ago | parent | prev [-] |
| Did you just complain about bloat, in anything using npm? |