DemocracyFTW2 4 days ago

IMHO it gets even better when you can use regular expressions and write a 'modal' parser where each mode is responsible for a certain sub-grammar, like string literals. JavaScript added the sticky flag (y) to make this even simpler.
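A minimal sketch of what such a modal setup could look like, assuming two hypothetical modes (`code` and `string`) and made-up token names; none of this is taken from a particular library. Each rule is a sticky regex, so setting `lastIndex` makes it match exactly at the current position or not at all:

```javascript
// Hypothetical rule tables: one list of sticky regexes per mode.
// Mode names, token types and patterns are illustrative assumptions.
const MODES = {
  code: [
    { type: 'ws', re: /\s+/y },
    { type: 'number', re: /\d+/y },
    { type: 'ident', re: /[A-Za-z_]\w*/y },
    { type: 'quote', re: /"/y, push: 'string' }, // enter the string sub-grammar
    { type: 'punct', re: /[{}()\[\];,=+\-*\/]/y },
  ],
  string: [
    { type: 'escape', re: /\\./y },
    { type: 'text', re: /[^"\\]+/y },
    { type: 'quote', re: /"/y, pop: true }, // leave the string sub-grammar
  ],
};

function lex(source) {
  const tokens = [];
  const stack = ['code']; // the active mode is the top of this stack
  let pos = 0;
  outer: while (pos < source.length) {
    const mode = stack[stack.length - 1];
    for (const rule of MODES[mode]) {
      // With the y flag, exec() only succeeds if the match starts at lastIndex.
      rule.re.lastIndex = pos;
      const m = rule.re.exec(source);
      if (m === null) continue;
      tokens.push({ type: rule.type, text: m[0], pos });
      pos = rule.re.lastIndex;
      if (rule.push) stack.push(rule.push);
      if (rule.pop) stack.pop();
      continue outer;
    }
    throw new SyntaxError(
      `unexpected ${JSON.stringify(source[pos])} at ${pos} in mode ${mode}`
    );
  }
  return tokens;
}
```

Calling `lex('x = "hi"')` walks the input once, switching to the `string` rules at the opening quote and back to `code` at the closing one. Without the sticky flag you would have to slice the input or anchor every pattern with `^` to get the same "match here or fail" behaviour.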

rurban 2 days ago | parent [-]

It gets much worse. It's a huge anti-pattern to use regex within parsers.

DemocracyFTW2 a day ago | parent [-]

why so?

rurban a day ago | parent [-]

There are many explanations. The most famous one is Rob Pike's 2011 lexer talk https://youtu.be/HxaD_trXwRE?si=Q1B4mZ4Vo1Z2gRZq at 10:48

Or http://www.golangdevops.com/2019/03/07/halfpike-a-framework-...

Or several articles on why you should not parse with regex, like https://stackoverflow.com/questions/1732348/regex-match-open...

DemocracyFTW2 a day ago | parent [-]

I couldn't locate the part where Pike addresses regexes in his 50-minute talk.

The second piece seems to be about someone complaining about a dysfunctional and untidy software situation where incompetence led to the incorrect application of greedy regexes, producing wrong results.

The third one is the most famous rant against attempts to parse a language with symmetric bracing (start tags that must match end tags) using a single regex, in a language whose regex engine has no support for symmetric bracing. That is of course doomed to fail.

None of these provide any argument against lexing with sticky regexes. For one thing, the rant against regexes being unable to match bracing elements is only valid for regex engines that don't provide extensions for brace matching, but many languages and extensions do (e.g. https://stackoverflow.com/a/15303160/7568091).

However, this point is largely irrelevant because this is not about parsing, it's about lexing. I realize this is my fault, because above I wrote "write a 'modal' parser" where I should've written "write a 'modal' lexer".

In lexing you typically do not match braces, you just recognize that you've found a brace and emit an appropriate token. It's up to the downstream processing to check whether the brace tokens match.
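To make that division of labour concrete, here is a rough sketch; the function names `lexBraces` and `bracesBalanced` and the token shape are my own assumptions for illustration. The lexer uses one sticky regex and never looks at nesting; a separate downstream pass with a stack decides whether the braces pair up:

```javascript
// One sticky regex: either a single brace character (captured in group 1)
// or a run of anything else. Together the alternatives cover every character.
const BRACE_OR_OTHER = /([{}()\[\]])|[^{}()\[\]]+/y;

// Lexing: only classify what is at the current position and emit a token.
function lexBraces(source) {
  const tokens = [];
  let pos = 0;
  while (pos < source.length) {
    BRACE_OR_OTHER.lastIndex = pos;
    const m = BRACE_OR_OTHER.exec(source); // cannot be null, see above
    tokens.push({ type: m[1] ? 'brace' : 'other', text: m[0], pos });
    pos = BRACE_OR_OTHER.lastIndex;
  }
  return tokens;
}

const CLOSER = new Map([['{', '}'], ['(', ')'], ['[', ']']]);

// Downstream processing: a stack of expected closers checks the pairing.
function bracesBalanced(tokens) {
  const expected = [];
  for (const t of tokens) {
    if (t.type !== 'brace') continue;
    if (CLOSER.has(t.text)) expected.push(CLOSER.get(t.text));
    else if (expected.pop() !== t.text) return false;
  }
  return expected.length === 0;
}
```

The regex stays regular here: it never has to count or nest anything, which is why the "regexes can't match balanced tags" objection does not apply to this stage.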