Remix.run Logo
johnwbyrd 3 days ago

I'm surprised, and a little disappointed, that no one in this thread has mentioned parsing expression grammars (https://en.wikipedia.org/wiki/Parsing_expression_grammar) which are a much more human-friendly form of grammar for real-world parsing tasks.

sparkie 3 days ago | parent [-]

PEGs are closely related to recursive descent, and have some of the same problems.

A PEG is always unambiguous because it picks the first option - but whether that was the intended parse is not necessarily straightforward. In practice these problems don't usually show up, so they're fine to work with.

The advantage LR gives you is that it produces a parser where there are no ambiguities and every successful parse is the one intended. An LR grammar is a proof, as well as a means of producing a parser. A decent LR parser generator is like a simple proof assistant - it will find problems with your language before you do, so you can fix your syntax before putting it into production.

In "real-world" parsing tasks as you put it, the problems of LR parser generators is that they're not the best suited to parsing languages that have ambiguities, like C, C++ and many others. Some of the complaints about LR are about the workarounds that need to be done to parse these languages, where it's obviously the wrong tool for the job because those languages aren't described by proper LR grammars.

But if you're designing a new language from scratch, surely it's better to not repeat those mistakes? If you carefully design your language to be parsed by an LR grammar then other developers who come to parse your language won't encounter those issues. They won't need lexical tie-ins and other nonsense that complicates the process.