Remix.run Logo
ufo 3 days ago

I disagree. When writing recursive descent by hand, it's easy to miss an ambiguity because of miscomputed FIRST and FOLLOW sets.

In practice most recursive descent parsers use if-else liberally. Thus, they effectively work like pegs where the first match wins (but without the limited backtracking of pegs). They are deterministic in the sense that the implementation always returns a predictable result. But they are still ambiguous in the sense that this behavior might not have been planned by the language designer, and the ambiguity may not have been resolved how the programmer expected.

kazinator 3 days ago | parent | next [-]

Without a comprehensive test suite you can easily break a recursive descent parser. By adding code into some function to handle something new, you can accidentally prevent some existing syntax from being recognized.

It has been my eperience that if you have a LALR parser that reports no errors at generation time, and you add something such that there are still no errors, you've not ruined any existing syntax. That could be a theorem.

sirwhinesalot 3 days ago | parent | prev [-]

Don't compute first and follow sets. Just branch on the current token. It is trivially unambiguous since 1 token = 1 branch. Expressions can be dealt with using precedence climbing / pratt, which still just amounts to branching on the current token after the "lhs" has been computed.

If the language doesn't fit this LL1 + operator precedence mold then I would not use a recursive descent parser.

ufo 3 days ago | parent [-]

The whole issue is that, without computing first and follow sets, it's easy to make a mistake and think that the grammar is LL(1) when it actually isn't. When that happens, the if statement does you no good. It'll be deterministic, but will silently mask the ambiguity.

sirwhinesalot 3 days ago | parent [-]

The only case I can think of is something like the dangling else problem or similar, which is very easy to detect.

You need to have a recursive call to the same rule or a "parent" rule, followed by an optional token match (note that this is already more than a simple branch on the current token since it is effectively a 1 token backtrack).

If there's any extra token match inbetween (like mandatory {}) you've already dodged the problem.

So I agree a mistake is possible, but "easy" I would not call it. It only appears in very specific conditions. The issue is way more prevalent in general PEGs.