eslaught 6 days ago

The other answers are great, but let me just add that C++ cannot be parsed with conventional LL/LALR/LR parsers, because the syntax is ambiguous and requires disambiguation via type checking (i.e., there may be multiple parse trees but at most one will type check).
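To make the ambiguity concrete, here is a minimal sketch in which the same token sequence `a * b` is an expression in one scope and a declaration in another. The namespace and function names are made up for illustration; only name lookup tells the parser which reading applies.

```cpp
// Case 1: `a` names a variable, so `a * b` is a multiplication expression.
namespace as_value {
    int a = 6;
    int b = 7;
    int product() {
        return a * b;      // parsed as an expression
    }
}

// Case 2: `a` names a type, so `a * b` declares `b` as a pointer to `a`.
namespace as_type {
    struct a {};
    bool declares_pointer() {
        a * b = nullptr;   // same leading tokens, parsed as a declaration
        return b == nullptr;
    }
}
```

A context-free grammar sees `identifier * identifier` in both cases, so the parser either needs symbol information while parsing or has to keep both trees and discard one later.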

There was some research on parsing C++ with GLR but I don't think it ever made it into production compilers.

Other, more sane languages with unambiguous grammars may still choose to hand-write their parsers for all the reasons mentioned in the sibling comments. However, I would note that, even when using a parsing library, almost every compiler in existence will use its own AST, and not reuse the parse tree generated by the parser library. That's something you would only ever do in a compiler class.

Also, I wouldn't say that frontend/backend is an evolution of previous terminology; it's just that parsing is not considered an "interesting" problem by most of the community, so the focus has moved elsewhere (from AST design through optimization and code generation).

nextaccountic 6 days ago | parent | next [-]

Note that, depending on which parsing lib you use, it may produce nodes of your own custom AST type.

Personally I love the (Rust) combo of logos for lexing, chumsky for parsing, and ariadne for error reporting. Chumsky has options for error recovery and good performance, and ariadne is gorgeous (there is another alternative for Rust, miette; both are good).

The only thing chumsky is lacking is incremental parsing. There is a chumsky-inspired library for incremental parsing called incpa, though.

estebank 6 days ago | parent [-]

If you want something more conservative for error reporting, annotate-snippets is finally at parity with rustc's current custom renderer and will soon become the default for both rustc and cargo.

nextaccountic 5 days ago | parent [-]

Will migrating to annotate-snippets change rustc/cargo formatting of errors in any way?

Also, in what sense is it more conservative?

estebank 5 days ago | parent [-]

The migration will cause no user-visible change in the output.

It uses ASCII for all output and replaces ZWJs to keep terminal output consistent in the face of multi-codepoint emoji, to name two off the top of my head.

ajb 6 days ago | parent | prev | next [-]

GLR C++ parsers were for a short time in use on production code at Mozilla, in refactoring tools: Oink (and its fork, pork). Not quite sure what ended that, but I don't think it was any issue with parsing.

fithisux 6 days ago | parent | prev | next [-]

I disagree. It is interesting; that is why there are many languages out there without an LSP.

ricudis 4 days ago | parent | prev [-]

Not just C++. Even C parsing is context-dependent because of typedef. It requires a bit of hackery to parse with a conventional LL/LALR/LR parser.
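The usual hackery is to let the lexer consult a table of typedef names that the parser fills in as it goes, so an identifier comes out as either a type name or a plain identifier. A toy sketch of that feedback loop follows; `TypedefTable`, `classify`, and `interpret_star_statement` are hypothetical names, not taken from any real compiler.

```cpp
#include <set>
#include <string>

enum class TokenKind { Identifier, TypeName };

// Hypothetical symbol table shared between the parser and the lexer.
struct TypedefTable {
    std::set<std::string> names;

    // Called by the parser when it finishes a `typedef ... name;` declaration.
    void declare(const std::string& name) { names.insert(name); }

    bool is_type(const std::string& name) const { return names.count(name) != 0; }
};

// The lexer asks the table how to classify each identifier it scans.
TokenKind classify(const TypedefTable& table, const std::string& ident) {
    return table.is_type(ident) ? TokenKind::TypeName : TokenKind::Identifier;
}

// For a statement of the form `lhs * rhs;`, the token kind of `lhs`
// decides which grammar production the parser takes.
std::string interpret_star_statement(const TypedefTable& table, const std::string& lhs) {
    return classify(table, lhs) == TokenKind::TypeName ? "declaration" : "expression";
}
```

With an empty table, `T * x;` is read as an expression; after `declare("T")`, the same statement is read as a declaration. The grammar itself stays conventional, at the cost of coupling the lexer to parser state.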