Remix.run Logo
eru 3 hours ago

The Dragon book is not very good, to be honest.

It was probably decent when all you had was something like Pascal and you wanted to write a C compiler.

Parsing and compiling and interpreting etc are all much more at home in functional languages. Much easier to understand there. And once you do, then you can translate back into imperative.

For parsing: by default you should be using parser combinators.

pklausler an hour ago | parent [-]

Is there a production compiler out there that doesn't use recursive descent, preferably constructed from combinators? Table-driven parsers seem now to be a "tell" of an old compiler or a hobby project.

eru 23 minutes ago | parent | next [-]

Oh, I was talking much more about how you can first learn how to write a compiler. I wasn't talking about how you write a production industry-strength compiler.

Btw, I mentioned parser combinators: those are basically just a front-end. Similar to regular expressions. The implementation can be all kinds of things, eg could be recursive descent or a table or backtracking or whatever. (Even finite automata, if your combinators are suitably restricted.)

pklausler 15 minutes ago | parent [-]

I used a small custom parser combinator library to parse Fortran from raw characters (since tokenization is so context-dependent), and it's worked well.

ogogmad 26 minutes ago | parent | prev [-]

Some people appreciate that an LR/LALR parser generator can prove non-ambiguity and linear time parse-ability of a grammar. A couple of examples are the creator of the Oil shell, and one of the guys responsible for Rust.

It does make me wonder though about why grammars have to be so complicated that such high-powered tools are needed. Isn't the gist of LR/LALR that the states of an automaton that can parse CFGs can be serialised to strings, and the set of those strings forms a regular language? Once you have that, many desirable "infinitary" properties of a parsing automaton can be automatically checked in finite time. LR and LALR fall out of this, in some way.

pklausler 19 minutes ago | parent [-]

Production compilers must have robust error recovery and great error messages, and those are pretty straightforward in recursive descent, even if ad hoc.