Findecanor 12 hours ago

The "Dragon Book" is big on parsing but I wouldn't recommend it if you want to make many optimisation passes or a back-end.

The first edition was my first CS textbook, back in the '90s, and as a young programmer I learned a lot from it. A couple of years ago, however, I started on a modern compiler back-end and found that I needed to update quite a lot of my knowledge.

The 2nd ed covers data-flow analysis, which is very important. However, modern compilers (GCC, LLVM, Cranelift, ...) are built around an intermediate representation in Static Single Assignment-form. The 2nd ed. has only a single page about SSA and you'd need to also learn a lot of theory about its properties to actually use it properly.
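The core idea behind SSA is small enough to sketch: every variable gets a fresh name at each assignment, so every use points at exactly one definition. Below is a toy Python illustration of that renaming for a straight-line block (invented representation, not any real compiler's IR); real SSA construction additionally needs phi-nodes at control-flow joins, which is where most of the theory the parent mentions lives.

```python
def to_ssa(block):
    """block: list of (target, operand1, op, operand2) assignments."""
    version = {}  # latest version number of each source-level variable

    def use(name):
        # Variable operands get their latest versioned name; constants
        # and never-assigned names pass through unchanged.
        return f"{name}{version[name]}" if name in version else name

    result = []
    for target, a, op, b in block:
        a, b = use(a), use(b)
        version[target] = version.get(target, 0) + 1  # fresh definition
        result.append((f"{target}{version[target]}", a, op, b))
    return result

# x = a + b; x = x * 2   becomes   x1 = a + b; x2 = x1 * 2
print(to_ssa([("x", "a", "+", "b"), ("x", "x", "*", "2")]))
```

The payoff is that "which definition does this use see?" becomes trivial, which is exactly what data-flow analyses keep recomputing in a non-SSA IR.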

aldousd666 12 hours ago | parent [-]

Parsing is the front end to a compiler. Can't get semantics without first recognizing syntax. I have a hard time thinking about programming languages without seeing them as a parsing exercise first, every time.

gf000 11 hours ago | parent | next [-]

The usual advice is to start with semantics first. Syntax will change, and there is not much point in pinning it down too early.

Most of the work is actually the backend, and people sort of delude themselves into "creating a language" just because they have an AST.

thefaux 9 hours ago | parent | next [-]

Syntax and semantics are never orthogonal, and you always need syntax, so it must be considered from the start. Any reasonable syntax will quickly become much more pleasant for generating an AST or IR than, say, manually building those objects in the host language of the compiler, which is what the semantics-first crowd seems to propose.

It is also only the case for some compilers that most of the work is the backend, though of course this depends on how "backend" is defined. Is the backend just codegen, or is it all of the analysis between parsing and codegen? If you target a high-level language, which is very appropriate for one's first few compilers, the backend can be quite simple. At the simplest, no AST is even necessary and the compiler can just mechanically translate one syntax into another in a single pass.
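The single-pass, no-AST style can be sketched in a few lines. Here is a deliberately tiny Python example that translates a made-up statement form ("let x be EXPR") directly into Python assignment syntax, token by token, with no tree in between; the source language is invented purely for illustration:

```python
def translate(line):
    """Mechanically rewrite one 'let NAME be EXPR' statement into Python."""
    words = line.split()
    # "let x be a + b"  ->  "x = a + b"
    if words[:1] == ["let"] and words[2:3] == ["be"]:
        return f"{words[1]} = {' '.join(words[3:])}"
    raise SyntaxError(f"unrecognized statement: {line!r}")

print(translate("let x be a + b"))  # x = a + b
```

Nothing here scales to real control flow or precedence, which is exactly the point: for a first compiler targeting a high-level language, this level of machinery can be enough.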

ablob 9 hours ago | parent [-]

I think his point is that "form follows function". If you know what kind of semantics you're going to have, you can use that to construct a syntax that lends itself to using it properly.

bsder 2 hours ago | parent | prev [-]

> The recommended advice is to start with semantics first. Syntax will change, there is not much point fixing it down too early.

It's actually the reverse, in my opinion. Semantics can change much more easily than syntax. You can see this in how small changes in syntax can cause massive changes in a recursive-descent parser, while the semantics can change from pass-by-reference to pass-by-value and the parser barely budges.

There is a reason practically every modern language has adopted syntax sigils like (choosing Zig):

    pub fn is_list(arg: arg_t, len: ui_t) bool {
        // ...
    }

This allows identifying the various parts and types without referencing or compiling the universe. That's super important, and something that must be baked into the syntax from the start, or there is nothing you can do about it.

samus 10 hours ago | parent | prev [-]

Getting an overview of parsing theory is mainly useful for avoiding ambiguous or otherwise hard-to-parse grammars. Usually one can't go too wrong with a hand-written recursive descent parser, and most general-purpose languages are so complicated that parser generators can't really handle them. Anyway, the really interesting parts of compiling happen in the backend.
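To show how little ceremony a hand-written recursive descent parser needs: here is a Python sketch for a toy grammar (single-digit "+"/"*" arithmetic with parentheses, invented for illustration). Each grammar rule becomes one function, and operator precedence falls out of which function calls which:

```python
def parse(src):
    """Evaluate a toy expression grammar by recursive descent."""
    pos = 0

    def peek():
        return src[pos] if pos < len(src) else None

    def expr():  # expr := term ('+' term)*
        nonlocal pos
        value = term()
        while peek() == "+":
            pos += 1
            value += term()
        return value

    def term():  # term := factor ('*' factor)*
        nonlocal pos
        value = factor()
        while peek() == "*":
            pos += 1
            value *= factor()
        return value

    def factor():  # factor := digit | '(' expr ')'
        nonlocal pos
        if peek() == "(":
            pos += 1
            value = expr()
            pos += 1  # consume ')'
            return value
        ch = src[pos]
        pos += 1
        return int(ch)

    return expr()

print(parse("2+3*4"))  # 14: '*' binds tighter because term() sits below expr()
```

The flip side, as noted upthread, is that this structure mirrors the grammar closely, so syntax changes ripple through it.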

Another alternative is basing the language on S-expressions, for which a parser is extremely simple to write.
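"Extremely simple" here is no exaggeration. A complete S-expression reader fits in a dozen lines of Python: split on parentheses, then build nested lists recursively (a minimal sketch, with no string literals or error handling):

```python
def parse_sexpr(src):
    """Read one S-expression into nested Python lists of atom strings."""
    # Pad parentheses with spaces so split() tokenizes everything at once.
    tokens = src.replace("(", " ( ").replace(")", " ) ").split()

    def read(i):
        if tokens[i] == "(":
            items, i = [], i + 1
            while tokens[i] != ")":
                item, i = read(i)
                items.append(item)
            return items, i + 1  # skip the closing ')'
        return tokens[i], i + 1  # atom

    tree, _ = read(0)
    return tree

print(parse_sexpr("(define (square x) (* x x))"))
```

With syntax this cheap, all of the project's effort goes straight into semantics and the backend.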