▲ | bxparks 3 days ago | ||||||||||||||||||||||
Ah I see, this is peeling back a few layers of obscurity about the Forth interpreter for me. Let's stick with a Forth interpreter because that seems easier to think about for me. Are you saying that the Forth interpreter is a 2-pass interpreter? Or does the interpreter go into a special IMMEDIATE mode upon hitting the IF keyword, then it just consumes subsequent tokens without doing any dispatching, until it hits the THEN token? It sounds like nested IF-THEN-ELSE becomes tricky to handle. How does the FORTH interpreter handle loops? Does the interpreter hit the WHILE token, goes into IMMEDIATE mode, remembers the location of the WHILE, then dispatches all the subsequent code, until it hits the REPEAT token, then branches back to the WHILE? | |||||||||||||||||||||||
▲ | addaon 3 days ago | parent | next [-] | ||||||||||||||||||||||
Oh, and because I didn't address it directly in the longer answer... > Does the interpreter hit the WHILE token, goes into IMMEDIATE mode, remembers the location of the WHILE, then dispatches all the subsequent code, until it hits the REPEAT token, then branches back to the WHILE? Yes. The beauty is that, in the context of a threaded code compiler (which, again, I encourage you to use as your default model for Forth, even though other options are possible), WHILE just pushes the address of the output code stream onto the compile-time stack. REPEAT expects the address to be on the compile-time stack and compiles a conditional jump to that address. This obviously and trivially provides support for nested control structures of all types; as long as the word that pushes compile-time data onto the stack is correctly paired with the word the pops it, we have stack semantics for nested control, which is exactly the expectation. So while your description is completely correct, "remembers" is almost trivial here -- "puts data on stack" is the primitive operation of remembering anything in Forth, and that's all that's needed here, no fancy data structures or look-aside buffers or anything. (Note that the compiler does require at least two non-stack data structures, the symbol table and the output code stream, but those reflect real domain complexity.) | |||||||||||||||||||||||
▲ | addaon 3 days ago | parent | prev | next [-] | ||||||||||||||||||||||
I've actually never worked with a "pure" interpreter in Forth, only compilers of various levels of complexity. Threaded code compilers are (in my experience) by far the most common way to deal with forth -- and they are very much 2-pass. Even when used as an "interpreter," they generate (trivial, usually) machine code, then jump to it. Consider a definition (in some ill-defined Forth variant) like
We can categorize things:
So the compiler goes through one token (that it sees) at a time.First up is `:`. `:` is an IMMEDIATE word, so the compiler just calls it now. `:` then consumes a symbol (`abs-sqr`) from the token stream so the compiler won't see it (think of calling next() on an iterator in python or equivalent), then creates a symbol table entry from that symbol to the /current compiled code output stream pointer/ -- that is, just after the last piece of code that was compiled. Next up is `(`, since we already consumed `abs-sqr`. This is an IMMEDIATE word again -- and it just consumes tokens until one of them is exactly `)`, discarding them -- that is, it defines a comment. Finally we get to the "easy" case, `*`. The compiler finally compiles! It looks up this symbol in the symbol table, sees that it is /not/ IMMEDIATE, and compiles a call to this address. Now the compiler sees `0`. This is a literal token, so we don't even bother with the symbol table; we special-case code to push this value on the stack. '<' is a non-IMMEDIATE symbol, we already know this case. We've already discussed `if`, `neg`, and `then`. And `;` is an IMMEDIATE word that just puts a return instruction into the code stream. Clear as mud? There's one more step from here that's important to make, which is that the choice of what's IMMEDIATE or not is not strictly defined. Some words must be IMMEDIATE for correctness, if they interact with the compiler in interesting ways, like consuming tokens or back-patching instructions. But suppose we want to be clever... `<` works fine as a non-IMMEDIATE word. If we want to inline it, we /could/ have the compiler generalize by looking at the instructions pointed to by it, seeing how long they are (or tracking that in the symbol table), and deciding whether to inline... or we can just re-implement `<` as an immediate word that adds the appropriate instructions directly into the code stream. Combined with assembly words, this can be pretty trivially expressed, and it really changes the paradigm a bit. | |||||||||||||||||||||||
| |||||||||||||||||||||||
▲ | vdupras 3 days ago | parent | prev [-] | ||||||||||||||||||||||
No, that's not it. It's much simpler than that, yet has much deeper implications than you think. You don't see it in other languages. The closest thing would maybe be compile-time macros in Zig? But in Forth, the power it unlocks comes in its purest form, without any fluff around it. |