> So there must be something else fundamental in the Forth interpreter that I don't understand.

The missing bit is IMMEDIATE mode. Words can be tagged as IMMEDIATE, which means that they get executed at compile time (or parse time, for an interpreter), rather than a call to them getting compiled (executed at run time, for an interpreter). IF/ELSE/THEN are then "just" IMMEDIATE mode words -- but you can add your own. The "special sauce" is that IF compiles (easier to talk about for a compiler; generalize as needed) a conditional branch to an unknown address, and puts the address of that branch instruction (or an equivalent) on the /compile time/ data stack; THEN then looks at the address on the /compile time/ data stack and patches that instruction to branch to the correct address. Plenty of subtlety possible, but the basic primitive of IMMEDIATE mode is the key.

▲ bxparks 3 days ago | parent [-]

Ah I see, this is peeling back a few layers of obscurity about the Forth interpreter for me. Let's stick with a Forth interpreter because that seems easier to think about for me.

Are you saying that the Forth interpreter is a 2-pass interpreter? Or does the interpreter go into a special IMMEDIATE mode upon hitting the IF keyword, then it just consumes subsequent tokens without doing any dispatching, until it hits the THEN token? It sounds like nested IF-THEN-ELSE becomes tricky to handle.

How does the FORTH interpreter handle loops? Does the interpreter hit the WHILE token, goes into IMMEDIATE mode, remembers the location of the WHILE, then dispatches all the subsequent code, until it hits the REPEAT token, then branches back to the WHILE?

▲ addaon 3 days ago | parent | next [-]

Oh, and because I didn't address it directly in the longer answer...

> Does the interpreter hit the WHILE token, goes into IMMEDIATE mode, remembers the location of the WHILE, then dispatches all the subsequent code, until it hits the REPEAT token, then branches back to the WHILE?

Yes. The beauty is that, in the context of a threaded code compiler (which, again, I encourage you to use as your default model for Forth, even though other options are possible), WHILE just pushes the address of the output code stream onto the compile-time stack. REPEAT expects the address to be on the compile-time stack and compiles a conditional jump to that address. This obviously and trivially provides support for nested control structures of all types; as long as the word that pushes compile-time data onto the stack is correctly paired with the word the pops it, we have stack semantics for nested control, which is exactly the expectation. So while your description is completely correct, "remembers" is almost trivial here -- "puts data on stack" is the primitive operation of remembering anything in Forth, and that's all that's needed here, no fancy data structures or look-aside buffers or anything. (Note that the compiler does require at least two non-stack data structures, the symbol table and the output code stream, but those reflect real domain complexity.)

▲ addaon 3 days ago | parent | prev | next [-]

I've actually never worked with a "pure" interpreter in Forth, only compilers of various levels of complexity. Threaded code compilers are (in my experience) by far the most common way to deal with forth -- and they are very much 2-pass. Even when used as an "interpreter," they generate (trivial, usually) machine code, then jump to it.

Consider a definition (in some ill-defined Forth variant) like

    : abs-sqr ( n -- |n^2| ) * 0 < if neg then ;

We can categorize things:

    IMMEDIATE words used here are : ( if then ;
    Normal words are * < neg
    Literals are 0
    Tokens that are not seen by the compiler directly (!) are abs-sqr, the contents of the comment, and )

So the compiler goes through one token (that it sees) at a time.

First up is `:`. `:` is an IMMEDIATE word, so the compiler just calls it now. `:` then consumes a symbol (`abs-sqr`) from the token stream so the compiler won't see it (think of calling next() on an iterator in python or equivalent), then creates a symbol table entry from that symbol to the /current compiled code output stream pointer/ -- that is, just after the last piece of code that was compiled.

Next up is `(`, since we already consumed `abs-sqr`. This is an IMMEDIATE word again -- and it just consumes tokens until one of them is exactly `)`, discarding them -- that is, it defines a comment.

Finally we get to the "easy" case, `*`. The compiler finally compiles! It looks up this symbol in the symbol table, sees that it is /not/ IMMEDIATE, and compiles a call to this address.

Now the compiler sees `0`. This is a literal token, so we don't even bother with the symbol table; we special-case code to push this value on the stack.

'<' is a non-IMMEDIATE symbol, we already know this case.

We've already discussed `if`, `neg`, and `then`. And `;` is an IMMEDIATE word that just puts a return instruction into the code stream.

Clear as mud?

There's one more step from here that's important to make, which is that the choice of what's IMMEDIATE or not is not strictly defined. Some words must be IMMEDIATE for correctness, if they interact with the compiler in interesting ways, like consuming tokens or back-patching instructions. But suppose we want to be clever... `<` works fine as a non-IMMEDIATE word. If we want to inline it, we /could/ have the compiler generalize by looking at the instructions pointed to by it, seeing how long they are (or tracking that in the symbol table), and deciding whether to inline... or we can just re-implement `<` as an immediate word that adds the appropriate instructions directly into the code stream. Combined with assembly words, this can be pretty trivially expressed, and it really changes the paradigm a bit.

▲ Someone 2 days ago | parent | next [-]

> We can categorize things: > IMMEDIATE words used here are : ( if then ;

`:` normally isn’t immediate

> First up is `:`. `:` is an IMMEDIATE word, so the compiler just calls it now

`:` gets executed because the interpreter, when it isn’t compiling, goes through a loop:

  1) read a token until the next space in the input
  2) look up that token in the dictionary
    3a) if a word is found: call it
    3b) if no word is found: try interpreting the token as a number
      4a) if it can be interpreted such: push that number on the stack
      4b) if it cannot: bail out with an error message

So, `:` gets called in step 3a.

> Now the compiler sees `0`. This is a literal token, so we don't even bother with the symbol table; we special-case code to push this value on the stack.

As indicated above, that’s not how ‘normal’ forths work. A lookup is done for a word named `0`, and if it exists, a call to it is compiled.

Many forths had words named after small constants such as `0`, `1`, `2` or `-1` because compiling a call to a function took less memory than compiling a call to the “LIT” function and compiling the constant value.

▲ bxparks 2 days ago | parent | prev [-]

> I've actually never worked with a "pure" interpreter in Forth, only compilers of various levels of complexity. Threaded code compilers are (in my experience) by far the most common way to deal with forth -- and they are very much 2-pass. Even when used as an "interpreter," they generate (trivial, usually) machine code, then jump to it.

Lots of good info, thank you. I don't think I will fully understand what you wrote until I implement a Forth interpreter myself.

So a side question: If most Forth "interpreters" are compilers, how does a Forth interpreter work in a Harvard architecture microprocessor (with separate memory space for data and instructions) instead of a Von Neumann architecture with a unified memory layout? In other words, in a Harvard architecture (e.g. AVR microcontrollers), the Forth compiler will live in read-only flash ROM, and it cannot generate machine code into RAM and execute it, because the data memory is not executable.

	▲	addaon 2 days ago \| parent [-]
		> how does a Forth interpreter work in a Harvard architecture microprocessor You compile to "direct threaded code" in data memory; direct threaded code represents a sequence of calls as a sequence of addresses to call. So while "normal" threaded code (what Wikipedia calls "subroutine threading") would just have `call word_a call word_b call word_c` And then executing that means jumping to the first instruction, direct threaded code would have `&word_a &word_b &word_c` And then there's a suuuuper tiny runtime (like four of five instructions, literally) that has a "runtime instruction pointer" or whatever you want to call it, and just increments that and does an indirect call through to the next word whenever it's returned to.

▲ vdupras 3 days ago | parent | prev [-]

No, that's not it. It's much simpler than that, yet has much deeper implications than you think. You don't see it in other languages. The closest thing would maybe be compile-time macros in Zig? But in Forth, the power it unlocks comes in its purest form, without any fluff around it.