▲ | addaon 3 days ago | |||||||
I've actually never worked with a "pure" interpreter in Forth, only compilers of various levels of complexity. Threaded code compilers are (in my experience) by far the most common way to deal with forth -- and they are very much 2-pass. Even when used as an "interpreter," they generate (trivial, usually) machine code, then jump to it. Consider a definition (in some ill-defined Forth variant) like
We can categorize things:
So the compiler goes through one token (that it sees) at a time.First up is `:`. `:` is an IMMEDIATE word, so the compiler just calls it now. `:` then consumes a symbol (`abs-sqr`) from the token stream so the compiler won't see it (think of calling next() on an iterator in python or equivalent), then creates a symbol table entry from that symbol to the /current compiled code output stream pointer/ -- that is, just after the last piece of code that was compiled. Next up is `(`, since we already consumed `abs-sqr`. This is an IMMEDIATE word again -- and it just consumes tokens until one of them is exactly `)`, discarding them -- that is, it defines a comment. Finally we get to the "easy" case, `*`. The compiler finally compiles! It looks up this symbol in the symbol table, sees that it is /not/ IMMEDIATE, and compiles a call to this address. Now the compiler sees `0`. This is a literal token, so we don't even bother with the symbol table; we special-case code to push this value on the stack. '<' is a non-IMMEDIATE symbol, we already know this case. We've already discussed `if`, `neg`, and `then`. And `;` is an IMMEDIATE word that just puts a return instruction into the code stream. Clear as mud? There's one more step from here that's important to make, which is that the choice of what's IMMEDIATE or not is not strictly defined. Some words must be IMMEDIATE for correctness, if they interact with the compiler in interesting ways, like consuming tokens or back-patching instructions. But suppose we want to be clever... `<` works fine as a non-IMMEDIATE word. If we want to inline it, we /could/ have the compiler generalize by looking at the instructions pointed to by it, seeing how long they are (or tracking that in the symbol table), and deciding whether to inline... or we can just re-implement `<` as an immediate word that adds the appropriate instructions directly into the code stream. Combined with assembly words, this can be pretty trivially expressed, and it really changes the paradigm a bit. | ||||||||
▲ | Someone 2 days ago | parent | next [-] | |||||||
> We can categorize things: > IMMEDIATE words used here are : ( if then ; `:` normally isn’t immediate > First up is `:`. `:` is an IMMEDIATE word, so the compiler just calls it now `:` gets executed because the interpreter, when it isn’t compiling, goes through a loop:
So, `:` gets called in step 3a.> Now the compiler sees `0`. This is a literal token, so we don't even bother with the symbol table; we special-case code to push this value on the stack. As indicated above, that’s not how ‘normal’ forths work. A lookup is done for a word named `0`, and if it exists, a call to it is compiled. Many forths had words named after small constants such as `0`, `1`, `2` or `-1` because compiling a call to a function took less memory than compiling a call to the “LIT” function and compiling the constant value. | ||||||||
▲ | bxparks 2 days ago | parent | prev [-] | |||||||
> I've actually never worked with a "pure" interpreter in Forth, only compilers of various levels of complexity. Threaded code compilers are (in my experience) by far the most common way to deal with forth -- and they are very much 2-pass. Even when used as an "interpreter," they generate (trivial, usually) machine code, then jump to it. Lots of good info, thank you. I don't think I will fully understand what you wrote until I implement a Forth interpreter myself. So a side question: If most Forth "interpreters" are compilers, how does a Forth interpreter work in a Harvard architecture microprocessor (with separate memory space for data and instructions) instead of a Von Neumann architecture with a unified memory layout? In other words, in a Harvard architecture (e.g. AVR microcontrollers), the Forth compiler will live in read-only flash ROM, and it cannot generate machine code into RAM and execute it, because the data memory is not executable. | ||||||||
|