kragen 3 days ago
You're describing the outer interpreter in interpretation state; Forth control-flow words don't work properly in interpretation state, only in compile state. They're immediate words, so they execute at compile time instead of run time, which means they can do arbitrary things to the code being compiled. Here's how Mike Perry and Henry Laxen implemented the main control-flow words in F83, which is an indirect-threaded Forth.
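As a rough illustration of the immediate/non-immediate distinction, here's a toy outer interpreter in Python. It's a sketch under made-up assumptions, not F83 code: Word, Outer, start_definition, and the offset word are hypothetical names, and a Python list stands in for the memory a definition is compiled into.

```python
# Toy outer interpreter showing why IMMEDIATE matters in compile state.
# Every name here is hypothetical; this is an illustration, not F83 source.

class Word:
    def __init__(self, name, action, immediate=False):
        self.name = name
        self.action = action          # callable taking the interpreter
        self.immediate = immediate    # run even while compiling?


class Outer:
    def __init__(self):
        self.stack = []
        self.compiling = False        # plays the role of Forth's STATE
        self.current = None           # body of the definition being compiled
        self.current_name = None
        self.definitions = {}         # finished colon definitions
        self.dictionary = {}
        self.log = []                 # (what happened, word name) pairs

    def define(self, word):
        self.dictionary[word.name] = word

    def start_definition(self, name):  # roughly what ':' does after parsing a name
        self.compiling = True
        self.current = []
        self.current_name = name

    def finish_definition(self):       # roughly what ';' does
        self.definitions[self.current_name] = self.current
        self.compiling = False
        self.current = None

    def interpret(self, tokens):
        for name in tokens:
            word = self.dictionary[name]
            if self.compiling and not word.immediate:
                self.current.append(word)        # stow a pointer, don't run it
                self.log.append(("compiled", name))
            else:
                self.log.append(("executed", name))
                word.action(self)                # immediate, or interpreting


outer = Outer()
outer.define(Word("dup", lambda o: o.stack.append(o.stack[-1])))
# Hypothetical compile-time word: pushes the current offset into the body.
outer.define(Word("offset", lambda o: o.stack.append(len(o.current)), immediate=True))
# ';' has to be immediate, or it would just be compiled into the body.
outer.define(Word(";", lambda o: o.finish_definition(), immediate=True))

outer.start_definition("twice")
outer.interpret(["dup", "offset", "dup", ";"])
print([w.name for w in outer.definitions["twice"]])  # ['dup', 'dup']
print(outer.stack)                                   # [1]
```

The offset word ran while the definition was being compiled and left its result on the stack, but never ended up in the body; that's the hook IF, THEN, and friends use.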
When the interpreter is toodling along in compile state, compiling a colon definition by stowing pointers one after another (at the pointer here) into the definition of the word you're compiling, and it encounters an if, it sees that if is immediate, so instead of stowing a pointer to if it just runs it immediately. The definition of if is compile ?branch ?>mark. compile is also an immediate word [correction, no, it's not, see below comment, though the following is still correct]; compile ?branch stows a pointer to ?branch into the colon definition being compiled, and then ?>mark writes a 0 into the cell following the ?branch and pushes true and the address of the 0 onto the operand stack, at compile time, with the sequence true here 0 ,.

The interpreter toodles along compiling the body of the if and eventually gets to, for example, then, which is also immediate and is defined as ?>resolve. That overwrites the 0 with the address of the indirect-threaded code that will be compiled following the then. It does this with swap ?condition here swap !. The swap ?condition part aborts with an error if there isn't an unresolved if or similar on the stack to resolve; it consumes the true, leaving only the address of the 0 that ?>mark pushed. Then here swap ! overwrites that 0 with the current value of here.

?branch is a word written in assembly which does a conditional jump in the inner interpreter (the one that interprets the indirect-threaded code). When it's executed, it pops a value off the stack and checks whether it's zero; if so, it sets the interpreter's execution pointer ip (which is defined elsewhere as the register si) to the number stored in the threaded code following the pointer to ?branch. If, on the other hand, the popped value was nonzero, it increments ip twice to skip over that number. (Note that Laxen's comment on ?branch is incorrect in that it reverses the sense of the test.)
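Here's a toy Python model of the if/then mechanics described above, a sketch rather than a translation of F83: the threaded code is a Python list, len(code) plays the role of here, and compile_words and run are hypothetical helpers. Since each cell is one list slot, ?branch skips its inline target with a single increment rather than F83's two byte increments.

```python
# Toy sketch of F83-style if ... then: compile-time placeholder patching
# (?>mark / ?>resolve) plus a runtime ?branch. Names are illustrative only.

def compile_words(tokens):
    code = []                     # threaded code; len(code) stands in for HERE
    cs = []                       # compile-time stack: true, address

    def compile_if():             # if = compile ?branch ?>mark
        code.append("?branch")
        cs.extend([True, len(code)])   # ?>mark: true here ...
        code.append(0)                 # ... 0 ,  (placeholder target)

    def compile_then():           # then = ?>resolve = swap ?condition here swap !
        if len(cs) < 2 or cs[-2] is not True:
            raise SyntaxError("conditionals wrong")   # ?condition aborts
        addr = cs.pop()
        cs.pop()                  # consume the true flag
        code[addr] = len(code)    # overwrite the 0 with HERE

    immediate = {"if": compile_if, "then": compile_then}
    for tok in tokens:
        if tok in immediate:
            immediate[tok]()      # immediate words run at compile time
        elif tok.lstrip("-").isdigit():
            code.extend(["lit", int(tok)])
        else:
            code.append(tok)      # ordinary words are just compiled
    if cs:
        raise SyntaxError("unresolved if")
    return code


def run(code, stack):
    """Inner interpreter: ip indexes the threaded code list."""
    out, ip = [], 0
    while ip < len(code):
        word = code[ip]
        ip += 1
        if word == "lit":
            stack.append(code[ip])
            ip += 1
        elif word == "?branch":
            if stack.pop() == 0:
                ip = code[ip]     # zero: jump to the resolved target
            else:
                ip += 1           # nonzero: skip over the inline target
        elif word == "dup":
            stack.append(stack[-1])
        elif word == ".":
            out.append(stack.pop())
        else:
            raise NameError(word)
    return out


code = compile_words("dup if 42 . then .".split())
print(code)            # ['dup', '?branch', 6, 'lit', 42, '.', '.']
print(run(code, [5]))  # [42, 5]
print(run(code, [0]))  # [0]
```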
All the forward jumps work in pretty much the same way: when you begin a control structure you call ?>mark to write a zero placeholder and push its address, and later on you "resolve" that placeholder by popping its address off the stack and overwriting it with the correct address. leave (break) and ?leave (if (...) break) work slightly differently, but mostly the same.

Backward jumps work the other way around: when you begin a control structure, as in begin, you call ?<mark to save the current address on the stack so that you can jump to it later, which ends up just being true here. Then, to actually compile the jump, for example in until or again, you call ?<resolve, which ends up just being swap ?condition , where the , pops the jump target address off the stack and compiles it into the indirect-threaded code, serving as an argument to the ?branch or branch instruction compiled immediately before it.

begin ... while ... repeat is handled by treating the while ... repeat part as an if ... then with an unconditional jump back to the begin jammed in right before the then.

Hopefully this is helpful! BTW, to get the F83 code into a readable form, I reformatted the block files from the F83 distribution with http://canonical.org/~kragen/sw/dev3/blk2unix.py, which you may find useful if you want to do the same thing.
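The backward-jump words and the while ... repeat trick can be modeled the same way. This is again a hypothetical Python sketch, not F83 source. The 2swap step in repeat is an assumption about how the two pending (true, address) pairs get reordered so that the jump back to begin is compiled before the forward placeholder is resolved, and 0= pushes -1 for true, following the usual Forth convention.

```python
# Toy sketch of backward jumps (begin/until/again) and of
# begin ... while ... repeat built from if/then pieces plus a branch.
# Threaded code is a Python list; len(code) stands in for HERE.
# All names are hypothetical; this is not F83 source.

def compile_words(tokens):
    code, cs = [], []            # cs: compile-time stack of (true, address) pairs

    def pop_pair():              # swap ?condition: check the flag, keep the address
        if len(cs) < 2 or cs[-2] is not True:
            raise SyntaxError("conditionals wrong")
        addr = cs.pop()
        cs.pop()
        return addr

    def begin():                 # ?<mark = true here
        cs.extend([True, len(code)])

    def again():                 # compile branch ?<resolve
        code.append("branch")
        code.append(pop_pair())  # ?<resolve: the , compiles the target

    def until():                 # compile ?branch ?<resolve
        code.append("?branch")
        code.append(pop_pair())

    def while_():                # like if: compile ?branch ?>mark
        code.append("?branch")
        cs.extend([True, len(code)])
        code.append(0)           # placeholder, resolved by repeat

    def repeat():                # 2swap again then
        if len(cs) < 4:
            raise SyntaxError("conditionals wrong")
        cs[-4:] = cs[-2:] + cs[-4:-2]   # 2swap: begin's pair back on top
        again()
        code[pop_pair()] = len(code)    # then: ?>resolve patches the 0

    immediate = {"begin": begin, "again": again, "until": until,
                 "while": while_, "repeat": repeat}
    for tok in tokens:
        if tok in immediate:
            immediate[tok]()
        elif tok.lstrip("-").isdigit():
            code.extend(["lit", int(tok)])
        else:
            code.append(tok)
    if cs:
        raise SyntaxError("unresolved control structure")
    return code


def run(code, stack):
    out, ip = [], 0
    while ip < len(code):
        word = code[ip]
        ip += 1
        if word == "lit":
            stack.append(code[ip])
            ip += 1
        elif word == "branch":           # unconditional: always take the target
            ip = code[ip]
        elif word == "?branch":          # jump only when the popped flag is zero
            ip = code[ip] if stack.pop() == 0 else ip + 1
        elif word == "dup":
            stack.append(stack[-1])
        elif word == "drop":
            stack.pop()
        elif word == "-":
            b, a = stack.pop(), stack.pop()
            stack.append(a - b)
        elif word == "0=":
            stack.append(-1 if stack.pop() == 0 else 0)
        elif word == ".":
            out.append(stack.pop())
        else:
            raise NameError(word)
    return out


countdown = compile_words("begin dup . 1 - dup 0= until drop".split())
print(run(countdown, [3]))   # [3, 2, 1]

while_loop = compile_words("begin dup while dup . 1 - repeat drop".split())
print(while_loop)  # ['dup', '?branch', 10, 'dup', '.', 'lit', 1, '-', 'branch', 0, 'drop']
print(run(while_loop, [3]))  # [3, 2, 1]
```

In the compiled while loop you can see the if ... then shape: the ?branch at index 1 jumps forward past the loop, and the branch 0 jammed in just before the resolved target jumps back to the begin.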
alexisread 3 days ago
Oof, I forget that most Forths are a bit mind-bending with the compiler STATE. There are two or three alternatives to using compiler state, aka IMMEDIATE. https://github.com/dan4thewin/FreeForth2 uses a two-pass search, first for macros and after that for immediate words. The most interesting one is Able Forth, https://github.com/ablevm, which uses quotations to defer execution for flow control. I find using quotations instead of immediate mode easier to understand. Both of these always compile expressions before executing them, so IF/THEN/ELSE can be used at any time.
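To contrast with the immediate-word approach, here's a minimal sketch of quotation-based control flow in Python. The flag [ ... ] [ ... ] if syntax is an assumption borrowed from Joy/Factor-style languages, not Able Forth's actual syntax, and parse, execute, and run are hypothetical names. Because the whole input is turned into quotations before anything executes, if behaves the same at top level as anywhere else; there's no STATE to consult.

```python
# Toy concatenative interpreter where if takes two quotations instead of
# relying on compile state. The syntax is assumed, not Able Forth's.

def parse(tokens):
    """Build nested lists: everything between [ and ] becomes a quotation."""
    frames = [[]]
    for tok in tokens:
        if tok == "[":
            frames.append([])
        elif tok == "]":
            if len(frames) < 2:
                raise SyntaxError("unmatched ]")
            quotation = frames.pop()
            frames[-1].append(quotation)
        else:
            frames[-1].append(tok)
    if len(frames) != 1:
        raise SyntaxError("unmatched [")
    return frames[0]


def execute(program, stack, out):
    for item in program:
        if isinstance(item, list):
            stack.append(item)            # a quotation is just data until called
        elif item == "if":
            else_part, then_part, flag = stack.pop(), stack.pop(), stack.pop()
            execute(then_part if flag != 0 else else_part, stack, out)
        elif item == "dup":
            stack.append(stack[-1])
        elif item == ".":
            out.append(stack.pop())
        elif item.lstrip("-").isdigit():
            stack.append(int(item))
        else:
            raise NameError(item)


def run(source):
    stack, out = [], []
    execute(parse(source.split()), stack, out)
    return out


print(run("1 [ 10 . ] [ 20 . ] if"))                 # [10]
print(run("0 [ 10 . ] [ 20 . ] if"))                 # [20]
print(run("1 [ 0 [ 1 . ] [ 2 . ] if ] [ 3 . ] if"))  # [2]
```

The trade-off is that nothing has to patch placeholders after the fact: both branches already exist as complete values on the stack when if runs.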
andrewla 3 days ago
Thanks for this expansion of the ideas involved. My question is: what does the COMPILE word do? What is the state of the VM / compiler / REPL or whatever after it encounters that word? That "IF" is implemented in terms of other, more fundamental operators is fine, but can we write a program that uses just the fundamental operators to demonstrate IF-like behavior without introducing any intermediate words?
bxparks 3 days ago
Wow, that's going to take some time and effort to digest, but thank you. Yes, I think control flow is easier to understand in assembly language than in the Forth implementation you showed. :-)