Remix.run Logo
kragen 3 days ago

You're describing the outer interpreter in interpretation state; Forth control flow words don't work properly in interpretation state, only in compile state. They're immediate words, so they execute at compile time instead of run time, so they can do arbitrary things to the code being compiled. Here's Mike Perry and Henry Laxen's implementation of the main control-flow words from F83, which is an indirect-threaded Forth:

    \ Run Time Code for Control Structures                04OCT83HHL \ \ Run Time Code for Control Structures                05MAR83HHL
    CODE BRANCH   (S -- )                                            \ BRANCH    Performs an unconditional branch.  Notice that we
    LABEL BRAN1   0 [IP] IP MOV   NEXT END-CODE                      \    are using absolute addresses insead of relative ones. (fast)
    CODE ?BRANCH   (S f -- )                                         \ ?BRANCH   Performs a conditional branch.  If the top of the
      AX POP   AX AX OR   BRAN1 JE   IP INC   IP INC   NEXT END-CODE \    parameter stack in True, take the branch.  If not, skip
                                                                     \    over the branch address which is inline.

    \ Extensible Layer            Structures              03Apr84map \ \ Extensible Layer            Structures              03Apr84map
    : ?CONDITION   (S f -- )                                         \ ?CONDITION
       NOT ABORT" Conditionals Wrong"   ;                            \    Simple compile time error checking.  Usually adequate
    : >MARK      (S -- addr )    HERE 0 ,   ;                        \ >MARK        Set up for a Forward Branch
    : >RESOLVE   (S addr -- )    HERE SWAP !   ;                     \ >RESOLVE     Resolve a Forward Branch
    : <MARK      (S -- addr )    HERE    ;                           \ <MARK        Set up for a Backwards Branch
    : <RESOLVE   (S addr -- )    ,   ;                               \ <RESOLVE     Resolve a Backwards Branch

    : ?>MARK      (S -- f addr )   TRUE >MARK   ;                    \ ?>MARK       Set up a forward Branch with Error Checking
    : ?>RESOLVE   (S f addr -- )   SWAP ?CONDITION >RESOLVE  ;       \ ?>RESOLVE    Resolve a forward Branch with Error Checking
    : ?<MARK      (S -- f addr )   TRUE   <MARK   ;                  \ ?<MARK       Set up for a Backwards Branch with Error Checking
    : ?<RESOLVE   (S f addr -- )   SWAP ?CONDITION <RESOLVE  ;       \ ?<RESOLVE    Resolve a backwards Branch with Error Checking

    : LEAVE   COMPILE (LEAVE)   ; IMMEDIATE                          \ LEAVE and ?LEAVE could be non-immediate in this system,
    : ?LEAVE  COMPILE (?LEAVE)  ; IMMEDIATE                          \   but the 83 standard specifies an immediate LEAVE, so they
                                                                     \   both are for uniformity.
     
    \ Extensible Layer            Structures              01Oct83map \ \ Extensible Layer            Structures              27JUL83HHL
    : BEGIN   ?<MARK                                   ; IMMEDIATE   \ These are the compiling words needed to properly compile
    : THEN    ?>RESOLVE                                ; IMMEDIATE   \ the Forth Conditional Structures.  Each of them is immediate
    : DO      COMPILE (DO)   ?>MARK                    ; IMMEDIATE   \ and they must compile their runtime routines along with
    : ?DO     COMPILE (?DO)  ?>MARK                    ; IMMEDIATE   \ whatever addresses they need.  A modest amount of error
    : LOOP                                                           \ checking is done.  If you want to rip out the error checking
        COMPILE (LOOP)  2DUP 2+ ?<RESOLVE ?>RESOLVE    ; IMMEDIATE   \ change the ?> and ?< words to > and < words, and
    : +LOOP                                                          \ all of the 2DUPs to DUPs and the 2SWAPs to SWAPs.  The rest
        COMPILE (+LOOP) 2DUP 2+ ?<RESOLVE ?>RESOLVE    ; IMMEDIATE   \ should stay the same.
    : UNTIL   COMPILE ?BRANCH    ?<RESOLVE             ; IMMEDIATE
    : AGAIN   COMPILE  BRANCH    ?<RESOLVE             ; IMMEDIATE
    : REPEAT  2SWAP [COMPILE] AGAIN   [COMPILE] THEN   ; IMMEDIATE
    : IF      COMPILE  ?BRANCH  ?>MARK                 ; IMMEDIATE
    : ELSE    COMPILE  BRANCH ?>MARK  2SWAP ?>RESOLVE  ; IMMEDIATE
    : WHILE   [COMPILE] IF                             ; IMMEDIATE
When the interpreter is toodling along in compile state, compiling a colon definition by stowing pointers one after another (at the pointer here) into the definition of some word you're compiling, and it encounters an if, it sees that if is immediate, and so instead of stowing a pointer to if it just runs it immediately. The definition of if is compile ?branch ?>mark. compile is also an immediate word [correction, no, it's not, see below comment, though the following is still correct]; compile ?branch stows a pointer to ?branch into the colon definition being compiled, and then ?>mark writes a 0 into the entry following the ?branch and pushes true and the address of the 0 on the operand stack, at compile time, with the sequence true here 0 ,. The interpreter toodles along compiling the body of the if and eventually gets to, for example, then, which is also immediate, and is defined as ?>resolve, which overwrites the 0 into the address of the indirect-threaded code that will be compiled following the then. It does this with swap ?condition here swap !. The swap ?condition part aborts with an error if there isn't an unresolved if or similar on the stack to resolve, consuming the true, leaving only the address of the 0 that ?>mark had pushed. So then here swap ! overwrites that 0 with the current value of here.

?branch is a word written in assembly which does a conditional jump in the inner interpreter (the one that interprets the indirect-threaded code); when it's executed, it pops a value off the stack and checks to see if it's zero, and if so, it changes the interpreter's execution pointer ip (which is defined elsewhere as the register si) to the number stored in the threaded code following the pointer to ?branch. If, on the other hand, the value it popped was nonzero, it increments ip twice to skip over that number. (Note that Laxen's comment on ?branch is incorrect in that it reverses the sense of the test.)

All the forward jumps work in pretty much the same way: when you begin a control structure you call ?>mark to write a zero placeholder and push its address, and later on you "resolve" that placeholder by popping its address off the stack and overwriting it with the correct address. leave (break) and ?leave (if (...) break) work slightly differently, but mostly the same.

Backward jumps work the other way around: when you begin a control structure, as in begin, you call ?<mark to save the current address on the stack so that you can jump to it later, which ends up just being true here. Then, to actually compile the jump, for example in until or again, you call ?<resolve, which ends up just being swap ?condition ,—the , pops the jump target address off the stack and compiles it into the indirect threaded code, serving as an argument the ?branch or branch instruction compiled immediately before it.

begin ... while ... repeat is handled, as you can see, by treating the while ... repeat part as an if ... then with an unconditional jump back to the begin jammed in right before the then.

Hopefully this is helpful!

BTW, for the above, I reformatted the block files from the F83 distribution with http://canonical.org/~kragen/sw/dev3/blk2unix.py, which you may find useful if you want to do the same thing.

alexisread 3 days ago | parent | next [-]

Oof, I forget that most forths are a bit mind bending with the compiler STATE. There are 2/3 alternatives to using compiler state aka IMMEDIATE.

https://github.com/dan4thewin/FreeForth2 This uses a two-pass search, for macros` and after that immediate words.

The most interesting one is Able forth https://github.com/ablevm which uses flow control to defer execution, aka quotations. I find using quotations instead of immediate modes easier to understand.

With both of these, they always compile expressions before executing them, so IF/THEN/ELSE can be used at any time.

andrewla 3 days ago | parent | prev | next [-]

Thanks for this expansion of the ideas involved. My question here is what does the COMPILE word do? What is the state of the VM / compiler / repl or whatever after it encounters that word?

That "IF" is implemented in terms of other more fundamental operators is fine, but can we write a program that just uses the fundamental operators that demonstrates IF-like behavior but doesn't introduce any intermediate words?

kragen 3 days ago | parent [-]

Actually compile is not an immediate word (at least in F83). I was wrong about that. Here's the F83 definition:

    : COMPILE   (S -- )   R> DUP 2+ >R   @ ,   ;                     \ COMPILE     Compile the following word when this def. executes
This takes its return address (which points to the following word in the colon definition that called it), dups it, adds 2 to it, and puts that back on the return stack as its new return address. Then, it fetches from its original return address with @ (thus getting the address of the word that followed it in the colon definition, such as ?branch in my if example above) and compiles it with , into whatever is currently being compiled. Then, when it returns, having added 2 to the return address means that we don't actually execute ?branch or whatever; we've skipped over it.

So it doesn't change the state of the interpreter at all!

I think you're asking if you can use things like ?branch usefully without writing any immediate words. In some sense I think the answer is yes in F83 but no in standard Forth. I think you can put a code sequence like ?branch [ here 0 , ] into a colon definition to do what if does, and then later on say [ here swap ! ] to do what then does. I just typed this definition into F83, and it seems to work†:

    : is3 3 = ?branch [ here 0 , ] ." yes" [ here swap ! ] ;
You could sort of think of if and then as being macros for ?branch [ here 0 , ] and [ here swap ! ] respectively (although I'm omitting the checks they use for proper control structure nesting).

On the other hand, this is only possible because [ is an immediate word, and because ?branch is exposed, and happens to take an absolute address in the next word in the colon definition (as opposed to a byte delta or something). As it happens, exactly the same definition of is3 appears to work in GForth 0.7.3 and PFE 0.33.71, but it definitely will not work on, for example, any native-code-compiling Forth.

The standard way to invoke things like ?branch is using if, while, and so on. And you don't have to define any immediate words to do that, either.

______

† By "work" I mean it seems to behave the same as

    : is3 3 = if ." yes" then ;
bxparks 3 days ago | parent | prev [-]

Wow, that's going to take some time and effort to digest, but thank you.

Yes, I think control-flow is easier to understand in assembly language than the implementation you showed in Forth. :-)

kragen 3 days ago | parent [-]

Happy to help!

I think you're mistaken about assembly language.

In assembly language, the thing that plays the role of these definitions like if and then and ?<resolve is the assembler's symbol table and relocation logic, which goes back and changes your jump instructions (etc.) to jump to the places where it finds that your labels have been defined to point. Typically this involves things like hash functions, hash table collision resolution, various operand encodings for things like short jumps and long jumps, and so on.

Although you can write an assembler that does all this in an afternoon, I don't think you will ever find an assembler whose implementation of all this functionality is easier to understand than the above 30 lines of code. It might be easier to understand per line of code but there will be a lot more lines of code to understand, like 10× or 100×.