kragen 3 days ago

I think mostly learning Forth is like learning any other programming language (or, better said, programming environment): you learn by doing it. Books can be a useful complement to practice, but practice is how you learn to do things. You can't learn to do things by reading.

As for string handling, in my limited experience, string handling in Forth is a lot like string handling in C; you have to allocate buffers and copy characters between them. memcpy is called move, and memset is called fill. You can use the pad if you want, but you can just as well create inbuf 128 allot and use inbuf. There are two big differences:

1. Forth doesn't have NUL-terminated strings like C does, because it's just as easy to return a pointer and a length from a subroutine as it would be to return just a pointer. This is generally a big win, preventing a lot of subtle and dangerous bugs. (Forth is generally more error-prone than C, but this is an exception.)

2. Forth unfortunately does have something called a "counted string", where the string length is stored in the byte before the string data. You can create them with C" (https://forth-standard.org/standard/core/Cq), and Forth beginners often wonder whether to use counted strings. The answer is no: you should never use counted strings, and they should not have been included in the standard. Use normal strings, created with S" (https://forth-standard.org/standard/core/Sq), unless you are calling word or find. https://forth-standard.org/standard/rationale#rat:cstring goes into some of the history of this.
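As a rough sketch of the buffer handling described above (not taken from any particular program): the names `inbuf-size`, `inbuf-len`, `inbuf-clear`, `inbuf!`, `inbuf@`, and `counted-demo` are made up for illustration, and only `create`, `allot`, `fill`, `move`, `count`, `S"`, and `C"` come from the standard.

```forth
\ Sketch: a fixed buffer holding a string as an address/length pair.
128 constant inbuf-size            \ hypothetical capacity
create inbuf inbuf-size allot      \ reserve inbuf-size bytes in the dictionary
variable inbuf-len                 \ length of the text currently in inbuf

: inbuf-clear ( -- )               \ roughly memset(inbuf, ' ', size)
  inbuf inbuf-size bl fill  0 inbuf-len ! ;

: inbuf! ( c-addr u -- )           \ store a string, truncated to the capacity
  inbuf-size min  dup inbuf-len !  \ clamp the length and remember it
  inbuf swap move ;                \ MOVE takes ( src dst u )

: inbuf@ ( -- c-addr u )           \ hand back the usual address/length pair
  inbuf inbuf-len @ ;

\ If some word hands you a counted string, COUNT converts it to c-addr u.
: counted-demo ( -- )  c" counted" count type cr ;

inbuf-clear
s" Hello, Forth" inbuf!
inbuf@ type cr                     \ prints Hello, Forth
```

Keeping the length in a separate variable rather than in the buffer itself is just one possible design; the point is that every word passes and returns `c-addr u` and never searches for a terminator.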

If you want to allocate strings on the heap, which is often the simplest way to handle strings, malloc is called allocate, realloc is called resize, and free is called free: https://forth-standard.org/standard/memory
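A minimal sketch of heap strings built on those three words follows. `heap-copy` and `heap-append` are hypothetical helpers, not standard words; `allocate`, `resize`, and `free` each return an ior, which `throw` turns into an error if it is nonzero.

```forth
\ Sketch: heap-allocated strings with ALLOCATE, RESIZE and FREE.
: heap-copy ( c-addr u -- heap-addr u )   \ roughly strdup, with a length
  dup allocate throw           \ ( c-addr u heap-addr )
  swap 2dup 2>r                \ ( c-addr heap-addr u )  R: heap-addr u
  move  2r> ;                  \ copy the bytes, return the new pair

: heap-append ( heap-addr1 u1 c-addr2 u2 -- heap-addr2 u3 )
  2>r                          \ ( h1 u1 )         R: c2 u2
  swap over r@ + resize throw  \ ( u1 h2 )         grow to u1+u2 bytes
  swap 2dup +                  \ ( h2 u1 dst )     dst = first free byte
  2r@ rot swap move            \ ( h2 u1 )         copy c2 u2 to dst
  2r> nip + ;                  \ ( h2 u1+u2 )

s" Hello" heap-copy  s" , heap" heap-append
2dup type cr                   \ prints Hello, heap
drop free throw                \ release the block when done
```

As with `realloc`, the address returned by `resize` may differ from the one passed in, so the old address should not be used afterwards.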

With respect to multicore and persistent data structures (I assume you mean FP-persistent, as in, an old pointer to a data structure is a pointer to the old version of the data structure), stacks aren't really related to them. Each Forth thread has its own operand stack and its own return stack (and sometimes its own dictionary), so they don't really create interactions between different cores.

zelphirkalt 3 days ago

I think there is another problem for me: the last time I did any manual memory management à la C before using Forth was more than 10 years ago. And immediately the next question would pop up in my head: "What if that line is longer than 128 bytes? Is there no general function to read a whole line?" I guess I would then reinvent the whole machinery for reading a whole line, determining at which byte the newline appears. And then I would have doubts like: "Uh, but what if someone puts some Unicode characters in there?" All I actually wanted was to read a single file so I could get to work on an AoC puzzle.

So I think I lacked the manual memory management basics as well at that point, and any haphazardly implemented hack like "assume the longest line is at most 128 ASCII characters long" would not have made me happy with my code.

kragen 3 days ago

Well, to bake an apple pie from scratch, you must first create the universe.

In any programming language, to read an arbitrarily long line into memory, you need an arbitrarily large computer, so your software may need to pause to convert more Temu orders, continents, asteroids, or star systems into computronium. If you're not willing to go that far, you have basically two choices:

1. Process the line in a streaming fashion rather than holding all of it in memory at once.

2. Only handle lines up to some maximum length.

If you select option 2, the only remaining questions are:

2a. What is that maximum length?

2b. What happens if you hit it?

Maybe 128 bytes is not a limit you're happy with, but it's just as easy to use 1048576 or 1234567890. Your code may be easier to understand and easier to get right if you use a dynamically-allocated string type (I suggest studying stralloc from qmail 1.03), but don't fool yourself into thinking that that means there's no limit on input line length. Dismayingly often, the answer to 2b in that case is "Linux starts thrashing and becomes unusably slow until you reboot it."

(If your input is UTF-8, the line-reading function doesn't have to worry about whether the bytes represent Unicode characters or not, because byte 0x0a will never occur inside a non-ASCII character.)
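For option 2, here is one way it could look with the standard file-access words. This is a sketch: the limit `max-line`, the buffer `line-buf`, the words `next-line` and `count-lines`, and the file name `input.txt` are all invented for illustration. It answers 2a with 256 and 2b with "abort with a message".

```forth
\ Sketch: bounded line reading with READ-LINE.
256 constant max-line                    \ 2a: hypothetical limit
create line-buf max-line 2 + allot       \ READ-LINE wants 2 spare chars

: next-line ( fileid -- c-addr u flag )  \ flag is false at end of file
  >r line-buf max-line r> read-line throw  \ ( u2 flag )
  \ A line that fills the whole buffer may have been cut short,
  \ so this sketch treats it as too long (2b: give up loudly).
  over max-line = abort" line too long"
  line-buf rot rot ;                     \ ( c-addr u2 flag )

: count-lines ( c-addr u -- n )          \ count the lines in the named file
  r/o open-file throw  >r  0             \ ( n )  R: fileid
  begin  r@ next-line  while             \ ( n c-addr u )
    2drop 1+                             \ a real program would use the line here
  repeat  2drop                          \ drop the pair returned at end of file
  r> close-file throw ;

\ Usage, assuming a file called input.txt exists:
\   s" input.txt" count-lines .
```

Because `read-line` only looks for the line terminator, UTF-8 text passes through as plain bytes, and since overlong lines abort instead of being split, no multibyte character gets cut in half. The sketch does not close the file if it aborts.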

zelphirkalt 3 days ago

The point is, I don't want to spend lots of time solving these essential problems when I actually want to learn the language through solving puzzles. It seems that Forth does not lend itself to being learned that way, since even very basic things are not provided and require in-depth knowledge of Forth and writing manually memory-managed solutions to problems that almost every other programming language solves in its standard library. If I used Python it would literally be 2 lines of code, and with file.readlines() or so, I don't have to think about how long a line can be and then develop ad-hoc brittle half-solutions.

Perhaps readlines() has a limit somewhere too, though. I'm just not aware of it and so far have not needed to deal with that kind of thing. But then again, Forth and Python are two very different languages and in many cases operate at different levels of abstraction, so maybe that comparison is not fair.

kragen 2 days ago

Forth was sort of designed by and for people who did want to solve these essential problems anew for each application. Chuck Moore claimed many times that a tailored ("ad hoc") solution that solves only the part of the problem you need to solve for a particular application would be 10× smaller and simpler than a generalized solution that has to balance the needs of all possible applications. He considered it preferable to not have a lot of library code in your application to solve problems you don't actually have. Maybe your ad-hoc solution is brittle, but it's brittle precisely in ways you know about, not in ways you don't.

But you don't have to use Forth that way just because Chuck did. You can totally use a generalized string library in Forth. I don't know which one to recommend, but http://turboforth.net/resources/string_library.html seems to be one possibility.

You can be sure that Python's file.readlines()† will have trouble if you try to read a line that is much longer than your RAM size.

You can get pretty far with just built-in standard functionality, though:

    Gforth 0.7.3, Copyright (C) 1995-2008 Free Software Foundation, Inc.
    Gforth comes with ABSOLUTELY NO WARRANTY; for details type `license'
    Type `bye' to exit
    128 constant len  create buf len allot  ok
    : greet ." Name? "  buf len accept  ." Hello, " buf swap type ." !" ;  ok
    greet Name? Zelphir Hello, Zelphir! ok

And, as you said, Gforth comes with a heap-allocated string library https://gforth.org/manual/String-words.html#String-words which you can use if you first say

    include string.fs
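A short sketch of what using it might look like, based on the words listed on that manual page (`$!`, `$+!`, `$@`). The variable name `line$` is made up, and I'm assuming `string.fs` is on Gforth's search path; on newer Gforth versions these words may already be built in, in which case the include may be unnecessary.

```forth
include string.fs          \ assumed to be found on Gforth's search path

variable line$             \ hypothetical name; holds a pointer to the heap string
0 line$ !                  \ start out with no string allocated

s" Hello, " line$ $!       \ store a heap-allocated copy of the string
s" Zelphir!" line$ $+!     \ append; the library grows the allocation
line$ $@ type cr           \ fetch address/length and print: Hello, Zelphir!
```

The string is stored with `$!` before anything is appended, so `$+!` never has to deal with an empty variable. The sketch never frees the string; the manual page lists the word for that.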
______

† ever since Python 2.2, when file objects became iterable, I'd recommend using list(file) instead of file.readlines(), or just iterating over the file directly, like [line.strip() for line in file if line.startswith('zel')]