zelphirkalt 3 days ago

Reading lines from a file and handling the strings in memory is what made me stop using it after the 3rd day of Advent of Code one year. I simply couldn't find a good solution without a massive excursion into how to use the pad. Something as supposedly simple as reading a complete line from a file stopped me completely. Of course I could have "cheated" and put the input right into the program, but I wanted to learn Forth, so I thought I should be able to do this ...

Later I read that GForth 1.0 has more string-handling words, but by then I had already lost hope of finding an easy solution. Don't get me wrong: the little Forth I did learn was quite interesting, and I would have liked to progress further. I think I also lost hope because I couldn't see how this stack system would ever handle multi-core and persistent data structures, things I have come to use in other niche languages. Another reason is that some projects and libraries are one-person shows (bus factor 1) whose maintainers have stopped developing them. They are basically stale and were made by people with significantly more understanding than any beginner will have for a long time.

I guess that to really learn it, one has to read one of the often-recommended books and be very patient until reaching the parts that teach simple things like reading a file line by line.

alexisread 3 days ago

You should be able to dive in quickly using the very nice forthkit, which finishes with a working shell / REPL:

https://github.com/tehologist/forthkit

It is an implementation of eForth, a portable Forth:

http://www.exemark.com/FORTH/eForthOverviewv5.pdf

kragen 3 days ago

I think learning Forth is mostly like learning any other programming language (or, better said, programming environment): you learn by doing it. Books can be a useful complement to practice, but practice is how you learn to do things. You can't learn to do things by reading.

As for string handling, in my limited experience, string handling in Forth is a lot like string handling in C; you have to allocate buffers and copy characters between them. memcpy is called move, and memset is called fill. You can use the pad if you want, but you can just as well create inbuf 128 allot and use inbuf. There are two big differences:

1. Forth doesn't have NUL-terminated strings like C does, because it's just as easy to return a pointer and a length from a subroutine as it would be to return just a pointer. This is generally a big win, preventing a lot of subtle and dangerous bugs. (Forth is generally more error-prone than C, but this is an exception.)

2. Forth unfortunately does have something called a "counted string", where the string length is stored in the byte before the string data. You can create them with C" (https://forth-standard.org/standard/core/Cq), and Forth beginners often wonder whether to use counted strings. The answer is no: you should never use counted strings, and they should not have been included in the standard. Use normal strings, created with S" (https://forth-standard.org/standard/core/Sq), unless you are calling word or find. https://forth-standard.org/standard/rationale#rat:cstring goes into some of the history of this.
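A minimal sketch, in the thread's Gforth, of the points above: a fixed buffer blanked with FILL and written with MOVE, strings passed around as ( address length ) pairs and trimmed with the standard /STRING and -TRAILING words instead of by writing NUL terminators, and a counted string turned back into a normal pair with COUNT (which is also what you would do right after calling WORD). The names /inbuf, >inbuf, skip-blanks, trim-blanks, and counted are hypothetical, chosen for illustration.

```forth
\ A fixed buffer, as suggested above (128 chars, like char inbuf[128]).
128 constant /inbuf
create inbuf  /inbuf allot
inbuf /inbuf bl fill                  \ fill = memset: blank the buffer

\ Copy a string into inbuf (move = memcpy, overlap-safe like memmove),
\ truncating it to fit, and return the copy as an ( address length ) pair.
: >inbuf ( c-addr u -- c-addr2 u2 )
  /inbuf min                          \ never copy more than fits
  tuck inbuf swap move                ( u2 )
  inbuf swap ;

\ Substrings are arithmetic on the pair: /STRING advances the address
\ and shrinks the length; nothing is copied and no terminator is written.
: skip-blanks ( c-addr u -- c-addr' u' )
  begin  dup 0>  while
    over c@ bl <> if exit then        \ first non-blank found
    1 /string                         \ advance one character
  repeat ;
\ -TRAILING trims the other end by shrinking the length.
: trim-blanks ( c-addr u -- c-addr' u' )  skip-blanks -trailing ;

s"   Zelphir   " trim-blanks >inbuf type   \ prints: Zelphir

\ A counted string keeps its length in its first character. C" is used
\ inside a definition because the standard only defines its compiled
\ behavior. COUNT converts it to a normal ( address length ) pair.
: counted ( -- c-addr )  c" hello" ;
counted c@ .                          \ prints 5
counted count type                    \ prints hello
```

Because >inbuf truncates with MIN, an over-long string is silently cut at 128 characters; that is one possible policy for a fixed buffer, not the only one. Note also that a counted string's length must fit in one character, so it typically tops out at 255 characters.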

If you want to allocate strings on the heap, which is often the simplest way to handle strings, malloc is called allocate, realloc is called resize, and free is called free: https://forth-standard.org/standard/memory
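A minimal sketch of heap strings built only from those standard words; heap-copy and heap-append are hypothetical names. Each memory word returns an ior that is zero on success, and THROW turns a nonzero one into an exception, so an allocation failure is not silently ignored.

```forth
\ Copy a string into freshly ALLOCATEd memory (like malloc + memcpy).
: heap-copy ( c-addr u -- addr u )
  dup allocate throw            ( c-addr u addr )
  swap 2dup 2>r move 2r> ;

\ Append a string to a heap string, growing it with RESIZE (like realloc).
\ RESIZE may move the buffer, so always use the returned address, and
\ don't append a string that lives inside the buffer being resized.
: heap-append ( addr1 u1 c-addr2 u2 -- addr3 u3 )
  2>r                           ( addr1 u1 )   \ park the suffix
  dup r@ +                      ( addr1 u1 u3 )
  rot swap resize throw         ( u1 addr3 )
  swap 2dup +                   ( addr3 u1 dest )
  2r@ rot swap move             ( addr3 u1 )
  2r> nip + ;                   ( addr3 u3 )

s" Hello, " heap-copy  s" world" heap-append
2dup type                       \ prints: Hello, world
drop free throw                 \ like free(), with the ior checked too
```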

With respect to multicore and persistent data structures (I assume you mean FP-persistent, as in, an old pointer to a data structure is a pointer to the old version of the data structure), stacks aren't really related to them. Each Forth thread has its own operand stack and its own return stack (and sometimes its own dictionary), so they don't really create interactions between different cores.

zelphirkalt 3 days ago

I think there is another problem for me: the last time I had done any manual memory management à la C before using Forth was more than 10 years earlier. Immediately the next question would pop up in my head: "What if that line is longer than 128 bytes? Is there no general function to read a whole line?" And I guess then I would reinvent the whole machinery to read a whole line, determining at which byte the newline appears. And then I would have doubts like: "Uh, but what if someone puts some Unicode characters in there?" All the while, all I wanted was to read a single file so I could get working on an AoC puzzle.

So I think I lacked the manual memory management basics as well at that point, and any haphazardly implemented hack like "assume the longest line is at most 128 ASCII characters long" would not have made me happy with my code.

kragen 3 days ago

Well, to bake an apple pie from scratch, you must first create the universe.

In any programming language, to read an arbitrarily long line into memory, you need an arbitrarily large computer, so your software may need to pause to convert more Temu orders, continents, asteroids, or star systems into computronium. If you're not willing to go that far, you have basically two choices:

1. Process the line in a streaming fashion rather than holding all of it in memory at once.

2. Only handle lines up to some maximum length.

If you select option 2, the only remaining questions are:

2a. What is that maximum length?

2b. What happens if you hit it?
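In standard Forth, option 2 is what READ-LINE gives you. A minimal sketch, where each-line and handle-line are hypothetical names and handle-line stands in for the puzzle logic: 2a is answered with a constant, and 2b by closing the file and refusing the line. Per the standard, the buffer needs room for 2 characters beyond the maximum, and when READ-LINE returns exactly the maximum count, the line terminator has not been reached yet; without the check, a long line would simply arrive in max-line-sized pieces.

```forth
128 constant max-line                   \ 2a: the maximum line length
create line-buf  max-line 2 + allot     \ READ-LINE needs 2 spare chars

: handle-line ( c-addr u -- )  type cr ;   \ hypothetical: puzzle logic goes here

\ Call handle-line on every line of the named file.
: each-line ( c-addr u -- )
  r/o open-file throw >r
  begin  line-buf max-line r@ read-line throw  while   ( u )
    dup max-line = if                   \ 2b: terminator not reached yet
      r> close-file drop  true abort" line too long"
    then
    line-buf swap handle-line
  repeat
  drop  r> close-file throw ;

\ Usage (hypothetical file name):  s" input.txt" each-line
```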

Maybe 128 bytes is not a limit you're happy with, but it's just as easy to use 1048576 or 1234567890. Your code may be easier to understand and easier to get right if you use a dynamically-allocated string type (I suggest studying stralloc from qmail 1.03), but don't fool yourself into thinking that that means there's no limit on input line length. Dismayingly often, the answer to 2b in that case is "Linux starts thrashing and becomes unusably slow until you reboot it."
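A minimal sketch of the dynamically-allocated approach in standard Forth, in the spirit of a growable string (not a port of qmail's stralloc): one heap buffer that doubles whenever a line doesn't fit, with an explicit cap as the answer to 2b. The names max-line-len, lbuf, lcap, ensure, read-long-line, and count-lines, and the 1 MiB cap, are assumptions for illustration. The returned line lives in the shared buffer, so it is only valid until the next call.

```forth
1048576 constant max-line-len       \ hypothetical cap: fail instead of thrashing

variable lbuf                       \ heap buffer shared by all calls
variable lcap                       \ its capacity, excluding 2 spare chars
128 dup 2 + allocate throw lbuf !  lcap !

\ Make sure lbuf can hold u chars (plus READ-LINE's 2 spare), doubling it.
: ensure ( u -- )
  dup lcap @ > if
    dup lcap @ 2* max max-line-len min   ( u newcap )
    tuck > abort" line too long"         ( newcap )
    lbuf @ over 2 + resize throw lbuf !
    lcap !
  else drop then ;

\ Read one line of any length up to the cap. Returns the line and true,
\ or an empty string and false at end of file.
: read-long-line ( fileid -- c-addr u flag )
  >r 0                                  ( len )
  begin
    dup 1+ ensure                       \ room for at least one more char
    lbuf @ over +  lcap @ 2 pick -      ( len dest room )
    tuck r@ read-line throw             ( len room u2 more? )
    0= if                               \ end of file
      2drop  lbuf @ swap  dup 0<>  r> drop exit
    then
    tuck > >r + r>                      ( len' line-complete? )
  until
  lbuf @ swap true  r> drop ;

\ Usage: count the lines of a file, e.g.  s" input.txt" count-lines .
: count-lines ( c-addr u -- n )
  r/o open-file throw  0 swap           ( n fileid )
  begin  dup read-long-line  while  2drop swap 1+ swap  repeat
  2drop  close-file throw ;
```

Doubling keeps the total copying linear in the line length, and the cap makes the failure mode an error message rather than swapping.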

(If your input is UTF-8, the line-reading function doesn't have to worry about whether the bytes represent Unicode characters or not, because byte 0x0a will never occur inside a non-ASCII character.)
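Option 1 and the UTF-8 point can be sketched together: stream the file through a fixed 4 KiB buffer with READ-FILE and count byte 10, never holding a whole line in memory. This is safe for UTF-8 because every byte of a multi-byte UTF-8 sequence has its high bit set, so none of them can equal 10. The names count-lf, /chunk, chunk, and file-lf-count are hypothetical, and BOUNDS and -ROT are Gforth words rather than standard ones. A final line without a trailing newline is not counted.

```forth
\ Count LF bytes (value 10) in one chunk of memory.
: count-lf ( c-addr u -- n )
  0 -rot bounds ?do
    i c@ 10 = if 1+ then
  loop ;

4096 constant /chunk
create chunk /chunk allot

\ Stream the named file through chunk, adding up the LFs.
: file-lf-count ( c-addr u -- n )
  r/o open-file throw >r  0
  begin  chunk /chunk r@ read-file throw  dup  while
    chunk swap count-lf +
  repeat
  drop  r> close-file throw ;

\ Usage (hypothetical file name):  s" input.txt" file-lf-count .
```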

zelphirkalt 3 days ago

The point is, I don't want to spend lots of time solving these essential problems when I actually want to learn the language by solving puzzles. It seems that Forth does not lend itself to being learned that way, since even very basic things are not provided: they require in-depth knowledge of Forth and hand-written, manually memory-managed solutions to problems that almost every other programming language solves in its standard library. In Python it would literally be 2 lines of code, and with file.readlines() or similar, I don't have to think about how long a line can be and then develop ad-hoc, brittle half-solutions.

Perhaps readlines() has a limit somewhere too, though; I'm just not aware of it and so far haven't needed to deal with that kind of thing. But then again, Forth and Python are two very different languages and often work at different levels of abstraction, so maybe that comparison is not fair.

kragen 3 days ago

Forth was sort of designed by and for people who did want to solve these essential problems anew for each application. Chuck Moore claimed many times that a tailored ("ad hoc") solution that solves only the part of the problem you need to solve for a particular application would be 10× smaller and simpler than a generalized solution that has to balance the needs of all possible applications. He considered it preferable to not have a lot of library code in your application to solve problems you don't actually have. Maybe your ad-hoc solution is brittle, but it's brittle precisely in ways you know about, not in ways you don't.

But you don't have to use Forth that way just because Chuck did. You can totally use a generalized string library in Forth. I don't know which one to recommend, but http://turboforth.net/resources/string_library.html seems to be one possibility.

You can be sure that Python's file.readlines()† will have trouble if you try to read a line that is much longer than your RAM size.

You can get pretty far with just built-in standard functionality, though:

    Gforth 0.7.3, Copyright (C) 1995-2008 Free Software Foundation, Inc.
    Gforth comes with ABSOLUTELY NO WARRANTY; for details type `license'
    Type `bye' to exit
    128 constant len  create buf len allot  ok
    : greet ." Name? "  buf len accept  ." Hello, " buf swap type ." !" ;  ok
    greet Name? Zelphir Hello, Zelphir! ok

And, as you said, GForth comes with a heap-allocated string library https://gforth.org/manual/String-words.html#String-words which you can use if you first say

    include string.fs
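A minimal sketch of those Gforth string words in use. A string variable holds a pointer to a heap buffer that $! allocates and $+! grows, so it should start out as 0. The variable name greeting is hypothetical; REQUIRE is used instead of INCLUDE so the file is loaded at most once (newer Gforth releases may already have these words loaded).

```forth
require string.fs                   \ Gforth 0.7.x: load the $-string words

variable greeting  0 greeting !     \ 0 = no buffer allocated yet
s" Hello, " greeting $!             \ $!   stores a heap copy
s" Zelphir" greeting $+!            \ $+!  appends, resizing as needed
greeting $@ type                    \ $@   fetches ( addr u ): prints Hello, Zelphir
greeting $off                       \ $off frees the buffer
```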
______

† ever since Python 2.2, I'd recommend using list(file) instead of file.readlines(), or just iterate over the file directly, like [line.strip() for line in file if line.startswith('zel')]

drivers99 2 days ago

One year (2022), on an early problem (day 2), I saw that I could define a handful of words in Forth and then execute the (modified) input file itself as code to get the answer. It was rock-paper-scissors, so there were only 9 possible combinations, although I did have to alter the input first by removing the spaces, e.g. changing "A X" to "AX". I defined words matching the 9 inputs and had each one do whatever the problem said to do. https://adventofcode.com/2022/day/2
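A minimal sketch of that idea (not drivers99's actual code), assuming part 1's scoring: 1, 2, or 3 points for playing rock, paper, or scissors (X, Y, Z), plus 0, 3, or 6 for losing, drawing, or winning against the opponent's A, B, or C. Each of the nine two-letter words adds its round's score to a hypothetical total variable, so the space-stripped input becomes a program:

```forth
variable total  0 total !
: score ( n -- )  total +! ;

\ Opponent: A=rock B=paper C=scissors. Me: X=rock Y=paper Z=scissors.
: AX 4 score ;  : AY 8 score ;  : AZ 3 score ;
: BX 1 score ;  : BY 5 score ;  : BZ 9 score ;
: CX 7 score ;  : CY 2 score ;  : CZ 6 score ;

\ The puzzle's sample input with the spaces removed, run as code:
AY
BX
CZ
total @ .          \ prints 15
```

For the real input, you would INCLUDE the modified file (for example a hypothetical input-nospaces.txt) instead of typing the rounds.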

kragen 2 days ago

That's a great idea!