MarsIronPI 5 days ago

I find it surprising that a single-pass compiler is easier to implement than a traditional lexer->parser->AST->emitter. (I'm not a compiler expert, though.) I'd have expected that generating an AST would be at least as simple, if not simpler. Plus by generating an AST, doing some simple optimization is a lot easier: one can pattern-match parts of the AST and replace them with more efficient equivalents. Maybe I'm overthinking this, though. I tend to like extensible program designs, even when they don't necessarily make sense for the scale of the program…

Still a really cool article and an impressive project, though. I especially like the StringPool technique; I'll have to keep it in mind if I ever write a compiler!

marssaxman 5 days ago | parent | next [-]

A single-pass compiler is easier to implement in part because you're not going to do any of that optimization. You're writing a single-pass compiler either because you're banging out a quick sketch of an idea, and you don't care about production use, or because you've time-traveled back to the '70s or the '80s, where processors were so painfully slow and memory so eye-wateringly expensive that you might not even be able to read the entire source file into RAM at once, much less convert it all into some intermediate representation before starting to write out the machine code.

dlcarrier 4 days ago | parent [-]

I used to write software on my HP calculator, when I was bored in classes, before smartphones existed. It used a tokenized language called RPL, and the editor would read and parse just enough tokens to fill up the screen.

I miss how fast it was, compared to modern computers. VS Code is much laggier in comparison.

ConceptJunkie 4 days ago | parent | next [-]

I used to dream about owning a programmable calculator in the late '70s/early '80s, but by 1980 we had Apple ][s in my high school, and in 1982 I was able to use the PCs in college, first programming in BASIC and later Turbo Pascal, which was the best PC development tool of the 1980s. Most of my classes were in Pascal. My first job was in C++; I was doing C++ by 1993 and still am.

Now I write Python for fun.

dlcarrier 4 days ago | parent [-]

Have you tried Lazarus (https://en.wikipedia.org/wiki/Lazarus_%28software%29, https://www.lazarus-ide.org/)? When doing something for the fun of it, I would much rather work with Pascal than Python. Both languages were developed to be quick and dirty, but I feel like Pascal is more on the quick side, and Python is more on the dirty side.

fuzztester 4 days ago | parent | prev [-]

Which model of HP calculator?

And what kinds of programs could you write with it?

dlcarrier 4 days ago | parent [-]

I had a 48S then a 49g+. I played around with a few games and a Magic Eye style (https://en.wikipedia.org/wiki/Random_dot_stereogram) 3D image generator, as well as programs to do actual math in my math classes. The math teachers at my high school had a policy that, on exams, students could run any program they wrote, so I wrote programs to do the math and "show my work". Solving equations on paper was half of the grade, and getting the correct answer was the other half, but my handwriting is really bad, so I'd always get the wrong answer if I did it by hand.

HP's calculators are pretty capable, with built-in commands for image processing, so RPL programs could rival the capabilities of assembly language programs on TI-8x calculators.

fuzztester 3 days ago | parent [-]

Thanks for the reply.

I had asked because I was thinking of getting some small device like a PDA or programmable calculator, to use to write small programs for fun and possibly actual use, when I'm not with my laptop.

I am not sure if any PDAs or such calculators are still made these days. I have googled some, but so far haven't found any, IIRC. So I may have to buy a used calculator or PDA, if I can find a model I like.

I used to have 2 different Palm PDA models earlier, first a V and then a Zire, but I never did any programming with them, although I do remember trying to use Pippy, a Python 2.x version ported to the Palm. But it used to crash all the time, so I could not really do any programming on it, unfortunately.

dlcarrier 3 days ago | parent [-]

HP's graphing calculators are the best, but they can be very expensive.

Every major calculator manufacturer makes modern back-lit full-color graphing calculators, like HP's Prime, TI's Nspire, and Casio's ClassWiz, but they don't have the charm of reflective LCDs.

The best bang for your buck is probably a Casio FX 7400, 9750, or 9860 GIII series, which have Python interpreters. If you want something more fun, albeit only programmable in BASIC, Casio's CFX series has an oddball color display that doesn't use subpixels. (See this video: https://www.youtube.com/watch?v=jLew3Dd3IBA)

fuzztester 2 days ago | parent | prev [-]

Thanks again. I will check out those options and links.

kragen 5 days ago | parent | prev | next [-]

I think this might depend on the language you're writing in.

Historically, at least, it's pretty verbose to define a data type in Python compared to languages that are more designed for writing compilers. Consider these definitions from my prototype Bicicleta interpreter, which is written in ML, specifically OCaml:

    type methods = NoDefs
                               (* name, body, is_positional ... *)
                   | Definition of string * bicexpr * bool * methods
    and bicexpr = Name of string
                  | Call of bicexpr * string
                  | Literal of string option * methods
                  | Derivation of bicexpr * string option * methods
                  | StringConstant of string
                  | Integer of int
                  | Float of float
                  | NativeMethod of (lookup -> bicobj)

Those ten lines of code would be ten classes in Python with an average of 1.6 attributes each. Using dataclasses or attrs, that would be 36 lines of code, and then (if you're doing it the OO way) every function that I defined on one of these OCaml types becomes a method implemented in each class implementing a particular protocol, with a copy of its argument signature in every class. (If you used namedtuple instead, it's no less code, but you write it in fewer lines.) So, for example, this function on bicexprs

    let rec freevars = function
        Name n -> stringset [n]
      | Integer _ | StringConstant _ | Float _ -> stringset ["prog"]
      | NativeMethod _ -> stringset []
      | Literal (Some selfname, methods) -> 
            StringSet.diff (freevars_methods methods) (stringset [selfname])
      | Literal (None, methods) -> freevars_methods methods
      | Derivation(object_, self, methods) ->
            StringSet.union (freevars object_) (freevars (Literal(self, methods)))
      | Call(object_, _) -> freevars object_

becomes six to eight method definitions in the different classes. (You can cut it down to six if you define an abstract base class for the constant classes.) And Literal.freevars needs an if-then-else. So that's another 20 lines of code.

Python does support pattern-matching now, so functions like this might not be any more verbose than the ML version if you program them the same way instead of in the OO fashion. I haven't tried using Python pattern-matching, so I don't really know.

In general, though, Python is more verbose than ML-family languages for this kind of thing by a factor of about 2–4, and that's before you count the test code you need in Python to get the kind of confidence in correctness that ML's type-checking gives you with no extra code. To my knowledge, Mypy doesn't do the kinds of pattern-matching-exhaustiveness checks that ML compilers do.

I've sometimes "cheated" by trying to write code like this in Python using regular tuples rather than named tuples. You can definitely make it work, but it's a real pain to debug.

Quoting Andy Chu from https://andychu.net/projects/:

> Python is not the right language for [implementing] languages. I will use OCaml for subsequent projects like this.

Python does have GC and dynamic dispatch, though, and those count for a lot.

kragen a day ago | parent | next [-]

Eli Bendersky's blogpost on the Expression Problem: https://news.ycombinator.com/item?id=45155877

sevensor a day ago | parent [-]

I thought this was a good read, and it’s clearly a problem in an OO language, but I struggled to see how it’s an actual problem in a functional language. Maybe I need to go read that paper he referenced by the Racket guys.

kragen a day ago | parent | next [-]

In my example above, to add a new variant type to bicexpr, you need to edit roughly every function that pattern-matches on bicexprs, which is a good fraction of the Bicicleta interpreter codebase.

sevensor a day ago | parent [-]

Ah, I see. I’ve written that code where you have to add one case to a whole bunch of pattern matches. It’s not terrible; I rather like that everything that does the same thing is in the same place. But I have wondered if there was a better way. Clearly I need to go read the article again.

kragen 19 hours ago | parent [-]

Yeah, I share your assessment that it's not unmanageable.

But it does add friction. I suspect that there are only two real solutions:

- Program source code that isn't a linear document but instead can be viewed in multiple different ways: by operation or by operand type, for example. Then you would use the view that best suits the modification you were doing at the time. Smalltalk, for example, does support browsing either all the implementors of a method selector or all the methods of a class. (But in both cases you normally only see one method at a time.)

- Alternating levels of hierarchy. Maybe if you have 63 operations on 63 types, you could arrange it so that you can add a 64th operation or a 64th type by editing only 8 places instead of 63.

sevensor 19 hours ago | parent [-]

> Smalltalk, for example, does support browsing either all the implementors of a method selector or all the methods of a class

That’s pretty neat. I tried using the object browser a million years ago and it confused me, but I didn’t know anything about anything at the time. Maybe Smalltalk deserves another look.

igouy 2 hours ago | parent [-]

Smalltalk IDEs are usually multi-window, so it isn't either/or, it's by method selector and by class instance methods and by …

naasking a day ago | parent | prev [-]

The example in the article was very clear, I thought: try to extend an interpreter with a new primitive (a function call, in the article) without changing the original interpreter's source code. In an FP language, you can't extend the core AST without changing the source code for eval.

vidarh 5 days ago | parent | prev | next [-]

GC doesn't matter much for a simple compiler, as you either don't need to allocate much (single pass, Wirth-style compilers that generate code inline) or most of what you allocate becomes garbage all at once at the end (AST).

In my half-finished Ruby compiler prototype, even before I added type tagging (and so allocated every integer on the heap), I just didn't add a GC for a long time, because it was fine to just leak: the compiler isn't generally long-running.

kragen 5 days ago | parent [-]

Yeah, I never got around to writing a GC for Ur-Scheme either—you could argue that it's also "half-finished" but it does successfully compile itself correctly. It's not so much the GC that matters as it is the ability to allocate on the heap without much ceremony. I should have thought of that before I posted, and I appreciate the correction.

MarsIronPI 5 days ago | parent | prev [-]

My impression is that in general the traditional approach of methods as members of a class is more verbose and less extensible than the ML/Lisp generic function approach. I know I certainly prefer generic functions when I have to design polymorphic interfaces.

kragen 5 days ago | parent [-]

"Generic functions" is the Common Lisp name for writing a separate method for each class, the same as in Python except that you also have to define the generic function itself before you can define the methods. I'm not sure if that's what you meant; the ML approach is quite different.

This is Common Lisp, which I am not an expert in:

    ;;; Stupid CLOS example.

    (defgeneric x (point))                  ; make X a method
    (defgeneric y (point))                  ; make Y a method

    (defclass rect-point ()
      ((x :accessor x :initarg :x)
       (y :accessor y :initarg :y)))

    (defclass polar-point ()
      ((radius :accessor radius :initarg :radius)
       (angle  :accessor angle  :initarg :angle)))

    (defmethod x ((p polar-point))
      (* (radius p) (cos (angle p))))

    (defmethod y ((p polar-point))
      (* (radius p) (sin (angle p))))

    (defgeneric move-by (point Δx Δy))

    (defmethod move-by ((p rect-point) Δx Δy)
      (incf (x p) Δx)
      (incf (y p) Δy))

    (defmethod move-by ((p polar-point) Δx Δy)
      (let ((x (+ (x p) Δx)) (y (+ (y p) Δy)))
        (setf (radius p) (sqrt (+ (* x x) (* y y)))
              (angle p) (atan y x))))

    (defmethod print-object ((p polar-point) stream) ; standard method print-object
      (format stream "@~a<~a" (radius p) (angle p)))

    (defvar p1 (make-instance 'rect-point :x 3 :y 4))
    (defvar p2 (make-instance 'polar-point :radius 1 :angle 1.047))

    ;; prints (3, 4) → (4, 5)
    (format t "(~a, ~a) → " (x p1) (y p1))
    (move-by p1 1 1)
    (format t "(~a, ~a)~%" (x p1) (y p1))

    ;; prints @1<1.047 (0.500171, 0.8659266) → @1.9318848<0.7853087 (1.366171, 1.3659266)
    (format t "~a (~a, ~a) → " p2 (x p2) (y p2))
    (move-by p2 .866 .5)
    (format t "~a (~a, ~a)~%" p2 (x p2) (y p2))

Here's a similar program in OCaml, which I am also not an expert in, using pattern-matching functions instead of methods, and avoiding mutation:

    (* Stupid OCaml example. *)

    type point = Rect of float * float | Polar of float * float

    let x = function
      | Rect(x, y) -> x
      | Polar(r, theta) -> r *. Float.cos theta

    let y = function
      | Rect(x, y) -> y
      | Polar(r, theta) -> r *. Float.sin theta

    let moved_by = fun dx dy ->
      function
      | Rect(x, y) -> Rect(x +. dx, y +. dy)
      | p ->
         let x = dx +. x p and y = dy +. y p in
         Polar(Float.sqrt(x *. x +. y *. y),
               Float.atan2 y x)

    let string_of_point = function
      | Rect(x, y) -> Printf.sprintf "Rect(%f, %f)" x y
      | Polar(r, theta) -> Printf.sprintf "Polar(%f, %f)" r theta

    ;;

    print_endline(string_of_point(moved_by 1. 2. (Rect(3., 4.)))) ;
    print_endline(string_of_point(moved_by 0.866 0.5 (Polar(1., 1.047))))

MarsIronPI 4 days ago | parent [-]

> I'm not sure if that's what you meant; the ML approach is quite different.

There is a difference in approach because in Common Lisp each method is a separate function definition (though macros can alleviate this), but my point is that both CL and ML are more function-oriented, if you will; i.e. "methods" (or whatever you want to call ML pattern-matched functions) aren't defined in a class body and are just ordinary functions.

I think this more function-focused approach is more elegant, but also more extensible and possibly less verbose when dealing with multiple classes that share the same interface.

> the same as in Python except that you also have to define the generic function itself before you can define the methods.

As a side note, though it's not terribly important, the "defgeneric" can be omitted if you don't care to specify a docstring or any special behavior.

kragen 4 days ago | parent [-]

Oh, thanks! I didn't know that about defgeneric.

How would you classify Ruby? You can reopen a class and add more methods to it at any time.

MarsIronPI 2 days ago | parent [-]

Well, reopening the class is the idiomatic way to define a method after the class definition, but if I wanted to write Ruby the ML/Common Lisp way, I would use "Class.define_method" like so:

  String.define_method :yell do
    puts self.upcase
  end

  Numeric.define_method :yell do
    self.to_s.yell
  end

What I like most about Ruby is how close it gets to Lisp's flexibility of semantics (i.e. macros) without actually having macros. (Common Lisp is still my favorite language for larger projects though.)

kragen a day ago | parent [-]

Hmm. I'll have to think about that.

I still feel like ML is very much the odd one out, here, because the individual pattern-matching clauses aren't values and can't be added later except by editing the "generic" function (and usually the definition of its argument type).

comonoid 4 days ago | parent | prev | next [-]

C was designed to be compiled with a single-pass compiler.

Joker_vD 4 days ago | parent | next [-]

It wasn't; the original C compiler had two passes, and built expression trees in the first pass which the second pass would turn into assembly (and the original as on UNIX also had two passes).

whobre 4 days ago | parent | prev [-]

C was not designed at all. It evolved from B, by adding types to it.

fuzztester 4 days ago | parent [-]

Yes.

And B evolved from BCPL:

https://en.m.wikipedia.org/wiki/BCPL

I had read the book about BCPL by Martin Richards, the creator of the language. It was quite interesting. IIRC, after describing the language, he gave some examples of small system software routines or utilities written in it.

https://en.m.wikipedia.org/wiki/Martin_Richards_(computer_sc...

arjvik 5 days ago | parent | prev | next [-]

Not sure if fewer LoC necessarily implies easier!

markus_zhang 5 days ago | parent | prev [-]

I think it depends on the language. I heard Turbo Pascal was pretty fast because of 1) Pascal's language features and 2) the lack of optimization, in TP 1.0 at least.

MarsIronPI 5 days ago | parent [-]

Fast to run or fast to compile?

markus_zhang 5 days ago | parent | next [-]

Fast to compile. It was very fast compared to the others at the time. Since computers were slow (8086/80286), compiling speed actually mattered.

https://www.pcengines.ch/tp3.htm

MarsIronPI 5 days ago | parent [-]

As an ex-Gentoo user I can confirm that compiling speed still matters on certain machines.

markus_zhang 5 days ago | parent [-]

I have never worked as a systems programmer, so I never had the need to compile something big. I guess it could be an issue with very large software like Oracle Database, the Linux kernel, and other similar stuff.

rini17 a day ago | parent [-]

The kernel is quite okay. On the other hand, qtwebengine..

renox 4 days ago | parent | prev [-]

Well, given that the alternative was often BASIC, which was interpreted, it also felt fast to run.