I think this might depend on the language you're writing in.

Historically, at least, it's pretty verbose to define a data type in Python compared to languages that are more designed for writing compilers. Consider these definitions from my prototype Bicicleta interpreter, which is written in ML, specifically OCaml:

    type methods = NoDefs
                               (* name, body, is_positional ... *)
                   | Definition of string * bicexpr * bool * methods
    and bicexpr = Name of string
                  | Call of bicexpr * string
                  | Literal of string option * methods
                  | Derivation of bicexpr * string option * methods
                  | StringConstant of string
                  | Integer of int
                  | Float of float
                  | NativeMethod of (lookup -> bicobj)

Those ten lines of code would be ten classes in Python with an average of 1.6 attributes each. Using dataclasses or attrs, that would be 36 lines of code, and then (if you're doing it the OO way) every function that I defined on one of these OCaml types becomes a method implemented in each class implementing a particular protocol, with a copy of its argument signature in every class. (If you used namedtuple instead, it's no less code, but you write it on less lines.) So, for example, this function on bicexprs

    let rec freevars = function
        Name n -> stringset [n]
      | Integer _ | StringConstant _ | Float _ -> stringset ["prog"]
      | NativeMethod _ -> stringset []
      | Literal (Some selfname, methods) -> 
            StringSet.diff (freevars_methods methods) (stringset [selfname])
      | Literal (None, methods) -> freevars_methods methods
      | Derivation(object_, self, methods) ->
            StringSet.union (freevars object_) (freevars (Literal(self, methods)))
      | Call(object_, _) -> freevars object_

becomes six to eight method definitions in the different classes. (You can cut it down to six if you define an abstract base class for the constant classes.) And Literal.freevars needs an if-then-else. So that's another 20 lines of code.

Python does support pattern-matching now, so functions like this might not be any more verbose than the ML version if you program them the same way instead of in the OO fashion. I haven't tried using Python pattern-matching, so I don't really know.

In general, though, Python is more verbose than ML-family languages for this kind of thing by a factor of about 2–4, and that's before you count the test code you need in Python to get the kind of confidence in correctness that ML's type-checking gives you with no extra code. To my knowledge, Mypy doesn't do the kinds of pattern-matching-exhaustiveness checks that ML compilers do.

I've sometimes "cheated" by trying to write code like this in Python using regular tuples rather than named tuples. You can definitely make it work, but it's a real pain to debug.

Quoting Andy Chu from https://andychu.net/projects/:

> Python is not the right language for [implementing] languages. I will use OCaml for subsequent projects like this.

Python does have GC and dynamic dispatch, though, and those count for a lot.

▲ kragen a day ago | parent | next [-]

Eli Bendersky's blogpost on the Expression Problem: https://news.ycombinator.com/item?id=45155877

▲

sevensor a day ago | parent [-]

I thought this was a good read, and it’s clearly a problem in an OO language, but I struggled to see how it’s an actual problem in a functional language. Maybe I need to go read that paper he referenced by the Racket guys.

▲

kragen a day ago | parent | next [-]

In my example above, to add a new variant type to bicexpr, you need to edit roughly every function that pattern-matches on bicexprs, which is a good fraction of the Bicicleta interpreter codebase.

▲

sevensor a day ago | parent [-]

Ah, I see. I’ve written that code where you have to add one case to a whole bunch of pattern matches. It’s not terrible; I rather like that everything that does the same thing is in the same place. But I have wondered if there was a better way. Clearly I need to go read the article again.

▲

kragen 19 hours ago | parent [-]

Yeah, I share your assessment that it's not unmanageable.

But it does add friction. I suspect that there are only two real solutions:

- program source code that isn't a linear document but instead can be viewed in multiple different ways: by operation or by operand type, for example. Then you would use the view that best suits the modification you were doing at the time. Smalltalk, for example, does support browsing either all the implementors of a method selector or all the methods of a class. (But in both cases you normally only see one method at a time.)

- Alternating levels of hierarchy. Maybe if you have 63 operations on 63 types, you could arrange it so that you can add a 64th operation or a 64th type by editing only 8 places instead of 63.

▲

sevensor 19 hours ago | parent [-]

> Smalltalk, for example, does support browsing either all the implementors of a method selector or all the methods of a class

That’s pretty neat. I tried using the object browser a million years ago and it confused me, but I didn’t know anything about anything at the time. Maybe Smalltalk deserves another look.

	▲	igouy 2 hours ago \| parent [-]
		Smalltalk IDEs are usually multi-window, so it isn't either/or, it's by method selector and by class instance methods and by …

▲

naasking a day ago | parent | prev [-]

The example in the article was very clear I thought: try to extend an interpreter with a new primitive (function call in the article) without changing the original interpreter's source code. I'm an FP language, you can't extend the core AST without changing the source code for eval.

▲ vidarh 5 days ago | parent | prev | next [-]

GC doesn't matter much for a simple compiler, as you either don't need to allocate much (single pass, Wirth-style compilers that generate code inline) or most of what you allocate becomes garbage all at once at the end (AST).

In my half-finished Ruby compiler prototype, even before I added type tagging, and so allocated every integer on the heap, I just didn't add a GC for a long time because it was fine to just leak, because the compiler isn't generally long running.

	▲	kragen 5 days ago \| parent [-]
		Yeah, I never got around to writing a GC for Ur-Scheme either—you could argue that it's also "half-finished" but it does successfully compile itself correctly. It's not so much the GC that matters as it is the ability to allocate on the heap without much ceremony. I should have thought of that before I posted, and I appreciate the correction.

▲ MarsIronPI 5 days ago | parent | prev [-]

My impression is that in general the traditional approach of methods as members of a class is more verbose and less extensible than the ML/Lisp generic function approach. I know I certainly prefer generic functions when I have to design polymorphic interfaces.

▲ kragen 5 days ago | parent [-]

"Generic functions" is the Common Lisp name for writing a separate method for each class, the same as in Python except that you also have to define the generic function itself before you can define the methods. I'm not sure if that's what you meant; the ML approach is quite different.

This is Common Lisp, which I am not an expert in:

    ;;; Stupid CLOS example.

    (defgeneric x (point))                  ; make X a method
    (defgeneric y (point))                  ; make Y a method

    (defclass rect-point ()
      ((x :accessor x :initarg :x)
       (y :accessor y :initarg :y)))

    (defclass polar-point ()
      ((radius :accessor radius :initarg :radius)
       (angle  :accessor angle  :initarg :angle)))

    (defmethod x ((p polar-point))
      (* (radius p) (cos (angle p))))

    (defmethod y ((p polar-point))
      (* (radius p) (sin (angle p))))

    (defgeneric move-by (point Δx Δy))

    (defmethod move-by ((p rect-point) Δx Δy)
      (incf (x p) Δx)
      (incf (y p) Δy))

    (defmethod move-by ((p polar-point) Δx Δy)
      (let ((x (+ (x p) Δx)) (y (+ (y p) Δy)))
        (setf (radius p) (sqrt (+ (* x x) (* y y)))
              (angle p) (atan y x))))

    (defmethod print-object ((p polar-point) stream) ; standard method print-object
      (format stream "@~a<~a" (radius p) (angle p)))

    (defvar p1 (make-instance 'rect-point :x 3 :y 4))
    (defvar p2 (make-instance 'polar-point :radius 1 :angle 1.047))

    ;; prints (3, 4) → (4, 5)
    (format t "(~a, ~a) → " (x p1) (y p1))
    (move-by p1 1 1)
    (format t "(~a, ~a)~%" (x p1) (y p1))

    ;; prints @1<1.047 (0.500171, 0.8659266) → @1.9318848<0.7853087 (1.366171, 1.3659266)
    (format t "~a (~a, ~a) → " p2 (x p2) (y p2))
    (move-by p2 .866 .5)
    (format t "~a (~a, ~a)~%" p2 (x p2) (y p2))

Here's a similar program in OCaml, which I am also not an expert in, using pattern-matching functions instead of methods, and avoiding mutation:

    (* Stupid OCaml example. *)

    type point = Rect of float * float | Polar of float * float

    let x = function
      | Rect(x, y) -> x
      | Polar(r, theta) -> r *. Float.cos theta

    let y = function
      | Rect(x, y) -> y
      | Polar(r, theta) -> r *. Float.sin theta

    let moved_by = fun dx dy ->
      function
      | Rect(x, y) -> Rect(x +. dx, y +. dy)
      | p ->
         let x = dx +. x p and y = dy +. y p in
         Polar(Float.sqrt(x *. x +. y *. y),
               Float.atan2 y x)

    let string_of_point = function
      | Rect(x, y) -> Printf.sprintf "Rect(%f, %f)" x y
      | Polar(r, theta) -> Printf.sprintf "Polar(%f, %f)" r theta

    ;;

    print_endline(string_of_point(moved_by 1. 2. (Rect(3., 4.)))) ;
    print_endline(string_of_point(moved_by 0.866 0.5 (Polar(1., 1.047))))

▲ MarsIronPI 4 days ago | parent [-]

> I'm not sure if that's what you meant; the ML approach is quite different. There is a difference in approach because in Common Lisp each method is a separate function definition (though macros can alleviate this), but my point is that both CL and ML are more function-oriented, if you will; i.e. "methods" (or whatever you want to call ML pattern-matched functions) aren't defined in a class body and are just ordinary functions.

I think this more function-focused approach is more elegant, but also more extensible and possibly less verbose when dealing with multiple classes that share the same interface.

> the same as in Python except that you also have to define the generic function itself before you can define the methods. As a side note, though it's not terribly important, the "defgeneric" can be omitted if you don't care to specify docstring or any special behavior.

▲ kragen 4 days ago | parent [-]

Oh, thanks! I didn't know that about defgeneric.

How would you classify Ruby? You can reopen a class and add more methods to it at any time.

▲ MarsIronPI 2 days ago | parent [-]

Well, reopening the class is the idiomatic way to define a method after the class definition, but if I wanted to write Ruby the ML/Common Lisp way, I would use "Class.define_method" like so:

  String.define_method :yell do
    puts self.upcase
  end

  Numeric.define_method :yell do
    self.to_s.yell
  end

What I like most about Ruby is how close it gets to Lisp's flexibility of semantics (i.e. macros) without actually having macros. (Common Lisp is still my favorite language for larger projects though.)

	▲	kragen a day ago \| parent [-]
		Hmm. I'll have to think about that. I still feel like ML is very much the odd one out, here, because the individual pattern-matching clauses aren't values and can't be added later except by editing the "generic" function (and usually the definition of its argument type).