ekidd 4 days ago

"Sum types" (aka "Rust enums") are something that I've really come to love for representing the idea "There are 3 possible cases here, here's the data we need in each case, and please remember that these cases are strictly mutually exclusive."

Making this super easy and short to express makes a lot of designs clearer.

There's also an interesting tradeoff here between sum types and abstract interfaces:

- Abstract interfaces: "We have an unknown number of cases/subclasses, but they each support this fixed set of methods." The set of implementations is open, but the set of operations is harder to update.

- Sum types (aka Rust enums): "We know all the cases, but there may be many different methods." The set of cases is harder to extend, but the set of methods manipulating them is completely open.

A good example of the latter pattern is actually LLVM! LLVM's intermediate representation is essentially a small assembly language with a fixed set of operations (and an escape hatch). But because the set of basic instructions is small and known, it's possible to write dozens and dozens of optimization passes that each manipulate that small set of instructions.

And a surprising number of programs work well with an LLVM-like structure.

So in a language like Rust, which supports both abstract interfaces (via traits) and sum types (via enums), the tradeoff is whether you think you have many different types of values that you'll manipulate in a limited number of ways, or a small number of types of values that you'll manipulate in many different ways.
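For illustration, here's a minimal sketch of the same tradeoff in TypeScript, which also has both tools (`Shape` and `Expr` are made-up names):

    // Open set of cases, fixed set of operations: anyone can add a new
    // Shape, but adding a new method means touching every implementation.
    interface Shape {
        area(): number;
    }

    // Fixed set of cases, open set of operations: anyone can write a new
    // function over Expr, but adding a case means touching every function.
    type Expr =
        | { kind: "lit"; value: number }
        | { kind: "add"; left: Expr; right: Expr };

    function evaluate(e: Expr): number {
        switch (e.kind) {
            case "lit": return e.value;
            case "add": return evaluate(e.left) + evaluate(e.right);
        }
    }

    function size(e: Expr): number {
        switch (e.kind) {
            case "lit": return 1;
            case "add": return 1 + size(e.left) + size(e.right);
        }
    }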

(Oh, and the metaphor of "algebraic types" goes deeper than you might think; there are some super interesting data structures based on taking the "derivatives" of simple algebraic types.)

ahartmetz 3 days ago | parent | next [-]

I wonder why sum types are named like that. They are not the sum of their possible states in a way that I would recognize, more like an exclusive-or or one out of many. At some point I came up with an explanation, but I forgot it again. If you need to explain why something is simple, it's not simple.

duckfan77 3 days ago | parent [-]

The number of valid options is the sum of the valid options for each variant. This is also why product types are called product types.

If you have a struct (product type) containing an 8-bit integer and a boolean, the struct has 512 possible states: 256 for the integer, times 2 for the boolean.

If you have a sum type where one variant is an 8-bit integer and the other is a boolean, you have 258 possible states: 256 for when it's an integer, and 2 for when it's a boolean.

Sum types add the options of each variant; product types multiply the options of each field.
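A sketch in TypeScript terms, with `number` standing in for an 8-bit integer since there is no `u8` type:

    // Product type: one value from EACH field, so the counts multiply.
    // 256 possible bytes times 2 possible booleans = 512 states.
    type Product = { byte: number /* 0..255 */; flag: boolean };

    // Sum type: one value from EXACTLY ONE variant, so the counts add.
    // 256 states when it's a byte, plus 2 when it's a boolean = 258.
    type Sum =
        | { kind: "byte"; value: number /* 0..255 */ }
        | { kind: "flag"; value: boolean };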

ulrikrasmussen 3 days ago | parent | prev | next [-]

The lack of sum types should really be considered a weird omission rather than a strange fancy language feature. I suspect that their omission from popular languages is mostly due to the historical belief that object-orientation was the way forward and that "everything is an object", leading many programmers to represent things that are decidedly not objects using abstractions designed for objects.

An object is a stateful thing which exists over a period of time (sometimes unbounded) and whose identity is characterized by its observable behavior. Abstractions for objects consist of interfaces which expose only a slice of the observable behavior to consumers. A value, on the other hand, is really just that: a value. It does not have state, it does not start or stop existing, and its identity is defined only by its attributes. You can use objects to model values (poorly), but you often end up doing a lot of extra work to stop your "fake value" objects from behaving like objects, e.g. by being careful to make all fields immutable and implementing proper deep equality.

Abstractions for values are not naturally expressed using interfaces, because interfaces force the implementation to be tied to the instance of an object, and since values do not have a lifetime this restriction is a huge disadvantage and often leads to clunky and inflexible abstractions. For example, consider the Comparable interface in Java, which tells you that an object is modeling a value which can be compared to other values of the same type. It would be awfully nice if List<A> could implement this interface, but it cannot, because doing so would mean that you can only create lists of things (of some type A) which have a total order defined on them.

However, if you consider programming to be not only about expressing what you can do to stateful objects but also about expressing values and their operations, then algebraic data types and traits/modules/type classes become the natural basic vocabulary, as opposed to classes and interfaces. When dealing with first-order algebraic data types, products (i.e. records) and coproducts (i.e. sum types) are unavoidable. A list is one of the simplest algebraic data types that uses both products and coproducts:

data List a = Nil | Cons a (List a)

One trait of lists is that lists of totally ordered things are themselves totally ordered using lexicographic ordering, and this is expressed in Haskell as a type class instance

instance Ord a => Ord (List a) where ...

Crucially, the type class is removed from the definition of what a list is, which enables us to still talk about lists of e.g. functions which do not have an order on them. These lists do not have the Ord trait, but they are still lists.
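A rough TypeScript rendering of the same derivation, with the "instance" passed around explicitly as a comparator since there are no type classes:

    // An ordering on elements gives a lexicographic ordering on arrays.
    type Compare<A> = (x: A, y: A) => number;

    const compareList = <A>(cmp: Compare<A>): Compare<A[]> => (xs, ys) => {
        const n = Math.min(xs.length, ys.length);
        for (let i = 0; i < n; i++) {
            const c = cmp(xs[i], ys[i]);
            if (c !== 0) return c;
        }
        return xs.length - ys.length; // a shorter prefix compares as smaller
    };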

Another important distinction between traits and interfaces is that some behavior of value types consists of simply identifying special values. For example, the Monoid type class in Haskell is

    class Monoid a where
      mempty :: a
      mappend :: a -> a -> a

It is impossible to express Monoid as an interface because one of the features of a monoid is that you have a special element `mempty`, but consumers cannot get this value without first having another value of type `a` on their hands.

Many languages now have proper support for both objects and values in that they have better primitives for defining both products and coproducts, but I still think that most mainstream languages are missing proper primitives for defining traits.

epolanski 3 days ago | parent | next [-]

> It is impossible to express Monoid as an interface because one of the features of a monoid is that you have a special element `mempty`, but consumers cannot get this value without first having another value of type `a` on their hands.

Can't you trivially do so in TypeScript?

- define a `Magma<A>` interface with `concat(a: A, b: A): A`,

- define a `Semigroup<A>` with the same shape; its implementation could even just point to `Magma<A>`,

- define a `Monoid<A>` as `Semigroup<A>` plus an `empty` property (albeit I prefer the term `unit`; `empty` is very misleading).

Now you can implement as many monoids as you want for strings, functions, lists and use those instances with APIs that require a Monoid. E.g. you could have a `Foldable` interface that requires a `Monoid` one, so if you have a Foldable for some A you can define foldMap.
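A minimal sketch of that construction, with a plain array-based `foldMap` standing in for a general `Foldable`:

    interface Magma<A> {
        concat: (a: A, b: A) => A;
    }

    interface Semigroup<A> extends Magma<A> {}

    interface Monoid<A> extends Semigroup<A> {
        empty: A;
    }

    // Instances are standalone objects, not methods on the type itself.
    const sumMonoid: Monoid<number> = { concat: (a, b) => a + b, empty: 0 };
    const stringMonoid: Monoid<string> = { concat: (a, b) => a + b, empty: "" };

    const foldMap = <A, B>(m: Monoid<B>, f: (a: A) => B, as: A[]): B =>
        as.reduce((acc, a) => m.concat(acc, f(a)), m.empty);

    // foldMap(sumMonoid, (s: string) => s.length, ["ab", "c"]) === 3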

Not sure what the practical differences would be.

After all, Haskell's monoids, just like TypeScript's, are contracts, not laws. There are no guarantees that the monoid properties (left and right identity) hold for every element of type `a`.

ulrikrasmussen 3 days ago | parent [-]

Yes, you can do that, and you can do something similar in Java or Kotlin by `object : Monoid<Int> { override val mempty = 0; override fun mappend(x: Int, y: Int): Int = x + y }`. This is an encoding of traits as objects though, and is not as nice to work with as e.g. typeclasses and ML modules.

pdpi 3 days ago | parent | prev [-]

You can express Monoid as an interface if Monoid<T> is implemented by a standalone object instead of by T itself — it’s how Scala does it.

GhosT078 3 days ago | parent | prev | next [-]

This tradeoff sounds similar to the choice to use "tagged types" versus "variant records" in Ada. Ada has provided variant records since 1983 and tagged types since 1995 (and both with a very nice syntax).

wk_end 3 days ago | parent [-]

According to this [0] tutorial, variant records in Ada have the downside that you only get an error at runtime if you use them incorrectly, i.e. access data from one branch of the sum type when the actual value is on another. That's a pretty huge drawback.

https://learn.adacore.com/courses/intro-to-ada/chapters/more...

csb6 3 days ago | parent | next [-]

Ada doesn’t have pattern matching, so there is no way for it to enforce that only the valid fields are accessed except at runtime (beyond warnings if the discriminant is known at compile time). Most people would just use a case statement that checks the variant discriminant before accessing any fields.

I believe the SPARK subset can (through static analysis) ensure that only valid fields are accessed for the current discriminant.

GhosT078 3 days ago | parent | prev [-]

I'm not sure I understand your point. Is it about such an error being reported at run time versus compile time? The GNAT compiler will often provide compile time warnings that "Constraint_Error will be raised at run time" in many such scenarios. Other Ada type checks would make this scenario improbable (although it could be contrived).

phkahler 3 days ago | parent | prev | next [-]

Sum type sounds like the old Variant from VB.

We use something similar in Solvespace. Where one might have used polymorphism, there is one class with a type enum and a bunch of members that can be used for different things depending on the type. Definitely some disadvantages (what does valA hold in this type?) but also some advantages from the consistency.

WorldMaker 2 days ago | parent [-]

VB's Variant is a bit "assume I know what this is" dynamic typing, almost Python-like duck typing. It sounds like your solution is also more akin to duck typing.

Sum Types are still themselves Types. A Sum Type like `int | string` is itself a Type. That Type generally only provides the intersection of methods/abilities, which depending on the language might be only a `toString()` method and the `+` operator (say, if the language uses that operator for string concatenation and also supports implicit conversions between int and string). The only safe things to do are the operations both types in the Sum support, so that intersection is an important tool in a type system with good Sum Type support.

But the other part of it is that Sum Types are still made up of types with their own unique shapes. When you pattern match a Sum Type you "narrow" it to a specific type. You can narrow an `int | string` to an `int` and then subtract something from it or multiply something with it. You can narrow it to a `string` and do things only strings can do like `.split(',')`. Both types are still "preserved" as unique types inside the Sum Type. A good compiler tests your type narrowing in the same way that a good OOP compiler will typecheck most of your casts to higher or lower types in an inheritance tree.
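In TypeScript notation (with `number | string` as the analogue of `int | string`), that narrowing looks roughly like:

    function describe(value: number | string): string {
        if (typeof value === "number") {
            // Narrowed to number: arithmetic is allowed here.
            return `twice ${value} is ${value * 2}`;
        }
        // Narrowed to string: string-only methods are allowed here.
        return `first field: ${value.split(",")[0]}`;
    }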

Given those cast checks, it's often more the case that you want to implement Sum Types as inheritance trees with casts, because the compiler will be doing more of the safety work. You write a base class, probably abstract, to represent the Sum with only the intersection methods/operator overloads, then leaf classes, probably sealed/final, for each possible interior type the Sum can be, and use type checks to narrow to the leaf classes or cast back up to the Sum Type as necessary.
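A sketch of that recipe in TypeScript classes (hypothetical names; the same shape applies in most OOP languages):

    // Base class: only the "intersection" surface of the sum.
    abstract class IntOrString {
        abstract display(): string;
    }

    // One final leaf class per case the sum can be.
    class IntCase extends IntOrString {
        constructor(public value: number) { super(); }
        display(): string { return String(this.value); }
    }

    class StringCase extends IntOrString {
        constructor(public value: string) { super(); }
        display(): string { return this.value; }
    }

    // "Narrowing" is an instanceof check instead of a pattern match.
    function double(x: IntOrString): IntOrString {
        if (x instanceof IntCase) return new IntCase(x.value * 2);
        if (x instanceof StringCase) return new StringCase(x.value + x.value);
        throw new Error("unknown case"); // exhaustiveness isn't checked here
    }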

That's actually a lot of work to build in most OOP languages. The thing about Sum Types in languages that support them is that a lot of that, from pattern matching/narrowing to computing the intersection of two types, is free/cheap/easy from those compilers. The compiler understands `int | string` directly, rather than needing to build by hand a class IntOrString with sub-types IntOrStringInt and IntOrStringString. (It starts to get out of hand when you add more types to the sum: an IntOrStringOrBool that narrows from `int | string | bool` down to `int | string`, for instance, which in OOP maybe can't be the same type as IntOrString alone, depending on how you want to narrow/expand your sum types.)

(Also, not to get too much into the weeds, but building a class structure like that is also more like a specialization of Sum Types called Discriminated Unions, because each possibility in the Sum Type has a special "name" that can be used to discriminate one from the other. A lot of people who want Sum Types really just want Discriminated Unions, and you can build Discriminated Unions easily inside a Sum Type system without native support for them, but you can't as easily build richer Sum Types with just Discriminated Unions. In TypeScript you will see a lot of Discriminated Unions built with a `type` field of some kind, often a string but sometimes an enum or symbol, which would also resemble your solution; but again, the key difference is that the types being summed together may have entirely different shapes, and TypeScript supports the sum type intersection calculations automatically, so maybe the top-level sum has only the `type` field in common, but you can use that `type` field to narrow to very specific shapes.)
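For example, a discriminated union in TypeScript might look like this (made-up `AppEvent` shape):

    // The `type` field is the discriminant; the arms have entirely
    // different shapes, and narrowing on `type` recovers each shape.
    type AppEvent =
        | { type: "click"; x: number; y: number }
        | { type: "keypress"; key: string }
        | { type: "scroll"; deltaY: number };

    function handle(e: AppEvent): string {
        switch (e.type) {
            case "click": return `click at (${e.x}, ${e.y})`;
            case "keypress": return `pressed ${e.key}`;
            case "scroll": return `scrolled ${e.deltaY}px`;
        }
    }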
