Remix.run Logo
Quarrelsome 5 hours ago

ok, so what problems do they help me solve that I can't already solve? Is it just that we can make code more concise or am I missing a trick somewhere?

rspeele an hour ago | parent | next [-]

Object-oriented polymorphism (interfaces, inheritance) is for when you have a fixed set of methods to implement but an unbounded set of types that may want to implement them.

As a consumer, you cannot change the methods, but you can add a subtype. When you subtype an abstract class or an interface, the compiler does not let you proceed until you have implemented all the methods.

Discriminated unions are for the exact opposite situation, when you have a fixed set of subtypes, but unbounded set of methods to implement on them. As a consumer, you cannot add a subtype, but you can add a new method. When you write a new method, the compiler does not let you proceed until you have handled all the subtypes.

Good languages should support both!

The best example is abstract syntax trees, the data types that represent expressions and statements in a programming language. "Expression" breaks down into cases: integer literal, string literal, variable name, binary operations like add(expr1,expr2), unary operations like negate(expr), function call(functionName, exprs), etc.

Clearly all of these expression subtypes should belong to a base type `Expression`. But what methods do you put on `Expression`? If you're writing a compiler, you have to walk this syntax tree many times for very different purposes. First you might do a pass on it where you "de-sugar" syntax, then another pass where you type-check it and resolve names in the code, then another pass where you generate assembly code from it. Perhaps your compiler even supports different backends so you have a code-gen path for x86, another for ARM, etc. You'll likely want a pretty-printer so you can do automatic reformatting, maybe you want linting support, etc.

If you look at all those concerns and say that each subtype of `Expression` must implement methods for each one, then you end up with untenable code organization. Every expression subtype now has a huge stack of methods to implement all in one file, dealing with stuff from totally different layers of the compiler. It's a mess.

It's much cleaner to have the "shape" of the expression defined in one place without all that clutter, and then in each of those areas of the code you can write methods that consume expressions however they need, so each of those separate concerns lives in its own silo.

------------------------------------------------

Some real code (but it's F# not C#) to look at.

AST for my SQL dialect: https://github.com/fsprojects/Rezoom.SQL/blob/master/src/Rez... Typechecker code: https://github.com/fsprojects/Rezoom.SQL/blob/master/src/Rez... Backend code that outputs MS TSQL from it: https://github.com/fsprojects/Rezoom.SQL/blob/master/src/Rez...

------------------------------------------------

If you're an old hand at OO you may be familiar with its actual answer to this problem, the "Visitor" pattern. See System.Linq.Expressions.ExpressionVisitor. However, once you've used a language with good union and pattern matching support, this feels like a clunky hack. Basically the mirror image of a language without real object orientation imitating it by passing around closures and structs-of-closures.

------------------------------------------------

It doesn't just have to be compiler stuff. A business app data model can use this too. Instead of having:

    public class DbUser
    {
        public EmailAddress Email { get; set; }
        public PasswordHash? Password { get; set; } // null if they use SSO
        public SamlEntityProviderId? SamlProvider { get; set; } // null if they use password auth
    }

You could have:

    type UserAuth =
        | PasswordAuth of PasswordHash
        | SSOAuth of SamlIdentityProviderId
The implementation details of those different auth methods, the UI for them, etc. don't have to be part of the data model. We do have to model what "shapes" of data are acceptable, but "doing stuff" based on those shapes is another layer's problem.
bigstrat2003 5 hours ago | parent | prev | next [-]

I think "what problems do they solve that I can't already solve" is the wrong way to look at it. After all, ultimately most language features are just syntactic sugar - you could implement for loops with goto, but it would be a lot less pleasant. I think that unions aren't strictly necessary, but they are a very pleasant to use way of differentiating between different, but related, types of value.

Quarrelsome 5 hours ago | parent [-]

Ok. I'm just trying to understand what code I'm replacing with them. Like I wanna see the before and after in order to gain the same level of excitment as other people seem to have for them.

Often the explanations just seem rather abstract which makes it harder to appreciate the win, versus the hideous sort of code that might appear when they're misused.

airstrike 4 hours ago | parent | next [-]

They are so fundamental to the way I write code I can't imagine ever using a language that does not support them.

"Make invalid states unrepresentable."

arwhatever an hour ago | parent [-]

I might suggest that anyone who wants to make it concrete to go through the article

https://fsharpforfunandprofit.com/posts/designing-with-types...

while visiting https://dotnetfiddle.net and typing the code samples in, experimenting with what manner of changes and additions to the code cause the compilation to fail, and considering how you would leverage those abilities in your everyday development work.

I think this would be even more powerful if you then come back and re-read some of the pro-Union comments in this very thread.

speed_spread 4 hours ago | parent | prev [-]

The value is realized when you have both discriminated union types _and_ language pattern matching (not regex). Then it's not just a way to structure data but a way to think about how to process it.

JCTheDenthog 4 hours ago | parent | prev [-]

Simple example that I use often when writing API clients:

In current C# I usually do something like

public class ApiResponse<T> { public T? Response { get; set; } public bool IsSuccessful { get; set; } public ErrorResponse Error { get; set; } }

This means I have to check that IsSuccessful is true (and/or that Response is not null). But more importantly, it means my imbecile coworkers who never read my documentation need to do so as well otherwise they're going to have a null reference exception in prod because they never actually test their garbage before pushing it to prod. And I get pulled into a 4 hour meeting to debug and solve the issue as a result.

With union types, I can return a union of the types T and ErrorResponse and save myself massive headaches.

Quarrelsome 4 hours ago | parent [-]

I think I get it but I'm not really sure what I'm gaining over exception types. With an intelligent use of exceptions I can easily specify the happy path and all the error paths separately which seems really nice to me, because usually the behaviour between those two outcomes is rather different.

Akronymus 3 hours ago | parent | next [-]

> I think I get it but I'm not really sure what I'm gaining over exception types. With an intelligent use of exceptions I can easily specify the happy path and all the error paths separately which seems really nice to me, ...

Until your coworker comes along and accidentally refactors the code to skip the exception catching and it suddenly blows up prod.

With tagged unions you can't accidentally dereference to the underlying value without checking if it's actually proper data first.

JCTheDenthog 4 hours ago | parent | prev [-]

Exceptions are significantly slower than normal control flow in C# (about 10,000 times slower). It's also pretty non-idiomatic in both C# and most other languages I've worked in to use exceptions instead of a switch statement or similar to handle an HTTP error code. Also there can be multiple possible non-error responses from an endpoint you need to differentiate between, and exceptions would make zero sense in that case.