Remix.run Logo
keybored 2 days ago

Where does this exponentional size requirement come from?

jandrese 2 days ago | parent | next [-]

Imagine you have a program where there are 4 options, W, X, Y, Z. Y and Z can't be set at the same time, and X can't be set unless W is set. If Y is set then X must be set as well.

How do you represent this in a way that makes it impossible, even through programmer error elsewhere in the program, to have the flags in an invalid state?

You can create en enum that looks like:

    enum program_state =
    (
      W_X_Y_NZ,
      W_NX_NY_Z,
      NW_NX_NY_Z,
      ... and so on
     );
Hackbraten 2 days ago | parent | next [-]

Maybe I'm spoiled by TypeScript but this is how I'd do it in a structural typing system:

    type ConstrainedByW =
    | { w: false, x: false }
    | { w: true, x: bool }
    
    type ConstrainedByY =
    | { x: bool, y: false, z: bool }
    | { x: true, y: true, z: false }
    
    type ProgramState = ConstrainedByW & ConstrainedByY
jandrese 2 days ago | parent [-]

Yes, but certainly you can see how even a toy example with just 4 booleans is already getting complicated?

Hackbraten 2 days ago | parent [-]

It's certainly complicated. But not WTF-level complicated.

Complexity is both subjective and relative. Allowing invalid states in your model is going to complicate things, too.

I wouldn't mind having this example in production. It will scale just fine with the number of booleans.

aiono 2 days ago | parent | prev | next [-]

  enum program_state {
    X_W,
    Y_X_W,
    Z,
    W,
    Z_W,
    Z_X_W,
  }
You only have these 6 options. And in this case you can easily use a SAT solver to generate the cases for you so that you don't have to write them by hand.
keybored 2 days ago | parent | prev [-]

And the Make Impossible States Unrepresentable crowd program like that?

WorldMaker 2 days ago | parent [-]

It is not too bad in languages with discriminated unions. It's also not hard to fake discriminated unions in languages without them, even if you will miss some of the niceties.

Rather than thinking of it as an enum, think of it as a list of contructors:

    class ProgramState {
        bool w, x, y, z;

        ProgramState(x, z) // implies y = true, w = true
        ProgramState(w, z) // cannot set x; implies y = false (cannot set y)
    }
Even if the class needs all four fields, internally to represent all the possible combinations of data, there's no constructors/setters to work with them independently. (Which is also related to why "Make Impossible States Unrepresentable" also generally implies immutable, no setters at all makes it much easier to make states impossible.)

In languages with discriminated unions you might even have some natural field sharing depending on how your "constructors" are written and the memory expansion isn't necessarily "exponential".

crdrost a day ago | parent [-]

Also to mention it, languages without discriminated unions often have generics and function types, which can be used to build discriminated unions with Church encodings:

    // idiomatic typescript
    type Optional<IfAbsent, IfPresent> = 
     | {type: 'absent', detail: IfAbsent}
     | {type: 'present', value: IfPresent}
    
    // Church-encoded version
    type Either<x, y> = <z>(ifLeft: (x: x) => z, ifRight: (y: y) => z) => z
    
    // isomorphism between the two
    function church<x, y>(opt: Optional<x, y>): Either<x, y> {
      return (ifLeft, ifRight) => opt.type === 'absent'? ifLeft(opt.detail) : ifRight(opt.value)
    }
    function unchurch<x, y>(opt: Either<x, y>): Optional<x, y> {
      return opt<Optional<x,y>>(x => ({type: 'absent', detail: x}), y => ({type: 'present', value: y}))
    }
In addition the Church encoding of a sum type, is a function that takes N handler functions and calls the appropriate one for the case that the data type is in. With a little squinting, this is the Visitor pattern.

    interface LeftRightVisitor<X, Y, Z> {
      visit(x: Left<X>): Z
      visit(y: Right<Y>): Z
    }
    interface LeftRight<X, Y> {
      accept<Z>(visitor: LeftRightVisitor<X, Y, Z>): Z;
    }
    class Left<X> implements LeftRight<X, any> {
      constructor(public readonly x: X) {}
      accept<Z>(visitor: LeftRightVisitor<X, any, Z>) {
        return visitor.visit(this)
      }
    }
    class Right<Y> implements LeftRight<any, Y> {
      constructor(public readonly y: Y) {}
      accept<Z>(visitor: LeftRightVisitor<any, Y, Z>) {
        return visitor.visit(this)
      }
    }
    // isomorphism
    function visitify<X, Y>(opt: Optional<X, Y>): LeftRight<X, Y> {
      return opt.type === 'absent' ? new Left(opt.detail) : new Right(opt.value)
    }
    function unvisitify<X, Y>(opt: LeftRight<X, Y>): Optional<X, Y> {
      return opt.accept({
        visit(value: Left<X> | Right<Y>) {
          return value instanceof Left? {type: 'absent', detail: value.x} : {type: 'present', value: value.y}
        }
      })
    }
The main difference with the usual visitor pattern is that the usual visitor pattern doesn't return anything (it expects you to be holding some mutable state and the visitor will mutate it), you can do that too if you don't have access to a suitable generic for the Z parameter.
crdrost 2 days ago | parent | prev [-]

Approximate expansion of the original claim, without direct endorsement:

Suppose you have an Order object that needs to track where some larger process is in relation to three subtasks. We could imagine say that the Inventory department needs to set a physical object aside, then the Billing department needs to successfully charge for it, then the Shipping department needs to retrieve that physical object and ship it.

You start from a description of this as "one Order contains three Subtasks" where a Subtask contains the Go-style (Optional[Result], Optional[Error]) type. This architecture almost fits into a relational database, except that foreign key constraints are a bit funky if you shove everything into one nullable Result column. But let's just have the Result be some random JSON in the Subtasks table and let our relational purists weep.

Then you read this advice and you start to see that this allows for a lot of illegal states: things could contain both a result AND an error, or neither. You eventually decide that neither, is an allowed state. These are two boolean flags representing only 3 legal states and so they need to be factored into an enum: the enum is "Pending | Success[Result] | Failure[Error]".

Well, except the problem is a bit more nuanced because the pending-states also need to be consistent among the different subtasks: there is a dependency graph among them. So you should actually have an enum that says:

    Inventory_Pending
    Inventory_Failure[Error]
    Inventory_OK_Billing_Pending[InventoryData]
    Inventory_OK_Billing_Failure[InventoryData, Error]
    Inventory_OK_Billing_OK_Shipping_Pending[InventoryData, BillingData]
    Inventory_OK_Billing_OK_Shipping_Failure[InventoryData, BillingData, Error]
    Inventory_OK_Billing_OK_Shipping_OK[InventoryData, BillingData, ShippingData]
See, you would have had 3x3x3 = 27 valid states before for the Order but we have reduced to only the 7 legal states. Yay!

But now consider e.g. the following mutation. On Failure cases the executives at our company mandate that we never return a failed Order to a Pending status, rather we must always create a separate Order. This Order might skip inventory and/or billing and those need to be represented separately, as Inventory Skipped[OrderID] or InventoryAndBillingSkipped[OrderID]. So now our list of states following the "no unrepresentable state" logic, should really be:

    [... the 7 above, plus ...]
    Inventory_Skipped_Billing_Pending[OrderID]
    Inventory_Skipped_Billing_Failure[OrderID, Error]
    Inventory_Skipped_Billing_OK_Shipping_Pending[OrderID, BillingData]
    Inventory_Skipped_Billing_OK_Shipping_Failure[OrderID, BillingData, Error]
    Inventory_Skipped_Billing_OK_Shipping_OK[OrderID, BillingData, ShippingData]
    Inventory_And_Billing_Skipped_Shipping_Pending[OrderID]
    Inventory_And_Billing_Skipped_Shipping_Failure[OrderID, Error]
    Inventory_And_Billing_Skipped_Shipping_OK[OrderID, ShippingData]
Now someone else wants to add remediation actions, but only to remediate the exact error in the failure state, so _Failure is going to mean "no remediation taken" but we need to add some _Remediation with a boolean saying whether that process has completed or not. So we add:

    Inventory_Remediation[Error, Bool, Array[RemediationEvent]]
    Inventory_OK_Billing_Remediation[InventoryData, Error, Bool, Array[RemediationEvent]]
    Inventory_OK_Billing_OK_Shipping_Remediation[InventoryData, BillingData, Error, Bool, Array[RemediationEvent]]
    Inventory_Skipped_Billing_Remediation[OrderID, Error, Bool, Array[RemediationEvent]]
    Inventory_Skipped_Billing_OK_Shipping_Remediation[OrderID, BillingData, Error, Bool, Array[RemediationEvent]]
    Inventory_And_Billing_Skipped_Shipping_Remediation[OrderID, Error, Bool, Array[RemediationEvent]]
We're only up to 21 total states so far which is still probably manageable? But these changes do demonstrate exponential growth, which is a technical term that means that the growth of each step is some small fraction of the total growth that has happened up until that point. Because everything depends on Inventory (it's at the root of the tree), when we add a new state that the Inventory can be in (Skipped) we have to add enum cases for all of the other states, and we pay a cost proportional to the size of the tree. Similarly when everything can have an error (at the leaves of the tree), when we add a new uniform requirement for errors we have to add new leaves all throughout the tree and we pay a cost proportional to the size of the tree. (Another thing to notice about the Remediation state is that it is a Pending state for another Subtask that could have been added to the original Order whenever something moves into Failure mode.)

You get something powerful by reducing the 256ish-or-whatever states into the 21 legal states; you have a compile-time assurance that no bugs in your code have created weird states that can propagate their weirdnesses throughout the system. But you also have to maintain the 21 legal states all at once, instead of maintaining 4 subtasks each having one of 4 statuses.