withoutboats3 3 days ago

This post is written by a fan of implicits, so it frames them as "better" than traits, though at the end it admits the choice is in fact a complex trade off, which is the truth. In my opinion, the trade off favors traits, but others may feel differently.

The core difference between traits (also called type classes) and ML modules is that with traits the instance/implementation has no name, whereas with ML modules it does. The analogy here is between Rust/Haskell's traits/typeclasses and ML's signatures, and between Rust/Haskell's impls/instances and ML's structures. In Rust/Haskell, the implementation is looked up from a tuple of types and a trait. The advantage of this is that you don't need to name the impl and then invoke that name every time you use it; since we usually don't think of "Hash for i32" as something which has a meaningful name beyond the relationship between Hash and i32, this is quite nice.
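To make that lookup-by-type model concrete, here's a minimal Rust sketch with a made-up `MyHash` trait (not std's Hash), so it stays self-contained:

    // The impl below has no name of its own; it is identified purely by
    // the pair (MyHash, i32).
    trait MyHash {
        fn my_hash(&self) -> u64;
    }

    impl MyHash for i32 {
        fn my_hash(&self) -> u64 {
            *self as u64
        }
    }

    // Callers never mention the impl: the compiler resolves it from T.
    fn print_hash<T: MyHash>(value: &T) {
        println!("{}", value.my_hash());
    }

    fn main() {
        print_hash(&42i32); // "MyHash for i32" is found without ever naming it
    }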

But coherence requires that instances resolve consistently: if I hash an integer in one code location to insert into a map and then hash it again in a different location to do a lookup on the same map, I need to hash integers the same way each time. If you care about coherence, and the correctness property it implies, you can't allow overlapping impls if impls aren't named, because otherwise you aren't guaranteed a consistent result every time you look up the impl.
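Concretely, with std's ordinary HashMap, the correctness of the lookup below depends on both functions (made-up names `insert_score`/`lookup_score`) resolving `Hash for i32` to the same impl:

    use std::collections::HashMap;

    // Both functions rely on the same `Hash for i32` impl being chosen.
    // If insert_score hashed 7 one way and lookup_score another way, the
    // lookup would in general land in the wrong bucket and miss the entry.
    fn insert_score(map: &mut HashMap<i32, &'static str>) {
        map.insert(7, "lucky");
    }

    fn lookup_score(map: &HashMap<i32, &'static str>) -> Option<&'static str> {
        map.get(&7).copied()
    }

    fn main() {
        let mut map = HashMap::new();
        insert_score(&mut map);
        assert_eq!(lookup_score(&map), Some("lucky"));
    }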

This introduces another problem: you can't see all the impls in the universe at once. Two libraries could each add impls for types/traits in their upstream dependencies, and the incoherence won't be discovered until the two are compiled together later on. Such impls, which live in a crate that owns neither the trait nor the type, are called "orphan impls," and they cause their own controversy: do you just let downstream users discover the error eventually, when they try to combine the two libraries, or do you prohibit all orphan impls early on? Rust and Haskell have chosen different horns of this dilemma, and the grass is always greener.
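For concreteness, this is the sort of impl Rust's orphan rule rejects up front; both the trait and the type are defined upstream in the standard library:

    use std::fmt;

    // Rust rejects this with error E0117: the orphan rule forbids
    // implementing a foreign trait (Display) for a foreign type (Vec<u8>).
    impl fmt::Display for Vec<u8> {
        fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
            write!(f, "{} bytes", self.len())
        }
    }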

Of course, with implicits this author intends a different solution to the problem of resolving instances without naming them: just allow incoherence (which they re-brand as "local coherence"). Incoherent impls are permitted, and the one used is selected based on its proximity to the code location.

As the post eventually admits, this does nothing to solve the correctness problem that coherence is meant to solve, because code with different nearest impls can still be compiled together, and in Rust such a correctness problem could become a memory safety problem. How you figure out whether the impl you've found for this type is actually the nearest impl to your code is left as an exercise to the reader. But sure, once you've rebranded incoherence as "local coherence" you can do some juxtaposed wordplay and call coherence a "local maxima," because achieving it has the downside that you can't have arbitrary orphan impls.

davidatbu 3 days ago | parent | next

I read through this and thought to myself: "Wow, what a response that elucidates the PL design tradeoff space while giving real-world examples of languages that occupy various points on that space, all as concisely and economically as possible."

And then I read the user name. Of course it's boats!!

Thank you for all your work! I want to say that especially since I've noticed a lot of shallow dismissal of your work recently (shallow because the dismissal often doesn't engage with the tradeoffs of whatever alternative solution it proposes in the context of Rust, among other things). I'd like you to know there are many of us who are very, very grateful for all the productivity and empowerment you've enabled through your contributions to Rust.

atq2119 3 days ago | parent | prev | next

I'm not convinced by your example of hashing.

Let's assume for the sake of argument that the standard library didn't implement Hash for i32.

You could then have two crates, A and B, with different implementations of Hash for i32, and both could instantiate HashMap<i32>.

This can be made to work if we recognize the HashMap<i32> in crate A as a different type than the HashMap<i32> in crate B.

This only really works if orphan implementations are exported and imported explicitly to resolve the conflict that arises from a crate C that depends on A and B.

If C wants to handle HashMap<i32>, it needs to decide whether to import the orphan implementation of Hash for i32 from crate A or B (or to define its own). Depending on the decision, values of type HashMap<i32> can move between these crates or not.

Basically, the "proximity to code location" is made explicit in a way the programmer can control.

This makes type checking more complex, so it's not clear whether the price is worth it, but it does allow orphan implementations without creating coherence problems.

withoutboats3 3 days ago | parent

Implementations are not imported at all, because they are not named. Like I wrote, named implementations (à la ML modules) are a valid alternative, but one with a much greater annotation burden.

You could imagine having named impls that are allowed to be incoherent as an additional feature on top of coherent unnamed impls, but to use them you would need to make any code that depends on their behavior parameterized by the impl as well as the types. In fact, you can pretty trivially emulate that behavior in Rust today by adding a dummy type parameter to your type and traits.
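A rough sketch of that emulation in today's Rust, with made-up marker types (`Fnv`, `SipLike`) standing in for two impls that would otherwise conflict:

    // Zero-sized marker types acting as "names" for two hashing strategies.
    struct Fnv;
    struct SipLike;

    // The trait carries an extra (dummy) parameter identifying the strategy.
    trait HashWith<Strategy> {
        fn hash_with(&self) -> u64;
    }

    // Two impls for i32 that would overlap if the trait had no Strategy
    // parameter; the marker keeps them distinct and coherent.
    impl HashWith<Fnv> for i32 {
        fn hash_with(&self) -> u64 {
            (*self as u64).wrapping_mul(0x100000001b3)
        }
    }

    impl HashWith<SipLike> for i32 {
        fn hash_with(&self) -> u64 {
            (*self as u64).rotate_left(13)
        }
    }

    // Code that depends on the hashing behavior is parameterized by the
    // strategy too, so the two impls can never be silently mixed.
    fn bucket_of<Strategy, T: HashWith<Strategy>>(value: &T, buckets: u64) -> u64 {
        value.hash_with() % buckets
    }

    fn main() {
        // The "impl name" is now an ordinary type argument.
        println!("{}", bucket_of::<Fnv, i32>(&42, 8));
        println!("{}", bucket_of::<SipLike, i32>(&42, 8));
    }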

Again, it's all a set of trade offs.

atq2119 3 days ago | parent

Right, but what I'm describing is a tradeoff point that's between the extremes, where implementations are unnamed but can still be explicitly imported.

Making my example more explicit, you'd need syntax along the lines of

    // inside crate C
    use A::impl std::hash::Hash for i32;
This syntax would explicitly be limited to orphan implementations.

I suppose to further clarify: there's still some coherence requirement there, in that crate C can't import the conflicting implementations from both A and B. That could then perhaps be worked around by adding syntax to spell types along the lines of

    HashMap<i32 + A::impl Hash, V>
Which you could argue is a form of naming implementations, I suppose? I'm not familiar with ML. You could maybe also think of it as a more ergonomic way of doing (more or less) those wrapper types.

In any case, the annotation burden only exists where it's actually needed to enable orphan implementations.

And either way, multiple different impls can safely coexist within the overall set of code that's linked together, with everything being statically checked at compile time.

thunderseethe 3 days ago | parent

I think rather than being at odds with what withoutboats is saying, this is very much aligned with what they are suggesting. While it isn't literally a name, `use A::impl std::hash::Hash for i32` is for all intents and purposes naming the impl.

Similarly, `HashMap<i32 + A::impl Hash, V>` is what they are talking about when they refer to parameterizing code on the impl chosen.

atq2119 3 days ago | parent

Essentially, yes. What I don't see is why it's a "much greater annotation burden". Compared to what? Rust today just doesn't allow this at all, and if you use a wrapper type to simulate it, you definitely end up with more "annotations" (boilerplate).
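For comparison, the wrapper-type version in today's Rust looks something like this (`FnvKey` is a made-up name); the extra impl block and the wrapping at every use site are the boilerplate I mean:

    use std::collections::HashMap;
    use std::hash::{Hash, Hasher};

    // Newtype that pins i32 to one particular hashing scheme.
    #[derive(PartialEq, Eq)]
    struct FnvKey(i32);

    impl Hash for FnvKey {
        fn hash<H: Hasher>(&self, state: &mut H) {
            // Stand-in for "crate A's" hash of i32.
            state.write_u64((self.0 as u64).wrapping_mul(0x100000001b3));
        }
    }

    fn main() {
        let mut map: HashMap<FnvKey, &'static str> = HashMap::new();
        map.insert(FnvKey(7), "seven"); // every use site has to wrap the key
        assert_eq!(map.get(&FnvKey(7)), Some(&"seven"));
    }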

couchand 3 days ago | parent

FWIW, it's not at all clear to me how this requirement would be implemented in practice: "This syntax would explicitly be limited to orphan implementations."

atq2119 3 days ago | parent

Maybe I'm missing something, but the compiler can tell whether an implementation is an orphan. That's how you get an error message today if you try to write one. So I don't know what difficulty you have in mind.

marcosdumay 3 days ago | parent | prev | next

I'm pretty sure the article resolves the implicit dependencies at the point of the declaration. (Did I misunderstand it?)

So, you don't have a `data HashMap datatype`, you have a `data HashMap hashAlgo datatype`, where hashAlgo is decided implicitly by the context. That's the entire reason it's called "implicit".

Every other usage of the data knows how to hash your values because of that `hashAlgo` parameter. It doesn't matter where it happens.
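Rust's own HashMap already works roughly this way, except the extra parameter is explicit rather than implicit: the hasher is part of the map's type, so every operation on a given map uses the same algorithm. A small sketch (`MyMap` is just an alias I made up):

    use std::collections::hash_map::DefaultHasher;
    use std::collections::HashMap;
    use std::hash::BuildHasherDefault;

    // The hashing algorithm is a type parameter of the map, analogous to
    // the `hashAlgo` parameter above, just spelled explicitly here.
    type MyMap = HashMap<i32, &'static str, BuildHasherDefault<DefaultHasher>>;

    fn main() {
        let mut map = MyMap::default();
        map.insert(7, "seven");
        // Any later lookup goes through the same hasher carried in the type.
        assert_eq!(map.get(&7), Some(&"seven"));
    }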

thunderseethe 3 days ago | parent | prev

Great write-up, and you're absolutely right that implicits are moving towards ML modules. Quite possibly a production system would end up being synonymous with ML modules, out of the need for named impls.

Small nit in terminology: the implicits described are coherent. Part of their value over previous implicit work is that they are coherent and stable. I have generally seen the property you're referring to called canonicity, which they do lack.