jcranmer 4 days ago

Pointer provenance probably dates back to the 70s, although not under that name.

The essential idea of pointer provenance is that it is somehow possible to enumerate all of the uses of a memory location (in a potentially very limited scope). Once you need to introduce something like "volatile" to tell the compiler that a variable has unknown uses, you have conceded that the compiler must otherwise be able to track all of the known uses, and that process of figuring out the known uses is pointer provenance.

As for optimizations, the primary optimization impacted by pointer provenance is... moving variables from stack memory to registers. It's basically a prerequisite for doing any optimization.
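To make the volatile/register point concrete, here is a minimal sketch (the function names are hypothetical). In `count_plain` the address of `counter` is never taken, so every use is known and an optimiser may keep it in a register or fold the loop away. In `count_volatile` every access may have observers the compiler cannot see, so each one has to go to memory.

```c
#include <assert.h>

/* The address of counter is never taken, so all of its uses are visible
   to the compiler: it can live in a register, and the loop may even be
   folded into a single expression. */
int count_plain(int n) {
    int counter = 0;
    for (int i = 0; i < n; i++)
        counter++;
    return counter;
}

/* volatile tells the compiler there may be uses it cannot see, so every
   read and write of counter has to be performed as written. */
int count_volatile(int n) {
    volatile int counter = 0;
    for (int i = 0; i < n; i++)
        counter++;
    return counter;
}
```

Both functions return the same values; only the code the compiler is allowed to generate differs.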

The thing is that, traditionally, compilers' pointer provenance model has been a hand-wavy "trace the dataflow back to where the object's address came from", which breaks down because optimizers haven't preserved source-level data dependencies for a few decades now. This hasn't been much of a problem in practice, because breaking data dependencies largely requires two pointers with the same address. Outside of contrived examples, you don't really run into two objects at the same address while playing around with pointers to both in a way that might cause the compiler to break the dependency.
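A minimal sketch of the "two objects at the same address" situation, modelled on the adjacent-globals examples common in provenance discussions (the globals `x` and `y` and the function `adjacent` are illustrative). Whether `&x + 1` and `&y` end up with the same bit pattern depends entirely on how the toolchain lays out the globals. Even when they do, storing through `p` would be undefined behaviour because `p` is derived from `x`; that is what lets an optimiser assume such a store leaves `y` untouched. The sketch therefore only compares the addresses and never dereferences `p`.

```c
#include <assert.h>
#include <string.h>

int x = 1, y = 2;

/* Returns 1 if the one-past-the-end pointer of x has the same
   representation as a pointer to y, 0 otherwise. */
int adjacent(void) {
    int *p = &x + 1;   /* may be formed and compared, never dereferenced */
    int *q = &y;
    /* memcmp compares the raw byte representations of the two pointers
       rather than comparing them as pointer values. */
    return memcmp(&p, &q, sizeof p) == 0;
}
```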

JonChesterfield 4 days ago

My grievance isn't with aliasing or dataflow, it's with a pointer provenance model which makes assumptions which are inconsistent with reality, optimises based on it, then justifies the nonsense that results with UB.

When the hardware behaviour and the pointer provenance model disagree, one should change the model, not change the behaviour of the program.

jcranmer 4 days ago

Give me an example of a program that violates pointer provenance (and only pointer provenance) that you think should be allowed under a reasonable programming model.

JonChesterfield 3 days ago

This is rather woven in with type-based alias analysis, which makes a hard distinction tricky. E.g. realloc doesn't work under either, but the provenance issue probably only shows up under -fno-strict-aliasing.

I like pointer tagging because I like dynamic language implementations. That tends to look like "summon a pointer from arithmetic", which gives the pointer a provenance unknown to the compiler, and that's where the "dereferencing a pointer without provenance is UB" demon strikes.
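A minimal sketch of low-bit pointer tagging, assuming every tagged object is at least 8-byte aligned (true for malloc on common 64-bit platforms); `value`, `tag_ptr`, `get_tag` and `untag_ptr` are hypothetical names. Under a PNVI-ae style model like the one in the TS mentioned later in the thread, the pointer-to-integer cast in `tag_ptr` exposes the object, so the pointer rebuilt in `untag_ptr` can pick its provenance back up.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical tagged-value type: a pointer with a small tag in its low bits. */
typedef uintptr_t value;

/* Assumes every tagged object is at least 8-byte aligned, leaving 3 free bits. */
#define TAG_MASK ((uintptr_t)0x7)

static value tag_ptr(void *p, unsigned tag) {
    uintptr_t bits = (uintptr_t)p;     /* pointer -> integer: exposes the object */
    assert((bits & TAG_MASK) == 0);    /* the low bits must really be free */
    return bits | ((uintptr_t)tag & TAG_MASK);
}

static unsigned get_tag(value v) {
    return (unsigned)(v & TAG_MASK);
}

static void *untag_ptr(value v) {
    /* integer -> pointer: clear the tag to recover the exposed object's address */
    return (void *)(v & ~TAG_MASK);
}
```

The tag travels with the integer, and the pointer only comes back into existence after the tag bits are masked off.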

jcranmer 3 days ago

I think you're misunderstanding pointer provenance, and you're being angry at a model that doesn't exist.

The failure mode of pointer provenance is converting an integer to a pointer to an object that was never converted to an integer. Tricks like packing integers into the unused bits of a pointer or packing pointers into floating-point NaNs don't violate pointer provenance: it's really no different from passing a pointer to an external function call and getting it back from a different external function call.
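For the NaN-packing trick, a minimal sketch, assuming pointers fit in the low 48 bits (typical for user-space addresses on x86-64 and AArch64) and using a hypothetical tag of `0xFFFC` in the top 16 bits, a quiet-NaN pattern that ordinary arithmetic is unlikely to produce (a real implementation would also canonicalise NaNs coming out of arithmetic). The pointer leaves through a cast to an integer and returns through a cast from one, so this is the exposed round trip described above, not an address conjured for an object that was never converted.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: top 16 bits 0xFFFC mark a boxed pointer (a quiet NaN),
   low 48 bits hold the address. */
#define BOX_TAG     UINT64_C(0xFFFC000000000000)
#define BOX_PAYLOAD UINT64_C(0x0000FFFFFFFFFFFF)

static uint64_t bits_of(double d) {
    uint64_t b;
    memcpy(&b, &d, sizeof b);   /* reinterpret the bits without breaking strict aliasing */
    return b;
}

static double box_ptr(void *p) {
    uint64_t addr = (uint64_t)(uintptr_t)p;   /* pointer -> integer: exposes the object */
    assert((addr & ~BOX_PAYLOAD) == 0);       /* assumes a 48-bit address */
    uint64_t b = addr | BOX_TAG;
    double d;
    memcpy(&d, &b, sizeof d);
    return d;
}

static int is_boxed_ptr(double d) {
    return (bits_of(d) & ~BOX_PAYLOAD) == BOX_TAG;
}

static void *unbox_ptr(double d) {
    /* integer -> pointer: recovers the address of the previously exposed object */
    return (void *)(uintptr_t)(bits_of(d) & BOX_PAYLOAD);
}
```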

JonChesterfield 3 days ago

That's definitely possible. The UB if no provenance information is available belief comes from https://www.cl.cam.ac.uk/~pes20/cerberus/clarifying-provenan..., in particular

> access via a pointer value with empty provenance is undefined behaviour

I'm annoyed that casting an aligned array of bytes to a pointer to a network packet type is forbidden, that a pointer to float can't be cast to a pointer to a SIMD vector of float, and that malloc can't be written in C, but perhaps those aren't provenance either.

jcranmer 3 days ago

> The UB if no provenance information is available belief comes from https://www.cl.cam.ac.uk/~pes20/cerberus/clarifying-provenan..., in particular

That's an old document. In particular, it largely argues for a PVI provenance model (i.e., integers carry provenance information), whereas the current TS relies on a PNVI provenance model (i.e., integers do not carry provenance information). https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2577.pdf is the last draft pre-TS-ification (i.e., it has all the background information needed to understand it).

> I'm annoyed that casting an aligned array of bytes to a pointer to a network packet type is forbidden, that a pointer to float can't be cast to a pointer to a SIMD vector of float, and that malloc can't be written in C, but perhaps those aren't provenance either.

That's all strict aliasing rules, not pointer provenance rules. (Well, malloc does have issues living in the penumbra of the C object model.) The big thing that provenance prevents you from doing is writing memcpy in C, since reading a pointer's bytes through char counts as exposing the pointer, whereas the PNVI model makes memcpy a non-exposing operation.
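Since the packet complaint is a strict-aliasing issue, the usual portable workaround is to copy the bytes into a real object with memcpy instead of casting the buffer pointer; compilers generally turn a small fixed-size memcpy like this into plain loads. A minimal sketch: `struct pkt_hdr` and `parse_hdr` are hypothetical, the fields are read in host byte order, and a real parser would also convert from network byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wire-format header; the layout is illustrative only. */
struct pkt_hdr {
    uint16_t src_port;
    uint16_t dst_port;
    uint32_t length;
};

/* Instead of reading through (const struct pkt_hdr *)buf, which accesses
   the bytes through an lvalue of an incompatible type, copy them into a
   genuine struct pkt_hdr object. Returns -1 if the buffer is too short. */
static int parse_hdr(const unsigned char *buf, size_t len, struct pkt_hdr *out) {
    if (len < sizeof *out)
        return -1;
    memcpy(out, buf, sizeof *out);
    return 0;
}
```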