fooker 12 hours ago

There is a rematerialize pass; there is no real reason to couple it with register allocation. LLVM regalloc is already somewhat subpar.

What would be neat is to expose all the right knobs and levers so that frontend writers can benchmark a number of possibilities and choose the right values.

I understand this is easier said than done, of course.

pizlonator 12 hours ago | parent [-]

> There is a rematerialize pass, there is no real reason to couple it with register allocation

The reason to couple it to regalloc is that you only want to remat if it saves you a spill.
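
To make the "only remat if it saves a spill" rule concrete, here is a toy cost check. It is a sketch only, not how LLVM or any real allocator decides: the `Value` type, the `should_remat` helper, and the cycle costs are all hypothetical names and numbers chosen for illustration.

```python
from dataclasses import dataclass

# Assumed, made-up costs in cycles; real targets differ.
SPILL_COST = 4   # one store when the value is spilled
RELOAD_COST = 4  # one load at each use of the spilled value


@dataclass
class Value:
    name: str
    remat_cost: int     # hypothetical cost to recompute the value at a use
    has_register: bool  # True if the allocator found a register for it


def should_remat(v: Value, num_uses: int) -> bool:
    """Rematerialize only when the value would otherwise be spilled
    and recomputing it at every use is cheaper than spill plus reloads."""
    if v.has_register:
        # Nothing to save: the value already lives in a register.
        return False
    spill_total = SPILL_COST + RELOAD_COST * num_uses
    remat_total = v.remat_cost * num_uses
    return remat_total < spill_total


cheap_const = Value("const", remat_cost=1, has_register=False)
in_register = Value("const", remat_cost=1, has_register=True)
expensive = Value("div", remat_cost=10, has_register=False)

print(should_remat(cheap_const, 2))
print(should_remat(in_register, 2))
print(should_remat(expensive, 2))
```

The decision needs `has_register`, which is only known during or after allocation; that dependency is the coupling being argued for here.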

fooker 12 hours ago | parent [-]

Remat can produce a performance boost even when everything has a register.

Admittedly, this comes up more often in non-CPU backends.

pizlonator 12 hours ago | parent [-]

> Remat can produce a performance boost even when everything has a register.

Can you give an example?

fooker 12 hours ago | parent [-]

Rematerializing 'safe' computation from across a barrier or thread sync/wait works wonders.

Also loads and stores and function calls, but that's a bit finicky to tune. We usually tell people to update their programs when this is needed.

pizlonator 11 hours ago | parent [-]

> Rematerializing 'safe' computation from across a barrier or thread sync/wait works wonders.

While this is literally "rematerialization", it's such a different case of remat from what I'm talking about that it should be a different phase. It's optimizing for a different goal.

Also feels very GPU-specific. So I'd imagine this being a pass you only add to the pipeline if you know you're targeting a GPU.

> Also loads and stores and function calls, but that's a bit finicky to tune. We usually tell people to update their programs when this is needed.

This also feels like it's gotta be GPU-specific.

No chance that doing this on a CPU would be a speed-up unless it saved you reg pressure.