pizlonator 12 hours ago

> Remat can produce a performance boost even when everything has a register.

Can you give an example?

fooker 12 hours ago | parent [-]

Rematerializing 'safe' computation from across a barrier or thread sync/wait works wonders.

Also loads and stores and function calls, but that's a bit finicky to tune. We usually tell people to update their programs when this is needed.
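For readers who want the barrier case spelled out, here is a minimal C++ sketch of the transformation, not anyone's actual compiler pass. It assumes "safe" means a side-effect-free computation whose inputs are not modified across the barrier. The names `Barrier`, `keep_live`, and `rematerialized` are hypothetical, and the callback only stands in for a GPU barrier such as `__syncthreads()`.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for a GPU barrier (e.g. __syncthreads()).
// Any value live across this call has to be held somewhere while waiting.
using Barrier = std::function<void()>;

// Before: `scaled` is computed ahead of the barrier and stays live across it.
int keep_live(std::size_t tid, int stride, const std::vector<int>& data,
              const Barrier& barrier) {
    int scaled = static_cast<int>(tid) * stride + 1;  // pure, cheap to redo
    int partial = data[tid % data.size()];
    barrier();
    return partial + scaled;
}

// After: the same pure computation is redone after the barrier, so `scaled`
// is no longer live across it. Only valid because `tid` and `stride` are
// unchanged by the barrier (the assumed meaning of "safe").
int rematerialized(std::size_t tid, int stride, const std::vector<int>& data,
                   const Barrier& barrier) {
    int partial = data[tid % data.size()];
    barrier();
    int scaled = static_cast<int>(tid) * stride + 1;  // rematerialized here
    return partial + scaled;
}

// Checks that both forms agree for a range of thread ids.
bool forms_agree(std::size_t threads, int stride, const std::vector<int>& data) {
    Barrier no_op = [] {};
    for (std::size_t tid = 0; tid < threads; ++tid) {
        if (keep_live(tid, stride, data, no_op) !=
            rematerialized(tid, stride, data, no_op)) {
            return false;
        }
    }
    return true;
}
```

Both functions return the same value; the only difference is whether `scaled` is live across the barrier. Whether the second form is actually faster depends on the target, which this sketch does not measure.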

pizlonator 11 hours ago | parent [-]

> Rematerializing 'safe' computation from across a barrier or thread sync/wait works wonders.

While this is literally "rematerialization", it's such a different case of remat from what I'm talking about that it should be a different phase. It's optimizing for a different goal.

Also feels very GPU specific. So I'd imagine this being a pass you only add to the pipeline if you know you're targeting a GPU.

> Also loads and stores and function calls, but that's a bit finicky to tune. We usually tell people to update their programs when this is needed.

This also feels like it's gotta be GPU specific.

No chance that doing this on a CPU would be a speed-up unless it reduced register pressure.