Think pretty much anything is going to get a enormous speed boost if the model isn’t undergoing mem latency but is just inherently baked into the circuits asic style