The 31B is surprisingly fast too, for a dense model. On my machine, token generation (tg) runs at least twice as fast as I'd expect compared to other 30B-class models, probably thanks to the hybrid attention. Prompt ingestion is somewhat slower, though.