horsawlarway an hour ago
I'm particularly curious to see how this plays out, and I seriously hope more labs focus on diffusion models for text.

My immediate thought: this performs slightly worse than the autoregressive gemma equivalent, but it may also let me effectively run better models as diffusion variants. For example, I can run 70b-120b autoregressive models locally right now, but I only get ~5-15 t/s, which just isn't fast enough for serious work. That caps me at the 20-36b models (e.g. gemma4), where I can get 100+ t/s on the same hardware.

So the question becomes: does the quality drop from a diffusion model outweigh the quality bump from using a larger model? Because if not... sounds like diffusion models have a lot of space to thrive.

---

Sadly, if they can't be hosted profitably, I question whether this space will actually be explored.