Consistency diffusion language models: Up to 14x faster, no quality loss (together.ai)
56 points by zagwdt 3 hours ago | 11 comments
nl a minute ago
Releasing this on the same day as Taalas's 16,000 token-per-second acceleration for the roughly comparable Llama 8B model must hurt! I wonder how far down they can scale a diffusion LM? I've been playing with in-browser models, and the speed is painful.
MASNeo 20 minutes ago
I wish there were more of this research into speeding things up rather than building ever-larger models.
yjftsjthsd-h 2 hours ago
Is anyone doing any form of diffusion language model that is actually practical to run today on the actual machine under my desk? There are loads of more "traditional" .gguf options (well, quants) that are practical even on shockingly weak hardware, and I've been seeing things that give me hope that diffusion is the next step forward, but so far it's all been early research prototypes.
LarsDu88 2 hours ago
Google is working on a similar line of research. I wonder why they haven't rolled out a GPT-4o-scale version of this yet.
refulgentis 2 hours ago
If this means there's a 2x-7x speedup available to a scaled diffusion model like Inception Mercury, that'll be a game-changer. It feels 10x faster already…