Consistency diffusion language models: Up to 14x faster, no quality loss (together.ai)
56 points by zagwdt 3 hours ago | 11 comments
nl a minute ago
Releasing this on the same day as Taalas's 16,000 token-per-second acceleration for the roughly comparable Llama 8B model must hurt! I wonder how far down they can scale a diffusion LM? I've been playing with in-browser models, and the speed is painful.
MASNeo 20 minutes ago
I wish there were more of this research into speeding things up rather than building ever-larger models.
yjftsjthsd-h 2 hours ago
Is anyone doing any form of diffusion language model that is actually practical to run today on the actual machine under my desk? There are loads of more "traditional" .gguf options (well, quants) that are practical even on shockingly weak hardware, and I've been seeing things that give me hope that diffusion is the next step forward, but so far it's all been early research prototypes.
LarsDu88 2 hours ago
Google is working on a similar line of research. I wonder why they haven't rolled out a GPT-4o-scale version of this yet.
refulgentis 2 hours ago
If this means there's a 2x-7x speedup available to a scaled diffusion model like Inception Mercury, that'll be a game-changer. It feels 10x faster already…