yjftsjthsd-h 3 hours ago:
Is anyone doing any form of diffusion language model that's actually practical to run today on the machine under my desk? There are loads of more "traditional" .gguf options (well, quants) that are practical even on shockingly weak hardware, and I've been seeing things that give me hope that diffusion is the next step forward, but so far it's all been early research prototypes.
janalsncm 2 minutes ago:
I worked on it for a more specialized task (query rewriting). It's blazing fast. The catch is that most inference code today is set up for autoregressive decoding, and diffusion tooling is less mature; I'm not sure whether Ollama or llama.cpp support it.
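(A toy sketch of the decoding-loop difference being described, assuming a masked-diffusion-style sampler; `dummy_logits`, the vocabulary size, and the unmasking schedule are all made up for illustration and stand in for a real model, not any actual library's API.)

```python
import numpy as np

VOCAB = 50   # toy vocabulary size (assumption)
MASK = -1    # sentinel id for a still-masked position
rng = np.random.default_rng(0)

def dummy_logits(tokens):
    """Stand-in for a model forward pass: random logits per position."""
    return rng.normal(size=(len(tokens), VOCAB))

def autoregressive_decode(length):
    """The usual left-to-right loop: one forward pass per generated token."""
    out = []
    for _ in range(length):
        logits = dummy_logits(out + [MASK])   # predict only the next position
        out.append(int(np.argmax(logits[-1])))
    return out

def diffusion_decode(length, steps=4):
    """Start fully masked; each step predicts every position in parallel
    and commits only the most confident ones (one common schedule)."""
    tokens = [MASK] * length
    for step in range(steps):
        logits = dummy_logits(tokens)         # one pass over the whole sequence
        probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
        probs /= probs.sum(axis=-1, keepdims=True)
        conf = probs.max(axis=-1)
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        k = max(1, len(masked) // (steps - step))   # unmask a fraction per step
        for i in sorted(masked, key=lambda i: -conf[i])[:k]:
            tokens[i] = int(np.argmax(logits[i]))
    return tokens

print(autoregressive_decode(8))   # 8 forward passes, one token at a time
print(diffusion_decode(8))        # 4 forward passes, each over all positions
```

The speed claim comes from the second loop needing far fewer forward passes, each of which is wide and parallel rather than one-token-deep.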
Bolwin 3 hours ago:
Based on my experience running diffusion image models, I really hope this isn't going to take over anytime soon. Parallel decoding may be great if you have a nice parallel GPU or NPU, but it's dog slow on CPUs.