| ▲ | janalsncm an hour ago | |
I worked on it for a more specialized task (query rewriting). It’s blazing fast. A lot of inference code is set up for autoregressive decoding now. Diffusion is less mature. Not sure if Ollama or llama.cpp support it. | ||
| ▲ | stavros 3 minutes ago | |
How was the quality? | ||