Tostino 2 days ago
This is something I have been thinking about integrating into a sampler for standard autoregressive LLMs. The idea is to buffer the last N tokens of the ongoing autoregressive generation; then, every K tokens, a section of that buffer (or perhaps the whole buffer) is handed to a diffusion model, guided by one or more specific commands telling it how to operate on the buffered text.

One application I envision for this kind of sampler is using the diffusion model to detect and correct instances of post-hoc reasoning within the buffer, so that proper causal reasoning chains are established in that segment before the autoregressive model continues generating. You could also allow slight, controlled backtracking or revision within the buffer window if the generation starts to go off-track, again using the diffusion model to smooth or adjust the text before committing it and moving forward.
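To make the control flow concrete, here's a minimal sketch of the loop I'm describing. Everything here is hypothetical: `next_token` stands in for one autoregressive decoding step, `refine` stands in for the diffusion pass over the buffer, and the names `refine_every` (K) and `buffer_size` (N) are just illustrative parameters, not any real library's API.

```python
from typing import Callable, List

def buffered_sample(
    next_token: Callable[[List[str]], str],    # one autoregressive step (stub)
    refine: Callable[[List[str]], List[str]],  # diffusion-style pass over the buffer (stub)
    max_tokens: int,
    refine_every: int = 4,   # K: how often the buffer is handed to the refiner
    buffer_size: int = 8,    # N: how many trailing tokens remain revisable
) -> List[str]:
    committed: List[str] = []  # tokens the refiner can no longer touch
    buffer: List[str] = []     # trailing window still open to revision
    for step in range(max_tokens):
        # the AR model sees committed text plus the current (possibly revised) buffer
        buffer.append(next_token(committed + buffer))
        if (step + 1) % refine_every == 0:
            # e.g. fix post-hoc reasoning, smooth text, backtrack slightly
            buffer = refine(buffer)
        if len(buffer) > buffer_size:
            # oldest token falls out of the revision window and is committed
            committed.append(buffer.pop(0))
    return committed + buffer
```

With toy stubs (the "refiner" just uppercases tokens to mark them as revised), the shape of the behavior is visible: tokens that have passed through a refinement step come out transformed, while the newest tokens past the last refinement boundary are still raw.

```python
out = buffered_sample(
    next_token=lambda ctx: f"t{len(ctx)}",
    refine=lambda buf: [t.upper() for t in buf],
    max_tokens=10,
)
print(out)  # ['T0', 'T1', 'T2', 'T3', 'T4', 'T5', 'T6', 'T7', 't8', 't9']
```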