paulsmith a day ago
> say model returns the probability of the token "cat" on the second position as p_2("cat") = 0.3, while p_2("dog") = 0.6. We may want to replace "cat" with "dog", and use it in the subsequent iterations.

Might one tradeoff of speed/quality be a tree "search" for better outcomes by branching on logit choices? If a diffusion model is so much faster overall than AR, I might not mind hunting or backtracking for the best probabilities overall.
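One way to picture that is a small beam kept across the denoising iterations: at each step, branch on the top-k tokens at the most uncertain position and keep only the best few partial sequences by cumulative log-probability. Here's a rough Python sketch of the idea; denoise_step is a hypothetical stand-in for one diffusion iteration that returns per-position token probabilities (not any real model's API), and the numbers are arbitrary:

    import heapq
    import math

    def beam_over_denoising(tokens, denoise_step, num_iters=4, branch_k=2, beam_width=3):
        """Keep a small beam of partially-denoised sequences, branching on the
        top-k candidate tokens at the most uncertain position each iteration."""
        # Beam entries: (cumulative log-prob, sequence)
        beam = [(0.0, list(tokens))]
        for _ in range(num_iters):
            candidates = []
            for score, seq in beam:
                # denoise_step(seq) -> {position: {token: probability}} for the
                # positions still unresolved (hypothetical interface).
                position_probs = denoise_step(seq)
                if not position_probs:
                    candidates.append((score, seq))  # nothing left to resolve
                    continue
                # Branch on the position whose best token is least certain,
                # i.e. where committing to a single choice is riskiest.
                pos = min(position_probs, key=lambda p: max(position_probs[p].values()))
                top = sorted(position_probs[pos].items(), key=lambda kv: -kv[1])[:branch_k]
                for token, prob in top:
                    new_seq = list(seq)
                    new_seq[pos] = token
                    candidates.append((score + math.log(prob), new_seq))
            # Prune to the best few branches; "backtracking" happens implicitly,
            # since a branch that looked good early can be dropped later.
            beam = heapq.nlargest(beam_width, candidates, key=lambda c: c[0])
        return max(beam, key=lambda c: c[0])

With branch_k and beam_width kept small, the extra cost is a constant factor over greedy decoding, which is the kind of budget the diffusion speedup might buy back.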