▲ | Zacharias030 4 days ago | |||||||
There is no reason that it couldn’t be beneficial for training though. | ||||||||
▲ | cubefox 4 days ago | parent [-] | |||||||
Except that speculative decoding is de facto only an inference time optimization. But the H-Net architecture from the previous reference, which doesn't require tokens or speculative decoding, does something similar both for inference and training. | ||||||||
|