talloaktrees 6 hours ago
except it's not true | ||||||||
observationist 5 hours ago
It's not not true, it's just that things are getting lost in the excitement. There are some specific cases where there's a big boost; it's just not exactly what people are hoping.

> The "1/6th" specifically appears in community comparisons to DeepSeek's mHC (multi-lane highway connections, a prior technique for better depth-wise information flow in deep models). Several Chinese-language sources and downstream discussions (e.g., translated articles, YouTube breakdowns, and blogs like houdao.com) state that Block AttnRes achieves comparable (or better) performance to mHC while using only one-sixth of the data read/write volume (or memory bandwidth pressure) during inference/engineering deployment.

There are specific cases where that speedup does occur; it's just not going to translate exactly to local models or other architectures or hardware.
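To make the "1/6th of the data read/write volume" framing concrete, here's a back-of-envelope sketch. This is my own illustration, not anything from the cited sources: it simply assumes the baseline keeps n parallel residual streams per layer (as hyper-connection-style designs do) while the alternative keeps one, so residual-stream memory traffic scales roughly with n. The function name, shapes, and byte counts are all hypothetical.

```python
# Hypothetical back-of-envelope estimate of residual-stream memory traffic.
# Assumption (illustrative only): each residual stream costs one read plus
# one write of the full hidden state per layer, so an n-stream design moves
# roughly n times the bytes of a single-stream design.

def residual_traffic_bytes(batch, seq, hidden, n_streams=1, bytes_per_elem=2):
    """Rough read+write bytes for the residual stream(s) of one layer (fp16)."""
    state = batch * seq * hidden * bytes_per_elem  # one hidden-state tensor
    return 2 * state * n_streams                   # read + write, per stream

single = residual_traffic_bytes(1, 4096, 4096, n_streams=1)
multi = residual_traffic_bytes(1, 4096, 4096, n_streams=6)
print(multi // single)  # prints 6: the single-stream design moves 1/6th the data
```

Under this (crude) model, a 6x difference in stream count yields exactly the 6x bandwidth ratio the community numbers quote, but real deployments also move weights, KV cache, and attention intermediates, which is why the end-to-end speedup on local models or different hardware won't match the headline figure.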
| ||||||||