com2kid 7 hours ago

> 2. WAY lower bandwidth requirements for inference. Means with approaches like this it should run on consumer hardware far better. It apparently requires 1/6th the memory bandwidth of a traditional approach for better results.

That should be the headline right there. Giant size-60-font headline.

Some people have PhDs in burying the lede!

talloaktrees 6 hours ago | parent

Except it's not true.

observationist 5 hours ago | parent

It's not that it's untrue; it's just that things are getting lost in the excitement. There are some specific cases where there's a big boost — it's just not exactly what people are hoping for.

>>>The "1/6th" specifically appears in community comparisons to DeepSeek's mHC (multi-lane highway connections, a prior technique for better depth-wise information flow in deep models). Several Chinese-language sources and downstream discussions (e.g., translated articles, YouTube breakdowns, and blogs like houdao.com) state that Block AttnRes achieves comparable (or better) performance to mHC while using only one-sixth of the data read/write volume (or memory bandwidth pressure) during inference/engineering deployment.

There are specific cases where that speedup does occur; it's not going to translate directly to local models, other architectures, or other hardware.

djsjajah 5 hours ago | parent

No. It seems to me that the comment is objectively incorrect. The original comment was talking about inference, and from what I can tell, a model using this approach is strictly going to run slower than a model trained to the same loss without it (it has "minimal overhead"). The main point is that you won't need to train that model for as long.