Thanks! I have other ideas, following Jeff Hawkins's Thousand Brains Project, but in this one I'm trying to get to cortical columns from the other side, from "standard" deep neural networks.
The short version: each layer trains itself independently using Hinton's Forward-Forward algorithm. Instead of propagating error gradients backward through the whole network, each layer has its own local objective: "real data should produce high activation norms, corrupted data should produce low ones." Gradients never cross layer boundaries. The human brain is massively parallel, and part of what makes that possible seems to be that it doesn't do backprop, so I'm using that as inspiration.
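To make the "local objective" concrete, here's a minimal numpy sketch of one Forward-Forward layer. All names, shapes, and hyperparameters (`theta`, `lr`, the toy batches) are my assumptions for illustration, not the actual implementation; the key property is that `local_step` computes a gradient entirely inside the layer.

```python
import numpy as np

rng = np.random.default_rng(0)

class FFLayer:
    def __init__(self, d_in, d_out, theta=2.0, lr=0.03):
        self.W = rng.normal(0.0, 0.1, (d_in, d_out))
        self.theta, self.lr = theta, lr

    def forward(self, x):
        # length-normalize the input so only its direction carries
        # information forward (as in the Forward-Forward paper)
        xn = x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-8)
        return np.maximum(xn @ self.W, 0.0)          # ReLU activations

    def local_step(self, x, positive):
        xn = x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-8)
        h = np.maximum(xn @ self.W, 0.0)
        g = (h ** 2).sum(axis=1)                     # "goodness" per sample
        p = 1.0 / (1.0 + np.exp(-(g - self.theta)))  # P(sample is real)
        # descend -log p for real data, -log(1-p) for corrupted data;
        # the chain rule stops here -- no gradient crosses the layer
        dL_dg = -(1.0 - p) if positive else p
        dL_dh = dL_dg[:, None] * 2.0 * h             # zero where ReLU is off
        self.W -= self.lr * xn.T @ dL_dh / len(x)
        return h   # the next layer treats this as plain input, no backprop

# toy demo: fixed "real" and "corrupted" batches get separated by goodness
layer = FFLayer(8, 16)
real = rng.normal(size=(32, 8))
fake = rng.normal(size=(32, 8))
for _ in range(300):
    layer.local_step(real, positive=True)
    layer.local_step(fake, positive=False)

g_real = (layer.forward(real) ** 2).sum(axis=1).mean()
g_fake = (layer.forward(fake) ** 2).sum(axis=1).mean()
```

After training, `g_real` ends up well above `g_fake`: each layer learns a discriminator over its own activations without ever seeing another layer's gradient.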
You're right that the brain has backward-projecting circuits. But those are mostly thought to carry contextual/modulatory signals, not error gradients in the backprop sense. I'm handling cross-layer communication through attention residuals (each layer dynamically selects which prior layers to attend to) and Hopfield memory banks (per-layer associative memory written via Hebbian outer products, no gradients needed).
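For the memory-bank half, here's what a gradient-free Hebbian write/read cycle can look like. This is the classical linear associative memory (write = one outer product, read = one matrix-vector product); a modern-Hopfield variant would add a softmax retrieval step. The class name, normalization, and readout rule are my illustrative assumptions, not the post's implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 64

class HebbianMemory:
    """One layer's associative bank: writes are outer products, no gradients."""
    def __init__(self, d):
        self.M = np.zeros((d, d))

    def write(self, key, value):
        k = key / (np.linalg.norm(key) + 1e-8)
        self.M += np.outer(value, k)      # single Hebbian write, O(d^2)

    def read(self, query):
        q = query / (np.linalg.norm(query) + 1e-8)
        return self.M @ q                 # one-shot linear recall

mem = HebbianMemory(d)
keys = rng.normal(size=(3, d))
vals = rng.normal(size=(3, d))
for k, v in zip(keys, vals):
    mem.write(k, v)

# query with a stored key: recall is the stored value plus cross-talk
recalled = mem.read(keys[0])
cos = recalled @ vals[0] / (np.linalg.norm(recalled) * np.linalg.norm(vals[0]))
```

In 64 dimensions random keys are nearly orthogonal, so `cos` comes back close to 1: the cross-talk from the other two stored pairs is small, and nothing in the write or read path ever needed a gradient.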
The part I'm most excited about is "sleep". During chat, user feedback drives reward-modulated Hebbian writes to the memory banks (instant, no gradients, like hippocampal episodic memory). Then a /sleep command consolidates those into weights by generating "dreams" from the bank-colored model and training on them with FF + distillation. No stored text needed, only the Hopfield state. The model literally dreams its memories into its weights.
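The wake/sleep loop can be sketched end to end in a few lines. Heavy assumptions here: the reward scaling, the "dream = bank readout of noise" generator, and the single-layer FF consolidation step are all my guesses at a minimal version of the idea; the real system presumably dreams through the whole bank-colored model and adds distillation.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 32

# --- wake: reward-modulated Hebbian writes, instant, no gradients ---
bank = np.zeros((d, d))
for _ in range(10):
    state = rng.normal(size=d)      # layer activation for this chat turn
    target = rng.normal(size=d)     # pattern to associate with it
    reward = 1.0                    # user feedback would set this value
    bank += reward * np.outer(target, state / np.linalg.norm(state))

def dream(n):
    # no stored text: a "dream" is the bank's readout of random noise,
    # so every dream lies in the subspace the writes carved out
    q = rng.normal(size=(n, d))
    q /= np.linalg.norm(q, axis=1, keepdims=True)
    out = q @ bank.T
    return out / (np.linalg.norm(out, axis=1, keepdims=True) + 1e-8)

# --- sleep: consolidate dreams into a layer's weights with FF steps ---
W = rng.normal(0.0, 0.1, size=(d, d))
theta, lr = 2.0, 0.05

def goodness(x):
    h = np.maximum(x @ W, 0.0)
    return (h ** 2).sum(axis=1)

probe = dream(64)                   # held-out dreams to measure progress
before = goodness(probe).mean()
for _ in range(300):
    x = dream(32)
    h = np.maximum(x @ W, 0.0)
    p = 1.0 / (1.0 + np.exp(-(goodness(x) - theta)))
    # positive-phase-only update: raise goodness on dreamed data;
    # (1 - p) makes the step self-limiting once goodness clears theta
    W += lr * x.T @ ((1.0 - p)[:, None] * 2.0 * h) / len(x)
after = goodness(probe).mean()
```

After "sleep", `after` is well above `before`: the layer's weights now assign high goodness to the subspace the Hebbian bank encoded, even though the original episodes were never stored as data, only as the bank matrix.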
Still early: I'm training a 100M-param model on TinyStories right now. Loss is coming down, but I don't have eval numbers yet.