dnhkng (6 hours ago):
Thanks! I have pushed basic code to GitHub (https://github.com/dnhkng/RYS). An interesting area to explore might be a combination of deleting some layers and duplicating others: reduce VRAM by dropping some layers (this works and is well documented), then recover performance by duplicating others, which costs no extra VRAM because the duplicates share weights. I am not pursuing this, but it seems interesting!
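The drop-and-duplicate idea can be sketched in a few lines of PyTorch. This is a toy stand-in (a stack of linear blocks, not the RYS code): repeated indices in the schedule reuse the same module object, so duplication adds depth but no parameters, while dropped indices free theirs.

```python
import torch
import torch.nn as nn

class TinyStack(nn.Module):
    """Toy stand-in for a transformer: a stack of same-shaped blocks."""
    def __init__(self, dim=16, depth=6):
        super().__init__()
        self.layers = nn.ModuleList(nn.Linear(dim, dim) for _ in range(depth))

    def forward(self, x):
        for layer in self.layers:
            x = torch.relu(layer(x))
        return x

def remap_layers(model, schedule):
    """Rebuild the layer list from a schedule of original indices.
    Dropped indices simply don't appear; repeated indices reuse the
    SAME module object, so duplication adds no parameters / VRAM."""
    model.layers = nn.ModuleList(model.layers[i] for i in schedule)
    return model

torch.manual_seed(0)
m = TinyStack(dim=16, depth=6)
# parameters() deduplicates shared tensors, so this counts unique weights
params_before = sum(p.numel() for p in m.parameters())
# Drop layers 2 and 4; run layers 1 and 5 twice each (shared weights).
m = remap_layers(m, [0, 1, 1, 3, 5, 5])
params_after = sum(p.numel() for p in m.parameters())
```

Here the remapped model still runs 6 blocks per forward pass, but only stores 4 blocks' worth of weights; `params_after` counts 4 unique linear layers against the original 6.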
vessenes (4 hours ago), in reply:
Thanks -- interesting. I like the idea of ablating layers. I guess you could get a differentiable stack with a layer-skip gate and a layer-copy/loop gate plus a total-memory-use loss term; that would let someone ship either a big or a little model from one training run, depending on whether the gates mostly ablate or mostly copy. The expert routing for longer sequences interests me a lot, because the bottleneck for edge inference is always memory bandwidth.
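One way to make the keep/skip decision differentiable is a learnable sigmoid gate on each residual block, with the summed gate values standing in for the memory-use loss. This is only a sketch of the skip half of the idea (the copy/loop gate would need a similar relaxation over repeat counts), on a toy residual stack rather than any real model:

```python
import torch
import torch.nn as nn

class GatedBlock(nn.Module):
    """Residual block with a learnable keep-gate.
    gate -> 0 approximates skipping (ablating) the block."""
    def __init__(self, dim):
        super().__init__()
        self.fn = nn.Linear(dim, dim)
        self.gate_logit = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        g = torch.sigmoid(self.gate_logit)  # soft keep-probability in (0, 1)
        return x + g * torch.relu(self.fn(x))

class GatedStack(nn.Module):
    def __init__(self, dim=16, depth=6):
        super().__init__()
        self.blocks = nn.ModuleList(GatedBlock(dim) for _ in range(depth))

    def forward(self, x):
        for b in self.blocks:
            x = b(x)
        return x

    def memory_loss(self):
        # Sum of gates ~ expected number of blocks kept resident in VRAM.
        return sum(torch.sigmoid(b.gate_logit) for b in self.blocks).squeeze()

torch.manual_seed(0)
model = GatedStack(dim=16, depth=6)
x = torch.randn(2, 16)
task_loss = model(x).pow(2).mean()          # placeholder task objective
total = task_loss + 0.1 * model.memory_loss()
total.backward()  # gates get gradient from both the task and memory terms
```

Turning the penalty weight up or down is the knob that trades depth for memory: a heavy memory penalty drives gates toward zero (a smaller shippable model), a light one keeps the full stack. Hard 0/1 gates at ship time would need a threshold or a straight-through estimator on top of this.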