Rotary GPU: Exploring Local Execution for Large MoE Models Under Limited VRAM

martinald 3 hours ago | parent | next [-]

Why is this a paper? Isn't it just using the --n-cpu-moe option in llama.cpp? What am I missing here?

	Farmadupe 2 hours ago | parent | next [-]
		It's amazingly vacuous, isn't it? I think the most interesting part was that they were surprised llama.cpp crashed when they passed it a bad set of command-line arguments. Yet in the section immediately above that observation, they claimed to have run 10 whole completions with a 100% success rate. So who knows. I have to admit I slightly miss the flood of AI-psychosis research papers that seemed to be popping up a couple of months ago. Good to know there are still one or two new ones floating around.
	LoganDark 2 hours ago | parent | prev [-]
		Apparently the author has a patent about it, too.

Um, doesn't the 4060 laptop card have the ability to share system memory?

Wait... My mistake. Google AI says the 4060 mobile can access system memory, but the spec sheets say no.