Rotary GPU: Exploring Local Execution for Large MoE Models Under Limited VRAM (arxiv.org)
28 points by dryarzeg 7 hours ago | 4 comments
martinald 3 hours ago | parent | next [-]

Why is this a paper? Isn't it just using the --n-cpu-moe option in llama.cpp? What am I missing here?

Farmadupe 2 hours ago | parent | next [-]

It's amazingly vacuous, isn't it? I think the most interesting part was that they were surprised llama.cpp crashed when they used a bad set of command-line arguments.

Although in the section immediately above that observation, they claimed that they ran 10 whole completions with a 100% success rate. So who knows.

I have to admit I slightly miss the flood of AI-psychosis research papers that seemed to be popping up a couple of months ago. Good to know there's still one or two new ones floating around.

LoganDark 2 hours ago | parent | prev [-]

Apparently the author has a patent about it, too.

sandworm101 3 hours ago | parent | prev [-]

Um, doesn't the 4060 laptop card have the ability to share system memory?

Wait... My mistake. Google AI says the 4060 mobile can access system memory, but the tech sheets say no.