Amazing. Is it possible to do this with Qwen 3.6 27B? Will it work with quants (I assume so)?

From a quick, shallow read of the paper, it looks very feasible (with a little tinkering) to adapt this to Qwen 3.6 27B. The process looks somewhat similar to training a LoRA, or in a way to distilling your own model: a mini model learns to imitate it, and you glue the two together. I might bite the bullet and rent a GPU to do it for 3.6 27B, as this would solve a lot of my problems.

sleepyeldrazi 3 hours ago | parent [-]

Scratch that, I don't have that kind of money, and 3.5's architecture diverges a bit more from 3's, so adapting it will be less trivial. It does look possible, just not on a student's paycheck.

Boranbruh 2 hours ago | parent [-]

There are websites that let you rent GPUs for cheap, such as QuickPod. Have you checked those P2P GPU rentals out?

sleepyeldrazi an hour ago | parent [-]

My plan is to first validate that it works at all on my 3090 using Qwen 3.5 0.8B (it has the same architecture as Qwen 3.6 27B, just scaled down). If it does, I'll put up a git repo documenting the process in case anyone wants to use my approach, while I try to convince my uni to lend me H100s for a day.