| ▲ | ilaksh 6 hours ago |
| Amazing. Is it possible to do this with Qwen 3.6 27B? Will it work with quants (I assume so)? |
|
| ▲ | sleepyeldrazi 3 hours ago | parent [-] |
| From a quick, shallow read of the paper, it looks very feasible (with a little tinkering) to adapt this to Qwen3.6 27B. The process looks somewhat similar to training a LoRA, or in a way to distilling your own model: a mini model learns to imitate it, and then you glue the two together. I might bite the bullet and rent a GPU to do it for Qwen3.6 27B, as this would solve a lot of my problems. |
| |
| ▲ | sleepyeldrazi 3 hours ago | parent [-] |
| Scratch that. I don't have that kind of money, and 3.5's architecture diverges a little more from 3's, so it will be a bit less trivial. It does look possible, just not on a student's paycheck. |
| ▲ | Boranbruh 2 hours ago | parent [-] |
| There are websites that let you rent GPUs cheaply, such as QuickPod. Have you checked out those P2P GPU rentals? |
| ▲ | sleepyeldrazi an hour ago | parent [-] |
| My plan is to first check whether it works at all using Qwen3.5 0.8B on my 3090 (it has the same architecture as Qwen3.6 27B, just scaled down). If it does, I'll put up a git repo describing the process in case anyone wants to use my approach, while I try to convince my uni to lend me H100s for a day. |
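For readers unfamiliar with the "mini model learns to imitate it" idea mentioned upthread, here is a minimal sketch of a generic knowledge-distillation loss in plain Python. It is not the paper's method, which the thread does not spell out. The function names, the temperature value, and the toy logits are all assumptions made for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw logits into probabilities, softened by a temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    Hypothetical helper: a generic distillation objective, assumed here
    only to illustrate a small model imitating a large one.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    # The T^2 factor is the usual rescaling so gradients stay comparable
    # across temperatures.
    return kl * temperature ** 2

# Toy next-token logits over a 3-token vocabulary (made-up numbers).
teacher = [2.0, 1.0, 0.1]
matching_student = [2.0, 1.0, 0.1]
mismatched_student = [0.1, 1.0, 2.0]

print(distillation_loss(teacher, matching_student))
print(distillation_loss(teacher, mismatched_student))
```

The loss is zero when the student reproduces the teacher's distribution and grows as the two diverge. A real run would compute this over model outputs on a GPU rather than hand-written lists.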
|
|
|