cyanydeez 2 hours ago

yeah, then there's prompt loading too.

but anyone who can fit QWEN-3.6 35B with a sustained ~30 tokens/s and ~100k context with cache could print money as a hardware vendor.

wmf 2 hours ago | parent [-]

That just sounds like a 3090.

cyanydeez 10 minutes ago | parent [-]

not at the VRAM sizes that control how much context you can load; also, GPUs aren't as efficient as direct inference.
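
For a rough sense of why VRAM size limits context, here is a back-of-envelope sketch of KV cache memory for a plain transformer. The layer count, KV head count, and head dimension below are hypothetical placeholders, not the real config of any Qwen model, and models that use hybrid or compressed attention would need much less. The 4-bit weight figure is likewise an assumption.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_elem=2):
    """Approximate KV cache size: one key and one value vector per layer, head, and token."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem

def weight_bytes(n_params, bits_per_param):
    """Approximate weight storage at a given quantization width."""
    return n_params * bits_per_param // 8

# Hypothetical architecture numbers, chosen only for illustration.
kv = kv_cache_bytes(n_layers=48, n_kv_heads=8, head_dim=128, context_len=100_000)
weights = weight_bytes(n_params=35_000_000_000, bits_per_param=4)  # assumes 4-bit quantization

total_gb = (kv + weights) / 1e9
print(f"KV cache: {kv / 1e9:.1f} GB, weights: {weights / 1e9:.1f} GB, total: {total_gb:.1f} GB")
```

Under these assumed numbers the total lands well above the 24 GB on a 3090, though a different architecture or a quantized cache would change the picture a lot.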