I'm thinking the next step would be to include this as a 'junior dev' and let Opus farm simple stuff out to it. It could run locally, but if it's on Cerebras, it could be really, really fast.

ttoinou 8 hours ago

Cerebras already has GLM 4.7 in its code plans.

vessenes 8 hours ago

Yep. But this is something like 10x faster: only 3B active parameters.

ttoinou 7 hours ago

Cerebras already does 200-800 tps. Do you need it even faster?

overfeed 6 hours ago

Yes! I don't read agent tokens as they're generated, so if code generation drops from 1 minute to 6 seconds, I'll be delighted. I'll even accept 10s -> 1s speedups. Considering how often I've seen agents spin their wheels trying different approaches, faster is always better, at least until models can 1-shot solutions without the repeated "No, wait..." / "Actually..." thinking loops.
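
To put rough numbers on the retry argument, here is a minimal sketch. It assumes an agent makes three attempts of 4,000 tokens each (illustrative figures, not measurements) and compares 200 tps, the low end of the Cerebras range quoted above, with a hypothetical 10x-faster 2,000 tps. The function name `generation_seconds` is made up for illustration.

```python
# Hypothetical sketch: wall-clock time an agent spends generating tokens
# across several attempts. Token counts and throughputs are illustrative
# assumptions, not measurements.

def generation_seconds(tokens_per_attempt: int, attempts: int, tokens_per_sec: float) -> float:
    """Total time spent generating output across all attempts."""
    return tokens_per_attempt * attempts / tokens_per_sec

if __name__ == "__main__":
    # Assume 3 attempts of 4,000 tokens each (e.g. two "No, wait..." retries).
    slow = generation_seconds(4_000, 3, 200)    # low end of the quoted range
    fast = generation_seconds(4_000, 3, 2_000)  # hypothetical 10x throughput
    print(f"{slow:.1f} s -> {fast:.1f} s ({slow / fast:.0f}x faster)")
```

Because total time scales inversely with throughput, every wasted attempt gets the same 10x cut, which is why faster generation keeps paying off as long as agents loop.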

pqtyw 3 hours ago

> until models can 1-shot solutions without the repeated "No, wait..." / "Actually..." thinking loops

That would imply they'd have to actually be smarter than humans, not just faster and able to scale infinitely. IMHO that's still very far away.