kamikazeturtles 11 hours ago

> It doesn't perform on par with Anthropic's models in my experience.

Why do you think that is the case? Are Anthropic's models just better, or do they train the models to somehow work better with the harness?

mmargenot 11 hours ago | parent | next [-]

It is more common now to improve models in agentic systems "in the loop" with reinforcement learning. Anthropic is [very likely] doing this in the backend to systematically improve the performance of their models specifically with their tools. I've done this with Goose at Block with more classic post-training approaches because it was before RL really hit the mainstream as an approach for this.

If you want to look at some of the tooling and process for this, check out verifiers (https://github.com/PrimeIntellect-ai/verifiers), hermes (https://github.com/nousresearch/hermes-agent) and accompanying trace datasets (https://huggingface.co/datasets/kai-os/carnice-glm5-hermes-t...), and other open source tools and harnesses.

mmargenot 7 hours ago | parent [-]

Here’s an explicit example of the above from today using the above dataset: https://x.com/kaiostephens/status/2040396678176362540?s=46

MrScruff 11 hours ago | parent | prev | next [-]

It's a good question, I've wondered that myself. I haven't used GLM-5 with CC but I've used GLM-4.7 a fair amount, often swapping back and forth with Sonnet/Opus. The difference is fairly obvious: on occasion I've mistakenly left GLM enabled when I thought I was using Sonnet, and could tell pretty quickly just based on the gap in problem-solving ability.

esafak 11 hours ago | parent | prev [-]

They're just dumber. I've used plenty of models. The harness is not nearly as important.

vidarh 10 hours ago | parent [-]

The harness, if anything, matters more with those other models because of how much dumber they are... You can compensate for some of the stupidity (but by no means all) with harnesses that try to compensate in ways that e.g. Claude Code does not, because it isn't necessary to do so for Anthropic's own models.