starchild3001 5 days ago

Really appreciate the depth of this paper; it's a welcome change from the usual model announcement blog posts. The Zhipu/Tsinghua team laid out not just the 'what' but the 'how,' which is where the most interesting details are for anyone trying to build with or on top of these models.

The post-training methodology (Sec 3) is what really stands out to me. The idea of creating specialized 'expert models' for reasoning, agents, and chat, and then distilling their capabilities into a final unified model is a fascinating approach. It feels like a more structured way to solve the "jack of all trades, master of none" problem that can plague generalist models. Instead of just mixing all the data, they're essentially having a generalist learn from a committee of specialists.
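Roughly, I picture that distillation step like this (a toy sketch; every name here, from generate to sft_finetune to the model strings, is a placeholder I made up, not the paper's pipeline or any real API):

    # Toy sketch: distill a "committee of specialists" into one generalist.
    # All names below are placeholders for illustration only.

    def generate(model, prompt):
        # Placeholder: sample a response from `model` for `prompt`.
        return f"<{model} response to: {prompt}>"

    def sft_finetune(base_model, dataset):
        # Placeholder: supervised fine-tuning on (prompt, response) pairs.
        return f"{base_model}-distilled-on-{len(dataset)}-examples"

    experts = {
        "reasoning": "expert-reasoning-model",
        "agent": "expert-agent-model",
        "chat": "expert-chat-model",
    }

    prompts_by_domain = {
        "reasoning": ["Prove that the sum of two even numbers is even."],
        "agent": ["Use the search tool to find the latest release notes."],
        "chat": ["Explain beam search in two sentences."],
    }

    # 1) Each specialist answers prompts from its own domain.
    distill_set = []
    for domain, expert in experts.items():
        for prompt in prompts_by_domain[domain]:
            distill_set.append({"prompt": prompt, "response": generate(expert, prompt)})

    # 2) A single generalist student is fine-tuned on the pooled expert traces.
    unified_model = sft_finetune("base-model", distill_set)
    print(unified_model)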

A couple of the findings from their RL experiments are pure gold for anyone working in this space. The counter-intuitive result that a single-stage RL process at the full 64K context length outperforms a progressive, multi-stage approach (Fig 6) is a fantastic lesson. I've seen teams assume the opposite would be true. Also, the pragmatic choice to use an XML-like template for function calls to avoid JSON escaping hell (Fig 4) is a small but brilliant engineering decision that makes a huge difference in practice. Wrangling escaped code inside JSON turns out to be a mess.
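To make the escaping point concrete, here's a toy comparison; the tag names are invented for illustration and the paper's actual template in Fig 4 will differ:

    # Toy comparison of JSON vs XML-ish tool-call formats for a code argument.
    import json

    code_arg = 'print("hello")\nfor i in range(3):\n    print(i)'

    # JSON: the model must emit every \n and \" escape correctly.
    json_call = json.dumps({"name": "run_python", "arguments": {"code": code_arg}})
    print(json_call)

    # XML-like: the code block passes through verbatim, nothing to escape.
    xml_call = "\n".join([
        "<tool_call>",
        "<name>run_python</name>",
        '<arg name="code">',
        code_arg,
        "</arg>",
        "</tool_call>",
    ])
    print(xml_call)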

The performance on SWE-bench is impressive, putting it in the same league as much larger or proprietary models. What I’d love to see, and maybe others here have thoughts, is whether this hybrid training recipe holds up outside ARC-style evals. For example, do the agentic improvements transfer to messier, real-world workflows where APIs are undocumented, partial failures are common, and user input is full of ambiguity?

algo_trader 5 days ago | parent | next [-]

Are all these "post/mid-training tweaks" important if you have a specific domain with abundant/verified/synthetic data and labels?

Can a small team working on ASI or domain-specific models stick to scaling a 2024-era best-practices training stack? Or will they miss massive improvements?

starchild3001 4 days ago | parent [-]

> Are all these post/mid-training tweaks important with abundant, verified, synthetic domain data?

No. Many are aimed at cleaning/aligning noisy, mixed-domain data. With abundant, high-quality domain data, you can skip most of the complexity and focus on direct SFT/RL on your corpus.

> Can a small team stick to scaling 2024-era best practices?

2024 was the year of SFT. I believe fitting reasoning traces to your final responses via RL is the technique-du-jour of 2025. Jumping from SFT to RL training might be the biggest gain here, if RL can be applied to your problem (e.g. math, coding, etc.).
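As a concrete picture of that SFT-to-RL jump, here's a toy sketch of RL on a verifiable reward; sample and policy_gradient_step are placeholders standing in for your real rollout and update rule (GRPO/PPO/etc.), not a library API:

    # Toy sketch: RL on a verifiable reward, starting from an SFT checkpoint.
    import re

    def verifiable_reward(response, reference):
        # 1.0 if the last number in the response matches the reference answer.
        nums = re.findall(r"-?\d+(?:\.\d+)?", response)
        return 1.0 if nums and nums[-1] == reference else 0.0

    def sample(model, prompt, k=4):
        # Placeholder: draw k reasoning-trace rollouts from the current policy.
        return ["...reasoning steps... final answer: 42" for _ in range(k)]

    def policy_gradient_step(model, prompt, rollouts, rewards):
        # Placeholder: push the policy toward higher-reward rollouts.
        return model

    model = "sft-checkpoint"
    dataset = [{"prompt": "What is 6 * 7?", "answer": "42"}]

    for ex in dataset:
        rollouts = sample(model, ex["prompt"])
        rewards = [verifiable_reward(r, ex["answer"]) for r in rollouts]
        model = policy_gradient_step(model, ex["prompt"], rollouts, rewards)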

calmoo 5 days ago | parent | prev [-]

I don't want to call you out unnecessarily, but your writing heavily smells of LLMs.

Edit: looks like I'm not the first person to notice this about this poster, either. https://news.ycombinator.com/item?id=44279662

I think we have a duty to call this out, before the web becomes ridden with slop.

tomhow 4 days ago | parent | next [-]

Please don't do this here. If a comment seems unfit for HN, please flag it and email us at hn@ycombinator.com so we can have a look.

calmoo 3 days ago | parent [-]

Will do, thanks!

tomhow 2 days ago | parent [-]

Appreciated!

starchild3001 5 days ago | parent | prev | next [-]

Yes, I occasionally use LLMs for edits and rewrites. The opinions are mine. I thought most people do these days?

(Re: the other post you linked to: it's entirely my own thoughts.)

tomhow 4 days ago | parent | next [-]

We prefer the raw, flawed original version to the AI-polished version on HN. It makes HN feel more real and authentic. If other community members notice that your writing seems AI-enhanced, you've taken it too far.

calmoo 4 days ago | parent | prev [-]

I do admit your post had some apparent substance to it (at least from my naive perspective), but that substance is really diminished when it's passed through an LLM. I think something is lost when someone's unique writing is 'planed' by AI. I'd really encourage you to avoid this kind of editing and just use your own words.

sapphire42 5 days ago | parent | prev | next [-]

The comment you're replying to is 100% AI-generated. How does obviously LLM-generated content continually make it to the front of HN, and why in God's name are you being downvoted for calling this out??

"...a fascinating approach..." (LLMs think everything is fascinating)

"...they're essentially having a generalist learn from a committee of specialists..." (analogies, analogies)

"...where APIs are undocumented, partial failures are common, and user input is full of ambiguity..." (typical AI rule of three template with semantically similar parameters that contribute nothing to the overall meaning)

calmoo 5 days ago | parent | next [-]

It does worry me how defensive people can become over really obvious slop - I don't think I'm even particularly attuned to the style of LLM writing but it is incredibly obvious every time I see it. It's only going to get worse I think.

varelse 5 days ago | parent [-]

[dead]

unshavedyak 5 days ago | parent | prev [-]

> and why in God's name are you being downvoted for calling this out??

Tinfoil hat time, but perhaps the bots don't like being called out? I don't actually take that statement seriously, but it seems like an eventual avenue. They've long been seeding threads on Reddit to shape the initial hive mind; I imagine that's going to get more advanced and widespread.

jasonjmcghee 5 days ago | parent | prev | next [-]

> ...is what really stands out to me. The idea of...

> ...are pure gold for anyone working in this space...

Specifically OpenAI

HSO 5 days ago | parent | prev | next [-]

不管黑猫白猫，能捉到老鼠就是好猫 ("It doesn't matter whether the cat is black or white; as long as it catches mice, it's a good cat.")

dwaltrip 5 days ago | parent | prev | next [-]

I see your points, but is this actually slop in this case? Is the comment incorrect or misleading at all?

It felt interesting and informative to me, but I didn’t verify any of it.

Good eye btw.

ranyume 5 days ago | parent | prev [-]

You did call out.

calmoo 5 days ago | parent [-]

If you read my comment closely, I didn't deny calling anyone out.