Given that DeepSwe is one of the very few coding benchmarks worth taking a look at, this achieves a rather excellent result on it (not far from Opus 4.8).

Judging from the results and my own impression of 5.1 and other models, I think this is the best Chinese coding model by a not-insignificant margin.

▲

LaurensBER 10 hours ago | parent [-]

I've been very pleased with its performance over the last few days.

It's definitely not near Opus 4.8 level, but it's very impressive nonetheless, and it does do design extremely well.

▲

ebbi 9 hours ago | parent [-]

> it does do design extremely well

Better than Opus?

▲

osti 6 hours ago | parent [-]

I don't know what people mean when they say "design" lol. Is it for frontends?

▲

ebbi 2 hours ago | parent [-]

Yeah, that's what I mean anyway. Each model has certain design tropes it repeats everywhere, and some of them are very old-school or not really UI best practice. And then in the more ambitious cases, where you ask for a feature without being prescriptive about UI needs, the end result is sometimes atrocious, with weird font use, colours, etc.