sbinnee 9 hours ago

> What’s improved? Language consistency: fewer CN/EN mix-ups & no more random chars.

It's good that they made this improvement. But are there any advantages at this point to using DeepSeek over Qwen?

twotwotwo 4 hours ago

The fast Cerebras thing got me to try the Qwen3 models. I couldn't get them working all that well: they had trouble using the required output format and following instructions. On the other hand, benchmarks say they should be great, and it sounds like maybe some people use them OK via different tools.

I'm curious if my experience was unusual (it very much could be!) and I'd be interested to hear from anyone who's used both.

IgorPartola 8 hours ago

I wish there was some easy resource to keep up with the latest models. The best I have come up with so far is asking one model to research the others. Realistically I want to know latest versions, best use case, performance (in terms of speed) relative to some baseline, and hardware requirements to run it.

__mharrison__ 5 hours ago

I use Aider heavily and find their benchmark to be pretty good. It is updated relatively frequently (the last update was a month ago, which may be an eternity in AI time).

https://aider.chat/docs/leaderboards/

Jgoauh 7 hours ago

Have you tried https://artificialanalysis.ai/

JimDugan 6 hours ago

It's a dumb collation of benchmarks that the big labs are essentially training on. Livebench.ai is the industry standard: non-contaminated, with new questions every few months.

IgorPartola 6 hours ago

Thanks! Are the scores in some way linear here? As in, if model A is rated at 25 and model B at 50, does that mean I will have half the mistakes with model B? Get answers that are 2x more accurate? Or is it subjective?
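A back-of-the-envelope check of the 25-vs-50 example, assuming that a score is simply the percentage of questions answered correctly. That is an assumption: leaderboards often average per-category scores, so check how yours aggregates before reading too much into it.

```python
# Assumption: a benchmark score is the percentage (0-100) of questions
# answered correctly. Real leaderboards may aggregate differently.


def error_rate(score: float) -> float:
    """Fraction of questions answered incorrectly for a 0-100 score."""
    return (100.0 - score) / 100.0


def relative_error_reduction(score_a: float, score_b: float) -> float:
    """How much smaller model B's error rate is than model A's, as a fraction."""
    return 1.0 - error_rate(score_b) / error_rate(score_a)


if __name__ == "__main__":
    a, b = 25.0, 50.0
    print(f"A wrong: {error_rate(a):.0%}, B wrong: {error_rate(b):.0%}")
    print(f"B makes {relative_error_reduction(a, b):.0%} fewer mistakes than A")
```

Under that reading, model B gets twice as many questions right but makes only about a third fewer mistakes (50% wrong vs 75% wrong), so "twice the score" does not mean "half the errors".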

exe34 8 hours ago

> asking one model to research the others.

that's basically choosing at random with extra steps!

throwup238 7 hours ago

I mean actual research, not spitting out an answer from its weights. Just ask Gemini/Claude to do deep research on /r/LocalLLama and HN posts.

comrade1234 8 hours ago

MIT license that lets you run it on your own hardware and make money off of it.

coder543 7 hours ago

Qwen3 models (including their 235B and 480B models) use the Apache-2.0 license, so it’s not like that’s a big difference here.

coder543 7 hours ago

They seem fairly competitive with each other. You would have to benchmark them for your specific use case.
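One rough way to do that benchmarking, sketched under assumptions: collect prompts from your own workload, attach a pass/fail check to each, and compare pass rates. `pass_rate`, `is_json_object`, `CASES`, and the two stub models are hypothetical names for illustration only; in practice the stubs would be replaced with real calls to each model's API.

```python
import json
from typing import Callable

# Hypothetical test-case shape: a prompt plus a predicate on the model's reply.
TestCase = tuple[str, Callable[[str], bool]]


def is_json_object(reply: str) -> bool:
    """True only if the whole reply parses as a JSON object (strict format check)."""
    try:
        return isinstance(json.loads(reply), dict)
    except json.JSONDecodeError:
        return False


def pass_rate(ask_model: Callable[[str], str], cases: list[TestCase]) -> float:
    """Fraction of test cases whose reply passes its check."""
    passed = sum(1 for prompt, check in cases if check(ask_model(prompt)))
    return passed / len(cases)


# Illustrative cases; a real suite would use prompts from your own use case.
CASES: list[TestCase] = [
    ('Reply only with a JSON object like {"answer": 4}. What is 2 + 2?', is_json_object),
    ("What is 2 + 2? Reply with just the number.", lambda r: r.strip() == "4"),
]


# Stub "models" for illustration; replace with real client calls.
def model_a(prompt: str) -> str:
    return '{"answer": 4}' if "JSON" in prompt else "4"


def model_b(prompt: str) -> str:
    # Wraps its JSON in chatty prose, so the strict format check fails.
    return 'Sure! Here it is: {"answer": 4}' if "JSON" in prompt else "4"


if __name__ == "__main__":
    for name, model in [("model_a", model_a), ("model_b", model_b)]:
        print(f"{name}: {pass_rate(model, CASES):.0%}")
```

With a few dozen real prompts instead of two, the pass rates give a rough, use-case-specific comparison that encodes what you actually care about (here, strict output format), which a general leaderboard can't.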