sho 2 days ago

So, this is the version that's able to serve inference from Huawei chips, although it was still trained on Nvidia. So unless I'm very much mistaken, this is the biggest and best model yet served on (sort of) readily-available Chinese-native tech. Performance and stability will be interesting to watch; OpenRouter is currently showing about 1.12 s latency and 30 tok/s, which isn't wonderful, but it's day one after all.
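A quick sketch of what those OpenRouter numbers mean in practice (the 500-token reply length is an arbitrary assumption for illustration):

```python
def response_time(ttft_s: float, tps: float, tokens: int) -> float:
    """Wall-clock time to stream a full reply:
    time-to-first-token plus decode time at the given tokens/sec."""
    return ttft_s + tokens / tps

# At 1.12 s TTFT and 30 tok/s, a 500-token reply takes ~17.8 s end to end.
print(round(response_time(1.12, 30, 500), 1))
```

Usable for chat, but noticeably slower than the frontier hosted endpoints.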

For reference, the Huawei Ascend 950 that this thing runs on is supposed to be roughly comparable to Nvidia's H100 from 2022. In other words, things are hotting up in the GPU war!

alpineman 2 days ago | parent | next [-]

Can't see how Nvidia justifies its valuation/forward P/E ratio given these developments, with on-device also becoming viable for 98% of people's AI needs.

aurareturn 2 days ago | parent [-]

On-device is incredibly far away from being viable. A $20 ChatGPT subscription beats the hell out of the 8B model that a $1,000 computer can run.

Nvidia's forward P/E ratio is only 20 for 2026. That's much lower than that of companies like Walmart and Costco. It's also growing nearly 100% YoY and has a $1 trillion backlog.

I think Nvidia is cheap.

littlestymaar 2 days ago | parent | next [-]

> On-device is incredibly far away from being viable. A $20 ChatGPT subscription beats the hell out of the 8B model that a $1,000 computer can run.

That's a very strange comment. Why would anyone run a dense model on a low-end computer? An 8B dense model only makes sense if you have a dGPU. And a Qwen3.6 or Gemma4 MoE isn't going to get "beaten the hell out of" for most tasks, especially if you use tools.

Finally, over the lifetime of your computer, a ChatGPT subscription will cost more than the computer itself! So the real question is whether you're better off with a $1,000 computer plus a ChatGPT subscription, or with a $2,000 computer alone (assuming a conservative four-year lifetime for the computer).
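The break-even arithmetic above can be sketched with the thread's own numbers (all prices are illustrative):

```python
MONTHS = 4 * 12  # assumed four-year machine lifetime

def total_cost(hardware: float, monthly_sub: float) -> float:
    """Total spend over the horizon: one-off hardware plus subscription."""
    return hardware + monthly_sub * MONTHS

cloud = total_cost(hardware=1000, monthly_sub=20)  # $1,000 PC + $20/mo ChatGPT
local = total_cost(hardware=2000, monthly_sub=0)   # $2,000 PC, local models only
print(cloud, local)  # 1960 vs 2000 — roughly a wash at the $20 tier
```

At a $200/month tier the same arithmetic tilts heavily toward local hardware, which is the point made in the next comment.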

My Strix Halo desktop (which I paid ~€1,700 for before OpenAI derailed the RAM market), paired with Qwen3.5, is a close replacement for a $200/month subscription, so the cost/benefit ratio strongly favors the local model in my use case.

The complexity of tracking model releases and installing everything needed for self-hosting is a valid argument against local models, but that's not at all the same as saying local models are too bad to use (which is complete BS).

vibe42 2 days ago | parent | prev | next [-]

I run both MoE and dense models on laptops.

One set of models runs on 8 GB VRAM / 16 GB RAM and another on 24 GB VRAM / 64 GB RAM; they're very useful for easy and moderately complex code, respectively.

The latest small open models are incredibly useful even at these sizes when configured properly (quantization level, sampling parameters, careful use of context, etc.).
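A rough rule of thumb for whether a quantized model fits a given VRAM budget (the 20% overhead factor covering KV cache and activations is a loose assumption, not a measured figure):

```python
def approx_vram_gb(params_b: float, bits_per_weight: float,
                   overhead: float = 1.2) -> float:
    """Crude VRAM estimate: weight bytes at the given quantization,
    inflated ~20% for KV cache and activations."""
    weight_bytes = params_b * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9

# An 8B model at 4-bit quantization needs roughly 4.8 GB — it fits in 8 GB VRAM.
print(round(approx_vram_gb(8, 4), 1))
```

The same formula shows why the 24 GB card handles much larger (or less aggressively quantized) models.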

2ndorderthought 2 days ago | parent | prev | next [-]

8B models can run on laptops. Of course a 1.8T model is more capable, but for a lot of tasks it really isn't 1,000x better.
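For scale, the raw parameter ratio behind that comparison (parameter count, of course, is not the same thing as capability):

```python
small_dense = 8e9      # an 8B laptop-class model
frontier = 1.8e12      # a 1.8T frontier-class model

# The frontier model has 225x the parameters — far short of 1,000x,
# and capability scales sub-linearly with parameter count anyway.
print(frontier / small_dense)
```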

midwain 2 days ago | parent | prev | next [-]

This is an assessment of the moment. When the rate of AI data-center construction slows down, the P/E will start to grow. Or are we saying the pace will only grow forever? There are already signs of a slowdown in construction.

jatora 2 days ago | parent [-]

What are these signs you are referencing? Source?

polski-g 2 days ago | parent [-]

Like, why would it slow down? If AI currently replaces 1% of human capability, how will things look when that number goes to 15%? When autonomous robots come to fruition as image recognition improves, demand for compute will skyrocket.

jatora 2 days ago | parent [-]

Exactly, that's why I meet this claim with skepticism. I know I hear news of such-and-such state or county trying to pass legislation against data centers, but I highly doubt that's picking up much speed.

alpineman 2 days ago | parent | prev | next [-]

I think you overestimate what most people are doing with AI. A 2B model can give out relationship advice and tell you how long to boil an egg.

biglyburrito 2 days ago | parent [-]

And honestly, what other types of questions would you ever need answers to?

dannyw 2 days ago | parent | prev [-]

I do think Nvidia isn't that badly priced; they still dominate training and have a proven record of execution.

The biggest risk I see is Nvidia hitting delays, bad luck with R&D, or meh generations for long enough to depress its growth projections; then everything gets revalued.

npodbielski 2 days ago | parent | prev [-]

Great! Can't wait to buy a decent GPU for inference for <$1k.