Flere-Imsaho 6 hours ago

Yeah the future is probably a number of highly specialised small models you can run on your own hardware rather than massive frontier models in the cloud.

That's what I'm betting on anyway.

girvo 3 hours ago | parent | next [-]

Step 3.7 Flash on my Asus GB10-based mini PC is incredibly close to that today. I’m very impressed, and that’s without MTP to boost performance.

thewebguyd 6 hours ago | parent | prev | next [-]

That seems to be what Microsoft is betting on too, based on what was shown at the BUILD keynote today, plus the new Surface Ultra and the Surface mini PC with the new Nvidia chip. Nadella really played up local AI as the main use case they have in mind.

search_facility 6 hours ago | parent | prev [-]

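MoE models basically work that way already. Qwen and others with a low active-parameter count (the A-number in the name) make it practical to run inference on big models locally: only the active params are used for each token, so the compute and memory-bandwidth cost per token is that of a much smaller model, even though the full set of weights still has to be stored somewhere.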
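
A rough back-of-the-envelope sketch of the A-number arithmetic, in case it helps. It assumes the `<total>B-A<active>B` naming pattern (as in `Qwen3-30B-A3B`) and a flat bits-per-weight figure for quantization. The helper names are made up for illustration, and the numbers cover weights only, ignoring KV cache, activations, and runtime overhead. It separates the weights touched per token from the total that has to be stored somewhere (VRAM, RAM, or offloaded).

```python
import re


def parse_moe_name(name: str) -> tuple[float, float]:
    """Return (total_billion, active_billion) from a name like 'Qwen3-30B-A3B'.

    Assumes the '<total>B-A<active>B' convention; other naming schemes raise.
    """
    m = re.search(r"(\d+(?:\.\d+)?)B-A(\d+(?:\.\d+)?)B", name)
    if m is None:
        raise ValueError(f"no '<total>B-A<active>B' pattern in {name!r}")
    return float(m.group(1)), float(m.group(2))


def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights in GiB at a given quantization width."""
    return params_billion * 1e9 * bits_per_weight / 8 / 2**30


total_b, active_b = parse_moe_name("Qwen3-30B-A3B")

# Assumption: roughly 4 bits per weight, as with common 4-bit quantizations.
bits = 4.0
print(f"stored weights:   {weight_gib(total_b, bits):.1f} GiB")
print(f"read per token:   {weight_gib(active_b, bits):.1f} GiB")
```

The gap between the two figures is why these models feel fast on modest hardware: per-token work scales with the active count, while capacity still has to cover the total.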