| ▲ | Flere-Imsaho 6 hours ago | |
Yeah, the future is probably a number of highly specialised small models you can run on your own hardware rather than massive frontier models in the cloud. That's what I'm betting on anyway. | ||
| ▲ | girvo 3 hours ago | parent | next [-] | |
Step 3.7 Flash on my Asus GB10-based mini PC is incredibly close to that today. I'm very impressed, and that's without MTP to boost performance. | ||
| ▲ | thewebguyd 6 hours ago | parent | prev | next [-] | |
That seems to be what Microsoft is betting on too, based on what was shown at the Build keynote today, plus the new Surface Ultra and the Surface mini PC with the new Nvidia chip. Nadella really played up local AI as the main use case they have in mind. | ||
| ▲ | search_facility 6 hours ago | parent | prev [-] | |
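MoE models basically work that way already: Qwen etc. with low active params (the A-number in the name) make it feasible to run inference on big models locally (only the active params are read for each token, although the full set of weights still has to be stored somewhere). | ||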
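To put rough numbers on the total vs. active parameter distinction, here is a minimal back-of-the-envelope sketch. The tag format ("30B-A3B", total size followed by the A-number), the helper names `parse_moe_tag` and `weight_gib`, and the 4-bit quantization are all assumptions for illustration. It only counts weights and ignores KV cache, activations and runtime overhead.

```python
import re

def parse_moe_tag(tag: str) -> tuple[float, float]:
    """Parse a size tag like '30B-A3B' into (total, active) params in billions.

    The tag format is an assumption based on the 'A-number in the name'
    convention mentioned above; real model names may differ.
    """
    m = re.fullmatch(r"(\d+(?:\.\d+)?)B-A(\d+(?:\.\d+)?)B", tag)
    if m is None:
        raise ValueError(f"unrecognised size tag: {tag!r}")
    return float(m.group(1)), float(m.group(2))

def weight_gib(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GiB."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30

# Hypothetical example: 30B total, 3B active, 4-bit quantized weights.
total_b, active_b = parse_moe_tag("30B-A3B")
total_gib = weight_gib(total_b, 4)    # all experts, must be stored somewhere
active_gib = weight_gib(active_b, 4)  # weights actually read per token

print(f"all weights:      {total_gib:.2f} GiB")
print(f"active per token: {active_gib:.2f} GiB")
```

Under these assumptions the weights read per token are about a tenth of the full model, which is why per-token compute and memory traffic look like a much smaller dense model even though the whole thing still has to live in RAM or on disk.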