lumost 3 hours ago
Capex, opex, quality, and volume are tricky things to balance. On the whole, PC/mobile deployments are cheaper to operate than equivalent cloud and on-prem deployments. It's not unreasonable to suppose that in two years' time an Opus-5-quality model will be etched into silicon for high-performance local inference. Then you just upgrade your model every 2-3 years by upgrading your hardware.
jazzyjackson 2 hours ago | parent
I haven't been following anyone baking models into ASICs. Isn't it still necessary to pack just as many transistors onto a chip? Whether it's an NPU or a GPU, ASIC or not, you still need to hold hundreds of gigabytes in memory, so how is it cheaper to bake the model onto custom silicon than to run it on commodity VRAM? (Asking because I don't know!)
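The "hundreds of gigabytes" figure follows directly from parameter count times bytes per weight; a minimal back-of-envelope sketch (the 400B parameter count and the precisions are illustrative assumptions, not figures from the thread):

```python
# Back-of-envelope estimate of raw weight storage for a large model.
# Parameter count and precision below are illustrative assumptions.
def weights_gb(params_billions: float, bytes_per_param: float) -> float:
    """Raw weight storage in GB (1 GB = 1e9 bytes)."""
    return params_billions * 1e9 * bytes_per_param / 1e9

# A hypothetical 400B-parameter model:
print(weights_gb(400, 2.0))   # fp16/bf16 -> 800.0 GB
print(weights_gb(400, 0.5))   # 4-bit quantized -> 200.0 GB
```

Either way the weights dominate the silicon budget, which is why the memory question doesn't obviously go away just by moving from GPU to ASIC.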