small_model 5 hours ago
Poverty of imagination here; there are plenty of uses for this, and it's a prototype at this stage.
ACCount37 5 hours ago
What uses, exactly? The prototype is silicon with Llama 3.1 8B etched into it. Today's 4B models already outperform it. A five-digit token rate is a major technical flex, but does anyone really need to run a very dumb model at this speed?

The only things that come to mind that could reap a benefit are asymmetric exotics like VLA action policies and the voice stages of V2V models. Both follow the pattern "small fast low-latency model backed by a large smart model," and both depend on model-to-model comms, which this doesn't demonstrate. In a way, it's an I/O accelerator rather than an inference engine. At best.
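For illustration, here is a minimal Python sketch of that "small fast model backed by a large smart model" pattern, in the style of speculative decoding: the cheap model proposes several tokens per round, and the expensive model accepts a prefix of them and supplies one correction. Everything here is an assumption for the sake of the sketch (the toy vocabulary, the stand-in agreement probability, the function names); none of it reflects the prototype under discussion.

```python
import random

random.seed(0)
VOCAB = 256      # toy vocabulary of byte-like tokens (assumption)
DRAFT_K = 4      # tokens the fast model proposes per round (assumption)
ACCEPT_P = 0.7   # toy stand-in for draft/verifier agreement rate (assumption)

def draft_propose(prefix, k):
    """Fast, low-latency model: cheaply guess the next k tokens.
    Here it's a uniform sampler standing in for the small model."""
    return [random.randrange(VOCAB) for _ in range(k)]

def verify_and_extend(prefix, proposals):
    """Slow, smart model: accept proposals until the first disagreement,
    then emit its own token for that position. One expensive call is
    amortized over several cheap draft tokens."""
    accepted = []
    for tok in proposals:
        if random.random() < ACCEPT_P:  # toy agreement check
            accepted.append(tok)
        else:
            break
    accepted.append(random.randrange(VOCAB))  # verifier's own next token
    return accepted

def generate(n_tokens):
    out, verifier_calls = [], 0
    while len(out) < n_tokens:
        proposals = draft_propose(out, DRAFT_K)
        out.extend(verify_and_extend(out, proposals))
        verifier_calls += 1
    return out[:n_tokens], verifier_calls

tokens, calls = generate(64)
print(f"{len(tokens)} tokens with {calls} expensive verifier calls "
      f"({len(tokens)/calls:.2f} tokens per call)")
```

Note what the sketch makes concrete about the objection above: every round requires the draft model's output to be shipped to the verifier and the verdict shipped back, so the value of a fast small model in this role hinges on exactly the model-to-model communication path the prototype doesn't demonstrate.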