|
| ▲ | 0cf8612b2e1e 3 hours ago | parent | next [-] |
| In the hypothetical fruit sorting example, if you have a hard budget of 10 ms to respond and the 7B takes 8 ms and the 14B takes 12 ms, there is your imaginary answer. It's regular engineering: you have to balance competing constraints instead of running the biggest model available. |
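To make the budget argument concrete, here is a minimal sketch of that selection rule. The latencies are the made-up numbers from the comment above, and `LATENCY_MS`, `SIZE_ORDER`, and `pick_model` are hypothetical names used only for illustration, not anyone's real API.

```python
from typing import Optional

# Hypothetical per-request latencies in milliseconds (numbers from the comment).
LATENCY_MS = {"7B": 8.0, "14B": 12.0}

# Assumed ordering from smallest to largest model.
SIZE_ORDER = ["7B", "14B"]


def pick_model(budget_ms: float) -> Optional[str]:
    """Return the largest model whose latency fits the budget, or None."""
    fitting = [name for name in SIZE_ORDER if LATENCY_MS[name] <= budget_ms]
    return fitting[-1] if fitting else None


if __name__ == "__main__":
    # With a hard 10 ms budget, only the 7B model qualifies.
    print(pick_model(10.0))
```

With a 10 ms budget the 14B model is simply not an option, however much more accurate it might be. A real system would presumably budget against worst-case latency rather than a single typical figure.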
|
| ▲ | 0xbadcafebee 3 hours ago | parent | prev | next [-] |
| ...because sometimes people need a faster answer? There are many possible reasons someone might need speed over accuracy. In the food sorting example, if lower accuracy means you waste more good peanuts, but the extra speed means you get rid of more bad peanuts overall, then you get fewer complaints about bad peanuts at the cost of a tiny amount of extra material waste. |
|
| ▲ | jwatte an hour ago | parent | prev [-] |
| Hard real time is a thing in some systems.
Also, the current approaches might have 85% accuracy -- if the LLM can deliver 90% accuracy while being "less exact", that's still a win! |