jychang 6 hours ago

1.54GB model? You can run this on a Raspberry Pi.
BoredomIsFun 3 hours ago | parent
Performance of LLM inference consists of two largely independent metrics: prompt processing (compute intensive) and token generation (bandwidth intensive). For autocomplete with a 1.5B model you can get away with abysmal 10 t/s token generation, but you'd want prompt processing to be as fast as possible, which a Pi is incapable of.
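As a rough sketch of why (all the hardware numbers below are assumed, ballpark figures, not measurements): token generation speed is roughly memory bandwidth divided by model size, since every generated token re-reads all the weights, while prompt processing time scales with prompt length divided by available compute.

    # Back-of-the-envelope sketch; assumed, illustrative numbers only.
    params = 1.5e9          # ~1.5B parameter model
    bytes_per_param = 1.0   # assume ~8-bit quantization
    model_bytes = params * bytes_per_param

    def token_gen_tps(mem_bw_gbs):
        # Token generation is bandwidth bound: each token streams the weights once.
        return mem_bw_gbs * 1e9 / model_bytes

    def prompt_time_s(prompt_tokens, compute_tflops):
        # Prompt processing is compute bound: roughly 2 * params FLOPs per prompt token.
        return prompt_tokens * 2 * params / (compute_tflops * 1e12)

    # Pi-class board (assumed ~17 GB/s RAM, ~0.03 TFLOPS sustained on CPU):
    print(token_gen_tps(17), prompt_time_s(2000, 0.03))   # ~11 t/s, ~200 s to first token
    # Desktop GPU (assumed ~500 GB/s, ~50 TFLOPS):
    print(token_gen_tps(500), prompt_time_s(2000, 50))    # ~330 t/s, ~0.1 s to first token

Under those assumptions the Pi's generation rate is tolerable for autocomplete, but processing a 2000-token prompt would take minutes, which is the part that kills the use case.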
gunalx 44 minutes ago | parent
If you mean on the new AI HAT with an NPU and 8GB of integrated memory, maybe.