| ▲ | vichle 6 hours ago |
| What type of hardware do I need to run a small model like this? I don't do Apple. |
|
| ▲ | bodegajed 6 hours ago | parent | next [-] |
1.5B models can run with CPU-only inference at around 12 tokens per second, if I remember correctly.
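A minimal sketch of what CPU-only inference looks like with llama-cpp-python, assuming a quantized ~1.5B GGUF file; the model filename, thread count, and prompt here are placeholders, not anything from this thread:

    # CPU-only inference sketch with llama-cpp-python (pip install llama-cpp-python).
    # Model path and settings are illustrative assumptions.
    from llama_cpp import Llama

    llm = Llama(
        model_path="qwen2.5-coder-1.5b-q4_k_m.gguf",  # any ~1.5GB quantized GGUF
        n_ctx=4096,       # context window
        n_threads=8,      # CPU threads; tune to your core count
        n_gpu_layers=0,   # keep everything on the CPU
    )

    out = llm("def quicksort(arr):", max_tokens=64)
    print(out["choices"][0]["text"])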
| |
| ▲ | moffkalast 6 hours ago | parent [-]
Ingesting multiple code files will take forever in prompt processing without a GPU, though; token generation will be the least of your worries. Especially when you don't just append but change things in random places, so prompt caching doesn't help.
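As a rough back-of-the-envelope sketch (the speeds below are illustrative assumptions, not measurements): prompt ingestion time is roughly prompt tokens divided by prompt-processing speed, which is why slow prompt processing hurts far more than slow token generation when whole files get re-ingested:

    # Rough estimate of prompt ingestion time; speeds are assumed, not measured.
    prompt_tokens = 8000          # a few code files' worth of context
    pp_cpu = 50                   # assumed CPU prompt-processing speed, tokens/s
    pp_gpu = 2000                 # assumed GPU prompt-processing speed, tokens/s

    print(f"CPU: {prompt_tokens / pp_cpu:.0f} s to ingest the prompt")   # ~160 s
    print(f"GPU: {prompt_tokens / pp_gpu:.0f} s to ingest the prompt")   # ~4 s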
|
|
| ▲ | jychang 6 hours ago | parent | prev [-] |
A 1.54GB model? You can run this on a Raspberry Pi.
| |
| ▲ | BoredomIsFun 3 hours ago | parent | next [-]
Performance of LLM inference consists of two independent metrics: prompt processing (compute-intensive) and token generation (memory-bandwidth-intensive). For autocomplete with a 1.5B model you can get away with an abysmal 10 t/s token generation rate, but you'd want prompt processing to be as fast as possible, which a Pi is incapable of.
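A common rule of thumb, sketched here with ballpark bandwidth figures that are assumptions rather than benchmarks: token generation reads every weight once per token, so its ceiling is roughly memory bandwidth divided by model size, while prompt processing is limited by raw compute instead:

    # Rule-of-thumb ceiling for token generation: bandwidth / model size.
    # Bandwidth numbers are approximate and for illustration only.
    model_bytes = 1.54e9                          # ~1.54GB quantized model

    for name, bw in [("Raspberry Pi 5", 17e9),    # ~17 GB/s LPDDR4X, approximate
                     ("Desktop DDR5", 60e9),      # approximate dual-channel figure
                     ("Discrete GPU", 500e9)]:    # approximate mid-range card
        print(f"{name}: ~{bw / model_bytes:.0f} tokens/s upper bound")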
| ▲ | gunalx 44 minutes ago | parent | prev [-]
If you mean on the new AI HAT with an NPU and integrated 8GB of memory, maybe.
|