| ▲ | jeffhuys 5 hours ago | |
Check chatjimmy.ai | ||
| ▲ | lelandbatey 3 hours ago | parent [-] | |
https://chatjimmy.ai is a demo of the "burn the model to an ASIC" approach sold by Taalas[0], which they use to run Llama 3.1 8B at ~17,000 tokens per second. | ||