jeffhuys 5 hours ago

Check chatjimmy.ai

lelandbatey 3 hours ago

https://chatjimmy.ai is a demo of the "burn the model into an ASIC" approach sold by Taalas[0], which they use to run Llama 3.1 8B at ~17000 tokens per second.

[0] - https://taalas.com/products/