nl 12 hours ago

Taalas is interesting. 16,000 TPS for Llama on a chip.

https://taalas.com/

Nihilartikel 2 hours ago | parent | next [-]

Neat! I had been wondering if anyone was trying to implement a model in silico. We're getting closer to having chatty talking toasters every day now!

empath75 2 hours ago | parent [-]

"What is my purpose..."

https://www.youtube.com/watch?v=sa9MpLXuLs0

micw 9 hours ago | parent | prev | next [-]

On a very old model, though, it's more like 16,000 garbage words/s.

patapong 4 hours ago | parent | next [-]

I do wonder if there are tasks where 16k garbage words per second are more useful than 200 good words per second. Does anyone have any ideas? Data extraction perhaps?

nl 9 hours ago | parent | prev [-]

Llama 3.1 8B is pretty useful for some things. For example, I use it to generate SQL pretty reliably.

They are doing an updated model in a month or so anyway, then a frontier-level one "by summer".
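For the curious, the SQL-generation use case mentioned above can be sketched roughly like this. This is a hypothetical illustration, not the commenter's actual setup: the prompt-building and sanity-check helpers are invented for the example, and the idea is simply that grounding a small model in the schema, plus a cheap validity check on its output, is what makes an 8B model "pretty reliable" for this task.

```python
# Hypothetical sketch of using a small model (e.g. Llama 3.1 8B) for
# text-to-SQL. Helper names are illustrative, not from the thread.

def build_sql_prompt(schema: str, question: str) -> str:
    """Ground the model in the table schema so a small model stays reliable."""
    return (
        "You are a SQL generator. Given the schema below, answer with a "
        "single SQL query and nothing else.\n\n"
        f"Schema:\n{schema}\n\n"
        f"Question: {question}\nSQL:"
    )

def looks_like_sql(text: str) -> bool:
    """Cheap sanity check on model output before running it on a database."""
    stripped = text.strip().rstrip(";").upper()
    return stripped.startswith(("SELECT", "WITH")) and "FROM" in stripped

schema = "CREATE TABLE orders (id INTEGER, total REAL, created_at TEXT);"
prompt = build_sql_prompt(schema, "What is the total revenue?")

# A model response like this would pass the sanity check:
candidate = "SELECT SUM(total) FROM orders;"
print(looks_like_sql(candidate))  # True
```

The prompt would then be sent to whatever local or hosted endpoint serves the model; only queries passing the check get executed.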

replete 8 hours ago | parent | prev | next [-]

It's exciting to see, but look at the die size for only an 8B model.

DeathArrow 9 hours ago | parent | prev [-]

I wonder how many tokens per second they could get if they put Mercury 2 on a chip.