Remix.run Logo
refulgentis 12 hours ago

I'm very worried for both.

Cerebras requires a $3K/year membership to use APIs.

Groq's been dead for about 6 months, even pre-acquisition.

I hope Inception is going well, it's the only real democratic target at this. Gemini 2.5 Flash Lite was promising but it never really went anywhere, even by the standards of a Google preview

nl 12 hours ago | parent | next [-]

Taalas is interesting. 16,000 TPS for Llama on a chip.

https://taalas.com/

Nihilartikel 2 hours ago | parent | next [-]

Neat! I had been wondering if anyone was trying to implement a model in silico. We're getting closer to having chatty talking toasters every day now!

empath75 2 hours ago | parent [-]

"What is my purpose..."

https://www.youtube.com/watch?v=sa9MpLXuLs0

micw 9 hours ago | parent | prev | next [-]

On a very old model, it's more like 16.000 garbage words/s

patapong 4 hours ago | parent | next [-]

I do wonder if there are tasks where 16k garbage words/s are more useful than 200 good words per second. Does anyone have any ideas? Data extraction perhaps?

nl 9 hours ago | parent | prev [-]

Llama 3.1 8B is pretty useful for some thing. I use it to generate SQL pretty reliably for example.

They are doing an updated model in a month or so anyway, then a frontier level one "by summer".

replete 8 hours ago | parent | prev | next [-]

Its exciting to see, but look at the die size for only an 8b model

DeathArrow 9 hours ago | parent | prev [-]

I wonder how many token per seconds can they get if they put Mercury 2 on a chip.

freeqaz 12 hours ago | parent | prev | next [-]

You can call Cerebras APIs via OpenRouter if you specify them as the provider in your request fyi. It's a bit pricier but it exists!

andai 11 hours ago | parent [-]

I used their API normally (pay per token) a few weeks ago. Their Coding Plan appears to be permanently sold out though.

ainch 12 hours ago | parent | prev | next [-]

I don't think it's a good comparison given Inception work on software and Cerebras/Groq work on hardware. If Inception demonstrate that diffusion LLMs work well at scale (at a reasonable price) then we can probably expect all the other frontier labs to copy them quickly, similarly to OpenAI's reasoning models.

refulgentis 12 hours ago | parent [-]

Definitely depends on what you're buying, maybe some of the audience here was buying Groq and Cerebras chips? I don't think they sold them but can't say for sure.

If you're a poor schmoke like me, you'd be thinking of them as API vendors of ~1000 token/s LLMs.

Especially because Inception v1's been out for a while and we haven't seen a follow-the-leader effect.

Coincidentally, that's one of my biggest questions: why not?

Leynos 6 hours ago | parent | prev | next [-]

Cerebras are on OpenRouter.

estsauver 12 hours ago | parent | prev | next [-]

I am currently using their APIs on a paygo plan, I think it might just be a capacity issue for new sign ups.

7thpower 12 hours ago | parent | prev | next [-]

What do you mean by Grow is dead since about 6 months ago? Not refuting your point, but I’m curious.

refulgentis 12 hours ago | parent [-]

No new model since GPT-OSS 120B, er maybe Kimi K2 not-thinking? Basically there were a couple models it normally obviously support, and it didn't.

Something about that Nvidia sale smelled funny to me because the # was yuge, yet, the software side shut down decently before the acquisition.

But that's 100% speculation, wouldn't be shocked if it was:

"We were never looking to become profitable just on API users, but we had to have it to stay visible. So, yeah, once it was clear an Nvidia sale was going through, we stopped working 16 hours a day, and now we're waiting to see what Nvidia wants to do with the API"

behnamoh 10 hours ago | parent | prev [-]

Once again, it's a tech that Google created but never turned into a product. AFAIK in their demo last year, Google showed a special version of Gemini that used diffusion. They were so excited about it (on the stage) and I thought that's what they'd use in Google search and Gmail.