aurareturn 6 hours ago

Edit: it seems like this is likely one chip and not 10. I assumed an 8B model at 16-bit with 4K or more context, which made me think they must have chained multiple chips together, since an 850mm² N6 chip would only yield about 3GB of SRAM max. Instead, they seem to have etched Llama 8B at q3 with 1K context, which would indeed fit on a chip that size. (Rough numbers below.)
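
The back-of-envelope math behind that edit, with my own assumptions (Llama-3-8B-ish shapes: 32 layers, 8 KV heads, head_dim 128; nothing here is from Taalas):

  # Back-of-envelope sketch, not Taalas's numbers: does an 8B model fit in ~3GB of SRAM?
  PARAMS = 8e9

  fp16_gb = PARAMS * 2 / 1e9      # 16-bit weights: ~16 GB, nowhere close
  q3_gb   = PARAMS * 3 / 8 / 1e9  # 3-bit weights:  ~3 GB, right at the limit

  # KV cache per context length, assuming Llama-3-8B-like shapes (my assumption)
  def kv_cache_gb(ctx, layers=32, kv_heads=8, head_dim=128, bytes_per=2):
      return 2 * layers * kv_heads * head_dim * bytes_per * ctx / 1e9

  print(f"weights: fp16 {fp16_gb:.0f} GB, q3 {q3_gb:.0f} GB")                        # 16 vs 3 GB
  print(f"kv cache: 1K {kv_cache_gb(1024):.2f} GB, 4K {kv_cache_gb(4096):.2f} GB")   # ~0.13 vs ~0.54 GB

So 16-bit weights plus 4K context blow the ~3GB SRAM budget many times over, while q3 weights plus 1K context land right at it.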

This requires 10 chips for an 8-billion-parameter q3 model. 2.4kW.

10 reticle-sized chips on TSMC N6. Basically the silicon area of 10 Nvidia H100 GPUs.

The model is etched onto the silicon chip, so you can't change anything about the model after the chip has been designed and manufactured.

Interesting design for niche applications.

What is a task that is extremely high value, only requires a small model's intelligence, requires tremendous speed, is OK to run in the cloud given the power requirements, AND will be used for years without change since the model is etched into silicon?

pjc50 5 hours ago | parent | next [-]

Where are those numbers from? It's not immediately clear to me that you can distribute one model across chips with this design.

> The model is etched onto the silicon chip, so you can't change anything about the model after the chip has been designed and manufactured.

Subtle detail here: the fastest turnaround that one could reasonably expect on that process is about six months. This might eventually be useful, but at the moment it seems like the model churn is huge and people insist you use this week's model for best results.

aurareturn 5 hours ago | parent | next [-]

  > The first generation HC1 chip is implemented in the 6 nanometer N6 process from TSMC. Each HC1 chip has 53 billion transistors on the package, most of it very likely for ROM and SRAM memory. The HC1 card burns about 200 watts, says Bajic, and a two-socket X86 server with ten HC1 cards in it runs 2,500 watts.
https://www.nextplatform.com/2026/02/19/taalas-etches-ai-mod...
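
(The quoted 200W per card and 2,500W per server also square with the 2.4kW figure above; the split is my guess, the article doesn't break down the host's share:)

  # Sanity check on the ~2.5kW figure; the 500W host estimate is an assumption.
  cards = 10
  watts_per_card = 200    # per the NextPlatform quote
  host_watts = 500        # assumed overhead for the two-socket x86 server
  print(cards * watts_per_card + host_watts)   # 2500 W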
darkwater 4 hours ago | parent | next [-]

And what in that makes you assume that all 10 HC1 cards in a server are needed to run a single instance of the model?

dakolli 5 hours ago | parent | prev [-]

So it lights money on fire extra fast. AI-focused VCs are going to really love it then!!

adityashankar 5 hours ago | parent | prev | next [-]

This depends on how much better the models get from here on. If Claude Opus 4.6 were transformed into one of these chips and ran at a hypothetical 17k tokens/second, I'm sure that would be astounding; the question is how much better Claude Opus 5 will be compared to the current generation.

aurareturn 5 hours ago | parent | next [-]

I’m pretty sure they’d need a small data center to run a model the size of Opus.

empath75 3 hours ago | parent | prev [-]

Even an o3-quality model at that speed would be incredible for a great many tasks. Not everything needs to be Claude Code. Imagine Apple fine-tuning a mid-tier reasoning model on personal assistant/macOS/iOS sorts of tasks and burning a chip onto the Mac Studio motherboard. Could you run Claude Code on it? Probably not. Would it be 1000x better than Siri? Absolutely.

empath75 3 hours ago | parent | prev [-]

100x of a less capable model might be better than 1x of a better model for many, many applications.

This isn't ready for phones yet, but think of something like phones, where people buy new ones every 3 years: even a mediocre on-device model at that speed would be incredible for something like Siri.

danpalmer 5 hours ago | parent | prev | next [-]

Alternatively, you could run far more RAG and thinking to integrate recent knowledge. I would imagine models designed for this would put less emphasis on world knowledge and more on agentic search, something like the sketch below.
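
Purely as a hypothetical sketch of that split (search() and llm() here are stand-ins, not any real API; the etched model would sit behind llm()):

  # Hypothetical loop for a knowledge-light, fixed-weight model: the weights stay
  # frozen in silicon, recency comes from retrieval. Both functions are stubs.
  def search(query):
      return f"(search results for: {query})"

  def llm(prompt):
      return "FINAL: (answer grounded in the notes)"

  def answer(question, max_steps=4):
      notes = []
      for _ in range(max_steps):
          step = llm(f"Q: {question}\nNotes: {notes}\nNext search query, or FINAL: <answer>")
          if step.startswith("FINAL:"):
              return step[len("FINAL:"):].strip()
          notes.append(search(step))   # fresh knowledge lives outside the chip
      return llm(f"Q: {question}\nNotes: {notes}\nAnswer:")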

freeone3000 5 hours ago | parent [-]

Maybe; models with more embedded associations are also better at search. (Intuitively, this tracks: a model with no world knowledge has no awareness of synonyms or relations (a pure Markov model), so the more knowledge a model has, the better it can search.) It's not clear if it's possible to build such a model, since there doesn't seem to be a scaling cliff.

machiaweliczny 4 hours ago | parent | prev | next [-]

A lot of NLP tasks could benefit from this.

Shaanveer 5 hours ago | parent | prev | next [-]

CEO

charcircuit 5 hours ago | parent [-]

No one would ever give such a weak model that much power over a company.

teaearlgraycold 5 hours ago | parent | prev | next [-]

I'm thinking the best end result would come from custom-built models. An 8 billion parameter generalized model will run really quickly while not being particularly good at anything. But the same parameter count dedicated to parsing emails, RAG summarization, or some other specialized task could be more than good enough while also running at crazy speeds.

thrance 5 hours ago | parent | prev [-]

> What is a task that is extremely high value, only requires a small model's intelligence, requires tremendous speed, is OK to run in the cloud given the power requirements, AND will be used for years without change since the model is etched into silicon?

Video game NPCs?

aurareturn 5 hours ago | parent [-]

Doesn’t pass the high value and require tremendous speed tests.

thrance 2 hours ago | parent [-]

Video games are a huge market, and the speed and cost of current models are definitely huge barriers to integrating LLMs into them.