Chip CEO here. It really depends on what "design" or "production" means. Does "design" mean that the design was complete? Does "production" mean the beginning of production, i.e. tapeout? If measuring from RTL-freeze to tapeout, this is a fairly typical (even somewhat unimpressive) timeline (accounting for some unexpected issues) for a large, complex 3nm chip. If measuring from concept (no RTL at all, block diagram of architecture) to tapeout, this is an amazing timeline. The truth is probably somewhere in between. A more concrete statement would use actual technical milestones and gates.

▲

otterdude 7 hours ago | parent | next [-]

Not a chip CEO, but I read this article and thought that they're working on some kind of application-specific chip only for serving models. Similar to how an FPGA can optimize certain tasks.

Given constant weights / biases of a Transformer / DNN you could use pipelining to feed forward calculations through the array one layer at a time. For DNNs with thousands of layers you might see a 1:1 speedup per layer channel.

I doubt they would undergo this process for marginal gains.

▲

zgao 5 hours ago | parent | next [-]

Yes, my statement was not about the quality or performance of the chip -- simply the tapeout timeline that was stated, by itself.

▲

xdavidliu 7 hours ago | parent | prev [-]

I don't understand what the second paragraph is saying.

▲

nine_k 6 hours ago | parent | next [-]

In very crude terms, AFAICT, if you have a bunch of matrix multiplications, but one of the matrices (the one with model weights) doesn't change, you can seriously speed up the computation. One thing is that you don't need to re-fetch the elements of the constant matrix; you can keep it near the ALUs. Then you can maybe detect and ignore sparse / empty blocks by marking them once.
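
To make the "mark them once" part concrete, here is a minimal software sketch in Python. It only illustrates the idea and says nothing about how any real chip does it; the 2x2 block size and the names `build_block_mask`, `matvec_skip_empty` and `matvec_dense` are made up for the example.

```python
# Sketch: exploit a constant weight matrix by scanning it once for all-zero
# blocks, then skipping those blocks on every later multiplication.
# BLOCK, build_block_mask, matvec_skip_empty and matvec_dense are
# hypothetical names chosen for this illustration.

BLOCK = 2  # illustrative block edge length


def build_block_mask(weights, block=BLOCK):
    """One-time pass: list the top-left corners of tiles that hold a non-zero."""
    rows, cols = len(weights), len(weights[0])
    nonzero = []
    for r0 in range(0, rows, block):
        for c0 in range(0, cols, block):
            tile_has_value = any(
                weights[r][c] != 0
                for r in range(r0, min(r0 + block, rows))
                for c in range(c0, min(c0 + block, cols))
            )
            if tile_has_value:
                nonzero.append((r0, c0))
    return nonzero


def matvec_skip_empty(weights, nonzero_tiles, x, block=BLOCK):
    """Compute weights @ x while touching only the tiles marked non-zero."""
    rows, cols = len(weights), len(weights[0])
    y = [0] * rows
    for r0, c0 in nonzero_tiles:
        for r in range(r0, min(r0 + block, rows)):
            for c in range(c0, min(c0 + block, cols)):
                y[r] += weights[r][c] * x[c]
    return y


def matvec_dense(weights, x):
    """Reference dense product, used only to check the sparse version."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


W = [
    [1, 2, 0, 0],
    [3, 4, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 5, 6],
]
TILES = build_block_mask(W)  # done once, reused for every input vector
```

Here two of the four tiles are empty, so every later `matvec_skip_empty(W, TILES, x)` call does half the multiply-adds of the dense version. The scan cost is paid once because the weights never change.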

IDK how the custom hardware exploits this; would love to hear any ideas!

▲

guyomes 5 hours ago | parent | next [-]

> IDK how the custom hardware exploits this; would love to hear any ideas!

You might like this article [1], titled "FPGA-based CNN Acceleration using Pattern-Aware Pruning". More context and details can be found in the PhD thesis of Léo Pradels [2].

[1]: https://inria.hal.science/hal-04689673/document

[2]: https://theses.hal.science/tel-05021575v1/file/PRADELS_Leo.p...

▲

cm2187 3 hours ago | parent | prev [-]

Random thought. Once models stabilise, could you possibly hardcode the model in gates? Or are they too large for a single chip?

▲

8note 3 hours ago | parent [-]

https://www.anuragk.com/blog/posts/Taalas.html

▲

otterdude 6 hours ago | parent | prev [-]

Basically getting around the branch predictor problem with generalized compute architectures https://en.wikipedia.org/wiki/Branch_predictor

▲

pama an hour ago | parent | prev | next [-]

If you look at the timelines for the hiring of the hardware team, this was an extremely fast and high-risk implementation from concept to tapeout. Amazing it works at all during bringup.

▲

nonethewiser 7 hours ago | parent | prev | next [-]

>If measuring from RTL-freeze to tapeout, this is a fairly typical (even somewhat unimpressive) timeline (accounting for some unexpected issues) for a large, complex 3nm chip.

Even for a company’s first design?

▲

hailwren 7 hours ago | parent | next [-]

I don't think you get the newcomer novelty buff when your val approaches 13 digits.

▲

RugnirViking 5 hours ago | parent [-]

Big companies are lumbering behemoths, crude assemblages of barely cobbled-together incentives and principal-agent problems in a trenchcoat. Getting them to change direction, or worse, try something new at scale, is a massive undertaking.

▲

mlinhares 4 hours ago | parent [-]

Nah, you just need to get the CEO behind it. Most coordination issues get solved when the CEO is breathing down your neck to get something done. Trouble is that they don't do this enough.

▲

NBJack 3 hours ago | parent [-]

Eh, zero guarantees on that one.

The Fire Phone was Jeff Bezos' personal baby, and we know how that went. Then there was the Apple G4 Cube with Steve Jobs, the Model X's Falcon Wing doors and Elon, and let's not even talk about the Metaverse and Zuck.

▲

aleph_minus_one 3 hours ago | parent | next [-]

> The Fire Phone was Jeff Bezos' personal baby, and we know how that went.

I'd rather guess that Jeff Bezos' opinion on what makes a good phone is/was different from the opinion of many potential buyers.

▲

kQq9oHeAz6wLLS 2 hours ago | parent | prev [-]

Actually, you've provided examples that prove the point. None of those were especially good (though everyone wanted the G4 Cube), and yet they made it to market anyway. Why? Because the CEO was behind it, breathing down their necks.

▲

zgao 5 hours ago | parent | prev | next [-]

The typical way a chip effort in a non-chip company works is that the "design" is the RTL (e.g. SystemVerilog that defines the behavior of the chip) and then this is handed off to a third-party "design house" (such as Broadcom) that turns that code into a real image of a chip, which is called a GDS (basically you can think of this as a very big layer-by-layer Photoshop file) that can actually be sent to a fab. This is called "backend design", in contrast to the "frontend design" (the RTL itself).

As another commenter said, Broadcom is very experienced with backend design (as well as the supply chain management, testing, etc. that comes after the chip is taped out) and so this can't be regarded as a "first chip". Richard Ho (the head of hardware at OpenAI) is also extremely experienced and used to be the head of the Google TPU effort -- where he actually worked with Broadcom in a similar tapeout already. So yes, this is not a "first design"!

▲

surajrmal 3 hours ago | parent [-]

I wonder if Broadcom borrowed IP between the Google TPU and this design. How would you ever know it didn't happen?

▲

zgao 35 minutes ago | parent | next [-]

There is no real way to prevent this, but there are ways to increase the cost of doing so. For example, one level of obfuscation is, OAI could internally run synthesis and adopt a “netlist-in” model in which Broadcom gets a netlist - a description of a huge number of gates and wires and how they connect - instead of the plain Verilog (or other language). It is possible to reverse engineer the netlist, but it’s a certain level of indirection and effort. A big part of the semiconductor industry also operates on a reputation basis. Broadcom (like TSMC) is a neutral party as a design house, but if they did something like this, it might ruin that reputation.
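
For anyone who hasn't seen one, here is a toy picture of what "gates and wires and how they connect" looks like, sketched in Python. It is a hand-written illustration, not the output of any real synthesis tool; the three-gate library, the wire names (`n1`, `n2`, `n3`) and the `evaluate` helper are all made up for the example.

```python
# Sketch: a netlist is just a list of gates plus the wires connecting them.
# The gate library, wire names and evaluate() are hypothetical, for illustration.

GATE_FUNCS = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
}

# Each entry: (gate type, input wires, output wire), in topological order.
# This happens to be a 1-bit full adder, but nothing in the list says so:
# the intent that was readable in the RTL has to be inferred from the wiring.
NETLIST = [
    ("XOR", ("a", "b"), "n1"),
    ("XOR", ("n1", "cin"), "sum"),
    ("AND", ("a", "b"), "n2"),
    ("AND", ("n1", "cin"), "n3"),
    ("OR", ("n2", "n3"), "cout"),
]


def evaluate(netlist, inputs):
    """Propagate 0/1 values through the gates and return every wire's value."""
    wires = dict(inputs)
    for gate, (in_a, in_b), out in netlist:
        wires[out] = GATE_FUNCS[gate](wires[in_a], wires[in_b])
    return wires
```

Calling `evaluate(NETLIST, {"a": 1, "b": 1, "cin": 0})` gives `sum` = 0 and `cout` = 1. Recovering "this is an adder" from five gates is easy; doing the same across a very large flattened netlist is the extra effort being described.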

▲

kQq9oHeAz6wLLS 2 hours ago | parent | prev [-]

More likely that the AI training set contained the IP of others, and we all know how that turns out.

▲

formerly_proven 7 hours ago | parent | prev [-]

This isn't Broadcom's first design.

▲

swiftcoder 6 hours ago | parent [-]

Yeah, "first chip" here likely means they contracted Broadcom (or a firm with similar experience) to do all the heavy lifting. Building out your own in-house teams for this sort of thing is a decade-long project - just look at how much of Apple's early chips was licensed ARM / PowerVR cores.

▲

MisterTea 6 hours ago | parent [-]

Apple didn't have the talent in-house until they bought Intrinsity, who worked with Samsung on Apple's earlier Arm chips as well. https://en.wikipedia.org/wiki/Intrinsity

▲

selectodude 5 hours ago | parent [-]

I think the folks at PA Semi had some chops too.

▲

stinkbeetle 4 hours ago | parent | next [-]

The PA Semi group did the logic designs. I think they're talking about physical design though.

▲

reinitctxoffset 4 hours ago | parent | prev [-]

The way I heard it, PA Semi was the singular driving force that led to Apple Silicon, but I'm not any kind of insider; that's just the chatter I heard.

Whoever it was, whooo, that's hot shit. I remember an M1 MacBook Air just cleaning the clock of an Intel MacBook Pro and thinking "x86_64 has real competition again".

Great silicon. I'm over it with not having root on my own machine, so I've left the ecosystem, but it's really nice hardware, can't dispute that.

▲

re-thc an hour ago | parent [-]

> The way I heard it PA Semi was the singular driving force that led to Apple Silicon

And a lot of them are sitting under Qualcomm via the Nuvia acquisition.
