TOP500 at ISC’26: We have a New Number 1 Supercomputer

brianolson an hour ago | parent | next [-]

> Why aren’t these AI companies submitting to the TOP500 to show off their computing prowess?

My knowledge is 10+ years out of date, but once upon a time, if they'd chosen to, Google could have had _several_ entries in the top 10 of the TOP500 list.

It's just poker; they didn't want to tip their hand.

ziofill 19 minutes ago | parent | next [-]

Also, would those 550k Blackwell GPUs have good FP64 performance? How would one even compare them?

iberator an hour ago | parent | prev [-]

Cloud computing is not a supercomputer. Different architecture, bandwidth, interconnectivity, and latencies.

dgacmu an hour ago | parent | next [-]

That's not nearly as true when you look at AI training clusters. They're basically supercomputers but without an FP64 focus.

(These are the systems to which GP was referring at Google.)

	cynicalkane 12 minutes ago | parent | next [-]
		Even before AI training clusters became important, Google has had an outstanding custom fabric (there are papers about it) together with the ability to tune NICs for their own cases, and "their own cases" meant nearly everything engineered within Google. Ethernet hardware has had low kernel latency and DMA for a long time; it's the rest of the stack that hurts. But as far back as the early 2010s (if not further back; that goes beyond my knowledge horizon), you could just make it not hurt, if you had the software engineers to do it.
	jeffbee 17 minutes ago | parent | prev [-]
		I thought TPUs couldn't reasonably run LINPACK at all because TPUs do not acknowledge that FP64 exists. I know Google wants to compare their stuff to El Capitan or whatever but the comparison does not seem valid to me.

wmf 18 minutes ago | parent | prev [-]

Historically there have been a bunch of clusters on the Top 500 that weren't used for HPC. The tell is that they used Ethernet (this was before RoCE). It's less efficient but you can still get an OK Linpack score.

flopsamjetsam 29 minutes ago | parent | prev | next [-]

> We think it is highly likely that these LX2 chiplets are etched using SMIC 7 nanometer processes at the N+3 refinement, and we base that on the fact that the chip only runs at 1.55 GHz. That is nowhere near the 3 GHz that SMIC can push with that process, but it is probably lower to get the memory and core speeds more balanced. [1]

Based on ARMv9.2.

[1] https://www.nextplatform.com/hpc/2026/06/25/a-deep-dive-on-c...

jandrewrogers 34 minutes ago | parent | prev | next [-]

TOP500 hasn't been a particularly useful measure of practical computing power in modern systems for many years, because what it measures isn't a significant bottleneck in most real systems. It has become a measure of how much money someone is willing to spend for bragging rights. (HPCG is better in that it is a bit more bandwidth-focused, but it is still pretty narrow.)

Most companies with huge systems don't participate.

	bee_rider 5 minutes ago | parent [-]
		I wonder if there would have been an opportunity to generate some finer-grained benchmarks with something like BiCGStab+ILU (or maybe CG+incomplete Cholesky) instead of CG+Gauss-Seidel. The pitch being: you might have made different memory vs. compute trade-offs when designing your cluster, but you should be able to select a fill-in factor for the preconditioner to suit it.
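
To make that suggestion concrete, here is a minimal sketch of a preconditioned CG loop in which the preconditioner is a swappable argument. It is a toy under stated assumptions, not HPCG itself: dense pure-Python matrices, a symmetric Gauss-Seidel sweep as the default preconditioner, and hypothetical names (`pcg`, `sym_gauss_seidel`, `identity`). An ILU or incomplete Cholesky routine with a chosen fill-in level would plug into the same `precond` slot; none is implemented here.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, x):
    return [dot(row, x) for row in A]

def identity(A, r):
    """No preconditioning: z = r."""
    return list(r)

def sym_gauss_seidel(A, r):
    """Apply one symmetric Gauss-Seidel sweep: solve (D+L) D^-1 (D+U) z = r."""
    n = len(A)
    y = [0.0] * n
    for i in range(n):  # forward sweep: (D+L) y = r
        y[i] = (r[i] - sum(A[i][j] * y[j] for j in range(i))) / A[i][i]
    z = [0.0] * n
    for i in reversed(range(n)):  # backward sweep: (D+U) z = D y
        z[i] = (A[i][i] * y[i]
                - sum(A[i][j] * z[j] for j in range(i + 1, n))) / A[i][i]
    return z

def pcg(A, b, precond=sym_gauss_seidel, tol=1e-10, max_iter=500):
    """Preconditioned CG for a symmetric positive definite A (assumed, not checked).

    Returns (x, iterations). `precond(A, r)` must return an approximation of A^-1 r.
    """
    x = [0.0] * len(b)
    bnorm = math.sqrt(dot(b, b))
    if bnorm == 0.0:
        return x, 0
    r = list(b)
    z = precond(A, r)
    p = list(z)
    rz = dot(r, z)
    for it in range(1, max_iter + 1):
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, Ap)]
        if math.sqrt(dot(r, r)) <= tol * bnorm:
            return x, it
        z = precond(A, r)
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter
```

Calling `pcg(A, b, precond=identity)` and `pcg(A, b, precond=sym_gauss_seidel)` on the same system returns the iteration count for each, which is the kind of knob a finer-grained benchmark could expose.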

ziofill 30 minutes ago | parent | prev | next [-]

> Two cores are disabled per cluster.

I’m sure there is a good reason for this, which is..?

	jandrewrogers 24 minutes ago | parent [-]
		It is likely that those cores are dedicated to unrelated management, monitoring, and administrative tasks. This is common and many workloads are throttled on bandwidth anyway. For the purposes of the benchmark, those cores are not participating in the workload.

dgellow an hour ago | parent | prev | next [-]

Just glad to see Hamburg mentioned :) Hope you all didn’t suffer too much through the current heatwave

2OEH8eoCRo0 an hour ago | parent | prev | next [-]

Extremely impressive accomplishment considering they did this with Chinese interconnects and Chinese chips. This is a wake-up call.

jandrewrogers 30 minutes ago | parent | next [-]

TOP500 can be done with inexpensive silicon. It is more about a willingness to aggregate enough hardware in one place. As a benchmark, it tells you almost nothing about computing power or scalability for other applications because it doesn't exercise the bottlenecks most high-scale applications have.

echelon an hour ago | parent | prev [-]

We're too busy regulating the tech, not granting access to US engineers and companies, arguing against power and data centers, stopping skilled immigration.

This is absolutely going to bite us in the face in five to ten years.

	2OEH8eoCRo0 an hour ago | parent [-]
		Separate issue that has nothing to do with US manufacturing or HPC. I think our retreat from science funding and the offshoring of advanced manufacturing are bigger issues.

lokimedes an hour ago | parent | prev | next [-]

Would the AI “GW-scale” clusters be able to run the Top500 benchmarks meaningfully? And what might be the outcome?

	wmf 15 minutes ago | parent [-]
		Yes, they should score well on Linpack as long as they use Ozaki emulation.
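
For anyone unfamiliar with the term, a rough sketch of the idea behind Ozaki-style emulation: split each FP64 operand into a few small integer slices, multiply the slices exactly (on real hardware this is where low-precision matrix units come in), and recombine the partial products in FP64. The pure-Python version below is an illustration under assumptions, not any vendor's or benchmark's implementation: the function names, the 7-bit slice width, and the slice count of 8 are all choices made for this example.

```python
import math

def split_rows(M, num_slices, bits):
    """Split each row of M into integer slices sharing one power-of-two scale.

    Returns (slices, exps) where M[r][c] ~= 2**exps[r] *
    sum(slices[s][r][c] * 2**(-bits * (s + 1)) for s in range(num_slices))
    and every slice entry satisfies |q| < 2**bits.
    """
    slices = [[[0] * len(row) for row in M] for _ in range(num_slices)]
    exps = []
    for r, row in enumerate(M):
        amax = max(abs(v) for v in row)
        e = math.frexp(amax)[1] if amax > 0 else 0  # amax < 2**e
        exps.append(e)
        for c, v in enumerate(row):
            rem = math.ldexp(v, -e)  # |rem| < 1
            for s in range(num_slices):
                rem = math.ldexp(rem, bits)  # move the next chunk above the point
                q = int(rem)                 # truncate toward zero
                slices[s][r][c] = q
                rem -= q                     # exact: leaves the fractional part
    return slices, exps

def ozaki_matmul(A, B, num_slices=8, bits=7):
    """Approximate the FP64 product A @ B using only exact integer dot products."""
    m, k, n = len(A), len(B), len(B[0])
    Bt = [list(col) for col in zip(*B)]  # slice B column by column
    a_sl, a_exp = split_rows(A, num_slices, bits)
    b_sl, b_exp = split_rows(Bt, num_slices, bits)
    C = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            terms = []
            for s in range(num_slices):
                for t in range(num_slices):
                    # Exact integer dot product (stand-in for an integer GEMM).
                    d = sum(a_sl[s][i][p] * b_sl[t][j][p] for p in range(k))
                    shift = a_exp[i] + b_exp[j] - bits * (s + t + 2)
                    terms.append(math.ldexp(float(d), shift))
            C[i][j] = math.fsum(terms)  # recombine partial products in FP64
    return C
```

With 8 slices of 7 bits, the largest entry in each row or column keeps 56 bits, which covers the 53-bit FP64 significand; smaller entries in the same row can lose some low-order bits because they share that row's scale.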

rippeltippel an hour ago | parent | prev | next [-]

Previously on HN: https://news.ycombinator.com/item?id=48658334

techsystems 2 hours ago | parent | prev [-]

Is it the first to reach 2 exaflops?