aurareturn 4 hours ago

Notice how it doesn't say it's more efficient than Lunar Lake.

Their benchmarks say LNL is more efficient.

adrian_b 3 hours ago | parent [-]

The comparison with LNL is not apples-to-apples, unlike the comparison with Arrow Lake H.

LNL has a much lower power consumption in the memory interface, like the Apple CPUs, which has nothing to do with the fabrication process. Also LNL is a lower performance CPU, for which it is normal to have better energy efficiency.

Only the comparison between Panther Lake and Arrow Lake H, which have equivalent structures, can be used to compare the Intel 18A and the TSMC 3-nm fabrication processes.

This comparison shows that Intel 18A ensures a better performance per watt, i.e. energy efficiency, which leads to a better multithreaded performance, but the TSMC 3-nm process, at least for now, allows higher maximum clock frequencies, which make possible a higher single-thread performance.

aurareturn 3 hours ago | parent [-]

On-package memory disproportionately affects idle power more than load power. The benchmark was done with Cinebench 2024, which is a heavy load test. Therefore, LNL's on-package memory would have made little to no difference overall to perf/watt in Cinebench 2024 ST.

  Only the comparison between Panther Lake and Arrow Lake H, which have equivalent structures, can be used to compare the Intel 18A and the TSMC 3-nm fabrication processes.
Panther Lake uses a new core design which likely contributed to better perf/watt regardless of which node was used. For example, Zen3 had a 19% increase in IPC despite being on the same N7 family node as Zen2. Panther Lake has 3 tiers of cores instead of 2 in Arrow Lake. The MT design is very different. New core and layout designs can make a huge difference in efficiency on the same node.

  This comparison shows that Intel 18A ensures a better performance per watt, i.e. energy efficiency, which leads to a better multithreaded performance, but the TSMC 3-nm process, at least for now, allows higher maximum clock frequencies, which make possible a higher single-thread performance.
We should compare ST perf/watt instead of MT. MT has too many factors including core count, die size, transistor count, clock speed.

Based on ST perf/watt, Intel 18A is likely a bit worse than N3B (2022 node) and a bit better than N4P (2021 node).

adrian_b 2 hours ago | parent [-]

Panther Lake does not have new CPU cores.

The Panther Lake cores, i.e. Darkmont and Cougar Cove, are the Arrow Lake/Lunar Lake cores, i.e. Skymont and Lion Cove, ported from the TSMC 3-nm to the Intel 18A fabrication process.

The Panther Lake cores have only minor changes, i.e. bug fixes and the addition of a new mechanism for interrupts and exceptions, FRED. A preliminary version of FRED is likely to have already been implemented on Arrow Lake/Lunar Lake, but if so it was disabled there after production.

In any case FRED will not cause improvements in the present benchmarks, as it is used only inside the operating system and the current operating systems are unlikely to have been updated to use it anyway.

Contrary to what you say, ST performance or ST performance per watt cannot be used to compare fabrication processes; only the multithreaded performance can be used for this purpose.

Single-thread performance is affected by a lot of factors that have nothing to do with the fabrication process, but all those have little or no influence on multithreaded performance.

The reason is that in any well-optimized MT workload, the CPU runs at a constant power consumption. This eliminates the influence of all the factors you mentioned.

I have already explained in another comment that a constant power consumption means a constant number of gate switchings per second, which is determined by the energy required to switch a logical gate, which is a characteristic of a fabrication process.

When a given amount of work is done by a benchmark using the same algorithm, well-designed CPUs will need approximately the same number of gate switchings to complete the work, regardless of the number of cores included in a CPU.
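A rough sketch of this argument in Python, in case it helps. The numbers are invented purely for illustration; they are not measurements of any real CPU or process, and the function names are hypothetical. It only shows that, if power and the total number of gate switchings are held equal, the ratio of completion times reduces to the ratio of energies per switching.

```python
def switchings_per_second(power_watts, energy_per_switch_joules):
    # At constant power, the switching rate is fixed by the energy per switching.
    return power_watts / energy_per_switch_joules


def time_to_complete(total_switchings, power_watts, energy_per_switch_joules):
    # Time needed to perform a fixed amount of work, counted in gate switchings.
    rate = switchings_per_second(power_watts, energy_per_switch_joules)
    return total_switchings / rate


# Hypothetical inputs (assumptions, not real data):
WORK = 1e15        # gate switchings needed by the benchmark, same for both CPUs
POWER = 45.0       # sustained package power in watts, same for both CPUs
E_PROCESS_A = 1.0e-15  # joules per switching on process A
E_PROCESS_B = 0.9e-15  # joules per switching on process B (10% lower)

t_a = time_to_complete(WORK, POWER, E_PROCESS_A)
t_b = time_to_complete(WORK, POWER, E_PROCESS_B)

# The speedup depends only on the ratio of energies per switching,
# not on WORK or POWER, as long as those are equal for both CPUs.
speedup = t_a / t_b
print(f"process B finishes {speedup:.3f}x faster at the same power")
```

Core count does not appear anywhere in the sketch, which is the point of the argument: under these assumptions it only changes how the fixed switching budget is spread across cores.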

Significant variations in the number of gate switchings can be caused only by architectural differences, like the width of the vector and matrix execution units. Smaller variations are caused by various quality characteristics of a CPU core design, like the frequencies of branch mispredictions and of cache misses, which should be similar for CPU design teams that do not differ much in competence.

When we compare equivalent cores in different fabrication processes, like Arrow Lake H vs. Panther Lake, the multithreaded benchmarks are almost unaffected by anything else except the fabrication process, assuming that the cooling systems are also equivalent.