tobz1000 4 hours ago

Panther Lake's efficiency doesn't match M5, but it seems to be very good by all accounts. "Extremely disappointing" is a misrepresentation.

aurareturn 4 hours ago | parent [-]

Scroll to the "Cinebench 2024 Single Power Efficiency" section.[0]

It doesn't even beat Lunar Lake in efficiency (made on TSMC N3B) released in 2024.

[0]https://www.notebookcheck.net/Intel-Panther-Lake-Core-Ultra-...

adrian_b 3 hours ago | parent | next [-]

Besides the fact that Lunar Lake has a lower consumption in the memory interface, which has nothing to do with the fabrication process, single-thread benchmarks cannot be used to compare CPU fabrication processes.

Both the absolute performance and the performance per watt in single-thread benchmarks are determined mainly by the CPU design and they are only slightly constrained by the CPU fabrication process.

Only the multithreaded benchmarks are useful for comparing CMOS fabrication processes, because the performance in multithreaded benchmarks (with a given cooling system) is limited mainly by the energy required to switch a logic gate, which is a characteristic of the fabrication process, and they are only weakly dependent on the CPU design, as long as the CPU design does not have obvious mistakes.

In multithreaded benchmarks, CPUs work at a fixed power consumption, determined by the maximum allowable temperature and the cooling system. A fixed power means a fixed number of gates that switch per second. The completion of a given benchmark requires a similar number of gate switchings in well-designed CPUs, in which case the performance in such a benchmark is fully determined by the fabrication process. Deviations from proportionality appear when some CPUs need far fewer gate switchings than others to complete the same work, which happens for example when a CPU has wider vector or matrix execution units, e.g. by supporting AVX-512 or SME or AMX.
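To make the argument above concrete, here is a minimal sketch of the model it implies: at a fixed power budget, throughput is power divided by the energy per gate switch, divided again by the number of switchings one benchmark run needs. The function name and every number below are hypothetical placeholders chosen for illustration, not measurements of any real CPU or process.

```python
def benchmark_throughput(power_watts, energy_per_switch_joules, switches_per_run):
    """Benchmark runs completed per second under a fixed power budget.

    Simplified model: all power goes into gate switching (leakage ignored).
    """
    switches_per_second = power_watts / energy_per_switch_joules
    return switches_per_second / switches_per_run


# Hypothetical values, for illustration only.
POWER = 30.0      # watts allowed by the cooling system
SWITCHES = 1e15   # gate switchings needed for one benchmark run

# Same design on two processes: only the energy per switch differs.
process_a = benchmark_throughput(POWER, 1.0e-15, SWITCHES)
process_b = benchmark_throughput(POWER, 0.8e-15, SWITCHES)

# Same process, but a design with wider vector units that needs
# half as many switchings for the same work.
wide_vector = benchmark_throughput(POWER, 1.0e-15, SWITCHES * 0.5)

print(process_b / process_a)    # speedup attributable to the process alone
print(wide_vector / process_a)  # speedup attributable to the design alone
```

In this model, core count and clock speed do not appear at all: they only change how the fixed switching budget is spread across cores. Whether real chips follow the model that closely is exactly what is disputed in this thread.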

aurareturn 2 hours ago | parent [-]

On-package memory affects idle power consumption disproportionately more than load power. In Cinebench 2024, which is a heavy load test, on-package memory likely makes little difference.

ST is far better than MT for this node comparison. MT is heavily influenced by core count, clock speed, core configuration. Panther Lake also has 3 tiers of cores compared to Arrow Lake's 2. The architecture for MT is entirely different.

Meanwhile, for ST, a core is a core. It is affected little, if at all, by architectural changes to core configurations.

williadc 4 hours ago | parent | prev [-]

From the article you linked:

> With the new Panther Lake mobile processors, Intel has managed to successfully combine the two previous generations, Arrow Lake and Lunar Lake, as the performance is even better than with Arrow Lake, while efficiency has been improved at the same time. Even with low power limits, the performance is very competitive, and Intel (in conjunction with the new GPUs) is therefore the better choice for slim laptops.

aurareturn 4 hours ago | parent [-]

Notice how it doesn't say it's more efficient than Lunar Lake.

Their benchmarks say LNL is more efficient.

adrian_b 3 hours ago | parent [-]

The comparison with LNL is not apples-to-apples, unlike the comparison with Arrow Lake H.

LNL has a much lower power consumption in the memory interface, like the Apple CPUs, which has nothing to do with the fabrication process. Also LNL is a lower performance CPU, for which it is normal to have better energy efficiency.

Only the comparison between Panther Lake and Arrow Lake H, which have equivalent structures, can be used to compare the Intel 18A and the TSMC 3-nm fabrication processes.

This comparison shows that Intel 18A ensures a better performance per watt, i.e. energy efficiency, which leads to a better multithreaded performance, but the TSMC 3-nm process, at least for now, allows higher maximum clock frequencies, which make possible a higher single-thread performance.

aurareturn 3 hours ago | parent [-]

On-package memory affects idle power disproportionately more than load power. The benchmark was done with Cinebench 2024, which is a heavy load test. Therefore, LNL's on-package memory would have made little to no difference overall to perf/watt in Cinebench 2024 ST.

> Only the comparison between Panther Lake and Arrow Lake H, which have equivalent structures, can be used to compare the Intel 18A and the TSMC 3-nm fabrication processes.

Panther Lake uses a new core design which likely contributed to better perf/watt regardless of which node was used. For example, Zen3 had a 19% increase in IPC despite being on the same N7 family node as Zen2. Panther Lake has 3 tiers of cores instead of 2 in Arrow Lake. The MT design is very different. New core and layout designs can make a huge difference in efficiency on the same node.

> This comparison shows that Intel 18A ensures a better performance per watt, i.e. energy efficiency, which leads to a better multithreaded performance, but the TSMC 3-nm process, at least for now, allows higher maximum clock frequencies, which make possible a higher single-thread performance.

We should compare ST perf/watt instead of MT. MT has too many factors including core count, die size, transistor count, clock speed.

Based on ST perf/watt, Intel 18A is likely a bit worse than N3B (2022 node) and a bit better than N4P (2021 node).

adrian_b 2 hours ago | parent [-]

Panther Lake does not have new CPU cores.

The Panther Lake cores, i.e. Darkmont and Cougar Cove are the Arrow Lake/Lunar Lake cores, i.e. Skymont and Lion Cove, ported from the TSMC 3 nm to the Intel 18A fabrication process.

The Panther Lake cores have only minor changes, i.e. bug fixes and the addition of a new mechanism for interrupts and exceptions, FRED. A preliminary version of FRED is likely to have already been implemented on Arrow Lake/Lunar Lake, but if so it was disabled there after production.

In any case FRED will not cause improvements in the present benchmarks, as it is used only inside the operating system and the current operating systems are unlikely to have been updated to use it anyway.

Contrary to what you say, ST performance or performance per watt cannot be used to compare fabrication processes; only the multithreaded performance can be used for this purpose.

Single-thread performance is affected by a lot of factors that have nothing to do with the fabrication process, but all those have little or no influence on multithreaded performance.

The reason is that in any well optimized MT workload, the CPU runs at a constant power consumption. This eliminates the influence of all factors mentioned by you.

I have already explained in another comment that a constant power consumption means a constant number of gate switchings per second, which is determined by the energy required to switch a logical gate, which is a characteristic of a fabrication process.

When a given amount of work is done by a benchmark using the same algorithm, well-designed CPUs will need approximately the same number of gate switchings to complete the work, regardless of the number of cores included in a CPU.

Significant variations of the numbers of gate switchings can be caused only by architectural differences like the width of vector and matrix execution units. Smaller variations are caused by various quality characteristics of a CPU core design, like the frequencies of branch mispredictions and of cache misses, which should be similar for CPU design teams that do not differ much in competence.

When we compare equivalent cores in different fabrication processes, like Arrow Lake H vs. Panther Lake, the multithreaded benchmarks are almost unaffected by anything else except the fabrication process, assuming that the cooling systems are also equivalent.
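Under that assumption (equivalent cores, the same number of gate switchings per benchmark run, equivalent cooling), the ratio of MT perf/watt between two chips can be read as the inverse ratio of their energy per gate switch. A rough sketch of that inference follows; the function name and the scores and power figures are hypothetical placeholders, not results from the linked review.

```python
def relative_energy_per_switch(score_a, power_a, score_b, power_b):
    """Energy per gate switch of chip B relative to chip A.

    Assumes both chips need the same number of gate switchings per
    benchmark run, so perf/watt is inversely proportional to the
    energy per switch. A result below 1.0 means B's process is more
    efficient under this model.
    """
    perf_per_watt_a = score_a / power_a
    perf_per_watt_b = score_b / power_b
    return perf_per_watt_a / perf_per_watt_b


# Hypothetical MT scores at the same sustained package power.
ratio = relative_energy_per_switch(score_a=1000, power_a=40.0,
                                   score_b=1100, power_b=40.0)
print(ratio)  # below 1.0: chip B spends less energy per switch in this model
```

If the cores are not equivalent, the same ratio mixes design and process effects and no longer isolates the process.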