Waterluvian a day ago

At what point do the OEMs begin to realize they don't have to follow the current mindset of attaching a GPU to a PC, and can instead sell what looks like a GPU with a PC built into it?

lizknope 16 hours ago | parent | next [-]

The vast majority of computers sold today have a CPU / GPU integrated together in a single chip. Most ordinary home users don't care about GPU or local AI performance that much.

In this video Jeff is interested in GPU-accelerated tasks like AI and Jellyfin. His last video was about using a stack of 4 Mac Studios connected by Thunderbolt for AI stuff.

https://www.youtube.com/watch?v=x4_RsUxRjKU

The Apple chips not only have powerful CPU and GPU cores but also a huge amount of directly connected memory (up to 512GB), unlike most Nvidia consumer-level GPUs, which have far less memory.

onion2k 13 hours ago | parent [-]

> Most ordinary home users don't care about GPU or local AI performance that much.

Right now, sure. There's a reason why chip manufacturers are adding AI pipelines, tensor processors, and 'neural cores' though. They believe that running small local models is going to be a popular feature in the future. They might be right.

swiftcoder 12 hours ago | parent [-]

It's mostly a marketing gimmick though - they aren't adding anywhere near enough compute for that future. The tensor cores in an "AI ready" laptop from a year ago are already pretty much irrelevant as far as inference on current-generation models goes.

zozbot234 12 hours ago | parent [-]

NPU/Tensor cores are actually very useful for prompt pre-processing, or really any ML inference task that isn't strictly bandwidth-limited (because you end up wasting a lot of bandwidth on padding/dequantizing data to a format that the NPU can natively work with, whereas a GPU can just do that in registers/local memory). The main issue is the limited support in current ML/AI inference frameworks.
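
To make the padding/dequantizing point concrete, here is a rough C sketch (function and buffer names are mine, and the 4-bit layout is only an example) of the kind of expansion step that has to round-trip through memory when an accelerator only accepts full-width values; a GPU kernel can do the same unpacking in registers right before the multiply:

    #include <stddef.h>
    #include <stdint.h>

    /* Illustration only: expand packed 4-bit weights (two per byte, offset by 8,
     * with one scale factor for the whole block to keep the sketch short) into
     * full floats so a hypothetical accelerator that only takes float tensors
     * can consume them. Every byte read becomes eight bytes written back to
     * memory, which is the bandwidth overhead described above. n is assumed even. */
    static void dequant_q4(const uint8_t *packed, float *out, size_t n, float scale)
    {
        for (size_t i = 0; i < n; i += 2) {
            uint8_t b = packed[i / 2];
            out[i]     = (float)((b & 0x0F) - 8) * scale;  /* low nibble  */
            out[i + 1] = (float)((b >> 4) - 8) * scale;    /* high nibble */
        }
    }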

nightshift1 a day ago | parent | prev | next [-]

Exactly. With the Intel-Nvidia partnership signed this September, I expect to see some high-performance single-board computers being released very soon. I don't think the ATX form factor will survive another 30 years.

bostik 20 hours ago | parent [-]

One should also remember that Nvidia does have organisational experience in designing and building CPUs[0].

They were a pretty big deal back in ~2010, and I have to admit I didn't know that Tegra was powering the Nintendo Switch.

0: https://en.wikipedia.org/wiki/Tegra

goku12 18 hours ago | parent | next [-]

I had a Xolo Tegra Note 7 tablet (marketed in the US as the EVGA Tegra Note 7) around 2013. I preordered it, as far as I remember. It had a Tegra 4 SoC with a quad-core Cortex-A15 CPU and a 72-core GeForce GPU. Nvidia used to claim that it was the fastest mobile SoC at the time.

To this day, it's the best mobile/Android device I ever owned. I don't know if it was the fastest, but it certainly was the best-performing one I ever had. UI interactions were smooth, apps were fast on it, the screen was bright, touch was perfect, and it still had long enough battery life. The device felt very thin and light, but sturdy at the same time. It had a pleasant matte finish and a magnetic cover that lasted as long as the device did. It spoiled the feel of later tablets for me.

It had only 1 GB of RAM. We have much more powerful SoCs today, but nothing ever felt that smooth (iPhones excluded). I don't know why. Perhaps Android was light enough for it back then, or it may have had a very good selection and integration of subcomponents. I was very disappointed when Nvidia discontinued the Tegra SoC family and tablets.

miladyincontrol 14 hours ago | parent | prev | next [-]

I'd argue their current CPUs aren't to be discounted either. Much as people love to crown Apple's M-series chips as the poster child of what ARM can do, Nvidia's Grace CPUs also trade blows with the best of the best.

It leaves one wondering what could be if they had any appetite for devices more in the consumer realm.

themafia 16 hours ago | parent | prev | next [-]

At this point what you really need is an incredibly powerful heatsink with some relatively small chips pressed against it.

jnwatson 12 hours ago | parent | next [-]

If you disassemble a modern GPU, that's what you'll find. 95% by weight of a GPU card is cooling related.

whywhywhywhy 14 hours ago | parent | prev [-]

The trashcan Mac Pro was this idea: a triangular heatsink core with the CPU, GPU, and GPU, one on each side.

pjmlp 21 hours ago | parent | prev | next [-]

So basically going back to the old days of Amiga and Atari, in a certain sense, when PCs could only display text.

goku12 19 hours ago | parent | next [-]

I'm not familiar with that history. Could you elaborate?

pjmlp 18 hours ago | parent | next [-]

In the home computer universe, such computers were the first ones with a programmable graphics unit that did more than paste the framebuffer onto the screen.

Meanwhile the PCs were still displaying text - or, if you were lucky enough to own a Hercules card, gray text, or maybe 4 colours with a CGA one.

While the Amigas, which I am more comfortable with, were doing this in the mid-80s:

https://www.youtube.com/watch?v=x7Px-ZkObTo

https://www.youtube.com/watch?v=-ga41edXw3A

The original Amiga 1000 had on its motherboard (later reduced to fit into the Amiga 500) a Motorola 68000 CPU, a programmable sound chip with DMA channels (Paula), and a programmable blitter chip (Agnus, one of the early GPUs).

You would build the audio data or graphics instructions in RAM for the respective chip, set the DMA parameters, and let them loose.
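
To give a flavour of that workflow, here is a rough C sketch of kicking off a simple blitter copy by poking the memory-mapped custom-chip registers. The offsets and values are recalled from the Hardware Reference Manual, so treat them as approximate; real code would also wait for the blitter-busy flag to clear before touching the registers:

    #include <stdint.h>

    #define CUSTOM   0xDFF000UL  /* base of the Amiga custom-chip register block */
    #define REG(o)   (*(volatile uint16_t *)(CUSTOM + (o)))
    #define REGL(o)  (*(volatile uint32_t *)(CUSTOM + (o)))

    /* Copy h rows of w 16-bit words from src to dst (both in chip RAM),
     * using the blitter with the minterm set to D = A (a plain copy). */
    static void blit_copy(const void *src, void *dst, uint16_t w, uint16_t h)
    {
        REG(0x096)  = 0x8240;                   /* DMACON: SET | DMAEN | BLTEN        */
        REG(0x040)  = 0x09F0;                   /* BLTCON0: use A and D, minterm D=A  */
        REG(0x042)  = 0x0000;                   /* BLTCON1: ascending mode, no shifts */
        REG(0x044)  = 0xFFFF;                   /* BLTAFWM: keep every first-word bit */
        REG(0x046)  = 0xFFFF;                   /* BLTALWM: keep every last-word bit  */
        REG(0x064)  = 0;                        /* BLTAMOD: no per-row source skip    */
        REG(0x066)  = 0;                        /* BLTDMOD: no per-row dest skip      */
        REGL(0x050) = (uint32_t)(uintptr_t)src; /* BLTAPT: source address             */
        REGL(0x054) = (uint32_t)(uintptr_t)dst; /* BLTDPT: destination address        */
        REG(0x058)  = (uint16_t)((h << 6) | w); /* BLTSIZE: writing this starts DMA   */
    }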

goku12 15 hours ago | parent | next [-]

Thanks! Early computing history is very interesting (I know this wasn't the earliest). It also sometimes explains certain odd design decisions that are still followed today.

nnevatie 17 hours ago | parent | prev [-]

Hey! I had an Amiga 1000 back in the day - it was simply awesome.

estimator7292 13 hours ago | parent | prev [-]

In the olden days we didn't have GPUs, we had "CRT controllers".

What it offered you was a page of memory where each byte value mapped to a character in ROM. You fed in your text and the controller fetched the character pixels and put them on the display. Later we got ASCII box-drawing characters. Then we got sprite systems like the NES, where the Picture Processing Unit handled loading pixels and moving sprites around the screen.
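
To pick the PC flavour of that as a concrete example, here is a minimal C sketch, assuming the standard 80x25 colour text mode and an environment (bare metal or DOS) where the text page is mapped at its usual address:

    #include <stdint.h>

    /* The "page of memory" for colour text mode lives at 0xB8000: two bytes per
     * cell, the low byte indexing a glyph in the character ROM and the high byte
     * holding the colour attribute. The CRT controller fetches the pixels on
     * every refresh; the CPU only ever writes these little cells. */
    static volatile uint16_t *const text_buf = (volatile uint16_t *)0xB8000;

    static void put_char(int row, int col, char c, uint8_t attr)
    {
        text_buf[row * 80 + col] = (uint16_t)(((uint16_t)attr << 8) | (uint8_t)c);
    }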

Eventually we moved on to raw framebuffers. You got a big chunk of memory and drew the pixels yourself; the hardware was responsible for swapping the framebuffers and scanning them out to the physical display.
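
The raw-framebuffer model is even simpler. A rough sketch, assuming the platform hands you a 32-bit linear framebuffer plus its pitch (bytes per scanline):

    #include <stdint.h>

    /* "A big chunk of memory": pixel (x, y) lives at base + y * pitch + x * 4 in a
     * 32-bit-per-pixel mode, and you write the colour value yourself. The display
     * hardware only scans the buffer out; nothing draws for you. */
    static void put_pixel(uint8_t *fb, int pitch, int x, int y, uint32_t colour)
    {
        *(uint32_t *)(fb + y * pitch + x * 4) = colour;
    }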

Along the way we slowly got more features like defining a triangle, its texture, and how to move it, instead of doing it all in software.

Up until the 90s when the modern concept of a GPU coalesced, we were mainly pushing pixels by hand onto the screen. Wild times.

The history of display processing is obviously a lot more nuanced than that; it's pretty interesting if that's your kind of thing.

pjmlp 12 hours ago | parent [-]

Small addendum: there was already stuff like the TMS34010 in the 1980s, just not at home.

cmrdporcupine 12 hours ago | parent | prev [-]

Those machines multiplexed the bus to split access to memory, because RAM speeds were competitive with or faster than the CPU bus speed. The CPU and VDP "shared" the memory, but only because CPUs were slow enough to make that possible.

We have had the opposite problem for 35+ years at this point. Newer-architecture machines like the Apple machines, the GB10, and the AI 395+ do share memory between GPU and CPU, but in a different way, I believe.

I'd argue that with memory suddenly becoming much more expensive, we'll probably see the opposite trend. I'm going to get me one of these GB10 or Strix Halo machines ASAP, because I think that with RAM prices skyrocketing we won't be seeing more of this kind of thing in the consumer market for a long time. Or at least, prices will not be dropping any time soon.

pjmlp 11 hours ago | parent [-]

You are right, hence my "in a certain sense": I was too lazy to point out the differences between a motherboard having everything there without a pluggable graphics unit[0], and having everything now inside a single chip.

[0] - Not fully correct, as there are/were extension cards that override the bus, thus replacing one of the said chips, in the Amiga's case.

animal531 19 hours ago | parent | prev | next [-]

It's funny how ideas come and go. I made this very comment here on Hacker News probably 4-5 years ago and received a few down votes for it at the time (albeit I was thinking of computers in general).

It would take a lot of work to make a GPU do current CPU type tasks, but it would be interesting to see how it would change parallelism and our approach to logic in code.

goku12 19 hours ago | parent | next [-]

> I made this very comment here on Hacker News probably 4-5 years ago and received a few down votes for it at the time

HN isn't always very rational about voting. It would be a loss to judge any idea on that basis.

> It would take a lot of work to make a GPU do current CPU type tasks

In my opinion, that would be counterproductive. The advantage of GPUs is that they have a large number of very simple GPU cores. Instead, just add a few separate CPU cores on the same die, or on a separate die. Or you could even have a forest of GPU cores with a few CPU cores interspersed among them - sort of like how modern FPGAs have logic tiles, memory tiles, and CPU tiles spread across the fabric. I doubt it would be called a GPU at that point.

zozbot234 18 hours ago | parent | next [-]

GPU compute units are not that simple; the main difference from a CPU is that they generally use a combination of wide SIMD and wide SMT to hide latency, as opposed to the power-intensive out-of-order execution used by CPUs. Performing tasks that can't take advantage of either SIMD or SMT on GPU compute units might be a bit wasteful.

Also, you'd need to add extra hardware for various OS support functions (privilege levels, address-space translation/MMU) that are currently missing from the GPU. But the idea is otherwise sound; you can think of the proposed 'Mill' CPU architecture as one variety of it.

goku12 15 hours ago | parent [-]

> GPU compute units are not that simple

Perhaps I should have phrased it differently. CPU and GPU cores are designed for different types of loads. The rest of your comment seems similar to what I was imagining.

Still, I don't think that enhancing the GPU cores with CPU capabilities (OoO execution, rings, MMU, etc. from your examples) is the best idea. You may end up with the advantages of neither and the disadvantages of both. I was suggesting that you could instead have a few dedicated CPU cores distributed among the numerous GPU cores. Finding the right balance of GPU to CPU cores may be the key to achieving the best performance on such a system.

Den_VR 19 hours ago | parent | prev | next [-]

As I recall, Gartner made the outrageous claim that upwards of 70% of all computing would be “AI” within some number of years - nearly the end of CPU workloads.

deliciousturkey 16 hours ago | parent | next [-]

I'd say over 70% of all computing has already been non-CPU for years. If you look at your typical phone or laptop SoC, the CPU is only a small part. The GPU takes the majority of the area, with other accelerators also taking significant space. Manufacturers would not spend that money on silicon if it were not already being used.

goku12 15 hours ago | parent | next [-]

> I'd say over 70% of all computing has already been non-CPU for years.

> If you look at your typical phone or laptop SoC, the CPU is only a small part.

Keep in mind that the die area doesn't always correspond to the throughput (average rate) of the computations done on it. That area may be allocated for a higher computational bandwidth (peak rate) and lower latency. Or in other words, to get the results of a large number of computations faster, even if it means that the circuits sit idle for the rest of the cycles. I don't know the situation on mobile SoCs with regard to those quantities.

deliciousturkey 15 hours ago | parent [-]

This is true, and my example was a very rough metric. But compute density per unit area is actually way, way higher on GPUs compared to CPUs. CPUs only spend a tiny fraction of their area doing actual computation.

swiftcoder 12 hours ago | parent | prev | next [-]

> If you look at your typical phone or laptop SoC, the CPU is only a small part

In mobile SoCs a good chunk of this is about power efficiency. On a battery-powered device, there's always going to be a tradeoff between spending die area to make something like 4K video playback more power efficient and spending it on general-purpose compute.

Desktop-focussed SKUs are more liable to spend a metric ton of die area on bigger caches close to your compute.

PunchyHamster 15 hours ago | parent | prev [-]

Going by raw operations done, if the given workload uses 3D rendering for the UI, that's probably true for computers/laptops. Watching a YT video is essentially the CPU pushing data between the internet and the GPU's video decoder, and to the GPU-accelerated UI.

yetihehe 16 hours ago | parent | prev [-]

Looking at home computers, most "computing" when counted as FLOPS is done by GPUs anyway, just to show more and more frames. Processors are only used to organise all that data to be crunched by the GPUs. The rest is browsing webpages and running Word or Excel several times a month.

sharpneli 16 hours ago | parent | prev | next [-]

Is there any need for that? Just have a few good CPUs there and you’re good to go.

As for what the HW looks like, we already know. Look at Strix Halo as an example. We are just getting bigger and bigger integrated GPUs. Most of the FLOPS on that chip are in the GPU part.

amelius 15 hours ago | parent [-]

I still would like to see a general GPU back end for LLVM just for fun.

PunchyHamster 15 hours ago | parent | prev | next [-]

It would just make everything worse. Some (if not most) tasks are far less parallelisable than typical GPU loads.

deliciousturkey 16 hours ago | parent | prev [-]

HN in general is quite clueless about topics like hardware, high-performance computing, graphics, and AI performance. So you probably shouldn't care if you are downvoted, especially if you honestly know you are correct.

Also, I'd say if you buy, for example, a MacBook with an M4 Pro chip, it already is a big GPU attached to a small CPU.

philistine 15 hours ago | parent [-]

People on here tend to act as if 20% of all computers sold were laptops, when it’s the reverse.

amelius 16 hours ago | parent | prev | next [-]

Maybe at the point where you can run Python directly on the GPU. At which point the GPU becomes the new CPU.

Anyway, we're still stuck with "G" for "graphics", so it all doesn't make much sense, and I'm actually looking for a vendor that takes its mission more seriously.

cmrdporcupine 13 hours ago | parent | prev [-]

I mean, that's kind of what's going on at a certain level with the AMD Strix Halo, the NVIDIA GB10, and the newer Apple machines.

In the sense that the RAM is fully integrated, anyways.