mook 3 days ago

I suspect part of it is also that Nvidia actually does a lot of things in firmware that can be upgraded. The new Nvidia Linux drivers (the "open" ones) support Turing cards from 2018. That means chips that old already do much of the processing in firmware.

AMD keeps having issues because its drivers talk to the hardware directly, so they are massive, bloated messes, famous for pages of auto-generated register definitions. That likely makes anything much more difficult to fix.

Evil_Saint 3 days ago | parent | next [-]

Having worked at both Nvidia and AMD I can assure you that they both feature lots of generated header files.

bgnn 3 days ago | parent | prev [-]

Hmm, that is interesting. Can you elaborate on what exactly is different between them?

I'm asking because I think firmware has to talk to the hardware directly through a lower HAL (hardware abstraction layer), while customer-facing parts should be fairly isolated in an upper HAL. Some companies like to add direct HW access to the customer interface via more complex functions (often a recipe made out of lower-HAL functions), which I have always disliked. I prefer to isolate lower-level functions and memory space from the user.

In any case, both Nvidia and AMD should have very similar FW capabilities. I don't know what I'm missing here.
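
As a rough illustration of the lower/upper HAL split described above, here is a minimal Python sketch. Everything in it is hypothetical: the class names `LowerHal` and `UpperHal`, the simulated register file, the clock register index, and the 2000 MHz limit are made up for the example and do not reflect either vendor's firmware.

```python
class LowerHal:
    """Hypothetical lower HAL: raw register access, meant to stay inside firmware."""

    def __init__(self, num_registers: int = 4) -> None:
        # Simulated register file standing in for real hardware (assumption).
        self._regs = [0] * num_registers

    def write_reg(self, index: int, value: int) -> None:
        if not 0 <= index < len(self._regs):
            raise IndexError(f"register {index} out of range")
        # Model registers as 32 bits wide (assumption).
        self._regs[index] = value & 0xFFFFFFFF

    def read_reg(self, index: int) -> int:
        if not 0 <= index < len(self._regs):
            raise IndexError(f"register {index} out of range")
        return self._regs[index]


class UpperHal:
    """Hypothetical customer-facing layer: only validated, named operations."""

    _CLOCK_REG = 0        # made-up register index
    MAX_CLOCK_MHZ = 2000  # made-up limit

    def __init__(self, lower: LowerHal) -> None:
        self._lower = lower

    def set_clock_mhz(self, mhz: int) -> None:
        # Reject bad input before it can reach the raw register layer.
        if not 0 < mhz <= self.MAX_CLOCK_MHZ:
            raise ValueError(f"clock {mhz} MHz outside supported range")
        self._lower.write_reg(self._CLOCK_REG, mhz)

    def get_clock_mhz(self) -> int:
        return self._lower.read_reg(self._CLOCK_REG)


hal = UpperHal(LowerHal())
hal.set_clock_mhz(1500)
print(hal.get_clock_mhz())
```

The user only ever holds an `UpperHal`, so a bad value is rejected by validation instead of landing in a register. Exposing `write_reg` directly to the customer interface would be the "direct HW access" variant the comment argues against.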

Evil_Saint 3 days ago | parent | next [-]

I worked on drivers at both companies. The programming models are quite different. Both make GPUs, but they were designed by different groups of people who made different decisions. For example:

Nvidia cards are much easier to program in the user mode driver. You cannot hang an Nvidia GPU with a bad memory access. You can hang the display engine with one, though. At least that was the case when I was there.

You can hang an AMD GPU with a bad memory access. At least up to the Navi 3x.

raxxorraxor 2 days ago | parent | prev [-]

Why isolate these functions? That will always cripple capabilities. With well-designed interfaces, exposing them doesn't lead to a mess, and you get a more powerful device. Of course these lower-level functions shouldn't be essential, but especially these days you almost have to provide an interface here or be left behind by other environments.