> up to 6,144 state-of-the-art CUDA cores

A RTX Pro 6000 has ~24K 5th generation tensor cores, I'm guessing this would then be 1/4 of the count but 6th generation? Wasn't clear from the images.

▲

gravypod 2 hours ago | parent [-]

What is more important than core count is how the caching architecture is laid out. They could lay out those 6k cuda cores in a layout which provides much larger blocks of cache to smaller number of cores. That would increase the memory bandwidth which would be better for inference.

	▲	embedding-shape 2 hours ago \| parent [-]
		Sounds like the memory bandwidth is worse though; > The memory is not as fast as dedicated GPU memory, but it is cheap enough while delivering enough bandwidth to run AI models locally. Also "cheap while delivering enough" certainly sounds like someone is trying to temper expectations. It sounds like something sitting in-between GPU+VRAM inference and CPU+RAM one, not as a step above/besides GPU+VRAM.