| ▲ | tw04 7 days ago |
| >Consumer gpus are totally different products from the high end gpus now. Intel has failed on the gpu market and has effectively zero market share, so it is not actually clear there is an antitrust issue in that market. It would be nice if there was more competition but there are other players like AMD and a long tail of smaller ones.

I'm sorry, that's just not correct. Intel is literally just getting started in the GPU market, and their last several releases have been nearly exactly what people are asking for. Saying "they've lost" when the newest cards have been on the market for less than a month is ridiculous. If they are even mediocre at marketing, the Arc Pro B50 has a chance to be an absolute game changer for devs who don't have a large budget: https://www.servethehome.com/intel-arc-pro-b50-review-a-16gb...

I have absolutely no doubt Nvidia sees that list of "coming features" and will do everything they can to kill that roadmap. |
|
| ▲ | raincole 6 days ago | parent | next [-] |
| "Intel getting started in GPU market" is like a chain smoker quitting smoking. It's so easy that they have done it 20 times! |
|
| ▲ | tapland 6 days ago | parent | prev | next [-] |
The latest Arc GPUs were doing well, and were absolutely an option for entry/mid-level gamers. I think lack of maturity was one of the main things keeping sales down.
| |
| ▲ | Seattle3503 6 days ago | parent [-] | | I've been seeing a lot of homelab types recommending their video cards for affordable Plex transcoding as well. |
|
|
| ▲ | bpt3 7 days ago | parent | prev | next [-] |
| Intel has been making GPUs for over 25 years. Claiming they are just getting started is absurd. To that point, they've been "just getting started" in practically every chip market other than x86/x64 CPUs for over 20 years now, and have failed miserably every time. If you think Nvidia is doing this because they're afraid of losing market share, you're way off base. |
| |
| ▲ | cptskippy 6 days ago | parent [-] | | There's a very big difference between the MVP graphics chips they've included in CPUs and the Arc discrete GPU. | | |
| ▲ | bpt3 6 days ago | parent [-] | | Sure, but claiming they have literally just started is completely inaccurate. They've been making discrete GPUs on and off since the 80s, and this is at least their 3rd major attempt at it as a company, depending on how you define "major". They haven't even just started on this iteration, as the Arc line has been out since 2022. The main thing I learned from this submission is how much people hate Nvidia. | | |
| ▲ | cptskippy 6 days ago | parent [-] | | > The main thing I learned from this submission is how much people hate Nvidia.

I think there's a lot of frustration with Nvidia as of late. Their monopoly was mostly won on the merits of their technology, but now that they are a monopoly they have shifted focus from building the best technology to building the most lucrative technology. They've demonstrated that they are no longer interested in producing the best gaming GPUs, because those might cannibalize their server business. Instead they seem to focus on crypto and AI while shipping kneecapped cards at outrageous prices.

People are upset because they fear this deal will somehow influence Intel's GPU ambitions. Unfortunately, I'm not sure these folks actually want to buy Intel GPUs; they just want Nvidia to be scared into competing again so they can buy a good Nvidia card. People just need to draw a line in the sand and stop supporting Nvidia. |
|
|
|
|
| ▲ | bigyabai 7 days ago | parent | prev | next [-] |
| 224 GB/s
128 bit
The monkey's paw curls... I love GPU differentiation, but this is one of those areas where Nvidia is arguably justified in shipping less VRAM: ship less capacity and you can afford more memory controllers, which pushes more bandwidth out of the same memory! For instance, both the B50 and the RTX 2060 use GDDR6 memory, but the 2060 has a 192-bit memory bus and enjoys ~336 GB/s of bandwidth because of it. |
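(For a back-of-envelope check on those bandwidth numbers; the 14 Gbps per-pin rate below is inferred from the figures above, not quoted from a spec sheet:)

    # Theoretical GDDR6 bandwidth = (bus width in bits / 8) * per-pin data rate in Gbps
    def bandwidth_gb_s(bus_width_bits, data_rate_gbps):
        return bus_width_bits / 8 * data_rate_gbps

    print(bandwidth_gb_s(128, 14))  # Arc Pro B50: 128-bit bus -> 224.0 GB/s
    print(bandwidth_gb_s(192, 14))  # RTX 2060:    192-bit bus -> 336.0 GB/s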
| |
|
| ▲ | Sohcahtoa82 6 days ago | parent | prev [-] |
I don't know what anybody would do with such a weak card. My RTX 5090 is about 10x faster (measured by FP32 TFLOPS) and I still don't find it to be fast enough. I can't imagine using something so slow for AI/ML. Only 2.2 tokens/sec on an 8B-parameter Llama model? That's slower than someone typing.

I get that it's a budget card, but budget cards are supposed to at least win on pure price/performance, even with a lower performance baseline. The 5090 is 10x faster but only 6-8x the price, depending on where in the $2,000-$3,000 range you can find one. |
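(A quick sketch of that price/performance claim, using only the ratios quoted above; a rough calculation, not a benchmark:)

    # Perf-per-dollar comparison using only the ratios from the comment above:
    # the 5090 is taken as ~10x the FP32 TFLOPS at 6-8x the price of the B50.
    perf_ratio = 10.0
    for price_ratio in (6.0, 8.0):
        print(f"{price_ratio:.0f}x price -> {perf_ratio / price_ratio:.2f}x the TFLOPS per dollar")
    # -> 1.67x and 1.25x: on these numbers the 5090 does win on raw price/performance,
    #    though by a narrower margin than the 10x headline figure suggests.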
| |
| ▲ | dragonwriter 6 days ago | parent | next [-] | | > My RTX 5090 is about 10x faster (measured by FP32 TFLOPS) and I still don't find it to be fast enough. I can't imagine using something so slow for AI/ML. Only 2.2 tokens/sec on an 8B-parameter Llama model? That's slower than someone typing.

It's also orders of magnitude slower than what I normally see cited by people using 5090s; heck, it's even much slower than what I see on my own 3080 Ti laptop card for 8B models, though I usually won't use more than an 8bpw quant for a model that size. | |
| ▲ | Sohcahtoa82 6 days ago | parent [-] | | Yeah, I must be doing something wrong. Someone else pointed out that I should be getting much better performance. I'll be looking into it. |
| |
| ▲ | clifflocked 6 days ago | parent | prev | next [-] | | I feel as though you are measuring tokens/s wrong, or have a serious bottleneck somewhere. On my i5-10210U (no dedicated graphics, at stock clock speeds), I get ~6 tokens/s on phi4-mini, a 4B model. That means my 15-watt laptop CPU, released six years ago, is performing better than a 5090.

> The 5090 is 10x faster but only 6-8x the price

I don't buy this argument. A B580 can be bought at MSRP for $250. An RTX 5090 from my local Microcenter is around $3,250, which puts the B580 at around 1/13th the price. Power costs can also be a significant factor if you choose to self-host, and I wouldn't want to risk system integrity for 3x the power draw, 13x the price, a melting connector, and Nvidia's terrible driver support.

EDIT: You can get an RTX 5090 for around $2,500. I doubt it will ever reach MSRP, though. | |
| ▲ | AuryGlenz 6 days ago | parent [-] | | You can get them for $2,000 now. One from Asus has been at that price several times over the last few months. I got my PNY for $2,200 or so. |
| |
| ▲ | jpalawaga 6 days ago | parent | prev | next [-] | | You have outlier needs if an RTX 5090, the fastest consumer-grade card, is not good enough for you. The Intel card is great for 1080p gaming. Especially if you're just playing Counter-Strike, indie games, etc., you don't need a beast. Very few people are trying to play 4K Tomb Raider on ultra at a high refresh rate. | |
| ▲ | Sohcahtoa82 6 days ago | parent [-] | | FWIW, my slowness is because of quantizing. I've been using Mistral 7B, and I can get 45 tokens/sec, which is PLENTY fast, but to save VRAM so I can game while doing inference (I run an IRC bot that lets people talk to Mistral), I quantize to 8 bits, which brings my inference speed down to ~8 tokens/sec.

For gaming, I absolutely love this card. I can play Cyberpunk 2077 with all the graphics settings maxed out and get 120+ fps, though when playing something that graphically intense I certainly need to kill the bot to free up the VRAM. But I can play something simpler like League of Legends and have inference happening while I play with zero impact on game performance.

I also have 128 GB of system RAM. I've thought about loading the model in both 8-bit and 16-bit into system RAM and just swapping which one is in VRAM based on whether I'm playing a game, so that when I'm not gaming the bot runs significantly faster. | |
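(A rough sketch of the VRAM math behind that trade-off, counting weights only and ignoring KV cache and activations, with the parameter count taken from the model name:)

    # Approximate weight memory for a 7B-parameter model at different precisions.
    params = 7e9
    for name, bytes_per_param in [("fp16", 2.0), ("8-bit", 1.0), ("4-bit", 0.5)]:
        print(f"{name}: ~{params * bytes_per_param / 1e9:.1f} GB of weights")
    # fp16: ~14.0 GB, 8-bit: ~7.0 GB, 4-bit: ~3.5 GB, which is why quantizing
    # frees enough VRAM to game alongside the bot.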
| ▲ | mysteria 6 days ago | parent [-] | | Hold on, you're only getting 45 tokens/sec with Mistral 7B on a 5090 of all things? That gets ~240 tokens/sec with Llama 7B quantized to 4 bits on llama.cpp [1], and those models should be pretty similar architecturally. I don't know exactly how the scaling works here, but considering that LLM inference is memory-bandwidth limited, you should get beyond 100 tokens/sec with the same model and an 8-bit quantization.

1. https://github.com/ggml-org/llama.cpp/discussions/15013 | |
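(A crude upper bound on generation speed, assuming batch size 1 and taking the 5090's ~1.8 TB/s memory bandwidth from public specs rather than from this thread:)

    # At batch size 1, every generated token streams all the weights from VRAM once,
    # so tokens/sec is capped at roughly memory bandwidth / weight size.
    bandwidth_gb_s = 1792   # RTX 5090, ~1.8 TB/s (assumed from public specs)
    weights_gb = 7.0        # 7B model at ~1 byte/param (8-bit quant)
    print(bandwidth_gb_s / weights_gb)  # ~256 tokens/sec ceiling
    # Real-world throughput lands well below the ceiling, but nowhere near 8 tok/s.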
| ▲ | Sohcahtoa82 6 days ago | parent [-] | | My understanding is that quantizing lowers memory usage but increases compute usage because it still needs to convert the weights to fp16 on the fly at inference time. Clearly I'm doing something wrong if it's a net loss in performance for me. I might have to look more into this. | | |
| ▲ | mysteria 6 days ago | parent [-] | | Yes, it increases compute usage, but your 5090 has a hell of a lot of compute and the dequantization kernels are pretty simple. Memory is the bottleneck here, and unless you have a strange GPU with lots of fast memory but very weak compute, a quantized model should always run faster. If you're using llama.cpp, run the benchmark in the link I posted earlier and see what you get; I think there's something like it for vLLM as well. |
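(One way to see why the on-the-fly dequantization cost barely matters: a rough compute-versus-memory estimate per token, with the 5090's throughput and bandwidth figures assumed from public specs, not taken from this thread:)

    # Per generated token (batch size 1), a 7B model does ~2 FLOPs per parameter of
    # matmul work but must also stream every weight byte from VRAM.
    params = 7e9
    compute_s = 2 * params / 100e12       # ~14 GFLOPs at ~100 TFLOPS FP32 -> ~0.14 ms
    memory_s = params * 1.0 / 1.792e12    # 8-bit weights at ~1.8 TB/s     -> ~3.9 ms
    print(f"compute {compute_s * 1e3:.2f} ms vs memory {memory_s * 1e3:.2f} ms per token")
    # Memory traffic dominates by roughly 25-30x, so the extra dequantization math
    # is hidden and shrinking the weights is a net win.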
|
|
|
| |
| ▲ | adgjlsfhk1 6 days ago | parent | prev | next [-] | | The B60 is ridiculously good for scientific workloads. It has 50% more FP64 FLOPS than a 5090 and 3/4 of the VRAM, at 1/4 of the price. | |
|