| ▲ | mythz 4 days ago |
| Really confused why Intel and AMD both continue to struggle and yet still refuse to offer what Nvidia won't, i.e. high-RAM consumer GPUs. I'd much prefer paying 3x cost for 3x VRAM (48GB/$1047), 6x cost for 6x VRAM (96GB/$2094), 12x cost for 12x VRAM (192GB/$4188), etc.
They'd sell like hotcakes and software support would quickly improve. At 16GB I'd still prefer to pay a premium for Nvidia GPUs given their superior ecosystem. I really want to get off Nvidia, but Intel/AMD aren't giving me any reason to. |
|
| ▲ | fredoralive 4 days ago | parent | next [-] |
| Because the market of people who want huge RAM GPUs for home AI tinkering is basically about 3 Hacker News posters. Who probably won’t buy one because it doesn’t support CUDA. PS5 has something like 16GB unified RAM, and no game is going to really push much beyond that in VRAM use; we don’t really get Crysis-style system crushers anymore. |
| |
| ▲ | bilekas 4 days ago | parent | next [-] | | > PS5 has something like 16GB unified RAM, and no game is going to really push much beyond that in VRAM use; we don’t really get Crysis-style system crushers anymore. This isn't really true from the recreational card side; Nvidia themselves are reducing the number of 8GB models as a sign of market demand [1].
Games these days regularly max out 6 and 8 GB when running anything above 1080p at 60fps. The recent prevalence of Unreal Engine 5 titles with poor optimization for weaker hardware is also causing games to be released basically unplayable for many. For recreational use the sentiment is that 8GB is scraping the bottom of the requirements. Again, this is partly due to bad optimization, but games are also being played at higher resolutions, which requires more memory for larger textures. [1] https://videocardz.com/newz/nvidia-reportedly-reduces-supply... | | |
| ▲ | pjmlp 4 days ago | parent | next [-] | | As someone who started on 8-bit computing, Tim Sweeney is right: the Electron garbage culture, when applied to Unreal 5, is one of the reasons so much RAM is needed, with such bad performance. While I dislike some of the Handmade Hero culture, on one thing they are right: how badly modern hardware ends up being used. | | |
| ▲ | anthk 3 days ago | parent [-] | | I remember UE1 being playable even in software mode, e.g. the first Deus Ex.
Now, I think the Surreal Engine (UE1 reimplementation) needs damn GL 3.3 (if not 4.5 and Vulkan) to play games I used to play on an Athlon. I can't use Surreal to play Deus Ex on my legacy N270 netbook with GL 2.1... hardware that was more than enough to play the game at 800x600 with everything turned on, and then some. The good thing is that I've turned to libre/indie gaming, with games such as Cataclysm DDA: Bright Nights, which have far lower requirements than a UE5 game and are still enjoyable thanks to their playability and in-game lore (and a proper ending compared to vanilla CDDA). | |
| ▲ | keyringlight 3 days ago | parent | next [-] | | UE1 was from the timeframe when 3D acceleration was only starting to get adopted, and IIRC from some interview Epic kept a software option for UT2003/2004 (licensed Pixomatic?) because they found a lot of players were still playing their games on systems where full GPUs weren't always available, such as laptops. I know this goes back to Intel's Larrabee, where they tried it, but I'd be really interested to see what the limits of a software renderer are now, considering the comparative strength of modern processors and the amount of multiprocessing available. While I know DXVK or projects like dgVoodoo2 can be an option with sometimes better backwards compatibility, pure software would seem like a more stable reference target than the gradually shifting landscape of GPUs/drivers/APIs. | |
| ▲ | pjmlp 3 days ago | parent [-] | | One possible way would be to revisit such ideas while using AVX-512, the surviving pieces out of Larrabee. | | |
| ▲ | anthk 3 days ago | parent [-] | | llvmpipe/lavapipe under Mesa, too. Lavapipe on Vulkan makes vkQuake playable even on Core 2 Duo systems. Just as a proof of concept, of course; I've known about software-rendered Quakes forever. |
|
| |
| ▲ | 3036e4 3 days ago | parent | prev | next [-] | | Vanilla CDDA has a lot of entertaining endings, proper or not. I tend to find one within the first one or two in-game days. Great game! I like to install it now and then just to marvel at all the new things that have been added and then be killed by not knowing what I am doing. Never got far enough to interact with most systems in the game or worry about proper endings. | | |
| ▲ | anthk 3 days ago | parent [-] | | CDDA:BN adds a true 'endgame' objective (story-bound), but OFC you are free to do anything you want anytime, even after 'finishing' the game. |
| |
| ▲ | magicalhippo 3 days ago | parent | prev | next [-] | | > the Surreal Engine (UE1 reimplementation) The Unreal Engine software renderer back then had a very distinct dithering pattern. I played it after I got a proper 3D card, but it didn't feel the same, felt very flat and lifeless. | |
| ▲ | pjmlp 3 days ago | parent | prev [-] | | Me too, mostly indies and retro gaming; most of the AAA stuff isn't appealing when one has been playing games since the Atari golden days. |
|
| |
| ▲ | 4 days ago | parent | prev [-] | | [deleted] |
| |
| ▲ | epolanski 3 days ago | parent | prev | next [-] | | I really doubt your claim considering how many people I've seen buy $5k MacBook Pros with 48+ GB of RAM for local inference. A $500 32GB consumer GPU is an obvious best seller. So let's call it what it is: they don't want to cannibalize their higher-end GPUs. | | | |
| ▲ | paool 3 days ago | parent | prev | next [-] | | Maybe today, but the more accessible and affordable they become, the more likely it is that people start offering "self-hosted" options. We're already seeing competitors to AWS that only target models like Qwen, DeepSeek, etc. There are enterprise customers subject to compliance laws who want AI but cannot use any of the top models, because everything has to run on their own infrastructure. | |
| ▲ | Rohansi 4 days ago | parent | prev | next [-] | | > PS5 has something like 16GB unified RAM, and no game is going to really push much beyond that in VRAM use That's pretty funny considering that PC games are moving more towards 32GB RAM and 8GB+ VRAM. The next generation of consoles will of course increase to make room for higher quality assets. | | |
| ▲ | pjmlp 3 days ago | parent [-] | | In many cases due to bad programming, fixed by adding more RAM. | | |
| ▲ | Rohansi 3 days ago | parent [-] | | Sure, but not always. Future games will have more detailed assets which will require more memory. Running at 4K or higher resolution will be more common which also requires more memory. | | |
| ▲ | pjmlp 3 days ago | parent [-] | | "Because in the real world, I have to write up lists of stuff I have to go to the grocery store to buy. And I have never thought to myself that realism is fun. I go play games to have fun." Gabe Newell - https://www.gamesradar.com/gabe-newell-says-games-dont-need-... Detailed assets don't equate to good games. | |
| ▲ | Rohansi 2 days ago | parent [-] | | Doesn't mean games are going to abandon realistic graphics styles. I also believe Gabe Newell was referring more to gameplay mimicking real life rather than art style. Makes a lot more sense when you remember that the Half-Life games have a realistic art style and pushed the limits of what was possible at the time. |
|
|
|
| |
| ▲ | jantuss 4 days ago | parent | prev | next [-] | | Another use for high RAM GPUs is the simulation of turbulent flows for research. Compared to CPU, GPU Navier-Stokes solvers are super fast, but the size of the simulated domain is limited by the RAM. | |
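To make the memory ceiling concrete, here is a rough back-of-envelope sketch; the array count and precision are illustrative assumptions, not numbers from the comment above.

```python
# Rough VRAM estimate for a 3D incompressible Navier-Stokes solver.
# Assumed (illustrative): double precision, ~10 field arrays per cell
# (velocity components, pressure, RHS/work buffers for the time stepper).
bytes_per_cell = 8 * 10  # 8 bytes per double * ~10 arrays

for n in (512, 1024, 2048):  # cubic grid, n^3 cells
    gib = n**3 * bytes_per_cell / 2**30
    print(f"{n}^3 grid: ~{gib:,.0f} GiB")

# 512^3  ~  10 GiB -> fits on a 16 GB card
# 1024^3 ~  80 GiB -> needs a high-memory GPU
# 2048^3 ~ 640 GiB -> multi-GPU territory
```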
| ▲ | fnord77 3 days ago | parent | prev | next [-] | | Marketing is misreading the room. I believe there's a bunch of people buying no video card right now who would if there were high-VRAM options available. | |
| ▲ | FirmwareBurner 4 days ago | parent | prev | next [-] | | >Because the market of people who want huge RAM GPUs for home AI tinkering is basically about 3 Hacker News posters You're wrong. It's probably more like 9 HN posters. | | |
| ▲ | blitzar 4 days ago | parent [-] | | There are also 3 who, for retro reasons, want GPUs to have 8 bits and 256MB or less of VRAM | | |
| |
| ▲ | wpm 3 days ago | parent | prev [-] | | This isn’t a gaming card, so what the PS5 does or has is not relevant here. |
|
|
| ▲ | daemonologist 4 days ago | parent | prev | next [-] |
| This card does have double the VRAM of the more expensive Nvidia competitor (the A1000, which has 8 GB), but I take your point that it doesn't feel like quite enough to justify giving up the Nvidia ecosystem. The memory bandwidth is also... not great. They also announced a 24 GB B60 and a double-GPU version of the same (saves you physical slots), but it seems like they don't have a release date yet (?). |
| |
|
| ▲ | cmxch 4 days ago | parent | prev | next [-] |
| Maxsun does offer a high VRAM (48GB) dual Arc Pro B60, but the only US availability has it on par with a 5090 at ~$3000. |
| |
| ▲ | PostOnce 4 days ago | parent [-] | | I think that's actually two GPUs on one card, and not a single GPU with 48GB VRAM | | |
| ▲ | hengheng 4 days ago | parent [-] | | Needs a PCIe bifurcation chip on the main board for all we know. Compatibility is going to be fun. |
|
|
|
| ▲ | Ekaros 4 days ago | parent | prev | next [-] |
| I am not sure there is a significant enough market for those, that is, selling enough consumer units to cover all the design and other costs. From a gamer's perspective, 16GB is now a reasonable point; 32GB is the most one would really want, and even then only at, say, a $100 higher price point. This segment really does not need even 32GB, let alone 64GB or more. |
| |
| ▲ | drra 4 days ago | parent | next [-] | | Never underestimate bragging rights in the gamer community. The majority of us run unoptimized systems with that one great piece of gear, and as long as the game runs at decent FPS and we have some bragging rights, it's all OK. | |
| ▲ | rkomorn 4 days ago | parent [-] | | Exactly. A decade ago, I put 64GB of RAM in a PC and my friend asked me why I needed that much and I replied "so I can say I have 64GB RAM". The only time usage was "high" was when I created a VM with 48GB RAM just for kicks. It was useless. But I could say I had 64GB RAM. | | |
| ▲ | rvba 3 days ago | parent [-] | | My work computer with Windows, Outlook, a few tabs open and Excel already craps out with 16 GB. If you have a private computer, why would you even buy something with 16GB in 2025?
My 10-year-old laptop had that much. I'm looking for a new laptop and considering a 128GB setup - so those 200 Chrome tabs can eat it and I still have space to run other stuff, like those horrible Electron chat apps plus a game. | |
| ▲ | rkomorn 2 days ago | parent [-] | | I have 64GB in my current machine and, TBH, it's more than enough. But if I were to upgrade, I'd still get at least 128GB. Nobody's gonna be impressed with 64GB anymore. I don't need that in my life... |
|
|
| |
| ▲ | imiric 4 days ago | parent | prev [-] | | > I am not sure there is a significant enough market for those. How so? The prosumer local AI market is quite large and growing every day, and is much more lucrative per capita than the gamer market. Gamers are an afterthought for GPU manufacturers. NVIDIA has been neglecting the segment for years, and is now much more focused on enterprise and AI workloads. Gamers get marginal performance bumps each generation, plus side-effect benefits from their AI R&D (DLSS, etc.). The exorbitant prices and stagnant performance per dollar are clear indications of this. It's plain extortion, and the worst part is that gamers have accepted that paying $1000+ for a GPU is perfectly reasonable. > This segment really does not need even 32GB, let alone 64GB or more. 4K is becoming a standard resolution, and 16GB is not enough for it. 24GB should be the minimum, and 32GB for some headroom. While it's true that 64GB is overkill for gaming, it would be nice if it were accessible at reasonable prices. After all, GPUs are not exclusively for gaming, and we might want to run other workloads on them from time to time. While I can imagine that VRAM manufacturing costs are much higher than DRAM costs, it's not unreasonable to conclude that NVIDIA, possibly in cahoots with AMD, has been artificially controlling the prices. While hardware has always become cheaper and more powerful over time, for some reason GPUs buck that trend, and old GPUs somehow appreciate over time. Weird, huh. This can't be explained away as post-pandemic taxes and chip shortages anymore. Frankly, I would like some government body to investigate this industry, assuming they haven't been bought out yet. Label me a conspiracy theorist if you wish, but there is precedent for this behavior in many industries. | |
| ▲ | Fnoord 3 days ago | parent | next [-] | | I think the timeline is roughly: SGI (90s), then Nvidia gaming (with ATi and later AMD) eating that cake. Then cryptocurrency took off at the end of the '00s / start of the '10s, though if we are honest things like hashcat were already happening. After that, AI (LLMs) took off during the pandemic. During the cryptocurrency hype, GPUs were already going for insane prices, and together with low or surplus energy prices (which solar can cause, but nuclear should too) that allowed even governments to make cheap money (and do hashcat cracking, too). If I were North Korea, I'd know my target. Turns out they did, but in a different way; that was around 2014. Add on top of this Stadia and GeForce Now as examples of renting GPUs for gaming (there are more, and Stadia flopped). I won't dwell on LLMs since they are the most recent development. All in all, it turns out GPUs are more valuable than what they were sold for if your goal isn't personal computer gaming. Hence the prices have gone up. Now, if you want to thoroughly investigate this market, you need to figure out what large foreign forces (governments, businesses, and criminal enterprises) use these GPUs for. The US government has been aware of the above for a long time; hence the export restrictions on GPUs, which are meant to slow opponents down in catching up. The opponent is the non-free world (China, North Korea, Russia, Iran, ...), though the current administration is acting insane. | |
| ▲ | imiric 3 days ago | parent [-] | | You're right, high demand certainly plays a role. But it's one thing for the second-hand market to dictate the price of used hardware, and another for new hardware to steadily get more expensive while its objective capabilities only see marginal improvements. At a certain point it becomes blatant price gouging. NVIDIA is also taking consumers for a ride by marketing performance based on frame generation, while trying to downplay and straight up silence anyone who points out that their flagship cards still struggle to deliver a steady 4K@60 without it. Their attempts to control the narrative of media outlets like Gamers Nexus should be illegal, and fined appropriately. Why we haven't seen class-action lawsuits for this in multiple jurisdictions is beyond me. |
| |
| ▲ | rocqua 4 days ago | parent | prev [-] | | Why would Intel willingly join this cartel then? Their GPU business is a slow upstart. If they have a play that could massively disrupt the competition and has only a small chance of epic failure, that should be very attractive to them. |
|
|
|
| ▲ | zdw 4 days ago | parent | prev | next [-] |
| I doubt you'd get linear scaling of price/capacity - the larger-capacity modules are more expensive per GB than smaller ones, and in some cases are supply constrained. The number of chips per memory channel is usually pretty low (1 or 2 on most GPUs), so GPUs tend to have to scale out their memory bus widths to get to higher capacity. That's expensive and takes up die space, and for the conventional case (games) isn't generally needed on low-end cards. What really needs to happen is for someone to make some "system seller" game that is incredibly popular and requires something like 48GB of memory on the GPU to build demand. But then you have a chicken/egg problem. Example: https://wccftech.com/nvidia-geforce-rtx-5090-128-gb-memory-g... |
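To make the bus-width arithmetic concrete, a toy calculation; the per-chip interface width and density below are typical GDDR6-class assumptions, not the specs of any particular card.

```python
# VRAM capacity is roughly:
#   (bus width / per-chip interface) * chips per channel * chip density
# Assumed here: 32-bit interface per GDDR chip, 2 GB per chip.
def vram_gb(bus_width_bits, gb_per_chip=2, chips_per_channel=1):
    channels = bus_width_bits // 32
    return channels * chips_per_channel * gb_per_chip

print(vram_gb(128))                       # 8 GB  - entry-level card
print(vram_gb(256))                       # 16 GB - midrange card
print(vram_gb(256, chips_per_channel=2))  # 32 GB - "clamshell" (chips on both sides of the board)
print(vram_gb(512, chips_per_channel=2))  # 64 GB - wide bus + clamshell, costly die area and board
```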
|
| ▲ | YetAnotherNick 4 days ago | parent | prev | next [-] |
| > I'd much prefer paying 3x cost for 3x VRAM Why not just buy 3 cards then? These cards don't require active cooling anyway, and you can fit 3 in a decent-sized case. You will get 3x VRAM speed and 3x compute. And if your use case is LLM inference, it will be a lot faster than 1 card with 3x the VRAM. |
| |
| ▲ | zargon 3 days ago | parent | next [-] | | We will buy 4 cards if they are 48 GB or more. At a measly 16 GB, we’re just going to stick with 3090s, P40s, MI50s, etc. > 3x VRAM speed and 3x compute LLM scaling doesn’t work this way. If you have 4 cards, you may get a 2x performance increase if you use vLLM, but you’ll also need enough VRAM to run FP8. 3 cards would only run at 1x performance. | |
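For context, the tensor-parallel split being referred to is typically requested like the sketch below: vLLM shards each layer across the cards, and the GPU count generally has to divide the model's attention head count, which is why 2 or 4 cards scale while 3 cards effectively don't. The model name and quantization setting here are placeholder assumptions, not a tested configuration.

```python
from vllm import LLM, SamplingParams

# Shard the model across 4 GPUs with tensor parallelism: each layer's weights
# are split 4 ways and partial results are combined every layer.
llm = LLM(
    model="Qwen/Qwen2.5-32B-Instruct",  # placeholder; assumed to fit in 4x16 GB at FP8
    tensor_parallel_size=4,
    quantization="fp8",                 # assumed supported by the build/hardware in use
)

outputs = llm.generate(
    ["Summarize why tensor parallelism prefers power-of-two GPU counts."],
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
```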
| ▲ | _zoltan_ 4 days ago | parent | prev [-] | | Because then instead of VRAM bandwidth you're dealing with PCIe bandwidth, which is way less. | |
| ▲ | YetAnotherNick 3 days ago | parent | next [-] | | For LLM inference at batch size 1, it's hard to saturate PCIe bandwidth, especially with less powerful chips. You would get close to linear performance [1]. The obvious issue is that doing things across multiple GPUs is harder, and a lot of software doesn't fully support it or isn't optimized for it. [1]: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inferen... | |
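A rough sketch of why single-stream decoding barely touches the bus when the model is split layer-wise across cards; the model dimension and token rate are assumptions for illustration.

```python
# Between layer-split stages, only the current token's activations cross PCIe,
# not the weights, so the traffic is tiny compared to the link's capacity.
hidden_size = 8192        # assumed model dimension (roughly 70B-class)
bytes_per_value = 2       # fp16/bf16 activations
tokens_per_second = 50    # generous single-stream decode rate

per_hop = hidden_size * bytes_per_value  # ~16 KB per token per GPU boundary
traffic = per_hop * tokens_per_second    # ~0.8 MB/s
pcie4_x16 = 32e9                         # ~32 GB/s theoretical

print(f"inter-GPU traffic: {traffic / 1e6:.1f} MB/s "
      f"({traffic / pcie4_x16:.4%} of PCIe 4.0 x16)")
```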
| ▲ | mythz 3 days ago | parent | prev [-] | | Also less power efficient, takes up more PCIe slots, and a lot of software doesn't support GPU clustering. I already have 4x 16GB GPUs, which can't run large models exceeding 16GB. I'm currently running them in different VMs to be able to make full use of them; I used to have them running in different Docker containers, but OOM exceptions would frequently bring down the whole server, which running in VMs helped resolve. | |
| ▲ | zargon 3 days ago | parent [-] | | What’s your application for high VRAM that doesn’t leverage multiple GPUs? |
|
|
|
|
| ▲ | 0x500x79 3 days ago | parent | prev | next [-] |
| I think it's a bit of planned obsolescence as well. The 1080 Ti has been a monster with its 11GB of VRAM up until this generation. A lot of enthusiasts point out that Nvidia won't make that mistake again, since it led to longer upgrade cycles. |
|
| ▲ | akvadrako 3 days ago | parent | prev | next [-] |
| AMD Strix Halo has about 100GB of VRAM for around $1500 if that's all you care about. |
|
| ▲ | kristopolous 4 days ago | parent | prev | next [-] |
| You want an M3 ultra Mac studio |
| |
| ▲ | ginko 4 days ago | parent [-] | | That only runs macOS, so it's useless. | |
| ▲ | kristopolous 4 days ago | parent | next [-] | | For AI workloads? You're wrong. I use mine as a server and just SSH into it; I don't even have a keyboard or display hooked up to it. You can get 96GB of VRAM and about 40-70% of the speed of a 4090 for $4000. It especially makes sense when you are running a large number of applications that you want to talk to each other... the only way to do that on a 4090 is to hit disk, shut one application down, start up the other, read from disk... it's slowwww... The other option is a multi-GPU system, but then it gets into real money. Trust me, it's a game changer. I just have it sitting in a closet and use it all the time. The other nice thing is that, unlike with any Nvidia product, you can walk into an Apple store, pay the retail price, and get it right away. No scalpers, no hunting. |
| ▲ | 3 days ago | parent | prev [-] | | [deleted] |
|
|
|
| ▲ | doctorpangloss 4 days ago | parent | prev [-] |
| they don't manufacture RAM, so none of the margin goes to them |
| |
| ▲ | nullc 4 days ago | parent | next [-] | | Even if they put out some super-high-memory models and just passed the RAM through at cost, it would increase sales, potentially quite dramatically, increase their total income a lot, and give them a good chance of transitioning to being a market leader rather than an also-ran. AMD has lagged so long because of the software ecosystem, but in the current climate they'd only need to support a couple of popular model architectures to immediately grab a lot of business. The failure to do so is inexplicable. I expect we will eventually learn that this was yet another instance of anti-competitive collusion. | |
| ▲ | doctorpangloss 4 days ago | parent | next [-] | | The whole RAM industry has twice been sanctioned for price fixing, so I agree: any business that deals with RAM is far more likely than those in other industries to be involved in anti-competitive collusion. | |
| ▲ | zargon 3 days ago | parent | prev [-] | | > The failure to do so is inexplicable. Lisa and Jensen are cousins. I think that explains it. Lisa can easily prove me wrong by releasing a high-memory GPU that significantly undercuts Nvidia's RTX 6000 Pro. |
| |
| ▲ | kube-system 4 days ago | parent | prev [-] | | They sell the completed card, which has margin. You can charge more money for a card with more VRAM. | |
| ▲ | blitzar 4 days ago | parent [-] | | Or shrink the margin down to just 50% and sell 10x the number of cards (for the week or two it would take Nvidia to announce a 5090 with 128GB). | |
| ▲ | AnthonyMouse 4 days ago | parent [-] | | Nvidia uses VRAM amount for market segmentation. They can't make a 128GB consumer card without cannibalizing their enterprise sales. Which means Intel or AMD making an affordable high-VRAM card is win-win. If Nvidia responds in kind, Nvidia loses a ton of revenue they'd otherwise have available to outspend their smaller competitors on R&D. If they don't, they keep more of those high-margin customers but now the ones who switch to consumer cards are switching to Intel or AMD, which both makes the company who offers it money and helps grow the ecosystem that isn't tied to CUDA. People say things like "it would require higher pin counts" but that's boring. The increase in the amount people would be willing to pay for a card with more VRAM is unambiguously more than the increase in the manufacturing cost. It's more plausible that there could actually be global supply constraints in the manufacture of GDDR, but if that's the case then just use ordinary DDR5 and a wider bus. That's what Apple does and it's fine, and it may even cost less in pins than you save because DDR is cheaper than GDDR. It's not clear what they're thinking by not offering this. | | |
| ▲ | blitzar 4 days ago | parent | next [-] | | > Intel or AMD making an affordable high-VRAM card is win-win. 100% agree. CUDA is a bit of a moat, but the earlier in the hype cycle viable alternatives appear, the more likely it is that the non-CUDA ecosystem becomes viable. > It's not clear what they're thinking by not offering this. They either don't like making money or have a fantasy that one day soon they will be able to sell pallets of $100,000 GPUs they made for $2.50, like Nvidia can. It doesn't take a PhD and an MBA to figure out that the only reason Nvidia has what should be a short-term market available to them is the failure of Intel, AMD, and the VC/innovation side to offer any competition. It is such an obvious win-win that it would probably be worth skipping the engineering and just announcing the product for sale by the end of the year, forcing everyone's hand. | |
| ▲ | prmoustache 3 days ago | parent | prev | next [-] | | > The increase in the amount people would be willing to pay for a card with more VRAM is unambiguously more than the increase in the manufacturing cost. I guess you already have the paper if it is that unambiguous. Would you mind sharing the data/source? | |
| ▲ | AnthonyMouse 3 days ago | parent [-] | | The cost of more pins is linear in the number of pins, and the pins aren't the only component of the manufacturing cost, so a card with twice as many pins will have a manufacturing cost of significantly less than twice that of a card with half as many pins. Cards with 16GB of VRAM exist for ~$300 retail. Cards with 80GB of VRAM cost >$15,000 and customers pay that. A card with 80GB of VRAM could be sold for <$1500 with five times the margin of the $300 card because the manufacturing cost is less than five times as much. <$1500 is unambiguously a smaller number than >$15,000. QED. | | |
| ▲ | doctorpangloss 2 days ago | parent [-] | | > the manufacturing cost is less than five times as much They don’t manufacture the RAM. This isn’t complicated. They make less margin (a percentage) in your scenario. And that’s what Wall Street cares about. | | |
| ▲ | AnthonyMouse 2 days ago | parent [-] | | They don't really manufacture anything. TSMC or Samsung make the chip and Samsung, Micron or Hynix make the RAM. Even Intel's GPUs are TSMC. Also, Wall St cares about profit, not margins. If you can move a billion units with a $100 margin, they're going to like you a lot better than if you move a million units with a $1000 margin. |
|
|
| |
| ▲ | singhrac 4 days ago | parent | prev | next [-] | | This is almost true but not quite - I don't think much of the (dollar) spend on enterprise GPUs (H100, B200, etc.) would transfer if there were a 128 GB consumer card. The problem is both memory bandwidth (HBM) and networking (NVLink), which NVIDIA definitely uses to segment consumer vs enterprise hardware. I think your argument is still true overall, though, since there are a lot of "gpu poors" (i.e. grad students) who write/invent in the CUDA ecosystem, and they often work in single-card settings. Fwiw Intel did try this with Arctic Sound / Ponte Vecchio, but it was late out the door and did not really perform (see https://chipsandcheese.com/p/intels-ponte-vecchio-chiplets-g...). It seems like they took on a lot of technical risk; hopefully some of that transfers over to a future project, though Falcon Shores was cancelled. They really should have released some of those chips even at a loss, but I don't know the cost of a tape-out. | |
| ▲ | AnthonyMouse 3 days ago | parent [-] | | NVLink matters if you want to combine a whole bunch of GPUs, e.g. you need more VRAM than any individual GPU is available with. Many workloads exist that don't care about that or don't have working sets that large, particularly if the individual GPU actually has a lot of VRAM. If you need 128GB and you have GPUs with 40GB of VRAM then you need a fast interconnect. If you can get an individual GPU with 128GB, you don't. There is also work being done to make this even less relevant because people are already interested in e.g. using four 16GB cards without a fast interconnect when you have a 64GB model. The simpler implementation of this is to put a quarter of the model on each card split in the order it's used and then have the performance equivalent of one card with 64GB of VRAM by only doing work on the card with that section of the data in its VRAM and then moving the (much smaller) output to the next card. A more sophisticated implementation does something similar but exploits parallelism by e.g. running four batches at once, each offset by a quarter, so that all the cards stay busy. Not all workloads can be split like this but for some of the important ones it works. | | |
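A minimal PyTorch sketch of that simpler "split in the order it's used" scheme; the layer sizes are toy values and it assumes four CUDA devices are visible, so it illustrates the idea rather than how any particular inference server implements it.

```python
import torch
import torch.nn as nn

# Put a quarter of the layers on each GPU; only the small activation tensor
# crosses PCIe between stages, so no fast interconnect is needed.
n_layers, d_model = 32, 4096
devices = [f"cuda:{i}" for i in range(4)]  # assumes 4 GPUs are available

stages = []
for dev in devices:
    block = nn.Sequential(*[
        nn.TransformerEncoderLayer(d_model, nhead=32, batch_first=True)
        for _ in range(n_layers // len(devices))
    ])
    stages.append(block.half().to(dev).eval())

x = torch.randn(1, 16, d_model, dtype=torch.half, device=devices[0])
with torch.no_grad():
    for stage, dev in zip(stages, devices):
        x = stage(x.to(dev))  # hand off the activations (~128 KB here) to the next card
print(x.shape, x.device)
```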
| ▲ | singhrac 3 days ago | parent [-] | | I think we might just disagree about how much of the GPU spend is on small vs large models (inference or training). I think something like 99.9% of spending interest is on models that don't fit into 128 GB (remember the KV cache matters too). Happy to be proven wrong! |
|
| |
| ▲ | 3 days ago | parent | prev [-] | | [deleted] |
|
|
|
|