| ▲ | DeepSeek-v3.2: Pushing the frontier of open large language models [pdf](huggingface.co) |
| 492 points by pretext 8 hours ago | 222 comments |
| https://huggingface.co/deepseek-ai/DeepSeek-V3.2 https://api-docs.deepseek.com/news/news251201 |
|
| ▲ | zug_zug 5 hours ago | parent | next [-] |
| Well props to them for continuing to improve, winning on cost-effectiveness, and continuing to publicly share their improvements. Hard not to root for them as a force to prevent an AI corporate monopoly/duopoly. |
| |
| ▲ | jstummbillig 5 hours ago | parent | next [-] | | How could we judge if anyone is “winning” on cost-effectiveness, when we don't know what everyone's profits/losses are? | | |
| ▲ | tedivm 2 hours ago | parent | next [-] | | If you're trying to build AI based applications you can and should compare the costs between vendor based solutions and hosting open models with your own hardware. On the hardware side you can run some benchmarks on the hardware (or use other people's benchmarks) and get an idea of the tokens/second you can get from the machine. Normalize this for your usage pattern (and do your best to implement batch processing where you are able to, which will save you money on both methods) and you have a basic idea of how much it would cost per token. Then you compare that to the cost of something like GPT5, which is a bit simpler because the cost per (million) token is something you can grab off of a website. You'd be surprised how much money running something like DeepSeek (or if you prefer a more established company, Qwen3) will save you over the cloud systems. That's just one factor though. Another is what hardware you can actually run things on. DeepSeek and Qwen will function on cheap GPUs that other models will simply choke on. | | |
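A rough back-of-the-envelope sketch of that comparison (a sketch only; every number below is an illustrative placeholder, not a real benchmark result or vendor price):

    # Self-hosting vs. hosted-API cost per million tokens (illustrative numbers only).

    # Self-hosted side: amortized hardware plus electricity, divided by measured throughput.
    hardware_cost_usd = 30_000        # placeholder server price
    amortization_years = 3
    power_draw_kw = 2.0               # placeholder steady-state draw
    electricity_usd_per_kwh = 0.15
    throughput_tok_per_s = 1_500      # from your own batched benchmark

    seconds = amortization_years * 365 * 24 * 3600
    total_tokens = throughput_tok_per_s * seconds          # assumes the box is saturated 24/7
    energy_cost = power_draw_kw * (seconds / 3600) * electricity_usd_per_kwh
    self_hosted_per_mtok = (hardware_cost_usd + energy_cost) / (total_tokens / 1e6)

    # Hosted side: blended $/1M tokens taken straight from a provider's pricing page.
    api_per_mtok = 5.00               # placeholder

    print(f"self-hosted: ${self_hosted_per_mtok:.2f} per 1M tokens")
    print(f"hosted API:  ${api_per_mtok:.2f} per 1M tokens")

The utilization assumption dominates the result: the less you keep a self-hosted box busy, the more expensive each of its tokens becomes, which is why batching matters so much in the comment above.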
| ▲ | Muromec 29 minutes ago | parent | next [-] | | >That's just one factor though. Another is what hardware you can actually run things on. DeepSeek and Qwen will function on cheap GPUs that other models will simply choke on. What's cheap nowadays? I'm out of the loop. Does anything run on the integrated AMD Ryzen AI hardware that comes in Framework motherboards? Is under 1k American dollars cheap? | |
| ▲ | qeternity an hour ago | parent | prev [-] | | > DeepSeek and Qwen will function on cheap GPUs that other models will simply choke on. Uh, Deepseek will not (unless you are referring to one of their older R1 finetuned variants). But any flagship Deepseek model will require 16x A100/H100+ with NVL in FP8. |
| |
| ▲ | ericskiff 4 hours ago | parent | prev | next [-] | | I believe this was a statement on cost per token to us as consumers of the service | |
| ▲ | rowanG077 3 hours ago | parent | prev [-] | | Well consumers care about the cost to them, and those we know. And deepseek is destroying everything in that department. |
| |
| ▲ | srameshc 5 hours ago | parent | prev | next [-] | | As much as I agree with your sentiment, I doubt the intention is singular. | |
| ▲ | twelvechairs 4 hours ago | parent | next [-] | | The bar is incredibly low considering what OpenAI has done as a "not for profit" | | | |
| ▲ | echelon 4 hours ago | parent | prev [-] | | I don't care if this kills Google and OpenAI. I hope it does, though I'm doubtful because distribution is important. You can't beat "ChatGPT" as a brand in laypeople's minds (unless perhaps you give them a massive "Temu: Shop Like A Billionaire" commercial campaign). Closed source AI is almost by design morphing into an industrial, infrastructure-heavy rocket science that commoners can't keep up with. The companies pushing it are building an industry we can't participate or share in. They're cordoning off areas of tech and staking ground for themselves. It's placing a steep fence around tech. I hope every such closed source AI effort is met with equivalent open source and that the investments made into closed AI go to zero. The most likely outcome is that Google, OpenAI, and Anthropic win and every other "lab"-shaped company dies an expensive death. RunwayML spent hundreds of millions and they're barely noticeable now. These open source models hasten the deaths of the second tier also-ran companies. As much as I hope for dents in the big three, I'm doubtful. | | |
| ▲ | raw_anon_1111 3 hours ago | parent | next [-] | | I can’t think of a single company I’ve worked with as a consultant that I could convince to use DeepSeek because of its ties with China even if I explained that it was hosted on AWS and none of the information would go to China. Even when the technical people understood that, it would be too much of a political quagmire within their company when it became known to the higher ups. It just isn’t worth the political capital. They would feel the same way about using xAI or maybe even Facebook models. | | |
| ▲ | JSR_FDED an hour ago | parent | next [-] | | AirBnB is all in on DeepSeek and Qwen. https://sg.finance.yahoo.com/news/airbnb-picks-alibabas-qwen... | |
| ▲ | StealthyStart 3 hours ago | parent | prev | next [-] | | This is the real cause. At the enterprise level, trust outweighs cost. My company hires agencies and consultants who provide the same advice as our internal team; this is not to imply that our internal team is incorrect; rather, there is credibility that if something goes wrong, the decision consequences can be shifted, and there is a reason why companies continue to hire the same four consulting firms. It's trust, whether it's real or perceived. | | |
| ▲ | raw_anon_1111 3 hours ago | parent | next [-] | | I have seen it much more nuanced than that. 2020 - I was a mid level (L5) cloud consultant at AWS with only two years of total AWS experience and that was only at a small startup before then. Yet every customer took my (what in hindsight might not have been the best) advice all of the time without questioning it as long as it met their business goals. Just because I had @amazon.com as my email address. Late 2023 - I was the subject matter expert in a niche of a niche in AWS that the customer focused on and it was still almost impossible to get someone to listen to a consultant from a shitty third rate consulting company. 2025 - I left the shitty consulting company last year after only a year and now work for one with a much better reputation and I have a better title “staff consultant”. I also play the game and be sure to mention that I’m former “AWS ProServe” when I’m doing introductions. Now people listen to me again. | |
| ▲ | coliveira 2 hours ago | parent | prev | next [-] | | So much worse for American companies. This only means that they will be uncompetitive with similar companies that use models with realistic costs. | | |
| ▲ | raw_anon_1111 41 minutes ago | parent [-] | | I can’t think of a single major US company that is big internationally that is competing on price. | | |
| |
| ▲ | 0xWTF 3 hours ago | parent | prev [-] | | Children do the same thing intuitively: parents continually complain that their children don't listen to them. But as soon as someone else tells them to "cover their nose", "chew with their mouth closed", "don't run with scissors", whatever, they listen and integrate that guidance into their behavior. What's harder to observe is all the external guidance they get that they don't integrate until their parents tell them. It's internal vs external validation. | | |
| ▲ | raw_anon_1111 2 hours ago | parent [-] | | Or in many cases they go over to their grandparents' house and they let them run wild, and all of a sudden your parents have “McDonald's money” for their grandkids when they never had it for you. |
|
| |
| ▲ | tokioyoyo 3 hours ago | parent | prev | next [-] | | If the Chinese model becomes better than competitors, these worries will suddenly disappear. Also, there are plenty of startups and enterprises that are running fine-tuned versions of different OS models. | |
| ▲ | hhh 2 hours ago | parent | next [-] | | No… Nobody I work for will touch these models. The fear is real that they have been poisoned or have some underlying bomb. Plus y’know, they’re produced by China, so they would never make it past a review board in most mega enterprises IME. | | |
| ▲ | tokioyoyo 36 minutes ago | parent | next [-] | | People say that, but everyone, including enterprises, is constantly buying Chinese tech one way or another because of the cost/quality ratio. There's a tipping point in any Excel file where the risk argument stops making sense, if the cost is 20x for the same quality. Of course you'll always have exceptions (government, military, etc.), but in the private sector, the winner will take it all. | |
| ▲ | raw_anon_1111 16 minutes ago | parent [-] | | What Chinese-built infrastructure tech, where information can be exfiltrated or real damage caused, are American companies buying? Chinese communication tech is for the most part not allowed in any American technology. |
| |
| ▲ | cherioo 33 minutes ago | parent | prev [-] | | That conversation probably gets easier if and when a company is spending $100M+ on AI. Companies just need to get to the “if” part first. That or they wash their hands by using a reseller that can use whatever it wants under the hood. |
| |
| ▲ | raw_anon_1111 2 hours ago | parent | prev | next [-] | | Yeah that’s not how Big Enterprise works… And most startups are just doing prompt engineering that will never go anywhere. The big companies will just throw a couple of developers at the feature and add it to their existing business. | | |
| ▲ | tokioyoyo 34 minutes ago | parent [-] | | Big enterprise with mostly private companies as their clients? Lol, yeah, that's how they work from my personal experience. The reality is, if it's not a tech-first enterprise and they already outsource part of their tech to a shop outside of NA (which is almost the majority at this point), they will do absolutely everything to cut costs. | |
| ▲ | raw_anon_1111 11 minutes ago | parent [-] | | I spent three years working in consulting mostly in public sector and education and the last two working with startups to mid-size commercial interests and a couple of financial institutions. Before that I spent 6 years working between 3 companies in health care in a tech lead role. I'm 100% sure that any of those companies would have immediately questioned my judgment for suggesting DeepSeek if it had been a thing. Absolutely none of them would ever have touched DeepSeek. |
|
| |
| ▲ | subroutine 2 hours ago | parent | prev [-] | | As a government contractor, using a Chinese model is a non-starter. |
| |
| ▲ | siliconc0w 3 hours ago | parent | prev | next [-] | | Even when self-hosting, there is still a real risk of using Chinese models (or any provider you can't trust/sue) because they can embed malicious actions into the model. For example, a small random percentage of the time, it could add a subtle security vulnerability to any code generation. This is a known-playbook of China and so it's pretty likely that if they aren't already doing this, they will eventually if the models see high adoption. | | |
| ▲ | nagaiaida 3 hours ago | parent [-] | | on what hypothetical grounds would you be more meaningfully able to sue the american maker of a self-hosted statistical language model that you select your own runtime sampling parameters for after random subtle security vulnerabilities came out the other side when you asked it for very secure code? put another way, how do you propose to tell this subtle nefarious chinese sabotage you baselessly imply to be commonplace from the very real limitations of this technology in the first place? | | |
| ▲ | kriops 2 hours ago | parent | next [-] | | "Baselessly" - I'm sorry but realpolitik is plenty of basis. China is a geopolitical adversary of both the EU and the US. And China will be the first to admit this, btw. | | |
| ▲ | coliveira 2 hours ago | parent | next [-] | | Competitor != adversary. It is US warmongering ideology that tries to equate these concepts. | | |
| ▲ | kriops 2 hours ago | parent | next [-] | | That is just objectively incorrect, and fundamentally misunderstands the basics of statehood. China, the US, and any other local monopoly on force would absolutely take any chance they could get to extend their influence and diminish the others. That is, they are acting rationally to, at minimum, maximise the probability that they are able to maintain their current monopolies on force. | |
| ▲ | jrflowers 33 minutes ago | parent [-] | | Isn’t every country by definition a “local monopoly on force”? Sweden and Norway have their own militaries and police forces and neither would take kindly to an invasion from the other. By your definition this makes them adversaries or enemies. | | |
| ▲ | kriops 17 minutes ago | parent [-] | | Exactly. I am Norwegian myself, and I don’t even know how many wars we have had with Sweden and Denmark. If you are getting at the fact that it is sometimes beneficial for adversaries to collaborate (e.g., the prisoner dilemma) then I agree. And indeed, both Norway and Sweden would be completely lost if they declared war on the other tomorrow. But it doesn’t change the fundamental nature of the relationship. |
|
| |
| ▲ | delaminator 21 minutes ago | parent | prev [-] | | you clearly haven't been paying attention. Remember when the US bugged EU leaders' phones, including Merkel's, from 2002 to 2013? |
| |
| ▲ | nagaiaida 2 hours ago | parent | prev | next [-] | | sorry, is your contention here "spurious accusations don't require evidence when aimed at designated state enemies"? because it feels uncharitably rude to infer that's what you meant to say here, but i struggle to parse this in a different way where you say something more reasonable. | | |
| ▲ | kriops 2 hours ago | parent [-] | | I’m sorry you feel that way. It is however entirely reasonable to assume that the comment I replied to was made entirely in bad faith, seeing as it dismisses any rational basis for the behaviour of the entities it is making claims about. |
| |
| ▲ | saubeidl 31 minutes ago | parent | prev [-] | | The US has also been behaving like an adversary of the EU as of late. So what's the difference? |
| |
| ▲ | fragmede 2 hours ago | parent | prev [-] | | This paper may be of interest to you: https://arxiv.org/html/2504.15867v1 | | |
| ▲ | nagaiaida 2 hours ago | parent [-] | | the mechanism of action for that attack appears to be reading from poisoned snippets on stackoverflow or a similar site, which to my mind is an excellent example of why it seems like it would be difficult to retroactively pin "insecure code came out of my model" on the evil communist base weights of the model in question |
|
|
| |
| ▲ | kriops 2 hours ago | parent | prev | next [-] | | For good reason, too. Hostile governments have a much easier time poisoning their "local" LLMs. | |
| ▲ | register 41 minutes ago | parent | prev | next [-] | | That might be the perspective of a US based company. But there is also Europe and basically it's a choice between Trump and China. | | |
| ▲ | Muromec 25 minutes ago | parent [-] | | Europe has Mistral. It feels like governments that can do things without fax machines take this as a sovereignty thing and roll their own or have their provider in their jurisdiction. |
| |
| ▲ | littlestymaar an hour ago | parent | prev | next [-] | | > I can’t think of a single company I’ve worked with as a consultant that I could convince to use DeepSeek because of its ties with China even if I explained that it was hosted on AWS and none of the information would go to China. Well for non-American companies, you have the choice between Chinese models that don't send data home, and American ones that do, with both countries being more or less equally threatening. I think if Mistral can just stay close enough to the race it will win many customers by not doing anything. | |
| ▲ | tehjoker an hour ago | parent | prev [-] | | really a testament to how easily the us govt has spun a china bad narrative even though it is mostly fiction and american exceptionalism |
| |
| ▲ | giancarlostoro 2 hours ago | parent | prev [-] | | ChatGPT is like “Photoshop”: people will call any AI ChatGPT. |
|
| |
| ▲ | ActorNightly 2 hours ago | parent | prev | next [-] | | >winning on cost-effectiveness Nobody is winning in this area until these things run in full on single graphics cards. Which is sufficient compute to run even most of the complex tasks. | | |
| ▲ | JSR_FDED an hour ago | parent | next [-] | | Nobody is winning until cars are the size of a pack of cards. Which is big enough to transport even the largest cargo. | |
| ▲ | bbor an hour ago | parent | prev | next [-] | | I mean, there are lots of models that run on home graphics cards. I'm having trouble finding reliable requirements for this new version, but V3 (from February) has a 32B parameter model that runs on "16GB or more" of VRAM[1], which is very doable for professionals in the first world. Quantization can also help immensely. Of course, the smaller models aren't as good at complex reasoning as the bigger ones, but that seems like an inherently-impossible goal; there will always be more powerful programs that can only run in datacenters (as long as our techniques are constrained by compute, I guess). FWIW, the small models of today are a lot better than anything I thought I'd live to see as of 5 years ago! Gemma3n (which is built to run on phones[2]!) handily beats ChatGPT 3.5 from January 2023 -- rank ~128 vs. rank ~194 on LLMArena[3]. [1] https://blogs.novita.ai/what-are-the-requirements-for-deepse... [2] https://huggingface.co/google/gemma-3n-E4B-it [3] https://lmarena.ai/leaderboard/text/overall
| |
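As a rough rule of thumb (a sketch of the arithmetic only, not a statement about any particular model's real requirements), weight memory is roughly parameter count times bytes per parameter, which is why quantization helps so much; KV cache and activations still need headroom on top:

    def weight_memory_gb(params_billion: float, bits_per_param: int) -> float:
        """Approximate memory for the weights alone; KV cache and activations come on top."""
        return params_billion * 1e9 * bits_per_param / 8 / 1e9

    for bits in (16, 8, 4):
        print(f"32B params @ {bits}-bit ≈ {weight_memory_gb(32, bits):.0f} GB")
    # prints roughly: 64 GB, 32 GB, 16 GB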
| ▲ | qeternity an hour ago | parent [-] | | > but V3 (from February) has a 32B parameter model that runs on "16GB or more" of VRAM[1] No. They released a distilled version of R1 based on a Qwen 32b model. This is not V3, and it's not remotely close to R1 or V3.2. |
| |
| ▲ | beefnugs 2 hours ago | parent | prev [-] | | Why does that matter? They won't be making at-home graphics cards anymore. Why would you do that when you can be pre-sold $40k servers for years into the future? | |
| ▲ | observationist an hour ago | parent [-] | | Because Moore's law marches on. We're around 35-40 orders of magnitude from computers now to computronium. We'll need 10-15 years before handheld devices can run a couple terabytes of ram, 64-128 terabytes of storage, and 80+ TFLOPS. That's enough to run any current state of the art AI at around 50 tokens per second, but in 10 years, we're probably going to have seen lots of improvements, so I'd guess conservatively you're going to be able to see 4-5x performance per parameter, possibly much more, so at that point, you'll have the equivalent of a model with 10T parameters today. If we just keep scaling and there are no breakthroughs, Moore's law gets us through another century of incredible progress. My default assumption is that there are going to be lots of breakthroughs, and that they're coming faster, and eventually we'll reach a saturation of research and implementation; more, better ideas will be coming out than we can possibly implement over time, so our information processing will have to scale, and it'll create automation and AI development pressures, and things will be unfathomably weird and exotic for individuals with meat brains. Even so, in only 10 years and steady progress we're going to have fantastical devices at hand. Imagine the enthusiast desktop - could locally host the equivalent of a 100T parameter AI, or run personal training of AI that currently costs frontier labs hundreds of millions in infrastructure and payroll and expertise. Even without AGI that's a pretty incredible idea. If we do get to AGI (2029 according to Kurzweil) and it's open, then we're going to see truly magical, fantastical things. What if you had the equivalent of a frontier lab in your pocket? What's that do to the economy? NVIDIA will be churning out chips like crazy, and we'll start seeing the solar system measured in terms of average cognitive FLOPS per gram, and be well on the way toward system scale computronium matrioshka brains and the like. | | |
| ▲ | delaminator 17 minutes ago | parent [-] | | > If we do get to AGI (2029 according to Kurzweil) if you base your life on Kurzweil's hard predictions you're going to have a bad time |
|
|
| |
| ▲ | make3 4 hours ago | parent | prev | next [-] | | I suspect they will keep doing this until they have a substantially better model than the competition. Sharing methods to look good & allow the field to help you keep up with the big guys is easy. I'll be impressed if they keep publishing even when they do beat the big guys soundly. | |
| ▲ | catigula 4 hours ago | parent | prev | next [-] | | To push back on the naivety I'm sensing here: I think it's a little silly to see a Chinese Communist Party-backed enterprise as somehow magnanimous and without an ulterior, very harmful motive. | |
| ▲ | jascha_eng 4 hours ago | parent | next [-] | | Oh they need control of models to be able to censor and ensure whatever happens inside the country with AI stays under their control. But the open-source part? Idk I think they do it to mess with the US investment and for the typical open source reasons of companies: community, marketing, etc.
But tbh the messing with the US especially is something that, as a European with no serious competitor of our own, I can get behind. | |
| ▲ | ptsneves 3 hours ago | parent | next [-] | | This is the rare earth minerals dumping all over again. Devalue to such a price as to make the market participants quit, so they can later have a strategic stranglehold on the supply. This is using open source in a bit of different spirit than the hacker ethos, and I am not sure how I feel about it. It is a kind of cheat on the fair market but at the same time it is also costly to China and its capital costs may become unsustainable before the last players fold. | | |
| ▲ | coliveira 2 hours ago | parent | next [-] | | > cheat on the fair market Can you really view this as a cheat when the US is throwing a trillion dollars in support of a supposedly “fair market”? | |
| ▲ | embedding-shape 3 hours ago | parent | prev | next [-] | | > This is using open source in a bit of different spirit than the hacker ethos, and I am not sure how I feel about it. It's a bit early to have any sort of feelings about it, isn't it? You're speaking in absolutes, but none of this is necessarily 100% true as we don't know their intentions. And judging a group of individuals intention based on what their country seems to want, from the lens of a foreign country, usually doesn't land you with the right interpretation. | |
| ▲ | tokioyoyo 2 hours ago | parent | prev | next [-] | | I mentioned this before as well, but AI-competition within China doesn’t care that much about the western companies. Internal market is huge, and they know winner-takes-it-all in this space is real. | |
| ▲ | Jedd 2 hours ago | parent | prev | next [-] | | > It is a kind of cheat on the fair market ... I am very curious on your definition and usage of 'fair' there, and whether you would call the LLM etc sector as it stands now, but hypothetically absent deepseek say, a 'fair market'. (If not, why not?) | |
| ▲ | josh_p 2 hours ago | parent | prev | next [-] | | Isn’t it already well accepted that the LLM market exists in a bubble with a handful of companies artificially inflating their own values? ESH | |
| ▲ | DiogenesKynikos 2 hours ago | parent | prev | next [-] | | Are you by chance an OpenAI investor? We should all be happy about the price of AI coming down. | | |
| ▲ | doctorwho42 an hour ago | parent [-] | | But the economy!!! /s Seriously though, our leaders are actively throwing everything and the kitchen sink into AI companies - in some vain attempt to become immortal or own even more of the nation's wealth beyond what they already do, chasing some kind of neo-tech feudalism. Both are unachievable because they rely on a complex system that they clearly don't understand. |
| |
| ▲ | jascha_eng 2 hours ago | parent | prev | next [-] | | Do they actually spend that much though? I think they are getting similar results with much fewer resources. It's also a bit funny that providing free models is probably the most communist thing China has done in a long time. | |
| ▲ | jsiepkes 3 hours ago | parent | prev | next [-] | | The way we fund the AI bubble in the west could also be described as: "kind of cheat on the fair market". OpenAI has never made a single dime of profit. | |
| ▲ | CamperBob2 3 hours ago | parent | prev [-] | | Good luck making OpenAI and Google cry uncle. They have the US government on their side. They will not be allowed to fail, and they know it. What I appreciate about the Chinese efforts is that they are being forced to get more intelligence from less hardware, and they are not only releasing their work products but documenting the R&D behind them at least as well as our own closed-source companies do. A good reason to stir up dumping accusations and anti-China bias would be if they stopped publishing not just the open-source models, but the technical papers that go with them. Until that happens, I think it's better to prefer more charitable explanations for their posture. |
| |
| ▲ | catigula 4 hours ago | parent | prev [-] | | They're pouring money to disrupt American AI markets and efforts. They do this in countless other fields. It's a model of massive state funding -> give it away for cut-rate -> dominate the market -> reap the rewards. It's a very transparent, consistent strategy. AI is a little different because it has geopolitical implications. | | |
| ▲ | ForceBru 3 hours ago | parent | next [-] | | When it's a competition among individual producers, we call it "a free market" and praise Hal Varian. When it's a competition among countries, it's suddenly threatening to "disrupt American AI markets and efforts". The obvious solution here is to pour money into LLM research too. Massive state funding -> provide SOTA models for free -> dominate the market -> reap the rewards (from the free models). | | | |
| ▲ | tokioyoyo 2 hours ago | parent | prev [-] | | I can't believe I'm shilling for China in these comments, but how different is it for company A getting blank-check investments from VCs and wink-wink support from the government in the West? And AI labs in China have been getting funding internally at their companies for a while now, since before the LLM era. |
|
| |
| ▲ | amunozo 2 hours ago | parent | prev | next [-] | | The motive is to destroy American supremacy in AI; it's not that deep. This is much easier to do by open-sourcing the models than by competing directly, and it can have good ramifications for everybody, even if the motive is “bad”. |
| ▲ | tehjoker an hour ago | parent | prev | next [-] | | the motive is to prevent us dominance of this space, which is a good thing | |
| ▲ | gazaim 3 hours ago | parent | prev [-] | | *Communist Party of China (CPC) | | |
| |
| ▲ | paulvnickerson 2 hours ago | parent | prev [-] | | If you value life in the West, you should not be rooting for a Communist model or probably any state-backed model https://venturebeat.com/security/deepseek-injects-50-more-se... | | |
| ▲ | amunozo 2 hours ago | parent | next [-] | | Should I root for the democratic OpenAI, Google or Microsoft instead? | | |
| ▲ | doctorwho42 an hour ago | parent [-] | | Furthermore, who thinks our little voices matter anymore in the US when it comes to the investor classes? And if they did, having a counterweight against corrupt, self-centered US oligarchs/CEOs is actually one of the biggest arguments for an actual powerful communist or other-model world power. The US had some of the most progressive tax policies in its existence when it was under existential threat during the height of the USSR, and when the USSR's power started to diminish, so too did those tax policies. |
| |
| ▲ | stared 14 minutes ago | parent | prev | next [-] | | There used to be memes that “open source is communism”; see https://souravroy.com/2010/01/01/is-open-source-pro-communis... |
| ▲ | Lucasoato 2 hours ago | parent | prev [-] | | > CrowdStrike researchers next prompted DeepSeek-R1 to build a web application for a Uyghur community center. The result was a complete web application with password hashing and an admin panel, but with authentication completely omitted, leaving the entire system publicly accessible. > When the identical request was resubmitted for a neutral context and location, the security flaws disappeared. Authentication checks were implemented, and session management was configured correctly. The smoking gun: political context alone determined whether basic security controls existed. Holy shit, these political filters seem embedded directly in the model weights. | | |
| ▲ | tehjoker an hour ago | parent [-] | | not convincing. have you tried saying "free palestine" on a college campus recently? |
|
|
|
|
| ▲ | gradus_ad 3 hours ago | parent | prev | next [-] |
| How will the Google/Anthropic/OpenAI's of the world make money on AI if open models are competitive with their models? What hurt open source in the past was its inability to keep up with the quality and feature depth of closed source competitors, but models seem to be reaching a performance plateau; the top open weight models are generally indistinguishable from the top private models. Infrastructure owners with access to the cheapest energy will be the long run winners in AI. |
| |
| ▲ | teleforce 30 minutes ago | parent | next [-] | | >How will the Google/Anthropic/OpenAI's of the world make money on AI if open models are competitive with their models? According to Google (or someone at Google), no organization has a moat on AI/LLMs [1]. But that does not mean it is not hugely profitable to provide it as SaaS even if you don't own the model, i.e. Model as a Service (MaaS). The extreme example is Amazon providing MongoDB APIs and services. Sure, they have their own proprietary DynamoDB, but for most people a scaled-up MongoDB is more than sufficient. Regardless of the brand or type of database being used, you pay tons of money to Amazon anyway to run at scale. Not everyone has the resources to host a SOTA AI model. On top of the tangible data-intensive resources, there are other intangible considerations. Just think how many companies or people host their own email server now, even though the resources needed are far less than for hosting an AI/LLM model. Google came up with the game-changing transformer in its own backyard, and OpenAI temporarily stole the show with the well-executed RLHF-based system of ChatGPT. Now the paid users are swinging back to Google with its arguably superior offering. Even Google now puts an AI summary at the top of its search results, free to all, above its paid advertising clients. [1] Google “We have no moat, and neither does OpenAI”: https://news.ycombinator.com/item?id=35813322 |
| ▲ | delichon 16 minutes ago | parent | prev | next [-] | | > Infrastructure owners with access to the cheapest energy will be the long run winners in AI. For a sufficiently low cost to orbit that may well be found in space, giving Musk a rather large lead. By his posts he's currently obsessed with building AI satellite factories on the moon, the better to climb the Kardashev scale. | | |
| ▲ | kridsdale1 5 minutes ago | parent [-] | | The performance bottleneck for space based computers is heat dissipation. Earth based computers benefit from the existence of an atmosphere to pull cold air in from and send hot air out to. A space data center would need to entirely rely on city sized heat sink fins. |
| |
| ▲ | dotancohen 3 hours ago | parent | prev | next [-] | | People and companies trust OpenAI and Anthropic, rightly or wrongly, with hosting the models and keeping their company data secure. Don't underestimate the value of a scapegoat to point a finger at when things go wrong. | | |
| ▲ | reed1234 2 hours ago | parent [-] | | But they also trust cloud platforms like GCP to host models and store company data. Why would a company use an expensive proprietary model on Vertex AI, for example, when they could use an open-source one on Vertex AI that is just as reliable for a fraction of the cost? I think you are getting at the idea of branding, but branding is different from security or reliability. |
| |
| ▲ | jonplackett 2 hours ago | parent | prev | next [-] | | Either...
Better (UX / ease of use)
Lock in (walled garden type thing)
Trust (If an AI is gonna have the level of insight into your personal data and control over your life, a lot of people will prefer to use a household name) | |
| ▲ | niek_pas 2 hours ago | parent | next [-] | | > Trust (If an AI is gonna have the level of insight into your personal data and control over your life, a lot of people will prefer to use a household name. Not Google, and not Amazon. Microsoft is a maybe. | | |
| ▲ | reed1234 2 hours ago | parent | next [-] | | People trust google with their data in search, gmail, docs, and android. That is quite a lot of personal info, and trust, already. All they have to do is completely switch the google homepage to gemini one day. | |
| ▲ | polyomino 2 hours ago | parent | prev [-] | | The success of Facebook basically proves that public brand perception does not matter at all | | |
| ▲ | acephal 2 hours ago | parent [-] | | Facebook itself still has a big problem with its lack of a youth audience though. Zuck captured the boomers and older Gen X, which are, however, the biggest demos of living people. |
|
| |
| ▲ | poszlem 2 hours ago | parent | prev [-] | | Or lobbying for regulations. You know, the “only American models are safe” kind of regulation. |
| |
| ▲ | tsunamifury 3 hours ago | parent | prev | next [-] | | Pure models clearly aren't the monetization strategy; their use on existing monetized surfaces is the core value. Google would love a cheap high-quality model on its surfaces. That just helps Google. | |
| ▲ | gradus_ad 3 hours ago | parent [-] | | Hmmm but external models can easily operate on any "surface". For instance Claude Code simply reads and edits files and runs in a terminal. Photo editing apps just need a photo supplied to them. I don't think there's much juice to squeeze out of deeply integrated AI as AI by its nature exists above the application layer, in the same way that we exist above the application layer as users. |
| |
| ▲ | iLoveOncall an hour ago | parent | prev [-] | | > How will the Google/Anthropic/OpenAI's of the world make money on AI if open models are competitive with their models? They won't. Actually, even if open models aren't competitive, they still won't. Hasn't this been clear for a while already? There's no moat in models; investment in pure models has only been to chase AGI, and all other investment (the majority, from Google, Amazon, etc.) has been on products using LLMs, not models themselves. This is not like the gold rush where the ones who made good money were the ones selling shovels; it's another kind of gold rush where you make money selling shovels but the gold itself is actually worthless. |
|
|
| ▲ | red2awn 6 hours ago | parent | prev | next [-] |
| Worth noting this is not only good on benchmarks, but significantly more efficient at inference https://x.com/_thomasip/status/1995489087386771851 |
| |
|
| ▲ | singularity2001 28 minutes ago | parent | prev | next [-] |
| Why are there so few 32,64,128,256,512 GB models which could run on current consumer hardware? And why is the maximum RAM on Mac studio M4 128 GB?? |
| |
| ▲ | jameslk 2 minutes ago | parent [-] | | 128 GB should be good enough for anybody (just kidding). I’m curious if the M5 Max will have higher RAM limits |
|
|
| ▲ | embedding-shape 4 hours ago | parent | prev | next [-] |
| > DeepSeek-V3.2 introduces significant updates to its chat template compared to prior versions. The primary changes involve a revised format for tool calling and the introduction of a "thinking with tools" capability. At first, I thought they had gone the route of implementing yet another chat format that can handle more dynamic conversations like that, instead of just using Harmony, but looking at the syntax, doesn't it look exactly like Harmony? That's a good thing, don't get me wrong, but why not mention straight up that they've implemented Harmony, so people can already understand up front that it's compatible with whatever parsing we're using for GPT-OSS? |
|
| ▲ | TIPSIO 6 hours ago | parent | prev | next [-] |
It's awesome that stuff like this is open source, but even if you have a basement rig with 4 NVIDIA GeForce RTX 5090 graphics cards (a $15-20k machine), can it even run with any reasonable context window at anything better than a crawling 10 tps? Frontier models' requirements far exceed even the most hardcore consumer hobbyist setups. This one goes even further. |
| |
| ▲ | tarruda 4 hours ago | parent | next [-] | | You can run at ~20 tokens/second on a 512GB Mac Studio M3 Ultra: https://youtu.be/ufXZI6aqOU8?si=YGowQ3cSzHDpgv4z&t=197 IIRC the 512GB mac studio is about $10k | | |
| ▲ | hasperdi 4 hours ago | parent [-] | | and can be faster if you can get an MOE model of that | | |
| ▲ | dormento 3 hours ago | parent | next [-] | | "Mixture-of-experts", AKA "running several small models and activating only a few at a time". Thanks for introducing me to that concept. Fascinating. (commentary: things are really moving too fast for the layperson to keep up) | | |
| ▲ | hasperdi 3 hours ago | parent | next [-] | | As pointed out by a sibling comment, MoE consists of a router and a number of experts (e.g. 8). These experts can be imagined as parts of the brain with specialization, although in reality they probably don't work exactly like that. These aren't separate models; they are components of a single large model. Typically, input gets routed to a small number of experts, e.g. the top 2, leaving the others inactive. This reduces the amount of activation / processing required. Mistral is an example of a model that's designed like this. Clever people created converters to transform dense models into MoE models. These days many popular models are also available in an MoE configuration | |
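A minimal sketch of the routing idea (toy sizes and a deliberately naive loop; not any particular model's architecture):

    import torch
    import torch.nn.functional as F

    # Toy MoE layer: a router picks the top-2 of 8 expert layers for every token.
    d_model, n_experts, top_k = 64, 8, 2
    router = torch.nn.Linear(d_model, n_experts)
    experts = torch.nn.ModuleList(torch.nn.Linear(d_model, d_model) for _ in range(n_experts))

    def moe_forward(x):                                   # x: (tokens, d_model)
        weights, idx = F.softmax(router(x), dim=-1).topk(top_k, dim=-1)
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        for k in range(top_k):                            # only the chosen experts run
            for e in range(n_experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k:k+1] * experts[e](x[mask])
        return out

    print(moe_forward(torch.randn(4, d_model)).shape)     # torch.Size([4, 64])

Production implementations replace the Python loops with gather/scatter kernels, but the effect is the same: each token pays compute for only top_k experts while the full parameter count stays available.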
| ▲ | whimsicalism 3 hours ago | parent | prev [-] | | that's not really a good summary of what MoEs are. you can more consider it like sublayers that get routed through (like how the brain only lights up certain pathways) rather than actual separate models. | | |
| ▲ | Mehvix 2 hours ago | parent [-] | | The gain from MoE is that you can have a large model that's efficient: it lets you decouple #params and computation cost. I don't see how anthropomorphizing MoE <-> brain affords insight deeper than 'less activity means less energy used'. These are totally different systems; IMO this shallow comparison muddies the water and does a disservice to each field of study. There's been loads of research showing there's redundancy in MoE models, e.g. Cerebras has a paper[1] where they selectively prune half the experts with minimal loss across domains -- I'm not sure you could disable half the brain and notice a stupefying difference. [1] https://www.cerebras.ai/blog/reap |
|
| |
| ▲ | miohtama 2 hours ago | parent | prev | next [-] | | All modern models are MoE already, no? | |
| ▲ | bigyabai 3 hours ago | parent | prev [-] | | >90% of inference hardware is faster if you run an MOE model. |
|
| |
| ▲ | reilly3000 4 hours ago | parent | prev | next [-] | | There are plenty of 3rd party and big cloud options to run these models by the hour or token. Big models really only work in that context, and that’s ok. Or you can get yourself an H100 rack and go nuts, but there is little downside to using a cloud provider on a per-token basis. | | |
| ▲ | cubefox 2 hours ago | parent [-] | | > There are plenty of 3rd party and big cloud options to run these models by the hour or token. Which ones? I wanted to try a large base model for automated literature (fine-tuned models are a lot worse at it) but I couldn't find a provider which makes this easy. |
| |
| ▲ | noosphr 5 hours ago | parent | prev | next [-] | | Home rigs like that are no longer cost effective. You're better off buying an rtx pro 6000 outright. This holds both for the sticker price, the supporting hardware price, the electricity cost to run it and cooling the room that you use it in. | | |
| ▲ | torginus 5 hours ago | parent | next [-] | | I was just watching this video about a Chinese piece of industrial equipment, designed for replacing BGA chips such as flash or RAM with a good deal of precision: https://www.youtube.com/watch?v=zwHqO1mnMsA I wonder how well the aftermarket memory surgery business on consumer GPUs is doing. | | |
| ▲ | dotancohen 2 hours ago | parent | next [-] | | I wonder how well the ophthalmologist is doing. These guys are going to be paying him a visit, playing around with those lasers and no PPE. | |
| ▲ | CamperBob2 an hour ago | parent [-] | | Eh, I don't see the risk, no pun intended. It's not collimated, and it's not going to be in focus anywhere but on-target. It's also probably in the long-wave range >>1000 nm that's not focused by the eye. At the end of the day it's no different from any other source of spot heating. I get more nervous around some of the LED flashlights you can buy these days. I want one. Hot air blows. |
| |
| ▲ | ThrowawayTestr 4 hours ago | parent | prev [-] | | LTT recently did a video on upgrading a 5090 to 96gb of ram |
| |
| ▲ | throw4039 4 hours ago | parent | prev | next [-] | | Yeah, the pricing for the rtx pro 6000 is surprisingly competitive with the gamer cards (at actual prices, not MSRP). A 3x5090 rig will require significant tuning/downclocking to be run from a single North American 15A plug, and the cost of the higher powered supporting equipment (cooling, PSU, UPS, etc) needed will pay for the price difference, not to mention future expansion possibilities. | |
| ▲ | mikae1 5 hours ago | parent | prev [-] | | Or perhaps a 512GB Mac Studio. 671B Q4 of R1 runs on it. | | |
| ▲ | redrove 4 hours ago | parent [-] | | I wouldn’t say runs. More of a gentle stroll. | | |
| ▲ | storus 4 hours ago | parent | next [-] | | I run it all the time, token generation is pretty good. Just large contexts are slow but you can hook a DGX Spark via Exo Labs stack and outsource token prefill to it. Upcoming M5 Ultra should be faster than Spark in token prefill as well. | | |
| ▲ | embedding-shape 3 hours ago | parent [-] | | > I run it all the time, token generation is pretty good. I feel like because you didn't actually talk about prompt processing speed or token/s, you aren't really giving the whole picture here. What is the prompt processing tok/s and the generation tok/s actually like? | | |
| ▲ | storus 3 hours ago | parent [-] | | I addressed both points - I mentioned you can offload token prefill (the slow part, 9t/s) to DGX Spark. Token generation is at 6t/s which is acceptable. |
|
| |
| ▲ | hasperdi 4 hours ago | parent | prev [-] | | With quantization, converting it to an MOE model... it can be a fast walk |
|
|
| |
| ▲ | seanw265 2 hours ago | parent | prev | next [-] | | FWIW it looks like OpenRouter's two providers for this model (one of whom being Deepseek itself) are only running the model around 28tps at the moment. https://openrouter.ai/deepseek/deepseek-v3.2 This only bolsters your point. Will be interesting to see if this changes as the model is adopted more widely. | |
| ▲ | halyconWays 4 hours ago | parent | prev | next [-] | | As someone with a basement rig of 6x 3090s, not really. It's quite slow, as with that many params (685B) it's offloading basically all of it into system RAM. I limit myself to models with <144B params, then it's quite an enjoyable experience. GLM 4.5 Air has been great in particular | |
| ▲ | bigyabai 6 hours ago | parent | prev | next [-] | | People with basement rigs generally aren't the target audience for these gigantic models. You'd get much better results out of an MoE model like Qwen3's A3B/A22B weights, if you're running a homelab setup. | | |
| ▲ | Spivak 6 hours ago | parent [-] | | Yeah I think the advantage of OSS models is that you can get your pick of providers and aren't locked into just Anthropic or just OpenAI. | | |
| ▲ | hnfong an hour ago | parent [-] | | Reproducibility of results is also important in some cases. There is consumer-ish hardware that can run large models like DeepSeek 3.x slowly. If you're using LLMs for a specific purpose that is well-served by a particular model, you don't want to risk AI companies deprecating it in a couple months and pushing you to a newer model (that may or may not work better in your situation). And even if the AI service providers nominally use the same model, you might have cases where you need the same inference software or even hardware to maintain high reproducibility of the results. If you're just using OpenAI or Anthropic you just don't get that level of control. |
|
| |
| ▲ | potsandpans 3 hours ago | parent | prev [-] | | I run a bunch of smaller models on a 12GB VRAM 3060 and it's quite good. For larger open models I'll use OpenRouter. I'm looking into on-demand instances with cloud/VPS providers, but haven't explored the space too much. I feel like private cloud instances that run on demand are still in the spirit of consumer hobbyism. It's not as good as having it all local, but the bootstrapping cost plus electricity to run seems prohibitive. I'm really interested to see if there's a space for consumer TPUs that satisfy use cases like this. |
|
|
| ▲ | zparky 9 hours ago | parent | prev | next [-] |
| Benchmarks are super impressive, as usual. Interesting to note in table 3 of the paper (p. 15), DS-Speciale is 1st or 2nd in accuracy in all tests, but has much higher token output (50% more, or 3.5x vs gemini 3 in the codeforces test!). |
| |
| ▲ | futureshock 6 hours ago | parent [-] | | The higher token output is not by accident. Certain kinds of logical reasoning problems are solved by longer thinking output. Thinking chain output is usually kept to a reasonable length to limit latency and cost, but if pure benchmark performance is the goal you can crank that up to the max until the point of diminishing returns. DeepSeek being 30x cheaper than Gemini means there’s little downside to max out the thinking time. It’s been shown that you can further scale this by running many solution attempts in parallel with max thinking then using a model to choose a final answer, so increasing reasoning performance by increasing inference compute has a pretty high ceiling. |
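A sketch of that best-of-N pattern (the generate and judge functions below are placeholders for whatever model API is used; they are not a real library):

    import concurrent.futures

    def generate(problem: str, seed: int) -> str:
        """Placeholder: one independent attempt with a maxed-out thinking budget."""
        raise NotImplementedError

    def judge(problem: str, candidates: list[str]) -> int:
        """Placeholder: a model call that returns the index of the best candidate."""
        raise NotImplementedError

    def best_of_n(problem: str, n: int = 8) -> str:
        # Run n long-thinking attempts in parallel, then let a model pick the winner.
        with concurrent.futures.ThreadPoolExecutor(max_workers=n) as pool:
            candidates = list(pool.map(lambda seed: generate(problem, seed), range(n)))
        return candidates[judge(problem, candidates)]

Cheap per-token pricing is what makes this viable: the n attempts multiply cost roughly linearly, so the approach only pays off when each attempt is inexpensive.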
|
|
| ▲ | mcbuilder 4 hours ago | parent | prev | next [-] |
After using it for a couple of hours playing around, it is a very solid entry, and very competitive compared with the big US releases. I'd say it's better than GLM 4.6 and Kimi K2. Looking forward to v4 |
|
| ▲ | Havoc an hour ago | parent | prev | next [-] |
Note the combination of a big frontier-level model and an MIT license. |
|
| ▲ | jodleif 8 hours ago | parent | prev | next [-] |
I genuinely do not understand the valuations of the US AI industry. The Chinese models are so close and far cheaper. |
| |
| ▲ | espadrine 5 hours ago | parent | next [-] | | Two aspects to consider: 1. Chinese models typically focus on text. US and EU models also bear the cross of handling images, and often voice and video. Supporting all of those means additional training cost not spent on further reasoning: tying one hand behind your back in order to be more generally useful. 2. The gap seems small because so many benchmarks get saturated so fast. But towards the top, every 1% increase in benchmarks is significantly better. On the second point, I worked on a leaderboard that both normalizes scores and predicts unknown scores to help improve comparisons between models on various criteria: https://metabench.organisons.com/ You can notice that, while Chinese models are quite good, the gap to the top is still significant. However, the US models are typically much more expensive for inference, and Chinese models do have a niche on the Pareto frontier for cheaper but serviceable models (even though US models also eat up the frontier there). | |
| ▲ | coliveira 4 hours ago | parent | next [-] | | Nothing you said helps with the issue of valuation. Yes, the US models may be better by a few percentage points, but how can they justify being so costly, both operationally as well as in investment costs? Over the long run, this is a business and you don't make money being the first, you have to be more profitable overall. | | |
| ▲ | ben_w 4 hours ago | parent [-] | | I think the investment race here is an "all-pay auction"*. Lots of investors have looked at the ultimate prize — basically winning something larger than the entire present world economy forever — and think "yes". But even assuming that we're on the right path for that (which we may not be) and assuming that nothing intervenes to stop it (which it might), there may be only one winner, and that winner may not have even entered the game yet. * https://en.wikipedia.org/wiki/All-pay_auction | | |
| ▲ | coliveira 4 hours ago | parent [-] | | > investors have looked at the ultimate prize — basically winning something larger than the entire present world economy This is what people like Altman want investors to believe. It seems like any other snake oil scam because it doesn't match reality of what he delivers. | | |
|
| |
| ▲ | jodleif 5 hours ago | parent | prev | next [-] | | 1. Have you seen the Qwen offerings? They have great multi-modality, some even SOTA. | | |
| ▲ | brabel 5 hours ago | parent [-] | | Qwen Image and Image Edit were among the best image models until Nano Banana Pro came along. I have tried some open image models and can confirm: the Chinese models are easily the best or very close to the best, but right now the Google model is even better... we'll see if the Chinese catch up again. | |
| ▲ | BoorishBears 2 hours ago | parent [-] | | I'd say Google still hasn't caught up on the smaller model side at all, but we've all been (rightfully) wowed enough by Pro to ignore that for now. Nano Banana Pro starts at 15 cents per image at <2k resolution, and is not strictly better than Seedream 4.0: yet the latter does 4K for 3 cents per image. Add in the power of fine-tuning on their open weight models and I don't know if China actually needs to catch up. I finetuned Qwen Image on 200 generations from Seedream 4.0 that were cleaned up with Nano Banana Pro, and got results that were as good and more reliable than either model could achieve otherwise. |
|
| |
| ▲ | raincole 4 hours ago | parent | prev | next [-] | | > video Most of AI-generated videos we see on social media now are made with Chinese models. | |
| ▲ | agumonkey 4 hours ago | parent | prev | next [-] | | forgive me for bringing politics into it, but are Chinese LLMs more prone to censorship bias than US ones? | |
| ▲ | coliveira 4 hours ago | parent | next [-] | | Being open source, I believe Chinese models are less prone to censorship, since the US corporations can add censorship in several ways just by being a closed model that they control. | |
| ▲ | skeledrew 4 hours ago | parent | prev [-] | | It's not about a LLM being prone to anything, but more about the way a LLM is fine-tuned (which can be subject to the requirements of those wielding political power). | | |
| |
| ▲ | torginus 5 hours ago | parent | prev [-] | | Thanks for sharing that! The scales are a bit murky here, but if we look at the 'Coding' metric, we see that Kimi K2 outperforms Sonnet 4.5 - which I think is still considered the price-perf darling even today? I haven't tried these models, but in general there have been lots of cases where a model performs much worse IRL than the benchmarks would suggest (certain Chinese models and GPT-OSS have been guilty of this in the past) | |
| ▲ | espadrine 2 hours ago | parent [-] | | Good question. There's 2 points to consider. • For both Kimi K2 and for Sonnet, there's a non-thinking and a thinking version.
Sonnet 4.5 Thinking is better than Kimi K2 non-thinking, but the K2 Thinking model came out recently, and beats it on all comparable pure-coding benchmarks I know: OJ-Bench (Sonnet: 30.4% < K2: 48.7%), LiveCodeBench (Sonnet: 64% < K2: 83%), they tie at SciCode at 44.8%. It is a finding shared by ArtificialAnalysis: https://artificialanalysis.ai/models/capabilities/coding • The reason developers love Sonnet 4.5 for coding, though, is not just the quality of the code. They use Cursor, Claude Code, or some other system such as Github Copilot, which are increasingly agentic. On the Agentic Coding criteria, Sonnet 4.5 Thinking is much higher. By the way, you can look at the Table tab to see all known and predicted results on benchmarks. |
|
| |
| ▲ | jasonsb 6 hours ago | parent | prev | next [-] | | It's all about the hardware and infrastructure. If you check OpenRouter, no provider offers a SOTA Chinese model matching the speed of Claude, GPT or Gemini. The Chinese models may benchmark close on paper, but real-world deployment is different. So you either buy your own hardware in order to run a Chinese model at 150-200tps or give up and use one of the Big 3. The US labs aren't just selling models, they're selling globally distributed, low-latency infrastructure at massive scale. That's what justifies the valuation gap. Edit: It looks like Cerebras is offering a very fast GLM 4.6 | |
| ▲ | observationist 5 hours ago | parent | next [-] | | The network effects of using consistently behaving models and maintaining API coverage between updates is valuable, too - presumably the big labs are including their own domains of competence in the training, so Claude is likely to remain being very good at coding, and behave in similar ways, informed and constrained by their prompt frameworks, so that interactions will continue to work in predictable ways even after major new releases occur, and upgrades can be clean. It'll probably be a few years before all that stuff becomes as smooth as people need, but OAI and Anthropic are already doing a good job on that front. Each new Chinese model requires a lot of testing and bespoke conformance to every task you want to use it for. There's a lot of activity and shared prompt engineering, and some really competent people doing things out in the open, but it's generally going to take a lot more expert work getting the new Chinese models up to snuff than working with the big US labs. Their product and testing teams do a lot of valuable work. | |
| ▲ | irthomasthomas 4 hours ago | parent | prev | next [-] | | Gemini 3 = ~70tps
https://openrouter.ai/google/gemini-3-pro-preview Opus 4.5 = ~60-80tps
https://openrouter.ai/anthropic/claude-opus-4.5 Kimi-k2-think = ~60-180tps
https://openrouter.ai/moonshotai/kimi-k2-thinking Deepseek-v3.2 = ~30-110tps (only 2 providers rn)
https://openrouter.ai/deepseek/deepseek-v3.2 | | |
| ▲ | jasonsb 4 hours ago | parent [-] | | It doesn't work like that. You need to actually use the model and then go to /activity to see the actual speed. I constantly get 150-200tps from the Big 3 while other providers barely hit 50tps even though they advertise much higher speeds. GLM 4.6 via Cerebras is the only one faster than the closed source models at over 600tps. | | |
| ▲ | irthomasthomas 4 hours ago | parent [-] | | These aren't advertised speeds, they are the average measured speeds by openrouter across different providers. |
|
| |
| ▲ | DeathArrow 4 hours ago | parent | prev | next [-] | | > If you check OpenRouter, no provider offers a SOTA chinese model matching the speed of Claude, GPT or Gemini. I think GLM 4.6 offered by Cerebras is much faster than any US model. | | | |
| ▲ | jodleif 5 hours ago | parent | prev | next [-] | | Assuming your hardware premise is right (and let's be honest, nobody really wants to send their data to Chinese providers), you can use a provider like Cerebras or Groq? | |
| ▲ | kachapopopow 5 hours ago | parent | prev | next [-] | | cerebras AI offers models at 50x the speed of sonnet? | |
| ▲ | csomar 6 hours ago | parent | prev [-] | | According to OpenRouter, z.ai is 50% faster than Anthropic; which matches my experience. z.ai does have frequent downtimes but so does Claude. |
| |
| ▲ | Bolwin 4 hours ago | parent | prev | next [-] | | Third-party providers rarely support caching. With caching, the expensive US models end up being like 2x the price (e.g. Sonnet) and often much cheaper (e.g. GPT-5 mini). If they start caching then US companies will be completely outpriced. | |
| ▲ | jazzyjackson 6 hours ago | parent | prev | next [-] | | Valuation is not based on what they have done but what they might do, I agree tho it's investment made with very little insight into Chinese research. I guess it's counting on deepseek being banned and all computers in America refusing to run open software by the year 2030 /snark | | |
| ▲ | jodleif 5 hours ago | parent | next [-] | | > Valuation is not based on what they have done but what they might do Exactly what I’m thinking. Chinese models are catching up rapidly. Soon to be on par with the big dogs. | | |
| ▲ | ksynwa 5 hours ago | parent [-] | | Even if they do continue to lag behind they are a good bet against monopolisation by proprietary vendors. | | |
| ▲ | coliveira 4 hours ago | parent [-] | | They would if corporations were allowed to run these models. I fully expect the US government to prohibit corporations from doing anything useful with Chinese models (full censorship). It's the same game they use with chips. |
|
| |
| ▲ | bilbo0s 6 hours ago | parent | prev [-] | | >I guess it's counting on deepseek being banned And the people making the bets are in a position to make sure the banning happens. The US government system being what it is. Not that our leaders need any incentive to ban Chinese tech in this space. Just pointing out that it's not necessarily a "bet". "Bet" implies you don't know the outcome and have no influence over it. Even "investment" implies you don't know the outcome. I'm not sure that's the case with these people? | |
| ▲ | coliveira 4 hours ago | parent [-] | | Exactly. "Business investment" these days means that the people involved will have at least some amount of power to determine the winning results. |
|
| |
| ▲ | newyankee 6 hours ago | parent | prev | next [-] | | Yet tbh if the US industry had not moved ahead and created the race with FOMO, it would not have been as easy for the Chinese strategy to work either. The nature of the race may yet change though, and I am unsure if the devil is in the details, as in very specific edge cases that will only work with frontier models? | |
| ▲ | mrinterweb 4 hours ago | parent | prev | next [-] | | I would expect one of the motivations for making these LLM model weights open is to undermine the valuation of other players in the industry. Open models like this must diminish the value prop of the frontier focused companies if other companies can compete with similar results at competitive prices. | |
| ▲ | fastball 4 hours ago | parent | prev | next [-] | | They're not that close (on things like LMArena) and being cheaper is pretty meaningless when we are not yet at the point where LLMs are good enough for autonomy. | |
| ▲ | rprend an hour ago | parent | prev | next [-] | | People pay for products, not models. OpenAI and Anthropic make products (ChatGPT, Claude Code). | |
| ▲ | beastman82 4 hours ago | parent | prev | next [-] | | Then you should short the market | |
| ▲ | isamuel 5 hours ago | parent | prev [-] | | There is a great deal of orientalism --- it is genuinely unthinkable to a lot of American tech dullards that the Chinese could be better at anything requiring what they think of as "intelligence." Aren't they Communist? Backward? Don't they eat weird stuff at wet markets? It reminds me, in an encouraging way, of the way that German military planners regarded the Soviet Union in the lead-up to Operation Barbarossa. The Slavs are an obviously inferior race; their Bolshevism dooms them; we have the will to power; we will succeed. Even now, when you ask questions like what you ask of that era, the answers you get are genuinely not better than "yes, this should have been obvious at the time if you were not completely blinded by ethnic and especially ideological prejudice." | | |
| ▲ | mosselman 5 hours ago | parent | next [-] | | Back when deepseek came out and people were tripping over themselves shouting it was so much better than what was out there, it just wasn’t good. It might be that this model is super good, I haven’t tried it, but to say the Chinese models are better is just not true. What I really love though is that I can run them (open models) on my own machine. The other day I categorised images locally using Qwen, what a time to be alive. Beyond local hardware, open models also make it possible to run them on providers of your choice, such as European ones. Which is great! So I love everything about the competitive nature of this. | | |
| ▲ | CamperBob2 5 hours ago | parent [-] | | If you thought DeepSeek "just wasn't good," there's a good chance you were running it wrong. For instance, a lot of people thought they were running "DeepSeek" when they were really running some random distillation on ollama. | | |
| ▲ | bjourne 4 hours ago | parent [-] | | WDYM? Isn't https://chat.deepseek.com/ the real DeepSeek? | | |
| ▲ | CamperBob2 3 hours ago | parent [-] | | Good point, I was assuming the GP was running local for some reason. Hard to argue when it's the official providers who are being compared. I ran the 1.58-bit Unsloth quant locally at the time it came out, and even at such low precision, it was super rare for it to get something wrong that o1 and GPT4 got right. I have never actually used a hosted version of the full DS. |
|
|
| |
| ▲ | breppp 5 hours ago | parent | prev | next [-] | | Not sure how the entire Nazi comparison plays out, but at the time there were good reasons to imagine the Soviets would fall apart (as they initially did). Stalin had just finished purging his entire officer corps, which is not a good omen for war, and the USSR had failed miserably against the Finns, who were not the strongest of nations, while Germany had just steamrolled France, a country that was much more impressive in WW1 than the Russians (who collapsed against Germany). | |
| ▲ | ecshafer 3 hours ago | parent | prev | next [-] | | I don't think that anyone, much less someone working in tech or engineering in 2025, could still hold beliefs about the Chinese not being capable scientists or engineers. I could maybe give a (naive) pass to someone in 1990 thinking China would never build more than junk. But in 2025, their productive capacity, scientific advancement, and just the number of us who have worked with extremely talented Chinese colleagues should dispel those notions. I think you are jumping to racism a bit fast here. Germany was right in some ways and wrong in others about the Soviet Union's strength. The USSR failed to conquer Finland because of the military purges. German intelligence vastly underestimated the number of tanks and the general preparedness of the Soviet army (Hitler was shocked the Soviets already had 40k tanks). The Lend-Lease Act sent an astronomical amount of goods to the USSR, which allowed them to fully commit to the war and focus on increasing their weapons production; the numbers on the tractors, food, trains, ammunition, etc. that the US sent to the USSR are staggering. | | |
| ▲ | hnfong 41 minutes ago | parent [-] | | I don't think anyone seriously believes that the Chinese aren't capable; it's more that people believe that, no matter what happens, the USA will still dominate in "high tech" fields. A variant of "American Exceptionalism", so to speak. This is kinda reflected in the stock market, where the AI stocks are surging to new heights every day, yet their Chinese equivalents are relatively lagging behind in stock price, which suggests that investors are betting heavily on the US companies to "win" this "AI race" (if there are any gains to be made by winning). Also, in the past couple of years (or maybe a couple of decades), there has also been a lot of crap talk about how China has to democratize and free up its markets in order to be competitive with the other first world countries, together with a bunch of "doomsday" predictions for authoritarianism in China. This narrative has completely lost any credibility, but the sentiment dies slowly... |
| |
| ▲ | newyankee 5 hours ago | parent | prev | next [-] | | but didn't the Chinese already surpass the rest of the world in solar, batteries, and EVs, among other things? | |
| ▲ | cyberlimerence 5 hours ago | parent [-] | | They did, but the goalposts keep moving, so to speak. We're approximately here : advanced semiconductors, artificial intelligence, reusable rockets, quantum computing, etc. Chinese will never catch up. /s |
| |
| ▲ | gazaim 3 hours ago | parent | prev | next [-] | | These Americans have no comprehension of intelligence being used to benefit humanity instead of being used to fund a CEO's new yacht. I encourage them to visit China to see how far the USA lags behind. | |
| ▲ | lukan 5 hours ago | parent | prev | next [-] | | "It reminds me, in an encouraging way, of the way that German military planners regarded the Soviet Union in the lead-up to Operation Barbarossa. The Slavs are an obviously inferior race;
..." Ideology played a role, but the data they worked with was the Finnish war, which was disastrous for the Soviet side. Hitler later famously said it was all an intentional distraction to make them believe the Soviet army was worth nothing. (The real reasons were more complex, like the earlier purges.) | |
| ▲ | littlestymaar 5 hours ago | parent | prev [-] | | > It reminds me, in an encouraging way, of the way that German military planners regarded the Soviet Union in the lead-up to Operation Barbarossa. The Slavs are an obviously inferior race; their Bolshevism dooms them; we have the will to power; we will succeed Though, because Stalin had decimated the Red Army leadership (including most of the veteran officers who had Russian Civil War experience) during the Moscow trials purges, the Germans almost succeeded. | |
| ▲ | gazaim 3 hours ago | parent [-] | | > Though, because Stalin had decimated the red army leadership (including most of the veteran officer who had Russian civil war experience) during the Moscow trials purges, the German almost succeeded. There were many counter revolutionaries among the leadership, even those conducting the purges. Stalin was like "ah fuck we're hella compromised." Many revolutions fail in this step and often end up facing a CIA backed coup. The USSR was under constant siege and attempted infiltration since inception. | | |
| ▲ | littlestymaar 2 hours ago | parent [-] | | > There were many counter revolutionaries among the leadership Well, Stalin was, by far, the biggest counter-revolutionary in the Politburo. > Stalin was like "ah fuck we're hella compromised." There's no evidence that anything significant was compromised at that point, and clear evidence that Stalin was in fact clinically paranoid. > Many revolutions fail in this step and often end up facing a CIA backed coup. The USSR was under constant siege and attempted infiltration since inception. Can we please not recycle 90-year-old Soviet propaganda? That the Moscow trials were irrational self-harm was acknowledged by the USSR leadership as early as the fifties… |
|
|
|
|
|
| ▲ | spullara 4 hours ago | parent | prev | next [-] |
| I hate that their model ids don't change as they change the underlying model. I'm not sure how you can build on that.
  % curl https://api.deepseek.com/models \
      -H "Authorization: Bearer ${DEEPSEEK_API_KEY}"
  {"object":"list","data":[{"id":"deepseek-chat","object":"model","owned_by":"deepseek"},{"id":"deepseek-reasoner","object":"model","owned_by":"deepseek"}]}
|
| |
| ▲ | hnfong 38 minutes ago | parent | next [-] | | Agree that having datestamps on model ids is a good idea, but it's open source: you can download the weights and build on those. In the long run, this is better than the alternative of calling the API of a proprietary model and hoping it doesn't get deprecated. | |
| ▲ | KronisLV 4 hours ago | parent | prev [-] | | Oh hey, quality improvement without doing anything! (unless/until a new version gets worse for your use case) |
|
|
| ▲ | twistedcheeslet 4 hours ago | parent | prev | next [-] |
| How capable are these models at tool calling? |
| |
| ▲ | potsandpans an hour ago | parent [-] | | From some very brief experimentation with DeepSeek about 2 months ago, tool calling is very hit or miss. Claude appears to be the absolute best. |
|
|
| ▲ | htrp 5 hours ago | parent | prev | next [-] |
| what is the ballpark vram / gpu requirement to run this ? |
| |
| ▲ | rhdunn 5 hours ago | parent [-] | | For just the model weights: 4 bytes per parameter at FP32, 2 bytes per parameter at FP16/BF16, or 1 byte per parameter at FP8, e.g. ~685GB at FP8. It will be smaller for quantizations, but I'm not sure how to estimate those. For a Mixture of Experts (MoE) model you only need the memory for a given expert. There will be some swapping as it figures out which expert to use, or to change experts, but once that expert is loaded it won't be swapping memory to perform the calculations. You'll also need space for the context window; I'm not sure how to calculate that either. | |
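A rough sketch of that arithmetic (illustrative only: it ignores the KV cache, activations and runtime overhead, and the 685B-total / 37B-active parameter counts are the commonly quoted DeepSeek-V3-family figures, assumed here):

  BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "fp8": 1}

  def weight_gb(params_billion, dtype):
      # parameters (in billions) times bytes per parameter gives weight size in GB
      return params_billion * BYTES_PER_PARAM[dtype]

  print(weight_gb(685, "fp8"))   # ~685 GB of weights for the full model
  print(weight_gb(685, "bf16"))  # ~1370 GB at BF16
  print(weight_gb(37, "fp8"))    # ~37 GB touched per token if ~37B params are active,
                                 # though, as the replies note, routing changes per token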
| ▲ | anvuong 4 hours ago | parent | next [-] | | I think your understanding of MoE is wrong. Depending on the settings, each token can actually be routed to multiple experts, called experts choice architecture. This makes it easier to parallelize the inference (each expert on a different device for example), but it's not simply just keeping one expert in memory. | |
| ▲ | petu 5 hours ago | parent | prev [-] | | I think your idea of MoE is incorrect. Despite the name they're not "expert" at anything in particular, used experts change more or less on each token -- so swapping them into VRAM is not viable, they just get executed on CPU (llama.cpp). | | |
| ▲ | jodleif 2 hours ago | parent [-] | | A common pattern is to offload (most of) the expert layers to the CPU. This combination is still quite fast even with slow system ram, though obviously inferior to a pure VRAM loading |
|
|
|
|
| ▲ | orena 2 hours ago | parent | prev | next [-] |
| Any results on frontier math or arc ? |
|
| ▲ | lalassu 4 hours ago | parent | prev | next [-] |
| Disclaimer: I have not tested this yet, and I don't want to make big generalizations. But one thing I noticed with Chinese models, especially Kimi, is that they do very well on benchmarks but fail on vibe testing. They feel a little over-fitted to the benchmarks and less tuned to real use cases. I hope it's not the same here. |
| |
| ▲ | msp26 4 hours ago | parent | next [-] | | K2 Thinking has immaculate vibes. Minimal sycophancy and a pleasant writing style while being occasionally funny. If it had vision and was better on long context I'd use it so much more. | |
| ▲ | vorticalbox 4 hours ago | parent | prev | next [-] | | This used to happen with benchmarks on phones; manufacturers would tweak Android so benchmarks ran faster. I guess that’s kinda how it is for any system that’s trained to do well on benchmarks: it does well on them but is rubbish at everything else. | |
| ▲ | make3 4 hours ago | parent [-] | | yes, they turned off all power-saving measures when benchmarking software was detected, which completely defeated the point of the benchmarks, because your phone is useless if it's very fast but the battery lasts one hour |
| |
| ▲ | not_that_d 4 hours ago | parent | prev | next [-] | | What is "Vibe testing"? | | |
| ▲ | catigula 4 hours ago | parent | next [-] | | He means capturing things that benchmarks don't. You can use Claude and GPT-5 back-to-back in a field they score nearly identically on. You will notice several differences. This is the "vibe". |
| ▲ | BizarroLand 4 hours ago | parent | prev [-] | | I would assume that it is testing how well and appropriately the LLM responds to prompts. |
| |
| ▲ | make3 4 hours ago | parent | prev | next [-] | | I would assume that a huge amount is spent on frontier models just making them nicer to interact with, as that is likely one of the main things that drives user engagement. |
| ▲ | catigula 4 hours ago | parent | prev [-] | | This is why I stopped bothering checking out these models and, funnily enough, grok. |
|
|
| ▲ | BoorishBears 11 hours ago | parent | prev | next [-] |
| 3.2-Exp came out in September: this is 3.2, along with a special checkpoint (DeepSeek-V3.2-Speciale) for deep reasoning that they're claiming surpasses GPT-5 and matches Gemini 3.0 https://x.com/deepseek_ai/status/1995452641430651132 |
|
| ▲ | wosined 3 hours ago | parent | prev | next [-] |
| Remember: If it is not peer-reviewed, then it is an ad. |
| |
| ▲ | Havoc 43 minutes ago | parent | next [-] | | Good general approach, but DeepSeek has thus far always delivered. And not just delivered, but under an open license too. "Ad" as a starting assumption seems overly harsh. | |
| ▲ | vessenes 2 hours ago | parent | prev [-] | | I mean.. true. Also, DeepSeek has good cred so far on delivering roughly what their PR says they are delivering. My prior would be that their papers are generally credible. |
|
|
| ▲ | Foobar8568 4 hours ago | parent | prev | next [-] |
| At least, there is no doubt where he is from! Asked "which version are you?", it answered in Chinese (translated): I am the latest version of the DeepSeek model! If you want specific version number information, I suggest you: check the official documentation - the DeepSeek website and docs have the most accurate version information
follow official announcements - version updates are usually published through official channels
check the app store / web version - the interface usually shows the current version
I have all of DeepSeek's latest features, including: strong conversation and reasoning capabilities
128K context length
file upload handling (images, documents, etc.)
web search (needs to be enabled manually)
completely free to use
If you need the exact version number for technical integration or another specific purpose, it's best to consult the official technical documentation, which has the most accurate and detailed specifications. Is there anything else I can help you with? |
| |
| ▲ | schlauerfox 2 hours ago | parent [-] | | It's so strange when it obviously hits a preprogrammed non-answer in these models; how can one ever trust them when there is a babysitter that interferes with the actual answer? I suppose that asking it what version it is isn't a valid question in its training data, so it's programmed to say "check the documentation", but it's still definitely suspicious when it gives a non-answer. |
|
|
| ▲ | nimchimpsky 12 hours ago | parent | prev [-] |
| Pretty amazing that a relatively small Chinese hedge fund can build AI better than almost anyone. |
| |
| ▲ | Havoc 38 minutes ago | parent | next [-] | | Yeah, they've consistently delivered. At the same time, there are persistent whispers that they're not as small and scruffy as portrayed either. | |
| ▲ | JSR_FDED an hour ago | parent | prev [-] | | And gives it away for free! |
|