| ▲ | ninjahawk1 7 hours ago |
| The way to develop in this space seems to be to give away free stuff, get your name out there, then make everything proprietary. I hope they still continue releasing open weights. The day no one releases open weights is a sad day for humanity. Normal people won’t own their own compute if that ever happens. |
|
| ▲ | culi 6 hours ago | parent | next [-] |
| I think that's an overgeneralization. We've seen all the American models be closed and proprietary from the start. Meanwhile the non-American ones (especially the Chinese ones) have been open since the start. In fact they often go the opposite direction: many Chinese models started off proprietary and were later opened up (like many of the larger Qwen models). |
| |
| ▲ | robot_jesus 6 hours ago | parent | next [-] | | > We've seen all the American models be closed and proprietary from the start. What about Gemma and Llama and gpt-oss, not to mention lots of smaller/specialized models from Nvidia and others? I would never argue that China isn't ahead in the open weights game, of course, but it's not like it's "all" American models by any stretch. | | |
| ▲ | walthamstow 6 hours ago | parent [-] | | gpt-oss is good but I haven't heard anything about an update. It seems like one and done, to shut up people complaining about non-Open AI |
| |
| ▲ | embedding-shape 6 hours ago | parent | prev [-] | | > We've seen all the American models be closed and proprietary from the start. Most*. OpenAI, contrary to popular belief, actually used to believe in open research and (more or less) open models. GPT1 and GPT2 both were model+code releases (although GPT2 was a "staged" release), GPT3 ended up API-only. | | |
| ▲ | zozbot234 6 hours ago | parent | next [-] | | OpenAI has released their GPT-OSS series more recently. | | |
| ▲ | magicalhippo 3 hours ago | parent | next [-] | | Recently? More like 20 years ago in LLM-years. It's a good model though; a refresh would be nice. | |
| ▲ | 6 hours ago | parent | prev [-] | | [deleted] |
| |
| ▲ | culi 6 hours ago | parent | prev [-] | | That's fair, but those days seem long gone now. Also, the Chinese models aren't following the typical American SaaS playbook, which relies on free/cheap proprietary software for early growth. They're not just publishing their weights but also their code, and often even publishing papers in open-access journals to explicitly highlight the methods and advancements behind their results. | | |
|
|
|
| ▲ | visarga 7 hours ago | parent | prev | next [-] |
| I think it is in the interest of chip makers to make sure we all get local models. |
| |
| ▲ | qalmakka 6 hours ago | parent | next [-] | | I think they're in a win-win situation. Big AI companies would love to see local computing die in favour of the cloud, because they're well aware that the moment an open model appears that can run on non-ludicrous consumer hardware, they're screwed. In that situation Nvidia, AMD and the like would be the only ones profiting from it - even though I'm not convinced they'd prefer going back to fighting for B2C while B2B is so much simpler for them | | |
| ▲ | zozbot234 6 hours ago | parent | next [-] | | If you want to run AI models at scale and with reasonably quick response, there aren't many alternatives to datacenter hardware. Consumer hardware is great for repurposing existing "free" compute (including gaming PCs, pro workstations etc. at the higher end) and for basic insurance against rug pulls from the big AI vendors, but increased scale will probably still bring very real benefits. | | |
| ▲ | qalmakka 6 hours ago | parent [-] | | Currently, yes. But I don't find it hard to imagine that before long we could get reasonably light open models with a level of reasoning similar to current Opus, for instance. In such a scenario, how many people would opt to pay for a far more expensive cloud subscription? Especially since lots of people already aren't that interested in paying for frontier models nowadays, even where it makes sense. Unless we keep getting a constant, never-ending stream of improvements, we're basically bound to reach a point where, unless you really need more, you're OK with the basic, cheaper local alternative you don't have to pay for monthly. | | |
| ▲ | zozbot234 6 hours ago | parent | next [-] | | I think average users are already okay with the reasoning level they'd get with current open models. But the big AI firms have pivoted their frontier models towards the enterprise: coding and research, as opposed to general chat. And scale is quite important for these uses, ordinary pro hardware is not enough. | |
| ▲ | twoodfin 6 hours ago | parent | prev [-] | | This is really just a question of product design meeting the technology. Today, lots of integer compute happens on local devices for some purposes, and in the cloud for others. Same is already true for matmul, lots of FLOPS being spent locally on photo and video processing, speech to text, … No obvious reason you wouldn’t want to specialize LLM tasks similarly, especially as long-running agents increasingly take over from chatbots as the dominant interaction architecture. |
|
| |
| ▲ | BobbyJo 6 hours ago | parent | prev [-] | | At a consistent amount of usage, datacenters are at least an order of magnitude more hardware efficient. I'm sure Nvidia and AMD would be fine fighting for B2C if it meant volume would be 10+x. Now, given they can't satisfy current volume, they are forced to settle for just having crazy margins. | | |
| ▲ | qalmakka 6 hours ago | parent [-] | | The problem with B2C is that you need to have leverage of some kind (more demanding applications, planned obsolescence, ...) in order to get people to keep on buying your product. The average consumer may simply consider themselves satisfied with their old product they already own and only replace it when it breaks down. On the contrary, with the cloud you can keep people hooked on getting the latest product whether they need it or not, and get artificial demand from datacentres and such. | | |
| ▲ | BobbyJo 4 hours ago | parent [-] | | I think businesses running datacenters are much less likely to frivolously buy the latest GPUs with no functional incentive than general consumers are... |
|
|
| |
| ▲ | zozbot234 7 hours ago | parent | prev | next [-] | | Definitely. Many big hardware firms are directly supporting HuggingFace for this very reason. | |
| ▲ | ninjahawk1 7 hours ago | parent | prev [-] | | True, chip companies have the opposite mindset - Nvidia is releasing their own open-weight models, I believe |
|
|
| ▲ | elorant 6 hours ago | parent | prev | next [-] |
| This is obviously a strategic move at a national level: keep publishing competing free models to erode the moat Western companies could have with their proprietary models. As long as the narrative serves China, there will be no turn to proprietary models. |
|
| ▲ | baq 7 hours ago | parent | prev | next [-] |
| Always has been; it's literally SaaS. The slight difference is that the lowest-tier subscriptions at the frontier labs are basically free trials nowadays, too |
|
| ▲ | Zavora 6 hours ago | parent | prev | next [-] |
| It's the new freeware model! |
|
| ▲ | CamperBob2 6 hours ago | parent | prev | next [-] |
| I'm a little more optimistic than that. I suspect that the open-weight models we already have are going to be enough to support incremental development of new ones, using reasonably-accessible levels of compute. The idea that every new foundation model needs to be pretrained from scratch, using warehouses of GPUs to crunch the same 50 terabytes of data from the same original dumps of Common Crawl and various Russian pirate sites, is hard to justify on an intuitive basis. I think the hard work has already been done. We just don't know how to leverage it properly yet. |
| |
| ▲ | thesz 6 hours ago | parent | next [-] | | Change layer size and you have to retrain. Change number of layers and you have to retrain. Change tokenization and you have to retrain. | | |
| ▲ | dTal 5 hours ago | parent | next [-] | | None of that is true, at least in theory. You can trivially change layer size simply by adding extra columns initialized as 0, effectively embedding your smaller network in a larger network. You can add layers in a similar way, and in fact LLMs are surprisingly robust to having layers added and removed - you can sometimes actually improve performance simply by duplicating some middle layers[0]. Tokenization is probably the hardest but all the layers between the first and last just encode embeddings; it's probably not impossible to retrain those while preserving the middle parts. [0]
https://news.ycombinator.com/item?id=47431671
https://news.ycombinator.com/item?id=47322887 | | |
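The zero-padding trick dTal describes can be sketched in a few lines of numpy (hypothetical toy shapes; a real transformer would also need attention heads, embeddings and layer norms widened consistently):

```python
import numpy as np

def widen_layer(W, b, new_in, new_out):
    """Zero-pad a dense layer's weights (d_out x d_in) and bias to a larger
    shape, embedding the smaller network in the larger one unchanged."""
    d_out, d_in = W.shape
    W2 = np.zeros((new_out, new_in))
    W2[:d_out, :d_in] = W          # original weights sit in the top-left block
    b2 = np.zeros(new_out)
    b2[:d_out] = b
    return W2, b2

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))
b = rng.normal(size=3)
x = rng.normal(size=4)

W2, b2 = widen_layer(W, b, new_in=6, new_out=5)
x2 = np.concatenate([x, np.zeros(2)])   # pad the input the same way

# The first 3 outputs of the widened layer match the original layer exactly,
# so training can continue from the embedded weights instead of from scratch.
assert np.allclose((W2 @ x2 + b2)[:3], W @ x + b)
```

The new rows and columns start at zero and only pick up gradient signal during continued training, which is the "embedding your smaller network in a larger network" idea.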
| ▲ | thesz 3 hours ago | parent | next [-] | | You took the easy path, embedding smaller into larger. What if you need to reduce the number of layers and/or the width of hidden layers? How will you embed larger into smaller? As for the "addition of same layers" - wouldn't selecting which layers to add itself count as training? What if you still have to obtain the best result possible for a given coefficient/tokenization budget? I think my comment expresses the general case, while yours provides some exceptions. | |
| ▲ | andriy_koval 3 hours ago | parent | prev [-] | | There is evidence it's useful in some cases, but obviously no evidence it's enough if you're chasing SOTA. |
| |
| ▲ | altruios 5 hours ago | parent | prev | next [-] | | Hopefully we will find a way to make it so that minor changes don't require a full retrain. Training how to train, as a concept, comes to mind. | |
| ▲ | CamperBob2 5 hours ago | parent | prev [-] | | And yet the KL divergence after changing all that stuff remains remarkably similar between different models, regardless of the specific hyperparameters and block diagrams employed at pretraining time. Some choices are better, some worse, but they all succeed at the game of next-token prediction to a similar extent. To me, that suggests that transformer pretraining creates some underlying structure or geometry that hasn't yet been fully appreciated, and that may be more reusable than people think. Ultimately, I also doubt that the model weights are going to turn out to be all that important. Not compared to the toolchains as a whole. | | |
| ▲ | thesz 3 hours ago | parent [-] | | That "underappreciated underlying structure or geometry" can just be an artifact of the same tokenization being used with different models. Tokenization breaks up collocations and creates new ones that are not always present in the original text. Most probably, the first byte pair found by a simple byte-pair encoding algorithm on enwik9 will be two spaces next to each other. Is this a true collocation? BPE thinks so. Humans may disagree. What concerns me here is that it is very hard to ablate tokenization artifacts. |
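The two-spaces prediction is easy to check on a toy string (the sample text below is made up for illustration; verifying the claim for enwik9 itself would need the real corpus):

```python
from collections import Counter

def most_common_pair(data: bytes):
    """Return BPE's first merge candidate: the most frequent adjacent byte pair."""
    pairs = Counter(zip(data, data[1:]))
    return pairs.most_common(1)[0]

# Layout whitespace dominates: every word gap below is two spaces, so the
# top pair is the double space - a formatting artifact, not a collocation.
sample = b"the  quick  brown  fox  jumps  over  the  lazy  dog"
pair, count = most_common_pair(sample)
print(bytes(pair), count)  # the double-space pair wins, with 8 occurrences
```

No letter bigram in the sample repeats more than a couple of times, so the very first merge a byte-level BPE learns here encodes page layout rather than language.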
|
| |
| ▲ | pduggishetti 6 hours ago | parent | prev [-] | | I don't think it's Common Crawl anymore; it's Common Crawl++, using paid human experts to generate and verify new content, whether it's code or research. I believe the US is building this off the cost difference with other countries, using companies like Scale, Outlier, etc., while China has the internal population to do this |
|
|
| ▲ | testbjjl 7 hours ago | parent | prev | next [-] |
| Any reason for them to do this other than altruism? I don’t think this can be regulated. |
| |
|
| ▲ | WarmWash 6 hours ago | parent | prev | next [-] |
| The Chinese state wants the world using their models. People think that Chinese AI labs are just super cool bros that love sharing for free. They don't understand it's just a state-sponsored venture meant to further entrench China in global supply and logistics. China's VCs are Chinese banks and a sprinkle of "private" money. Private in quotes because technically it still belongs to the state anyway. China doesn't have companies and government like the US. It just has government, and a thin veil of "company" that readily fools Westerners. |
| |
| ▲ | devilsdata 13 minutes ago | parent | next [-] | | I'm Aussie. Please explain to me: why should I care whether Chinese SOEs or US tech companies are winning? Neither has my best interests at heart. | |
| ▲ | subw00f 6 hours ago | parent | prev | next [-] | | As opposed to the US, which just has companies and a thin veil of “government”. | | |
| ▲ | culi 6 hours ago | parent [-] | | Also, many of these Chinese companies aren't just opening their weights. They are open-sourcing their code AND publishing detailed research papers alongside them to reveal how they accomplished what they accomplished. That's very different from the American SaaS model, which relies on free but proprietary software for early growth |
| |
| ▲ | zozbot234 6 hours ago | parent | prev | next [-] | | I'm not sure how local AI models are meant to "entrench China in global supply and logistics". The two areas have nothing to do with one another. You can easily run a Chinese open model on all-American hardware. | | |
| ▲ | WarmWash 6 hours ago | parent [-] | | They are building a pipeline, and the goal is to get people in the door. If you forever stand at the entrance eating the free samples, that's fine, they don't care. Other people are going through the door and you are still consuming what they feed you. Doesn't mean it's going to be bad or evil, but they are staking their territory of control. | | |
| ▲ | zozbot234 6 hours ago | parent [-] | | Oh for sure, they're getting a whole lot of Chinese people and other non-Westerners through the door already - mostly, the people who are being ignored or even blocked outright by the big Western labs. That's territory we purposely abandoned, and they're going to control it by default. |
|
| |
| ▲ | jillesvangurp 6 hours ago | parent | prev | next [-] | | Like with nuclear technology, it's not healthy for only one country to dominate AI. The cat is already out of the bag and many countries now have the ability to train and run models. Silicon Valley has bootstrapped this space. But it should be noted that they are using AI talent from all over the world, and it was sort of inevitable that this technology would get around. Lots of Chinese, Indian, Russian, and European researchers are involved. As for what comes next, it's probably going to be a bit of a race for who can do the most useful and valuable things the cheapest. If OpenAI and Anthropic don't make it, the technology will survive them. If they do, they'll be competing on quality and cost. As for state sponsorship, a lot of things are state sponsored, including in the US. Silicon Valley has a rich history rooted in massive government funding programs; there's a great documentary on this, The Secret History of Silicon Valley. Not to mention all the "cheap" gas currently powering data centers, which of course comes on the back of a long history of public funding being channeled into the oil and gas industry. | |
| ▲ | WarmWash 6 hours ago | parent [-] | | >As for state sponsorship, a lot of things are state sponsored. You can make any comparison you want if you use adjectives rather than values. I can say that cars use a massive amount of water (all those radiators!) to try and downplay agricultural water usage. But it's blatantly disingenuous. SV is overwhelmingly private (actually, constitutionally private) money. To the point that you should disregard people saying otherwise, just like you would the people saying cars use massive amounts of water. |
| |
| ▲ | OtomotO 6 hours ago | parent | prev | next [-] | | So an OPEN model that I can run on my own fucking hardware will entrench China in global supply and logistics how? Contrary: How will the closed, proprietary models from Anthropic, "Open"AI and Co. lead us all to freedom? Freedom of what exactly? Freedom of my money? At some point this "anti-communism" bullshit propaganda has to stop. And that moment was decades ago! | | |
| ▲ | Zetaphor 6 hours ago | parent [-] | | Anything that isn't explicitly to the benefit of US interests must be against them /s |
| |
| ▲ | grttsww 6 hours ago | parent | prev | next [-] | | So what? I still prefer that over US total dominance. Let them fight it out. | | |
| ▲ | joquarky 5 hours ago | parent | next [-] | | Yeah, a lot of people are still living within the paradigm of tribalism: my team good, other team bad. But the events of the past decade or so have clearly demonstrated that there are no "good" actors. I personally couldn't care less who wins in the China vs US AI competition, both sides have a long list of pros and cons. | |
| ▲ | spwa4 6 hours ago | parent | prev [-] | | I'd get a bit informed about what exactly Chinese dominance entails. Ask a few Uyghurs, Cantonese Hong Kongers, or even Tibetans. Then decide ... | | |
| ▲ | joquarky 5 hours ago | parent [-] | | Ask a few Native Americans about dominance. Or maybe families of African descent. Or maybe families of Japanese Americans who lived in the US during WWII. Or maybe people of Latin descent living in the US today. | | |
| ▲ | jazz9k 5 hours ago | parent [-] | | The US examples you just gave happened decades (and in some cases hundreds) of years ago. The difference is that it's happening in China right now, and nobody cares. You really don't see the difference? | | |
| ▲ | well_ackshually 5 hours ago | parent [-] | | The US is the biggest threat to the world right now, and is actively supporting a genocide in Palestine as well as war crimes in Lebanon. I'm perfectly happy to let the chinese get a piece of the pie and fight the US, no matter how bad they are right now. |
|
|
|
| |
| ▲ | darkwater 6 hours ago | parent | prev [-] | | Well, isn't this what the US and really any other power in the world has always done, since forever? |
|
|
| ▲ | ai_fry_ur_brain 5 hours ago | parent | prev [-] |
| Why is it sad? These things are useless all around, along with the people who overuse them. It would be a great day for humanity if people stopped glazing text autocomplete as revolutionary. |