I_am_tiberius 2 days ago

Open weight!

alecco 2 days ago | parent | next [-]

Please don't slander the most open AI company in the world. They're even more open than some non-profit university labs. DeepSeek is famous for publishing everything. They might take a bit to publish source code, but it's almost always there. And their papers are extremely pro-social, written to help the broader open AI community. This is why they struggle to get funded: investors hate openness. And in China they struggle against the political and hiring power of the big tech companies.

Just this week they published a serious foundational library for LLMs https://github.com/deepseek-ai/TileKernels

Others worth mentioning:

https://github.com/deepseek-ai/DeepGEMM a competitive foundational library

https://github.com/deepseek-ai/Engram

https://github.com/deepseek-ai/DeepSeek-V3

https://github.com/deepseek-ai/DeepSeek-R1

https://github.com/deepseek-ai/DeepSeek-OCR-2

They have 33 repos and counting: https://github.com/orgs/deepseek-ai/repositories?type=all

And DeepSeek often pioneers very cool new approaches to AI that the rest of the field then copies. Some of those copycats have 10x or 100x the GPU training budget, and that's their moat to stay competitive.

The models from Chinese Big Tech and some of the small ones are open weights only (and allegedly benchmaxxed; see https://xcancel.com/N8Programs/status/2044408755790508113). Not the same.

patshead 2 days ago | parent | next [-]

DeepSeek's models are indeed open weight. Why do you feel that pointing this out would be considered slander?

culi 2 days ago | parent | next [-]

I think they were reading GP's comment as a correction, like "not open source, just open weight". I'm not sure if that reading was accurate, but I enjoyed their high-effort comment nonetheless.

alecco 2 days ago | parent [-]

X is full of "open weights!" corrections as a dog whistle by the anti-China crowd. And they are right about models from the Chinese Big Tech, but completely wrong about DeepSeek.

alecco 2 days ago | parent | prev [-]

>> Truly open source coming from China.

> Open weight!

They were clearly implying it's not open source.

patshead 2 days ago | parent [-]

Correct. We have open-weight models from OpenAI, Facebook, Mistral, DeepSeek, Z.ai, MiniMax, and all sorts of other companies. Most of them have fantastic and open licensing terms.

If we can't build the weights, then we don't have the source. I'm not entirely sure what an open-source model would even look like, but I am confident that these binary blobs that we are loading into llama.cpp and vllm aren't the equivalent of source code. We have absolutely no idea what sort of data went into them.

This is fine. It isn't slanderous. It is what we have, and it is awesome. Just because it is awesome doesn't make it open source.

kortilla 2 days ago | parent | prev [-]

It’s not slander to say something true. These are open weights, not open source. They don’t provide the training data or the methodology required to reproduce these weights.

So you can’t see what facts are pruned out, what biases were applied, etc. Even more importantly, you can’t make a slightly improved version.

This model is as open source as a Windows XP installation ISO.

alecco 2 days ago | parent [-]

> These are open weights, not open source.

Did you even read my comment?

jatora 2 days ago | parent [-]

I did. Show me the source code.

alecco 2 days ago | parent [-]

> DeepSeek is famous for publishing everything. They might take a bit to publish source code but it's almost always there.

they-might-take-a-bit-to-publish

0-_-0 2 days ago | parent | prev [-]

Weights are the source, training data is the compiler

crazylogger 2 days ago | parent | next [-]

Training data == source code, training algorithm == compiler, model weights == compiled binary.
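A toy sketch of that analogy (entirely illustrative; a one-weight linear model standing in for an LLM): the (x, y) pairs play the role of source code, gradient descent plays the role of the compiler, and the resulting weight is the opaque "binary" you ship. You can run the binary without the data, but you can't rebuild or meaningfully audit it.

```python
def train(data, lr=0.01, steps=1000):
    """'Compile' (x, y) pairs into a weight via gradient descent on squared error."""
    w = 0.0
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w  # the "binary": a number that is opaque without the data behind it


def infer(w, x):
    """'Run' the compiled artifact; the training data is no longer needed."""
    return w * x


# The "source": samples of y = 2x. Shipping only `weights` is like
# shipping only the compiled program.
weights = train([(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)])
print(round(infer(weights, 5.0), 2))  # converges to w ≈ 2, so ≈ 10.0
```

In this framing, releasing the weights alone is exactly the "freeware binary" situation the thread describes: usable, even redistributable, but not reproducible or inspectable.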

0-_-0 2 days ago | parent [-]

Training algorithm is the programmer, weights are the code that you run in an interpreter

ngruhn 2 days ago | parent | prev [-]

Isn't it more like: the data is the source, the training process is the compiler, and the weights are the binary output?