HarHarVeryFunny 11 hours ago
> The science behind these models are being worked on IN PUBLIC. The research is not secret. The implementations will all catch up.

Only to a limited extent - the US companies stopped sharing research a long time ago, other than Anthropic's interpretability research (which also seems to have dried up?).

Interestingly, most of the sharing is now coming from the Chinese side, largely DeepSeek. Zhipu/Z.ai (GLM) is also a partner in the Slime RL training framework.

I wouldn't call much, if any, of this "science" - it's all empiricism. Throw spaghetti at the wall and see what sticks. There's a famous quote from Noam Shazeer:

"We offer no explanation as to why these architectures seem to work; we attribute their success, as all else, to divine benevolence"

https://arxiv.org/abs/2002.05202v1

Jakob Uszkoreit has also talked about the empiricism it took to make what would become the Transformer, or any complex neural network architecture, work.
adrian_b 7 hours ago
While OpenAI and Anthropic have not published anything useful for a long time, there are still some research publications from a few US companies, e.g. NVIDIA about its Nemotron models, or Google and IBM about their small LLMs.