SubiculumCode 3 hours ago

I doubt it. If it isn't released, Chinese companies will be unable to break the TOS and use it to acquire high-quality training data, which, I suspect, is how they've kept pace.

cedws 2 hours ago | parent [-]

Z.AI, Moonshot, and DeepSeek all have a data pipeline of their own now, having captured a slice of the market through cheap tokens. It's not hard to imagine that they might share the data too if the CCP thinks that will help its AI strategy.

SubiculumCode 10 minutes ago | parent [-]

No. Most data generated this way is poor quality. The value isn't in the user queries and/or responses: if the user does not know better than the LLM, you can end up with bad responses. The value is in taking a superior model, submitting a query, getting a higher-quality output than you could have generated yourself, and using that to boost your model.