Remix.run Logo
nl 2 days ago

> Large-scale exfiltration of data from ChatGPT when DeepSeek was being developed, and which Microsoft linked to DeepSeek

This is not the same thing at all. Current legal doctrine is that ChatGPT output is not copyrightable, so at most Deepseek violated the terms of use of ChatGPT.

That isn't IP theft.

To add to that example, there are numerous open-source datasets that are derived from ChatGPT data. Famously, the Alpaca dataset kick-started the open source LLM movement by fine tuning Llama on a GPT-derived dataset: https://huggingface.co/datasets/tatsu-lab/alpaca