martin-t 6 hours ago
This feels like the OSS community is giving up. LLMs are lossily-compressed models of code and other text (often mass-scraped despite explicit non-consent), almost all of which is under licenses requiring attribution and very often other conditions. Just a few weeks ago a SOTA model was shown to reproduce non-trivial amounts of licensed code[0]. The idea of intelligence being emergent from compression is nothing new[1]. The trick here is giving up on completeness and accuracy in favor of a more probabilistic output which 1) reproduces patterns and interpolates between patterns of the training data while not always producing verbatim copies, and 2) serves as a heuristic when searching the solution space, further guided by deterministic tools such as compilers, linters, etc. - the models themselves quite often generate complete nonsense, including making up non-existent syntax in well-known mainstream languages such as C#.

I strongly object to anthropomorphising text transformers (e.g. "Assisted-by"). It encourages magical thinking[2] even among people who understand how the models operate, let alone the general public.

Just like stealing fractional amounts of money[3] should not be legal, violating the licenses of the training data by reusing fractional amounts from each should not be legal either.

[0]: https://news.ycombinator.com/item?id=47356000
[1]: http://prize.hutter1.net/
[2]: https://en.wikipedia.org/wiki/ELIZA_effect
[3]: https://skeptics.stackexchange.com/questions/14925/has-a-pro...
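Stripped of anthropomorphism, the loop described above (probabilistic generator, deterministic filter) can be sketched in a few lines of Python. Everything here is illustrative: `generate_candidate` is a hypothetical stub standing in for an LLM call, and Python's own compiler plays the role of the compiler/linter gate:

```python
# Toy version of the generate-then-verify loop: a probabilistic generator
# proposes candidates, a deterministic tool (here Python's built-in compile())
# filters out the nonsense.

def generate_candidate(prompt: str, seed: int) -> str:
    """Hypothetical stand-in for an LLM: sometimes emits valid code,
    sometimes made-up syntax, mimicking the failure mode described above."""
    good = "def add(a, b):\n    return a + b\n"
    bad = "def add(a, b)\n    return a + b\n"  # missing colon: invalid syntax
    return good if seed % 3 == 0 else bad

def compiles(src: str) -> bool:
    """Deterministic check: does the candidate even parse?"""
    try:
        compile(src, "<candidate>", "exec")
        return True
    except SyntaxError:
        return False

def search(prompt: str, attempts: int = 10):
    """Sample candidates until one passes the deterministic filter."""
    for seed in range(attempts):
        candidate = generate_candidate(prompt, seed)
        if compiles(candidate):
            return candidate
    return None
```

The point of the sketch is only that the deterministic stage does the real verification; the generator is free to emit nonsense, and the loop discards it.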
ninjagoo 4 hours ago
> Just like stealing fractional amounts of money[3] should not be legal, violating the licenses of the training data by reusing fractional amounts from each should not be legal either.

I think you'll find that this is not settled in the courts, and depends on how the data was obtained. Where the data was obtained legally, say from a purchased book, courts have been finding that using it for training is fair use (Bartz v. Anthropic, Kadrey v. Meta).

Morally, the case gets interesting. Historically, there was no such thing as copyright. The English Statute of Anne of 1710, which established copyright as a public law, was titled "for the Encouragement of Learning", and the US Constitution says Congress may secure exclusive rights "to promote the progress of science and useful arts" - so essentially public benefits driven by the grant of private benefits.

The moral bottom line: if you didn't have to eat, would you care who copies your work as long as you get credited? The more people who copy your work with attribution, the more famous you'll be. Now that's the currency of the future*. [1] You'll do it for the kudos. [2][3]
KK7NIL 6 hours ago
> I strongly object to anthropomorphising text transformers (e.g. "Assisted-by").

I don't think this is anthropomorphising, especially considering they also include non-LLM tools in that "Assisted-by" section.

We're well past the Turing test now; whether these things are actually sentient or not is of no pragmatic importance if we can't distinguish their output from that of a sentient creature, especially when it comes to programming.
tmp10423288442 6 hours ago
On https://news.ycombinator.com/item?id=47356000, it looks like the user there was intentionally asking about the implementation of the Python chardet library before asking it to write code, right? Not surprising the AI would download the library to investigate it by default, or look for any installed copies of `chardet` on the local machine.