yibers 3 hours ago
I agree that the training sets for LLMs have much more training data for Python than for Rust. But C++ existed before Python, I believe, so I doubt there is two orders of magnitude more Python code than C++.
hibikir 3 hours ago
You miss how many fewer programmers there were in the early years, how much of that code was ever public, and, even if it was, how useful it is now, since C++ has changed drastically from, say, what we used to write in 2001.
vidarh 3 hours ago
It's not just a question of whether there is more actual code in a given language, but of how much is available in the public and private training data. I've done work on reviewing and fine-tuning training data with a couple of providers, and the Python code I got to see, at least, exceeded the C++ code by far more than 2 orders of magnitude. It could be a heavily biased sample, but I have no problem believing it could also be representative.