bbarnett 7 days ago

We're certainly not going to get accurate data via the internet, that's for sure.

Just take a look at Python. How often does the AI know whether it's Python 2.7 or 3? You may think all the shebang lines say /usr/bin/python3, but they don't. And code snippets don't say at all.
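A minimal sketch of why the version matters (the examples below are just the classic division and bytes/str differences, picked for illustration):

    # Valid under both Python 2.7 and Python 3, but the results differ.
    print(5 / 2)        # Python 2.7: 2 (floor division), Python 3: 2.5
    print(type(b"x"))   # Python 2.7: <type 'str'>, Python 3: <class 'bytes'>

A model trained on unlabelled snippets like these has no way to tell which behaviour the author intended.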

How many coders have read something, then realised it wasn't applicable to their version of the language? My point is, we need to train with certainty, not with random gibberish off the net. We need curated data, to a degree, and even SO isn't curated enough.

And of course, that's the case even with good data that simply isn't categorized well enough.

So one way forward is to create realms of trust: some data trusted more deeply, other data less so. And we need more categorization of data, and yes, that reduces model complexity and therefore some capabilities.
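One way to picture the "realms of trust" idea is weighting training data by how much you trust its source. This is purely illustrative; the tier names and weights below are invented:

    import random

    # Hypothetical trust tiers; names and weights are made up for illustration.
    TRUST_WEIGHTS = {
        "curated_docs": 1.0,    # official docs, vetted examples
        "stackoverflow": 0.6,   # useful but uneven quality
        "random_web": 0.2,      # unlabelled scrapes
    }

    def sample_batch(examples, batch_size=32):
        # Each example is a (text, source_tier) pair; more-trusted sources
        # are drawn proportionally more often.
        weights = [TRUST_WEIGHTS[tier] for _, tier in examples]
        return random.choices(examples, weights=weights, k=batch_size)

The point isn't this exact scheme, just that trust becomes an explicit signal instead of everything being fed in at equal weight.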

But we keep aiming for that complexity, without caring about where the data comes from.

And this is where I think smaller companies will come in. The big boys are focusing on brute force. We need subtlety.

ipaddr 7 days ago

New languages will emerge, or at least new versions of existing languages will come with codenames. What about Thunder Python or Uber Python for the next release?