citizenpaul | 8 hours ago
I think we hit peak AI improvement velocity sometime mid last year. The reality is that all of that progress was made using a huge backlog of public data, and there will never be 20+ years of authentic data dumped on the web again. I've hoped otherwise, but suspected that as time goes on, LLMs will become increasingly poisoned by drinking from the well of the closed loop. I don't think most companies can resist the allure of more free data, as bitter as it may taste.

Gemini has already been co-opted as a way to boost YouTube views: it refuses to stop showing you videos no matter what you do.
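To make the closed-loop worry concrete, here's a toy sketch (my own illustration, not anything from a paper; the 0.9 factor is an assumed stand-in for generative models' tendency to under-sample the tails of their training data): each generation trains only on the previous generation's output, and the distribution quietly narrows.

    # Toy model of the closed loop: each generation fits the previous
    # generation's synthetic output. The 0.9 under-dispersion factor is
    # an assumption; with it, diversity (sigma) collapses generation by
    # generation.
    import numpy as np

    rng = np.random.default_rng(0)
    data = rng.normal(0.0, 1.0, size=5_000)  # gen 0: authentic human data

    for gen in range(1, 31):
        mu, sigma = data.mean(), data.std()
        data = rng.normal(mu, 0.9 * sigma, size=5_000)  # synthetic only
        if gen % 10 == 0:
            print(f"gen {gen}: sigma = {sigma:.3f}")

Run it and sigma drops from ~1.0 toward ~0.05 by generation 30: the "well" gets shallower every time you drink from it.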
darth_aardvark | 4 hours ago
> I don't think most companies can resist the allure of more free data as bitter as it may taste.

Mercor, Surge, Scale, and other data-labelling firms have shown that's not true. Paid data for LLM training is in higher demand than ever for exactly this reason: model creators want to improve their models, and free data no longer cuts it.
tehjoker | 7 hours ago
When I asked ChatGPT for its training cutoff recently, it told me 2021, and when I asked whether that was because contamination begins in 2022, it said yes. I recall it used to give a date in 2022 or even 2023.
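For what it's worth, this is easy to reproduce; a minimal sketch with the openai Python client (the model name here is a placeholder, and a model's self-reported cutoff isn't authoritative anyway; the documented one lives in the model card):

    # Ask a model for its self-reported training cutoff. Assumes the
    # openai package and OPENAI_API_KEY are set up; "gpt-4o" is just a
    # placeholder model name.
    from openai import OpenAI

    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user",
                   "content": "What is your training data cutoff?"}],
    )
    print(resp.choices[0].message.content)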
Imustaskforhelp | 7 hours ago
To be honest, for most things, probably yeah. But I feel like there's one thing that is still being improved, or at least could be: vibe-coded projects with any real depth. I recently tried making a WHMCS alternative in Golang, and surprisingly it's almost production level, with a very decent UI, and I hooked it up to my custom gVisor + Podman + tmate instance. But I still had to tinker with it.

At this point, the only human intervention left that might be relevant for further improvements is us trying out projects, tinkering, asking it to build more, passing it the issues we find, and then greenlighting that the project looks good (that's the main part). Nowadays AI agents can work on a project, read issues, fix them, take screenshots, and repeat until the end project takes shape. But I've found that after seeing the end result, I get more ideas and add onto it, and after multiple attempts, if there's any issue it didn't detect even after a lot of manual tweaks, I pass that along too. And once all that's done and I get good code, I either say "good job" (like a pet, lol) or just stop using it, and either of those could be a valid datapoint; roughly the loop in the sketch below.

I don't know, I tried it and thought about it yesterday, but the only improvement left to add is a human actually saying LGTM, or a human feeding it data (either custom, or some niche open-source idea it didn't think of).
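A rough sketch of the loop I mean (all the scaffolding here is hypothetical, not a real agent API; the point is the last line, where the human verdict becomes the datapoint):

    # Hypothetical scaffolding, not a real agent framework: the agent
    # iterates on the project, and the human's final LGTM/not-LGTM at
    # the end is the training signal.

    def run_agent(project: str, issue: str) -> str:
        """Stand-in for 'agent reads the issue and produces a fix'."""
        return f"{project} + fix({issue})"

    def find_issues(project: str) -> list[str]:
        """Stand-in for the agent's own review pass (tests, screenshots)."""
        return []  # pretend nothing is left to fix

    def build_with_feedback(project: str, issues: list[str]):
        transcript = []
        while issues:
            for issue in issues:
                project = run_agent(project, issue)
                transcript.append(("patch", issue))
            issues = find_issues(project)              # agent self-review loop
        verdict = input("LGTM? ")                      # the human greenlight
        transcript.append(("human_verdict", verdict))  # the datapoint that matters
        return project, transcript

The inner loop is what agents already do on their own; the one thing they can't generate for themselves is that final verdict.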