burningion (a day ago):

So I think there's an assumption you've made here: that the models are currently "60-80% as good as human programmers." If you look at code generated by non-programmers (where you would expect to see these results!), you don't see output that is 60-80% of what domain experts (programmers) get by steering the models.

I think we're extremely imprecise when we communicate in natural language, and this is part of the discrepancy between belief systems. Will an LLM read a person's mind about what they want to build better than they can communicate it? That's already what recommender systems (like the TikTok algorithm) do. But will LLMs be able to orchestrate and fill in the blanks of imprecision in our requests on their own, or will they need human steering? I think that's where the gap in (basically) belief systems about the future lies.

If we truly get post-human-level intelligence everywhere, there is no amount of "preparing" or "working with" the LLMs ahead of time that will save you from being rendered economically useless. This is mostly a question of how long the moat of human judgement lasts. I think there's an opportunity to work together to make things better than before, using these LLMs as tools that work _with_ us.
kody (a day ago):

It's 60-80% as good as Stack Overflow copy-pasting programmers, sure, but those programmers were already providing questionable value. It's nowhere near as good as someone actually building and maintaining systems. It's barely able to vomit out an MVP, and it's almost never capable of making a meaningful change to that MVP.

If your experience has been different, that's fine, but in my day job I am spending more and more time just fixing crappy LLM code produced and merged by staff engineers. I really don't see that changing any time soon.
lumenwrites (a day ago):

I'm pretty good at what I do, at least according to myself and the people I work with, and I'm comparing its capabilities (the latest version of Claude used as an agent inside Cursor) to my own. It can't fully do things on its own and it makes mistakes, but it can do a lot.

But suppose you're right and it's 60% as good as "Stack Overflow copy-pasting programmers." Isn't that an insanely impressive milestone to just dismiss? And why would it get to this point and then stop?

We can all see AIs continuously beating the benchmarks, and the progress feels very fast in the day-to-day experience of using them. I'd need to hear a pretty compelling argument to believe that it'll suddenly stop, something more compelling than "well, it's not very good yet, therefore it won't get any better" or "Sam Altman is lying to us because of incentives." Sure, it could slow down somewhat because of exponentially increasing compute costs, but that assumes no more algorithmic progress, no more compute progress, and no more increases in the capital flowing into this field (which I find hard to believe).
kody (a day ago):

I appreciate your reply. My tone was a little dismissive; I'm currently deep, deep in the trenches trying to unwind a tremendous amount of LLM slop in my team's codebase, so I'm a little sensitive.

I use Claude every day. It is definitely impressive, but in my experience only marginally more impressive than ChatGPT was a few years ago. It hallucinates less and its code compiles more reliably, but it still produces really poor designs. It really is an overconfident junior developer.

The real risk, and what I am seeing daily, is colleagues falling for the "if you aren't using Cursor you're going to be left behind" FUD. So they learn Cursor, discover that it's an easy way to close tickets without using their brains, and end up polluting the codebase with very questionable designs.
lumenwrites (a day ago):

Oh, sorry to hear that you have to deal with that!

The way I get a sense of the progress is by using AI for what it's currently good at, using my human brain for the parts it's currently bad at, and comparing that to doing the same work without AI's help. I feel like AI is pretty close to automating 60-80% of the work I would have had to do manually two years ago (as a full-stack web developer). That doesn't mean the remaining 20-40% will be automated quickly; I'm just saying I don't see the progress slowing down.
senordevnyc (21 hours ago):

GPT-4 was released almost exactly two years ago, so "a few years ago" means GPT-3.5. And Claude 3.7 + Cursor agent is, for me, way more than "marginally more impressive" compared to GPT-3.5.
boringg (a day ago):

Because we still haven't figured out fusion, and it's been promised for decades. Why would everything that's been promised by people with highly vested interests pan out any differently? Granted, one is an inherently more challenging physics problem.
coolThingsFirst (a day ago):

Try this: launch Cursor and type "print all prime numbers which are divisible by 3 up to 1M". The result is that it will do a sieve. There's no need for any of that; the only prime divisible by 3 is 3 itself.
mysfi (a day ago):

Just tried this with Gemini 2.5 Pro. It got it right, with a meaningful thought process.
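The prompt above is a trick question resting on a simple number-theory fact: any multiple of 3 other than 3 itself has 3 as a proper factor and so cannot be prime. A minimal sketch of the expected answer (the function name is mine, for illustration):

```python
def primes_divisible_by_3(limit):
    """Primes p <= limit with p % 3 == 0.

    Any multiple of 3 greater than 3 has 3 as a proper divisor,
    so it is composite; the only candidate is 3 itself.
    """
    return [3] if limit >= 3 else []

print(primes_divisible_by_3(1_000_000))  # [3]
```

A sieve of Eratosthenes up to 1,000,000 followed by filtering on `p % 3 == 0` returns the same single-element list, just with vastly more work, which is why generating one here signals pattern-matching rather than reasoning.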