xp84 5 hours ago

The ending is a really powerful point. Most people apparently agree on two things:

1. AI is a great boon for all tasks and specialties we don’t have the skills to do ourselves. Understandable, since (A) we’re ill equipped to see the flaws in its output because it isn’t our area of expertise, and (B) it often can unlock great gains because if we trust it, we then don’t have to pay and wait for humans to do that thing.

2. AI is a terrible replacement for me - my skills are at such a high level that it’s almost theoretical that it’ll ever be good enough to replace me for 90% of what I get paid to do. It’s a tool at best.

This is why I use AI for all my medical questions and doctors use AI to write software, and we both smirk at the quality the other person is getting from it.

Aurornis 4 hours ago | parent | next [-]

> This is why I use AI for all my medical questions and doctors use AI to write software, and we both smirk at the quality the other person is getting from it.

There is an interesting third group emerging: People who acknowledge the quality problem, but think they can deal with it by applying more AI to the output.

This takes the form of people who spin up a lot of "agents" and give them personalities like security director or quality director (which are unnecessarily complex and maddeningly unpredictable ways to trigger an LLM session for doing a security review or a quality check pass).

It also includes the person who knows that their app is full of bugs, but thinks it's not a problem because they can have the AI fix the bugs as they show up. People in this class haven't encountered security breaches or data loss bugs yet. They think it's all about having Claude fix that div that isn't centered or handle that error code that shows up sometimes.

throw-the-towel 2 hours ago | parent | next [-]

> People who acknowledge the quality problem, but think they can deal with it by applying more AI to the output.

Brute Force: if it doesn't work, you're just not using enough.

What if they're right though?

tgma 4 minutes ago | parent | next [-]

It does not have to be brushed away as "brute force" necessarily. We can, and do, build more reliable systems out of less reliable components. In fact, most industrial engineering accepts some defect rate and builds margins around it.

Software is no different. Even without AI, you already have buggy compilers and buggy OSes and buggy libraries. You just tend to accept the risk because you have some idea of what the failure modes are and can work around it or manage the risk in some other way (buy literal insurance.)

pianopatrick 18 minutes ago | parent | prev | next [-]

There are other places where some process has an error rate and you make up for that error rate by doing the work more than once and then comparing results. For example, I've heard in a video that satellites and other space craft often have 3 or 4 processors and compare the results to make sure there were no errors due to radiation. Similarly, we have RAID arrays that store data multiple times because disks can fail. So, even if AI has a failure rate of like 20%, maybe you can make up for that by running the same prompt multiple times with slight variations or with different models, comparing the results and choosing the best.
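
To put rough numbers on the redundancy idea above, here is a minimal Python sketch. It assumes each run is independently correct with the same probability and that answers can be compared as simply right or wrong, so a strict majority vote decides. Both are simplifying assumptions: errors from the same model on near-identical prompts are likely correlated, which would make the real gain smaller. The function name is hypothetical.

```python
from math import comb


def majority_correct_probability(p_correct: float, runs: int) -> float:
    """Probability that a strict majority of `runs` independent attempts is correct.

    Assumes every attempt succeeds with probability `p_correct`,
    independently of the others (a strong assumption for LLM output).
    """
    needed = runs // 2 + 1  # smallest number of correct runs that forms a majority
    return sum(
        comb(runs, k) * p_correct**k * (1 - p_correct) ** (runs - k)
        for k in range(needed, runs + 1)
    )


# A 20% failure rate means p_correct = 0.8.
for runs in (1, 3, 5):
    print(runs, round(majority_correct_probability(0.8, runs), 4))
# prints:
# 1 0.8
# 3 0.896
# 5 0.9421
```

Under those assumptions, three runs cut the failure rate from 20% to about 10%, and five runs to about 6%. The improvement depends entirely on the errors being uncorrelated and on having a cheap way to compare results.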

keeganpoppen an hour ago | parent | prev [-]

they are right. bad output is user error. there, am i suiting the role appropriately? i do like 65% believe that, fwiw.

toddmorey 3 hours ago | parent | prev | next [-]

I always imagine the model rolling its silicon eyes when it’s assigned a personality (“you are an expert growth hacker”) at the start of the prompt. Was that ever actually shown to be effective? Is it still?

not_a_bot_4sho 2 hours ago | parent | next [-]

> Was that ever actually shown to be effective? Is it still?

Yes! Personas demonstrated measurable improvement in a few different ways, with caveats of course. The common intuition is that personas influence token space in beneficial ways.

I'll come back here later on desktop and link a few (still) relevant papers on this topic.

bryanrasmussen 3 hours ago | parent | prev | next [-]

I remember there were some studies showing that this kind of thing was effective a year or so ago, so essentially a lifetime in Model years.

However to me it seems completely reasonable that it would work, because my understanding of what happens is the model interprets what you said as:

Look for a group of people who are considered to be expert growth hackers by the world at large and answer my questions as though they were answering them.

So assuming that there are a set of questions that can best be answered by people that most other people identify as expert growth hackers then yes, I believe assigning a personality in this way should obviously work.

code_biologist 2 hours ago | parent | next [-]

It's been interesting to see how aggressively some reasoning models like to "reason" by analogy. They love to say things like "it's like a CPU" or "it's like a highway", and then they start to make logical leaps based off that rather than just using it for user explanation. Gemini 2.5 and 3.1 Pro have been particularly bad for this type of behavior. Telling models to "speak as though you are a physiologist considering the case with an expert colleague" gets them to "reason" using a more correct linguistic substrate.

The Opus models over the last year don't seem as vulnerable to this type of behavior, and I've noticed the "identify as expert" prompt tricks aren't as meaningful there.

FeteCommuniste 2 hours ago | parent | prev | next [-]

I imagined it as kind of a shorthand for "you should be spending my tokens on looking for / addressing issues like X, Y, and Z," where X, Y, and Z are the sorts of things that an expert in [insert domain here] would be likely to care most about.

bryanrasmussen 2 hours ago | parent [-]

right, but the thing is how do they know what an expert in [insert domain here] would care about? Obviously by finding content created by

1. people who claim to be experts in [domain]

2. people who others claim to be experts in [domain]

hopefully valuing membership in group 2 over membership in group 1.

xpct 2 hours ago | parent | prev [-]

I propose we move away from the framing of "Model years" - they're standard human research years. Yes, likely more people are working on it, and also working harder, but ever since we acquired a certain amount of compute in the world, many people were able to independently find the same patterns and train models.

antonvs 6 minutes ago | parent | prev | next [-]

The reason it seems suspicious is that it's phrased in a way that's oriented towards humans. I haven't tested this, but I suspect you'd get similar results if you said something like "orient your response to that of a growth hacker." Either one is likely to have the desired effect on the stochastic result.

Sharlin 2 hours ago | parent | prev | next [-]

There was a time when stuff like "Unreal Engine, trending on ArtStation, 8K resolution" actually worked when prompting image gen models because such labels actually correlated with higher-quality images in the web-crawled training datasets available back then.

spudlyo 3 hours ago | parent | prev | next [-]

It reminds me of when people would stuff their image prompts with things like NO DEFORMED FINGERS.

cwillu 2 hours ago | parent [-]

Instructions unclear, digitized subject into a mass of fingers.

badc0ffee a few seconds ago | parent | next [-]

Perfectly formed fingers.

sebastiennight 43 minutes ago | parent | prev [-]

Thanks for reigniting the PTSD of reading about SCP-4051.

throw-the-towel 38 minutes ago | parent [-]

You mean the 4051 from There's No Antimemetics Division and not the mainline 4051, right?

gs17 3 hours ago | parent | prev | next [-]

I've always wondered if the go-to should have been prefilling its response with "I am an expert growth leader, and here are my thoughts:".

Blackthorn 2 hours ago | parent | prev | next [-]

At least in the beginning of spicy autocomplete, this sort of role-play did work pretty dramatically at aligning a conversation to a task, though I don't think anyone ever tested it versus somewhat less cringe priming.

After that, cargo cults do what they do best.

customguy an hour ago | parent [-]

> though I don't think anyone ever tested it versus somewhat less cringe priming.

I really wonder if phrasing it differently would make a difference. In good faith conversations, it just doesn't happen that someone tells someone else who that person is.

techpression 3 hours ago | parent | prev [-]

I feel it helps for the personality aspect, how it handles answers and general vocabulary, but it doesn’t in any way improve skill level, at least that’s my take from building an assistant.

MichaelZuo 3 hours ago | parent | prev | next [-]

How did you get over 52,000 karma in under 3 years with no submissions at all?

Are you averaging like 2000+ comments a month?

soperj 2 hours ago | parent | next [-]

They spin up agents, and then give them roles like commenter, and director of quality for the commenter. Although I'm unsure how the director helps since I've never seen one do actual work.

Aurornis 3 hours ago | parent | prev | next [-]

Commenting more than I should, to be honest.

I have a few periods during my daily routine where I’m waiting somewhere away from the computer and need a break from email.

A lot of my comments have double digit upvotes and some get into the mid hundreds. I try to actually read articles and provide thoughtful comments, which gets upvoted a lot more than the throwaway.

> Are you averaging like 2000+ comments a month?

52000 / 3 years would be under 1500 points per month or 48 points per day. That could be done with 1-2 helpful comments per day on popular threads.

dotancohen an hour ago | parent [-]

Serious, non-accusatory question. Your writing looks human. Do you use any writing assistants?

Where else, other than HN, do you post?

mschild 3 hours ago | parent | prev [-]

3 pages deep into their comment history only brings me to 5 days ago so probably yes.

mc32 an hour ago | parent | prev [-]

Looks like you’re hinting at the people who use things like gstack. Is there anyone who one could say is using it within its limits?

CGMthrowaway 5 hours ago | parent | prev | next [-]

Well said. Everyone agrees AI can't do their job, so it ends up doing everyone else's.

I'm not sure how to formulate it yet but it seems there is some Peter Principle/Gell-Mann Effect corollary that is AI-related we can say here.

Perhaps: "AI rises to the level of its users' incompetence."

Or: "Confidence in AI output is inversely proportional to one's ability to verify it"

baby_souffle 5 hours ago | parent | next [-]

> Confidence in AI output is inversely proportional to one's ability to verify it

I like this / generally agree. The only wrinkle is that - for some tasks - the verification _is_ "run the script, see if it worked, don't care how... just that it did" which is distinctly different from "not only did it do it correctly, it did so in the most direct and performant way possible".

For a _lot_ of what I use LLMs to build, the former is all I need.

OptionOfT 4 hours ago | parent [-]

And for as long as that runs on your computer, I don't care.

But the problem is that many people now believe it's OK to present a 10k-line vibe-coded PR that has only been verified against external behavior. Some Senior Engineer then needs to review it, in time, under pressure, without too much push-back, and lastly, it's the Senior Engineer who gets paged at 2am because something has fallen over.

Also, those scripts tend to start a life of their own, and because it looks good enough, people don't look at them again.

I recall a bug of someone vibe-coding a cleanup script for folders older than $x (on Windows).

Get the CreationDate, and sort. Delete older than $x. Except CreationDate can be null and null is always smaller than $x.

Oops.
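
The cleanup bug described above can be sketched in Python. This is not the original script (which was reportedly for Windows); the `Folder` type, field names, and dates are hypothetical, and `None` stands in for a missing creation date. The point is only that the "unknown date" case gets an explicit decision instead of falling through a comparison.

```python
from datetime import datetime, timedelta
from typing import List, NamedTuple, Optional


class Folder(NamedTuple):
    path: str
    created: Optional[datetime]  # None models a missing creation date


def folders_to_delete(folders: List[Folder], cutoff: datetime) -> List[Folder]:
    """Return folders created strictly before `cutoff`.

    Folders with an unknown creation date are never selected: deleting
    something because its age could not be determined is the bug.
    """
    return [f for f in folders if f.created is not None and f.created < cutoff]


now = datetime(2024, 1, 31)          # fixed "now" so the example is reproducible
cutoff = now - timedelta(days=30)    # anything created before 2024-01-01 is old
folders = [
    Folder("old", datetime(2023, 11, 1)),
    Folder("recent", datetime(2024, 1, 20)),
    Folder("unknown", None),
]
print([f.path for f in folders_to_delete(folders, cutoff)])  # prints ['old']
```

Python itself would raise a `TypeError` on `None < cutoff`, so the failure would be loud. In an environment where null silently sorts before every date, the same unguarded comparison deletes the "unknown" folder, which is why a test that only checks "old folders are gone" would not catch it.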

theendisney 4 hours ago | parent | prev | next [-]

>Well said. Everyone agrees AI can't do their job, so it ends up doing everyone else's.

It's like basic income: everyone will stop working except for you.

cwmoore 3 hours ago | parent [-]

It is not at all like universal basic income, except that both of those are misleadingly simple quips.

Kiro an hour ago | parent | prev | next [-]

> Everyone agrees AI can't do their job, so it ends up doing everyone else's.

In real life I haven't met a single programmer who doesn't think AI can do their job.

If someone would actually say that I would immediately think they have hubris and overestimate their skills.

whazor 3 hours ago | parent | prev [-]

But using AI itself is a job too. It takes effort to correctly prompt, to steer it, to verify it, and to improve the harness.

kingkongjaffa 3 hours ago | parent [-]

show me a prompt that is meaningfully expertly crafted beyond just providing Do's, Do not's, task context, and a goal.

> Correctly prompt, to steer it, to verify it, and to improve the harness.

I doubt this a lot. The average AI user is running claude code as the harness, or Codex etc. prompting has no secret incantations, and steer and verify is just knowing what the answer should roughly look like, which is a domain skill, not an AI skill.

dools 2 hours ago | parent [-]

> show me a prompt that is meaningfully expertly crafted beyond just providing Do's, Do not's, task context, and a goal.

The way that information is organised and formatted matters for compliance. It’s pretty similar to writing good procedural documentation for humans.

s_tec 5 hours ago | parent | prev | next [-]

It seems to be a general principle: If AI is better than you at something, you use it. If AI is worse than you, you don't.

Each time the frontier models get better, I see another wave of AI doubters suddenly become believers. People say things like, "AI couldn't code last year, but now I use it for everything!" Interesting. Now we know that the person who said this has the coding skills of a Claude Opus 4.5 or whenever the frontier was when they flipped.

Meanwhile, the rest of us keep using AI as simple tools, like the person in the article. I wonder how long it will take before computers can program better than me, and I flip too.

r3trohack3r 3 hours ago | parent | next [-]

I’m not sure I agree with this but maybe I just lack self awareness?

There are large portions of my codebases that are essentially extremely verbose grunt work. My UI stack, IaC YAML, thin CRUD routes, etc.

I know what the code is supposed to look like when it’s done being written, but it’s going to take me for freaking ever to type it all out.

I can just few shot it now in an hour. Plan -> feedback loop -> build -> review loop.

Does it try to do weird stuff? Yeah. And then I’m just like “that’s weird, no, the components should be broken up like XYZ” and then it’s not weird anymore. Occasionally (1% of the time) I just do a quick refactor myself instead of trying to tell the agent harness what to do.

I can get something fairly close to the ballpark of what I would have done but in like single digit percentage of the time.

And the result is that I can spit out a bunch of purpose built tools (personal tools, internal tools for teams, etc.) that I never would have been able to justify building otherwise.

greiskul 3 hours ago | parent | prev | next [-]

> the person who said this has the coding skills of a Claude Opus 4.5 or whenever the frontier was when they flipped

It's not about just skill. It's a matter of skill, time, and how critical the software you are writing is. There is a lot of software that is not critical. That is not close to security mechanisms. And that even if the code quality is not the highest, it does not matter.

Even if you are the best coder in the world, you would already become more productive by using ai. Things that in the past you might have not coded yourself but delegated to an intern, or things that you wouldn't even delegate to an intern because they are just too boring to do like some refactorings.

Like I had this project at work that was written without TypeScript strict mode turned on. When I turned it on, it had over 700 errors. I might be better than AI at fixing every single one of these errors. But my time is worth more doing other things. But I can, and did, ask AI to fix every single one. And then I reviewed it in batches, and something that my team wanted to do for multiple years and nobody had the time for, finally got done.

black3r 2 hours ago | parent | prev [-]

the sentiment "AI couldn't code last year, but now I use it for everything!" rings true for me... but I didn't flip cause AI is now better than me... I flipped cause now I am faster with AI than without it...

A year ago the AI output was so bad that getting it up to my standards took longer than writing it myself from scratch. And nowadays it is faster for me to start with AI output and iterate from there to reach a quality submission.

The ninety-ninety[0] rule was a thing talked about 40 years ago, long before anyone thought of AI coding. AI can nowadays make the first 90% of the task very fast and good enough. The last 10% is still the hardest part of coding by far.

[0]: https://en.wikipedia.org/wiki/Ninety%E2%80%93ninety_rule

ozgung 2 hours ago | parent | prev | next [-]

I feel like I am the only one thinking AI is actually much better than me at the things I'm supposed to do well. I've felt like that for years now, so it's not about the latest generation of models. I can't imagine a single thing where I can really compete with an AI at this stage. I am not sure if I am under-skilled or others are overconfident. Maybe people who feel like me don't say this out loud.

dfee 2 hours ago | parent [-]

agree. it's strange reading the loud voices that are counter to my lived experience. llms just have seemingly infinite depth - or can at least debug and execute without fatigue.

PaulRobinson 5 hours ago | parent | prev | next [-]

I was saying something like this a few years ago when people were getting first excited about ChatGPT. The gap has narrowed, but not by as much as people think.

AI produces output that is very convincing to a non-expert, and (dangerously), it's so good at looking like an expert, they might believe that it is an expert. But the moment you ask someone to use it for something they're an expert in themselves, the holes appear wide, consistent & obvious.

My favourite moment of seeing this in action was watching AI-worrier TV host/comedian Bill Maher. He has spent years talking about the dangers of AI taking everyone's jobs, destroying civilisation, ruining the economy, starting wars, "it's just getting better and better all the time", and so on. But one night he let slip a tell. "It's no good at writing jokes. Not yet, anyway". There you go, Bill... connect those dots...

There is real utility in it being a tool to help experts apply their expertise, as in this story where it speeds up some tasks to help the translator do part of the work, enhance their expertise, allow them to be more productive.

It's a better screwdriver, a better hammer, in the hands of somebody who knows what needs a screwdriver or a hammer. It doesn't replace them. It can't replace them. It's a tool that enhances the human, not an alternative.

I don't understand why this is not widely understood yet, but I'm sure it will be in due course.

And I don't expect this to change. Even if the latest model scores 100% on every benchmark, all that really tells us is that it's now more productive/efficient than it was before at helping experts do that work, not that it can replace everyone in that category of work.

perrygeo an hour ago | parent | prev | next [-]

At what point does this become an issue for data quality and global epistemology?

It seems inevitable that we ask for more AI assistance on topics we don't understand. And therefore have the least context to correct. Result: a flood of poor quality information.

In areas we DO understand, we'll either not ask AI at all, or treat its results with a higher degree of skepticism. Result: a lack of high quality information.

Inevitably this means a higher volume of non-expert prompts gets translated into the next generation of internet content. AIs are pumping out more novice-level text and less expert guidance.

The result will be an internet full of content written from the perspective of an ignoramus; not addressing any complex issues, staying surface level on every topic. Which will cascade into future models, etc.

ben_w 3 hours ago | parent | prev | next [-]

> 2. AI is a terrible replacement for me - my skills are at such a high level that it’s almost theoretical that it’ll ever be good enough to replace me for 90% of what I get paid to do. It’s a tool at best.

Most? Perhaps it's depression, but I look back at my career and wonder if any code I've ever been paid to write is beyond what current AI can do.

Sure, this leaves me with the non-coding tasks of UX taste, and code review + a few other forms of QA (and, when self-employed, project management, game design, etc.), but man, I'm someone who actually learned to read in part on the Commodore 64 user manual (as in, trying to understand what PEEK and POKE meant concurrent with having "Jack and Jill go up the hill" picture books).

(And no, I'm not claiming LLMs make bug-free code, I see the bugs LLMs make during my code review of their output and some of them are awful, hence "this leaves me with …").

borzi 3 hours ago | parent [-]

And? How valuable are individual lines of code? To the author's point, I'm sure AI can translate individual sentences perfectly, but miss the nuance of communication in a bigger project or body of text. In the same vein, when was the last time someone put an AI on a ralph loop, posted the result on r/vibecoding and ended up with actual users.

ben_w 2 hours ago | parent [-]

> How valuable are individual lines of code?

Don't care, only time I've measured them was personal curiosity about hand-written projects, and one time I was trying to work out how many blank comments a co-worker had put into their codebase*.

How valuable are features? Management kept giving me them, and I always just assumed they'd decided which ones were important. But I've seen git histories of apps where the same feature was added twice, 5 years apart, by the same developer.

> In the same vein, when was the last time someone put an AI on a ralph loop, posted the result on r/vibecoding and ended up with actual users.

How often do the megacorps currently boasting that 80% of their code is now vibed, post anything (other than adverts) to reddit?

* 20% of the whole project, or 24 thousand blank comments.

madrox 3 hours ago | parent | prev | next [-]

This is a new form of Gell-Mann Amnesia: https://en.wiktionary.org/wiki/Gell-Mann_Amnesia_effect

holmesworcester 5 hours ago | parent | prev | next [-]

Reminded me of this post by EY. (You're making a different point about existing expertise, not LLM expertise, but I think it holds in general.)

Every month a new guy discovers LLMs; discovers a skill the current LLMs require to get good results; and writes about the future jobs that will always be available for smart people like HIM, that are SKILLED in using LLMs.

The next generation of AIs doesn't need his fancy prompt. The image model goes from needing to type in just the right set of weird words and cryptic sorcerous invocations, to most people being able to type in English what they want and get a pretty good result.

There are still tasks that require careful invocation. But they are a much smaller fraction of all the tasks people are trying to do, or you can get a bleh result without the elaborate invocation to get it really good. And to improve on the bleh result you need to be substantially more of an expert than back when the Guy was memorizing a rule about adding "trending on Artstation" to the image prompts, as would always require a human paid to do that.

Another generation of AIs comes out. The next generation of Clever Skills is obsolete. Image models just obey the instructions for compositing panels without mixing them up, and you don't need to be an expert to get them to do it right. Another human value-add is gone. A wider set of tasks require no human expert.

Now a new Guy notices LLMs have become useful in his field for the first time. He discovers they require SKILL to use CORRECTLY. He posts about how there will always be jobs for humans who are SKILLED in using LLMs like HIM.

But it is not an infinite cycle. It is not the same each time it repeats. Now the Guy is a highly paid programmer or a career mathematician in 2026, instead of a graphic artist in 2023.

In six months the models will no longer require his vaunted Skills.

And by then there will be another Guy.

But the process doesn't continue forever. The Guys are coming from fields that were harder and harder for AIs. The brief centaur eras are shorter and shorter.

Today it is writers who are laughing at how bad the LLMs are at their job, and who will perhaps soon be posting about how it takes Skill to get an LLM to do their job Correctly. But the models are coming faster, and the eras of kinds of human value-add in each field are shortening.

There is a point when you run out of Guys, either because the centaur eras are too short for people to develop SKILLs and post to Twitter about them; or because there are not lands left for AIs to conquer; or because ordinary people are not reassured by some Nobel laureate proclaiming there will always be jobs for Nobel laureates with the SKILLS to prompt robotized biology labs Correctly.

But we'll never run out of amateur economists who assert entirely without a brief contemporary example that there will always be jobs for humans skilled at operating AIs!

We'll run out of professional economists saying it when nobody is paid for that work anymore.

I guess we'll also run out of amateur economists when they're dead.

Source: https://x.com/allTheYud/status/2057136382817231151

chrsw 3 hours ago | parent | prev | next [-]

My fear is in the future it won't matter. People will accept slop because while they can be convinced it's not as good as it could be, it's good enough. To them it's good enough because it's fast and cheap not because it's actually good. There won't be any room in the economy for the value human output brings because the economy will rearrange itself around AI and become completely dependent on cheap output, good enough or not.

singpolyma3 an hour ago | parent [-]

[dead]

aphroz an hour ago | parent | prev | next [-]

Except that it is also quite difficult to assess the quality of a doctor or a software developer if you don't work in the field.

I've heard of numerous cases where AI solved medical issues that doctors couldn't.

huflungdung an hour ago | parent | prev [-]

[dead]