Aurornis 4 hours ago

> This is why I use AI for all my medical questions and doctors use AI to write software, and we both smirk at the quality the other person is getting from it.

There is an interesting third group emerging: People who acknowledge the quality problem, but think they can deal with it by applying more AI to the output.

This takes the form of people who spin up a lot of "agents" and give them personalities like security director or quality director (which are unnecessarily complex and maddeningly unpredictable ways to trigger an LLM session for doing a security review or a quality check pass).

It also includes the person who knows that their app is full of bugs, but thinks it's not a problem because they can have the AI fix the bugs as they show up. People in this class haven't encountered security breaches or data loss bugs yet. They think it's all about having Claude fix that div that isn't centered or handle that error code that shows up sometimes.

throw-the-towel 2 hours ago | parent | next [-]

> People who acknowledge the quality problem, but think they can deal with it by applying more AI to the output.

Brute Force: if it doesn't work, you're just not using enough.

What if they're right though?

tgma 7 minutes ago | parent | next [-]

It does not have to be brushed away as "brute force" necessarily. We can, and do, build more reliable systems out of less reliable components. In fact, most industrial engineering accepts some defect rate and builds margins around it.

Software is no different. Even without AI, you already have buggy compilers and buggy OSes and buggy libraries. You just tend to accept the risk because you have some idea of what the failure modes are and can work around them or manage the risk in some other way (e.g., buy literal insurance).

pianopatrick 22 minutes ago | parent | prev | next [-]

There are other places where some process has an error rate and you make up for that error rate by doing the work more than once and then comparing results. For example, I've heard in a video that satellites and other spacecraft often have 3 or 4 processors and compare the results to make sure there were no errors due to radiation. Similarly, we have RAID arrays that store data multiple times because disks can fail. So, even if AI has a failure rate of like 20%, maybe you can make up for that by running the same prompt multiple times with slight variations or with different models, comparing the results and choosing the best.
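To put rough numbers on that idea, here is a minimal Python sketch of majority voting over repeated runs. It leans on a big assumption that real models may not satisfy: each run fails independently with the same probability, and correct runs agree with each other. The function names `majority_vote` and `majority_correct_probability` are made up for illustration.

```python
from collections import Counter
from math import comb


def majority_vote(answers):
    """Return the answer given by a strict majority of runs, or None if there is none."""
    if not answers:
        return None
    winner, count = Counter(answers).most_common(1)[0]
    return winner if count > len(answers) / 2 else None


def majority_correct_probability(n, error_rate):
    """Probability that more than half of n runs are correct.

    Assumes (hypothetically) that runs fail independently with the same
    error_rate; correlated failures would make the real number worse.
    """
    p = 1.0 - error_rate
    need = n // 2 + 1  # smallest count that is a strict majority
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1))


if __name__ == "__main__":
    print(majority_vote(["A", "B", "A"]))  # A
    for n in (1, 3, 5):
        print(n, round(majority_correct_probability(n, 0.20), 5))
```

Under those assumptions, a 20% per-run failure rate drops to roughly 10% with three runs and roughly 6% with five. If the runs tend to make the same mistake (same model, same blind spot), the votes are correlated and the gain shrinks, which is presumably why using different models is part of the suggestion.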

keeganpoppen an hour ago | parent | prev [-]

they are right. bad output is user error. there, am i suiting the role appropriately? i do like 65% believe that, fwiw.

toddmorey 3 hours ago | parent | prev | next [-]

I always imagine the model rolling its silicon eyes when it’s assigned a personality (“you are an expert growth hacker”) at the start of the prompt. Was that ever actually shown to be effective? Is it still?

not_a_bot_4sho 2 hours ago | parent | next [-]

> Was that ever actually shown to be effective? Is it still?

Yes! Personas demonstrated measurable improvement in a few different ways, with caveats of course. The common intuition is that personas influence token space in beneficial ways.

I'll come back here later on desktop and link a few (still) relevant papers on this topic.

bryanrasmussen 3 hours ago | parent | prev | next [-]

I remember there were some studies showing that this kind of thing was effective a year or so ago, so essentially a lifetime in Model years.

However to me it seems completely reasonable that it would work, because my understanding of what happens is the model interprets what you said as:

Look for a group of people who are considered to be expert growth hackers by the world at large and answer my questions as though they were answering them.

So assuming that there is a set of questions that can best be answered by people that most other people identify as expert growth hackers then yes, I believe assigning a personality in this way should obviously work.

code_biologist 2 hours ago | parent | next [-]

It's been interesting to see how aggressively some reasoning models like to "reason" by analogy. They love to say things like "it's like a CPU" or "it's like a highway", and then they start to make logical leaps based off that rather than just using it for user explanation. Gemini 2.5 and 3.1 Pro have been particularly bad for this type of behavior. Telling models to "speak as though you are a physiologist considering the case with an expert colleague" gets them to "reason" using a more correct linguistic substrate.

The Opus models over the last year don't seem as vulnerable to this type of behavior and I've noticed the "identify as expert" prompt tricks aren't as meaningful there.

FeteCommuniste 3 hours ago | parent | prev | next [-]

I imagined it as kind of a shorthand for "you should be spending my tokens on looking for / addressing issues like X, Y, and Z," where X, Y, and Z are the sorts of things that an expert in [insert domain here] would be likely to care most about.

bryanrasmussen 2 hours ago | parent [-]

right, but the thing is how do they know what an expert in [insert domain here] would care about? Obviously by finding content created by

1. people who claim to be experts in [domain]
2. people who others claim to be experts in [domain]

hopefully valuing membership in group two over membership in group 1.

xpct 3 hours ago | parent | prev [-]

I propose we move away from the framing of "Model years" - they're standard human research years. Yes, likely more people are working on it, and also working harder, but ever since we acquired a certain amount of compute in the world, many people were able to independently find the same patterns and train models.

Sharlin 2 hours ago | parent | prev | next [-]

There was a time when stuff like "Unreal Engine, trending on ArtStation, 8K resolution" actually worked when prompting image gen models because such labels actually correlated with higher-quality images in the web-crawled training datasets available back then.

spudlyo 3 hours ago | parent | prev | next [-]

It reminds me of when people would stuff their image prompts with things like NO DEFORMED FINGERS.

cwillu 2 hours ago | parent [-]

Instructions unclear, digitized subject into a mass of fingers.

badc0ffee 4 minutes ago | parent | next [-]

Perfectly formed fingers.

sebastiennight an hour ago | parent | prev [-]

Thanks for reigniting the PTSD of reading about SCP-4051.

throw-the-towel 42 minutes ago | parent [-]

You mean the 4051 from There's No Antimemetics Division and not the mainline 4051, right?

gs17 3 hours ago | parent | prev | next [-]

I've always wondered if the go-to should have been prefilling its response with "I am an expert growth leader, and here are my thoughts:".

antonvs 9 minutes ago | parent | prev | next [-]

The reason it seems suspicious is that it's phrased in a way that's oriented towards humans. I haven't tested this, but I suspect you'd get similar results if you said something like "orient your response to that of a growth hacker." Either one is likely to have the desired effect on the stochastic result.

techpression 3 hours ago | parent | prev | next [-]

I feel it helps for the personality aspect, how it handles answers and general vocabulary, but it doesn’t in any way improve skill level, at least that’s my take from building an assistant.

Blackthorn 2 hours ago | parent | prev [-]

At least in the beginning of spicy autocomplete, this sort of role-play did work pretty dramatically at aligning a conversation to a task, though I don't think anyone ever tested it versus somewhat less cringe priming.

After that, cargo cults do what they do best.

customguy 2 hours ago | parent [-]

> though I don't think anyone ever tested it versus somewhat less cringe priming.

I really wonder if phrasing it differently would make a difference. In good faith conversations, it just doesn't happen that someone tells someone else who that person is.

MichaelZuo 3 hours ago | parent | prev | next [-]

How did you get over 52,000 karma in under 3 years with no submissions at all?

Are you averaging like 2000+ comments a month?

soperj 2 hours ago | parent | next [-]

They spin up agents, and then give them roles like commenter, and director of quality for the commenter. Although I'm unsure how the director helps since I've never seen one do actual work.

Aurornis 3 hours ago | parent | prev | next [-]

Commenting more than I should, to be honest.

I have a few periods during my daily routine where I’m waiting somewhere away from the computer and need a break from email.

A lot of my comments have double digit upvotes and some get into the mid hundreds. I try to actually read articles and provide thoughtful comments, which get upvoted a lot more than throwaway ones.

> Are you averaging like 2000+ comments a month?

52000 / 3 years would be under 1500 points per month or 48 points per day. That could be done with 1-2 helpful comments per day on popular threads.
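Spelling that arithmetic out as a quick Python check (assuming exactly 3 years of 12 months and 365 days each, and taking the 52,000 figure from the question above):

```python
karma = 52_000
years = 3

# Average points needed per month and per day over the whole period.
per_month = karma / (years * 12)
per_day = karma / (years * 365)

print(f"{per_month:.0f} points/month")  # 1444 points/month
print(f"{per_day:.1f} points/day")      # 47.5 points/day
```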

dotancohen an hour ago | parent [-]

Serious, non-accusatory question. Your writing looks human. Do you use any writing assistants?

Where else, other than HN, do you post?

mschild 3 hours ago | parent | prev [-]

3 pages deep into their comment history only brings me to 5 days ago so probably yes.

an hour ago | parent | prev [-]
[deleted]