ilamont 3 hours ago

The problem we're seeing across many professions is AI output is not getting vetted by knowledgeable people, whether it's an experienced analyst, senior engineer, expert attorney, or the resident physician. At best they skim, at worst they don't even see it at all before it's published, pushed to production, distributed to clients, or submitted to the court.

In many cases the skills are available in house to do the necessary vetting, but these people are already overwhelmed with their existing day to day.

Anyone remember that item a few months back about Amazon now having senior engineers vet generative AI output (https://news.ycombinator.com/item?id=47323017)? I had to LOL when I read that. These folks are already slammed. And the idea that Amazon would allow human bottlenecks to multiply across projects and underlying infrastructure development is ridiculous.

_puk 3 hours ago | parent | next [-]

Part of the problem: you get given a complete document to review after it's been fully baked.

I'm pushing the need for basic engineering principles across whole organisations.

You wouldn't give an engineer 1000 lines of code to review without the original spec of what you're trying to achieve for context (at a minimum; ideally the reviewer was in the room when the work was introduced, and has full context).

So, these docs, they're given as an all or nothing.

Do you push back on the 39th metric that is defined to the utmost detail? Or just resign yourself to the fact that it is what it is?

A one (6 is the goto if we're talking Amazon?!) pager.. "this is what I am proposing" at least gives the skeleton of the idea to push back at the general shape of the idea, refine it, before all the emotional investment of your precious report being complete.

Y'know.. the traditional product running through the spec in a SCRUM* environment.. the engineers doing proper code reviews..

* Yes SCRUM is dead, but that's another thing.

JoshTriplett 3 hours ago | parent | next [-]

> Part of the problem: you get given a complete document to review after it's been fully baked.

Not fully baked, worse: made to sound confidently correct, orthogonal to its actual correctness.

bradleyankrom 3 hours ago | parent [-]

Like the fake food they make for commercials. Looks great on TV.

s0rce 2 hours ago | parent | prev [-]

I've had this situation and basically just had to throw out stuff that was written because it's completely terrible/wrong. Either start again or just give up.

ChrisMarshallNY 3 hours ago | parent | prev | next [-]

> AI output is not getting vetted by knowledgeable people

You mean the people they fired and demoralized?

One of the things that "great [wo]men" like about "vibe-coding" (and that includes blindly producing non-code product), is that they, and they alone can now do what used to require the painful process of "passing it to context experts."

Now, the LLM is a "built-in context expert," and they don't need to vet the output anymore.

ilamont 3 hours ago | parent [-]

> Now, the LLM is a "built-in context expert," and they don't need to vet the output anymore.

Serious orgs are going to have to figure out the human layer. It will be needed, no matter how 'hallucination-free' the AI tooling gets. AI will still have some spectacularly bad fuck ups or even worse time bombs that get embedded in a system and don't become apparent until months or years later.

A lot of this will be dumped on existing staff with predictable results as they don't have the bandwidth to do it right. I can envision "output compliance" or "AI QA" becoming dedicated positions at many orgs. It's clearly needed.

asdff an hour ago | parent | next [-]

Let's be honest, how many orgs are really serious? Playing the game of the day for shareholder appeasement is taken far more seriously than whatever the domain experts might think.

anal_reactor an hour ago | parent | prev | next [-]

> It's clearly needed.

Once the hallucination rate drops below the error rate of human workers, it won't be needed anymore.

cassianoleal an hour ago | parent [-]

Once the hallucination rate drops, the remaining LLM failures will become increasingly hard to spot.

mminer237 3 hours ago | parent | prev | next [-]

As an attorney, I feel like vetting AI output takes longer than just doing it from scratch, let alone versus just using a traditional form.

With AI, I have to read through everything, often explain why it's wrong, and then rewrite everything anyways. I mean, I get way more billables, but I think it's symptomatic of how AI loses its advantage of being quick and accessible to those who don't understand the subject matter.

root-parent 7 minutes ago | parent | next [-]

Be afraid, be very afraid:

"AI Hallucination Cases" - https://www.damiencharlotin.com/hallucinations/

jimmydddd 36 minutes ago | parent | prev | next [-]

Another attorney here. I understand your plight. But I can't believe law firms are sending out briefs and opinions without carefully checking all of the citations. I mean, even when Lexis or Westlaw identifies an (actual) case on point, you still have to check if the case has been overturned, whether it is truly on point, or if it can be distinguished from your case. So even if the cited case is not a hallucination, someone would still have to read and analyze the cited case in the context of the present case.

root-parent 7 minutes ago | parent [-]

>> But I can't believe law firms are sending out briefs and opinions without carefully checking all of the citations.

Update your priors: https://www.damiencharlotin.com/hallucinations/

smelendez 2 hours ago | parent | prev | next [-]

Fact-checking and editing a mediocre piece of writing can be way harder than writing from scratch. Proving that something isn’t true or can’t be substantiated is hard work, and so is arguing that a word choice is subtly inappropriate.

And making a ton of corrections to a document everyone was hoping was ready to go is never fun politically.

claaams an hour ago | parent | prev | next [-]

This is the realization I had too. We had a manager update a policy at our org. He just shit it out through AI. It had tons of mistakes, and people who read it had questions. Not only did it have mistakes, it was causing people to do things in a way that added a manual step when an automatic process existed. Then the engineering VP commented on it, asking the original author what it's about, who then had to bring it back to the attention of the manager who made the first change.

It wasted many people's time, probably an order of magnitude more time (and money) than if the initial person had put a modicum of effort into making it right in the first place. Instead they hand it off to their life partner Claude and just assume it's good enough.

It's to the point where I feel insulted when I get AI slop like this from people. If I am expected to perform at a high level, then I expect that at the very minimum the slop throwers will proofread their slop.

__turbobrew__ 2 hours ago | parent | prev | next [-]

I have experienced this several times lately when writing software with claude/codex. Sometimes vetting and steering the agent takes longer than it would have taken me if done manually. Sure you can just decide not to vet the output and go into full vibecode, but agents tend to do a lot of dumb things (such as not deleting unused private methods or having temporary variables that are not needed).

In my experience the most effective work pattern for me is using agents to perform research and feedback on high level design, then I write the code manually, then I ask the agent to review the code for potential bugs/issues and fix those. The agents have a much easier time making small changes once the design is 90% there without going fully off the rails and generating slop.

I am working on writing skills to make the agent better but it is a bit painstaking. For example I had to write this inside of a skill because sometimes the agent would just stub out methods and leave TODOs: “always fully complete the requested task before finishing edits unless input is needed”.

CamperBob2 43 minutes ago | parent | prev | next [-]

You can also feed the document or source file to another frontier-level model, ideally two others, and tell it to vet it aggressively. The goal is to goad the models into erring on the side of false positive findings rather than potentially missing true positives.

I find that if Gemini Pro agrees with Claude Opus 4.8 and GPT 5.5 on something, it's almost certainly correct at a level where I wouldn't be likely to catch any errors myself.
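
A minimal sketch of that cross-checking idea, assuming each model is wrapped in a function that takes the document and returns a set of short finding labels. The three reviewer functions here are hypothetical stand-ins with canned output, not real API calls; in practice each would send the document to a different model with an "vet this aggressively" prompt.

```python
from collections import Counter
from typing import Callable, Iterable

# A reviewer is any callable mapping a document to a set of finding labels.
# (Assumption for illustration: real model output would need to be
# normalized into comparable labels first.)
Reviewer = Callable[[str], set[str]]


def cross_vet(document: str, reviewers: Iterable[Reviewer]) -> dict[str, int]:
    """Collect every finding raised by any reviewer, with a vote count.

    Keeping the union (not the intersection) errs on the side of false
    positives: a finding raised by only one reviewer is still surfaced.
    """
    votes: Counter = Counter()
    for review in reviewers:
        votes.update(review(document))
    return dict(votes)


def unanimous(votes: dict[str, int], n_reviewers: int) -> set[str]:
    """Findings that every reviewer agreed on."""
    return {finding for finding, n in votes.items() if n == n_reviewers}


# Hypothetical stand-in reviewers with canned findings.
def reviewer_a(doc: str) -> set[str]:
    return {"missing null check", "unused private method"}


def reviewer_b(doc: str) -> set[str]:
    return {"missing null check", "off-by-one in loop"}


def reviewer_c(doc: str) -> set[str]:
    return {"missing null check"}


if __name__ == "__main__":
    votes = cross_vet("source file contents", [reviewer_a, reviewer_b, reviewer_c])
    print("all findings:", sorted(votes))
    print("unanimous:", sorted(unanimous(votes, 3)))
```

The single-vote findings are the ones a human still has to look at; the unanimous ones are where agreement across models carries the most weight.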

csomar 2 hours ago | parent | prev | next [-]

It's not really any different in programming. Like if you have well-structured code and want to do a clear refactoring across it and you know what to expect, it can speed things up. But if it's generating any significant (and relatively complex) new code, you have to go through the whole thing manually again and then you find out you have to fix way too many things and get bogged down in different paths the AI didn't do correctly.

Of course, it's pretty much impossible to hear a dissenting point of view today and everyone is going crazy on these drugs. I might be hilariously wrong but I think this is the best time to start a software company.

2fff an hour ago | parent [-]

You're not wrong, I believe.

I think it's the perfect time to be contrarian. Think about it. If you're wrong, so what? The world will have changed for everyone in the field. If you are right? You stand to be positioned to win big financially whilst everyone else's brain is rotting away.

Izikiel43 3 hours ago | parent | prev | next [-]

How do you use it, as in, hey, write a doc about this, or do you iterate more like a conversation?

I do the second approach for coding with smallish steps and the output is fine

SV_BubbleTime 3 hours ago | parent | prev [-]

I’m against “vibe” anything important, but the fundamental flaw with this reasoning is that unknown unknowns exist.

I can’t cite “from scratch” for something outside of my knowledge but I side LLM training or assisted search.

root-parent 21 minutes ago | parent | prev | next [-]

>>The problem we're seeing across many professions is AI output is not getting vetted by knowledgeable people

I am particularly interested in Education and Human Knowledge Management. I have seen the rate of IT training going to zero. Think about specialized training, where if you make a mistake, the consequences of your errors are talked about on the evening TV news.

The whole idea that everybody is just planning to save their butt, using these strings coming out of these numeric matrices while suspending judgement, makes me shudder in horror. A bit like those South Asia airline companies that were forbidding their pilots from landing airplanes manually, leading to an increased loss of skills and causing some well-known disasters...

If well-paid consultants can't even bother to check their links...

fzeindl 2 hours ago | parent | prev | next [-]

> In many cases the skills are available in house to do the necessary vetting, but these people are already overwhelmed with their existing day to day.

This is an interesting topic. We treat vetting output the same as doing the work ourselves, but that is not the case.

Doing the work is not the same as reviewing work done by others.

I have heard reports of software engineering companies that have gone full agentic. Their seniors only review stuff written by LLMs and it burns them out, because they have to switch context constantly.

I find this interesting because part of being a senior developer is that you are experienced enough that you won’t make grave mistakes anymore. This is the case in many professions: you are relied upon to not make grave mistakes.

But those same people are now swamped with stuff that they are not able to review, so they will let a grave mistake slip through at some point.

So they really can’t trust themselves anymore?

kloop 2 hours ago | parent | prev | next [-]

> The problem we're seeing across many professions is AI output is not getting vetted by knowledgeable people

The problem is that output sometimes takes longer to verify than to create in the first place.

That turns AI into a deeply negative ROI system for many applications.

Ekaros 2 hours ago | parent | prev | next [-]

Also wondering about this whole review process with someone who wrote it with AI. Even if you comment and note all the issues, do they have the skills or willingness to correct it all properly? And how many times would you need to keep the loop going for an error-free outcome? Is there even enough calendar time for that?

wrs 2 hours ago | parent | prev | next [-]

But wait, if knowledgeable people have to vet the output, the process will not be 10X faster and you will not be able to fire the knowledgeable people. Therefore, your objection makes no sense. QED.

ChrisLTD 3 hours ago | parent | prev | next [-]

> the idea that Amazon would allow human bottlenecks to appear across projects and underlying infrastructure is ridiculous.

Why?

SoftTalker 3 hours ago | parent [-]

Amazon is fairly well known to ruthlessly optimize every process.

So if they're having humans proofread what the AI produces, they must have found that to be necessary.

DrewADesign 2 hours ago | parent | prev | next [-]

> The problem we're seeing across many professions is AI output is not getting vetted by knowledgeable people, whether it's an experienced analyst, senior engineer, expert attorney, or the resident physician.

Yeah probably not for the same reason I left VFX rather than have a lifetime of completely disregarding my own generative creativity and cleaning up LLM-generated bullshit. Fuck that. Double-fuck creating ‘content’ to train the models.

In code, LLMs automate away a lot of the drudgery. I wasn’t sad to avoid spending a couple hours looking up the usage patterns and idioms for some ported library, or do some rote task that didn’t make the project significantly better. In most other jobs, they automate away the only fun part and leave humans with all of the drudgery.

The tech industry has always been arrogant to some extent, but assuming the world of talented professional knowledge workers and creatives would be content to professionally proofread, apply lipstick to pigs, and polish turds is a whole new level of out-of-touch. I’d rather live out of my car and dig through the garbage for bottles with deposits.

xienze 3 hours ago | parent | prev | next [-]

> In many cases the skills are available in house to do the necessary vetting, but these people are already overwhelmed with their existing day to day.

I think a lot of the time it's just pure laziness. AI gives people a magical "do all the work for me" button and it can bring out the worst in them.

canyp 3 hours ago | parent [-]

I constantly battle this dichotomy where I care about the work I do but I also cannot possibly care about the corporate model, given 0 ownership of flawed processes across the org and the looming layoff that'll happen any day now.

Some people are given the button and really do not care.

fabian2k 3 hours ago | parent | prev | next [-]

If the main job is putting out a report, starting with AI is wrong in any case. What's the value of an AI-generated report, even if experts fix the biggest issues with it? Maybe this kind of report didn't have all that much value before, I don't know. But starting with AI just makes sure it's generic drivel.

watwut 2 hours ago | parent | prev [-]

It is harder to check everything than to create a thing without lying in the first place.