bensyverson 5 hours ago

The article asserts that the quality of human knowledge work was easier to judge based on proxy measures such as typos and errors, and that the lack of such "tells" in AI poses a problem.

I don't know if I agree with either assertion… I've seen plenty of human-generated knowledge work that was factually correct, well-formatted, and extremely low quality on a conceptual level.

And AI signatures are now easy for people to recognize. In fact, these turns of phrase aren't just recognizable—they're unmistakable. <-- See what I did there?

Having worked with corporate clients for 10 years, I don't view the pre-LLM era as a golden age of high-quality knowledge work. There was a lot of junk that I would also classify as a "working simulacrum of knowledge work."

bambax 5 hours ago | parent | next [-]

It's not that the pre-LLM era was a "golden age of quality", far from it. It's that LLMs have removed yet another telltale sign of rushed bullshit jobs.

bensyverson 4 hours ago | parent [-]

Have they though?

happytoexplain 3 hours ago | parent | next [-]

Absolutely. Our heuristics for judging human output are useless with LLMs. We can either trust it blindly, or tediously pick over every word (guess which one people do). I've watched this cause havoc over and over at my job (I work with many different teams, one at a time).

AI signatures don't mean low quality, they just mean AI. And humans do use them (I have always used the common AI signatures). And yes, humans produce good-looking garbage, but much more commonly they produce bad-looking garbage. This is all tangential to the point.

mbbutler 21 minutes ago | parent | prev [-]

Yes. Without a doubt. An example from the software world: I came across a Rust rewrite of scikit-learn the other day that looked impressive at first glance but was full of correctness bugs upon further inspection. In the past, this kind of shoddy work would have had a code smell that was easy to clock. But now, thanks to LLMs, these kinds of projects appear to be professionally done when in reality it's just a beautiful facade in front of a pile of shit.

manquer an hour ago | parent | prev | next [-]

It was and still is a negative filter, not a positive one. Meaning it is easy to reject work because there are typos and basic factual errors, but the absence of them is not a good measure of quality. Typically such checks are the first pass, not the only criterion.

It is valuable to have this, because if the work passes the first check then it is easier to identify the actual problems. It's the same reason we get code quality and lint style issues fixed before reasoning about the actual logic being written.

torben-friis 2 hours ago | parent | prev | next [-]

For me the issue is the lack of human explanation for mistakes. With a person, low quality comes from a source. Sometimes the source is lack of knowledge, sometimes time pressure, sometimes selfish goals.

Most importantly, those sources of errors tend to be consistent. I can trust a certain intern to be careful but ignorant, or my senior colleague with a newborn daughter to be a well of knowledge who sometimes misses obvious things due to lack of sleep.

With AI it's anyone's guess. They implement a paper in code flawlessly and make freshman-level mistakes in the same run. So you have to engage in the non-intuitive task of reviewing assuming total incompetence, for a machine that shows extreme competence. Sometimes.

mbreese 4 hours ago | parent | prev | next [-]

I’m also not sure I agree with the assertion that LLMs will produce a high quality (looking) report with correct time frames, no typos, and good-looking figures. I’m just as willing to disregard human or LLM reports with obvious tells. An LLM or a person can produce work that’s shoddy or error-filled. It may be getting harder to differentiate between a good and a bad report, but that helps to shift the burden more onto the evaluator.

This is especially true if we start to see more of a split in usage between LLMs based on cost. High quality frontier models might produce better work at a higher cost, but there is also economic cost pressure from the bottom. And just like with human consultants or employees, you’ll pay more for higher quality work.

I’m not quite sure what I’m trying to argue here. But the idea that an LLM won’t produce a low quality report just seemed silly to me.

yarekt 3 hours ago | parent [-]

You’ve missed the point of the original article about the proxy for quality disappearing. LLMs are trained adversarially, if that’s a word. They are trained to not have any “tells”.

Working in a team isn’t adversarial. If I’m reviewing my colleague’s PR, they are not trying to skirt around a feature or cheat on tests.

I can tell when a human PR needs more in depth reviewing because small things may be out of place, a mutex that may not be needed, etc. I can ask them about it and their response will tell me whether they know what they are on about, or whether they need help in this area.

I’ve had LLM PRs be defended by their creator until proven to be a pile of bullshit. Unfortunately, only deep analysis gets you there.

puttycat 3 hours ago | parent | prev | next [-]

The goal of automation is to automate consistently perfect competence, not human failures.

You wouldn't use a calculator that is as good as a human and makes mistakes as often.

downboots 5 hours ago | parent | prev | next [-]

Yes. I think the main warning here is that it is an added risk. A little glitch here and there until something breaks.
