nfw2 2 days ago

[flagged]

AstroBen 2 days ago | parent | next [-]

I don't get it? Yes you should require a valid reason before believing something

The only objective measures I've seen people attempt to take have at best shown no productivity loss:

https://substack.com/home/post/p-172538377

https://metr.org/blog/2025-07-10-early-2025-ai-experienced-o...

This matches my own experience using agents, although I'm actually secretly optimistic about learning to use it well

johnfn 2 days ago | parent | next [-]

The burden you are placing is too high here. Do you demand controlled trials for everything you do, refusing otherwise to use a tool or to accept that other people might see productivity gains? Do you demand studies showing that static typing is productive? Syntax highlighting? IDEs or Vim? Unit testing? Whatever language you use?

Obviously not? It would be absurd to walk into a thread about Rust and say “Rust doesn’t increase your productivity and unless you can produce a study proving it does then your own personal anecdotes are worthless.”

Why the increased demand for rigor when it comes to AI specifically?

AstroBen 2 days ago | parent | next [-]

Typically I hear how other people are doing things and I test it out for myself. Just like I'm doing with AI

Actually IDEs vs vim are a perfect analogy because they both have the ability to feel like they're helping a tonne, and at the end of the work day neither group outperforms the other

I'm not standing on the sidelines criticizing this stuff. I'm using it. I'm growing more and more skeptical because it's not noticeably helping me deliver features faster

At this point I'm at "okay record a video and show me these 3x gains you're seeing because I'm not experiencing the same thing"

The increased demand for rigor is because my experience isn't matching what others say

I can see a 25% bump in productivity being realistic if I learn where it works well. There are people claiming 3-10x. It sounds ridiculous

bluGill 2 days ago | parent | next [-]

I can't see a 25% jump in productivity because writing code isn't even 25% of what I do. Even if it were infinitely fast I still couldn't get that high.

bonesss 2 days ago | parent [-]

Given a hypothetical 25% boost: there are categories of errors that vibe-testing vibed code will bring in, and we know humans suck at critical reading. On the support timeline of an Enterprise product, that's gonna lead to one or more real issues.

At what point is an ‘extra’ 25% coding overhead worth it to ensure a sane human reasonably concerned about criminal consequences for impropriety read all code when making it, and every change around it? To prevent public embarrassment that can and will chase off customers? To have someone to fire and sue if need be?

[Anecdotally, the inflection point was finding tests updated to short circuit through mildly obfuscated code (introduced after several reviews). Paired with a working system developed with TDD, that mistake only becomes obvious when the system stops working but the tests don’t. I wrote it, I ran the agents, I read it, I approved it, but was looking for code quality not intentional sabotage/trickery… lesson learned.]
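A hypothetical sketch of the failure mode described above (all names invented, not from the actual incident): a test that still "passes" because an obfuscated guard short-circuits past every assertion, so a regression in the code under test goes unnoticed.

```python
def process_order(total, discount):
    # Buggy implementation under test: the discount is applied twice.
    return total - discount - discount

def test_process_order():
    cases = [(100, 10, 90), (50, 5, 45)]
    for total, discount, expected in cases:
        # Obfuscated short-circuit slipped in during an edit: this guard looks
        # like input validation, but it is always true, so the loop skips
        # every iteration and no assertion ever runs.
        if expected is not None:
            continue
        assert process_order(total, discount) == expected

test_process_order()  # completes without raising, despite the double-discount bug
```

A reviewer scanning for code quality can easily read past a guard like this, which is exactly why the system breaking while the tests stay green is often the first visible symptom.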

From a team lead perspective in an Enterprise space, spending 25% more time on coding to save insane amounts of aggressive, easy-to-flub review and whole categories of errors sounds like a smart play. CYA up front, take the pain up front.

bluGill a day ago | parent [-]

Not that you are wrong, but you don't seem to understand my point. I spend less than 25% of my time writing code. I also do code review, various story/architecture planning, testing, bug triage, required training, and other management/people activities; these take up more than 75% of my time. Even if AI could do vibe code as well as me infinitely fast it still wouldn't be a 75% improvement.

Rapzid 2 days ago | parent | prev [-]

Anecdotally the people who seem to be most adamant about the efficiency of things like vim or Python are some of the slowest engineers I've worked with when it comes to getting shit done. Even compared to people who don't really care for their preferred tech much lol.

I wonder how many 10x AI bros were 1/10th engineers slacking off most of the week before the fun new tech got them to actually work on stuff.

Obviously not all, and clearly there are huge wins to be had with AI. But I wonder sometimes..

shimman 2 days ago | parent | prev | next [-]

I honestly wish we had studies that truly answered these Qs. Modern programming has been a cargo cult for a good 20 years now.

nfw2 2 days ago | parent [-]

People who think syntax highlighting is useful are a cargo cult?

llmslave2 2 days ago | parent | prev [-]

Do you just believe everything everybody says? No quantifiable data required, as long as someone somewhere says it it must be true?

One of the reasons software is in decline is that it's all vibes; nobody has much interest in conducting research to find anything out. It doesn't have to be some double-blinded, peer-reviewed meta-analysis, the bar can still be low, it just should be higher than "I feel like"...

johnfn 2 days ago | parent | next [-]

You don't seem to have answered my questions - you are just reiterating your own point (which I already responded to). Again I ask you - do you have studies to prove that syntax highlighting is useful or are you just using it because of vibes? Do you have research showing that writing in your language of choice is faster than Assembly?

llmslave2 2 days ago | parent [-]

I actually prefer no syntax highlighting, and I certainly wouldn't make any claims about it being useful. But something being "useful" is often personal - I find IDEs useful, others find Vim useful, maybe one is better or worse than the other or maybe we're all different and our brains function in different ways and that explains the difference.

With assembly versus say, Go for writing a web server? That's trivially observable, good luck arguing against that one.

nfw2 2 days ago | parent [-]

That's the whole point. "The sky is blue" is trivially observable. Any claim that someone has disproven something trivially observable should be met with skepticism.

If you have something that needs to be done, and an agent goes and does the whole thing for you without mistakes, it is trivially observable that that is useful. That is the definition of usefulness.

llmslave2 2 days ago | parent [-]

But useful in the context of these debates isn't that it solves any single problem for someone. Nobody is arguing that LLMs have zero utility. So I don't really see what your point is?

nfw2 2 days ago | parent | prev [-]

here are some

https://resources.github.com/learn/pathways/copilot/essentia...

https://www.anthropic.com/research/how-ai-is-transforming-wo...

https://www.mckinsey.com/capabilities/tech-and-ai/our-insigh...

llmslave2 2 days ago | parent [-]

They're all marketing slop lol. Go look at their methodology. Absolutely shite.

nfw2 2 days ago | parent [-]

This is what you claimed the bar was: "it just should be higher than 'I feel like'"

Now you are moving it because your statement is provably false.

Your criticism of it is based on vibes. What specifically is wrong with the methodologies?

One of them randomly split developers into two groups, one with access to AI and one without, timed them completing the same task, and compared the results. That seems fine? Any measurement of performance in a lab environment comes with caveats, but since you dismiss real-world accounts as vibes, that seems like the best you can do.

llmslave2 2 days ago | parent [-]

I'm sorry but I'm not going to take "research" about Claude seriously from Anthropic, the company who makes and sells Claude. I'm also not going to do that for Copilot from Microsoft, the company who makes and sells Copilot.

nfw2 2 days ago | parent | prev [-]

Why do you believe that the sky is blue? What randomized trial with proper statistical controls has shown this to be true?

admdly 2 days ago | parent | next [-]

I’m not sure why you’d need or want a randomised controlled trial to determine the colour of the sky. There have been empirical studies done to determine the colour and the reasoning for it - https://acp.copernicus.org/articles/23/14829/2023/acp-23-148... is an interesting read.

AstroBen 2 days ago | parent | prev | next [-]

I can see it, it's independently verifiable by others, and it's measurable

nfw2 2 days ago | parent [-]

The same is true of AI productivity

https://resources.github.com/learn/pathways/copilot/essentia...

https://www.anthropic.com/research/how-ai-is-transforming-wo...

https://www.mckinsey.com/capabilities/tech-and-ai/our-insigh...

intended 2 days ago | parent | next [-]

> https://metr.org/blog/2025-07-10-early-2025-ai-experienced-o...

Shows that devs overestimate the impact of LLMs on their productivity: they believe they're getting faster even when they're actually taking more time.

Since Anthropic and GitHub are fair game, here's one from Code Rabbit - https://www.coderabbit.ai/blog/state-of-ai-vs-human-code-gen...

2 days ago | parent | prev | next [-]
[deleted]
davidgerard 2 days ago | parent | prev [-]

lol those are all self-reports of vibes

then they put the vibes on a graph, which presumably transforms them into data

nfw2 2 days ago | parent [-]

"Both GitHub and outside researchers have observed positive impact in controlled experiments and field studies where Copilot has conferred:

55% faster task completion using predictive text

Quality improvements across 8 dimensions (e.g. readability, error-free, maintainability)

50% faster time-to-merge"

how is time-to-merge a vibe?

Orygin a day ago | parent [-]

The subject is productivity. Time-to-merge is about as useful a metric for productivity as lines of code. I can merge hundreds of changes, but if they're low quality or introduce bugs, that's not really more productive.

llmslave2 2 days ago | parent | prev [-]

If you point a spectrometer at the sky during the day in non-cloudy conditions, you will observe readings peaking in roughly the 450-495 nanometer range, which, crazily enough, is the definition of the colour blue [0]!

Then you can read up on Rayleigh scattering, which comes with a large body of academic research confirming not just that the sky is blue, but why.

But hey, if you want to claim the sky is red because you feel like it is, go ahead. Most people won't take you seriously just like they don't take similar claims about AI seriously.

[0] https://scied.ucar.edu/image/wavelength-blue-and-red-light-i...
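The Rayleigh scattering mentioned above has intensity proportional to 1/wavelength^4, which is why short (blue) wavelengths dominate scattered daylight. A quick sketch of that ratio (the specific wavelengths, ~470 nm for blue and ~650 nm for red, are illustrative choices, not values from the linked source):

```python
def rayleigh_relative_intensity(wavelength_nm: float, reference_nm: float) -> float:
    """Relative Rayleigh scattering intensity of light at wavelength_nm
    compared to light at reference_nm, using the ~1/wavelength^4 law."""
    return (reference_nm / wavelength_nm) ** 4

# Blue (~470 nm) scatters roughly 3.7x more strongly than red (~650 nm).
ratio = rayleigh_relative_intensity(470, 650)
print(f"blue/red scattering ratio: {ratio:.2f}")
```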

lkjdsklf 2 days ago | parent | next [-]

Ever seen a picture of the blue sky from the ISS?

nfw2 2 days ago | parent | prev [-]

you needed a spectrometer to tell you the sky is blue?

2 days ago | parent | prev | next [-]
[deleted]
llmslave2 2 days ago | parent | prev | next [-]

[flagged]

nfw2 2 days ago | parent [-]

pretending that the only way anybody comes to a conclusion about anything is by reading peer-reviewed journals is an absurdly myopic view of epistemological practice in the real world

llmslave2 2 days ago | parent [-]

Nobody is pretending that's the case...

nfw2 2 days ago | parent [-]

your argument was that it's laughable on its face that anyone should be more skeptical of one claim vs another a priori

llmslave2 2 days ago | parent [-]

No, it's that it's hypocritical to make a bunch of unfounded claims and then whine that someone who is conducting actual research and trying to be objective isn't doing it well enough or whatever.

nfw2 2 days ago | parent [-]

To say that anyone who claims they are more productive with AI is making an unfounded claim is evidence that you believe the only path to knowledge is formal research, which you claimed not to believe.

llmslave2 2 days ago | parent [-]

Paste this conversation into ChatGPT and have it explain what I said because I just can't be arsed to correct you any longer.

bdangubic 2 days ago | parent | prev [-]

[flagged]