| ▲ | dang 2 hours ago |
| [stub for offtopicness] [see https://news.ycombinator.com/item?id=48416020 for how all this happened in the first place] |
|
| ▲ | logicprog 8 hours ago | parent | next [-] |
| Some notes on this:
| - I used GLM 5.1 to help with the coding and math for this.
| - However, I explicitly dictated where the data should be pulled from (GitHub, Bugzilla, mailing list), how it should be tagged and grouped, and what data to look at (e.g. bugs instead of regressions)
| - Additionally, I consulted with my wife, who has a master's degree in statistics from Penn State University for what sort of statistical methodology would be justified for this very limited data set, while still giving as much information as possible.
| - I know the website looks like we stereotypically consider vibe-coded websites to look, but I actually explicitly asked for that. The original HTML design looked like a website from 1995, and I just prefer how this looks. It's pretty! |
| |
| ▲ | jchw 8 hours ago | parent | next [-] | | I really struggle to believe you wrote text like: > A simple distributional analysis of every rsync release with bug data. No model. No assumptions. Just placement. | | |
| ▲ | logicprog 8 hours ago | parent | next [-] | | No, I didn't write the text itself. I'm typically significantly more verbose and elliptical, and more than that, the numbers and methodology changed often enough over the course of the last couple days I was working on this because I was trying to get it to be as accurate and fair as possible that trying to keep the whole thing up to date manually would have been problematic. | | |
| ▲ | jchw 7 hours ago | parent | next [-] | | Sorry to say but I'm absolutely certain I would've preferred to read your worst attempt at a write-up over the grating utter shite LLMs output. It's not even a question, this is unreadable. | | |
| ▲ | logicprog 7 hours ago | parent | next [-] | | That's interesting; IME, most people get equally angry and are as likely to disengage with a superior tone over my autism-infodump verbose essay prose as with LLM output. | | |
| ▲ | ok_dad 2 hours ago | parent [-] | | At least when I write an autistic info dump people know I wrote it. Why give your voice over to a corpo slop factory? Heck, I use LLM assistance for coding and I’ve even coded up whole features with the clankers, but giving it the right to speak for me is too much. I should also add that I read and understand every line of clanker output that I publish for others, so I’m not a vibe coder either, just adhd. |
| |
| ▲ | skeledrew 7 hours ago | parent | prev [-] | | I read it perfectly fine. I see content, not style. | | |
| ▲ | grey-area 2 hours ago | parent | next [-] | | Style is also part of the content. Word choice, grammar, register, and tone all affect meaning and communication of that meaning. The medium is part of the message. So your statement betrays a significant misunderstanding - there is no neat clean divide between style and content. Also, LLMs often generate text that is plausible, but wrong, in ways big and small. | | |
| ▲ | skeledrew 2 hours ago | parent [-] | | Well, I got the meaning in the article fine, and have no complaints. > Also, LLMs often generate text that is plausible, but wrong, in ways big and small. So do humans. Always have, always will. | | |
| ▲ | grey-area an hour ago | parent [-] | | Humans acting with intention do it a lot less. The difference is that LLMs don’t act with intention. | | |
| ▲ | skeledrew 36 minutes ago | parent [-] | | No, the difference is in the education/experience of any given human, which is mostly gated by age. Like you'd generally expect someone young to make a lot of mistakes, and as time went on they'd learn and make fewer. Pretty much the same with LLMs, which have been around for... a bit over 5 years now? What would you expect of a 5 year old acting with intention? Or 10? Or even a 15 year old? |
|
|
| |
| ▲ | 2 hours ago | parent | prev | next [-] | | [deleted] | |
| ▲ | jchw 6 hours ago | parent | prev [-] | | When you say, "I see content, not style," you are separating what is being said from how it is being said. While it is great that you can extract the core message, you are missing a fundamental truth about writing: style and content are rarely completely separate. Writing involves both. Poor prose does not just make writing ugly — it creates friction, obscures nuance, and introduces ambiguity. You can eat a gourmet meal out of a dirty paper bowl. You still get the calories, but the delivery mechanism definitely impacts the experience and the perceived value of the food. Same food, different response. See? I can write slop too, I don't even need to burn down a forest to do it. If you are OK with every fucking thing being written exactly like this, good for you. I am not. | | |
| ▲ | skeledrew 2 hours ago | parent [-] | | The internet is going to really suck for you if you keep that attitude, because LLM use will only increase. Though also maybe not too much as the LLM-isms will likely be fine-tuned out of them to the point that the only way you'll be sure something is done with one is if the author left a note saying such. But maybe that'll make it suck even more as then you'd be without a definite target most of the time, always wondering how much of the thing you're reading is by human and how much by LLM... | | |
| ▲ | jchw an hour ago | parent [-] | | Uh... huh. I waited a minute to make sure you weren't going to delete this post because frankly, if I had written it, I would have. Guess not, so... Here goes. No. It is not the fault of my "attitude" that the Internet is going to suck. That is a complete reversal of the reality. The fact that even people without bad intent are already spreading slop everywhere should be enough evidence to essentially prove that there was never any hope. If this is what good actors are doing, what exactly do you expect from bad actors? Also, to stress it yet again, I don't care if people use LLMs in general. I'll even say that I don't particularly care very much if people use them without disclosing it in most cases. If you're using it like a normal tool and not merely just dumping the output verbatim there is not any particular need to disclose it any more than you'd disclose other tools, though I think people would prefer if you did just for transparency. My chief complaint is just how bad LLM slop writing is. It simply is not good at all. It would literally be much better for the Internet if they weren't so turboshit at writing. There is almost no writing style I don't prefer over garbage LLM writing. I'm dead serious. Early LLMs were worse at almost everything else, but they were a lot better at writing for sure. Something went wrong somewhere. But I do also believe that it is inherently bad to dump prose as-if you are communicating as a human, but said prose isn't actually written by a human. If someone shows me a cool drawing that they made, that means that they sat there and went through the process of sketching, possibly multiple drafts, inking, coloring/shading/painting/etc. to create an expression. This involves many human skills that take years to hone, and every detail carries someone's explicit intention. I think that this is cool, and shows a great degree of skill and effort. 
When you, of course, generate some crap from an image generator, it may very well look similar. It may emulate some actual defects that make it look like someone really drew it. But someone didn't. A model went directly from a text prompt and dumped out pixels on screen. No sketching. No layers. No thought processes about how to frame things or what details to include. That doesn't mean zero effort went in: I'm sure in many cases someone sat around and fudged with LoRas and inpainting for a couple hours and pulled the slot machine lever to get good seeds and etc. That doesn't mean that an AI model does not have some model for how to structure an appealing image: it does, that's obviously why the results can look decent to begin with. But when you dump out an image from an image generator and you wink wink nudge nudge present it as your own and people evaluate it as if you drew it, this is basically fraud. Everyone looking at it who doesn't know it is AI generated actually believes you went through the normal effort of drawing that image and all of the years of practicing skills and acquiring knowledge that takes. That's bullshit, and it takes away from the actual accomplishments of people who put in the work like cheating in sports does. Like yeah, a lot of people are cheating at chess, by passing off engine play as their own, but does that really make it okay? When the entire point is using your brain and not just the raw outputs themselves, doesn't that hit you as a problem? For generative AI, I personally draw this line at what I feel are expressions of creativity. If you use AI for drawing references, whatever. If you use AI to generate globs of repetitive code, whatever. Code can be creative but I do not view it as an expression of creativity and almost any tool is fair game. If you are using ML models for motion capture or some other data processing thing where humans had to do repetitive work before, whatever. 
Maybe these tools sometimes do devalue the work, but the LLMs are not doing the interesting part here, they're doing the boring part. (This is, in part, an admission that actually writing code is often pretty boring in and of itself, something that I realize programmers have been inconsistent with in an attempt to justify their value. But, I still believe it to be true.) So okay fine. People are reluctant to disclose that they used AI to generate text because they fear the backlash that it will get them. This is understandable. What upsets me about this is that well-meaning people are apparently falling back to the idea that because LLM backlash is strong, what would be better than either trying to just simply write your own damn posts or be honest about your usage of LLMs... Is to just try to wink wink nudge nudge pass off more or less verbatim LLM writing as if it's a post that you wrote. I am not ruining the Internet. There is literally nothing I or any group of angry mobs could do that would even remotely slow down the decay of the Internet even if we desperately wanted to. So in fact, I'm not even trying to not ruin the Internet. I don't particularly care if my attitude is not helping or hurting. I'm not having an attitude as part of some grand strategy to save or destroy the internet. I'm having an attitude, because I am pissed off. And I am pissed off because I am tired of reading posts the author probably only skimmed themselves. |
|
|
|
| |
| ▲ | moomoo11 3 hours ago | parent | prev [-] | | [flagged] |
| |
| ▲ | aozgaa 7 hours ago | parent | prev | next [-] | | In general, it seems HN does not like to read llm-generated articles. I ran into this myself when using an llm to edit some stuff I wrote. At the time, I found this a bit irritating, but with a few weeks time I see the merit. The informational content tends to fall into “derivative” territory when LLM’s write stuff. And people are here for novelty and some socialization. Also LLM prose seems optimized for engagement rather than concise communication. Takes longer to sift through linguistic boilerplate to get to the point. (The quoted bit being a case in point) | | |
| ▲ | fireflash38 2 hours ago | parent | next [-] | | Why would anyone spend time reading something that someone couldn't even spend the time to write themselves? | |
| ▲ | jchw 6 hours ago | parent | prev [-] | | I just find it to be utter dreck. It has one of the most agitating prose styles I've ever seen. I would legitimately rather read actual broken English than the cliché polished turds Claude pops out. I am not an LLM hater, I think these tools are pretty impressive and often even useful, but even if I didn't care about the fact that I want to read communication from humans and not robots (and I do care about that, FWIW) I just find the current LLMs are horrid at writing. And while the comments are always flooded with people like me, the upvotes seem to tell a different story; clearly LLM writing really does appeal to some people. Or idk, maybe a lot of people who vote on stories and don't comment don't actually read them. Hard to say for sure. | | |
| ▲ | grey-area 2 hours ago | parent [-] | | I think it’s just people don’t read before voting, they upvote on the headline and then come to discuss it here. |
|
| |
| ▲ | 8 hours ago | parent | prev | next [-] | | [deleted] | |
| ▲ | otabdeveloper4 2 hours ago | parent | prev | next [-] | | I don't even know what "just placement" is. (I need a better model to translate from llmese.) | | | |
| ▲ | noctuid 8 hours ago | parent | prev [-] | | [flagged] |
| |
| ▲ | CuriouslyC 8 hours ago | parent | prev | next [-] | | I'd suggest writing the lead-in yourself and boxing AI prose separately from your prose in the analysis for future articles. You can give the humanized summary/eli5/key points, then have "details according to AI" boxes that go into nitty-gritty. People seem to dislike AI ghostwriting, but most of these people still use AI, so perhaps keeping authorship clear and separate will avoid some of the flak. | | |
| ▲ | logicprog 7 hours ago | parent [-] | | This seems fair. Of course, now that I've posted this here once, I doubt it'll get constructive engagement again, but I can at least improve this for the future |
| |
| ▲ | bri3k 7 hours ago | parent | prev [-] | | Even if everything in the article is true you should not use AI to write this. An analogy would be a tobacco company's report on how smoking isn’t so bad for you. |
|
|
| ▲ | ex-aws-dude 2 hours ago | parent | prev | next [-] |
| So the original unfounded claim has 400+ comments because it's perfect HN ragebait. The author provides evidence to the contrary and the HNers won't even engage with it, instead just talking about the writing of the article in classic HN bikeshedding fashion. How about after that we talk about the formatting of the website and the colors? This site is really going downhill. Where is the accountability for your own opinions? Are you guys only upvoting things that confirm your existing gripes? |
| |
| ▲ | dang 2 hours ago | parent [-] | | Comments like this do more of what they complain about, only with an extra layer of judgment. It would be preferable if someone would seed a better discussion by engaging with the article's claims/observations. | | |
| ▲ | ex-aws-dude an hour ago | parent [-] | | Why did you the admin allow such ragebait to stay on the front page then? Is that the kind of low effort posts we want around here? Just a link to a github comment of a screenshot? You're complicit here in fueling the harassment of an open source project | | |
| ▲ | dang an hour ago | parent [-] | | I don't have enough background info to understand what you're referring to here. Even if you're right, though, you shouldn't be posting comments that break the site guidelines. |
|
|
|
|
| ▲ | dang 3 hours ago | parent | prev | next [-] |
| This submission was heavily flagged, presumably because the article sounded like genai. But the article now says the following: > After posting this on Hacker News and recieving almost no substantive input, discussion, or response on the actual content of the article, I decided to rewrite all of the prose in my own voice. I've therefore turned off the flags and hopefully people can actually now discuss the claims/findings being reported. |
| |
| ▲ | hypfer 2 hours ago | parent | next [-] | | > I decided to rewrite all of the prose in my own voice. Soo... it didn't just sound like genai but was genai? ___ Huh. From the article: > If anyone complains about my verbosity or sentence structure — as they usually do, which is the reason I originally let the AI write the prose, among other reasons obsoleted by templating — they can go fuck themselves. This is kinda sad, honestly.
But also should show the author that doing what people try to bully you into doing will not stop them from bullying you. Just stick with your unique voice man. If people don't want to read that that's fine. They do not have to. You're fine .. what are those em-dashes doing there though? | | |
| ▲ | ellyagg 2 hours ago | parent | next [-] | | Right so it’s gonna be a litmus test for knowledge workers going forward if they can separate style over substance. Genai tells are style. You have to be able to evaluate the ideas. | | |
| ▲ | dang 2 hours ago | parent | next [-] | | I doubt that you can separate style from substance in that way, because you can't separate writing from thinking. I agree that it will be interesting to see how this develops going forward. One can imagine wildly varying scenarios. | |
| ▲ | hypfer 2 hours ago | parent | prev [-] | | Hm. Nah. Why? Why should I care? If it's a good thought, chances are it appears without slop around it.
If it doesn't re-appear, life will still go on regardless. No need to sift through noise just to avoid FOMO. |
| |
| ▲ | logicprog 2 hours ago | parent | prev | next [-] | | > .. what are those em-dashes doing there though? You're literally doing exactly the bullying I was trying to avoid, even while denouncing it. I like em-dashes. I have AuDHD, and they help me represent how I think. | | |
| ▲ | hypfer 2 hours ago | parent [-] | | > You're literally doing exactly the bullying I was trying to avoid Uhm, no. Really just no.
And, frankly, I find it shameful that you'd throw such an accusation at me. But I guess we can stop here. Idk man. The internet can be a bit too much sometimes. I truly get that, but this was too much from your side. Wish you all the best. | | |
| ▲ | skeledrew an hour ago | parent [-] | | Why did you point at the em-dashes? It looks very much as though you're accusing the author of an update that was also generated (possible but they seem sincere enough about wanting honest feedback on the content, and making changes for that). Or you're saying the author - and maybe everyone in general? - should no longer use em-dashes because they're a LLM smell. Yeah I'd feel offended too. It's a real pity I can't find em-dashes on my keyboard, or I'd stick them in this comment. |
|
| |
| ▲ | ajkjk 2 hours ago | parent | prev [-] | | The em dashes are fine. If someone gives them shit about their writing, that's on the critic for being shitty. If they use AI to write, that's on them for being fake. But, to write online at all requires being ready to have people be shitty to you and ideally not reacting in a way that makes the situation worse. Sounds like they need work on that part. Anyway it is basically always possible for someone to find something legitimately bad about anything a person does. The question is, how much of an issue is that? Not much actually. So you have flaws. Fine, just be flawed. It has no effect on your life beyond your reaction to the attack. And putting aside that reaction is a prerequisite for learning anything useful (or discerning that there is nothing to learn) from the experience. Good people will trust good intentions through the flaws, while shitty people will write off your work and your intentions because of the flaws (and try to make sure you feel bad about it in the process). But it's always because they're too weak to express disagreement maturely, or sometimes because they're bitter and threatened by your good intentions directly. Either way, it's their flaw, not yours. | | |
| ▲ | hypfer 2 hours ago | parent [-] | | I don't think that you can successfully dismiss an obvious AI writing marker with "No these are fine, now look over there!! <lotsoftext>" Pay no attention to the man behind the curtain? | | |
| ▲ | ajkjk 10 minutes ago | parent | next [-] | | What? You are confused--human beings write em dashes also. Also you're being a dick to the OP, grow up. | |
| ▲ | logicprog 2 hours ago | parent | prev [-] | | Great, so I rewrite everything in my own prose, and now it's still "obvious AI writing," just because I'm literate. |
|
|
| |
| ▲ | otabdeveloper4 2 hours ago | parent | prev [-] | | > I decided to rewrite all of the prose in my own voice "Claude, rewrite all of the prose in my own voice." The funny part is that it probably works. |
|
|
| ▲ | roywiggins 8 hours ago | parent | prev | next [-] |
| > A simple distributional analysis of every rsync release with bug data. No model. No assumptions. Just placement. If you want me to read your analysis, you are going to have to make it not read like Claude wrote it. What does "placement" even mean here? |
| |
| ▲ | rroblak 8 hours ago | parent | next [-] | | Yeah, made me chuckle that an LLM— probably Claude— was used to write this. The use of "regime shift" is what gave it away for me. I've never seen a human write that, but Claude does from time to time. At least they removed occurrences of "load-bearing". | | | |
| ▲ | gamegod 8 hours ago | parent | prev | next [-] | | It's the ultimate product for marketers. It inserts itself as an advertisement into every conversation now and defends itself against criticism. Just crazy. There's no hope for the rest of us. | | |
| ▲ | logicprog 8 hours ago | parent [-] | | It's not defending itself here, both because I used GLM 5.1, not Claude, and because I was the one who decided to do this analysis, iterated through six or seven different methodologies to try to find the one that was most honest with the data that I had (all of the methodologies showed directionally and often in magnitude the exact same thing, but I wanted to do something that fit the purpose, in consultation with my wife, who, as I've mentioned elsewhere, has a master's degree in statistics), and, of course, I specifically chose all of the metrics and sources for the data. If you don't want to read the LLM prose, you can just go to the GitHub of my project, grab the scripts, and run the full pipeline. It will gather the data, build the database, and run the analysis from scratch for you, and you can look at the numbers directly. It's all repeatable. |
| |
| ▲ | logicprog 8 hours ago | parent | prev [-] | | "Placement" as in where the Claude-driven releases exist within the existing distribution of bugs per 100 commits. If they're not OOD, then nothing is unusual. Also, it wasn't written by Claude FWIW, GLM 5.1. |
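A minimal sketch of what "placement" could mean in the comment above, assuming it is the percentile rank of one release's bugs-per-100-commits rate within the distribution of earlier releases. The function name and every number below are hypothetical illustrations, not data or code taken from the author's repository.

```python
from bisect import bisect_right


def percentile_placement(history, value):
    """Return the percentage of historical values that are <= value."""
    ordered = sorted(history)
    return 100.0 * bisect_right(ordered, value) / len(ordered)


# Hypothetical bugs-per-100-commits rates for earlier releases (made up).
historical_rates = [1.0, 2.0, 2.5, 3.0, 4.0, 5.0, 6.5, 8.0, 12.0, 20.0]

# Hypothetical rate for the release under discussion (made up).
new_release_rate = 4.5

placement = percentile_placement(historical_rates, new_release_rate)
print(f"Release sits at the {placement:.0f}th percentile of past releases")
```

With these made-up numbers the release lands in the middle of the historical spread. Under this reading, a release would only look out of distribution if its placement fell near the extreme top of the range, and with so few releases even that would be weak evidence.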
|
|
| ▲ | 8 hours ago | parent | prev | next [-] |
| [deleted] |
|
| ▲ | tappio 7 hours ago | parent | prev | next [-] |
| A lot of people are criticizing this because it's heavily written with an LLM, but I mean, if someone produced this piece pre-LLM, would they criticize it? Is the critique due to the use of an LLM or due to the content being truly hard to follow? I read it and I would say there are some problems with the writing, but it's not a bad piece. Of course this is a bigger problem, as it's now harder to distinguish content that is "AI slop" from "content co-authored with AI that is carefully reviewed" with a quick glimpse, and the "AI smell" is quite off-putting. My initial reaction was also negative, but after glimpsing through it and reading the summaries, I found it a decent summary, which also... speaks of this thread, of the content of the blog post and everything about the discussion and the strong feelings people have developed around the use of LLMs. Anyhow, it would be good to disclose the repo with the code for the statistics & use of LLM in the writing right up front. Which model, and why it was used to do the writing, etc. It's enough to say "I think it writes better than I do" or "I was in a hurry, sorry" or whatever, but it really should be disclosed. It reads more honest. ps. really... that sideways scroll? plz fix it. |
| |
| ▲ | JasonSage 7 hours ago | parent | next [-] | | > content co-authored with AI that is carefully reviewed The problem I see is that this is indistinguishable to a reader at a glance. Distancing the writing from the "AI smell" not only improves the quality by dropping the unnecessary ocean of rhetorical devices, it forces the human to have real weight and agency on what's being said. I think that act of distancing from raw LLM output through refinement is a huge quality leap. Even if you're only doing the refinement with an LLM, it forces the writing to have more voice and ideas from the author. I can see the work that went into the analysis here but again, as a casual reader, it's impossible to tell that there were any original ideas here expressed by the author. | |
| ▲ | logicprog 7 hours ago | parent | prev | next [-] | | Thank you for your constructive input, you're one of only a few others here who had any. I'll definitely do that. I didn't think, since the output was templated directly from the numbers generated by a reproducible python script, that people would get so up in arms about the aesthetics, but I guess I forgot to say that. | |
| ▲ | rjh29 3 hours ago | parent | prev [-] | | The most quoted line here is "A simple distributional analysis of every rsync release with bug data. No model. No assumptions. Just placement." Not only is it cringe to read, it's also nonsensical ("placement" means what?) If OP had said "here's an AI summary of the data" and generated a concise summary, I think I would be fine with it. But default AI writing is really verbose -- the opposite of a compression algorithm, spewing out cliched phrases that don't add information. It's exhausting to read, and it lacks the interesting noise of a human response. |
|
|
| ▲ | mschuster91 8 hours ago | parent | prev | next [-] |
| This article reeks of LLM "assistance" at the very least. Please, why can't people write stuff by hand themselves any more? It's a good analysis but how can I trust it without reviewing everything myself?! |
| |
| ▲ | logicprog 8 hours ago | parent [-] | | I mean, you can literally clone my repo, run the Python that rebuilds the database and does the whole data analysis end to end from scratch, and verify that the numbers are accurate. I made the code for this analysis public for that exact reason. This wasn't just an LLM running unsupervised in a loop. I came up with the methodologies and metrics and data scraping strategies precisely myself, and iterated on them to try to be as honest with what the data could show as possible. | | |
| ▲ | sanitycheck 8 hours ago | parent | next [-] | | I think the point people are making is that when the text has an "AI smell" (it does), we immediately lose trust in the veracity of any claim being made and feel like continuing to read what is possibly a hallucinated fiction is a complete waste of time. At this point we're all used to skimming through thousands of AI-generated sentences every working day and constantly thinking "this is likely to be 20% bullshit", it's hard to turn that off even if I try. | | |
| ▲ | logicprog 8 hours ago | parent [-] | | Do you think it would help if I went through and manually rewrote all of the prose? If it would get people to listen, I'd be totally willing to do it. It's not like I don't like writing. I just was focused on something else when I was making this, namely trying to find a good methodology that isn't insane for this low amount of data. | | |
| ▲ | JasonSage 7 hours ago | parent | next [-] | | When there's no discernable human filter on the text output, reading the text suggests it's what the LLM produced and not what a human considered. This is low-quality--every single day I witness Codex and Claude misunderstand, mislead, and hallucinate responses based on "assumptions" and I have to fact-check them. If I wanted a statistical analysis and to be the human in the loop, I would ask the LLM myself, and I would definitely NOT read an article that just dumps the LLM output as-is. | |
| ▲ | bradrn 8 hours ago | parent | prev | next [-] | | Yes, that would help considerably. (Also, I suggest clearly acknowledging where AI was/wasn’t used. I like CuriouslyC’s suggestion: https://news.ycombinator.com/item?id=48411968) | | |
| ▲ | logicprog 7 hours ago | parent [-] | | Alright, I'll do that. Although, sadly, I already posted it here, so I won't be able to post it again — I'll be stuck with this trash comments section that doesn't deal with any of the actual claims, just the aesthetics. |
| |
| ▲ | sanitycheck 7 hours ago | parent | prev [-] | | I'm pretty sure more people would read it to the end if it didn't seem like AI output, yes.. At the very least you would have fewer (maybe not 0!) comments here saying it's AI slop. |
|
| |
| ▲ | BigTTYGothGF 7 hours ago | parent | prev [-] | | > I mean, you can literally You didn't care enough to make a good writeup, why should we believe that you cared enough to make a good analysis? | | |
| ▲ | skeledrew 7 hours ago | parent [-] | | You don't have to believe. The repository is there for anyone to attempt reproducing the results. Criticisms without proof when there's a pretty straightforward way toward that proof are pointless. Go run the experiment and rip that apart if it doesn't hold up. And until then, refrain from criticizing. |
|
|
|
|
| ▲ | sfink 7 hours ago | parent | prev | next [-] |
| Wow. I am pretty insensitive to AI writing. I have never commented before about something sounding like AI, because mostly I don't notice. But this was so over the top that I spent the whole article trying to decide whether it was an intentional parody of AI writing style. This article's language is not en-US. It's not en-BR. It's en-SLOP. Yes, that was my clumsy attempt at AI parody. Here's another: this article doesn't just have AI tells. It is AI tells. Every sentence is saturated with AI style. Perhaps the author is so AI-indoctrinated that they can't see this? It doesn't read as even vaguely plausible human writing. Which is mightily ironic given the thesis of "AI generated stuff is just fine, m'kay?" The writing style does more to defeat its conclusion than the analysis itself. As for the substance of the analysis, it seems pretty good to me but I see some flaws that weaken it a bit. The presence of "The Outlier Nobody Noticed" proves nothing and deserves no more than a passing mention. A random release introduced way more bugs than the Claude-containing releases. That provides evidence that Claude doesn't introduce more bugs only if your hypothesis is a very naive "AI is the only thing that can ever increase bug introduction rates." The whole analysis has very limited data. It's necessarily based off a single pair of releases at the very end of the chronological timeline. You would never be able to reject a null hypothesis based only on that, so it's even less sound to present it as proving the null hypothesis. (By the same token, it would be incorrect for critics to claim that it proves their point. Did anyone claim this, though? The heated complaints seemed more based on priors about AI code.) "The critics' claim is a simple comparison: did the rate go up?" That's reductive. For one, these releases are known to be in reaction to a flood of (AI-discovered!) 
security reports, which is a novel situation and in fact is a huge confound to anyone arguing about what those two releases mean -- they're both heavily AI-written, but in response to an unusual situation. When the samples are only drawn from a distinct scenario, statistical analysis can only speak to the quality of code in that scenario. Also, another reasonable hypothesis could be: AI-written code has bugs of a different flavor that bother users more. It's optimized for passing tests and convincing people and AIs that security holes are closed, which means other considerations like preserving functionality can more easily be regressed as compared to if humans were doing it. (If true, it still doesn't support the claim that depending on AI code is a catastrophe, fwiw.) I'm not arguing the conclusion is wrong. I'm saying the analysis proves far less than it claims to. As for whether it's a debacle for rsync to become dependent on AI code generation, I think that's a reasonable debate to have but it's not going to be resolved this reductively. |
| |
| ▲ | logicprog 5 hours ago | parent [-] | | > The presence of "The Outlier Nobody Noticed" proves nothing and deserves no more than a passing mention. A random release introduced way more bugs than the Claude-containing releases. That provides evidence that Claude doesn't introduce more bugs only if your hypothesis is a very naive "AI is the only thing that can ever increase bug introduction rates." It does not statistically prove anything, but as I thought I made extremely clear in the card where I discuss it, the point of bringing it up is different: to prove the hypocrisy of the anti-AI crowd. > By the same token, it would be incorrect for critics to claim that it proves their point. Did anyone claim this, though? The heated complaints seemed more based on priors about AI code. The entire outrage is because people noticed what they thought was an unusual number of bugs and/or regressions in the release, saw it had Claude in it, and assumed a causal link, not just "priors about AI code." > You would never be able to reject a null hypothesis based only on that, so it's even less sound to present it as proving the null hypothesis. The point I'm trying to make is that there is no evidence, based on these two releases, to think Claude made anything worse, whatsoever, and so the outrage is unfounded. This doesn't require me to prove Claude didn't cause any problems. If I ever made the latter claim, I should clean that up. > It's optimized for passing tests and convincing people and AIs that security holes are closed, which means other considerations like preserving functionality can more easily be regressed as compared to if humans were doing it. Tridge actually explicitly says he made that tradeoff on purpose, not the AI. > Every sentence is saturated with AI style. Perhaps the author so AI-indoctrinated that they can't see this? It doesn't read as even vaguely plausible human writing. 
Which is mightily ironic given the thesis of "AI generated stuff is just fine, m'kay?" The writing style does more to defeat its conclusion than the analysis itself. I've since rewritten nearly 100% of the prose in the analysis with my own, more inflammatory and verbose style. I also intentionally left in my natural mispellings and typos, to prove it was me. | | |
| ▲ | sfink 4 hours ago | parent [-] | | My post wasn't written in a way to make friends, but: > I've since rewritten nearly 100% of the prose in the analysis with my own, more inflammatory and verbose style. I also intentionally left in my natural mispellings and typos, to prove it was me. Thank you thank you thank you. I would love to be able to describe how hard it was for me to think about the actual evidence you're presenting when reading about it through the AI writing, but I suspect it's one of those things where it bothers you or it doesn't. If you'd like to empathize, maybe I'll give it one try: imagine an otherwise solid PhD thesis written in crayon. The facts and evidence and reasoning are unaffected, but it's just so hard to take it seriously. Anyway, with the rewrite I don't have to battle my kneejerk reactivity nearly as much. I'm no expert like she is, but based on what I know, I agree with your wife on the statistics. That style of analysis is going to be the best you can do with the data available. It's an accepted way to stretch data without being too dependent on an assumed distribution. It's a good analysis. I still don't come away with the conclusion that concerns about AI code maintenance are necessarily overblown, but that's fine. I think your analysis project is a very solid contribution, and it's a hell of a lot more evidence-based than the rants people were posting. |
|
|
|
| ▲ | duk3luk3 8 hours ago | parent | prev | next [-] |
| This article is unfortunately unreadable because all of the prose is unfiltered LLM slop. |
|
| ▲ | volume_tech 8 hours ago | parent | prev | next [-] |
| [flagged] |
|
| ▲ | perching_aix 8 hours ago | parent | prev [-] |
| [dead] |