moi2388 4 days ago

I completely agree.

On a side note... y'all must be prompt wizards if you can actually use the LLM code.

I use it for debugging sometimes to get an idea, or for a quick sketch of a UI.

As for actual code... what it writes is a huge mess of spaghetti, overly verbose, with serious performance and security risks, and a complete misunderstanding of pretty much every design pattern I give it.

brushfoot 4 days ago | parent | next [-]

I read AI coding negativity on Hacker News and Reddit with more and more astonishment every day. It's like we live in different worlds. I expect the breadth of tooling is partly responsible. What it means to you to "use the LLM code" could be very different from what it means to me. What LLM are we talking about? What context does it have? What IDE are you using?

Personally, I wrote 200K lines of my B2B SaaS before agentic coding came around. With Sonnet 4 in Agent mode, I'd say I now write maybe 20% of the ongoing code from day to day, perhaps less. Interactive Sonnet in VS Code and GitHub Copilot Agents (autonomous agents running on GitHub's servers) do the other 80%. The more I document in Markdown, the higher that percentage becomes. I then carefully review and test.

systemf_omega 4 days ago | parent | next [-]

> B2B SaaS

Perhaps that's part of it.

People here work in all kinds of industries. Some of us are implementing JIT compilers, mission-critical embedded systems, or distributed databases. In codebases like these you can't just wing it without breaking a million things, so LLM agents tend to perform really poorly.

sunrunner 4 days ago | parent | next [-]

> People here work in all kinds of industries.

Yes, it would be nice to have a lot more context (pun intended) when people post how many LoC they introduced.

B2B SaaS? Then can I assume that a browser is involved and that a big part of that 200k LoC is the verbose styling DSL we all use? On the other hand, Nginx, a production-grade web server, is 250k LoC (251,232 to be exact [1]). These two things are not comparable.

The point being that, as I'm sure we all agree, LoC is not a helpful metric for comparison without more context, and different projects have vastly different amounts of information/feature density per LoC.

[1] https://openhub.net/p/nginx

Fr0styMatt88 4 days ago | parent | next [-]

I primarily work in C# during the day but have been messing around with simple Android TV dev on occasion at night.

I’ve been blown away sometimes at what Copilot puts out in the context of C#, but using ChatGPT (paid) to get me started on an Android app - totally different experience.

Stuff like giving me code that’s using a mix of different APIs and sometimes just totally non-existent methods.

With Copilot I find it's sometimes brilliant, but it seems so random as to when that will be.

motorest 4 days ago | parent [-]

> Stuff like giving me code that’s using a mix of different APIs and sometimes just totally non-existent methods.

That has been my experience as well. You can rein in the surprising API choices with basic prompt files that clarify what to use in your project and how. However, when using less-than-popular tools whose source code is not available, the hallucinations are unbearable and a complete waste of time.

The lesson to be learned is that LLMs depend heavily on their training set, and in a simplistic sense they at best only interpolate between the data they were fed. If an LLM is not trained on a corpus covering a specific domain, then you can't expect usable results from it.

This brings up some unintended consequences. Companies like Microsoft will be able to create incentives to use their tech stack by training their LLMs with a very thorough and complete corpus on how to use their technologies. If Copilot does miracles outputting .NET whereas Java is unusable, developers have one more reason to adopt .NET to lower their cost of delivering and maintaining software.

godelski 4 days ago | parent | prev [-]

  > when people post how many LoC they introduced.
Pretty ironic you and the GP talk about lines of code.

From the article:

  Garman is also not keen on another idea about AI – measuring its value by what percentage of code it contributes at an organization.

  “It’s a silly metric,” he said, because while organizations can use AI to write “infinitely more lines of code” it could be bad code.

  “Often times fewer lines of code is way better than more lines of code,” he observed. “So I'm never really sure why that's the exciting metric that people like to brag about.”
I'm with Garman here. There's no clean metric for how productive someone is when writing code. At best, this metric is naive, but usually it is just idiotic.

Bureaucrats love LoC, commits, and/or Jira tickets because they are easy to measure, but here's the truth: to measure the quality of code you have to be capable of producing said code at (approximately) said quality or better.

Data isn't just "data" that you can treat as a black box and throw into algorithms. Data requires interpretation and there's no "one size fits all" solution. Data is nothing without its context. It is always biased, and if you avoid nuance you'll quickly convince yourself of falsehoods. Even with expertise it is easy to convince yourself of falsehoods. Without expertise it is hopeless. Just go look at Reddit or any corner of the internet where armchair experts confidently talk about things they know nothing about. It is always void of nuance and vastly oversimplified. But humans love simplicity. We need to recognize our own biases.

sunrunner 4 days ago | parent | next [-]

> Pretty ironic you and the GP talk about lines of code.

I was responding specifically to the comment I replied to, not the article, and mentioning LoC as a specific example of things that don't make sense to compare.

godelski 3 days ago | parent [-]

  > the comment I replied to
Which was the "GP", or "grand parent" (your comment is the parent of my comment), that I was referring to.
darkwater 4 days ago | parent | prev [-]

> Bureaucrats love LoC

Looks like vibe-coders love them too, now.

overfeed 4 days ago | parent [-]

...but you repeat yourself (c:

godelski 3 days ago | parent [-]

Made me think of a post from a few days ago where Pournelle's Iron Law of Bureaucracy was mentioned[0]. I think vibe coders are the second group, "dedicated to the organization itself" as opposed to "devoted to the goals of the organization". They frame it as "get things done", but really, who is not trying to get things done? It's about what is getting done and to what degree it is considered "good enough."

[0] https://news.ycombinator.com/item?id=44937893

drusepth 4 days ago | parent | prev | next [-]

On the other hand, fault-intolerant codebases are also often highly defined and almost always have rigorous automated tests already, which are two contexts in which coding agents specifically excel.

JambalayaJimbo 4 days ago | parent | prev | next [-]

I work on brain-dead CRUD apps much of the time and get nothing from LLMs.

benjaminwootton 4 days ago | parent | next [-]

Try Claude Code. You’ll literally be able to automate 90% of the coding part of your job.

dns_snek 4 days ago | parent | next [-]

We really need to add some kind of risk to people making these claims to make it more interesting. I listened to the type of advice you're giving here on more occasions than I can remember, at least once for every major revision of every major LLM and always walked away frustrated because it hindered me more than it helped.

> This is actually amazing now, just use [insert ChatGPT, GPT-4, 4.5, 5, o1, o3, Deepseek, Claude 3.5, 3.9, Gemini 1, 1.5, 2, ...] it's completely different from Model(n-1) you've tried.

I'm not some mythical 140 IQ 10x developer and my work isn't exceptional so this shouldn't happen.

ramesh31 4 days ago | parent [-]

The dark secret no one from the big providers wants to admit is that Claude is the only viable coding model. Everything else descends into a mess of verbose spaghetti full of hallucinations pretty quickly. Claude is head and shoulders above the rest and it isn't even remotely close, regardless of what any benchmark says.

JackFr 4 days ago | parent | next [-]

Stopping by to concur.

I've tried about four others, and while to some extent I always marveled at the capabilities of the latest and greatest, I had to concede they didn't make me faster. I think Claude does.

jve 4 days ago | parent | prev [-]

As a GPT user, your comment prompted me to search for how superior Claude really is... well, these users don't think it is: https://www.reddit.com/r/ClaudeAI/comments/1l5h2ds/i_paid_fo...

ramesh31 3 days ago | parent [-]

>As a GPT user, your comment prompted me to search for how superior Claude really is... well, these users don't think it is: https://www.reddit.com/r/ClaudeAI/comments/1l5h2ds/i_paid_fo...

That poster isn't comparing models, he's comparing Claude Code to Cline (two agentic coding tools), both using Claude Sonnet 4. I was pretty much in the same boat all year as well; using Cline heavily at work ($1k+/month token spend) and I was sold on it over Claude Code, although I've just recently made the switch, as Claude Code has a VSCode extension now. Whichever agentic tooling you use (Cline, CC, Cursor, Aider, etc.) is still a matter of debate, but the underlying model (Sonnet/Opus) seems to be unanimously agreed on as being in a league of its own, and has been since 3.5 released last year.

delta_p_delta_x 4 days ago | parent | prev [-]

I've been working on macOS and Windows drivers. Can't help but disagree.

Because of the absolute dearth of high-quality open-source driver code and the huge proliferation of absolutely bottom-barrel general-purpose C and C++, the result is... Not good.

On the other hand, I asked Claude to convert an existing, short-ish Bash script to idiomatic PowerShell with proper cmdlet-style argument parsing, and it returned a decent result that I barely had to modify or iterate on. I was quite impressed.

Garbage in, garbage out. I'm not altogether dismissive of AI and LLMs but it is really necessary to know where and what their limits are.

Sharlin 4 days ago | parent [-]

I'm pretty sure the GP referred to GGP's "brain dead CRUD apps" when they talked about automating 90% of the work.

murukesh_s 4 days ago | parent | prev [-]

I found the opposite - I get about a 50% improvement in productivity for day-to-day coding (a mix of backend and frontend), mostly in JavaScript, but it has helped in other languages too. You have to review carefully though - and have extremely well-written test cases if you are going to blindly generate or replace existing code.

motorest 4 days ago | parent | prev [-]

> In code bases like this you can't just wing it without breaking a million things, so LLM agents tend to perform really poorly.

This is a false premise. LLMs themselves don't force you to introduce breaking changes into your code.

In fact, the inception of coding agents was lauded as a major improvement to the developer experience because they allow the LLMs themselves to automatically react to feedback from test suites, thus speeding up implementation while preventing regressions.

If tweaking your code can result in breaking a million things, that is a problem with your code and how you worked to make it resilient. LLMs are only able to introduce regressions if your automated tests are unable to catch any of those million things breaking. If that is the case then your problems are far greater than LLMs existing, and at best LLMs only point out the elephant in the room.
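
To make that feedback loop concrete, here is a minimal sketch (illustrative only, not how any particular agent is implemented): run the suite, collect the failures, and hand them back as the next round of context. It only protects you if the suite actually exercises the behaviour that matters.

  # Minimal sketch of the loop described above (illustrative, not any
  # particular agent's implementation): run the suite, collect failures,
  # and hand them back as the next round of context.
  import subprocess

  def run_suite() -> tuple[bool, str]:
      """Run pytest and return (passed, combined output)."""
      result = subprocess.run(
          ["pytest", "-q", "--maxfail=5"],
          capture_output=True,
          text=True,
      )
      return result.returncode == 0, result.stdout + result.stderr

  passed, output = run_suite()
  if not passed:
      # An agent feeds `output` back to the model and retries; a human
      # just reads it. Either way, tests that don't cover the "million
      # things" provide no protection at all.
      print(output)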

malfist 4 days ago | parent | prev | next [-]

Perhaps the issue is that you were used to writing 200k lines of code. Most engineers would be aghast at that. Lines of code is a debit not a credit.

Deestan 4 days ago | parent | next [-]

I am now making an emotional reaction based on zero knowledge of the B2B codebase's environment, but to be honest I think it is relevant to the discussion on why people are "worlds apart".

200k lines of code is a failure state. At this point you have lost control and can only make changes to the codebase through immense effort, and not at a tolerable pace.

Agentic code writers are good at giving you this size of mess and at helping to shovel stuff around to make changes that are hard for humans due to the unusable state of the codebase.

If overgrown, barely manageable codebases are all a person's ever known, and they think it's normal that changes are hard and time-consuming and need reams of code, I understand that they believe AI agents are useful as code writers. I think they do not have the foundation to tell mediocre from good code.

I am extremely aware of the judgemental hubris of this comment. I'd not normally huff my own farts in public this obnoxiously, but I honestly feel it is useful for the "AI hater vs AI sucker" discussion to be honest about this type of emotion.

mind-blight 4 days ago | parent | next [-]

It really depends on what your use case is. E.g. if you're dealing with a lot of legacy integrations, handling all the edge cases can require a lot of code that you can't refactor away through cleverness.

Each integration is hopefully only a few thousand lines of code, but if you have 50 integrations you can easily break 100k LoC just dealing with those. They just need to be encapsulated well, so that the integration cruft is isolated from the core business logic and they become relatively simple to reason about.
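
A rough sketch of that kind of encapsulation (illustrative names only, not any real integration): each integration implements one small interface, and the core business logic only ever sees that interface, so the vendor-specific cruft stays at the edges.

  # Illustrative names only: one small interface per integration, so the
  # vendor-specific cruft never leaks into core business logic.
  from abc import ABC, abstractmethod
  from dataclasses import dataclass

  @dataclass
  class Invoice:                 # core domain type
      customer_id: str
      amount_cents: int

  class InvoiceSource(ABC):      # the seam the core logic depends on
      @abstractmethod
      def fetch_invoices(self) -> list[Invoice]: ...

  class LegacyVendorXSource(InvoiceSource):
      """All of vendor X's edge cases live here and only here."""
      def fetch_invoices(self) -> list[Invoice]:
          raw = self._call_legacy_api()  # hypothetical vendor call
          return [Invoice(r["cust"], int(float(r["amt"]) * 100)) for r in raw]

      def _call_legacy_api(self) -> list[dict]:
          return []                      # stubbed for the sketch

  def total_outstanding(source: InvoiceSource) -> int:
      # Core logic: doesn't know or care which of the 50 integrations
      # produced the data.
      return sum(inv.amount_cents for inv in source.fetch_invoices())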

bubblyworld 4 days ago | parent | prev | next [-]

> 200k lines of code is a failure state.

What on earth are you talking about? This is unavoidable for many use-cases, especially ones that involve interacting with the real world in complex ways. It's hardly a marker of failure (or success, for that matter) on its own.

haskellshill 2 days ago | parent | prev | next [-]

If all your code depends on all your other code, yeah 200k lines might be a lot. But if you actually know how to code, I fail to understand why 200k lines (or any number) of properly encapsulated well-written code would be a problem.

Further, if you yourself don't understand the code, how can you verify that using LLMs to make major sweeping changes doesn't mess anything up, given that they are notorious for making random errors?

throwawaymaths 4 days ago | parent | prev | next [-]

200k LoC is not a failure state. Suppose your B2B SaaS has 5 user types and 5 downstream SaaSes it connects to; that's 20k LoC per major programming unit. Not so bad.

krainboltgreene 3 days ago | parent [-]

That's actually insane.

johnnyanmac 4 days ago | parent | prev [-]

I agree in principle, and I'm sure many of us know how much of a pain it is to work on million- or even billion-dollar codebases, where even small changes can mean weeks of bureaucracy and hours of meetings.

But with the way the industry is, I'm also not remotely surprised. We have people come and go as they are poached, burn out, or simply move on with life. The training for new people isn't the best, and the documentation anywhere but the large companies is probably a mess. We also don't tend to encourage periods focused on properly addressing tech debt, only on delivering features. I don't know how such an environment, over years or decades, doesn't generate redundant, clashing, and quirky interactions. The culture doesn't allow much alternative.

And of course, I hope even the most devout AI evangelists realize that AI will only multiply this culture. Code that no one may even truly understand, but "it works". I don't know if even Silicon Valley (2014) could have made a parody more shocking than the reality this will yield.

rootnod3 4 days ago | parent | prev | next [-]

In that case, LLMs are full-on debt machines.

threecheese 4 days ago | parent [-]

Ones that can remediate it, though. If I am capable of safely refactoring 1,000 copies of a method in a codebase that humans don't look at, does it really matter, as long as the workload functions as designed?

sdenton4 4 days ago | parent | next [-]

Jeebus, 'safely' is carrying a hell of a lot of water there...

JustExAWS 4 days ago | parent | prev | next [-]

In a type-safe language like C# or Java, why would you need an LLM for that? It's a standard, guaranteed-safe (as long as you aren't using reflection) refactor with ReSharper.

uoaei 4 days ago | parent | prev [-]

Features present in all IDEs over the last 5 years or so are better and more verifiably correct for this task than probabilistic text generators.

d0mine 4 days ago | parent | prev | next [-]

You might have meant "code is a liability not an asset"

rahimnathwani 4 days ago | parent | prev [-]

  Lines of code is a debit not a credit
Perhaps you meant this the other way around. A credit entry indicates an increase in the amount you owe.

bmurphy1976 4 days ago | parent | next [-]

It's a terrible analogy either way. It should be: each extra line of code beyond the bare minimum is a liability.

malfist 4 days ago | parent | prev [-]

You are absolutely correct, I am not a finance wizard

aspenmayer 4 days ago | parent [-]

Liability vs asset is what you were trying to say, I think, but everyone says that, so to be charitable I think you were trying to put a new spin on the phrasing, which I think is admirable, to your credit.

s1mplicissimus 4 days ago | parent | prev | next [-]

It's interesting how LLM enthusiasts will point to problems like IDE, context, model etc. but not the one thing that really matters:

Which problem are you trying to solve?

At this point my assumption is they learned that talking about this question will very quickly reveal that "the great things I use LLMs for" are actually personal throwaway pieces, not to be extended beyond triviality or maintained for longer than a year. Which, I guess, doesn't make for a great sales pitch.

phito 4 days ago | parent | next [-]

It's amazing for making small custom apps and scripts, and they're such high quality (compared to what I would half-ass and never finish or polish) that they don't end up as "throwaway" - I keep using them all the time. The LLM saves me time writing these small programs, and the small programs boost my productivity.

Often, I will solve a problem in a crappy single-file script, then feed it to Claude and ask to turn it into a proper GUI/TUI/CLI, add CI/CD workflows, a README, etc...
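
For what it's worth, the kind of transformation I mean is roughly this (a hypothetical example, not one of my actual scripts): a hard-coded one-off becomes a small argparse CLI with help text and an entry point.

  # Hypothetical before/after of what I ask Claude to do.
  # Before (the throwaway version, hard-coded path):
  #     print(len(open("/tmp/dump.csv").readlines()))
  # After: the same logic as a small, reusable CLI.
  import argparse
  import sys

  def count_lines(path: str) -> int:
      with open(path, encoding="utf-8") as fh:
          return sum(1 for _ in fh)

  def main() -> int:
      parser = argparse.ArgumentParser(description="Count lines in a file.")
      parser.add_argument("path", help="file to count")
      args = parser.parse_args()
      print(count_lines(args.path))
      return 0

  if __name__ == "__main__":
      sys.exit(main())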

I was very skeptical and reluctant of LLM assisted coding (you can look at my history) until I actually tried it last month. Now I am sold.

maigret 4 days ago | parent | prev | next [-]

At work I often need small, short-lived scripts to find this or that insight, or to visualize some data, and I find LLMs very useful for that.

A non-coding topic, but recently I had difficulty articulating a summary of the state of a complex project, so I spoke for 2 minutes into the microphone and it gave me a pretty good list of accomplishments, todos and open points.

Some colleagues have found them useful for modernizing dependencies of microservices or for getting a head start on unit test coverage for web apps. All kinds of grunt work that's not really complex but just involves moving quite a lot of text around.

I agree it's not life-changing, but it's a nice help when needed.

wan23 3 days ago | parent | prev [-]

I use it to do all the things that I couldn't be bothered to do before. Generate documentation, dump and transform data for one off analyses, write comprehensive tests, create reports. I don't use it for writing real production code unless the task is very constrained with good test coverage, and when I do it's usually to fix small but tedious bugs that were never going to get prioritized otherwise.

Ballas 4 days ago | parent | prev | next [-]

There is definitely a divide in users - those for whom it works and those for whom it doesn't. I suspect it comes down to what language and what tooling you use. People doing web-related or Python work seem to be doing much better than people doing embedded C or C++. Similarly, doing C++ in a popular framework like Qt also yields better results. When the system design is not pre-defined or rigid like in Qt, you get completely unmaintainable code as a result.

If you are writing code that is/can be "heavily borrowed" - things that have complete examples on Github, then an LLM is perfect.

hn_throwaway_99 4 days ago | parent | next [-]

While I agree that AI assisted coding probably works much better for languages and use cases that have a lot more relevant training data, when I read comments from people who like LLM assisted coding vs. those that don't, I strongly get the impression that the difference has a lot more to do with the programmers than their programming language.

The primary difference I see in people who get the most value from AI tools is that they expect it to make mistakes: they always carefully review the code and are fine with acting, in some cases, more like an editor than an author. They also seem to have a good sense of where AI can add a lot of value (implementing well-defined functions, writing tests, etc.) vs. where it tends to fall over (e.g. tasks where large-scale context is required). Those who can't seem to get value from AI tools seem (at least to me) less tolerant of AI mistakes and less willing to iterate with AI agents, and they seem more willing to "throw the baby out with the bathwater", i.e. fixate on some of the failure cases and then not limit usage to the cases where AI does a better job.

To be clear, I'm not saying one is necessarily "better" than the other, just that the reason for the dichotomy has a lot more to do with the programmers than the domain. For me personally, while I get a lot of value in AI coding, I also find that I don't enjoy the "editing" aspect as much as the "authoring" aspect.

paufernandez 4 days ago | parent | next [-]

Yes, and each person has a different perception of what is "good enough". Perfectionists don't like AI code.

skydhash 4 days ago | parent [-]

My main reason is: Why should I try twice or more, when I can do it once and expand my knowledge? It's not like I have to produce something now.

sgc 4 days ago | parent [-]

If it takes 10x the time to do something, did you learn 10x as much? I don't mind repetition; I learned that way for many years and it still works for me. I recently made a short program using AI assistance in a domain I was unfamiliar with. I iterated probably 4x. Iterations were based on learning about the domain both from the AI results that worked and from researching the parts that either seemed extraneous or wrong. It was fast, and I learned a lot. I would have learned maybe 2x more doing it all from scratch, but it would have taken at least 10x the time and effort to reach the result, because there was no good place to immerse myself. To me, that is still useful learning, and I can do it 5x over before I have spent the same amount of time.

It comes back to other people's comments about acceptance of the tooling. I don't mind the somewhat messy learning methodology - I can still wind up at a good result quickly, and learn. I don't mind that I have to sort of beat the AI into submission. It reminds me a bit of part lecture, part lab work. I enjoy working out where it failed and why.

skydhash 3 days ago | parent [-]

The fact is that most people skip learning about what works (learning is not mentally cheap). I've seen teammates just trying stuff (for days) until something kinda works instead of spending 30 minutes doing research. The fact is that LLMs are good at producing something that looks correct and wastes the reviewer's time. It's harder to review something than to write it from scratch.

Learning is also exponential: the more you do it, the faster it gets, because you may already have the foundations for that particular bit.

robenkleene 3 days ago | parent | prev | next [-]

> I strongly get the impression that the difference has a lot more to do with the programmers than their programming language.

The problem with this perspective is that anyone who works in more niche programming areas knows the vast majority of programming discussion online isn't relevant to them. E.g., I've done macOS/iOS programming most of my career, and I now do work that's an order of magnitude more niche than that, and I commonly see programmers saying things like "you shouldn't use a debugger", which is a statement I can't imagine a macOS or iOS programmer making (don't get me wrong, they're probably out there, I've just never met or encountered one). So you just become used to most programming conversations being irrelevant to your work.

So of course the majority of AI conversations aren't relevant to your work either, because that's the expectation.

I think a lot of these conversations are two people with wildly different contexts trying to communicate, which is just pointless. Really we just shouldn't be trying to participate in these conversations (the more niche programmers that is), because there's just not enough shared context to make communication effective.

We all just happen to fall under this same umbrella of "programming", which gives the illusion of a shared context. It's true there are some things that are relevant across the field (it's all just variables, loops, and conditionals), but many of the other details aren't universal, so it's silly to talk about them without first understanding the full context around the other person's work.

hn_throwaway_99 3 days ago | parent [-]

> and I commonly see programmers saying thing like "you shouldn't use a debugger"

Sorry, but who TF says that? This is actually not something I hear commonly, and if it were, I would just discount this person's opinion outright unless there were some other special context here. I do a lot of web programming (Node, Java, Python primarily) and if someone told me "you shouldn't use a debugger" in those domains I would question their competence.

robenkleene 3 days ago | parent [-]

E.g., https://news.ycombinator.com/item?id=39652860 (no specific comment, just the variety of opinions)

Here's a good specific example https://news.ycombinator.com/item?id=26928696

felipeerias 4 days ago | parent | prev | next [-]

It might boil down to individual thinking styles, which would explain why people tend to talk past each other in these discussions.

jappgar 4 days ago | parent | prev [-]

No one likes to hear it, but it comes down to prompting skill. People who are terrible at communicating and delegating complex tasks will be terrible at prompting.

It's no secret that a lot of engineers are bad at this part of the job. They prefer to work alone (i.e. without AI) because they lack the ability to clearly and concisely describe problems and solutions.

JackFr 4 days ago | parent [-]

This. I work with juniors who have no idea what a spec is, and the idea of designing precisely what a component should do, especially in error cases, is foreign to them.

One key to good prompting is clear thinking.

motorest 4 days ago | parent | prev | next [-]

> If you are writing code that is/can be "heavily borrowed" - things that have complete examples on Github, then an LLM is perfect.

I agree with the general premise. There is however more to it than "heavily borrowed". The degree to which a code base is organized and structured and curated plays as big of a role as what framework you use.

If your project is a huge pile of unmaintainable and buggy spaghetti code, then don't expect an LLM to do well. If your codebase is well structured, clear, and follows patterns systematically, then of course a glorified pattern-matching service will do far better at outputting acceptable results.

There is a reason why one of the most basic vibecoding guidelines is to include a prompt cycle to clean up and refactor code between introducing new features. LLMs fare much better when the project in their context is in line with their training. If you refactor your project to align it with what an LLM is trained to handle, it will do much better when prompted to fill in the gaps. This goes way beyond being "heavily borrowed".

I don't expect your average developer struggling with LLMs to acknowledge this fact, because then they would need to explain why their work is unintelligible to a system trained on vast volumes of code. Garbage in, garbage out. But who exactly created all the garbage going in?

pydry 4 days ago | parent | prev | next [-]

I suspect it comes down to how novel the code you are writing is and how tolerant of bugs you are.

People who use it to create a proof of concept of something that is in the LLM training set will have a wildly different experience to somebody writing novel production code.

Even there the people who rave the most rave about how well it does boilerplate.

jstummbillig 4 days ago | parent | prev | next [-]

> When the system design is not pre-defined or rigid like

Why would an LLM be any worse at building from language fundamentals (which it knows, in ~every language)? Given how new this paradigm is, the far more obvious and likely explanation seems to be: LLM-powered coding requires somewhat different skills and strategies. The success of each user heavily depends on their learning rate.

PUSH_AX 4 days ago | parent | prev [-]

I think there are still lots of code "artisans" who are completely dogmatic about what code should look like. Once the tunnel vision goes and you realise the code just enables the business, it all of a sudden becomes a velocity godsend.

gtsop 4 days ago | parent | next [-]

Two years in and we are waiting to see all you people (who are free of our tunnel vision) fly high with your velocity. I don't see anyone, am I doing something wrong?

Your words predict an explosion of unimaginable magnitude in new code and new businesses. Where is it? Nowhere.

Edit: And don't start about how you vibed a SaaS service; show income numbers from paying customers (not buyouts).

hn_throwaway_99 4 days ago | parent | next [-]

There was this recent post about a Cloudflare OAuth client where the author checked in all the AI prompts, https://news.ycombinator.com/item?id=44159166.

The author of the library (kentonv) comments in the HN thread that it took him a few days to write the library with AI help, while he thinks it would have taken weeks or months to write manually.

Also, while it may be technically true we're "two years in", I don't think this is a fair assessment. I've been trying AI tools for a while, and the first time I felt "OK, now this is really starting to enhance my velocity" was with the release of Claude 4 in May of this year.

ath92 4 days ago | parent [-]

But that example is of writing a greenfield library that deals with an extremely well-documented spec. While impressive, this isn't what 99% of software engineering is. I'm generally a believer/user, but this is a poor example to point at and say "look, gains".

PUSH_AX 4 days ago | parent | prev [-]

Do you have some magical insight into every codebase in existence? No? Ok then…

gtsop 3 days ago | parent | next [-]

No I don't, but from your post it seems like you do. Show us, that is all I request.

PUSH_AX 3 days ago | parent [-]

I have insight into enough codebases to know it's a non-zero number. Your logic is bizarre: if you'd never seen a kangaroo, would you just believe they don't exist?

gtsop 3 days ago | parent [-]

Show us the numbers, stop wasting our time. NUMBERS.

Also, why would I ever believe kangaroos exist if I haven't seen any evidence of them? This is a fallacy. You are portraying healthy skepticism as stupid because you already know kangaroos exist.

PUSH_AX 3 days ago | parent [-]

What numbers? It doesn’t matter if it’s one or a million, it’s had a positive impact on the velocity of a non zero number of projects. You wrote:

> Two years in and we are waiting to see all you people (who are free of our tunnel vision) fly high with your velocity. I don't see anyone, am I doing something wrong?

Yes is the answer. I could probably put it in front of your face and you’d reject it. You do you. All the best.

ceejayoz 4 days ago | parent | prev [-]

That’s hardly necessary.

Have we seen a noticeable increase in newly launched useful apps?

PUSH_AX 4 days ago | parent [-]

Why is useful a metric? This is about software delivery; what one person deems useful is subjective.

nobleach 3 days ago | parent | next [-]

Perhaps I'm misreading the person to whom you're replying, but usefulness, while subjective, isn't typically based on one person's opinion. If enough people agree on the usefulness of something, we as a collective call it "useful".

Perhaps we take the example of a blender. There's enough need to blend/puree/chop food-like items that a large group of people agree on the usefulness of a blender. A salad shooter, while a novel idea, might not be seen as "useful".

Creating software that most folks wouldn't find useful still might be considered "neat" or "cool". But it may not be adding anything to the industry. The fact that someone shipped something quickly doesn't make it any better.

PUSH_AX 3 days ago | parent [-]

Ultimately, or at least in this discussion, we should decouple the software’s end use from the question of whether it satisfies the creator’s requirements and vision in a safe and robust way. How you get there and what happens after are two different problems.

darkwater 4 days ago | parent | prev [-]

> Why is useful a metric?

"and you realise the code just enables the business it all of a sudden becomes a velocity God send."

If a business is not useful, well, it will fail. So, so much autogenerated code for nothing.

PUSH_AX 4 days ago | parent | next [-]

I see, I guess every business I haven’t used personally, because it wasn’t useful to me, has failed…

Usefulness isn’t a good metric for this.

imiric 4 days ago | parent | prev [-]

It's not for nothing. When a profitable product can be created in a fraction of the time and effort previously required, the tool to create it will attract scammers and grifters like bees to honey. It doesn't matter if the "business" around it fails, if a new one can be created quickly and cheaply.

This is the same idea behind brands with random letters selling garbage physical products, only applied to software.

imiric 4 days ago | parent | prev | next [-]

The issue is not with how code looks. It's with what it does, and how it does it. You don't have to be an "artisan" to notice the issues moi2388 mentioned.

The actual difference is between people who care about the quality of the end result, and the experience of users of the software, and those who care about "shipping quickly" no matter the state of what they're producing.

This difference has always existed, but ML tools empower the latter group much more than the former. The inevitable outcome of this will be a stark decline of average software quality, and broad user dissatisfaction. While also making scammers and grifters much more productive, and their scams more lucrative.

Buttons840 4 days ago | parent | next [-]

Certainly billions of people's personal data will be leaked, and nobody will be held responsible.

airtonix 4 days ago | parent | prev [-]

[dead]

Buttons840 4 days ago | parent | prev | next [-]

I'm not a code "artisan", but I do believe companies should be financially responsible when they have security breaches.

cowl 4 days ago | parent | prev [-]

There are very good reasons that code should look a certain way, and they come from years of experience and the fact that code is written once but read and modified many more times.

When the first bugs come up, you see that the velocity was not a godsend, and you end up hiring one of the many "LLM code fixer" companies that are popping up like mushrooms.

PUSH_AX 4 days ago | parent [-]

You’re confusing yoloing code into prod with using AI to increase velocity while ensuring it functions and is safe.

habinero 3 days ago | parent [-]

No, they're not. It's critically important if you're part of an engineering team.

If everyone does their own thing, the codebase rapidly turns to mush and is unreadable.

And you need humans to be able to read it the moment the code actually matters and needs to stand up to adversaries. If you work with money or personal information, someone will want to steal that. Or you may have legal requirements you have to meet.

It matters.

PUSH_AX 3 days ago | parent [-]

You’ve made a sweeping statement there; there are swathes of teams working in startups still trying to find product-market fit. Focusing on quality in these situations is folly, but that’s not even the point. My point is you can ship quality to any standard using an LLM, even your standards. If you can’t, that’s a skill issue on your part.

codingdave 4 days ago | parent | prev | next [-]

And also ask: "How much money do you spend on LLMs?"

In the long run, that is going to be what drives their quality. At some point the conversation is going to evolve from whether or not AI-assisted coding works to what the price point is to get the quality you need, and whether or not that price matches its value.

lcnPylGDnU4H9OF 4 days ago | parent | prev | next [-]

> It's like we live in different worlds.

There is huge variance in prompt specificity as well as subtle differences inherent to the models. People often don't give examples when they talk about their experiences with AI, so it's hard to get a read on what a good prompt looks like for a given model, or even what a good workflow is for getting useful code out of it.

ruszki 4 days ago | parent | next [-]

Some gave examples. Some even recorded it and showed it, because they thought they were good with it. But they weren’t good at all.

They were slower than coding by hand, if you wanted to keep quality. Some were almost as quick as copy-pasting from the code just above the generated one, but their quality was worse. They even kept some bugs in the code during their reviews.

So the different worlds probably come down to what the acceptable level of quality is. I know a lot of coders who don’t give a shit whether what they’re doing makes sense, or what their bad solution will cause in the long run. They ignore everything else, just the “done” state next to their tasks in Jira. They will never solve complex bugs; they simply don’t care enough. At a lot of places, they are the majority. For them, an LLM can be an improvement.

Claude Code the other day made a test for me which mocked everything out from the live code. Everything was green, everything was good. On paper. A lot of people simply wouldn’t care to review it properly. That thing can generate a few thousand lines of semi-usable code per hour; it’s not built to review it properly. Serena MCP, for example, is specifically built not to review what it does. Its creators state as much.

typpilol 4 days ago | parent | next [-]

Honestly, I think LLMs really shine when you're first getting into a language.

I just recently got into JavaScript and TypeScript, and being able to ask the LLM how to do something and get some sources and linked examples is really nice.

However, using it in a language I'm much more familiar with really decreases the usefulness. Even more so when your codebase is mid-to-large-sized.

myaccountonhn 4 days ago | parent | next [-]

I have scaffolded projects using LLMs in languages I don't know and I agree that it can be a great way to learn as it gives you something to iterate on. But that is only if you review/rewrite the code and read documentation alongside it. Many times LLMs will generate code that is just plain bad and confusing even if it works.

I find that LLM coding requires more in-depth understanding, because rather than just coming up with a solution, you need to understand the LLM's solution and decide whether the complexity is necessary, since it will add structures, defensive code, and more that you wouldn't add if you coded it yourself. It's way harder to judge whether some code is necessary or is the correct way to do something.

dns_snek 4 days ago | parent | prev | next [-]

This is the one place where I find real value in LLMs. I still wouldn't trust them as teachers because many details are bound to be wrong and potentially dangerous, but they're great initial points of contact for self-directed learning in all kinds of fields.

platevoltage 4 days ago | parent | prev | next [-]

Yeah, this is where I find a lot of value. TypeScript is my main language, but I often use C++ and Python, where my knowledge is very surface-level. Being able to ask it "how do I do ____ in ____" and getting a half-decent explanation is awesome.

ponector 4 days ago | parent | prev [-]

The best usage is to ask an LLM to explain existing code, or to search a legacy codebase.

typpilol 4 days ago | parent [-]

I've found this to be not very useful in large projects, or projects that are heavily modularized or fragmented across many files.

Sometimes it can't trace down all the data paths, and by the time it does, its context window is running out.

That seems to be the biggest issue I see in my daily use, anyway.

dns_snek 4 days ago | parent | prev [-]

> Some gave examples. Some even recorded it and showed it, because they thought they were good with it. But they weren’t good at all.

Do you have any links saved by any chance?

giantg2 4 days ago | parent | prev [-]

I'm convinced that for coding we will have to use some sort of TDD or enhanced requirements framework to get the best code. Even in human-made systems, quality is highly dependent on the specificity of the requirements and the engineer's ability to probe the edge cases. Something like writing all the tests first (even in something like Cucumber) and having the LLM write code to get them to pass would likely produce better code, even though most devs hate the test-first paradigm.
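
A minimal sketch of that test-first flow (a hypothetical example; `durations.parse_duration` is a module the LLM would be asked to create, not an existing one): the tests encode the requirements and edge cases up front, and the LLM's only job is to make them pass.

  # Hypothetical test-first workflow: these are written before any
  # implementation exists, then the LLM is asked to write
  # durations.parse_duration until the suite passes.
  import pytest

  from durations import parse_duration  # module the LLM is asked to create

  def test_plain_seconds():
      assert parse_duration("90s") == 90

  def test_minutes_and_seconds():
      assert parse_duration("2m30s") == 150

  def test_rejects_garbage():
      with pytest.raises(ValueError):
          parse_duration("soon")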

tetha 4 days ago | parent | prev | next [-]

I deal with a few code bases at work and the quality differs a lot between projects and frameworks.

We have 1-2 small Python services based on Flask and Pydantic, very structured, with a well-written development and extension guide. The newer Copilot models perform very well with this, and improving the dev guidelines keeps making it better. Very nice.
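
For context, "very structured" here means roughly this shape (an illustrative sketch, not our actual code): explicit Pydantic models around thin Flask endpoints, so the model has one unambiguous pattern to extend.

  # Illustrative shape only, not our actual service: explicit Pydantic
  # models around a thin Flask endpoint give the model one clear pattern
  # to extend.
  from flask import Flask, jsonify, request
  from pydantic import BaseModel, ValidationError

  app = Flask(__name__)

  class CreateUser(BaseModel):
      email: str
      team: str

  @app.post("/users")
  def create_user():
      try:
          payload = CreateUser.model_validate(request.get_json())
      except ValidationError as err:
          return err.json(), 400, {"Content-Type": "application/json"}
      # ... persist `payload` here ...
      return jsonify(payload.model_dump()), 201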

We also have a central configuration of applications in the infrastructure and what systems they need. A lot of similarly shaped JSON files, now with a well-documented JSON schema (which is nice to have anyway). Again, very high quality. Someone recently joked we should throw these service requests at a model and let it create PRs to review.

But currently I'm working in Vector and its Vector Remap Language... it's enough of a mess that I'm faster working without any Copilot "assistance". I think the main issue is that there is very little VRL code out in the open, and the remaps depend on a lot of unseen context, which one would have to work on giving to the LLM. I've had similar experiences with OPA and a few more of these DSLs.

skydhash 4 days ago | parent | prev | next [-]

> Personally, I wrote 200K lines of my B2B SaaS

That would probably be 1,000 lines of Common Lisp.

throwmeaway222 4 days ago | parent [-]

[flagged]

skydhash 4 days ago | parent [-]

I think that is the 200 lines of the perl version.

abeyer 4 days ago | parent [-]

you put linefeeds in your perl?

albrewer 4 days ago | parent | prev | next [-]

My AI experience has varied wildly depending on the problem I'm working on. For web apps in Python, they're fantastic. For hacking on old engineering calculation code written in C/C++, it's an unmitigated disaster and an active hindrance.

f1shy 4 days ago | parent [-]

Just last week I asked Copilot to make a FastCGI client in C. Five times it gave me code that did not compile. After some massaging I got it to compile; it didn't work. After some changes, it worked. Now I say "I do not want to use libfcgi, I just want a simple implementation". After an hour of wrestling, I realize the whole thing blocks and I want no blocking calls… Still fighting half an hour later, I'm slowly getting there. I look at the code: a total mess.

I deleted it all and wrote a 350-line file from scratch, which works.

paool 4 days ago | parent [-]

Context engineering > vibe coding.

Front-load with instructions and examples, and be specific. How well you write the prompt greatly determines the output.

Also, use Claude Code, not Copilot.

kentm 4 days ago | parent | next [-]

At some point it becomes easier to just write the code. If the solution was 350 lines, then I'm guessing it was far easier for them to just write that rather than tweak instructions, find examples, etc. to cajole the AI into writing workable code (which would then need to be reviewed and tweaked if done properly).

f1shy 3 days ago | parent [-]

Exactly. If I have to write a 340-line prompt, I might as well just start writing code.

kortilla 4 days ago | parent | prev [-]

“Just tell it how to write the code and then it will write the code.”

No wonder the vast majority of AI adoption is failing to produce results.

haburka 4 days ago | parent | prev | next [-]

It’s not just you, I think some engineers benefit a lot from AI and some don’t. It’s probably a combination of factors including: AI skepticism, mental rigidity, how popular the tech stack is, and type of engineering. Some problems are going to be very straightforward.

I also think it’s that people don’t know how to use the tool very well. In my experience I don’t guide it to do any kind of software pattern or ideology. I think that just confuses the tool. I give it very little detail and have it do tasks that are evident from the code base.

Sometimes I ask it to do rather large tasks and occasionally the output is like 80% of the way there and I can fix it up until it’s useful.

mlyle 4 days ago | parent | next [-]

Yah. Latest thing I wrote was

* Code using sympy to generate math problems testing different skills for students, with difficulty values affecting what kinds of things are selected, and various transforms to problems possible (e.g. having to solve for z+4 of 4a+b instead of x) to test different subskills

(On this part, the LLM did pretty well. The code was correct after a couple of quick iterations, and the base classes and end-use interfaces are correct. There are a few things in the middle that are unnecessarily "superstitious" and check for conditions that can't happen, so I need to work with the LLM to clean it up; a toy sketch of this part is below, after the list.)

* Code to use IRT to estimate the probability that students have each skill and to request problems with appropriate combinations of skills and difficulties for each student.

(This was somewhat garbage. Good database & backend, but the interface to use it was not nice and it kind of contaminated things).

* Code to recognize QR codes in the corners of worksheet, find answer boxes, and feed the image to ChatGPT to determine whether the scribble in the box is the answer in the correct form.

(This was 100%, first time. I adjusted the prompt it chose to better clarify my intent in borderline cases).
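
To give a sense of the first item, here's a toy version of what the generated code does (my real code is larger and sits on top of a skill/difficulty model): pick coefficients by difficulty, build the equation with sympy, and optionally transform it so the student solves for an expression instead of plain x.

  # Toy version of the first item above (the real code is larger): pick
  # coefficients by difficulty, build the equation with sympy, and
  # optionally ask for an expression (e.g. x + 4) instead of plain x.
  import random
  import sympy as sp

  def linear_problem(difficulty: int, solve_for_expr: bool = False):
      x = sp.symbols("x")
      hi = 5 * difficulty
      a = random.randint(2, hi)
      b = random.randint(-hi, hi)
      c = random.randint(-hi, hi)
      equation = sp.Eq(a * x + b, c)
      answer = sp.solve(equation, x)[0]
      if solve_for_expr:
          return equation, x + 4, answer + 4   # "solve for x + 4"
      return equation, x, answer

  print(linear_problem(difficulty=2, solve_for_expr=True))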

The output was, overall, pretty similar to what I'd get from a junior engineer under my supervision-- a bit wacky in places that aren't quite worth fixing, a little bit of technical debt, a couple of things more clever that I didn't expect myself, etc. But I did all of this in three hours and $12 expended.

The total time supervising it was probably similar to the amount of time spent supervising the junior engineer... but the LLM turns things around quick enough that I don't need to context switch.

novembermike 4 days ago | parent [-]

I think it's fair to call coding LLMs similar to fairly bad but very fast juniors that don't get bored. That's a serious drawback, but it does give you something to work with. What scares me is non-technical people just vibe coding, because it's like a PM driving the same juniors with no one to provide sanity checks.

AlexCoventry 4 days ago | parent | prev | next [-]

> I also think it’s that people don’t know how to use the tool very well.

I think this is very important. You have to look at what it suggests critically, and take what makes sense. The original comment was absolutely correct that AI-generated code is way too verbose and disconnected from the realities of the application and large-scale software design, but there can be kernels of good ideas in its output.

oooyay 4 days ago | parent | prev | next [-]

I think a lot of it is tool familiarity. I can do a lot with Cursor but frankly I find out about "big" new stuff every day like agents.md. If I wasn't paying attention or also able to use Cursor at home then I'd probably learn more inefficiently. Learning how to use rule globs versus project instructions was a big learning moment. As I did more LLM work on our internal tools that was also a big lesson in prompting and compaction.

Certain parts of HN and Reddit I think are very invested in nay-saying because it threatens their livelihoods or sense of self. A lot of these folks have identities that are very tied up in being craftful coders rather than business problem solvers.

abullinan 4 days ago | parent | prev [-]

Junior engineers see better results than senior engineers for obvious reasons.

davidcbc 3 days ago | parent [-]

Junior engineers think they see better results than senior engineers for obvious reasons

aDyslecticCrow 4 days ago | parent | prev | next [-]

I think it's down to language and domain more than tools.

No model I've tried can write, usefully debug, or even explain CMake. (It invents new syntax if it gets stuck; I often have to prompt multiple AIs to know whether even the first response in the context was made up.)

My luck with embedded C has been atrocious for existing codebases (burning millions of tokens), but passable for small scripts (Arduino projects).

My experience with Python is much better: suggesting relevant libraries and functions, debugging odd errors, or even writing small scripts on its own. Even the original GitHub Copilot, which I got access to early, was excellent with Python.

A lot of the people who seem to have fully embraced agentic vibe-coding are in the web or Node.js domain, which I've not worked in myself since pre-AI.

I've tried most major models and schemes (free or trial) in the hope of finding any of them useful, but haven't found much use yet.

johnnyanmac 4 days ago | parent | prev | next [-]

> It's like we live in different worlds.

We probably do, yes. The web domain, a cybersecurity firm, and embedded will have very different experiences, because there's clearly a lot more code to train on in one domain than another (for obvious reasons). Colleagues at the same company, or even on the same team, can have drastically different experiences because they might be in the weeds on a different part of the tech.

> I then carefully review and test.

If most people did this, I would have 90% fewer issues with AI. But as we'd expect, people see shortcuts and use them to cut corners, not to buy more time to polish the edges.

bubblyworld 4 days ago | parent | prev | next [-]

I think people react to AI with strong emotions, which can come from many places, anxiety/uncertainty about the future being a common one, strong dislike of change being another (especially amongst autists, who, I would guess based on myself and my friend circle, are quite common around here). Maybe it explains a lot of the spicy hot takes you see here and on lobsters? People are unwilling to think clearly or argue in good faith when they are emotionally charged (see any political discussion). You basically need to ignore any extremist takes entirely, both positive and negative, to get a pulse on what's going on.

If you look, there are people out there approaching this stuff with more objectivity than most (mitsuhiko and simonw come to mind, have a look through their blogs, it's a goldmine of information about LLM-based systems).

oblio 4 days ago | parent | prev | next [-]

What tech stack do you use?

Betting in advance that it's JavaScript or Python, probably with very mainstream libraries or frameworks.

dpc_01234 4 days ago | parent | next [-]

FWIW, Claude Code does a great job for me on complex-domain Rust projects, but I just use it for one relatively small feature/code chunk at a time, where oftentimes it can pick up existing patterns etc. (I try to point it at similar existing code/features if I have them). I do not let it write anything creative where it has to come up with its own design (either high-level architecture or low-level facilities). Basically, I draw the lines manually and let it color the space in between, using existing reference pictures. Works very, very well for me.

va1a 4 days ago | parent | prev | next [-]

Is this meant to detract from their situation? These tech stacks are mainstream because so many use them... it's only natural that AI would be the best at writing code in contexts where it has the most available training data.

feoren 4 days ago | parent | next [-]

> These tech stacks are mainstream because so many use them

That's a tautology. No, those tech stacks are mainstream because it is easy to get something that looks OK up and running quickly. That's it. That's what makes a framework go mainstream: can you download it and get something pretty on the screen quickly? Long-term maintenance and clarity is absolutely not a strong selection force for what goes mainstream, and in fact can be an opposing force, since achieving long-term clarity comes with tradeoffs that hinder the feeling of "going fast and breaking things" within the first hour of hearing about the framework. A framework being popular means it has optimized for inexperienced developers feeling fast early, which is literally a slightly negative signal for its quality.

aDyslecticCrow 4 days ago | parent | prev [-]

No, it's a clarification. There is a massive difference between domains, and the parent post did not specify.

If the AI can only do JS and Python decently, that would fully explain the observed disparity in opinions about its usefulness.

JustExAWS 4 days ago | parent | prev [-]

You are exactly right in my case - JavaScript and Python dealing with the AWS CDK and SDK. Where there is plenty of documentation and code samples.

Even when it occasionally gets it wrong, it’s just a matter of telling ChatGPT - “verify your code using the official documentation”.

But honestly, even before LLMs, when deciding which technology, service, or framework to use, I would always go with the most popular ones, because they are the easiest to hire for, the easiest to find documentation and answers for, and, when I myself was looking for a job, the easiest to be a perfect match for the most jobs.

oblio 4 days ago | parent [-]

Yeah, but most devs are working on brownfield projects where they did not choose any part of the tech stack.

JustExAWS 4 days ago | parent [-]

They can choose jobs. Starting with my 3rd job in 2008, I always chose my employer based on how it would help me get my n+1 job, and that was based on the tech stack I would be using.

Once I saw a misalignment between market demands and the tech stack my employer was using, I changed jobs. I'm on job #10 now.

yreg 4 days ago | parent | next [-]

If one wants to optimise one's career, isn't it better to become an expert in the _less_ mainstream technologies that not everyone can use?

JustExAWS 4 days ago | parent [-]

Honestly, now that I think about it, I am using a pre-2020 playbook. I don’t know what the hell I would do these days if I were still a pure developer without the industry connections and having AWS ProServe experience on my resume.

While it is true that I got a job quickly in 2023 and again last year when I was looking, as a Plan B while interviewing for those two I was also randomly submitting my resume (which I think is quite good) to literally hundreds of jobs through Indeed and LinkedIn Easy Apply, and I heard crickets - regular old enterprise dev jobs that wanted C#, Node, or Python experience on top of AWS.

I don't really have any generic strategy for people these days aside from: whatever job you are at, don't be a ticket taker, and take ownership of larger initiatives.

oblio 4 days ago | parent | prev [-]

When did you get your last 3 jobs?

JustExAWS 4 days ago | parent [-]

Mid-2020 - AWS ProServe, the internal consulting arm of AWS - full-time job.

Late 2023 - full time at a third-party AWS consulting company. It took around two weeks after I started looking to get an offer.

Late 2024 - "Staff consultant" at a third-party consulting company. An internal recruiter reached out to me.

Before 2020 I was just a run-of-the-mill C#/JS enterprise developer. I didn't open the AWS console for the first time until mid-2018.

rozgo 4 days ago | parent | prev | next [-]

It could be the language. Almost 100% of my code is written by AI; I supervise as it creates and steer it in the right direction. I configure the coding agents with examples of all the frameworks I'm using. My choice of Rust might be disproportionately providing better results: cargo, the expected code structure, the examples, docs, and error messages are so well thought out in Rust that the coding agents can really get very far. I work on 2-3 projects at once, cycling through them and supervising their work. Most of my work is simulation, physics and complex robotics frameworks. It works for me.

LauraMedia 4 days ago | parent | prev | next [-]

As a practical example, I've recently tried out v0's new updated systems to scaffold a very simple UI where I can upload screenshots from videogames I took and tag them.

The resulting code included an API call to run arbitrary SQL queries against the DB. Even after I pointed this out, this API call was not removed or at least secured with authentication rules, but instead /just/hidden/through/obscure/paths...

mirkodrummer 4 days ago | parent | prev | next [-]

B2B SaaS is in most cases a sophisticated mask over structured data, perhaps with great UX, automation and convenience, so I can see LLMs being more successful there, all the more so because there is more training data and many processes are streamlined. Not all domains are equal: go try to develop a serious game with LLMs - not yet another simple, broken arcade clone - and you'll have a different take.

physicsguy 4 days ago | parent | prev | next [-]

Do you not think part of it is just whether employers permit it or not? My conglomerate employer took a long time to get started and has only just rolled out agent mode in GH Copilot, but even that is in some reduced/restricted mode vs the public one. At the same time we have access to lots of models via an internal portal.

randomNumber7 4 days ago | parent [-]

Companies that don't allow their devs to use LLMs will go bankrupt and in the meantime their employees will try to use their private LLM accounts.

Fergusonb 3 days ago | parent | prev | next [-]

I agree, it's like they looked at GPT 3.5 one time and said "this isn't for me"

The big 3 - Opus 4.1, GPT-5 High, Gemini 2.5 Pro - are astonishing in their capabilities; it's just a matter of providing the right context and instructions.

Basically, "you're holding it wrong"

abm53 3 days ago | parent | prev | next [-]

I am also constantly astonished.

That said, observing attempts by skeptics to “unsuccessfully” prompt an LLM has been illuminating.

My reaction is usually either:

- I would never have asked that kind of question in the first place.

- The output you claim is useless looks very useful to me.

deterministic 3 days ago | parent | prev | next [-]

Lines of code is not a useful metric for anything. Especially not productivity.

The less code I write to solve a problem the happier I am.

moi2388 4 days ago | parent | prev | next [-]

GitHub copilot, Microsoft copilot, Gemini, loveable, gpt, cursor with Claude models, you name it.

cobbzilla 4 days ago | parent | prev | next [-]

It really depends, and can be variable, and this can be frustrating.

Yes, I’ve produced thousands of lines of good code with an LLM.

And also yes, yesterday I wasted over an hour trying to define a single docker service block for my docker-compose setup. Constant hallucination; eventually I had to cross-check everything and discovered it had no idea what it was doing.

I’ve been doing this long enough to be a decent prompt engineer. Continuous vigilance is required, which can sometimes be tiring.

kortilla 4 days ago | parent | prev | next [-]

It could be because your job is boilerplate derivatives of well solved problems. Enjoy the next 1 to 2 years because yours is the job Claude is coming to replace.

Stuff Wordpress templates should have solved 5 years ago.

typpilol 4 days ago | parent | prev | next [-]

Honestly, the best way to get good code, at least with TypeScript and JavaScript, is to have like 50 ESLint plugins.

That way ESLint constantly yells at Sonnet 4 until the code is at least in a better state.

If anyone is curious, I have a massive ESLint config for TypeScript that really gets good code out of Sonnet.

But before I started doing this, the code it wrote was so buggy, and it was constantly trying to duplicate functions into separate files, etc.
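To give a flavour, something along these lines (a sketch, not my actual config - flat-config style with the typescript-eslint helper; rule names are from ESLint core and typescript-eslint, thresholds to taste):

    // eslint.config.js - sketch only; assumes the typescript-eslint flat-config helper
    import tseslint from "typescript-eslint";

    export default tseslint.config(...tseslint.configs.recommended, {
      rules: {
        // Keep generated code small and flat
        complexity: ["error", 10],
        "max-lines-per-function": ["error", 60],
        "max-depth": ["error", 3],
        "max-params": ["error", 4],
        // Catch common LLM habits
        "no-duplicate-imports": "error",
        eqeqeq: "error",
        "@typescript-eslint/no-explicit-any": "error",
        "@typescript-eslint/no-unused-vars": "error",
      },
    });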

feoren 4 days ago | parent | prev | next [-]

[flagged]

dang 4 days ago | parent | next [-]

It is quite a putdown to tell someone else that if you wrote their program it would be 10 times shorter.

That's not in keeping with either the spirit of this site or its rules: https://news.ycombinator.com/newsguidelines.html.

feoren 4 days ago | parent [-]

Fair: it was rude. Moderation is hard and I respect what you do. But it's also a sentiment several other comments expressed. It's the conversation we're having. Can we have any discussion of code quality without making assumptions about each other's code quality? I mean, yeah, I could probably have done better.

> "That would probably be 1000 line of Common Lisp." https://news.ycombinator.com/item?id=44974495

> "Perhaps the issue is you were used to writing 200k lines of code. Most engineers would be agast at that." https://news.ycombinator.com/item?id=44976074

> "200k lines of code is a failure state ... I'd not normally huff my own farts in public this obnoxiously, but I honestly feel it is useful for the "AI hater vs AI sucker" discussion to be honest about this type of emotion." https://news.ycombinator.com/item?id=44976328

dang 4 days ago | parent [-]

Oh for sure you can talk about this, it's just a question of how you do it. I'd say the key thing is to actively guard against coming across as personal. To do that is not so easy, because most of us underestimate the provocation in our own comments and overestimate the provocation in others (https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...). This bias is like carbon monoxide - you can't really tell it's affecting you (I don't mean you personally, of course—I mean all of us), so it needs to be consciously compensated for.

As for those other comments - I take your point! I by no means meant to pick on you specifically; I just didn't see those. It's pretty random what we do and don't see.

brushfoot 4 days ago | parent | prev | next [-]

[flagged]

dang 4 days ago | parent | next [-]

I understand the provocation, but please don't respond to a bad comment by breaking the site guidelines yourself. That only makes things worse.

Your GP comment was great, and probably the thing to do with a supercilious reply is just not bother responding (easier said than done of course). You can usually trust other users to assess the thread fairly (e.g. https://news.ycombinator.com/item?id=44975623).

https://news.ycombinator.com/newsguidelines.html

feoren 4 days ago | parent | prev | next [-]

> What makes you think I'm not "a developer who strongly values brevity and clarity"

Some pieces of evidence that make me think that:

1. The base rate of developers who write massively overly verbose code is about 99%, and there's not a ton of signal to deviate from that base rate other than the fact that you post on HN (probably a mild positive signal).

2. An LLM writes 80% of your code now, and my prior on LLM code output is that it's on par with a forgetful junior dev who writes very verbose code.

3. 200K lines of code is a lot. It just is. Again, without more signal, it's hard to deviate from the base rate of what 200K-line codebases look like in the wild. 99.5% of them are spaghettified messes with tons of copy-pasting and redundancy and code-by-numbers scaffolded code (and now, LLM output).

This is the state of software today. Keep in mind the bad programmers who make verbose spaghettified messes are completely convinced they're code-ninja geniuses; perhaps even more so than those who write clean and elegant code. You're allowed to write me off as an internet rando who doesn't know you, of course. To me, you're not you, you're every programmer who writes a 200k LOC B2B SaaS application and uses an LLM for 80% of their code, and the vast, vast majority of those people are -- well, not people who share my values. Not people who can code cleanly, concisely, and elegantly. You're a unicorn; cool beans.

Before you used LLMs, how often were you copy/pasting blocks of code (more than 1 line)? How often were you using "scaffolds" to create baseline codefiles that you then modified? How often were you copy/pasting code from Stack Overflow and other sources?

3form 4 days ago | parent | prev [-]

At least to me, what you said sounded like the 200k was written with LLMs, just before agents. But it's a very reasonable amount of code for 9 years of work.

leetharris 4 days ago | parent | prev | next [-]

This is such a bizarre comment. You have no idea what code base they are talking about, their skill level, or anything.

sunrunner 4 days ago | parent | prev | next [-]

> I'm struggling to even describe... 200,000 lines of code is so much.

The point about increasing levels of abstraction is a really good one, and it's worth considering whether any new code that's added is entirely new functionality, some kind of abstraction over existing functionality (which might then reduce the need for as much new code), or (for good or bad reasons) some kind of copy of existing behaviour re-purposed for a different use case.

eichin 4 days ago | parent | prev [-]

200kloc is what, 4 reams of paper, double sided? So, 10% of that famous Margaret Hamilton picture (which is roughly "two spaceships worth of flight code".) I'm not sure the intuition that gives you is good but at least it slots the raw amount in as "big but not crazy big" (the "9 years work" rather than "weekend project" measurement elsethread also helps with that.)

mxhwll 4 days ago | parent | prev [-]

[flagged]

eloisius 4 days ago | parent | prev | next [-]

I agree. AI is a wonderful tool for making fuzzy queries over vast amounts of information. More and more I'm finding that Kagi's Assistant is my first stop before an actual search. It often points me to vocabulary I'm lacking, which I can then use to comb more pages until I find what I need.

But I have not yet been able to consistently get value out of vibe coding. It's great for one-off tasks. I use it to create matplotlib charts just by telling it what I want and showing it the schema of the data I have. It nails that about 90% of the time. I have it spit out close-ended shell scripts, like recently I had it write me a small CLI tool to organize my Raw photos into a directory structure I want by reading the EXIF data and sorting the images accordingly. It's great for this stuff.
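For a sense of what I mean by close-ended, the EXIF sorter was roughly this shape - rewritten here as a TypeScript sketch with illustrative names; the exifr dependency is my assumption, not necessarily what the model picked:

    // organize-photos.ts - illustrative sketch only (assumes the `exifr` npm
    // package for EXIF parsing; falls back to file mtime when no EXIF date).
    import { promises as fs } from "node:fs";
    import path from "node:path";
    import exifr from "exifr";

    async function organize(srcDir: string, destDir: string): Promise<void> {
      for (const name of await fs.readdir(srcDir)) {
        const src = path.join(srcDir, name);
        // Pull the capture date from EXIF, or use the file's mtime as a fallback.
        const tags = await exifr.parse(src).catch(() => null);
        const taken: Date = tags?.DateTimeOriginal ?? (await fs.stat(src)).mtime;
        // Sort into destDir/YYYY/MM/
        const folder = path.join(
          destDir,
          String(taken.getFullYear()),
          String(taken.getMonth() + 1).padStart(2, "0"),
        );
        await fs.mkdir(folder, { recursive: true });
        await fs.rename(src, path.join(folder, name));
      }
    }

    organize(process.argv[2] ?? ".", process.argv[3] ?? "./sorted").catch(console.error);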

But for anything bigger it seems to produce useless crap. It creates data models that already exist in the project. Makes unrelated changes. Hallucinates API functions that don't exist. It's just not worth it to me to have to check its work. By the time I've done that, I could have written it myself, and writing the code is usually the most pleasurable part of the job to me.

I think the way I'm finding LLMs to be useful is that they are a brilliant interface to query with, but I have not yet seen any use cases I like where the output is saved, directly incorporated into work, or presented to another human that did not do the prompting.

nwienert 4 days ago | parent [-]

Have you tried Opus? It's what got me past using LLMs only marginally. Standard disclaimers apply in that you need to know what it's good for and guide it well, but there's no doubt at this point it's a huge productivity boost, even if you have high standards - you just have to tell it what those standards are sometimes.

collingreen 4 days ago | parent [-]

Opus was also the threshold for me where I started getting real value out of (correctly applied) LLMs for coding.

JeremyNT 4 days ago | parent | prev | next [-]

What tooling are you using?

I use aider and your description doesn't match my experience, even with a relatively bad-at-coding model (gpt-5). It does actually work and it does generate "good" code - it even matches the style of the existing code.

Prompting is very important, and in an existing code base the success rate is immensely higher if you can hint at a specific implementation - i.e. something a senior who is familiar with the codebase somewhat can do, but a junior may struggle with.

It's important to be clear eyed about where we are here. I think overall I am still faster doing things manually than iterating with aider on an existing code base, but the margin is not very much, and it's only going to get better.

Even though it can do some work a junior could do, it can't ever replace a junior human... because a junior human also goes to meetings, drives discussions, and eventually becomes a senior! But management may not care about that fact.

kstenerud 4 days ago | parent | prev | next [-]

I just had Claude Sonnet 4 build this for me: https://github.com/kstenerud/orb-serde

Using the following prompt:

    Write a rust serde implementation for the ORB binary data format.

    Here is the background information you need:

    * The ORB reference material is here: https://github.com/kstenerud/orb/blob/main/orb.md
    * The formal grammar describing ORB is here: https://github.com/kstenerud/orb/blob/main/orb.dogma
    * The formal grammar used to describe ORB is called Dogma.
    * Dogma reference material is here: https://github.com/kstenerud/dogma/blob/master/v1/dogma_v1.0.md
    * The end of the Dogma description document has a section called "Dogma described as Dogma", which contains the formal grammar describing Dogma.

    Other important things to remember:

    * ORB is an extension of BONJSON, so it must also implement all of BONJSON.
    * The BONJSON reference material is here: https://github.com/kstenerud/bonjson/blob/main/bonjson.md
    * The formal grammar describing BONJSON is here: https://github.com/kstenerud/bonjson/blob/main/bonjson.dogma
Is it perfect? Nope, but it's 90% of the way there. It would have taken me all day to build all of these ceremonious bits, and Claude did it in 10 minutes. Now I can concentrate on the important parts.

WA 4 days ago | parent [-]

First and foremost, it’s a 404. Probably a mistake, but I chuckled a bit when someone says "AI built this thing and it’s 90% there" and then posts a dead link.

kstenerud 4 days ago | parent [-]

Weird... For some reason Github decided that this time my repo should default to private.

foxyv 4 days ago | parent | prev | next [-]

The one thing I've found AI is good at is parsing through the hundreds of ad-ridden, barely usable websites for answers to my questions. I use the Duck Duck Go AI a lot to answer questions. I trust it about as far as I can throw the datacenter it resides in, but it's useful for quickly verifiable things, especially stuff like syntax and command-line options for various programs.

oblio 4 days ago | parent [-]

> The one thing I've found AI is good at is parsing through the hundreds of ad ridden, barely usable websites for answers to my questions.

One thing I can guarantee you is that this won't last. No sane MBA will ignore that revenue stream.

Image hosting services, all over again.

foxyv 4 days ago | parent | next [-]

You are entirely correct. The enshittification will continue. All we can do is enjoy these things while they are still usable.

benatkin 4 days ago | parent [-]

Nope, this only applies to a small percentage of content: where a relatively small number of people need access to it and the incentive to create derivative work based on it is low, or where there's a huge amount of content that's frequently changing (think airfares). But yes, they will protect it more.

For content that doesn't change frequently and is used by a lot of people it will be hard to control access to it or derivative works based on it.

baggachipz 4 days ago | parent [-]

I don't think you're considering the enshittification route here. I'm sure it will be: Ask ChatGPT a question -> "While I'm thinking, here's something from our sponsor which is tailored to your question" -> lame answer which requires you to ask another question. And on and on. While you're asking these questions, a profile of you is built and sold on the market.

oblio 4 days ago | parent [-]

It's even worse. "Native advertising":

Which car should I buy?

"The Toyota DZx4 is the best EV on the market according to multiple analysts. It had the following benefits: ...

If the DZx4 is out of your budget, the Nissan Avenger is a great budget option: ..."

Each spot the result of an automated live auction.

Now imagine that for everything and also some suggestions along the way ("If you need financing, Cash Direct offers same day loans...").

Advertising with LLMs will be incredibly insidious and lucrative. And most likely, unblockable.

baggachipz 4 days ago | parent [-]

whynotboth.jpg

o11c 4 days ago | parent | prev [-]

The difference, of course, is that most AI companies don't have the malicious motive that Google has by also being an ad company.

oblio 4 days ago | parent | next [-]

Almost every big tech company is an ad company. Google sells ads, Meta sells ads, Microsoft sells ads, Amazon sells ads, Apple sells ads; only Nvidia doesn't, because they sell hardware components.

It's practically inevitable for a tech company offering content, and everyone who thinks otherwise should set a reminder for 5 years from now.

ceejayoz 4 days ago | parent | prev | next [-]

Google wasn’t really an ad company on day one, either.

https://en.wikipedia.org/wiki/Google?wprov=sfti1#Early_years

> The next year, Google began selling advertisements associated with search keywords against Page and Brin's initial opposition toward an advertising-funded search engine.

Ads are coming. https://www.theverge.com/news/759140/openai-chatgpt-ads-nick...

moi2388 4 days ago | parent | prev | next [-]

OpenAI is already looking into inserting ads, sorry..

foxyv 3 days ago | parent | prev [-]

How fast we forget history. "Don't be evil" my ass.

lbrito 4 days ago | parent | prev | next [-]

It's one of those you get what you put in kind of deals.

If you spend a lot of time thinking about what you want - describing the inner workings, edge cases, architecture and library choices - and put that into a thoughtful markdown file, then maybe after a couple of iterations you will get half-decent code. It certainly makes a difference compared to a short "implement X" prompt.

But it makes one think - at that point (writing a good prompt that is basically a spec), you've basically solved the problem already. So LLM in this case is little more than a glorified electric typewriter. It types faster than you, but you did most of the thinking.

jeremyjh 4 days ago | parent [-]

Right, and then after you do all the thinking and the specs, you have to read and understand and own every single line it generated. And speaking for myself, I am nowhere near as good at thinking through code I am reviewing as code I am writing.

Other people will put up PRs full of code they don't understand. I'm not saying everyone who is reporting success with LLMs is doing that, but I hear it a lot. I call those people clowns, and I'd fire anyone who did that.

flatline 4 days ago | parent [-]

If it passes the unit tests I make it write, and works for my sample manual cases, I absolutely will not spend time reading the implementation details unless and until something comes up. Sometimes garbage makes its way into git, but working code is better than no code, and the mess can be cleaned up later. If you have correctness at the interface and function level you can get a lot done quickly. Technical debt is going to come out somewhere no matter what you do.

malfist 4 days ago | parent [-]

If AI is writing the code and the unit tests, how do you really know it's working? Who watches the watchman?

jeremyjh 4 days ago | parent | next [-]

The trick is to not give a fuck. This works great in a lot of apps, which are useless to begin with. It may also be a reasonable strategy in an early-stage startup yet to achieve product-market fit, but your plan has to be to scrap it and rewrite it and we all know how that usually turns out.

lbrito 4 days ago | parent [-]

This is an excellent point. Sure, in an ideal world we should care very much about every line of code committed, but in the real world pushing garbage might be a valid compromise given things like crunch, sales pitches due tomorrow, etc.

jeremyjh 4 days ago | parent [-]

No, that's a much stronger statement. I'm not talking about ideals. I'm talking about running a business that is mature, growing and going to be around in five years. You could literally kill such a business running it on a pile of AI slop that becomes unmaintainable.

flatline 4 days ago | parent | prev | next [-]

How much of the code do you review in a third party package installed through npm, pip, etc.? How many eyes other than the author’s have ever even looked at that code? I bet the answers have been “none” and “zero” for many HN readers at some point. I’m certainly not saying this is a great practice or the only way to productively use LLMs, just pointing out that we treat many things as a black box that “just works” till it doesn’t, and life somehow continues. LLM output doesn’t need to be an exception.

lbrito 3 days ago | parent [-]

That's true; however, it's not so great of an issue because there's a kind of natural selection happening: if the package is popular, other people will eventually read (parts of, at least) the code and catch the most egregious problems. Most packages will have "none", like you said, but they aren't being used by that many people either, so that's OK.

Of course this also applies to hypothetical LLM-generated packages that become popular, but some new issues arise: the verbosity and sometimes baffling architecture choices of LLMs will certainly make third-party reviews harder and push up the popularity threshold needed to attract third-party attention.

JustExAWS 4 days ago | parent | prev [-]

When you write your own code, don't you manually test it for correctness and corner cases in addition to writing unit tests?

AIorNot 4 days ago | parent | prev | next [-]

I’ve built 2 SaaS applications with LLM coding, one of which was expanded and released to enterprise customers and is in good use today. Note that I’ve got years of dev experience, I follow context and documentation prompts, and I’m using common LLM languages and tooling like TypeScript, Python, React and AWS infra.

Now, it requires me to fully review all the code and understand what the LLM is doing at the function, class and API level - in fact it works better at the method or component level for me - and I had a lot of cleanup work (and lots of frustration with the models) on the codebase, but overall there’s no way I could equal the velocity I have now without it.

bmcahren 4 days ago | parent | next [-]

I think the other important step, for a large enterprise SaaS with millions of lines of code, is to reject code your engineers submit that they can't explain. I myself reject, I'd say, 30% of the code the LLMs generate, but the power is in being able to stay focused on larger problems while rapidly implementing smaller accessory functions that enable that continued work without stopping to add another engineer to the task.

I've definitely 2-4X'd depending on the task. For small tasks I've definitely 20X'd myself for some features or bugfixes.

conradfr 4 days ago | parent | prev [-]

After all, the exciting part of coding has always been code reviews.

infecto 4 days ago | parent | prev | next [-]

I agree with the article but also believe LLM coding can boost my productivity and ability to write code over long stretches. Sure, getting it to write a whole feature is a high-risk proposition. But getting it to build out a simple API with examples above and below it is a piece of cake: it takes a few seconds and would have taken me a few minutes.

dolebirchwood 4 days ago | parent | prev | next [-]

I do frontend work (React/TypeScript). I barely write my own code anymore, aside from CSS (the LLMs have no aesthetic sensibilities). Just prompting with Gemini 2.5 Pro. Sometimes Sonnet 4.

I don't know what to tell you. I just talk to the thing in plain but very specific English and it generally does what I want. Sometimes it will do stupid things, but then I either steer it back in the direction I want or just do it myself if I have to.

herpdyderp 4 days ago | parent | prev | next [-]

The bigger the task, the messier it'll get. GPT-5 can write a single UI component for me no problem. A new endpoint? If it's simple, no problem. The risk increases as the complexity of the task does.

JustExAWS 4 days ago | parent [-]

I break complex tasks down into simple tasks when using ChatGPT, just like I did before ChatGPT with modular design.

panny 4 days ago | parent | prev | next [-]

I think it has a lot to do with skill level. Lower-skilled developers seem to feel it gives them a lot of benefit. Higher-skilled developers just get frustrated looking at all the errors it produces.

chrischen 4 days ago | parent | prev | next [-]

The AI agents tend to fail for me with open-ended or complex tasks requiring multiple steps. But I’ve found them massively helpful if you have these two things: 1) a typed language - better if strongly typed - and 2) a program that is logically structured, follows best practices and has hierarchical composition.

The agents are able to iterate and work with the compiler until they get it right, and the combination of 1 and 2 means there are fewer possible “right answers” to whatever problem I have. If I structure my prompts to basically fill in the blanks of my code in specific areas, it saves a lot of time. Most of what I prompt is something already done, and usually one Google search away. This saves me the time to search it up, figure out whatever syntax I need, etc.
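As a concrete (made-up) illustration of the fill-in-the-blanks style in TypeScript - the types and the signature are the prompt, and the agent iterates against the compiler until the body fits:

    // Illustrative stub only - names invented for the example. The human writes
    // the types and the contract; the agent fills in the body and iterates
    // against the compiler until it typechecks.
    interface Invoice {
      id: string;
      lines: { sku: string; quantity: number; unitPriceCents: number }[];
    }

    // TODO(agent): sum the line totals. Must be pure - no I/O, no mutation.
    function totalCents(invoice: Invoice): number {
      throw new Error("not implemented"); // the blank the agent fills in
    }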

richardlblair 3 days ago | parent | prev | next [-]

AI is really good at writing tests.

AI is also pretty good if you get it to do small chunks of code for you. This means you come with the architecture, the implementation details, and how each piece is structured. When I walk AI through each unit of code I find the results are better, and it's easier for me to address issues as I progress.

This may seem somewhat redundant, though. Sometimes it's faster to just do it yourself. But with a toddler who hates sleep, I've found I've been able to maintain my velocity... even on days I get 3 hrs of sleep.

phatfish 4 days ago | parent | prev | next [-]

I don't code every day and am not an expert. Supposedly the sort of casual coder that LLMs are supposed to elevate into senior engineers.

Even I can see they have big blind spots. As the parent said, I get overly verbose code that does run, but is nowhere near the best solution. Well, for really common problems and patterns I usually get a good answer. Need a more niche problem solved? You better brush up your Googling skills and do some research if you care about code quality.

gspencley 3 days ago | parent | prev | next [-]

My favourite code smell that LLMs love to introduce is redundant code comments.

    // assign "bar" to foo
    const foo = "bar";

They love to do that shit. I know you can prompt it not to, but the number of PRs I'm reviewing these days that have those types of comments is insane.

randomjoe2 3 days ago | parent | prev | next [-]

If you actually believe this, you're either using bad models or just terrible at prompting and giving proper context. Let me know if you need help, I use generated code in every corner of my computer every day

dionian 3 days ago | parent | prev | next [-]

Could the quality of your prompt be related to our differing outcomes? I have decades of pre-AI experience and I use AI heavily. If I let it go off on its own, it's not as good as when I constrain and hand-hold it.

bdcravens 4 days ago | parent | prev | next [-]

I haven't had that experience, but I tend to keep my prompts very focused with a tightly limited scope. Put a different way, if I had a junior or mid level developer, and I wanted them to create a single-purpose class of 100-200 lines at most, that's how I write my prompts.

IT4MD 4 days ago | parent | prev | next [-]

Likewise with PowerShell. It's good for giving you an approach or some ideas, but copy/paste fails about 80% of the time.

Granted, I may be an inexpert prompter, but at the same time, I'm asking for basic things, as a test, and it just fails miserably most of the time.

Kiro 3 days ago | parent | prev | next [-]

The code LLMs write is much better than mine. Far fewer shortcuts and much less spaghetti. Maybe that means I am a lousy coder, but the end result is still better.

burnte 3 days ago | parent | prev | next [-]

I see LLM coding as hinting on steroids. I don't trust it to actually write all of my code, but sometimes it can get me started, like a template.

paulddraper 4 days ago | parent | prev | next [-]

> ya’ll must be prompt wizards

Thank you, but I don’t feel that way.

I’d ask you a lot of details…what tool, what model, what kind of code. But it’d probably take a lot to get to the bottom of the issue.

tempodox 4 days ago | parent | prev | next [-]

It's just that being the dumbest thing we ever heard still doesn't stop some people from doing it anyway. And that goes for many kinds of LLM application.

jimbo808 4 days ago | parent | prev | next [-]

I've been pondering this for a while. I think there's an element of dopamine that LLMs bring to the table. They probably don't make a competent senior engineer much more productive if at all, but there's that element of chance that we don't get a lot of in this line of work.

I think a lot of us eventually arrive at a point where our jobs get a bit boring and all the work starts to look like some permutation of past work. If instead of going to work and spending two hours adding some database fields and writing some tests, you had the opportunity to either:

A) Do the thing as usual in the predictable two hours

B) Spend an hour writing a detailed prompt as if you were instructing a junior engineer on a PIP, doing all the typical cognitive work you'd have done normally and then some - but then, instead of typing out the code in the next hour, you press enter and, tada, the code has been typed and even kinda sorta works, after the computer program was "flibbertigibbeting" for just 10 minutes. Wow!

Then you get that sweet dopamine hit that tells you you're a really smart prompt engineer who did a two-hour task in... cough... 10 minutes. You enjoy your high for a bit, maybe go chat with some subordinate about how great your CLAUDE.md was, and how, if they're not sure about this AI thing, it's just because they're bad at prompt engineering.

Then all you have to do is cross your t's and dot your i's and it's smooth sailing from there. Except it's not, because you (or another engineer) will probably find architectural and style issues when reviewing the code - conventions you explicitly told it to follow, but it ignored - and you'll have to fix those. You'll also probably be sobering up from your dopamine rush by now, and realize that you have to review all the other lines of AI-generated code, which you could have just typed correctly once.

But now you have to review with an added degree of scrutiny, because you know it's really good at writing text that looks beautiful but is ever so slightly wrong, in ways that might even slip through code review and cause the company to end up in the news.

Alternatively, you could yolo and put up an MR after a quick smell test, making some other poor engineer do your job for you (you're a 10x now; you've got better things to do anyway). Or better yet, just have Claude write the MR, and don't even bother to read it. Surely nobody's going to notice that your "acceptance criteria" section says to make sure the changes have been tested on both Android and Apple, even though you're building a microservice for an AI-powered smart fridge (mostly just a fridge, except every now and then it starts shooting ice cubes across the room at Mach 3). Then three months later someone, who never realized there are three different identical "authenticate" functions, spends an hour scratching their head about why the code they're writing is not doing anything (because it's actually running another redundant function that nobody ever seems to catch in MR review, because it's not reflected in a diff).

But yeah, that 10-minute AI magic trick sure felt good. There are times when work is dull enough that option B sounds pretty good, and I'll dabble. But yeah, I'm not sure where this AI stuff leads, but I'm pretty confident it won't be taking over our jobs any time soon (an ever-increasing quota of H-1Bs and STEM OPT student visas working for 30% less pay, on the other hand, might).

m3kw9 4 days ago | parent | prev | next [-]

Not only do you need to be a prompt wizard, you need to know which prompts are bad or good, and also how to use bad/lazy prompts to your advantage.

platevoltage 4 days ago | parent | prev | next [-]

This is exactly how I use it.

larodi 4 days ago | parent | prev | next [-]

I must be a prompt wizard then.

threecheese 4 days ago | parent | prev | next [-]

I hate to admit it, but it is the prompt (call it context if you like; it includes tools). The model is important, the window/tokens are important, but direction wins. The codebase is also important: greenfield gets much better results - so much so that we may throw away 40 years of wisdom designed to help humans code amongst each other and use design patterns that will disgust us.

shaunxcode 4 days ago | parent [-]

“we”

uh_uh 4 days ago | parent | prev | next [-]

Which model?

eatsyourtacos 4 days ago | parent | prev [-]

Sounds like you are using it entirely wrong then...

Just yesterday I uploaded a few files of my code (each about 3000+ lines) into a GPT-5 project and asked for assistance in changing a lot of database calls into a caching system, and it proceeded to create a full 500-line file with all the caching objects and functions I needed. Then we went section by section through the main 3000+ line file to change parts of the database queries into the cached version. [I didn't even really need to do this; it basically detected everything I would need to change at once and gave me most of it, but I wanted to do it in smaller chunks so I was sure what was going on.]

Could I have done this without AI? Sure... but this was basically like having a second pair of eyes validating what I'm doing, and it saved me a bunch of time since I wasn't writing everything from scratch. I have the base template of what I need, and then I can improve it from there.

All the code it wrote was perfectly clean... and this is not a one-off; I've been using it daily for the last year for everything. It almost completely replaces my need to have a junior developer helping me.

jayd16 4 days ago | parent | next [-]

You mean it turned on Hibernate, or it wrote some custom-rolled in-app cache layer?

I usually find these kinds of caching solutions to be extremely complicated (well, the cache invalidation part), and I'm a bit curious what approach it took.

You mention it only updated a single file, so I guess it's not changing the session handling, so either sticky sessions are not assumed or something else is going on. So then how do you invalidate the app-level cache for a user across all machine instances? I have a lot of trauma from the old web days of people figuring this out, so I'm really curious to hear how this AI one-shot it in a single file.

eatsyourtacos 4 days ago | parent [-]

This is C#, so it basically just automatically detected that I had 4 object types I was working with that were being updated to the database and that I wanted to keep in a concurrent-dictionary style of cache. So it created the dictionaries for each object type with the appropriate keys, created functions per object type so that if I touch an object it gets marked for update, etc.

It created the function to load in the data, then the finalize step where it writes what was touched to the DB and clears the cache.
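The shape of it was roughly this (a simplified TypeScript sketch of the touch/finalize idea - the real thing is C# with a ConcurrentDictionary per object type, and these names are illustrative, not the actual code):

    // Write-behind cache sketch: keep objects in memory, mark the ones you
    // "touch" as dirty, then "finalize" flushes only the dirty ones to the DB
    // and clears the cache. Illustrative only.
    class WriteBehindCache<T extends { id: string }> {
      private cache = new Map<string, T>();
      private dirty = new Set<string>();

      load(items: T[]): void {
        for (const item of items) this.cache.set(item.id, item);
      }

      get(id: string): T | undefined {
        return this.cache.get(id);
      }

      touch(item: T): void {
        this.cache.set(item.id, item);
        this.dirty.add(item.id); // only touched objects get written back
      }

      async finalize(writeToDb: (items: T[]) => Promise<void>): Promise<void> {
        const changed = [...this.dirty].flatMap((id) => {
          const item = this.cache.get(id);
          return item ? [item] : [];
        });
        await writeToDb(changed);
        this.cache.clear();
        this.dirty.clear();
      }
    }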

Again- I'm not saying this is anything particularly fancy, but it did the general concept of what I wanted. Also this is all iterative; when it creates something I talk to it like a person to say "hey I want to actually load in all the data, even though we will only be writing what changed" and all that kind of stuff.

Also the bigger help wasn't really the creation of the cache, it was helping to make the changes and detect what needed to be modified.

At the end of the day, even if I want to go a slightly different route with how it did the caching, it creates all the framework so I can simplify it if needed.

A lot of times, for me, using this LLM approach is about getting all the boilerplate out of the way... sometimes just starting something by yourself is daunting. I find this to be a great way to begin.

lubesGordi 4 days ago | parent | prev | next [-]

I know; I don't understand what problems people are having getting usable code. Maybe the models don't work well with certain languages? It works great with C++. I've gotten thousands of lines of clean, compiles-on-the-first-try, obviously correct code from ChatGPT, Gemini, and Claude.

I've been assuming the people who are having issues are junior devs who don't know the vocabulary well enough yet to steer these things in the right direction. I wouldn't say I'm a prompt wizard, but I do understand context and the surface area of the things I'm asking the LLM to do.

Tiktaalik 4 days ago | parent [-]

From my experience, the further you get from the sort of stuff that's easily accessible on Stack Overflow, the worse it gets. I've had few problems having an AI write out some minor Python scripts, but I get severely poorer results with Unreal C++ code, and it badly hallucinates nonsense if asked anything general about Unreal architecture and APIs.

lubesGordi 4 days ago | parent [-]

Does the Unreal API change much across versions? I've noticed that when asking for a simple telnet server in Rust it was hallucinating like crazy, but when I went to the documentation it was clear the API was changing a lot from version to version. I don't think they do well with API churn. That's my hypothesis anyway.

toshinoriyagi 4 days ago | parent | next [-]

I think the big thing with Unreal is that the vast majority of games are closed source. It's already only used for games - a narrower domain than general-purpose programming - and there is also less training data.

mac-mc 4 days ago | parent | prev | next [-]

You see this dynamic even with Swift, which has a corpus of OSS source code out there, but not nearly as much as JS or Python, and so it has always been behind those languages.

Tiktaalik 4 days ago | parent | prev [-]

There were significant changes from 4 to 5, but sadly I haven't had any improvement from clarifying the version.

frankc 4 days ago | parent [-]

Clarifying can help, but ultimately it was trained on older versions. When you are working with a changing API, it's really important that the LLM can see examples of the new API and the new API docs. Adding context7 as a tool is hugely helpful here. Include an instruction in your rules or prompt to consult context7 for docs. https://github.com/upstash/context7

rootnod3 4 days ago | parent | prev [-]

How large is that code-base overall? Would you be able to let the LLM look at the entirety of it without it crapping out?

It definitely sounds nice to go and change a few queries, but did it also consider the potential impact on other parts of the source or on adjacent running systems? The query itself here might not be the best example, but you get what I mean.