| ▲ | OldGreenYodaGPT 3 days ago |
| Most software engineers are seriously sleeping on how good LLM agents are right now, especially something like Claude Code. Once you’ve got Claude Code set up, you can point it at your codebase, have it learn your conventions, pull in best practices, and refine everything until it’s basically operating like a super-powered teammate. The real unlock is building a solid set of reusable “skills” plus a few agents for the stuff you do all the time. For example, we have a custom UI library, and Claude Code has a skill that explains exactly how to use it. Same for how we write Storybooks, how we structure APIs, and basically how we want everything done in our repo. So when it generates code, it already matches our patterns and standards out of the box. We also had Claude Code create a bunch of ESLint automation, including custom ESLint rules and lint checks that catch and auto-handle a lot of stuff before it even hits review. Then we take it further: we have a deep code review agent Claude Code runs after changes are made. And when a PR goes up, we have another Claude Code agent that does a full PR review, following a detailed markdown checklist we’ve written for it. On top of that, we’ve got like five other Claude Code GitHub workflow agents that run on a schedule. One of them reads all commits from the last month and makes sure docs are still aligned. Another checks for gaps in end-to-end coverage. Stuff like that. A ton of maintenance and quality work is just… automated. It runs ridiculously smoothly. We even use Claude Code for ticket triage. It reads the ticket, digs into the codebase, and leaves a comment with what it thinks should be done. So when an engineer picks it up, they’re basically starting halfway through already. There is so much low-hanging fruit here that it honestly blows my mind people aren’t all over it. 2026 is going to be a wake-up call. (used voice to text then had claude reword, I am lazy and not gonna hand write it all for yall sorry!) Edit: made an example repo for ya https://github.com/ChrisWiles/claude-code-showcase |
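To make the "skills" idea concrete: a Claude Code skill is just a markdown file with a short frontmatter that the agent loads when relevant. A minimal sketch of the kind of UI-library skill described above (the library name, paths, and rules are invented for illustration, not taken from the linked repo):

```markdown
---
name: acme-ui-library
description: How to build screens with our in-house UI component library
---

# Using the ACME UI library

- Import components from `@acme/ui`; never pull in raw MUI or styled-components.
- Every new component gets a Storybook story beside it (`Component.stories.tsx`).
- Colors come from the `useTheme()` hook; hard-coded hex values fail our lint rules.
- API calls go through `src/api/client.ts`; see `docs/api-conventions.md` for patterns.
```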
|
| ▲ | klaussilveira 3 days ago | parent | next [-] |
| I made a similar comment on a different thread, but I think it also fits here: I think the disconnect between engineers is due to their own context. If you work with frontend applications, specially React/React Native/HTML/Mobile, your experience with LLMs is completely different than the experience of someone working with OpenGL, io_uring, libev and other lower level stuff. Sure, Opus 4.5 can one shot Windows utilities and full stack apps, but can't implement a simple shadowing algorithm from a 2003 paper in C++, GLFW, GLAD: https://www.cse.chalmers.se/~uffe/soft_gfxhw2003.pdf Codex/Claude Code are terrible with C++. It also can't do Rust really well, once you get to the meat of it. Not sure why that is, but they just spit out nonsense that creates more work than it helps me. It also can't one shot anything complete, even though I might feed him the entire paper that explains what the algorithm is supposed to do. Try to do some OpenGL or Vulkan with it, without using WebGPU or three.js. Try it with real code, that all of us have to deal with every day. SDL, Vulkan RHI, NVRHI. Very frustrating. Try it with boost, or cmake, or taskflow. It loses itself constantly, hallucinates which version it is working on and ignores you when you provide actual pointers to documentation on the repo. I've also recently tried to get Opus 4.5 to move the Job system from Doom 3 BFG to the original codebase. Clean clone of dhewm3, pointed Opus to the BFG Job system codebase, and explained how it works. I have also fed it the Fabien Sanglard code review of the job system: https://fabiensanglard.net/doom3_bfg/threading.php We are not sleeping on it, we are actually waiting for it to get actually useful. Sure, it can generate a full stack admin control panel in JS for my PostgreSQL tables, but is that really "not normal"? That's basic. |
| |
| ▲ | JDye 2 days ago | parent | next [-] | | We have an in-house, Rust-based proxy server. Claude is unable to contribute to it meaningfully outside of grunt work like minor refactors across many files. It doesn't seem to understand proxying and how it works on both a protocol level and business logic level. With some entirely novel work we're doing, it's actually a hindrance as it consistently tells us the approach isn't valid/won't work (it will) and then enters "absolutely right" loops when corrected. I still believe those who rave about it are not writing anything I would consider "engineering". Or perhaps it's a skill issue and I'm using it wrong, but I haven't yet met someone I respect who tells me it's the future in the way those running AI-based companies tell me. | | |
| ▲ | dpc_01234 2 days ago | parent | next [-] | | > We have an in-house, Rust-based proxy server. Claude is unable to contribute to it meaningfully outside I have a great time using Claude Code in Rust projects, so I know it's not about the language exactly. My working model is that since LLMs are basically inference/correlation based, the more you deviate from the mainstream corpus of training data, the more confused the LLM gets. Because an LLM doesn't "understand" anything. But if it was trained on a lot of things kind of like the problem, it can match the patterns just fine, and it can generalize over a lot of layers, including programming languages. Also I've noticed that it can get confused about stupid stuff. E.g. I had two different things named kind of the same in two parts of the codebase, and it would constantly stumble and conflate them. Changing the name in the codebase immediately improved it. So yeah, we've got another potentially powerful tool that requires understanding how it works under the hood to be useful. Kind of like git. | | |
| ▲ | lisperforlife 2 days ago | parent [-] | | Recently the v8 Rust library changed from mutable handle scopes to pinned scopes. A fairly simple change that I even put in my CLAUDE.md file. But it still generates methods with HandleScopes and then says... oh I have a different scope and goes on a random walk refactoring completely unrelated parts of the code. All the while Opus 4.5 burns through tokens. Things work great as long as you are testing on the training set. But that said, it is absolutely brilliant with React and TypeScript. | | |
| ▲ | dpc_01234 2 days ago | parent [-] | | Well, it's not like it never happened to me to "burn tokens" with some lifetime issue. :D But yeah, if you're working in Rust on something with sharp edges, the LLM will get hurt. I just don't tend to have these in my projects. An even more basic failure mode: I told it to convert/copy a bit (1k LOC) of blocking code into a new module and convert to async. It just couldn't do a proper 1:1 logical _copy_. But when I manually `cp <src> <dst>` the file and then told it to convert that to async and fix issues, it did it 100% correctly. Because fundamentally it's just a non-deterministic pattern generator. |
|
| |
| ▲ | kevin42 2 days ago | parent | prev | next [-] | | This isn't meant as a criticism, or to doubt your experience, but I've talked to a few people who had experiences like this. I helped them get Claude Code set up, analyze the codebase and document the architecture into markdown (edit as needed after), create an agent for the architecture, and prompt it in an incremental way. Maybe 15-30 minutes of prep. Everyone I helped with this responded with things like "This is amazing", "Wow!", etc. For some things you can fire up Claude and have it generate great code from scratch. But for bigger code bases and more complex architecture, you need to break it down ahead of time so it can just read about the architecture rather than analyze it every time. | |
| ▲ | ryandrake 2 days ago | parent [-] | | Is there any good documentation out there about how to perform this wizardry? I always assumed if you did /init in a new code base, that Claude would set itself up to maximize its own understanding of the code. If there are extra steps that need to be done, why don't Claude's developers just add those extra steps to /init? | | |
| ▲ | kevin42 2 days ago | parent | next [-] | | Not that I have seen, which is probably a big part of the disconnect. Mostly it's tribal knowledge. I learned through experimentation, but I've seen tips here and there. Here's my workflow (roughly): > Create a CLAUDE.md for a C++ application that uses libraries x/y/z [Then I edit it, adding general information about the architecture] > Analyze the library in the xxx directory, and produce a xxx_architecture.md describing the major components and design > /agent [let Claude make the agent, but when it asks what you want it to do, explain that you want it to specialize in subsystem xxx, and refer to xxx_architecture.md] Then repeat until you have the major components covered. Then: > Using the files named with architecture.md, analyze the entire system and update CLAUDE.md to refer to them and use the specialized agents. Now, when you need to do something, put it in planning mode and say something like: > There's a bug in the xxx part of the application, where when I do yyy, it does zzz, but it should do aaa. Analyze the problem and come up with a plan to fix it, and automated tests you can perform if possible. Then, iterate on the plan with it if you need to, or just approve it. One of the most important things you can do when dealing with something complex is let it come up with a test case so it can fix or implement something and then iterate until it's done. I had an image processing problem and I gave it some sample data, then it iterated (looking at the output image) until it fixed it. It spent at least an hour, but I didn't have to touch it while it worked. | |
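A rough sketch of what the resulting top-level CLAUDE.md might look like after those steps; the subsystem names, files, and commands are placeholders, not a real project:

```markdown
# CLAUDE.md

C++ application using libraries x/y/z.

## Architecture docs (read the relevant one before touching a subsystem)
- `network_architecture.md` - connection handling, protocol framing
- `storage_architecture.md` - schema, migrations, caching layer
- `ui_architecture.md` - widget hierarchy, event handling conventions

## Agents
- Use the `network` agent for anything under `src/net/`
- Use the `storage` agent for anything under `src/db/`

## Workflow
- Start non-trivial changes in plan mode.
- Build with `cmake --build build` and run `ctest` before declaring a task done.
```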
| ▲ | JDye 9 hours ago | parent | next [-] | | I've taken time today to do this. With some of your suggestions, I am seeing an improvement in its ability to do some of the grunt work I mentioned. It just saved me an hour refactoring a large protocol implementation into a few files and extracting some common utilities. I can recognise and appreciate how useful that is for me and for most other devs. At the same time, I think there are limitations to these tools and that I won't ever be able to achieve what I see others saying about 95% of code being AI-written or leaving the AI to iterate for an hour. There are just too many weird little pitfalls in our work that the AI just cannot seem to avoid. It's understandable; I've fallen victim to a few of them too, but I have the benefit of the ability to continuously learn/develop/extrapolate in a way that the LLM cannot. And with how little documentation exists for some of these things (MASQUE proxying, for example), anytime the LLM encounters this code it throws a fit, and is unable to contribute meaningfully. So thanks for your suggestions; it has made Claude better, and clearly I was dragging my feet a little. At the very least, it's freed up some more of my time to work on the complex things Claude can't do. | |
| ▲ | ryandrake 2 days ago | parent | prev | next [-] | | To be perfectly honest, I've never used a single /command besides /init. That probably means I'm using 1% of the software's capabilities. In frankness, the whole menu of /-commands is intimidating and I don't know where to start. | | |
| ▲ | theshrike79 2 days ago | parent | next [-] | | /commands are like macros or mayyybe aliases. You just put in the commands you see yourself repeating often, like "commit the unstaged files in distinct commits, use xxx style for the commit messages..." - then you can iterate on it if you see any gaps or confusion, even give example commands to use in the different steps. Skills, on the other hand, are commands ON STEROIDS. They can be packaged with actual scripts and executables; the PEP 723 Python style + uv is super useful. I have one skill for example that uses Python+Treesitter to check the unit test quality of a Go project. It does some AST magic to check the code for repetition, stupid things like sleeps and relative timestamps etc. A /command _can_ do it, but it's not as efficient; the scripts for the skill are specifically designed for LLM use and output the result in a hyper-compact form a human could never be arsed to read. | |
| ▲ | kevin42 2 days ago | parent | prev | next [-] | | You don't need to do much; the /agent command is the most useful, and it walks you through it. The main thing, though, is to give the agent something to work with before you create it. That's why I go through the steps of letting Claude analyze different components and document the design/architecture. The major benefit of agents is that they keep the context clean for the main job. So the agent might have a huge context working through some specific code, but the main process can do something to the effect of "Hey UI library agent, where do I need to put code to change the color of widget xyz", then the agent does all the thinking and can reply with "that's in file 123.js, line 200". The cleaner you keep the main context, the better it works. | |
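For readers who haven't created one: a Claude Code sub-agent is defined in a small markdown file with frontmatter, roughly like the sketch below (the name, paths, and instructions are made up to match the example in the comment above):

```markdown
---
name: ui-library
description: Specialist for the UI component layer. Use for questions about
  where widget code lives or how components are styled.
tools: Read, Grep, Glob
---

You are the UI library specialist. Read `ui_architecture.md` before answering.
Reply with exact file paths and line numbers (e.g. "123.js, line 200") so the
main agent never has to load the component code into its own context.
```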
| ▲ | theshrike79 2 days ago | parent [-] | | Never thought of Agents in that way to be honest. I think I need to try that style =) |
| |
| ▲ | gck1 2 days ago | parent | prev [-] | | > In frankness, the whole menu of /-commands is intimidating and I don't know where to start. claude-code has a built in plugin that it can use to fetch its own docs! You don't have to ever touch anything yourself, it can add the features to itself, by itself. |
| |
| ▲ | gck1 2 days ago | parent | prev [-] | | This is some great advice. What I would add is to avoid the internal plan mode and just build your own. The built-in one creates md files outside the project, gives the files random names, and it's hard to reference them in the future. It's also hard to steer the plan mode or have it remember some behavior that you want to enforce. It's much better to create a custom command with custom instructions that acts as the plan mode. My system works like this: the /implement command acts as an orchestrator & plan mode, and it is instructed to launch a predefined set of agents based on the problem and have them utilize specific skills. Every time the /implement command is initiated, it has to create a markdown file inside my own project, and then each subagent is also instructed to update the file when it finishes working. This way, the orchestrator can spot that an agent misbehaved, and the reviewer agent can see what the developer agent tried to do and why it was wrong. |
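As an illustration of the shape of such a setup, a custom command is just a prompt file under `.claude/commands/`; a stripped-down sketch of an orchestrator like the /implement described here (all names and steps are hypothetical, not the commenter's actual config):

```markdown
---
description: Orchestrate planning, implementation, and review for a task
---

Task: $ARGUMENTS

1. Create a plan file at `plans/<date>-<short-slug>.md` inside this repo.
2. Launch the `planner` agent to fill in the plan, then the `developer` agent
   to implement it, then the `reviewer` agent to check the diff.
3. Every agent must append a short status section to the plan file when it
   finishes, so the orchestrator can spot misbehavior and the reviewer can see
   what the developer tried to do and why.
```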
| |
| ▲ | HDThoreaun 2 days ago | parent | prev [-] | | > if you did /init in a new code base, that Claude would set itself up to maximize its own understanding of the code. This is definitely not the case, and the reason Anthropic doesn't make Claude do this is that its quality degrades massively as you use up its context. So the solution is to let users manage the context themselves in order to minimize the amount that is "wasted" on prep work. Context windows have been increasing quite a bit so I suspect that by 2030 this will no longer be an issue for any but the largest codebases, but for now you need to be strategic. |
|
| |
| ▲ | turkey99 2 days ago | parent | prev | next [-] | | Are you still talking about Opus 4.5? I've been working in Rust, Kotlin and C++ and it's been doing well. Incredible at C++, like the number of mistakes it doesn't make | |
| ▲ | parliament32 2 days ago | parent | prev [-] | | > I still believe those who rave about it are not writing anything I would consider "engineering". Correct. In fact, this is the entire reason for the disconnect, where it seems like half the people here think LLMs are the best thing ever and the other half are confused about where the value is in these slop generators. The key difference is (despite everyone calling themselves an SWE nowadays) there's a difference between a "programmer" and an "engineer". Looking at OP, exactly zero of his screenshotted apps are what I would consider "engineering". Literally everything in there has been done over and over to death. Engineering is... novel, for lack of a better word. See also: https://www.seangoedecke.com/pure-and-impure-engineering/ | | |
| ▲ | woah 2 days ago | parent | next [-] | | > Engineering is.. novel, for lack of a better word. Tell that to the guys drawing up the world's 10 millionth cable suspension bridge | | | |
| ▲ | ryandrake 2 days ago | parent | prev | next [-] | | I don't think it's that helpful to try to gatekeep the "engineering" term or try to separate it into "pure" and "impure" buckets, implying that one is lesser than the other. It should be enough to just say that AI assisted development is much better at non-novel tasks than it is at novel tasks. Which makes sense: LLMs are trained on existing work, and can't do anything novel because if it was trained on a task, that task is by definition not novel. | | |
| ▲ | parliament32 2 days ago | parent [-] | | Respectfully, it's absolutely important to "gatekeep" a title that has an established definition and certain expectations attached to the title. OP says, "BUT YOU DON’T KNOW HOW THE CODE WORKS.. No I don’t. I have a vague idea, but you are right - I do not know how the applications are actually assembled." This is not what I would call an engineer. Or a programmer. "Prompter", at best. And yes, this is absolutely "lesser than", just like a middleman who subcontracts his work to Fiverr (and has no understanding of the actual work) is "lesser than" an actual developer. | | |
| ▲ | emodendroket 2 days ago | parent [-] | | That's not the point being made to you. The point is that most people in the "software engineering" space are applying known tools and techniques to problems that are not groundbreaking. Very few are doing theoretical computer science, algorithm design, or whatever you think it is that should be called "engineering." | | |
| ▲ | windexh8er 2 days ago | parent [-] | | So the TL;DR here is... If you're in the business of recreating wheels - then you're in luck! We've automated wheel recreation to an acceptable degree of those wheels being true. | | |
| ▲ | emodendroket 26 minutes ago | parent [-] | | Most physical engineers are just applying known techniques all the time too. Most products or bridges or whatever are not solving some heretofore-unsolved problem. |
|
|
|
| |
| ▲ | scottyah 2 days ago | parent | prev | next [-] | | It's how you use the tool that matters. Some people get bitter and try to compare it to top engineers' work on novel things as a strawman so they can go "Hah! Look how it failed!" as they swing a hammer to demonstrate it cannot chop down a tree. Because the tool is so novel and its use is a lot more abstract than that of an axe, it is taking a while for some to see its potential, especially if they are remembering models from even six months ago. Engineering is just problem solving; nobody judges structural engineers for designing structures with another Simpson Strong Tie/No.2 Pine 2x4 combo because that is just another easy (and therefore cheap) way to rapidly get to the desired state. If your client/company wants to pay for art, that's great! Most just want the thing done fast and robustly. | |
| ▲ | wolvoleo a day ago | parent [-] | | I think it's also that the potential is far from being realized, yet we're constantly bombarded by braindead marketers trying to convince us that it's the best thing ever already. This is tiring, especially when the leadership (not held back by any technical knowledge) believes them. I'm sure AI will get there; I also think it's not very good yet. |
| |
| ▲ | loandbehold 2 days ago | parent | prev | next [-] | | Coding agents as of Jan 2026 are great at what 95% of software engineers do. For the remaining 5% that do really novel stuff -- the agents will get there in a few years. | |
| ▲ | 3oil3 2 days ago | parent | prev [-] | | When he said 'just look at what I've been able to build', I was expecting anything but an 'image converter' |
|
| |
| ▲ | wild_egg 3 days ago | parent | prev | next [-] | | I've had Opus 4.5 hand rolling CUDA kernels and writing a custom event loop on io_uring lately and both were done really well. Need to set up the right feedback loops so it can test its work thoroughly but then it flies. | | |
| ▲ | jaggederest 3 days ago | parent [-] | | Yeah I've handed it a naive scalar implementation and said "Make this use SIMD for Mac Silicon / NEON" and it just spits out a working implementation that's 3-6x faster and passes the tests, which are binary exact specifications. | | |
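To make that kind of transformation concrete, here is a hedged sketch of a scalar-to-NEON rewrite, written in Rust (the function and workload are invented; the commenter's code and language may differ). The vector version handles four f32 lanes per instruction, which is where the speedup comes from:

```rust
// Hypothetical scalar baseline.
fn sum_scalar(xs: &[f32]) -> f32 {
    xs.iter().sum()
}

// NEON version: four lanes at a time, leftover elements summed normally.
#[cfg(target_arch = "aarch64")]
fn sum_neon(xs: &[f32]) -> f32 {
    use std::arch::aarch64::*;

    let chunks = xs.chunks_exact(4);
    let tail = chunks.remainder();

    // SAFETY: NEON is mandatory on AArch64 and each chunk holds exactly 4 f32s.
    unsafe {
        let mut acc = vdupq_n_f32(0.0);
        for c in chunks {
            // Load 4 floats and accumulate them with a single vector add.
            acc = vaddq_f32(acc, vld1q_f32(c.as_ptr()));
        }
        // Horizontal add across the 4 lanes, then fold in the remainder.
        vaddvq_f32(acc) + tail.iter().sum::<f32>()
    }
}
```

Note that the vector version changes the order of floating-point additions, so tests that demand bit-exact output have to be written with that in mind.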
| ▲ | jonstewart 3 days ago | parent [-] | | It can do this at the level of a function, and that's -useful-, but like the parent reply to top-level comment, and despite investing the time, using skills & subagents, etc., I haven't gotten it to do well with C++ or Rust projects of sufficient complexity. I'm not going to say they won't some day, but, it's not today. | | |
| ▲ | rtfeldman 3 days ago | parent | next [-] | | Anecdotally, we use Opus 4.5 constantly on Zed's code base, which is almost a million lines of Rust code and has over 150K active users, and we use it for basically every task you can think of - new features, bug fixes, refactors, prototypes, you name it. The code base is a complex native GUI with no Web tech anywhere in it. I'm not talking about "write this function" but rather like implementing the whole feature by writing only English to the agent, over the course of numerous back-and-forth interactions and exhausting multiple 200K-token context windows. For me personally, definitely at least 99% of the Rust code I've committed at work since Opus 4.5 came out has been from an agent running that model. I'm reading lots of Rust code (that Opus generated) but I'm essentially no longer writing any of it. If dot-autocomplete (and LLM autocomplete) disappeared from IDE existence, I would not notice. | |
| ▲ | mr_o47 2 days ago | parent | next [-] | | Woah that's a very interesting claim you made
I was shying away from writing Rust as I am not a Rust developer, but from your experience it looks like Claude has gotten very good at writing Rust | |
| ▲ | jaggederest 2 days ago | parent [-] | | Honestly I think the more you can give Claude a type system and effective tests, the more effective it can be. Rust is quite high up on the test strictness front (though I think more could be done...), so it's a great candidate. I also like its performance on Haskell and Go; both get you pretty great code out of the box. |
| |
| ▲ | norir 2 days ago | parent | prev | next [-] | | Have you ever worried that by programming in this way, you are methodically giving Anthropic all the information it needs to copy your product? If there is any real value in what you are doing, what is to stop Anthropic or OpenAI or whomever from essentially one-shotting Zed? What happens when the model providers 10x their costs and also use the information you've so enthusiastically given them to clone your product and use the money that you paid them to squash you? | | |
| ▲ | rtfeldman 2 days ago | parent | next [-] | | Zed's entire code base is already open source, so Anthropic has a much more straightforward way to see our code: https://github.com/zed-industries/zed | |
| ▲ | kaydub 2 days ago | parent | prev [-] | | That's what things like AWS bedrock are for. Are you worried about microsoft stealing your codebase from github? | | |
| ▲ | djhn 2 days ago | parent [-] | | Isn’t it widely assumed Microsoft used private repos for LLM training? And even with a narrower definition of stealing, Microsoft’s ability to share your code with US government agencies is a common and very legitimate worry in plenty of threat model scenarios. |
|
| |
| ▲ | ziml77 2 days ago | parent | prev | next [-] | | I just uninstalled Zed today when I realized the reason I couldn't delete a file on Windows was that it was open in Zed. So I wouldn't speak too highly of the LLM's ability to write code. I have never seen another editor on Windows make the mistake of opening files without enabling all 3 share modes. | |
| ▲ | rtfeldman 4 hours ago | parent [-] | | Just based on timing, I am almost 100% sure whatever code is responsible was handwritten before anyone working on Windows was using LLMs...but anyway, thank you for the bug report - I'll pass it along! |
| |
| ▲ | Snuggly73 2 days ago | parent | prev [-] | | The article is arguing that it will basically replace devs. Do you think it can replace you by basically one-shotting features/bug fixes in Zed? And also - doesn't that make Zed (and other editors) pointless? | |
| ▲ | kevin42 2 days ago | parent | next [-] | | Trying to one-shot large codebases is an exercise in futility. You need to let Claude figure out and document the architecture first, then set up agents for each major part of the project. Doing this keeps the context clean for the main agent, since it doesn't have to go read the code each time. So one agent can fill its entire context understanding part of the code and then the main agent asks it how to do something and gets a shorter response. It takes more work than one-shotting, but not a lot, and it pays dividends. | |
| ▲ | dpark 2 days ago | parent [-] | | Is there a guide for doing that successfully somewhere? I would love to play with this on a large codebase. I would also love to not reinvent the wheel on getting Claude working effectively on a large code base. I don’t even know where to start with, e.g., setting up agents for each part. |
| |
| ▲ | rtfeldman 2 days ago | parent | prev [-] | | > Do you think it can replace you basically one-shotting features/bugs in Zed? Nobody is one-shotting anything nontrivial in Zed's code base, with Opus 4.5 or any other model. What about a future model? Literally nobody knows. Forecasts about AI capabilities have had horrendously low accuracy in both directions - e.g. most people underestimated what LLMs would be capable of today, and almost everyone who thought AI would at least be where it is today...instead overestimated and predicted we'd have AGI or even superintelligence by now. I see zero signs of that forecasting accuracy improving. In aggregate, we are atrocious at it. The only safe bet is that hardware will be faster and cheaper (because the most reliable trend in the history of computing has been that hardware gets faster and cheaper), which will naturally affect the software running on it. > And also - doesn’t that make Zed (and other editors) pointless? It means there's now demand for supporting use cases that didn't exist until recently, which comes with the territory of building a product for technologists! :) | | |
| ▲ | Snuggly73 2 days ago | parent [-] | | Thanx. More of a "faster keyboard" so far then? And yeah - if I had a crystal ball, I would be on my private island instead of hanging on HN :) | | |
| ▲ | rtfeldman 2 days ago | parent [-] | | Definitely more than a faster keyboard (e.g. I also ask the model to track down the source of a bug, or questions about the state of the code base after others have changed it, bounce architectural ideas off the model, research, etc.) but also definitely not a replacement for thinking or programming expertise. |
|
|
|
| |
| ▲ | jaggederest 3 days ago | parent | prev | next [-] | | I don't know if you've tried Chatgpt-5.2 but I find codex much better for Rust mostly due to the underlying model. You have to do planning and provide context, but 80%+ of the time it's a oneshot for small-to-medium size features in an existing codebase that's fairly complex. I honestly have to say that it's a better programmer than I am, it's just not anywhere near as good a software developer for all of the higher and lower level concerns that are the other 50% of the job. If you have any opensource examples of your codebase, prompt, and/or output, I would happily learn from it / give advice. I think we're all still figuring it out. Also this SIMD translation wasn't just a single function - it was multiple functions across a whole region of the codebase dealing with video and frame capture, so pretty substantial. | | |
| ▲ | glhaynes 2 days ago | parent [-] | | "I honestly have to say that it's a better programmer than I am, it's just not anywhere near as good a software developer for all of the higher and lower level concerns that are the other 50% of the job." That's a good way to say it, I totally identify. |
| |
| ▲ | andai 2 days ago | parent | prev [-] | | Is that a context issue? I wonder if LSP would help there. Though Claude Code should grep the codebase for all necessary context and LSP should in theory only save time, I think there would be a real improvement to outcomes as well. The bigger a project gets the more context you generally need to understand any particular part. And by default Claude Code doesn't inject context, you need to use 3rd party integrations for that. |
|
|
| |
| ▲ | 348512469721 a day ago | parent | prev | next [-] | | > It also can't do rust really well I have not had this experience at all. It often doesn't get it right on the first pass, yes, but the advantage with Rust vibecoding is that if you give it a rule to "Always run cargo check before you think you're finished" then it will go back and fix whatever it missed on the first pass. What I find particularly valuable is that the compiler forces it to handle all cases like match arms or errors. I find that it often misses edge cases when writing TypeScript, and I believe that the relative leniency of the TypeScript compiler is why. In a similar vein, it is quite good at writing macros (or at least, quite good given how difficult this otherwise is). You often have to cajole it into not hardcoding features into the macro, but since macros resolve at compile time they're quite well-suited for an LLM workflow as most potential bugs will be apparent before the user needs to test. I also think that the biggest hurdle of writing macros to humans is the cryptic compiler errors, but I can imagine that since LLMs have a lot of information about compilers and syntax parsing in their training corpus, they have an easier time with this than the median programmer. I'm sure an actual compiler engineer would be far better than the LLM, but I am not that guy (nor can I afford one) so I'm quite happy to use LLMs here. For context, I am purely a webdev. I can't speak for how well LLMs fare at anything other than writing SQL, hooking up to REST APIs, React frontend, and macros. With the exception of macros, these are all problems that have been solved a million times and thus are more boilerplate than novelty, so I think it is entirely plausible that they're very poor for different domains of programming despite my experiences with them. | |
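A small illustration of that point about match arms (the enum and function are hypothetical): if a later change adds a variant, `cargo check` turns every non-exhaustive `match` into a hard error with the exact location, which is precisely the feedback loop the "always run cargo check" rule hands to the agent.

```rust
enum PaymentState {
    Pending,
    Settled,
    Refunded, // newly added variant
}

fn label(state: PaymentState) -> &'static str {
    match state {
        PaymentState::Pending => "waiting on the provider",
        PaymentState::Settled => "done",
        // Omitting this arm (and any `_` catch-all) is a compile error:
        // error[E0004]: non-exhaustive patterns: `PaymentState::Refunded` not covered.
        // A missing `case` in a TypeScript switch, by contrast, can slip through silently.
        PaymentState::Refunded => "money returned",
    }
}
```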
| ▲ | jessoteric a day ago | parent [-] | | I've also been using Opus 4.5 with lots of heavy Rust development. I don't "vibe code", but lead it with a relatively firm hand, and it produces pretty good results in surprisingly complicated tasks. For example, one of our public repos works with Rust compiler artifacts and cache restoration (https://github.com/attunehq/hurry); if you look at the history you can see it do some surprisingly complex (and well made, for an LLM) changes. Its code isn't necessarily what I would always write, or the best way to solve the problem, but it's usually perfectly serviceable if you give it enough context and guidance. |
| |
| ▲ | CapsAdmin 3 days ago | parent | prev | next [-] | | I built an open-source "game engine" entirely in Lua many years ago, but relying on many third-party libraries that I would bind to with FFI. I thought I'd revive it, but this time with Vulkan and no third-party dependencies (except for Vulkan). Sonnet 4.5, Opus and Gemini 3.5 Flash have helped me write image decoders for DDS, PNG, JPG, EXR, a Wayland window implementation, a macOS window implementation, etc. I find that Gemini 3.5 Flash is really good at understanding 3D in general while Sonnet might be lacking a little. All these SOTA models seem to understand my bespoke Lua framework and the right level of abstraction. For example, at the low level you have the generated Vulkan bindings, then after that you have objects around Vulkan types, then finally a high-level pipeline builder and whatnot which does not mention Vulkan anywhere. However, with a larger C# codebase at work, they really struggle. My theory is that there are too many files and abstractions so that they cannot understand where to begin looking. | |
| ▲ | lelandfe 3 days ago | parent | prev | next [-] | | I'm a quite senior frontend using React and even I see Sonnet 4.5 struggle with basic things. Today it wrote my Zod validation incorrectly, mixing up versions, then just decided it wasn't working and attempted to replace the entire thing with a different library. | | |
| ▲ | baq 3 days ago | parent | next [-] | | There's little reason to use Sonnet anymore. Haiku for summaries, Opus for anything else. Sonnet isn't a good model by today's standards. | |
| ▲ | lelandfe an hour ago | parent [-] | | I have been chastened in the opposite direction by others. I've also subjectively really disliked Opus's speed and I've seen Opus do really silly things too. But I'll try out using it as a daily driver and see if I like it more. |
| |
| ▲ | subomi 3 days ago | parent | prev [-] | | Why do we all of a sudden hold these agents to some unrealistically high bar? Engineers write bugs all the time and write incorrect validations. But we iterate. We read the stacktrace in Sentry, realise what the hell we were thinking when we wrote that, and we fix things. If you're going to benefit from these agents, you need to be a bit more patient and point them correctly to your codebase. My rule of thumb is that if you can clearly describe exactly what you want to another engineer, then you can instruct the agent to do it too. | |
| ▲ | puttycat 2 days ago | parent | next [-] | | > Engineers write bugs all the time Why do we hold calculators to such high bars? Humans make calculation mistakes all the time. Why do we hold banking software to such high bars? People forget where they put their change all the time. Etc etc. | | | |
| ▲ | lelandfe 3 days ago | parent | prev [-] | | my unrealistic bar lies somewhere above "pick a new library" bug resolution |
|
| |
| ▲ | 3D30497420 2 days ago | parent | prev | next [-] | | I'll second this. I'm making a fairly basic iOS/Swift app with an accompanying React-based site. I was able to vibe-code the React site (it isn't pretty, but it works and the code is fairly decent). But I've struggled to get the Swift code to be reliable. Which makes sense. I'm sure there's lots of training data for React/HTML/CSS/etc. but much less with Swift, especially the newer versions. | | |
| ▲ | rootusrootus 2 days ago | parent | next [-] | | I had surprising success vibe coding a swift iOS app a while back. Just for fun, since I have a bluetooth OBD2 dongle and an electric truck, I told Claude to make me an app that could connect to the truck using the dongle, read me the VIN, odometer, and state of charge. This was middle of 2025, so before Opus 4.5. It took Claude a few attempts and some feedback on what was failing, but it did eventually make a working app after a couple hours. Now, was the code quality any good? Beats me, I am not a swift developer. I did it partly as an experiment to see what Claude was currently capable of and partly because I wanted to test the feasibility of setting up a simple passive data logger for my truck. I'm tempted to take another swing with Opus 4.5 for the science. | |
| ▲ | billbrown 2 days ago | parent | prev [-] | | I hate "vibe code" as a verb. May I suggest "prompt" instead? "I was able to prompt the React site…." | | |
| |
| ▲ | nycdatasci 2 days ago | parent | prev | next [-] | | Have you experimented with all of these things on the latest models (e.g. Opus 4.5) since Nov 2025? They are significantly better at coding than earlier models. | | | |
| ▲ | UncleOxidant 3 days ago | parent | prev | next [-] | | I've had pretty good luck with LLM agents coding C. In this case a C compiler that supports a subset of C and targets a customizable microcoded state machine/processor. Then I had Gemini code up a simulator/debugger for the target machine in C++ and it did it in short order and quite successfully - lets you single step through the microcode and examine inputs (and set inputs), outputs & current state - did that in an afternoon and the resulting C++ code looks pretty decent. | | |
| ▲ | HarHarVeryFunny 2 days ago | parent [-] | | That's remarkably similar to something I've just started on - I want to create a self-compiling C compiler targeting (and to run on) an 8-bit micro via a custom VM. This is basically a retro-computing hobby project. I've worked with Gemini Fast on the web to help design the VM ISA, then next steps will be to have some AI (maybe Gemini CLI - currently free) write an assembler, disassembler and interpreter for the ISA, and then the recursive descent compiler (written in C) too. I already had Gemini 3.0 Fast write me a precedence climbing expression parser as a more efficient drop-in replacement for a recursive descent one, although I had it do that in C++ as a proof-of-concept since I don't know yet what C libraries I want to build and use (arena allocator, etc). This involved a lot of copy-paste between Gemini output and an online C++ dev environment (OnlineGDB), but that was not too bad, although Gemini CLI would have avoided that. Too bad that Gemini web only has "code interpreter" support for Python, not C and/or C++. Using Gemini to help define the ISA was an interesting process. It had useful input in a "pair-design" process, working on various parts of the ISA, but then failed to bring all the ideas together into a single ISA document, repeatedly missing parts of what had been previously discussed until I gave up and did that manually. The default persona of Gemini seems not very well suited to this type of workflow where you want to direct what to do next, since it seems they've RL'd the heck out of it to want to suggest next steps and ask questions rather than do what is asked and wait for further instruction. I eventually had to keep asking it to "please answer then stop", and interestingly the quality of the "conversation" seemed to fall apart after that (perhaps because Gemini was now predicting/generating a more adversarial conversation than a collaborative one?). I'm wondering/hoping that Gemini CLI might be better at working on documentation than Gemini web, since then the doc can be an actual file it is editing, and it can use its edit tool for that, as opposed to hoping that Gemini web can assemble chunks of context (various parts of the ISA discussion) into a single document. | |
| ▲ | HarHarVeryFunny a day ago | parent [-] | | Just as a self follow-up here (I hate to do it!), after chatting to Gemini some more more about document creation alternatives, it does seem that Gemini CLI is by far the best way to go, since it's working in similar fashion to Claude Code and making targeted edits (string replacements) to files, rather than regenerating from scratch (unless it has misinterpreted something you said as a request to do that, which would be obvious when it showed you the suggested diff). Another alternative (not recommended due to potential for "drift") is to use Gemini's Canvas capability where it is working on a document rather than a specification being spread out over Chat, but this document is fully regenerated for every update (unlike Claude's artifacts), so there is potential for it to summarize or drop sections of the document ("drift") rather than just making requested changes. Canvas also doesn't have Artifact's versioning to allow you to go back to undo unwanted drifts/changes. | | |
| ▲ | mattarm a day ago | parent [-] | | Yeah, the online Gemini app is not good for long-lived conversations that build up a body of decisions. The context window gets too large and things drop. What I've learned is that once you reach that point you've got to break that problem down into smaller pieces that the AI can work productively with. If you're about to start with Gemini-cli I recommend you look up https://github.com/github/spec-kit. It's a project out of Microsoft/GitHub that encodes a rigorous spec-then-implement multi-pass workflow. It gets the AI to produce specs, double check the specs for holes and ambiguity, plan out implementation, translate that into small tasks, then check them off as it goes. I don't use spec-kit all the time, but it taught me what explicit multi-pass prompting can do when the context is held in files on disk, often markdown that I can go in and change as needed. I think it basically comes down to enforcing enough structure in the form of codified processes, self-checks and/or tests for your code. Pro tip: tell spec-kit to do TDD in your constitution and the tests will keep it on the rails as you progress. I suspect "vibe coding" can get a bad rap due to lack of testing. With AI coding I think test coverage gets more important. | | |
|
|
| |
| ▲ | ryandrake 2 days ago | parent | prev | next [-] | | I've found it to be pretty hit-or-miss with C++ in general, but it's really, REALLY bad at 3D graphics code. I've tried to use it to port an OpenGL project to SDL3_GPU, and it really struggled. It would confidently insist that the code it wrote worked, when all you had to do was run it and look at the output to see a blank screen. | | |
| ▲ | Wowfunhappy 2 days ago | parent [-] | | I hope I’m not committing a faux pas by saying this—and please feel free to tell me that I’m wrong—but I imagine a human who has been blind since birth would also struggle to build 3D graphics code. The Claude models are technically multi-modal, but IME the vision side of the equation is really lacking. As a result, Claude is quite good at reasoning about logic, and it can build e.g. simpler web pages where the underlying html structure is enough to work with, but it’s much worse at tasks that inherently require seeing. | | |
| ▲ | ryandrake 2 days ago | parent [-] | | Yea, for obvious reasons, it seems to be best at code that transforms data: text/binary input to text/binary output. And where the logic can be tracked and verified at runtime with sufficient (text) logging. In other words, it's much better closed-loop than open-loop. I tried to help it by prompting it to please take a screen capture of its output to verify functionality, but it seems LLMs aren't quite ready for that yet. | |
| ▲ | mattarm a day ago | parent [-] | | They work much better off a test that must pass. That they can “see”. Without it they are just making up some other acceptance criteria. |
|
|
| |
| ▲ | antonvs 3 days ago | parent | prev | next [-] | | > It also can't do Rust really well, once you get to the meat of it. Not sure why that is Because types are proofs and require global correctness, you can't just iterate, fix things locally, and wait until it breaks somewhere else that you also have to fix locally. | |
| ▲ | nopakos 3 days ago | parent | prev | next [-] | | I have not tried C++, but Codex did a good job with low-level C code, shaders as well as porting 32 bit to 64 bit assembly drawing routines.
I have also tried it with retro-computing programming with relative success. | |
| ▲ | ivm 3 days ago | parent | prev [-] | | > Mobile From what I've seen, CC has troubles with the latest Swift too, partially because of it being latest and partially because it's so convoluted nowadays. But it's übercharged™ for C# |
|
|
| ▲ | spaceman_2020 3 days ago | parent | prev | next [-] |
| I really think a lot of people tried AI coding earlier, got frustrated at the errors and gave up. That's where the rejection of all these doomer predictions comes from. And I get it. Coding with Claude Code really was prompting something, getting errors, and asking it to fix it. Which was still useful, but I could see why a skilled coder adding a feature to a complex codebase would just give up. Opus 4.5 really is at a new tier, however. It just...works. The errors are far fewer and often very minor - "careless" errors, not fundamental issues (like forgetting to add "use client" to a Next.js client component). |
| |
| ▲ | ryandrake 3 days ago | parent | next [-] | | This was me. I was a huge AI coding detractor on here for a while (you can check my comment history). But, in order to stay informed and not just be that grouchy curmudgeon all the time, I kept up with the models and regularly tried them out. Opus 4.5 is so much better than anything I've tried before, I'm ready to change my mind about AI assistance. I even gave -True Vibe Coding- a whirl. Yesterday, from a blank directory and text file list of requirements, I had Opus 4.5 build an Android TV video player that could read a directory over NFS, show a grid view of movie poster thumbnails, and play the selected video file on the TV. The result wasn't exactly full-featured Kodi, but it works in the emulator and actual device, it has no memory leaks, crashes, ANRs, no performance problems, no network latency bugs or anything. It was pretty astounding. Oh, and I did this all without ever opening a single source file or even looking at the proposed code changes while Opus was doing its thing. I don't even know Kotlin and still don't know it. | | |
| ▲ | mikestorrent 3 days ago | parent | next [-] | | I have a few Go projects now and I speak Go as well as you speak Kotlin. I predict that we'll see some languages really pull ahead of others in the next few years based on their advantages for AI-powered development. For instance, I always respected types, but I'm too lazy to go spend hours working on types when I can just do ruby-style duck typing and get a long ways before the inevitable problems rear their head. Now, I can use a strongly typed language and get the advantages for "free". | | |
| ▲ | gck1 2 days ago | parent | next [-] | | > I predict that we'll see some languages really pull ahead of others in the next few years based on their advantages for AI-powered development. Oh absolutely. I've been using Python for past 15 or so years for everything. I've never written a single line of Rust in my life, and all my new projects are Rust now, even the quick-script-throwaway things, because it's so much better at instantly screaming at claude when it goes off track. It may take it longer to finish what I asked it to do, but requires so much less involvement from me. I will likely never start another new project in python ever. EDIT: Forgot to add that paired with a good linter, this is even more impressive. I told Claude to come up with the most masochistic clippy configuration possible, where even a tiny mistake is instantly punished and exceptions have to be truly exceptional (I have another agent that verifies this each run). I just wish there was cargo-clippy for enforcing architectural patterns. | |
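For anyone curious what that looks like in practice, strict lints can be declared in Cargo.toml's `[lints]` table (Cargo 1.74+). A hedged sketch of a "masochistic" configuration in that spirit; the exact lint selection here is illustrative, not the commenter's actual config:

```toml
[lints.clippy]
# Enable broad groups as warnings, then hard-deny the common footguns.
pedantic = { level = "warn", priority = -1 }
nursery = { level = "warn", priority = -1 }
unwrap_used = "deny"   # no .unwrap() outside tests
expect_used = "deny"   # same for .expect()
panic = "deny"         # no panic!() in library code
todo = "deny"          # unfinished code can't be "done"

[lints.rust]
missing_docs = "warn"
```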
| ▲ | tezza 3 days ago | parent | prev [-] | | And with types, it makes it easier for rounds of agents to pick up mistakes at compile time, statically. Linting and sanity checking untyped languages only goes so far.
I've not seen LLMs one-shot Perl-style regexes, and JavaScript can still have ugly runtime WTFs | |
| ▲ | nl 3 days ago | parent [-] | | I've found this too. I find I'm doing more TypeScript projects than Python because of the superior typing, despite the fact that I prefer Python. |
|
| |
| ▲ | myk9001 3 days ago | parent | prev | next [-] | | Oh, wow, that's impressive, thanks for sharing! Going to one-up you though -- here's a literal one-liner that gets me a polished media center with beautiful interface and powerful skinning engine. It supports Android, BSD, Linux, macOS, iOS, tvOS and Windows. `git clone https://github.com/xbmc/xbmc.git` | | |
| ▲ | ryandrake 3 days ago | parent [-] | | Hah! I actually initiated the project because I'm a long time XBMC/Kodi user. I started using it when it was called XBMC, on an actual Xbox 1. I am sick and tired of its crashing, poor playback performance, and increasingly bloated feature set. It's embarrassing when I have friends or family over for movie night, and I have to explain "Sorry folks, Kodi froze midway through the movie again" while I frantically try to re-launch/reboot my way back to watching the movie. VLC's playback engine is much better but the VLC app's TV UX is ass. This application actually uses the libVLC playback engine under the hood. | | |
| ▲ | apitman 3 days ago | parent | next [-] | | I think anecdotes like this may prove very relevant the next few years. AI might make bad code, but a project of bad code that's still way smaller than a bloated alternative, and has a UX tailored to your exact requirements could be compelling. A big part of the problem with existing software is that humans seem to be pretty much incapable of deciding a project is done and stop adding to it. We treat creating code like a job or hobby instead of a tool. Nothing wrong with that, unless you're advertising it as a tool. | | |
| ▲ | ryandrake 3 days ago | parent [-] | | Yea, after this little experiment, I feel like I can just go through every big, bloated, slow, tech-debt-ridden software I use and replace it with a tiny, bespoke version that does only what I need and no more. The old adage about how "users use 10% of your software's features, but they each use a different 10%" can now be solved by each user just building that 10% for themselves. |
| |
| ▲ | indigodaddy 3 days ago | parent | prev [-] | | Have you tried VidHub? Works nicely against almost anything. Plex, jellyfin, smb/webdav folder etc |
|
| |
| ▲ | ku1ik 3 days ago | parent | prev | next [-] | | How do you know “it has no memory leaks, crashes, ANRs, no performance problems, no network latency bugs or anything” if you built it just yesterday? Isn’t it a bit too early for claims like this? I get it’s easy to bring ideas to life but aren’t we overly optimistic? | | |
| ▲ | missingdays 3 days ago | parent | next [-] | | By tomorrow the app will be replaced with a new version from the other competitor, by that time the memory leak will not reveal itself | |
| ▲ | ryandrake 2 days ago | parent | prev [-] | | Part of the "one day" development time was exhaustively testing it. Since the tool's scope is so small, getting good test coverage was pretty easy. Of course, I'm not guaranteeing through formal verification methods that the code is bug free. I did find bugs, but they were all areas that were poorly specified by me in the requirements. |
| |
| ▲ | rdedev 3 days ago | parent | prev | next [-] | | I decided to vibe code something myself last week at work. I've been wanting to create a POC that involves a coding agent creating custom Bokeh plots that a user can interact with and ask follow-up questions about. All this had to be served using the HoloViews/Panel library. At work I only have access to Claude using the GitHub Copilot integration, so this could be the cause of my problems. Claude was able to get the first iteration up pretty quickly. At that stage the app could create a plot and you could interact with it and ask follow-up questions. Then I asked it to extend the app so that it could generate multiple plots and the user could interact with all of them one at a time. It made a bunch of changes but the feature was never implemented. I asked it to do it again but got the same outcome. I completely accept the fact that it could all just be because I am using VS Code Copilot or my prompting skills are not good, but the LLM got 70% of the way there and then completely failed | |
| ▲ | cebert 3 days ago | parent [-] | | > At work I only have access to calude using the GitHub copilot integration so this could be the cause of my problems. You really need to at least try Claude Code directly instead of using CoPilot. My work gives us access to CoPilot, Claude Code, and Codex. CoPilot isn’t close to the other more agentic products. | | |
| ▲ | debian3 3 days ago | parent [-] | | The VS Code Copilot extension harness is not great, but Opus 4.5 with Copilot CLI works quite well. | |
| ▲ | pluralmonad 2 days ago | parent [-] | | Do they manage context differently or have different system prompts? I would assume a lot of that would be the same between them. I think GH Copilot's biggest shortcoming is that they are too token-cheap. Aggressively managing context to the detriment of the results. Watching Claude read a 500-line file in 100-line chunks just makes me sad. |
|
|
| |
| ▲ | yieldcrv 3 days ago | parent | prev | next [-] | | I recently replaced my monitor with one that could be vertically oriented, because I'm just using Claude Code in the terminal and not looking at file trees at all. But I do want a better way to glance at and keep up with what it's doing in longer conversations, for my own mental context window | |
| ▲ | adastra22 3 days ago | parent [-] | | Ah, but you're at the beginning stage, young grasshopper. Soon you will be missing that horizontal ultrawide monitor as you spin up 8 different Claude agents in parallel sessions. | |
| ▲ | yieldcrv 3 days ago | parent [-] | | oh I noticed! I've begun doing that on my laptop. I just started going down my list of side projects one by one, then two by two, a Claude Code instance in a terminal window for each folder. It's a bit mental. I'm finding that branding and graphic design are the most arduous part, which I'm hoping to accelerate soon. I'm heavily AI-assisted there too and I'm evaluating MCP servers to help, but so far I do actually have to focus on just that part as opposed to babysitting |
|
| |
| ▲ | libraryofbabel 3 days ago | parent | prev | next [-] | | Thanks for posting this. It's a nice reminder that despite all the noise from hype-mongers and skeptics in the past few years, most of us here are just trying to figure this all out with an open mind and are ready to change our opinions when the facts change. And a lot of people in the industry that I respect on HN or elsewhere have changed their minds about this stuff in the last year, having previously been quite justifiably skeptical. We're not in 2023 anymore. If you were someone saying at the start of 2025 "this is a flash in the pan and a bunch of hype, it's not going to fundamentally change how we write code", that was still a reasonable belief to hold back then. At the start of 2026 that position is basically untenable: it's just burying your head in the sand and wishing for AI to go away. If you're someone who still holds it you really really need to download Claude Code and set it to Opus and start trying it with an open mind: I don't know what else to tell you. So now the question has shifted from whether this is going to transform our profession (it is), to how exactly it's going to play out. I personally don't think we will be replacing human engineers anytime soon ("coders", maybe!), but I'm prepared to change my mind on that too if the facts change. We'll see. I was a fellow mind-changer, although it was back around the first half of last year when Claude Code was good enough to do things for me in a mature codebase under supervision. It clearly still had a long way to go but it was at that tipping point from "not really useful" to "useful". But Opus 4.5 is something different - I don't feel I have to keep pulling it back on track in quite the way I used to with Sonnet 3.7, 4, even Sonnet 4.5. For the record, I still think we're in a bubble. AI companies are overvalued. But that's a separate question from whether this is going to change the software development profession. | | |
| ▲ | arcfour 3 days ago | parent [-] | | The AI bubble is kind of like the dot-com bubble in that it's a revolutionary technology that will certainly be a huge part of the future, but it's still overhyped (i.e. people are investing without regard for logic). | | |
| ▲ | ryandrake 3 days ago | parent | next [-] | | We were enjoying cheap second hand rack mount servers, RAM, hard drives, printers, office chairs and so on for a decade after the original dot com crash. Every company that went out of business liquidated their good shit for pennies. I'm hoping after AI comes back down to earth there will be a new glut of cheap second hand GPUs and RAM to get snapped up. | |
| ▲ | libraryofbabel 3 days ago | parent | prev | next [-] | | Right. And same for railways, which had a huge bubble early on. Over-hyped on the short time horizon. Long term, they were transformative in the end, although most of the early companies and early investors didn’t reap the eventual profits. | |
| ▲ | nl 3 days ago | parent | prev [-] | | But the dot-com bubble wasn't overhyped in retrospect. It was under-hyped. | |
| ▲ | arcfour 3 days ago | parent [-] | | At the time it was overhyped because just by adding .com to your company's name you could increase your valuation regardless of whether or not you had anything to do with the internet. Is that not stupid? I think my comparison is apt; being a bubble and a truly society-altering technology are not mutually exclusive, and by virtue of it being a bubble, it is overhyped. | | |
| ▲ | retsibsi 3 days ago | parent [-] | | There was definitely a lot of stupid stuff happening. IMO the clearest accurate way to put it is that it was overhyped for the short term (hence the crazy high valuations for obvious bullshit), and underhyped for the long term (in the sense that we didn't really foresee how broadly and deeply it would change the world). Of course, there's more nuance to it, because some people had wild long-term predictions too. But I think the overall, mainstream vibe was to underappreciate how big a deal it was. |
|
|
|
| |
| ▲ | fpauser 3 days ago | parent | prev | next [-] | | > Oh, and I did this all without ever opening a single source file or even looking at the proposed code changes while Opus was doing its thing. I don't even know Kotlin and still don't know it. ... says it all. | | |
| ▲ | jononor 2 days ago | parent [-] | | What exactly does it say, in your opinion? I can imagine 4-5 different takes on that post. |
| |
| ▲ | sksishbs 3 days ago | parent | prev [-] | | [dead] |
| |
| ▲ | theshrike79 3 days ago | parent | prev | next [-] | | > "asking it to fix it." This is what people are still doing wrong. Tools in a loop, people, tools in a loop. The agent has to have the tools to detect that whatever it just created is producing errors during linting/testing/running. When it can do that, it can loop: fix the error and, again, use the tools to see whether it worked. I _still_ encounter people who think "AI programming" is pasting stuff into ChatGPT in the browser, and they complain it hallucinates functions and produces invalid code. Well, d'oh. | |
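Roughly, a "tools in a loop" agent boils down to something like the following minimal sketch. This is illustrative only: ask_model and apply_patch are hypothetical stand-ins for whatever model API and edit mechanism you use, and ruff/pytest are just example tools.

    # Minimal "tools in a loop" sketch. ask_model() and apply_patch() are
    # hypothetical callables supplied by the caller; ruff and pytest are
    # example verification tools.
    import subprocess

    def run(cmd):
        """Run a real tool (linter, tests) and capture its output."""
        result = subprocess.run(cmd, capture_output=True, text=True)
        return result.returncode, result.stdout + result.stderr

    def agent_loop(task, ask_model, apply_patch, max_rounds=5):
        feedback = ""
        for _ in range(max_rounds):
            # The model proposes an edit given the task and the latest tool output.
            patch = ask_model(f"Task: {task}\n\nTool output from last attempt:\n{feedback}")
            apply_patch(patch)
            # The crucial part: verify with real tools instead of trusting the output.
            lint_rc, lint_out = run(["ruff", "check", "."])
            test_rc, test_out = run(["pytest", "-q"])
            if lint_rc == 0 and test_rc == 0:
                return True  # everything is green, stop looping
            feedback = lint_out + "\n" + test_out  # feed the errors back and try again
        return False

The point of the loop is that the agent never has to "hallucinate whether it worked": each round ends with the same linters and tests a human would run.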
| ▲ | ikornaselur 2 days ago | parent | next [-] | | Last weekend I was debugging some blocking issue on a microcontroller with embassy-rs, where the whole microcontroller would lock up as soon as I started trying to connect to an MQTT server. I was having Opus investigate it and I kept building and deploying the firmware for testing... then I figured I'd just explain how it could do the same and pull the logs. Off it went: for the next ~15 minutes it flashed the firmware multiple times until it figured out the issue and fixed it. There was something so interesting about seeing a microcontroller on the desk being flashed by Claude Code, with LEDs blinking to indicate failure states. There's something about it not being just code on your laptop that made it feel different. But I agree, absolutely: set up a red/green test or some other way of validating (linting, testing, whatever it is) and explain the end-to-end loop, and then the agent is able to work much faster without being blocked by you multiple times along the way. | |
| ▲ | gck1 2 days ago | parent | prev | next [-] | | This is kind of why I'm not really scared of losing my job. While Claude is amazing at writing code, it still requires human operators. And even experienced human operators are bad at operating this machinery. Tell your average Joe - the one who thinks they can create software without engineers - what "tools-in-a-loop" means, and they'll make the same face they made when you tried explaining iterators to them, before LLMs. Explain to them how a type system, E2E or integration tests help the agent, and suddenly they have to learn all the things they would have needed to learn to write it on their own. | |
| ▲ | nprateem 2 days ago | parent | prev [-] | | Jules is slow incompetent shit and that uses tools in a loop, so no... |
| |
| ▲ | ern 3 days ago | parent | prev | next [-] | | I have been out of the loop for a couple of months (vacation). I tried Claude Opus 4.5 at the end of November 2025 with the corporate Github Copilot subscription in Agent mode and it was awful: basically ignoring code and hallucinating. My team is using it with Claude Code and say it works brilliantly, so I'll be giving it another go. How much of the value comes from Opus 4.5, how much comes from Claude Code, and how much comes from the combination? | | |
| ▲ | everfrustrated 3 days ago | parent | next [-] | | As someone coming from GitHub Copilot in VS Code and recently trying the Claude Code plugin for VS Code, I don't get the fuss about Claude. Copilot has by far the best and most intuitive agent UI. Just make sure you're in agent mode and choose the Sonnet or Opus models. I've just cancelled my Claude sub, gone back, and will upgrade to GH Pro+ to get more Sonnet/Opus. | |
| ▲ | pluralmonad 2 days ago | parent | next [-] | | I strongly concur with your second statement. Anything other than agent mode in GH copilot feels useless to me. If I want to engage Opus through GH copilot for planning work, I still use agent mode and just indicate the desired output is whatever.md. I obviously only do this in environments lacking a better tool (Claude Code). | |
| ▲ | indigodaddy 3 days ago | parent | prev | next [-] | | Check out Antigravity + the Google AI Pro $20 plan + Opus 4.5. Apparently the Opus limits are insanely generous (of course that could change on a dime). | |
| ▲ | ern 3 days ago | parent | prev [-] | | I'd used both CC and Copilot Agent Mode in VSCode, but not the combination of CC + Opus 4.5, and I agree, I was happy enough with Copilot. The gap didn't seem big, but in November (which admittedly was when Opus 4.5 was in preview on Copilot) Opus 4.5 with Copilot was awful. |
| |
| ▲ | Dusseldorf 3 days ago | parent | prev [-] | | I suspect that's the other thing at play here; many people have only tried Copilot because it's cheap with all the other Microsoft subscriptions many companies have. Copilot frankly is garbage compared to Cursor/Claude, even with the same exact models. |
| |
| ▲ | AstroBen 3 days ago | parent | prev | next [-] | | For a long time now my issue hasn't been whether the code they write works or doesn't work. My issues all stem from the fact that it works, but does the wrong thing | |
| ▲ | zmmmmm 3 days ago | parent | next [-] | | > My issues all stem from the fact that it works, but does the wrong thing It's an opportunity, not a problem. Because it means there's a gap in your specifications, and then in your tests. I use Aider rather than Claude Code, but I run it with Anthropic models. And what I've found is that comprehensively writing up the documentation for a feature, spec-style, before starting eliminates a huge amount of what you're referring to. It serves a triple purpose: (a) you get the documentation, (b) you guide the AI, and (c) it's surprising how often this helps to refine the feature itself. Sometimes I invoke the AI to help me write the spec as well, asking it to prompt for areas where clarification is needed, etc. | |
| ▲ | giancarlostoro 3 days ago | parent [-] | | This is how Beads works, especially with Claude Code. What I do is tell Claude to always create a Bead when I ask it to add something, or when there's something that needs to be added. Then I start brainstorming, and even ask it to do market research on what top apps are doing for x, y or z. Then I ask it to update the bead (I call them tasks), and finally, when it's got enough detail, I tell it to do all of these in parallel. | |
| ▲ | beoberha 3 days ago | parent [-] | | Beads is amazing. It’s such a simple concept but elevates agentic coding to another level |
|
| |
| ▲ | simonw 3 days ago | parent | prev | next [-] | | If it does the wrong thing you tell it what the right thing is and have it try again. With the latest models if you're clear enough with your requirements you'll usually find it does the right thing on the first try. | | |
| ▲ | GoatInGrey 3 days ago | parent | next [-] | | There are several rubs with that operating protocol extending beyond the "you're holding it wrong" claim. 1) There exists a threshold, only identifiable in retrospect, past which it would have been faster to locate or write the code yourself than to navigate the LLM's correction loop or otherwise ensure one-shot success. 2) The intuition and motivations of LLMs derive from a latent space that the LLM cannot actually access. I cannot get a reliable answer on why the LLM chose the approaches it did; it can only retroactively confabulate. Unlike human developers who can recall off-hand, or at least review associated tickets and meeting notes to jog their memory. The LLM prompter always documenting sufficiently to bridge this LLM provenance gap hits rub #1. 3) Gradually building prompt dependency where one's ability to take over from the LLM declines and one can no longer answer questions or develop at the same velocity themselves. 4) My development costs increasingly being determined by the AI labs and hardware vendors they partner with. Particularly when the former will need to increase prices dramatically over the coming years to break even with even 2025 economics. | | |
| ▲ | simonw 3 days ago | parent | next [-] | | The value I'm getting from this stuff is so large that I'll take those risks, personally. | | |
| ▲ | th0ma5 3 days ago | parent [-] | | Glad you found a way to be unfalsifiable! Lol | | |
| ▲ | scubbo 3 days ago | parent [-] | | Many people - simonw is the most visible of them, but there are countless others - have given up trying to convince folks who are determined not to be convinced, and are simply enjoying their increased productivity. This is not a competition or an argument. | |
| ▲ | llmslave2 3 days ago | parent [-] | | Maybe they are struggling to convince others because they are unable to produce evidence that actually convinces people? My experience scrolling X and HN is a bunch of people going "omg opus omg Claude Code I'm 10x more productive" and that's it. Just hand-wavy anecdotes based on their own perceived productivity. I'm open to being convinced, but just saying stuff is not convincing. It's the opposite; it feels like people have been put under a spell. I'm following The Primeagen, who's doing a series where he tries these tools on stream and follows people's advice on how best to use them. He's actually quite a good programmer so I'm eager to see how it goes. So far he isn't impressed and thus neither am I. If he cracks it and unlocks significant productivity then I will be convinced. | |
| ▲ | enraged_camel 3 days ago | parent [-] | | >> Maybe they are struggling to convince others because they are unable to produce evidence that is able to convince people? Simon has produced plenty of evidence over the past year. You can check their submission history and their blog: https://simonwillison.net/ The problem with people asking for evidence is that there's no level of evidence that will convince them. They will say things like "that's great but this is not a novel problem so obviously the AI did well" or "the AI worked only because this is a greenfield project, it fails miserably in large codebases". | | |
| ▲ | llmslave2 3 days ago | parent [-] | | It's true that some people will just continually move the goalposts because they are invested in their beliefs. But that doesn't mean that the skepticism around certain claims isn't relevant. Nobody serious is disputing that LLMs can generate working code. They dispute claims like "Agentic workflows will replace software developers in the short to medium term", or "Agentic workflows lead to 2-100x improvements in productivity across the board". This is what people are looking for in terms of evidence, and there just isn't any. Thus far, we do have evidence that AI (at least in OSS) produces a 19% decrease in productivity [0]. We also have evidence that it harms our cognitive abilities [1]. Anecdotally, I have found myself lazily reaching for LLM assistance when encountering a difficult problem instead of thinking deeply about the problem. Anecdotally, I also struggle to be more productive using AI-centric agent workflows in my areas of expertise. We want evidence that "vibe engineering" is actually more productive across the entire lifespan of a software project. We want evidence that it produces better outcomes. Nobody has yet shown that. It's just people claiming that because they vibe coded some trivial project, all of software development can benefit from this approach. Recently a principal engineer at Google claimed that Claude Code wrote their team's entire year's worth of work in a single afternoon. They later walked that claim back, but most do not. I'm more than happy to be convinced, but it's becoming extremely tiring to hear the same claims being parroted without evidence and then to get called a luddite when you question it. It's also tiring when you push them on it and they blame it on the model you use, and then the agent, and then the way you handle context, and then the prompts, and then "skill issue". Meanwhile all they have to show is some slop that could be hand coded in a couple of hours by someone familiar with the domain. I use AI, and I was pretty bullish on it for the last two years, but between it simply not living up to expectations, the constant barrage of what feels like a stealth marketing campaign parroting the same thing over and over ("the new model is way better, unlike the other times we said that"), the amount of absolute slop code that seems to continue to increase, and companies like Microsoft producing worse and worse software as they shoehorn AI into every single product (Office was renamed to Copilot 365), I've become very sensitive to it, much in the same way I was very sensitive to the claims being made by certain VC-backed webdev companies regarding their product + framework in the last few years. I'm not even going to bring up the economic, social, and environmental issues because I don't think they're relevant, but they do contribute to my annoyance with this stuff. [0] https://metr.org/blog/2025-07-10-early-2025-ai-experienced-o...
[1] https://news.harvard.edu/gazette/story/2025/11/is-ai-dulling... | | |
| ▲ | lunar_mycroft 3 days ago | parent [-] | | > Thus far, we do have evidence that AI (at least in OSS) produces a 19% decrease in productivity I generally agree with you, but I'd be remiss if I didn't point out that it's plausible that the slowdown observed in the METR study was at least partially due to the subjects' lack of experience with LLMs. Someone with more experience performed the same experiment on themselves, and couldn't find a significant difference between using LLMs and not [0]. I think the more important point here is that programmers' subjective assessment of how much LLMs help them is not reliable, and is biased towards the LLMs. [0] https://mikelovesrobots.substack.com/p/wheres-the-shovelware... | |
| ▲ | llmslave2 3 days ago | parent [-] | | I think we're on the same page re. that study. Actually, your link made me think about the ongoing debate around IDEs vs stuff like Vim. Some people swear by IDEs and insist they drastically improve their productivity; others dismiss them or even claim they make them less productive. Sound familiar? I think it's possible these AI tools are simply another way to type code, and the differences, averaged out, end up being a wash. | |
| ▲ | AstroBen 2 days ago | parent [-] | | IDEs vs Vim makes a lot of sense. AI really does feel like using an IDE in a certain way. Using AI absolutely makes it feel like I'm more productive. But when I look back at the end of the day at what I got done, it would be ludicrous to say it was multiple times my pre-AI output. Despite all the people replying to me saying "you're holding it wrong", I know the fix to it doing the wrong thing: specify in more detail what I want. The problem with that is twofold: 1. How much to specify? As little as possible is the ideal, if we want to maximize how much it can help us. A balance here is key. If I need to detail every minute thing I may as well write the code myself. 2. If I get this step wrong, I still have to review everything, rethink it, go back and re-prompt, costing time. When I'm working on production code, I have to understand it all to confidently commit. It costs time for me to go over everything, sometimes over multiple iterations. Sometimes the AI uses things I don't know about and I need to dig in to understand them. AI is currently writing 90% of my code. Quality is fine. It's fun! It's magical when it nails something one-shot. I'm just not confident it's faster overall | |
| ▲ | llmslave2 2 days ago | parent [-] | | I think this is an extremely honest perspective. It's actually kind of cool that it's gotten to the point it can write most code - albeit with a lot of handholding. |
|
|
|
|
|
|
|
|
| |
| ▲ | theshrike79 3 days ago | parent | prev | next [-] | | I've said this multiple times: this is why you should use this AI bubble (it IS a bubble) to get the VC-funded AI models at dirt-cheap prices and CREATE tools for yourself. Need a very specific linter? AI can do it. Need a complex Roslyn analyser? AI. Any kind of scripting or automation that you run on your own machine? AI. None of that will go away or suddenly stop working when the bubble bursts. Within just the last 6 months I've built so many little utilities to speed up my work (and personal life) it's completely bonkers. Most went from "hmm, might be cool to..." to a good-enough script/program in an evening while doing chores. Even better, start getting a feel for local models. Current-gen home hardware is getting good enough and the local models smart enough that you can, with the correct tooling, use them for surprisingly many things. | |
| ▲ | MarsIronPI 2 days ago | parent | next [-] | | > Even better, start getting the feel for local models. Current gen home hardware is getting good enough and the local models smart enough so you can, with the correct tooling, use them for suprisingly many things. Are there any local models that are at least somewhat comparable to the latest-and-greatest (e.g. Opus 4.5, Gemini 3), especially in terms of coding? | |
| ▲ | lunar_mycroft 3 days ago | parent | prev [-] | | A risk I see with this approach is that when the bubble pops, you'll be left dependent on a bunch of tools which you don't know how to maintain or replace on your own, and won't have/be able to afford access to LLMs to do it for you. | | |
| ▲ | theshrike79 3 days ago | parent | next [-] | | The "tools" in this context are literally a few hundred lines of Python or Github CI build pipeline, we're not talking about 500kLOC massive applications. I'm building tools, not complete factories :) The AI builds me a better hammer specifically for the nails I'm nailing 90% of the time. Even if the AI goes away, I still know how the custom hammer works. | |
| ▲ | AstroBen 3 days ago | parent | prev | next [-] | | I thought that initially, but I don't think the skills AI weakens in me are particularly valuable. Let's say AI becomes too expensive - I more or less only have to sharpen up being able to write the language: my active recall of the syntax, common methods and libraries. That's not hard or much of a setback. Maybe this would be a problem if you're purely vibe coding, but I haven't seen that work long term | |
| ▲ | baq 3 days ago | parent | prev [-] | | Open source models hosted by independent providers (or even yourself, which if the bubble pops will be affordable if you manage to pick up hardware on fire sales) are already good enough to explain most code. |
|
| |
| ▲ | kaydub 2 days ago | parent | prev [-] | | > 1) There exists a threshold, only identifiable in retrospect, past which it would have been faster to locate or write the code yourself than to navigate the LLM's correction loop or otherwise ensure one-shot success. I can run multiple agents at once, across multiple code bases (or the same codebase but multiple different branches), doing the same or different things. You absolutely can't keep up with that. Maybe the one singular task you were working on, sure, but the fact that I can work on multiple different things without the same cognitive load will blow you out of the water. > 2) The intuition and motivations of LLMs derive from a latent space that the LLM cannot actually access. I cannot get a reliable answer on why the LLM chose the approaches it did; it can only retroactively confabulate. Unlike human developers who can recall off-hand, or at least review associated tickets and meeting notes to jog their memory. The LLM prompter always documenting sufficiently to bridge this LLM provenance gap hits rub #1. Tell the LLM to document in comments why it did things. Human developers often leave and then people with no knowledge of their codebase or their "whys" are even around to give details. Devs are notoriously terrible about documentation. > 3) Gradually building prompt dependency where one's ability to take over from the LLM declines and one can no longer answer questions or develop at the same velocity themselves. You can't develop at the same velocity, so drop that assumption now. There's all kinds of lower abstractions that you build on top of that you probably can't explain currently. > 4) My development costs increasingly being determined by the AI labs and hardware vendors they partner with. Particularly when the former will need to increase prices dramatically over the coming years to break even with even 2025 economics. You aren't keeping up with the actual economics. This shit is technically profitable, the unprofitable part is the ongoing battle between LLM providers to have the best model. They know software in the past has often been winner takes all so they're all trying to win. |
| |
| ▲ | Capricorn2481 3 days ago | parent | prev | next [-] | | > With the latest models if you're clear enough with your requirements you'll usually find it does the right thing on the first try That's great that this is your experience, but it's not a lot of people's. There are projects where it's just not going to know what to do. I'm working in a web framework that is a Frankenstein-ing of Laravel and October CMS. It's so easy for the agent to get confused because, even when I tell it this is a different framework, it sees things that look like Laravel or October CMS and suggests solutions that only apply to those frameworks. So there are constant made-up methods and loops it gets stuck in. The documentation is terrible; you just have to read the code. Which, despite what people say, Cursor is terrible at, because embeddings are not a real way to read a codebase. | |
| ▲ | simonw 3 days ago | parent [-] | | I'm working mostly in a web framework that's used by me and almost nobody else (the weird little ASGI wrapper buried in Datasette) and I find the coding agents pick it up pretty fast. One trick I use that might work for you as well:
    Clone GitHub.com/simonw/datasette to /tmp, then look at /tmp/docs/datasette for documentation and search the code if you need to.
Try that with your own custom framework and it might unblock things. If your framework is missing documentation, tell Claude Code to write itself some documentation based on what it learns from reading the code! | | |
| ▲ | Capricorn2481 2 days ago | parent [-] | | > I'm working mostly in a web framework that's used by me and almost nobody else (the weird little ASGI wrapper buried in Datasette) and I find the coding agents pick it up pretty fast Potentially because there is no baggage with similar frameworks. I'm sure it would have an easier time with this if it was not spun off from other frameworks. > If your framework is missing documentation tell Claude Code to write itself some documentation based on what it learns from reading the code! If Claude cannot read the code well enough to begin with, and needs supplemental documentation, I certainly don't want it generating the docs from the code. That's just compounding hallucinations on top of each other. | | |
| ▲ | simonw 2 days ago | parent [-] | | Give it a try and see what happens. I find Claude Code is so good at docs that I sometimes investigate a new library by checking out a GitHub repo, deleting the docs/ and README, and having Claude write fresh docs from scratch. |
|
|
| |
| ▲ | aurumque 3 days ago | parent | prev | next [-] | | In a circuitous way, you can rather successfully have one agent write a specification and another one execute the code changes. Claude code has a planning mode that lets you work with the model to create a robust specification that can then be executed, asking the sort of leading questions for which it already seems to know it could make an incorrect assumption. I say 'agent' but I'm really just talking about separate model contexts, nothing fancy. | | |
| ▲ | mikestorrent 3 days ago | parent [-] | | Cursor's planning functionality is very similar and I have found that I can even use "cheap" models like their Composer-1 and get great results in the planning phase, and then turn on Sonnet or Opus to actually produce the plan. 90% of the stuff I need to argue about is during the planning phase, so I save a ton of tokens and rework just making a really good spec. It turns out that Waterfall was always the correct method, it's just really slow ;) | | |
| ▲ | aurumque 2 days ago | parent [-] | | Did you know that software specifications used to be almost entirely flow charts? There is something to be said for that and waterfall. |
|
| |
| ▲ | cadamsdotcom 3 days ago | parent | prev | next [-] | | Even better, have it write code to describe the right thing then run its code against that, taking yourself out of that loop. | |
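For example, the "code that describes the right thing" can be as small as a failing test the agent writes first and then iterates against. A hypothetical sketch (slugify and myproject.text are made-up names, not from the thread):

    # The agent first emits an executable description of the desired behavior,
    # then loops on the implementation until this passes (via the usual
    # tools-in-a-loop verification, e.g. pytest).
    def test_slugify_strips_punctuation_and_lowercases():
        from myproject.text import slugify  # hypothetical module under development
        assert slugify("Hello, World!") == "hello-world"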
| ▲ | giancarlostoro 3 days ago | parent | prev [-] | | And if you've told it too many times to fix it, tell it someone has a gun to your head, for some reason it almost always gets it right this very next time. | | |
| ▲ | dare944 3 days ago | parent [-] | | If you're a developer at the dawn of the AI revolution, there is absolutely a gun to your head. | | |
| ▲ | giancarlostoro 2 days ago | parent [-] | | Yeah, if anyone can truly afford the AI empire. Remember all these "leading" companies are running it at a loss, so most companies paying for it are severely underpaying the cost of it all. We would need an insane technological breakthrough of unlimited memory and power before I start to worry, and at that point, I'll just look for a new career. |
|
|
| |
| ▲ | jmathai 3 days ago | parent | prev | next [-] | | I think it's worth understanding why. Because that's not everyone's experience and there's a chance you could make a change such that you find it extremely useful. There's a lesser chance that you're working on a code base that Claude Code just isn't capable of helping with. | |
| ▲ | solumunus 3 days ago | parent | prev [-] | | Correct it then, and next time craft a more explicit plan. | | |
| ▲ | wubrr 3 days ago | parent [-] | | The more explicit/detailed your plan, the more context it uses up, the less accurate and generally functional it is. Don't get me wrong, it's amazing, but on a complex problem with large enough context it will consistently shit the bed. | | |
| ▲ | rectang 3 days ago | parent | next [-] | | The human still has to manage complexity. A properly modularized and maintainable code base is much easier for the LLM to operate on — but the LLM has difficulty keeping the code base in that state without strong guidance. Putting “Make minimal changes” in my standard prompt helped a lot with the tendency of basically all agents to make too many changes at once. With that addition it became possible to direct the LLM to make something similar to the logical progression of commits I would have made anyway, but now don’t have to work as hard at crafting. Most of the hype merchants avoid the topic of maintainability because they’re playing to non-technical management skeptical of the importance of engineering fundamentals. But everything I’ve experienced so far working with LLMs screams that the fundamentals are more important than ever. | |
| ▲ | solumunus 3 days ago | parent | prev | next [-] | | It usually works well for me. With very big tasks I break the plan into multiple MD files with the relevant context included and work through in individual sessions, updating remaining plans appropriately at the end of each one (usually there will be decision changes or additions during iteration). | |
| ▲ | pigpop 3 days ago | parent | prev [-] | | It takes a very long plan to use up the context, and most of the time the agent doesn't need the whole plan; it just needs what's relevant to the current task. |
|
|
| |
| ▲ | scubbo 3 days ago | parent | prev | next [-] | | This was me. I have done a full 180 over the last 12 months or so, from "they're an interesting idea, and technically impressive, but not practically useful" to "holy shit I can have entire days/weeks where I don't write a single line of code". | |
| ▲ | littlestymaar 3 days ago | parent | prev | next [-] | | > I really think a lof of people tried AI coding earlier, got frustrated at the errors and gave up. That's where the rejection of all these doomer predictions comes from. It's not just the deficiencies of earlier versions, but the mismatch between the praise from AI enthusiasts and the reality. I mean maybe it is really different now and I should definitely try uploading all of my employer's IP on Claude's cloud and see how well it works. But so many people were as hyped by GPT-4 as they are now, despite GPT-4 actually being underwhelming. Too much hype for disappointing results leads to skepticism later on, even when the product has improved. | | |
| ▲ | roadside_picnic 3 days ago | parent | next [-] | | I feel similarly. I'm not against the idea that maybe LLMs have gotten that much better... but I've been told this probably 10 times in the last few years while working with AI daily. The funny part about rapidly changing industries is that, despite the FOMO, there's honestly not much reward to keeping up unless you want to be a consultant. Otherwise, wait and see what sticks. If this summer people are still citing Opus 4.5 as a game-changing moment and have solid, repeatable workflows, then I'll happily change up my workflow. Someone could walk into the LLM space today and wouldn't be at a significant loss for having ignored everything that happened in the last 4 years, beyond learning what has stuck since then. | |
| ▲ | kaydub 2 days ago | parent | next [-] | | > The funny part about rapidly changing industries is that, despite the fomo, there's honestly not any reward to keeping up unless you want to be a consultant. LMAO what??? | | |
| ▲ | roadside_picnic 2 days ago | parent | next [-] | | I've lived through multiple incredibly rapid changes in tech throughout my career, and the lesson I always learned was that there is a lot of wasted energy in keeping up. Two big examples: - The period from the early MVC JavaScript frontends (Backbone.js etc.) to the great React/Angular wars. I completely stepped out of the webdev space during that time. - The rapid expansion of deep learning frameworks, where I did try to keep up (shipped some Lua Torch packages and made minor contributions to Pylearn2). In the first case, missing 5 years of front-end wars had zero impact. After not doing webdev work at all for 5 years I was tasked with shipping a React app. It took me a week to catch up, and everything was deployed in roughly the same time as it would have been by someone who had spent years keeping up with the changes. In the second case, where I did keep up with many of the developing deep learning frameworks, it didn't really confer any advantage. Coworkers who started with PyTorch fresh out of school were just as proficient, if not more so, at building models. Spending energy keeping up offered no value other than feeling "current" at the time. Can you give me a counterexample of where keeping up with a rapidly changing, unstable area has conferred a benefit to you? Most of FOMO is really just fear. Again, unless you're trying to sell yourself specifically as a consultant on the bleeding edge, there's no reason to keep up with all these changes (other than finding it fun). | |
| ▲ | kaydub 2 days ago | parent [-] | | You moved out of webdev for 5 years, not everybody else had that luxury. I'm sure it was beneficial to those people to keep up with webdev technologies. |
| |
| ▲ | recursive 2 days ago | parent | prev [-] | | If everything changes every month, then stuff you learn next month would be obsolete in two months. This is a response to people saying "adapt or be left behind". There's so much thrashing that if you're not interested in the SOTA, you can just wait for everything to calm down and pick it up then. |
| |
| ▲ | baq 3 days ago | parent | prev [-] | | If the trend line holds you’ll be very, very surprised. |
| |
| ▲ | spaceman_2020 3 days ago | parent | prev [-] | | You enter some text and a computer spits out complex answers generated on the spot. Right or wrong - doesn’t matter. You typed in a line of text and now your computer is making 3000-word stories, images, even videos based on it. How are you NOT astounded by that? We used to have NONE of this even 4 years ago! | |
| ▲ | littlestymaar 3 days ago | parent | next [-] | | Of course I'm astounded. But being spectacular and being useful are entirely different things. | | |
| ▲ | spaceman_2020 2 days ago | parent [-] | | If you've found nothing useful about AI so far then the problem is likely you | | |
| ▲ | recursive 2 days ago | parent [-] | | I don't think it's necessarily a problem. And even if you accept that the problem is you, it doesn't exactly provide a "solution". |
|
| |
| ▲ | nprateem 3 days ago | parent | prev [-] | | Because I want correct answers. | | |
| ▲ | Kim_Bruning 2 days ago | parent [-] | | > On two occasions I have been asked, 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?' I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question. -- Charles Babbage |
|
|
| |
| ▲ | troupo 3 days ago | parent | prev | next [-] | | > Opus 4.5 really is at a new tier however. It just...works. Literally tried it yesterday. I didn't see a single difference with whatever model Claude Code was using two months ago. Same crippled context window. Same "I'll read 10 irrelevant lines from a file", same random changes etc. | | |
| ▲ | EMM_386 3 days ago | parent | next [-] | | The context window isn't "crippled". Create a markdown document for your task (or use CLAUDE.md), then put Claude in "plan mode", which allows it to use tool calls to ask questions before it generates the plan. When it finishes one part of the plan, have it create another markdown document - "progress.md" or whatever - with the whole plan and what is completed at that point. Type /clear (the context is now empty) and tell Claude to read the two documents. Repeat until even a massive project is complete - with those 2 markdown documents and no context window issues. | |
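That plan -> implement -> update progress -> clear loop can even be scripted. A rough sketch, assuming Claude Code's non-interactive print mode (`claude -p`); plan.md, progress.md and the phase names are placeholders, not a required layout:

    # Rough sketch of the plan/progress loop described above. Assumes the
    # `claude` CLI with its -p (print / non-interactive) flag; file names and
    # phases are just placeholders.
    import subprocess

    phases = ["Phase 1", "Phase 2", "Phase 3"]  # whatever plan.md defines

    for phase in phases:
        prompt = (
            "Read plan.md and progress.md, then implement " + phase + ". "
            "When it's done, update progress.md with what was completed and "
            "any notes that later phases will need."
        )
        # Each invocation starts from an empty context, like typing /clear by hand;
        # the two markdown files carry all the state between runs.
        subprocess.run(["claude", "-p", prompt], check=True)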
| ▲ | troupo 3 days ago | parent [-] | | > The context window isn't "crippled". ... Proceeds to explain how it's crippled and all the workarounds you have to do to make it less crippled. | | |
| ▲ | EMM_386 2 days ago | parent [-] | | > ... Proceeds to explain how it's crippled and all the workarounds you have to do to make it less crippled. No - that's not what I did. You don't need an extra-long context full of irrelevant tokens. Claude doesn't need to see the code it implemented 40 steps ago in a working method from Phase 1 if it is on Phase 3 and not using that method. It doesn't need reasoning traces for things it already "thought" through. This other information is cluttering, not helpful. It is making signal to noise ratio worse. If Claude needs to know something it did in Phase 1 for Phase 4 it will put a note on it in the living markdown document to simply find it again when it needs it. | | |
| ▲ | troupo 2 days ago | parent [-] | | Again, you're basically explaining how Claude has a very short limited context and you have to implement multiple workarounds to "prevent cluttering". Aka: try to keep context as small as possible, restart context often, try and feed it only small relevant information. What I very succinctly called "crippled context" despite claims that Opus 4.5 is somehow "next tier". It's all the same techniques we've been using for over a year now. | | |
| ▲ | scotty79 2 days ago | parent [-] | | Context is a short term memory. Yours is even more limited and yet somehow you get by. | | |
| ▲ | troupo 2 days ago | parent [-] | | I get by because I also have long-term memory, and experience, and I can learn. LLMs have none of that, and every new session is rebuilding the world anew. And even my short-term memory is significantly larger than the at most 50% of the 200k-token context window that Claude has. It runs out of context while my short-term memory for the same task is probably not even 1% full (and I'm capable of more context-switching in the meantime). And so even the "Opus 4.5 really is at a new tier" runs into the very same limitations all models have been running into since the beginning. | |
| ▲ | scotty79 2 days ago | parent [-] | | > LLMs have none of that, and every new session is rebuilding the world anew. For LLMs, long-term memory is achieved by tooling. Which you discounted in your previous comments. You also overestimate the capacity of your short-term memory by a few orders of magnitude: https://my.clevelandclinic.org/health/articles/short-term-me... | |
| ▲ | troupo 2 days ago | parent [-] | | > For LLMs long term memory is achieved by tooling. Which you discounted in your previous comments. My specific complaint, which is an observable fact about "Opus 4.5 is next tier": it has the same crippled context that degrades the quality of the model as soon as it fills 50%. EMM_386: no-no-no, it's not crippled. All you have to do is keep track across multiple files, clear out context often, feed very specific information not to overflow context. Me: so... it's crippled, and you need multiple workarounds scotty79: After all it's the same as your own short-term memory, and <some unspecified tooling (I guess those same files)> provide long-term memory for LLMs. Me: Your comparison is invalid because I can go have lunch, and come back to the problem at hand and continue where I left off. "Next tier Opus 4.5" will have to be fed the entire world from scratch after a context clear/compact/in a new session. Unless, of course, you meant to say that "next tier Opus model" only has 15-30 second short term memory, and needs to keep multiple notes around like the guy from Memento. Which... makes it crippled. | | |
| ▲ | scotty79 2 days ago | parent [-] | | If you refuse to use what you call workarounds and I call long-term memory, then you end up with the guy from Memento, and regardless of how smart the model is it can end up making the same mistakes. And that's why you can't tell the difference between the smarter and the dumber one while others can. | |
| ▲ | recursive 2 days ago | parent | next [-] | | I think the premise is that if it was the "next tier" than you wouldn't need to use these workarounds. | |
| ▲ | troupo 2 days ago | parent | prev [-] | | > If you refuse to use what you call workarounds Who said I refuse them? I evaluated the claim that Opus is somehow next tier/something different/amazeballs future at face value. It still has all the same issues and needs all the same workarounds as whatever I was using two months ago (I had a bit of a coding hiatus between the beginning of December and now). > then you end up with the guy from Memento, and regardless of how smart the model is Those models are, and keep being, the guy from Memento. Your "long memory" is nothing but notes scribbled everywhere that you have to re-assemble every time. > And that's why you can't tell the difference between the smarter and the dumber one while others can. If it was "next tier smarter" it wouldn't need the exact same workarounds as the "dumber" models. You wouldn't compare the context to 15-30 second short-term memory and need unspecified tools [1] to have "long-term memory". You wouldn't have the model behave in a way indistinguishable from a "dumber" model after half of its context window has been filled. You wouldn't even think about context windows. And yet here we are. [1] For each person these tools will be a different collection of magic incantations. From scattered .md files to slop like Beads to MCP servers providing access to various external storage solutions to custom shell scripts to ... BTW, I still find "superpowers" from https://github.com/obra/superpowers to be the single best improvement to Claude (and other providers), even if it's just another in a long series of magic chants I've evaluated. | |
| ▲ | scotty79 a day ago | parent [-] | | > Those models are, and keep being, the guy from Memento. Your "long memory" is nothing but notes scribbled everywhere that you have to re-assemble every time. That's exactly how long-term memory works in humans as well. The fact that some of these scribbles are done chemically in the same organ that does the processing doesn't make it much better. Human memories are reassembled at recall (often inaccurately). And humans also scribble when they try to solve a problem that exceeds their short-term memory. > If it was "next tier smarter" it wouldn't need the exact same workarounds as the "dumber" models. This is akin to refusing to call a processor next tier because it still needs RAM and a bus to communicate with it, and an SSD as well. You think it should have everything in cache to be worthy of being called next tier. It's fine to have your own standards for applying words. But expect further confusion and miscommunication with other people if you don't intend to realign. | |
| ▲ | troupo a day ago | parent [-] | | > That's exactly how long-term memory works in humans as well. Where this is applicable is when you go away from a problem for a while. And yet I don't lose the entire context and have to rebuild it from scratch when I go for lunch, for example. Models have to rebuild the entire world from scratch for every small task. > This is akin to refusing to call a processor next tier because it still needs RAM and a bus to communicate with it, and an SSD as well. You're so lost in your own metaphor that it makes no sense. > You think it should have everything in cache to be worthy of being called next tier. No, I don't. "Next tier" implies something significantly and observably better. And here you are trying to tell me "if you use all the exact same tools that you have already used before with 'previous tier models' you will see it is somehow next tier". If your "next tier" needs an equator-length list of caveats and all the same tools, it's not next tier, is it? BTW, I'm literally coding with this "next tier" tool with "long memory just like people". After just doing the "plan/execute/write notes" bullshit incantations I had to correct it:
    You're right, I fucked up on all three counts:
    1. FileDetails - I should have WIRED IT UP, not deleted it. It's a useful feature to preview file details before playing. I treated "unused" as "unwanted" instead of "not yet connected".
    2. Worktree not merged - Complete oversight. Did all the work but didn't finish the job.
    3. _spacing - Lazy fix. Should have analyzed why it exists and either used it or removed the layout constraint entirely.
So next tier. So long memory. So person-like. Oh, and within about 10 seconds after that it started compacting the "non-crippled" context window and immediately forgot most of what it had just been doing. So I had to clear out the context and teach it the world from the start again. Edit: And now this amazing next tier model completely ignored that there already exists code to discover network interfaces, and wrote bullshit code calling CLI tools from Rust. So once again it needed to be reminded of this. > It's fine to have your own standards for applying words. But expect further confusion and miscommunication with other people if you don't intend to realign. I mean, just like crypto bros before them, AI bros sure do love to invent their own terminology and their own realities that have nothing to do with anything real and observable. | | |
| ▲ | scotty79 a day ago | parent [-] | | > "You're right, I fucked up on all three counts:" It very well might be that AI tools are not for you, if you are getting such poor results with your methods of approaching them. If you would like to improve your outcomes at some point, ask people who achieve better results for pointers and try them out. Here's a freebie, never tell AI it fucked up. |
|
|
|
|
|
|
|
|
|
|
|
| |
| ▲ | mikestorrent 3 days ago | parent | prev | next [-] | | 200k+ tokens is a pretty big context window if you are feeding it the right context. Editors like Cursor are really good at indexing and curating context for you; perhaps it'd be worth trying something that does that better than Claude CLI does? | | |
| ▲ | troupo 3 days ago | parent [-] | | > a pretty big context window if you are feeding it the right context. Yup. There's some magical "right context" that will fix all the problems. What is that right context? No idea, I guess I need to read yet another 20,000-word post describing magical incantations that you should or shouldn't do in the context. The "Opus 4.5 is something else/next tier/just works" claims, in my mind, mean that I wouldn't need to babysit its every decision, or that it would actually read relevant lines from relevant files, etc. Nope. Exact same behaviors as whatever the previous model was. Oh, and that "200k tokens context window"? It's a lie. The quality quickly degrades as soon as Claude reaches somewhere around 50% of the context window. At 80+% it's nearly indistinguishable from a model from two years ago. (BTW, same for Codex/GPT with its "1 million token window") | |
| ▲ | mikestorrent 2 hours ago | parent | next [-] | | > There's some magical "right context" that will fix all the problems. All I can tell you is that in my own lived experience, I've had some fantastic results from AI, and it comes from telling it "look at this thing here, ok, i want you to chain it to that, please consider this factor, don't forget that... blah blah blah" like how I would have spelled things out to a junior developer, and then it really does stand a really solid chance of turning out what I've asked for. It helps a lot that I know what to ask for; there's no replacing that with AI yet. So, your own situation must fall into one of these coarse buckets: - You're doing something way too hard for AI to have a chance at yet, like real science / engineering at the frontier, not just boring software or infra development - Your prompts aren't specific enough, you're not feeding it context, and you're expecting it to one-shot things perfectly instead of having to spend an afternoon prompting and correcting stuff - You're not actually using and getting better at the tools, so you're just shouting criticisms from the sidelines, perhaps as sour grape because you're not allowed by policy / company can't afford to have you get into it. IDK. I hope it's the first one and you're just doing Really Hard Things, but if you're doing normal software developer stuff and not seeing a productivity advantage, it's a fucking skill issue. | |
| ▲ | theshrike79 3 days ago | parent | prev | next [-] | | It's like working with humans:
    1) define the problem
    2) split the problem into small, independently verifiable tasks
    3) implement the tasks one by one, verifying with tools
With humans, 1) is the spec and 2) is the Jira tasks or whatever. With an LLM, 1) is usually just a markdown file, 2) is a markdown checklist or GitHub issues (which Claude can use with the `gh` cli), and every loop of 3) gets a fresh context, maybe with the spec from step 1 and the relevant task information from step 2. I haven't run into context issues in a LONG time, and if I have, it's usually been either intentional (it's a problem where compacting won't hurt) or an error on my part. | |
| ▲ | troupo 3 days ago | parent [-] | | > every loop of 3 gets a fresh context, maybe the spec from step 1 and the relevant task information from 2 > I haven't ran into context issues in a LONG time Because you've become the reverse centaur :) "a person who is serving as a squishy meat appendage for an uncaring machine." [1] You are very aware of the exact issues I'm talking about, and have trained yourself to do all the mechanical dance moves to avoid them. I do the same dances, that's why I'm pointing out that they are still necessary despite the claims of how model X/Y/Z are "next tier". [1] https://doctorow.medium.com/https-pluralistic-net-2025-12-05... | | |
| ▲ | theshrike79 3 days ago | parent [-] | | Yes and no. I've worked quite a bit with juniors, offshore consultants and just in companies where processes are a bit shit. The exact same method that worked for those happened to also work for LLMs, I didn't have to learn anything new or change much in my workflow. "Fix bug in FoobarComponent" is enough of a bug ticket for the 100x developer in your team with experience with that specific product, but bad for AI, juniors and offshored teams. Thus, giving enough context in each ticket to tell whoever is working on it where to look and a few ideas what might be the root cause and how to fix it is kinda second nature to me. Also my own brain is mostly neurospicy mush, so _I_ need to write the context to the tickets even if I'm the one on it a few weeks from now. Because now-me remembers things, two-weeks-from-now me most likely doesn't. | | |
| ▲ | troupo 2 days ago | parent [-] | | The problem with LLMs (similar to people :) ) is that you never really know what works. I've had Claude one-shot "implement <some complex requirement>" with little additional input, and then completely botch even the smallest bug fix with explicit instructions and context. And vice versa :) |
|
|
| |
| ▲ | CuriouslyC 3 days ago | parent | prev [-] | | I realize your experience has been frustrating. I hope you see that every generation of model and harness is converting more hold-outs. We're still a few years from hard diminishing returns assuming capital keeps flowing (and that's without any major new architectures which are likely) so you should be able to see how this is going to play out. It's in your interest to deal with your frustration and figure out how you can leverage the new tools to stay relevant (to the degree that you want to). Regarding the context window, Claude needs thinking turned up for long context accuracy, it's quite forgetful without thinking. | | |
| ▲ | th0ma5 3 days ago | parent | next [-] | | I think it's important for people who want to write a comment like this to understand how much this sounds like you're in a cult. | | |
| ▲ | mikestorrent 2 hours ago | parent | next [-] | | And, conversely, when we read a comment like yours, it sounds like someone who's afraid of computers, would maybe have decried the bicycle and automobile, and really wishes they could just go live in a cabin in the woods. (And it's fine to do so, just don't mail bombs to us, ok?) | |
| ▲ | CuriouslyC 3 days ago | parent | prev | next [-] | | Personally I'm sympathetic to people who don't want to have to use AI, but I dislike it when they attack my use of AI as a skill issue. I'm quite certain the workplace is going to punish people who don't leverage AI though, and I'm trying to be helpful. | | |
| ▲ | troupo 3 days ago | parent [-] | | > but I dislike it when they attack my use of AI as a skill issue. No one attacked your use of AI. I explained my own experience with the "Claude Opus 4.5 is next tier". You barged in, ignored anything I said, and attacked my skills. > the workplace is going to punish people who don't leverage AI though, and I'm trying to be helpful. So what exactly is helpful in your comments? | | |
| ▲ | CuriouslyC 2 days ago | parent [-] | | The only thing I disagreed with in your post is your objectively incorrect statement regarding Claude's context behavior. Other than that I'm just trying to encourage you to make preparations for something that I don't think you're taking seriously enough yet. No need to get all worked up, it'll only reflect on you. |
|
| |
| ▲ | pigeons 2 days ago | parent | prev [-] | | It certainly sounds unkind, if not cultish. |
| |
| ▲ | troupo 3 days ago | parent | prev [-] | | Note how nothing in your comment addresses anything I said. Except the last sentence that basically confirms what I said. This perfectly illustrates the discourse around AI. As for the snide and patronizing "it's in your interest to stay relevant": 1. I use these tools daily. That's why I don't subscribe to willful wide-eyed gullibility. I know exactly what these tools can and cannot do. The vast majority of "AI skeptics" are the same. 2. In a few years when the world is awash in barely working incomprehensible AI slop my skills will be in great demand. Not because I'm an amazing developer (I'm not), but because I have experience separating wheat from the chaff | | |
| ▲ | CuriouslyC 2 days ago | parent [-] | | The snide and patronizing is your projection. It kinda makes me sad when the discourse is so poisoned that I can't even encourage someone to protect their own future from something that's obviously coming (technical merits aside, purely based on social dynamics). It seems the subject of AI is emotionally charged for you, so I expect friendly/rational discourse is going to be a challenge. I'd say something nice but since you're primed to see me being patronizing... Fuck you? That what you were expecting? | | |
| ▲ | troupo 2 days ago | parent [-] | | > The snide and patronizing is your projection. It's not me who decided to barge in, assume their opponent doesn't use something or doesn't want to use something, and offer unsolicited advice. > It kinda makes me sad when the discourse is so poisoned that I can't even encourage someone to protect their own future from something that's obviously coming See. Again. You're so in love with your "wisdom" that you can't even see what you sound like: snide, patronising, condescending. And completely missing the whole point of what was written. You are literally the person who poisons the discourse. Me: "here are the issues I still experience with what people claim is a 'next tier frontier model'" You: "it's in your interests to figure out how to leverage new tools to stay relevant in the future" Me: ... what the hell are you talking about? I'm using these tools daily. Do you have anything constructive to add to the discourse? > so I expect friendly/rational discourse is going to be a challenge. It's only a challenge to you because you keep being in love with your voice and your voice only. Do you have anything to contribute to the actual rational discourse, or are you just going to attack my character? > I'd say something nice but since you're primed to see me being patronizing... Fuck you? Ah. The famous friendly/rational discourse of "they attack my use of AI" (no one attacked you), "why don't you invest in learning tools to stay relevant in the future" (I literally use these tools daily, do you have anything useful to say?) and "fuck you" (well, same to you). > That what you were expecting? What I was expecting is responses to what I wrote, not you riding in on a high horse. | | |
| ▲ | CuriouslyC 2 days ago | parent | next [-] | | You were the one complaining about how the tools aren't giving you the results you expected. If you're using these tools daily and having a hard time, either you're working on something very different from the bulk of people using the tools and your problems are legitimate, or you aren't and it's a skill issue. If you want to take politeness as being patronizing, I'm happy to stop bothering. My guess is you're not a special snowflake, and you need to "get good" or you're going to end up on unemployment complaining about how unfair life is. I'd have sympathy but you don't seem like a pleasant human being to interact with, so have fun! | |
| ▲ | troupo 2 days ago | parent [-] | | > You were the one complaining about how the tools aren't giving you the results you expected. They are not giving me the results people claim they give. That is distinctly different from not giving the results I want. > If you're using these tools daily and having a hard time, either you're working on something very different from the bulk of people using the tools and your problems are legitimate, or you aren't and it's a skill issue. Indeed. And the rational/friendly discourse that you claim you're having would start with trying to figure that out. Did you? No, you didn't. You immediately assumed your opponent is a clueless idiot who is somehow against AI and is incapable of learning or something. > If you want to take politeness as being patronizing, I'm happy to stop bothering. No. It's not politeness. It's smugness. You literally started your interaction in this thread with a "git gud or else" and even managed to complain later that "you dislike it when they attack your use of AI as a skill issue". While continuously attacking others. > you don't seem like a pleasant human being to interact with Says the person who has contributed nothing to the conversation except his arrogance, smugness, and holier-than-thou attitude, engaged in nothing but personal attacks, complained about non-existent grievances, and, when called out on this behavior, completed his "friendly and rational discourse" with a "fuck you". Well, fuck you, too. Adieu. | |
| |
| ▲ | cindyllm 2 days ago | parent | prev [-] | | [dead] |
|
|
|
|
|
| |
| ▲ | llmslave2 3 days ago | parent | prev | next [-] | | That's because Opus has been out for almost 5 months now lol. It's the same model, so I think people have been vibe coding with a heavy dose of wine this holiday and are now convinced it's the future. | | | |
| ▲ | pluralmonad 2 days ago | parent | prev | next [-] | | I'm not familiar with any form of intelligence that does not suffer from a bloated context. If you want to try and improve your workflow, a good place to start is using sub-agents so individual task implementations do not fill up your top-level agent's context. I used to regularly have to compact and clear, but since using sub-agents for most direct tasks, I hardly do anymore. | |
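A minimal sketch of that sub-agent pattern in plain Python (the `run_model` call is a hypothetical placeholder, not Claude Code's actual API): each sub-agent gets a fresh, task-scoped context, and only a short summary flows back into the parent agent's context.

```python
# Hypothetical illustration of the sub-agent pattern, not any specific tool's API.
# The parent agent never sees the sub-agent's full transcript, only a compact
# summary, so the top-level context stays small.

def run_model(messages: list[dict]) -> str:
    """Placeholder for whatever LLM backend is actually in use."""
    raise NotImplementedError

def run_subagent(task: str, relevant_files: dict[str, str]) -> str:
    # Fresh context: only this task and the files it actually needs.
    prompt = task + "\n\n" + "\n\n".join(
        f"--- {path} ---\n{body}" for path, body in relevant_files.items()
    )
    messages = [
        {"role": "system", "content": "You are a focused implementation agent."},
        {"role": "user", "content": prompt},
    ]
    transcript = run_model(messages)
    # Hand back a bounded summary instead of the whole transcript.
    return transcript[:500]
```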
| ▲ | troupo 2 days ago | parent [-] | | 1. It's a workaround for context limitations 2. It's the same workarounds we've been doing forever 3. It's indistinguishable from "clear context and re-feed the entire world of relevant info from scratch" we've had forever, just slightly more automated That's why I don't understand all the "it's new tier" etc. It's all the same issues with all the same workarounds. |
| |
| ▲ | iwontberude 3 days ago | parent | prev [-] | | I use Sonnet and Opus all the time and the differences are almost negligible |
| |
| ▲ | 2 days ago | parent | prev | next [-] | | [deleted] | |
| ▲ | iwontberude 3 days ago | parent | prev | next [-] | | Opus 4.5 is fucking up just like Sonnet really. I don't know how your use is that much different than mine. | |
| ▲ | biammer 3 days ago | parent | prev | next [-] | | [flagged] | | |
| ▲ | keeda 3 days ago | parent | next [-] | | Actually, I've been saying that even models from 2+ years ago were extremely good, but you needed to "hold them right" to get good results, else you might cut yourself on the sharp edges of the "jagged frontier" (https://www.hbs.edu/faculty/Pages/item.aspx?num=64700) Unfortunately, this often required you to adapt yourself to the tool, which is a big change -- unfeasible for most people and companies. I would say the underlying principle was ensuring a tight, highly relevant context (e.g. choose the "right" task size and load only the relevant files or even code snippets, not the whole codebase; more manual work upfront, but almost guaranteed one-shot results.) With newer models the sharper edges have largely disappeared, so you can hold them pretty much any which way and still get very good results. I'm not sure how much of this is from the improvements in the model itself vs the additional context it gets from the agentic scaffolding. I still maintain that we need to adapt ourselves to this new paradigm to fully leverage AI-assisted coding, and the future of coding will be pretty strange compared to what we're used to. As an example, see Gas Town: https://steve-yegge.medium.com/welcome-to-gas-town-4f25ee16d... | | |
| ▲ | CuriouslyC 3 days ago | parent [-] | | FWIW, Gas Town is strange because Steve is strange (in a good way). It's just the same agent swarm orchestration that most agent frameworks are using, but with quirky marketing. All of that is just based on the SDLC [PM/Architect -> engineer planning group -> engineer -> review -> qa/evaluation] loop most people here should be familiar with. So actually pretty banal, which is probably part of the reason Steve decided to be zany. | | |
| ▲ | keeda 3 days ago | parent [-] | | Ah, gotcha, I am still working through the article, but its detailed focus on all the moving parts under the covers is making it hard to grok the high-level workflow. |
|
| |
| ▲ | QuantumGood 3 days ago | parent | prev | next [-] | | Each failed prediction should lower our confidence in the next "it's finally useful!" claim. But this inductive reasoning breaks down at genuine inflection points. I agree with your framing that measuring should NOT be separated from political issues, but each can be made clear separately (framing it as "training the tools of the oppressor" seems to conflate measuring tool usefulness with politics). | | |
| ▲ | biammer 3 days ago | parent [-] | | [flagged] | | |
| ▲ | mikestorrent 3 days ago | parent | next [-] | | > How is it useful to you that these companies are so valuation hungry that they are moving money into this technology in such a way that people are fearful it could cripple the entire global economy? The creation of entire new classes of profession has always been the result of technological breakthroughs. The automobile did not cripple the economy, even as it ended the buggy-whip barons. > How is it useful to you that this tech is so power hungry that environmental externalities are being further accelerated while regular people's utility costs are raising to cover the increased demand(whether they use the tech to "code" or "manifest art")? There will be advantages to lower-power computing, and lower-cost electricity. Implement carbon taxes and AI companies will follow the market incentive to install their datacentres in places where sustainable power is available for cheap. We'll see China soaring to new heights with their massive solar investment, and America will eventually figure out they have to catch up and cannot do so with coal and gas. > How is it useful to you that this tech is so compute hungry that they are seemingly ending the industry of personal compute to feed this tech's demand? Temporary problem, the demand for personal computing is not going to die in five years, and meanwhile the lucrative markets for producing this equipment will result in many new factories, increasing capacity and eventually lowering prices again. In the meantime, many pundits are suggesting that this may thankfully begin the end of the Electron App Era where a fuckin' chat client thinks it deserves 1GB of RAM. Consider this: why are we using Electron and needing 32GB of RAM on a desktop? Because web developers only knew how to use Javascript and couldn't write a proper desktop app. With AI, desktop frameworks can have a resurgence; why shouldn't I use Go or Rust and write a native app on all platforms now that the cost of doing so is decreasing and the number of people empowered to work with it is increasing? I wrote a nice multithreaded fractal renderer in Rust the other day; I don't know how to multithread, write Rust, and probably can't iterate complex numbers correctly on paper anymore.... > How is it useful to you that this tech is so water hungry that it is emptying drinking water acquifers? This is only a problem in places that have poor water policy, e.g. California (who can all thank the gods that their reservoirs are all now very full from the recent rain). This problem predates datacenters and needs to be solved - for instance, by federalizing and closing down the so-called Wonderful Company and anyone else who uses underhanded tactics to buy up water rights to grow crops that shouldn't be grown there. Come and run your datacenters up in the cold North, you won't even need evaporative cooling for them, just blow a ton of fresh air in.... > How is it useful to you that this tech is being used to manufacture consent? Now you've actually got an argument, and I am on your side on this one. | | | |
| ▲ | ben_w 3 days ago | parent | prev | next [-] | | > If at any point any of these releases were "genuine inflection points" it would be unnecessary to proselytize such. It would be self evident. Much like rain. Agreed. Now, I suggest reading through all of this to note that I am not a fan of tech bros, that I do want this to be a bubble. Then also note what else I'm saying despite all that. To me, it is self-evident. The various projects I have created by simply asking for them are so. I have looked at the source code they produce, and how this has changed over time: Last year I was describing them as "junior" coders, by which I meant "fresh hire"; now, even with the same title, I would say "someone who is just about to stop being a junior". > "The oppressed need to acknowledge that their oppression is useful to their oppressors." The capacity for AI to oppress you is in direct relation to its economic value. > How is it useful to you that this tech is so power hungry that environmental externalities are being further accelerated while regular people's utility costs are raising to cover the increased demand(whether they use the tech to "code" or "manifest art")? The power hunger is in direct proportion to the demand. Someone burning USD 20 to get Claude Code tokens has consumed approximately USD 10 of electricity in that period, with the other USD 10 having been spread between repaying the model training cost and the server construction cost. The reason they're willing to spend USD 20 is to save at least USD 20 worth of dev time. This was already the case with the initial version of ChatGPT pro back in the day, when it could justify that by saving 23 dev minutes per month. There are around a million developers in the USA; just that group increasing electricity spending by USD 10/month will put a massive dent in the USA's power grid. Gets worse though. Based on my experience, using Claude Code optimally, when you spend USD 20 you get at least 10 junior sprints' worth of output. Hiring a junior for 10 sprints is, what, USD 30,000? The bound here is "are you able to get value from having hired 1,500 juniors for the price of one?" One can of course also waste those tokens. Both because nobody needs slop, and because most people can't manage one junior never mind 1500 of them. However, if the economy collectively answers "yes", then the environmental externalities expand until you can't afford to keep your fridge cold or your lights on. This is one of the failure modes of the technological singularity that people like me have been forewarning about for years, even when there are no alignment issues within the models themselves. Which there are, because Musk's one went and called itself Mecha Hitler, while being so sycophantic about Musk himself that it called him the best at everything even when the thing was "drinking piss", which would be extremely funny if he wasn't selling this to the US military. > How is it useful to you that this tech is so compute hungry that they are seemingly ending the industry of personal compute to feed this tech's demand? This will pass. Either this is a bubble, it pops, the manufacturers return to their roots; or it isn't because it works as advertised, which means it leads to much higher growth rates, and we (us, personally, you and me) get personal McKendree cylinders each with more compute than currently exists… or we get turned into the raw materials for those cylinders. I assume the former. But I say that as one who wants it to be the former.
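To spell out the arithmetic above, using the comment's own rough figures (USD 30,000 for 10 junior sprints and USD 20 of token spend are the comment's assumptions, not measurements):

```latex
\frac{\text{cost of 10 junior sprints}}{\text{token spend}} \approx \frac{\$30{,}000}{\$20} = 1{,}500
```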
> How is it useful to you that this tech is so water hungry that it is emptying drinking water aquifers? Is it what's emptying drinking water aquifers? The combined water usage of all data centers in Arizona. All of them. Together. Which is over 100 DCs. All of them combined use about double what Tesla was expecting just the Brandenburg Gigafactory to use before Musk decided to burn his reputation with EV consumers and Europeans for political point scoring. > How is it useful to you that this tech is being used to manufacture consent? This is one of the objectively bad things, though it's hard to say if this is more or less competent at this than all the other stuff we had three years ago, given the observed issues with the algorithmic feeds. | | |
| ▲ | biammer 3 days ago | parent [-] | | I appreciate you taking the time to write up your thoughts on something other than exclusively these tools' 'usefulness' at writing code. > The capacity for AI to oppress you is in direct relation to its economic value. I think this assumes a level of rationality in these systems, corporate interests and global markets, that I would push back on as being largely absent. > The power hunger is in direct proportion to the demand. Do you think this is entirely the case? I mean, I understand what you are saying, but I would draw stark lines between "company" demand and "user" demand. I have found many times the 'AI' tools are being thrust into nearly everything regardless of user demand. Spinning its wheels to only ultimately cause frustration. [0] > Is it what's emptying drinking water aquifers? It appears this is a problem, and will only continue to be such. [1] > The combined water usage of all data centers in Arizona. All of them. Together. Which is over 100 DCs. All of them combined use about double what Tesla was expecting just the Brandenburg Gigafactory to use before Musk decided to burn his reputation with EV consumers and Europeans for political point scoring. I am unsure if I am getting what your statements here are trying to say. Would you be able to restate this to be more explicit about what you are trying to communicate? [0] https://news.ycombinator.com/item?id=46493506 [1] https://www.forbes.com/sites/cindygordon/2024/02/25/ai-is-ac... | | |
| ▲ | ben_w 3 days ago | parent [-] | | > I think this assumes a level of rationality in these systems, corporate interests and global markets, that I would push back on as being largely absent. Could be. What I hope and suspect is happening is that these companies are taking a real observation (the economic value that I also observe in software) and falsely expanding this to other domains. Even to the extent that these work, AI has clearly been over-sold in humanoid robotics and self-driving systems, for example. > Do you think this is entirely the case? I mean, I understand what you are saying, but I would draw stark lines between "company" demand and "user" demand. I have found many times the 'AI' tools are being thrust into nearly everything regardless of user demand. Spinning its wheels to only ultimately cause frustration. [0] I think it is. Companies setting silly goals like everyone must use LLMs once a day or whatever, that won't burn a lot of tokens. Claude Code is available in both subscription mode and PAYG mode, and the cost of subscriptions suggests it is burning millions of tokens a month for the basic subscription. Other heavy users, who we would both agree are bad, are slop content farms. I cannot even guesstimate those, so would be willing to accept the possibility they're huge. > It appears this is a problem, and will only continue to be such. [1] I find no reference to "aquifers" in that. Where it says e.g. "up to 9 liters of water to evaporate per kWh of energy used", the average is 1.9 l/kWh. Also, evaporated water tends to fall nearby (on this scale) as rain, so unless there's now too much water on the surface, this isn't a net change even if it all comes from an aquifer (and I have yet to see any evidence of DCs going for that water source). It says "The U.S. relies on water-intensive thermoelectric plants for electricity, indirectly increasing data centers' water footprint, with an average of 43.8L/kWh withdrawn for power generation." - most water withdrawn is returned, not consumed. It says "Already AI's projected water usage could hit 6.6 billion m³ by 2027, signaling a need to tackle its water footprint.", and this is less than what the famously-a-desert Arizona already uses. > I am unsure if I am getting what your statements here are trying to say. Would you be able to restate this to be more explicit about what you are trying to communicate? That the water consumption of data centres is much much smaller than the media would have you believe. It's more of a convenient scare story than a reality. If water is your principal concern, give up beef, dairy, cotton, rice, almonds, soy, biofuels, mining, paper, steel, cement, residential lawns, soft drinks, car washing, and hospitals, in approximately that order (assuming the lists I'm reading those from are not invented whole cloth), before you get to data centres. And again, I don't disagree that they're a problem, it's just that the "water" part of the problem is so low down the list of things to worry about as to be a rounding error. | | |
| ▲ | biammer 3 days ago | parent [-] | | > I find no reference to "aquifers" in that. Ahh, I see your objection now. That is my bad. I was using my language too loosely. Here I was using 'aquifer' to mean 'any source of drinking water', but that is certainly different from the intended meaning. > And again, I don't disagree that they're a problem, it's just that the "water" part of the problem is so low down the list of things to worry about as to be a rounding error. I'm skeptical of the rounding error argument, and wary of relying on the logical framework of 'low down the list' when list items' effects stack interdependently. > give up beef, dairy, cotton, rice, almonds, soy, biofuels, mining, paper, steel, cement, residential lawns, soft drinks, car washing, and hospitals In part due to this reason, as well as others, I have stopped directly supporting the industries for: beef, dairy, rice, almonds, soy, biofuels, residential lawns, soft drinks, car washing |
|
|
| |
| ▲ | QuantumGood 3 days ago | parent | prev [-] | | The hype curve is a problem, but it's difficult to prevent. I myself have never made such a prediction. Though it now seems that the money and effort to create working coding tools is near an inflection point. "It would be self evident." History shows the opposite at inflection points. The "self evident" stage typically comes much later. |
|
| |
| ▲ | spaceman_2020 3 days ago | parent | prev | next [-] | | It's a little weird how defensive people are about these tools. Did everyone really think being able to import a few npm packages, string together a few APIs, and run npx create-react-app was something a large number of people could do forever? The vast majority of coders in employment barely write anything more complex than basic CRUD apps. These jobs were always going to be automated or abstracted away sooner or later. Every profession changes. Saying that these new tools are useless or won't impact you/xyz devs is just ignoring a repeated historical pattern | | |
| ▲ | stefan_ 3 days ago | parent | next [-] | | They already made the thing that "abstracted away the CRUD app": it's called Salesforce. How's that going? | | |
| ▲ | simonw 3 days ago | parent [-] | | It's employing so many people who specialize in Salesforce configuration that every year San Francisco collapses under the weight of 50,000+ of them attending Dreamforce. And it's actually kind of amazing, because a lot of people who earn six figures programming Salesforce came to it from a non-traditional software engineering background. |
| |
| ▲ | mikestorrent 3 days ago | parent | prev | next [-] | | I think perhaps for some folks we're looking at their first professional paradigm shift. If you're a bit older, you've seen (smaller versions of) the same thing happening before as e.g. the Internet gained traction, Web2.0, ecommerce, crypto, etc. and have seen your past skillset become useless as now it can be accomplished for only $10/mo/user.... either you pivot and move on somehow, or you become a curmudgeon. Truly, the latter is optional, and at any point when you find yourself doing that you wish to stop and just embrace the new thing, you're still more than welcome to do so. AI is only going to get EASIER to get involved with, not harder. | | |
| ▲ | wiml 3 days ago | parent | next [-] | | And by the same token (ha) for some folks we're looking at their first hype wave. If you're a bit older, you've seen similar things like 4GLs and visual programming languages and blockchain and expert systems. They each left their mark on our profession but most of their promises were unfounded and ultimately unrealized. | | |
| ▲ | mikestorrent 2 hours ago | parent [-] | | I like a lot of 4GL ideas. Closest I've come was working on ServiceNow which is sort of a really powerful system with ugly, ugly roots but the idea of your code being the database being the code really resonated with me, as a self-taught programmer. Similarly, Lisp's homoiconicity makes sense to me as a wonderfully aesthetic idea. I remember generating strings-of-text that were code, but still just text, and wishing that I could trivially step into the structure there like it was a map/dict... without realizing that that's what an AST is and what the language compiler / runtime is already always doing. |
| |
| ▲ | troupo 3 days ago | parent | prev [-] | | Lol. In a few years when the world is awash in AI-generated slop [1] my "past skills" will not only be relevant, they will be actively sought after. [1] Like the recent "Gas Town" and "Beads" that people keep mentioning in the comments that require extensive scripts/human intervention to purge from the system: https://news.ycombinator.com/item?id=46510121 | | |
| ▲ | mikestorrent 2 hours ago | parent [-] | | I'm probably the same age as you, and similarly counting on past skills - it's what lets me use AI to produce things that aren't slop. |
|
| |
| ▲ | idiotsecant 3 days ago | parent | prev | next [-] | | Agreed, it always seemed a little crazy that you could make wild amounts of money to just write software. I think the music is finally stopping and we'll all have to go back to actually knowing how to do something useful. | |
| ▲ | ben_w 3 days ago | parent | prev [-] | | > The vast majority of coders in employment barely write anything more complex than basic CRUD apps. These jobs were always going to be automated or abstracted away sooner or later. My experience has been negative progress in this field. On iOS, UIKit in Interface Builder is an order of magnitude faster to write and to debug, with fewer weird edge cases, than SwiftUI was last summer. I say last summer because I've been less and less interested in iOS the more I learn about liquid glass, even ignoring the whole "aaaaaaa" factor of "has AI made front end irrelevant anyway?" and "can someone please suggest something the AI really can't do so I can get a job in that?" | |
| ▲ | marcosdumay 3 days ago | parent [-] | | The 80s TUI frameworks are still not beaten in developer productivity by GUI or web frameworks. They have been beaten by GUIs in usability, but then the GUIs reverted into a worse option. Too bad they were mostly proprietary and won't even run on modern hardware. |
|
| |
| ▲ | square_usual 3 days ago | parent | prev | next [-] | | You're free to not open these threads, you know! | |
| ▲ | Workaccount2 3 days ago | parent | prev | next [-] | | Democratizing coding so regular people can get the most out of computers is the opposite of oppression. You are mistaking your interests for society's interests. It's the same with artists who are now pissed that regular people can manifest their artistic ideas without needing to go through an artist or spend years studying the craft. The artists are calling the AI companies oppressors because they are breaking the artists' stranglehold on the market. It's incredibly ironic how socializing what was a privatized ability has otherwise "socialist" people completely losing their shit. Just the mask of pure virtue slipping... | |
| ▲ | deergomoo 3 days ago | parent | next [-] | | On what planet is concentrating an increasingly high amount of the output of this whole industry on a small handful of megacorps "democratising" anything? Software development was already one of the most democratised professions on earth. With any old dirt cheap used computer, an internet connection, and enough drive and curiosity you could train yourself into a role that could quickly become a high paying job. While they certainly helped, you never needed any formal education or expensive qualifications to excel in this field. How is this better? | |
| ▲ | Workaccount2 3 days ago | parent | next [-] | | Open/local models are available. Maybe not as good, but they can certainly do far far more than what was available a few years ago. | | |
| ▲ | bsder 3 days ago | parent [-] | | The open models don't have access to all the proprietary code that the closed ones have trained on. That's primarily why I finally had to suck it up and sign up for Claude. Claude clearly can cough up proprietary codebase examples that I otherwise have no access to. | | |
| ▲ | simonw 3 days ago | parent [-] | | Given that very few of the "open models" disclose their training data there's no reason at all to assume that the proprietary models have an advantage in terms of training on proprietary data. As far as I can tell the reason OpenAI and Anthropic are ahead in code is that they've invested extremely heavily in figuring out the right reinforcement learning training mix needed to get great coding results. Some of the Chinese open models are already showing signs of catching up. |
|
| |
| ▲ | simonw 3 days ago | parent | prev [-] | | It's better because now you can automate something tedious in your life with a computer without having to first climb a six month learning curve. | | |
| ▲ | biammer 3 days ago | parent [-] | | > deergomoo: On what planet is concentrating an increasingly high amount of the output of this whole industry on a small handful of megacorps “democratising” anything? > simonw: It's better because now you can automate something tedious in your life with a computer without having to first climb a six month learning curve. Completely ignores, or enthusiastically accepts and endorses, the consolidation of production, power, and wealth into a stark few (friends), and claims superiority and increased productivity without evidence? This may be the most simonw comment I have ever seen. | | |
| ▲ | simonw 3 days ago | parent [-] | | At the tail end of 2023 I was deeply worried about consolidation of power, because OpenAI were the only lab with a GPT-4 class model and none of their competitors had produced anything that matched it in the ~8 months since it had launched. I'm not worried about that at all any more. There are dozens of organizations who have achieved that milestone now, and OpenAI aren't even definitively in the lead. A lot of those top-class models are open weight (mainly thanks to the Chinese labs) and available for people to run on their own hardware. I wrote a bunch more about this in my 2024 wrap-up: https://simonwillison.net/2024/Dec/31/llms-in-2024/#the-gpt-... |
|
|
| |
| ▲ | spaceman_2020 3 days ago | parent | prev | next [-] | | I used claude code to set up a bunch of basic tools my wife was using in her daily work. Things like custom pomodoro timers, task managers, todo notes. She used to log into 3 different websites. Now she just opens localhost:3000 and has all of them on the same page. No emails shared with anyone. All data stored locally. I could have done this earlier, but with Claude Code the time commitment was writing a spec in 5 minutes and pressing approve a few times, versus half a day before. I count this as an absolute win. No privacy breaches, no data sharing. | |
| ▲ | spacechild1 3 days ago | parent | prev | next [-] | | > The artists are calling the AI companies oppressors because they are breaking the artists' stranglehold on the market. It's because these companies profit from all the existing art without compensating the artists. Even worse, they are now putting the very people out of a job who (unwittingly) helped to create these tools in the first place. Not to mention how hurtful it must be for artists seeing their personal style imitated by a machine without their consent. I totally see how it can empower regular people, but it also empowers the megacorps and bad actors. The jury is still out on whether AI is providing a net positive to society. Until then, let's not ignore the injustice and harm that went into creating these tools and the potential and real dangers that come with it. |
| ▲ | biammer 3 days ago | parent | prev | next [-] | | When you imagine my position, "I hate these companies for democratizing code/art", and then debate that, it is called a strawman logical fallacy. Ascribing the goal of "democratize code/art" to these companies and their products is called delusion. I am sure the 3 letter agency directors on these company boards are thrilled you think they left their lifelong careers solely to finally realize their dream to allow you to code and "manifest your artistic ideas". | |
| ▲ | Workaccount2 3 days ago | parent [-] | | Again, open models exist. These companies don't have a monopoly on the tech and they know it. So maybe celebrate open/private/local models for empowering people rather than selfishly complaining about it? | |
| ▲ | icedchai 3 days ago | parent [-] | | Yes, but the quality of output from open/local models isn't anywhere close to what you get from Claude or Gemini. You need serious hardware to get anything approaching decent processing speeds or even middling quality. It's more economical for the average person to spend $20/month on a subscription than it is for them to drop multiple thousands of dollars and untold hours of time experimenting. Local AI is a fun hobby though. |
|
| |
| ▲ | elzbardico 3 days ago | parent | prev [-] | | But people are not creating anything. They are just asking a computer to remix what other people created. It's incredibly ironic how blatant theft has left otherwise capitalistic people so enthusiastic. |
| |
| ▲ | Aurornis 3 days ago | parent | prev | next [-] | | > If I am unable to convince you to stop meticulously training the tools of the oppressor (for a fee!) then I just ask you do so quietly. I'm kind of fascinated by how AI has become such a culture war topic with hyperbole like "tools of the oppressor". It's equally fascinating how little these comments understand about how LLMs work. Using an LLM for inference (what you do when you use Claude Code) does not train the LLM. It does not learn from your code and integrate it into the model while you use it for inference. I know that breaks the "training the tools of the oppressor" narrative, which is probably why it's always ignored. If not ignored, the next step is to decry that the LLM companies are lying and are stealing everyone's code despite saying they don't. | |
| ▲ | meowkit 3 days ago | parent | next [-] | | We are not talking about inference. The prompts and responses are used as training data. Even if your provider allows you to opt out they are still tracking your usage telemetry and using that to gauge performance. If you don’t own the storage and compute then you are training the tools which will be used to oppress you. Incredibly naive comment. | | |
| ▲ | Aurornis 3 days ago | parent [-] | | > The prompts and responses are used as training data. They show a clear pop-up where you choose your setting about whether or not to allow data to be used for training. If you don't choose to share it, it's not used. I mean I guess if someone blindly clicks through everything and clicks "Accept" without clicking the very obvious slider to turn it off, they could be caught off guard. Assuming everyone who uses Claude is training their LLMs is just wrong, though. Telemetry data isn't going to extract your codebase. | | |
| ▲ | lukan 3 days ago | parent [-] | | "If you don't choose to share it, it's not used" I am curious where your confidence that this is true, is coming from? Besides lots of GPU's, training data seems the most valuable asset AI companies have. Sounds like strong incentive to me to secretly use it anyway. Who would really know, if the pipelines are set up in a way, if only very few people are aware of this? And if it comes out "oh gosh, one of our employees made a misstake". And they already admitted to train with pirated content. So maybe they learned their lesson .. maybe not, as they are still making money and want to continue to lead the field. | | |
| ▲ | simonw 3 days ago | parent | next [-] | | My confidence comes from the following: 1. There are good, ethical people working at these companies. If you were going to train on customer data that you had promised not to train on there would be plenty of potential whistleblowers. 2. The risk involved in training on customer data that you are contractually obliged not to train on is higher than the value you can get from that training data. 3. Every AI lab knows that the second it comes out that they trained on paying customer data after saying they wouldn't, those paying customers will leave for their competitors (and sue them into the bargain). 4. Customer data isn't actually that valuable for training! Great models come from carefully curated training data, not from just pasting in anything you can get your hands on. Fundamentally I don't think AI labs are stupid, and training on paid customer data that they've agreed not to train on is a stupid thing to do. | |
| ▲ | RodgerTheGreat 3 days ago | parent | next [-] | | 1. The people working for these companies are already demonstrably ethically flexible enough to pirate any publicly accessible training data they can get their hands on, including but not limited to ignoring the license information in every repo on GitHub. I'm not impressed with any of these clowns and I wouldn't trust them to take care of a potted cactus. 2. The risk of using "illegal" training data is irrelevant, because no GenAI vendors have been meaningfully punished for violating copyright yet, and in the current political climate they don't expect to be anytime soon. Even so, 3. Presuming they get caught red-handed using personal data without permission - which, given the nature of LLMs, would be extremely challenging for any individual customer to prove definitively - they may lose customers, and customers may try to sue, but you can expect those lawsuits to take years to work their way through the courts; long after these companies IPO, employees get their bag, and it all becomes someone else's problem. 4. The idea of using carefully curated datasets is popular rhetoric, but absolutely does not reflect how the biggest GenAI vendors do business. See (1). AI labs are extremely shortsighted, sloppy, and demonstrably do not care a single iota about the long term when there's money to be made in the short term. Employees have gigantic financial incentives to ignore internal malfeasance or simple ineptitude. The end result is, if anything, far worse than stupidity. | |
| ▲ | simonw 3 days ago | parent [-] | | There is an important difference between openly training on scraped web data and license-ignored data from GitHub, and training on data from your paying customers that you promised you wouldn't train on. Anthropic had to pay $1.5bn after being caught downloading pirated ebooks. | |
| ▲ | lunar_mycroft 3 days ago | parent [-] | | So Anthropic had to pay less than 1% of their valuation despite approximately their entire business being dependent on this and similar piracy. I somehow doubt their takeaway from that is "let's avoid doing that again". | | |
| ▲ | ben_w a day ago | parent | next [-] | | Two things: First: Valuations are based on expected future profits. For a lot of companies, 1% of valuation is ~20% of annual profit (P/E ratio 20); for fast growing companies, or companies where the market is anticipating growth, it can be a lot higher. Weird outlier example here, but consider that if Tesla was fined 1% of its valuation (1% of 1.5 trillion = 15 billion), that would be most of the last four quarters' profit on https://www.macrotrends.net/stocks/charts/TSLA/tesla/gross-p... Second: Part of the Anthropic case was that many of the books they trained on were ones they'd purchased and destructively scanned, not just pirated. The courts found this use was fine, and Anthropic had already done this before being ordered to: https://storage.courtlistener.com/recap/gov.uscourts.cand.43... |
| ▲ | simonw 3 days ago | parent | prev [-] | | Their main takeaway was that they should legally buy paper books, chop the spines off and scan those for training instead. |
|
|
| |
| ▲ | lunar_mycroft 3 days ago | parent | prev [-] | | Every single point you made is contradicted by the observed behavior of the AI labs. If any of those factors were going to stop them from training on data they legally can't, they would have done so already. |
| |
| ▲ | Aurornis 3 days ago | parent | prev | next [-] | | > I am curious where your confidence that this is true, is coming from? My confidence comes from working in big startups and big companies with legal teams. There's no way the entire company is going to gather all of the engineers and everyone around, have them code up a secret system to consume customer data into a secret part of the training set, and then have everyone involved keep quiet about it forever. The whistleblowing and leaking would happen immediately. We've already seen LLM teams leak and have people try to whistleblow over things that aren't even real, like the Google engineer who thought they had invented AGI a few years ago (lol). OpenAI had a public meltdown when the employees disagreed with Sam Altman's management style. So my question to you is: What makes you think they would do this? How do you think they'd coordinate the teams to keep it all a secret and only hire people who would take this secret to their grave? | |
| ▲ | lukan 3 days ago | parent [-] | | "There's no way the entire company is going to gather all of the engineers and everyone around, have them code up a secret system " No, that is why I wrote "Who would really know, if the pipelines are set up in a way, that only very few people are aware of this?" (Typo fixed) There is no need for everyone to know. I don't know their processes, but I can think of ways to only include very few people who need to know. The rest is just working on everything else. Some work with data, where they don't need to know where it came from, some with UI, some with scaling up, some .. they all don't need to know, that the source of DB XYZ comes from a dark source. |
| |
| ▲ | theshrike79 2 days ago | parent | prev | next [-] | | > I am curious where your confidence that this is true, is coming from? We have a legally binding contract with Anthropic. Checked and vetted by our lawyers, who are annoying because they actually READ the contracts and won't let us use services with suspicious clauses in them - unless we can make amendments. If they're found to be in breach of said contract (which is what every paid user of Claude signs), Anthropic is going to be the target of SO FUCKING MANY lawsuits even the infinite money hack of AI won't save them. | |
| ▲ | lukan a day ago | parent [-] | | Are you referring to the standard contract/terms of use, or does your company have a special contract made with them? |
| |
| ▲ | ben_w 3 days ago | parent | prev [-] | | > Besides lots of GPU's, training data seems the most valuable asset AI companies have. Sounds like strong incentive to me to secretly use it anyway. Who would really know, if the pipelines are set up in a way, if only very few people are aware of this? Could be, but it's a huge risk the moment any lawsuit happens and the "discovery" process starts. Or whistleblowers. They may well take that risk, they're clearly risk-takers. But it is a risk. | | |
| ▲ | yunwal 3 days ago | parent | next [-] | | Eh they’re all using copyrighted training data from torrent sites anyway. If the government was gonna hold them accountable for this it would have happened already. | | | |
| ▲ | blibble 3 days ago | parent | prev [-] | | the US no longer has any form of rule of law so there's no risk | | |
| ▲ | ben_w 3 days ago | parent | next [-] | | The USA is a mess that's rapidly getting worse, but it has not yet fallen that far. | |
| ▲ | Aurornis 3 days ago | parent | prev [-] | | > the US no longer has any form of rule of law AI threads really bring out the extreme hyperbole and doomerism. |
|
|
|
|
| |
| ▲ | biammer 3 days ago | parent | prev [-] | | I understand how these LLMs work. I find it hard to believe there are people who know these companies stole the entire creative output of humanity and egregiously, continually scrape the internet, yet think they are, for some reason, ignoring the data you voluntarily give them. > I know that breaks the "training the tools of the oppressor" narrative "Narrative"? This is just reality. In their own words: > The awards to Anthropic, Google, OpenAI, and xAI – each with a $200M ceiling – will enable the Department to leverage the technology and talent of U.S. frontier AI companies to develop agentic AI workflows across a variety of mission areas. Establishing these partnerships will broaden DoD use of and experience in frontier AI capabilities and increase the ability of these companies to understand and address critical national security needs with the most advanced AI capabilities U.S. industry has to offer. The adoption of AI is transforming the Department’s ability to support our warfighters and maintain strategic advantage over our adversaries [0] Is 'warfighting adversaries' some convoluted code for allowing Aurornis to 'see a 1337x in productivity'? Or perhaps you are a wealthy westerner of a racial and sexual majority and as such have felt little by way of oppression by this tech? In such a case I would encourage you to develop empathy, or at least sympathy. > Using an LLM for inference .. does not train the LLM. In their own words: > One of the most useful and promising features of AI models is that they can improve over time. We continuously improve our models through research breakthroughs as well as exposure to real-world problems and data. When you share your content with us, it helps our models become more accurate and better at solving your specific problems and it also helps improve their general capabilities and safety. We do not use your content to market our services or create advertising profiles of you—we use it to make our models more helpful. ChatGPT, for instance, improves by further training on the conversations people have with it, unless you opt out. [0] https://www.ai.mil/latest/news-press/pr-view/article/4242822... [1] https://help.openai.com/en/articles/5722486-how-your-data-is... | |
| ▲ | ben_w 3 days ago | parent [-] | | > Is 'warfighting adversaries' some convoluted code for allowing Aurornis to 'see a 1337x in productivity'? Much as I despair at the current developments in the USA, and I say this as a sexual minority and a European, this is not "tools of the oppressor" in their own words. Trump is extremely blunt about who he wants to oppress. So is Musk. "Support our warfighters and maintain strategic advantage over our adversaries" is not blunt, it is the minimum baseline for any nation with assets anyone else might want to annex, which is basically anywhere except Nauru, North Sentinel Island, and Bir Tawil. | | |
| ▲ | biammer 3 days ago | parent [-] | | > "Support our warfighters and maintain strategic advantage over our adversaries" is not blunt, it is the minimum baseline for any nation with assets anyone else might want to annex I think it's gross to distill military violence as defending 'assets [others] might want to annex'. What US assets were being annexed when US AI was used to target Gazans? https://apnews.com/article/israel-palestinians-ai-technology... > Trump is extremely blunt about who he wants to oppress. So is Musk. > our adversaries" is not blunt These two thoughts seem in conflict. What 'assets' were being protected from annexation here by this oppressive use of the tool? The chips? https://www.aclu.org/news/privacy-technology/doritos-or-gun | |
| ▲ | ben_w 3 days ago | parent | next [-] | | > I think it's gross to distill military violence as defending 'assets [others] might want to annex'. Yes, but that's how the world works: if another country wants a bit of your country for some reason, they can take it by force unless you can make at the very least a credible threat against them, sometimes a lot more than that. Note that this does not exclude that there has to be an aggressor somewhere. I'm not excluding the existence of aggressors, nor the capacity for the USA to be an aggressor. All I'm saying is your quotation is so vague as to also encompass those who are not. > What US assets were being annexed when US AI was used to target Gazans? First, I'm saying the statement is so broad as to encompass other things besides being a warmonger. Consider the opposite statement: "don't support our warfighters and don't maintain strategic advantage over our adversaries" would be absolutely insane, therefore "support our warfighters and maintain strategic advantage over our adversaries" says nothing. Second, in this case the country doing the targeting is… Israel. To the extent that the USA cares at all, it's to get votes from the large number of Jewish people living in the USA. Similar deal with how it treats Cuba since the fall of the USSR: it's about votes (from Cuban exiles in that case, but still, votes). Much as I agree that the conduct of Israel with regard to Gaza was disproportionate, exceeded the necessity, and likely was so bad as to even damage Israel's long-term strategic security, if you were to correctly imagine the people of Israel deciding "don't support our warfighters and don't maintain strategic advantage over our adversaries", they would quickly get victimised much harder than those they were victimising. That's the point there: the quote you cite as evidence is so broad that everyone has approximately that, because not having it means facing one's own destruction. There's a mis-attributed quote, "People sleep peaceably in their beds at night because rough men stand ready to do violence on their behalf", that's where this is at. > These two thoughts seem in conflict. Musk is openly and directly saying "Canada is not a real country.", says "cis" is hate speech, his response to the pandemic was tweeting "My pronouns are Prosecute/Fauci.", and his self-justification for his trillion dollar bonus for hitting future targets is wanting to be in control of what he describes as a "robot army"; Trump openly and explicitly wants the USA to annex Canada, Greenland, and the Panama canal, is throwing around the national guard, openly calls critics traitors and calls for the death penalty. They're as subtle as exploding volcanoes; nobody needs to take the worst case interpretations of what they're saying to notice this. Saying "support our warfighters" is something done by basically every nation everywhere all the time, because those places that don't do this quickly get taken over by nearby nations who sense weakness. Which is kinda how the USA got Texas, because again, I'm not saying the USA is harmless, I'm saying the quote doesn't show that. > What 'assets' were being protected from annexation here by this oppressive use of the tool? The chips? This would have been a much better example to lead with than the military stuff. I'm absolutely all on board with the general consensus that the US police are bastards in this specific way, have been since that kid got shot for having a toy gun in an open-carry state.
(I am originally from a country where even the police are not routinely armed, and I do not value the 2nd amendment, but if you're going to say "we allow open carry of firearms" you absolutely do not get to use "we saw someone carrying a firearm" as an excuse to shoot them). However: using LLMs to code doesn't seem likely to make a difference either way for this. If I was writing a gun-detection AI, perhaps I'm out of date, but I'd use a simpler model that runs locally on-device and doesn't do anything else besides the sales pitch. | |
| ▲ | cindyllm 3 days ago | parent | prev [-] | | [dead] |
|
|
|
| |
| ▲ | Gud 3 days ago | parent | prev [-] | | Frankly, in this comment thread you appear to be the oppressor. | | |
| ▲ | goatlover 3 days ago | parent | next [-] | | Who is the parent oppressing? Making a comment and companies looking to automate labor are a little bit different. One might disagree that automation is oppressive or whatever goals the major tech CEOs have in developing AIs (surveillance, influencing politics, increasing wealth gap), but certainly commenting that they are oppressive is not the same thing. | |
| ▲ | biammer 3 days ago | parent | prev [-] | | [flagged] | | |
| ▲ | santoshalper 3 days ago | parent [-] | | [flagged] | | |
| ▲ | biammer 3 days ago | parent | next [-] | | > Why are you afraid of using your real account Careful with being blindly led by your own assumptions. I actually disagree with your thesis here. I think if every comment was posted under a new account this site would improve its average veracity. As it stands, certain 'celebrity', or high karma, accounts are artificially bolstered by the network effect, indifferent to the defensibility of their claims. |
| ▲ | justinclift 3 days ago | parent | prev [-] | | Please don't go down the path of making personal attacks. |
|
|
|
| |
| ▲ | animegolem 3 days ago | parent | prev [-] | | I know someone who is using a vibe-coded or at least heavily assisted text editor, praising it daily, while also saying LLMs will never be productive. There is a lot of dissonance right now. |
|
|
| ▲ | enum 3 days ago | parent | prev | next [-] |
| I teach at a university, and spend plenty of time programming for research and for fun. Like many others, I spent some time on the holidays trying to push the current generation of Cursor, Claude Code, and Codex as far as I could. (They're all very good.) I had an idea for something that I wanted, and in five scattered hours, I got it good enough to use. I'm thinking about it in a few different ways: 1. I estimate I could have done it without AI with 2 weeks full-time effort. (Full-time defined as >> 40 hours / week.) 2. I have too many other things to do that are purportedly more important than programming. I really can't dedicate two weeks full-time to a "nice to have" project. So, without AI, I wouldn't have done it at all. 3. I could hire someone to do it for me. At the university, those are students. From experience with lots of advising, a top-tier undergraduate student could have achieved the same thing, had they worked full tilt for a semester (before LLMs). This of course assumes that I'm meeting them every week. |
| |
| ▲ | realusername 3 days ago | parent | next [-] | | This is where LLM coding shines, in my opinion; there's a list of things they are doing very well: - single scripts. Anything which can be reduced to a single script. - starting greenfield projects from scratch - code maintenance (package upgrades, old code...) - tasks which have a very clear and single definition. This isn't linked to complexity; some tasks can be very complex but still have a single definition. If your work falls into this list they will do some amazing work (and yours clearly fits that); if it doesn't, though, prepare yourself, because it will be painful. | |
| ▲ | enum 3 days ago | parent [-] | | I'm trying to determine what programming tasks are not in this list. :) I think it is trying to exclude adding new features and fixing bugs in existing code. I've done enough of that with LLMs, though not in large codebases. I should say I'm hardly ever vibe-coding, unlike the original article. If I think I want code that will last, I'll steer the models in ways that lean on years of non-LLM experience. E.g., I'll reject results that might work if they violate my taste in code. It also helps that I can read code very fast. I estimate I can read code 100x faster than most students. I'm not sure there is any way to teach that other than the old-fashioned way, which involves reading (and writing) a lot of code. | | |
| ▲ | realusername 3 days ago | parent [-] | | > I'm trying to determine what programming tasks are not in this list. :) I think it is trying to exclude adding new features and fixing bugs in existing code Yes indeed, these, on the other hand, are the things which aren't working well in my opinion: - large codebase - complex domain knowledge - creating any feature where you need product insights - tasks requiring choices (again, complexity doesn't matter here, the task may be simple but require some choices) - anything unclear where you don't know where you are going first While you don't experience any of these when teaching or doing side projects, these are very common in any enterprise context. |
|
| |
| ▲ | vercaemert 3 days ago | parent | prev | next [-] | | How do you compare Claude Code to Cursor? I'm a Cursor user quietly watching the CC parade with curiosity. Personally, I haven't been able to give up the IDE experience. | | |
| ▲ | kaydub 2 days ago | parent | next [-] | | I'm so sold on the cli tools that I think IDEs are basically dead to me. I only have an IDE open so I can read the code, but most often I'm just changing configs (like switching a bool, or bumping up a limit or something like that). Seriously, I have 3+ claude code windows open at a time. Most days I don't even look at the IDE. It's still there running in the background, but I don't need to touch it. | |
| ▲ | tstrimple 2 days ago | parent | prev | next [-] | | I use CC for so much more than just writing code that I cannot imagine being constrained within an IDE. Why would I want to launch an IDE to have CC update the *arr stack on my NAS to the latest versions for example? Last week I pointed CC at some media files that weren't playing correctly on my Apple TV. It detected what the problem formats were and updated my *arr download rules to prefer other releases and then configured tdarr to re-encode problem files in my existing library. | |
| ▲ | lizardking 3 days ago | parent | prev | next [-] | | When I'm using Claude Code, I usually have a text editor open as well. The CC plugin works well enough to achieve most of what Cursor was doing for me in showing real-time diffs, but in my experience, the output is better and faster. YMMV | |
| ▲ | subomi 3 days ago | parent | prev | next [-] | | I was here a few weeks ago, but I'm now on the CC train. The challenge is that the terminal is quite counterintuitive. But if you put on the Linux terminal lens from a few years ago and start using it, it starts to make sense. The form factor of the terminal isn't intuitive for programming, but it's the ultimate. FYI, I still use cursor for small edits and reviews. | |
| ▲ | enum 3 days ago | parent | prev | next [-] | | I don't think I can scientifically compare the agents. As it is, you can use Opus / Codex in Cursor. The speed of Cursor composer-1 is phenomenal -- you can use it interactively for many tasks. There are also tasks that are not easier to describe in English, but you can tab through them. | |
| ▲ | smw 3 days ago | parent | prev [-] | | Just FYI, these days cc has 'ide integration' too, it's not just a cli. Grab the vscode extension. |
| |
| ▲ | franktankbank 3 days ago | parent | prev [-] | | What did you build? I think people talk past each other when they don't share what exactly they were trying to do and whether they succeeded or failed. | |
|
|
| ▲ | TacticalCoder 3 days ago | parent | prev | next [-] |
| > Most software engineers are seriously sleeping on how good LLM agents are right now, especially something like Claude Code. Nobody is sleeping. I'm using LLMs daily to help me with simple coding tasks. But really, where is the hurry? At this point barely a few weeks go by without the next best thing since sliced bread coming out. Why would I bother "learning" (and there's really nothing to learn here) some tool/workflow that is already outdated by the time it comes out? > 2026 is going to be a wake-up call Do you honestly think a developer not using AI won't be able to adapt to an LLM workflow in, say, 2028 or 2029? It has to be 2026 or... What exactly? There is literally no hurry. You're using the equivalent of the first portable CD-player in the 80s: it was huge, clunky, had hiccups, had a huge battery attached to it. It was shiny though, for those who find new things shiny. Others are waiting for a portable CD player that is slim, that buffers, that works fine. And you're saying that people won't be able to learn how to put a CD in a slim CD player because they didn't use a clunky one first. |
| |
| ▲ | simonw 3 days ago | parent | next [-] | | I think getting proficient at using coding agents effectively takes a few months of practice. It's also a skill that compounds over time, so if you have two years of experience with them you'll be able to use them more effectively than someone with two months of experience. In that respect, they're just normal technology. A Python programmer with two years of Python experience will be more effective than a programmer with two months of Python. | |
| ▲ | vidarh a day ago | parent | prev | next [-] | | > Nobody is sleeping. I'm using LLMs daily to help me in simple coding tasks. That is sleeping. > But really where is the hurry? At this point not a few weeks go by without the next best thing since sliced bread to come out. Why would I bother "learning" (and there's really nothing to learn here) some tool/workflow that is already outdated by the time it comes out? You're jumping to conclusions that haven't been justified by any of the development in this space. The learning compounds. > Do you honestly think a developer not using AI won't be able to adapt to a LLM workflow in, say, 2028 or 2029? It has to be 2026 or... What exactly? They will, but they'll be competing against people with 2-3 more years of experience in understanding how to leverage these tools. | |
| ▲ | jasonfarnon 3 days ago | parent | prev [-] | | "But really where is the hurry?" It just depends on why you're programming. For many of us not learning and using up to date products leads to a disadvantage relative to our competition. I personally would very much rather go back to a world without AI, but we're forced to adapt. I didn't like when pagers/cell phones came out either, but it became clear very quickly not having one put me at a disadvantage at work. |
|
|
| ▲ | BatteryMountain 3 days ago | parent | prev | next [-] |
The crazy part is, once you have it set up and have adapted your workflow, you start to notice all sorts of other "small" things: Claude can call ssh and do sysadmin tasks. It works amazingly well. I have 3 VMs which depend on each other (Proxmox with OpenWrt, AdGuard, Unbound), and Claude can prove to me that my DNS chain works and my firewalls are correct, because it can ssh into each one. Setting up services, diagnosing issues, auditing configs... you name it. Just awesome. Claude can call other shell scripts on the machine, so over time you can create a bunch of scripts that let Claude one-shot certain tasks that would normally eat tokens. It works great. One script per intention: don't have a script do more than one thing (see the sketch below). Claude can call the compiler, run the debug executable and read the debug logs... in real time. So Claude can read my Android app's debug stream via adb... or my C# debug console, because Claude calls the compiler, not me. Just ask it to do it and it will diagnose stuff really quickly. It can also analyze your DB tables (give it read-only SQL access), look at the application code and queries, and diagnose performance issues. The opportunities are endless here. People need to wake up to this. |
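To make the "one script per intention" idea concrete: a minimal Python sketch of the kind of single-purpose helper Claude could be told to call to verify a DNS chain might look like the following. The resolver names and IP addresses here are invented for illustration, not taken from the setup above.

```python
# Hypothetical single-purpose helper: verify that each resolver in a DNS chain
# answers for a known name. Resolver labels and IPs are invented for illustration.
import subprocess

RESOLVERS = {
    "openwrt": "192.168.1.1",  # hypothetical router/forwarder
    "adguard": "192.168.1.2",  # hypothetical filtering resolver
    "unbound": "192.168.1.3",  # hypothetical recursive resolver
}

def query(resolver_ip: str, name: str) -> str:
    """Ask one resolver for an A record via dig and return the answer or an error."""
    result = subprocess.run(
        ["dig", f"@{resolver_ip}", name, "+short", "+time=2", "+tries=1"],
        capture_output=True,
        text=True,
    )
    answer = result.stdout.strip()
    return answer if answer else f"NO ANSWER (exit {result.returncode})"

if __name__ == "__main__":
    for label, ip in RESOLVERS.items():
        print(f"{label:8} {ip:15} -> {query(ip, 'example.com')}")
```

Because the script does exactly one thing, the agent only has to run it and read a few lines of output instead of spending tokens rediscovering the procedure each time.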
| |
| ▲ | vidarh a day ago | parent | next [-] | | > claude can call ssh and do system admin tasks Claude set up a Raspberry Pi with a display and conference audio device for me to use as an Alexa replacement tied to Home Assistant. I gave it an ssh key and gave it root. Then I told it what I wanted, and it did. It asked for me to confirm certain things, like what I could see on screen, whether I could hear the TTS etc. (it was a bit of a surprise when it was suddenly talking to me while I was minding my own business). It configured everything, while keeping a meticulous log that I can point it at if I want to set up another device, and eventually turn into a runbook if I need to. | |
| ▲ | 3 days ago | parent | prev | next [-] | | [deleted] | |
| ▲ | theshrike79 2 days ago | parent | prev [-] | | I have a /fix-ci-build slash command that instructs Claude how to use `gh` to get the latest build from that specific project's GitHub Actions and get the logs for the build. In addition, there are instructions on how and where to push the possible fixes and how to check the results. I've yet to encounter a build failure it couldn't fix automatically. |
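A command like that mostly boils down to a couple of `gh` calls. As a rough sketch of the sort of helper such a slash command could wrap (written in Python here; the commenter's actual commands aren't shown, so treat the details as an assumption):

```python
# Hypothetical helper a /fix-ci-build style slash command might wrap: fetch the
# most recent failed GitHub Actions run and print only the failing step logs.
import json
import subprocess

def latest_failed_run_id() -> int | None:
    """Return the database ID of the newest failed workflow run, if any."""
    out = subprocess.run(
        ["gh", "run", "list", "--status", "failure", "--limit", "1",
         "--json", "databaseId"],
        capture_output=True, text=True, check=True,
    ).stdout
    runs = json.loads(out)
    return runs[0]["databaseId"] if runs else None

def failed_step_logs(run_id: int) -> str:
    """Return the logs of only the failed steps for one run."""
    return subprocess.run(
        ["gh", "run", "view", str(run_id), "--log-failed"],
        capture_output=True, text=True, check=True,
    ).stdout

if __name__ == "__main__":
    run_id = latest_failed_run_id()
    print(failed_step_logs(run_id) if run_id else "No failed runs found.")
```

The rest of the slash command is then just prose: what branch to push the fix to and how to confirm the next run goes green.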
|
|
| ▲ | Loeffelmann 3 days ago | parent | prev | next [-] |
Why do all these AI-generated READMEs have a directory structure section? It's so redundant; you know I could just run `tree`. |
| |
| ▲ | sonnig 3 days ago | parent | next [-] | | It makes me so exhausted trying to read them... my brain can tell immediately when there's so much redundant information that it just starts shutting itself off. | |
| ▲ | bakies 3 days ago | parent | prev [-] | | Comments? Also, it gets read into an agent so the agent doesn't have to tool-call/bash out |
|
|
| ▲ | 6177c40f 3 days ago | parent | prev | next [-] |
| I think we're entering a world where programmers as such won't really exist (except perhaps in certain niches). Being able to program (and read code, in particular) will probably remain useful, though diminished in value. What will matter more is your ability to actually create things, using whatever tools are necessary and available, and have them actually be useful. Which, in a way, is the same as it ever was. There's just less indirection involved now. |
| |
| ▲ | wiml 3 days ago | parent | next [-] | | We've been living in that world since the invention of the compiler ("automatic programming"). Few people write machine code any more. If you think of LLMs as a new variety of compiler, a lot of their shortcomings are easier to describe. | | |
| ▲ | qwm 3 days ago | parent | next [-] | | My compiler runs on my computer and produces the same machine code given the same input. Neither of these are true with AI. | | |
| ▲ | wiml 2 days ago | parent [-] | | You can run an LLM locally (and distributed compile systems, where the compiler runs in the cloud, are a thing, too) so that doesn't really produce a distinction between the two. Likewise, many optimization techniques involve some randomness, whether it's approximating an NP-thorny subproblem, or using PGO guided by statistical sampling. People might disable those in pursuit of reproducible builds, but no one would claim that enabling those features makes GCC or LLVM no longer a compiler. So nondeterminism isn't really the distinguishing factor either. |
| |
| ▲ | bdangubic 3 days ago | parent | prev | next [-] | | The last thing I want is a non-deterministic compiler; I do not vibe with this analogy at all… |
| ▲ | moffkalast 2 days ago | parent | prev [-] | | Finally we've invented a compiler that we can yell at when it gives bullshit errors. I really missed that with gcc. |
| |
| ▲ | pseidemann 3 days ago | parent | prev [-] | | Isn't there more indirection as long as LLMs use "human" programming languages? | | |
| ▲ | xarope 3 days ago | parent | next [-] | | If you think of the training data, e.g. SO, github etc, then you have a human asking or describing a problem, then the code as the solution. So I suspect current-gen LLMs are still following this model, which means for the forseeable future a human like language prompt will still be the best. Until such time, of course, when LLMs are eating their own dogfood, in which case they - as has already happened - create their own language, evolve dramatically, and cue skynet. | |
| ▲ | 6177c40f 3 days ago | parent | prev | next [-] | | More indirection in the sense that there's a layer between you and the code, sure. Less in that the code doesn't really matter as such and you're not having to think hard about the minutiae of programming in order to make something you want. It's very possible that "AI-oriented" programming languages will become the standard eventually (at least for new projects). | | |
| ▲ | recursive 2 days ago | parent [-] | | One benefit of conventional code is that it expresses logic in an unambiguous way. Much of "the minutiae" is deciding what happens in edge cases. It's even harder to express that in a human language than in computer languages. For some domains it probably doesn't matter. |
| |
| ▲ | layer8 3 days ago | parent | prev [-] | | It’s not clear how affordances of programming languages really differ between humans and LLMs. |
|
|
|
| ▲ | Yoric 3 days ago | parent | prev | next [-] |
You intrigue me. > have it learn your conventions, pull in best practices What do you mean by "have it learn your conventions"? Is there a way to somehow automatically extract your conventions and store them within CLAUDE.md? > For example, we have a custom UI library, and Claude Code has a skill that explains exactly how to use it. Same for how we write Storybooks, how we structure APIs, and basically how we want everything done in our repo. So when it generates code, it already matches our patterns and standards out of the box. Did you have to develop these skills yourself? How much work was that? Do you have public examples somewhere? |
| |
| ▲ | ac29 3 days ago | parent | next [-] | | > What do you mean by "have it learn your conventions"? I'll give you an example: I use ruff to format my python code, which has an opinionated way of formatting certain things. After an initial formatting, Opus 4.5, without prompting, will write code in this same style so that the ruff formatter almost never has anything to do on new commits. Sonnet 4.5 is actually pretty good at this too. | | |
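As an illustration of the "formatter has nothing to do" point: code that already follows ruff's default (black-compatible) layout, with double quotes and exploded argument lists that keep their trailing commas, is left untouched by `ruff format`. A small, made-up example:

```python
# Made-up example, written the way ruff's default (black-compatible) formatter
# would already lay it out, so a formatting pass is a no-op.
def build_report(
    records: list[dict],
    *,
    include_totals: bool = True,
) -> dict:
    """Aggregate a list of records into a small report structure."""
    report = {
        "count": len(records),
        "names": [record["name"] for record in records],
    }
    if include_totals:
        report["total"] = sum(record.get("amount", 0) for record in records)
    return report
```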
| ▲ | UncleMeat 3 days ago | parent [-] | | Isn't this a meaningless example? Formatters already exist. Generating code that doesn't need to be formatted is exactly the same as generating code and then formatting it. I care about the norms in my codebase that can't be automatically enforced by machine. How is state managed? How are end-to-end tests written to minimize change detectors? When is it appropriate to log something? | | |
| ▲ | eterm 3 days ago | parent | next [-] | | Here's an example: We have some tests in "GIVEN WHEN THEN" style, and others in other styles. Opus will try to match the testing style of whichever project it is in by reading adjacent tests. | |
| ▲ | vidarh a day ago | parent [-] | | The one caveat with this is that in messy codebases it will perpetuate bad things, unless you're specific about what you want. Then again, human developers will often do the same and are much harder to force to follow new conventions. |
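For anyone unfamiliar with the convention eterm mentions, a minimal sketch of a "GIVEN WHEN THEN"-structured test (pytest, with a toy function defined inline so it runs standalone) looks something like this; it is exactly the kind of local pattern a model can pick up from adjacent test files:

```python
# Hypothetical "GIVEN WHEN THEN" style test; the function under test is a toy
# defined inline so the example is self-contained and runnable.
import pytest

def apply_discount(subtotal: float, code: str) -> float:
    """Toy function under test: 10% off when the SAVE10 code is used."""
    return subtotal * 0.9 if code == "SAVE10" else subtotal

def test_discount_applied_for_valid_code():
    # GIVEN a subtotal that qualifies for the promotion
    subtotal = 110.0

    # WHEN the SAVE10 discount code is applied
    total = apply_discount(subtotal, "SAVE10")

    # THEN the total reflects a 10% reduction
    assert total == pytest.approx(99.0)
```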
| |
| ▲ | gck1 2 days ago | parent | prev | next [-] | | The second part is what I'd also like to have. But I think it should be doable. You can tell it how YOU want the state to be managed and then have it write a custom "linter" that makes the check deterministic. I haven't tried this myself, but Claude did create some custom Clippy scripts in Rust when I wanted to enforce something that isn't automatically enforced by anything out there. | |
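As a flavour of what "have it write a custom linter for the convention" can look like: below is a minimal AST-based check, written in Python here rather than as a Clippy lint, that flags module-level mutable state. The rule itself is invented for illustration, not taken from either commenter's codebase.

```python
# Hypothetical convention check: flag module-level mutable containers, standing in
# for an invented team rule like "no module-level mutable state".
import ast
import sys

MUTABLE_LITERALS = (ast.List, ast.Dict, ast.Set, ast.ListComp, ast.DictComp, ast.SetComp)

def check_file(path: str) -> list[str]:
    """Return 'file:line: message' strings for one source file."""
    with open(path, encoding="utf-8") as fh:
        tree = ast.parse(fh.read(), filename=path)
    problems = []
    for node in tree.body:  # module-level statements only
        if isinstance(node, ast.Assign) and isinstance(node.value, MUTABLE_LITERALS):
            problems.append(f"{path}:{node.lineno}: module-level mutable assignment")
    return problems

if __name__ == "__main__":
    issues = [msg for path in sys.argv[1:] for msg in check_file(path)]
    print("\n".join(issues))
    sys.exit(1 if issues else 0)
```

Wired into CI, a check like this turns a stated team preference into something deterministic, which is the point gck1 is making.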
| ▲ | UncleMeat 2 days ago | parent [-] | | Lints are typically well suited for syntactic properties or some local semantic properties. Almost all interesting challenges in software design and evolution involve nonlocal semantic properties. |
| |
| ▲ | scotty79 2 days ago | parent | prev [-] | | Memes write themselves. "AI has X" "We have X at home" "X at home: x" |
|
| |
| ▲ | gingersnap 3 days ago | parent | prev | next [-] | | Since starting to use Opus 4.5 I've reduced the instructions in CLAUDE.md and just ask Claude to look in the codebase to understand the patterns already in use. Going from prompts/docs to instead having the code be the "truth". Show, don't tell. I've found this pattern has made a huge leap with Opus 4.5. | |
| ▲ | zoilism 3 days ago | parent | next [-] | | The Ash framework takes the approach you describe. From the docs (https://hexdocs.pm/ash/what-is-ash.html): "Model your application's behavior first, as data, and derive everything else automatically. Ash resources center around actions that represent domain logic." | |
| ▲ | kaydub 2 days ago | parent | prev [-] | | I feel like I've been doing this since Sonnet 3.5 or Sonnet 4. I'll clone projects/modules/whatever into the working directory and tell claude to check it out. Voila, now it knows your standards and conventions. |
| |
| ▲ | vidarh a day ago | parent | prev | next [-] | | Just ask it to. /init in Claude Code already automatically extracts a bunch, but for something more comprehensive, just tell it which additional types of things you want it to look for and document. > Did you have to develop these skills yourself? How much work was that? Do you have public examples somewhere? I don't know about the person above, but I tell Claude to write all my skills and agents for me. With some caveats, you can do this iteratively in a single session ("update the X agent, then re-run it. Repeat until it reliably does Y") | |
| ▲ | oncallthrow 3 days ago | parent | prev | next [-] | | When I ask Claude to do something, it independently, without me even asking or instructing it to, searches the codebase to understand what the convention is. I’ve even found it searching node_modules to find the API of non-public libraries. | | |
| ▲ | jack_pp 3 days ago | parent [-] | | This sounds like it would take a huge amount of tokens. I've never used agents so could you disclose how much you pay for it? | | |
| ▲ | garblegarble 3 days ago | parent | next [-] | | If they're using Opus then it'll be the $100/month Claude Max 5x plan (could be the more expensive 20x plan depending on how intensive their use is). It does consume a lot of tokens, but I've been using the $100/mo plan and get a lot done without hitting limits. It helps to be mindful of context (regularly amending/pruning your CLAUDE.md instructions, clearing context between tasks, sizing your tasks to stay within the Opus context window). Claude Code plans have token limits that work in 5-hour blocks (that start when you send your first token, so it's often useful to prime it as early in the morning as possible). Claude Code will spawn sub-agents (that often use their cheap Haiku model) for exploration and planning tasks, with only the results imported into the main context. I've found the best results come from a more interactive collaboration with Claude Code. As long as you describe the problem clearly, it does a good job on small/moderate tasks. I generally set two instances of Claude Code separate tasks and run them concurrently (the interaction with Claude Code distracts me too much to do my own independent coding simultaneously like with setting a task for a colleague, but I do work on architecture / planning tasks). The one matter of taste that I have had to compromise on is the sheer amount of code - it likes to write a lot of code. I have a better experience if I sweat the low-level code less, and just periodically have it clean up areas where I think it's written too much / too repetitive code. As you give it more freedom it's more prone to failure (and can often get itself stuck in a fruitless spiral) - however, as you use it more you get a sense of what it can do independently and what it's likely to choke on. A codebase with good human-designed unit & Playwright tests is very good. Crucially, you get the best results where your tasks are complex but on the menial side of the spectrum - it can pay attention to a lot of details, but on the whole don't expect it to do great on senior-level tasks. To give you an idea, in a little over a month "npx ccusage" shows that via my Claude Code 5x sub I've used 5M input tokens, 1.5M output, 121M Cache Create, 1.7B Cache Read. The estimated pay-as-you-go API cost equivalent is $1500 (N.B. for the tail end of December they doubled everybody's API limits, so I was using a lot more tokens on more experimental on-the-fly tool construction work). |
| ▲ | NiloCK 3 days ago | parent [-] | | FYI Opus is available and pretty usable in claude-code on the $20/Mo plan if you are at all judicious. I exclusively use opus for architecture / speccing, and then mostly Sonnet and occasionally Haiku to write the code. If my usage has been light and the code isn't too straightforward, I'll have Opus write code as well. | | |
| ▲ | garblegarble 3 days ago | parent [-] | | That's helpful to know, thanks! I gave Max 5x a go and didn't look back. My suspicion is that Opus 4.5 is subsidised, so good to know there's flexibility if prices go up. | | |
| ▲ | baq 3 days ago | parent [-] | | The $20 plan for CC is good enough for 10-20 minutes of opus every 5h and you’ll be out of your weekly limit after 4-5 days if you sleep during the night. I wouldn’t be surprised if Anthropic actually makes a profit here. (Yeah probably not, but they aren’t burning cash.) |
|
|
| |
| ▲ | vidarh a day ago | parent | prev [-] | | I use the $200/month Claude Code plan, and in the last week I've had it generate about half a million words of documentation without hitting any session limits. I have hit the weekly limit before, briefly, but that took running multiple sessions in parallel continuously for many days. |
|
| |
| ▲ | kaydub 2 days ago | parent | prev [-] | | "Claude, clone this repo https://github.com/repo, review the coding conventions, check out any markdown or readme files. This is an example of coding conventions we want to use on this project" |
|
|
| ▲ | maxkfranz 2 days ago | parent | prev | next [-] |
> Once you’ve got Claude Code set up, you can point it at your codebase, have it learn your conventions, pull in best practices, and refine everything until it’s basically operating like a super-powered teammate. The real unlock is building a solid set of reusable “skills” plus a few agents for the stuff you do all the time. I agree with this, but I haven't needed to use any advanced features to get good results. I think the simple approach gets you most of the benefits. Broadly, I just have markdown files in the repo written for a human dev audience that the agent can also use. Basically: - README.md with a quick start section for devs, descriptions of all build targets and tests, etc. Normal stuff. - AGENTS.md (the only file that's not written for people specifically) that just describes the overall directory structure and has a short set of instructions for the agent: (1) Always read the README before you start. (2) Always read the relevant design docs before you start. (3) Always run the linter, a build, and tests whenever you make code changes. - docs/*.md that contain design docs, architecture docs, and user stories, just text. It's important to have these resources anyway, agent or no. As with human devs, the better the docs/requirements the better the results. |
| |
| ▲ | vidarh a day ago | parent [-] | | I'd really encourage you to try using agents for tasks that are repeatable and/or wordy but where most of the words are not relevant for ongoing understanding. It's a tiny step further, and sub-agents provide a massive benefit the moment you're ready to trust the model even a little bit (relax permissions to not have it prompt you for every little thing; review before committing rather than on every file edit) because they limit what goes into the top level context, and can let the model work unassisted for far longer. I now regularly have it run for hours at a time without stopping. Running and acting on output from the linter is absolutely an example of that which matters even for much shorter runs. There's no reason to have all the lint output "polluting" the top level context, nor to have the steps the agent needs to take to fix linter issues that can't be auto-fixed by the linter itself. The top level agent should only need to care about whether the linter run passed or failed (and should know it needs to re-run and possibly investigate if it fails). Just type /agents, select "Create new agent" and describe a task you often do, and then forget about it (or ask Claude to make changes to it for you) | | |
| ▲ | maxkfranz a day ago | parent [-] | | That's a great point. There are a lot of things you can do to optimise things, and your suggestion is one of the lower hanging fruits. I was trying to get across the point that today you can get a lot of benefit from minimal setup, even one that's vendor-agnostic. (The steps I outlined work for Codex out of the box, too.) You're right to point out that the more you refine things, the more you'll get out of the tools. It used to be that you had to do a lot of refinements to start getting good results at all. Now, you can get a lot out of even a basic setup like I outlined, which is great for people who are new users -- or people who tried it before and weren't that impressed but are now giving it another try. |
|
|
|
| ▲ | dmbche 3 days ago | parent | prev | next [-] |
| Oh! An ad! |
| |
| ▲ | savanaly 3 days ago | parent | next [-] | | The most effective kind of marketing is viral word of mouth from users who love your product. And Claude Code is benefiting from that dynamic. | |
| ▲ | OldGreenYodaGPT 3 days ago | parent | prev [-] | | lol, it does sound like an ad, but it's true. Also forgot about hooks: use hooks too! I just used voice-to-text and then had Claude reword it. Still my real-world ideas. | |
|
|
| ▲ | nijave a day ago | parent | prev | next [-] |
| I still struggle with these things being _too_ good at generating code. They have a tendency to add abstractions, classes, wrappers, factories, builders to things that didn't really need all that. I find they spit out 6 files worth of code for something that really only needed 2-3 and I'm spending time going back through simplifying. There are times those extra layers are worth it but it seems LLMs have a bias to add them prematurely and overcomplicate things. You then end up with extra complexity you didn't need. |
|
| ▲ | majormajor 3 days ago | parent | prev | next [-] |
All of these things work very well IMO in a professional context. Especially if you're in a place where a lot of time was previously spent revising PRs for best practices, etc., even for human-submitted code, then having the LLM do that for you saves a bunch of time. Most humans are bad at following those super-well. There's a lot of stuff where I'm pretty sure I'm up to at least 2x speed now. And for things like making CLI tools or bash scripts, 10x-20x. But in terms of "the overall output of my day job in total", probably more like 1.5x. But I think we will need a couple of major leaps in tooling - probably deterministic tooling, not LLM tooling - before anyone could responsibly ship code nobody has ever read in situations with millions of dollars on the line (which is different from vibe-coding something that ends up making millions - that's a low-risk-high-reward situation, where big bets on doing things fast make sense. If you're already making millions, dramatic changes like that can become high-risk-low-reward very quickly. In those companies, "I know that only touching these files is 99.99% likely to be completely safe for security-critical functionality" and similar "obvious" intuition makes up for the lack of ability to exhaustively test software in a practical way (even with fuzzers and things), and "I didn't even look at the code" is conceding responsibility to a dangerous degree there.) |
|
| ▲ | keybored 3 days ago | parent | prev | next [-] |
| > (used voice to text then had claude reword, I am lazy and not gonna hand write it all for yall sorry!) Reword? But why not just voice to text alone... Oh but we all read the partially synthetic ad by this point. Psyche. |
|
| ▲ | jdthedisciple 3 days ago | parent | prev | next [-] |
| I'm curious: With that much Claude Code usage, does that put your monthly Anthropic bill above $1000/mo? |
|
| ▲ | hoten 3 days ago | parent | prev | next [-] |
| Mind sharing the bill for all that? |
| |
| ▲ | OldGreenYodaGPT 3 days ago | parent | next [-] | | My company pays for the team Claude Code plan, which is like $200 a month for each dev. The workflows cost like 10-50 cents a PR. | |
| ▲ | blahblaher 3 days ago | parent | next [-] | | It will have to quintuple or more to make business sense for Anthropic. Sure, still cheaper than a full-time developer, but don't expect it to stay at $200 for long. And then, when you explain to your boss how amazing it is and how it can do all this work so easily and quickly, that's when your boss starts asking the real question: what am I paying you for? | |
| ▲ | benjiro 3 days ago | parent | next [-] | | A programmer, if we use US standards, is probably $8000 per month. If you can get 30% more value out of that programmer (trust me, it's WAY more than 30%), you've gained $2400 of value. If you pay $200, $500, or $1000 for that, it's still a net positive. And that's ignoring the salary range of an actual senior... LLMs do not result in bosses firing people; they result in more projects / faster-completed projects, which in turn means more $$$ for a company. | |
| ▲ | bonesss 3 days ago | parent | prev | next [-] | | More fundamentally: assume a 10 to 30% bump in actual productivity, find a niche (editing software, CRUD frameworks, SharePoint 2.0, stock trading, betting, whatever), and assume you had Anthropic’s billions or OpenAI’s billions or Microsoft’s billions or Google’s billions. Why on earth would you be hunting $20 a month subscriptions from random assed people? Peanuts. Lockheed Martin could be, but isn’t, opening lemonade stands outside their offices… they don’t, because of how buying a Ferrari works. | |
| ▲ | theshrike79 2 days ago | parent | next [-] | | > Why on earth would you be hunting $20 a month subscriptions from random assed people? Peanuts. For the same reason Microsoft never has and never will chase people for pirating home Windows or Office licenses. When they hit the workforce, or even better, start a company, guess which OS and office suite they'll use? Hint: it's not Linux and OpenOffice. Same with Claude's $20 package. It lets devs use it at home and then compare it to the Copilot shit their company is pushing on them. Maybe they either grumble enough to get a Claude license or they're in a position to make the call. Cheap advertising, pretty much. Worked for me too :) I've paid for my own Claude license for over a year at home, grumbled at work, and we got a Claude pilot going now - and everyone who's tried it so far isn't going back to Copilot + Sonnet 4.5/GPT5. |
| ▲ | whattheheckheck 3 days ago | parent | prev [-] | | They're data-farming your intelligence. |
| |
| ▲ | HDThoreaun 3 days ago | parent | prev | next [-] | | I'm not sure about this. What they really need is to get rid of the free tier and get widespread adoption. Inference on the $200 plan seems to be profitable right now, so they just need more users to amortize training costs. | |
| ▲ | senordevnyc 3 days ago | parent | prev [-] | | All the evidence suggests that inference is quite profitable actually. |
| |
| ▲ | square_usual 3 days ago | parent | prev [-] | | It's $150, not a huge difference but worth noting that it's not the same as the 20x Max plan. |
| |
| ▲ | 6177c40f 3 days ago | parent | prev | next [-] | | Cheaper than hiring another developer, probably. My experience: for a few dollars I was able to extensively refactor a Python codebase in half a day. This otherwise would have taken multiple days of very tedious work. | | |
| ▲ | blahblaher 3 days ago | parent [-] | | And that's what the C-suite wants to know. Prepare yourself to be replaced in the not so distant future. Hope you have a good "nest" to support yourself when you're inevitably fired. | | |
| ▲ | benjiro 3 days ago | parent | next [-] | | > Prepare yourself to be replaced in the not so distant future. This ignores that the same developer now has access to a tool that makes him a team by himself. Going independent was always an issue because being a full-stack dev is hard. With LLMs, you have an entire team behind you for making graphics, code, documents, etc... YOU become the manager. We will probably see a lot more small teams/single devs making bigger projects, until they grow. The companies that think they can fire devs are the same companies that are going to go too far and burn bridges. Do not forget that a lot of companies are founded on devs leaving a company and starting out on their own, taking clients with them! I did that years ago, and it worked for a while, but eventually the math does not work out because one guy can only do so much. And when you start hiring, your costs balloon. But with LLMs... now you're a one-man team... hiring a second person is not hiring a person to make some graphics or do more coding. You're hiring another team. This is what people do not realize... they look too much upon this as the established order, ignoring what those fired devs can now do! | |
| ▲ | icedchai 3 days ago | parent [-] | | This sounds nice, except for the fact that almost everyone else can do this, too. Or at least try to, resulting in a fast race to the bottom. Do you really want to be a middle manager to a bunch of text boxes, churning out slop, while they drive up our power bills and slowly terraform the planet? | | |
| ▲ | cakealert 3 days ago | parent | next [-] | | The same way that having motorized farming equipment was a race to the bottom for farmers? Perhaps. Turned out to be a good outcome for most involved. Just like farmers who couldn't cope with the additional leverage their equipment provided them, devs who can't leverage this technology will have to "go to the cities". | | |
| ▲ | encyclopedism 2 days ago | parent | next [-] | | Please do read up on how farmers are doing with this race to the bottom (it hasn't been pretty). Mega farms are a thing because small farms simply can't compete. Small farmers have gone broke. The parent comment is trying to highlight this. If LLMs turn out the way the C-suite hopes, let me tell you, you will be in a world of pain. Most of you won't be using LLMs to create your own businesses. | |
| ▲ | pluralmonad 2 days ago | parent | prev [-] | | But modern tillage/petrol based farming is an unsustainable aberration. Maybe a good example for this discussion, but in the opposite direction if it is. |
| |
| ▲ | kaydub 2 days ago | parent | prev | next [-] | | LOL what an argument. Seeing the replies here it actually doesn't seem like everyone else can do this. Looks like a lot of people really suck at using LLMs to me. | | |
| ▲ | icedchai 2 days ago | parent [-] | | I'm not saying they can all do it now... but I don't think it's much of a stretch that they can learn it quickly and cheaply. |
| |
| ▲ | benjiro 3 days ago | parent | prev [-] | | > except for the fact that almost everyone else can do this, too. Or at least try to, resulting in a fast race to the bottom. Ironically, that race to the bottom is no different from what we already have. Have you worked for a company before? A lot of software is developed BADLY. I dare say that a lot of the software Opus 4.5 generates is often of higher quality than what I have seen in my 25-year career. The number of companies that cheap out, hiring juniors fresh from school to work as coding monkeys, is insane. Then projects have bugs / security issues, with tons of copy-pasted code, or people not knowing a darn thing. Is that any different from your feared future? I dare say that LLMs like Opus are frankly better than most juniors. Ask a junior to do a code review for security issues. Opus literally creates extensive tests and points out issues that you would expect from a mid-level or higher dev. Of course, you need to know what to ask! You are the manager. > Do you really want to be a middle manager to a bunch of text boxes, churning out slop, while they drive up our power bills and slowly terraform the planet? Frankly, yes ... If you are a real developer, do you still think development is fun after 10 years, 20 years? Doing the exact same boring work. Reimplementing the 1001st login page, the 101st contact form ... A ton of our work is in reality repeating the same crap over and over again. And if we try to bypass it, we end up tied to those systems / frameworks that often become a weight around our necks. Our industry has a lot of burnout because most tasks may start small but then grow beyond our scope. Today it's Ruby on Rails programming, then it's Angular, no wait, React, no wait, Vue, no wait, the new hotness is whatever again. > slowly terraform the planet? Well, I am actually making something. Can you say the same for all the power / GPU draw of Bitcoin, Ethereum, whatever crap mining? One is productive, a tool with insane potential and usage; the other is a virtual currency where only one is ever popular, with limited usage. Yet it burns just as much, for a way more limited return of usability. Those LLMs that you are so against make me a ton more productive. You want to try out something, but never really wanted to commit because it was weeks of programming. Well, now you, as manager, can get projects done fast, and learn from them way faster than your little fingers ever did. | |
|
| |
| ▲ | kaydub 2 days ago | parent | prev | next [-] | | Homey, we're going to be replacing you devs that can't stand to use LLMs lol | |
| ▲ | 6177c40f 3 days ago | parent | prev | next [-] | | You say this like it's some kind of ominous revelation, but that's just how capitalism works? Yeah, prepare for the future. All things are impermanent. | | |
| ▲ | goatlover 3 days ago | parent | next [-] | | I suppose as long as either humans are always able to use new tools to create new jobs, or the wealth gets shared in a fully automated society, it won't be ominous. There are other scenarios. | | |
| ▲ | 6177c40f 3 days ago | parent [-] | | I think we might make new jobs, but maybe not enough. I'll be pleasantly surprised if we get good at sharing wealth over the next few years. Maybe something like UBI will become so obviously necessary that it becomes politically feasible, I don't know. I suspect we'll probably limp along for awhile in mediocrity. Then we'll die. Same as it ever was. The important thing is to have fun with it. |
| |
| ▲ | wiseowise 3 days ago | parent | prev [-] | | > Yeah, prepare for the future. Well excuse the shit out of my goddamn French, but being comfy for years and suddenly facing literal doom of my profession in a year wasn't on my bingo card. And what do you even mean by "prepare"? Shit out a couple of mil out of my ass and invest asap? | | |
| ▲ | 6177c40f 3 days ago | parent | next [-] | | Sharpen sticks, hoard water maybe? We were always going to die someday, I don't see how this changes things. | |
| ▲ | garblegarble 3 days ago | parent | prev [-] | | >And what do you even mean by "prepare"? Not the person you're responding to but... if you think it's a horse -> car change (and, to stretch the metaphor, if you think you're in the business of building stables) then preparation means train in another profession. If you think it's a hand tools -> power tools change, learn how to use the new tools so you don't get left behind. My opinion is it's a hand -> power tools change, and that LLMs give me the power to solve more problems for clients, and do it faster and more predictably than a client trying to achieve the same with an LLM. I hope I'm right :-) | | |
| ▲ | simonw 3 days ago | parent [-] | | That's a good analogy. I'm on team hand tools to power tools too. | | |
| ▲ | SoftTalker 3 days ago | parent | next [-] | | Why do you suppose that these tools will conveniently stop improving at some point that increases your productivity but are still too much for your clients to use for themselves? | | |
| ▲ | simonw 3 days ago | parent [-] | | Because I've seen how difficult it is to get a client to explain to me what they need their software to do. | | |
| ▲ | SoftTalker 2 days ago | parent [-] | | And so the AI will develop the skills to interview the client and determine what they really need. There are textbooks written on how to do this, it's not going to be hard to incorporate into the training. |
|
| |
| ▲ | th0ma5 3 days ago | parent | prev [-] | | Power tools give way to robotics though so it seems small minded to think so small? Have you been following the latest trends though? New models come out all the time so you can't have this tool brand mindset. Keep studying and you'll get there. |
|
|
|
| |
| ▲ | jack_pp 3 days ago | parent | prev [-] | | Well probably OP won't be affected because management is very pleased with him and his output, why would they fire him? Hire someone who can probably have better output than him for 10% more money or someone who might have the same output for 25% less pay? You think any manager in their right mind would take risks like that? I think the real consequences are that they probably are so pleased with how productive the team is becoming that they will not hire new people or fire the ones who aren't keeping up with the times. It's like saying "wow, our factory just produced 50% more cars this year, time to shut down half the factory to reduce costs!" | | |
| ▲ | wiseowise 3 days ago | parent [-] | | > You think any manager in their right mind would take risks like that? You really underestimate stupidity of your average manager. Two of our top performers left because they were underpaid and the manager (in charge of the comp) never even tried to retain them. | | |
| ▲ | anomaly_ 3 days ago | parent | next [-] | | I bet they weren't as valuable as you think. This is a common issue with certain high performing line delivery employees (particularly those with technical skills, programmers, lawyers, accountants, etc), they always think they are carrying the whole team/company on their shoulders. It almost never turns out to be the case. The machine will keep grinding. | |
| ▲ | jack_pp 3 days ago | parent | prev [-] | | That's one kind of stupidity. Actually firing the golden goose is one step further |
|
|
|
| |
| ▲ | aschobel 3 days ago | parent | prev [-] | | i've never hit a limit with my $200 a month plan |
|
|
| ▲ | risyachka 3 days ago | parent | prev | next [-] |
They are sleeping on it because there is absolutely no incentive to use it. When needed, it can be picked up in a day. Otherwise, they are not paid based on tickets solved, etc.
If the incentives were properly aligned, everyone would already be using it. |
|
| ▲ | dominicrose 3 days ago | parent | prev | next [-] |
| Use Claude Code... to do what? There are multiple layers of people involved in the decision process and they only come up with a few ideas every now and then. Nothing I can't handle. AI helps but it doesn't have to be an agent. I'm not saying there aren't use cases for agents, just that it's normal that most software engineers are sleeping on it. |
|
| ▲ | chandureddyvari 3 days ago | parent | prev | next [-] |
Came across the official Anthropic repo for GitHub Actions, very relevant to what you mentioned. Your idea of scheduled doc updates using an LLM is brilliant; I'm stealing it.
https://github.com/anthropics/claude-code-action |
|
| ▲ | aschobel 3 days ago | parent | prev | next [-] |
Agreed, and skills are a huge unlock. Codex CLI even has a skill to create skills; it's super easy to get up to speed with them: https://github.com/openai/skills/blob/main/skills/.system/sk... |
|
| ▲ | ndesaulniers 2 days ago | parent | prev | next [-] |
Thanks for the example! There's a lot (of boilerplate?) here that I don't understand. Does anyone have good references for getting up to speed on the purpose of all of these files in the demo? |
|
| ▲ | avereveard 3 days ago | parent | prev | next [-] |
Also, the new Haiku. Not as smart but lightning fast. I have it review the impact of code changes, or if I need a wide but shallow change done, I have it scan the files and create a change plan. Saves a lot of time waiting for Claude or Codex to get their bearings. |
|
| ▲ | andrekandre 3 days ago | parent | prev | next [-] |
| > we have another Claude Code agent that does a full PR review, following a detailed markdown checklist we’ve written for it.
(if you know) how does that compare to CodeRabbit? I'm seriously looking for something better rn... |
| |
| ▲ | megalomanu 2 days ago | parent [-] | | Never tried CodeRabbit, just because this is already good enough with Claude Code. It helped us catch dozens of important issues we wouldn't have caught.
We gave some instructions in the CLAUDE.md doc in the repository - including a nice personalized roast of the engineer who did the review in the intro and conclusion to make it fun! :)
Basically, when you do a "create PR" from your Claude Code, it will help you get your Linear ticket (or create one if it's missing), ask you some important questions (like: what tests have you done?), create the PR on GitHub, request the reviewers, and post an "Auto Review" message with your credentials. It's not an actual review per se, but this is enough for our small team. | |
| ▲ | andrekandre 2 days ago | parent [-] | | thanks for the reply, yea we have a claude.md file, but coderabbit doesn't seem to pick it up or ignore it... hmmm wish we could try out claude code. | | |
| ▲ | tinodb a day ago | parent [-] | | Codex is even better in my experience at reviewing. You can find the prompt it uses in the repo |
|
|
|
|
| ▲ | philipwhiuk 2 days ago | parent | prev | next [-] |
I was expecting a showcase to show off what you've done with it, not just another person's attempt at instructing an AI to follow instructions. |
|
| ▲ | 3 days ago | parent | prev | next [-] |
| [deleted] |
|
| ▲ | moltar 3 days ago | parent | prev | next [-] |
| If anyone is excited about, and has experience with this kind of stuff, please DM. I have a role open for setting up these kinds of tools and workflows. |
|
| ▲ | theanonymousone 3 days ago | parent | prev | next [-] |
Is Claude "Code" anything special, or is it mostly the LLM, so other CLIs (e.g. Copilot) also work? |
| |
| ▲ | square_usual 3 days ago | parent | next [-] | | I've tried most of the CLI coding tools with the Claude models and I keep coming back to Claude Code. It hits a sweet spot of simple and capable, and right now I'd say it's the best from an "it just works" perspective. | |
| ▲ | kaydub 2 days ago | parent | prev | next [-] | | In my experience the CLI tool is part of the secret sauce. I haven't tried switching models per each CLI tool though. I use claude exclusively at work and for personal projects I use claude, codex, gemini. | |
| ▲ | speedgoose 3 days ago | parent | prev | next [-] | | It’s mostly the model. Between Copilot, Claude Code, OpenCode, and snake oil like Oh My OpenCode, there aren't huge differences. | |
| ▲ | troupo 3 days ago | parent | next [-] | | Claude Code seems to package a relatively smart prompt as well, as it seems to work better even with one-line prompts than alternatives that just invoke the API. Key word: seems. It's impossible to do a proper qualitative analysis. | |
| ▲ | pluralmonad 2 days ago | parent | prev [-] | | Why do you call Oh My OpenCode snake oil? |
| |
| ▲ | catlover76 3 days ago | parent | prev [-] | | [dead] |
|
|
| ▲ | gjvc 2 days ago | parent | prev | next [-] |
| > (used voice to text then had claude reword, I am lazy and not gonna hand write it all for yall sorry!) take my downvote as hard as you can. this sort of thing is awfully off-putting. |
|
| ▲ | kaydub 2 days ago | parent | prev | next [-] |
I'm at the point where I say fuck it, let them sleep. The tech industry just went through an insane hiring craze and is now thinning out. This will help separate the wheat from the chaff. I don't know why any company would want to hire "tech" people who are terrified of tech and completely obstinate when it comes to utilizing it. All the people I see downplaying it take a half-assed approach to using it, then disparage it when it's not completely perfect. I started tinkering with LLMs in 2022. First use case: speak in natural English to the LLM, give it a JSON structure, and have it decipher the natural language and fill in that JSON structure (a vacation planning app, so you talk to it about where/how you want to vacation and it creates the structured data in the app). Sometimes I'd use it for minor coding fixes (copy and paste a block into ChatGPT, fix errors, or maybe just ideation). This was all personal project stuff. At my job we got LLM access in mid/late 2023. Not crazy useful, but still helpful. We got Claude Code in 2024. These days I only have an IDE open so I can make quick changes (like bumping up a config parameter, changing a config bool, etc.). I write almost ZERO code now. I usually have 3+ Claude Code sessions open. On my personal projects I'm using Gemini + Codex primarily (since I have a Google account and a $20/month ChatGPT account). When I get throttled on those I go to Claude and pay per token. I'll often rip through new features, projects, and ideas with one agent, then I have another agent come through and clean things up, look for code smells, etc. I don't allow the agents to have full unfettered control, but I'd say 70%+ of the time I just blindly accept their changes. If there are problems I can catch them on the MR/PR. I agree about the low-hanging fruit and I'm constantly shocked at the sheer amount of FUD around LLMs. I want to generalize, like I feel it's just the mid/junior level devs that speak poorly about it, but there are definitely senior/staff level people I see (rarely, mind you) that also don't like LLMs. I do feel like the online sentiment is slowly starting to change though. One thing I've noticed a lot is that when it's an anonymous post it's more likely to downplay LLMs. But if I go on LinkedIn and look at actual good engineers I see them praising LLMs. Someone speaking about how powerful the LLMs are - working on sophisticated projects at startups or FAANG. Someone with FUD when it comes to LLMs - a web dev out of Alabama. I could go on and on but I'm just ranting/venting a little. I guess I can end this by saying that in my professional/personal life, 9/10 of the top-level best engineers I know are jumping on LLMs any chance they get. Only 1/10 talks about AI slop or bullshit like that. |
| |
| ▲ | throw1235435 2 days ago | parent [-] | | Not entirely disagreeing with your point, but I think they've mostly been forced to pivot recently for their own sakes; they will never say it, though. As much as they may seem eager, the most public people tend to be better at outside communication and at knowing what they should say in public to enjoy more opportunities, remain employed, or, for the top engineers, to still seem relevant in the face of the communities they are a part of. It's less about money and more about respect there, I think. The "sudden switch" since Opus 4.5, when many were saying just a few months ago "I enjoy actual coding" but are now praising LLMs, isn't a one-off occurrence. I do think that underneath it is somewhat motivated by fear; not for the job, however, but for relevance, i.e. being relevant to discussions, tech talks, new opportunities, etc. |
|
|
| ▲ | ps 2 days ago | parent | prev | next [-] |
OK, I am gonna be the guy and put my skin in the game here. I kind of get the hype, but the experience with e.g. Claude Code (or GitHub Copilot previously, and others as well) has so far been pretty unreliable. I have a Django project with 50 kLOC and it is pretty capable of understanding the architecture, style of coding, naming of variables, functions etc. Sometimes it excels on tasks like "replicate this non-trivial functionality for this other model and update the UI appropriately" and leaves me stunned. Sometimes it solves for me the tedious and laborious "replace this markdown editor with something modern, allowing fullscreen edits of content" but makes an annoying mistake that only a visual check reveals, and is not capable of fixing it after 5 prompts. I feel as if I am becoming a tester more than a developer, and I do not like the shift. Especially since I do not like to tell someone they made an obvious mistake and should fix it - it seems I do not care if it is human or AI, I just do not like incompetence, I guess. Yesterday I had to add some parameters to a very simple Falcon project and found out it had not been updated for several months and won't build due to some pip issues with pymssql. OK, this is a really marginal sub-project, so I said: let's migrate it to uv, let's not get our hands dirty, and let Claude do it. It did splendidly, but in the Dockerfile it missed the "COPY server.py /data/" even though I asked it to change the path... The build failed, I updated the path myself and moved on. And then you listen to very smart guys like Karpathy who rave about Tab, Tab, Tab, while not understanding the language or anything about the code they write. Am I getting this wrong? I am really far, far away from letting agents touch my infrastructure via SSH, access managed databases with full access privileges etc., and I dread the day one of my silly customers asks me to give their agent permission to managed services. One might say the liability should then be shifted, but at the end of the day, humans will have to deal with the damage done. My customer who uses all the codebase I am mentioning here asked me if there is a way to provide "some AI" with item GTINs and let it generate photos, descriptions, etc., including metadata they handcrafted and extracted for years from various sources. While it looks like a nice idea, and for them the possibility of decreasing the staff count, I caught the feeling they do not care about data quality anymore or do not understand the problems they are bringing upon themselves due to errors nobody will catch until it is too late. TL;DR: I am using Opus 4.5, it helps a lot, I have to keep being (very) cautious. Wake-up call 2026? More like waking up from a hallucination. |
|
| ▲ | lfliosdjf 2 days ago | parent | prev | next [-] |
Why don't I see any streams of people building apps as quickly as they say? Just hype. |
|
| ▲ | winterbloom 3 days ago | parent | prev | next [-] |
Didn't feel like reading all this, so I shortened it for anyone else who might need it. Sorry! ---- Software engineers are sleeping on Claude Code agents. By teaching it your conventions, you can automate your entire workflow: Custom Skills: Generates code matching your UI library and API patterns. Quality Ops: Automates ESLint, doc syncing, and E2E coverage audits. Agentic Reviews: Performs deep PR checks against custom checklists. Smart Triage: Pre-analyzes tickets to give devs a head start. Check out the showcase repo to see these patterns in action. |
| |
|
| ▲ | mcny 3 days ago | parent | prev [-] |
| Everybody says how good Claude is and I go to my code base and I can't get it to correctly update one xaml file for me. It is quicker to make changes myself than to explain exactly what I need or learn how to do "prompt engineering". Disclaimer: I don't have access to Claude Code. My employer has only granted me Claude Teams. Supposedly, they don't use my poopy code to train their models if I use my work email Claude so I am supposed to use that. If I'm not pasting code (asking general questions) into Claude, I believe I'm allowed to use whatever. |
| |
| ▲ | spaceman_2020 3 days ago | parent [-] | | What's even the point of this comment if you self-admittedly don't have access to the flagship tool that everyone has been using to make these big bold coding claims? | | |
| ▲ | hu3 3 days ago | parent | next [-] | | Isn't Claude Teams powerful? Does it not have access to Opus? Pardon my ignorance. I use GitHub Copilot, which has access to LLMs like Gemini 3, Sonnet/Opus 4.5 and GPT 5.2. | |
| ▲ | halfmatthalfcat 3 days ago | parent | prev [-] | | Because the same claims of "AI tool does everything" are made over and over again. | | |
| ▲ | spaceman_2020 3 days ago | parent | next [-] | | The claims are being made for Claude Code, which you don't have access to. | | |
| ▲ | mr_mitm 3 days ago | parent [-] | | I believe part of why Claude Code is so great is that it has the chance to catch its own mistakes. It can run compilers, linters, and browsers, and check its own output. If it makes a mistake, it takes one or two extra iterations until it gets it right. |
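That feedback loop is simple to sketch. The following is purely illustrative pseudocode-in-Python, not Claude Code's actual implementation; `ask_model` is a hypothetical placeholder for whatever model call an agent makes:

```python
# Purely illustrative: the "generate, run the checks, feed failures back" loop
# that agentic coding tools are built around. ask_model() is a hypothetical
# placeholder, not any real API.
import subprocess

def ask_model(prompt: str) -> None:
    """Placeholder: a real agent would call a model here and apply its edits."""
    print("(model call elided)", prompt[:80], "...")

def run_checks() -> tuple[bool, str]:
    """Run the project's test suite and return (passed, combined output)."""
    proc = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
    return proc.returncode == 0, proc.stdout + proc.stderr

def fix_until_green(task: str, max_iterations: int = 3) -> bool:
    """Keep asking the model and re-running checks until they pass or we give up."""
    feedback = ""
    for _ in range(max_iterations):
        ask_model(f"{task}\n\nPrevious check output:\n{feedback}")
        passed, feedback = run_checks()
        if passed:
            return True
    return False
```

The extra iterations mr_mitm mentions are just additional passes through a loop like this, with the failing output appended to the next prompt.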
| |
| ▲ | fragmede 3 days ago | parent | prev [-] | | It's not "AI tool does everything", it's specifically Claude Code with Opus 4.5 is great at "it", for whatever "it" a given commenter is claiming. |
|
|
|