jaggederest 2 days ago

Yeah I've handed it a naive scalar implementation and said "Make this use SIMD for Mac Silicon / NEON" and it just spits out a working implementation that's 3-6x faster and passes the tests, which are binary exact specifications.
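
To make the shape of that task concrete, here is a minimal Rust sketch of a scalar routine, a NEON counterpart, and the kind of byte-for-byte comparison described above. It is not the code from this comment: the names `brighten_scalar` and `brighten_simd` and the saturating-brightness operation are invented for illustration, and on non-aarch64 targets the "SIMD" path simply falls back to the scalar one.

```rust
/// Scalar reference (hypothetical example): saturating brightness add
/// over a buffer of 8-bit pixel values.
pub fn brighten_scalar(pixels: &mut [u8], delta: u8) {
    for p in pixels.iter_mut() {
        *p = p.saturating_add(delta);
    }
}

/// NEON version: processes 16 pixels per iteration, then hands the
/// leftover tail (fewer than 16 bytes) to the scalar routine.
#[cfg(target_arch = "aarch64")]
pub fn brighten_simd(pixels: &mut [u8], delta: u8) {
    use std::arch::aarch64::{vdupq_n_u8, vld1q_u8, vqaddq_u8, vst1q_u8};

    let mut chunks = pixels.chunks_exact_mut(16);
    // SAFETY: every chunk is exactly 16 bytes, so the 128-bit load and
    // store stay in bounds; NEON is assumed available on this target.
    unsafe {
        let d = vdupq_n_u8(delta);
        for chunk in &mut chunks {
            let v = vld1q_u8(chunk.as_ptr());
            vst1q_u8(chunk.as_mut_ptr(), vqaddq_u8(v, d));
        }
    }
    brighten_scalar(chunks.into_remainder(), delta);
}

/// Fallback so the sketch still compiles on targets without NEON.
#[cfg(not(target_arch = "aarch64"))]
pub fn brighten_simd(pixels: &mut [u8], delta: u8) {
    brighten_scalar(pixels, delta);
}
```

Running both functions on copies of the same buffer and asserting the results are equal is the "binary exact" check; because the arithmetic is integer-only, the two paths should agree byte for byte.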

jonstewart 2 days ago | parent [-]

It can do this at the level of a function, and that's -useful-, but like the parent reply to the top-level comment, and despite investing the time, using skills & subagents, etc., I haven't gotten it to do well with C++ or Rust projects of sufficient complexity. I'm not going to say it won't some day, but it's not today.

rtfeldman 2 days ago | parent | next [-]

Anecdotally, we use Opus 4.5 constantly on Zed's code base, which is almost a million lines of Rust code and has over 150K active users, and we use it for basically every task you can think of - new features, bug fixes, refactors, prototypes, you name it. The code base is a complex native GUI with no Web tech anywhere in it.

I'm not talking about "write this function" but rather like implementing the whole feature by writing only English to the agent, over the course of numerous back-and-forth interactions and exhausting multiple 200K-token context windows.

For me personally, at least 99% of the Rust code I've committed at work since Opus 4.5 came out has been from an agent running that model. I'm reading lots of Rust code (that Opus generated) but I'm essentially no longer writing any of it. If dot-autocomplete (and LLM autocomplete) disappeared from IDE existence, I would not notice.

mr_o47 a day ago | parent | next [-]

Whoa, that's a very interesting claim. I was shying away from writing Rust as I am not a Rust developer, but hearing your experience, it looks like Claude has gotten very good at writing Rust.

jaggederest a day ago | parent [-]

Honestly I think the more you can give Claude a type system and effective tests, the more effective it can be. Rust is quite high up on the test strictness front (though I think more could be done...), so it's a great candidate. I also like its performance on Haskell and Go; both get you pretty great code out of the box.

norir 2 days ago | parent | prev | next [-]

Have you ever worried that by programming in this way, you are methodically giving Anthropic all the information it needs to copy your product? If there is any real value in what you are doing, what is to stop Anthropic or OpenAI or whoever from essentially one-shotting Zed? What happens when the model providers 10x their prices and also use the information you've so enthusiastically given them to clone your product and use the money that you paid them to squash you?

rtfeldman 2 days ago | parent | next [-]

Zed's entire code base is already open source, so Anthropic has a much more straightforward way to see our code:

https://github.com/zed-industries/zed

kaydub 2 days ago | parent | prev [-]

That's what things like AWS Bedrock are for.

Are you worried about Microsoft stealing your codebase from GitHub?

djhn a day ago | parent [-]

Isn’t it widely assumed Microsoft used private repos for LLM training?

And even with a narrower definition of stealing, Microsoft’s ability to share your code with US government agencies is a common and very legitimate worry in plenty of threat model scenarios.

ziml77 a day ago | parent | prev | next [-]

I just uninstalled Zed today when I realized the reason I couldn't delete a file on Windows was that it was open in Zed. So I wouldn't speak too highly of the LLM's ability to write code. I have never seen another editor on Windows make the mistake of opening files without enabling all 3 share modes.
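
For readers unfamiliar with the term: the three share modes are the Win32 flags `FILE_SHARE_READ`, `FILE_SHARE_WRITE`, and `FILE_SHARE_DELETE`, and a file opened without the last one cannot be deleted by anyone else while that handle stays open. A minimal Rust sketch of opening a file with all three follows. The helper name `open_shared` is hypothetical, it relies only on the standard library's `OpenOptionsExt::share_mode`, and it is not a claim about how Zed opens files.

```rust
use std::fs::{File, OpenOptions};
use std::io;
use std::path::Path;

/// Hypothetical helper: open a file for reading without preventing other
/// handles from reading, writing, or deleting it in the meantime.
pub fn open_shared(path: &Path) -> io::Result<File> {
    let mut opts = OpenOptions::new();
    opts.read(true);

    #[cfg(windows)]
    {
        use std::os::windows::fs::OpenOptionsExt;

        // Win32 share-mode flag values from the Windows SDK headers.
        const FILE_SHARE_READ: u32 = 0x1;
        const FILE_SHARE_WRITE: u32 = 0x2;
        const FILE_SHARE_DELETE: u32 = 0x4;

        // Set explicitly here for illustration: all three share modes on.
        opts.share_mode(FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE);
    }

    // On non-Windows targets there is no share-mode concept, so this is
    // just a plain read-only open.
    opts.open(path)
}
```

With a handle obtained this way, `std::fs::remove_file` on the same path should still succeed while the handle is open, which is the behavior the comment says was missing.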

Snuggly73 2 days ago | parent | prev [-]

The article is arguing that it will basically replace devs. Do you think it can replace you basically one-shotting features/bugs in Zed?

And also - doesn’t that make Zed (and other editors) pointless?

rtfeldman 2 days ago | parent | next [-]

> Do you think it can replace you basically one-shotting features/bugs in Zed?

Nobody is one-shotting anything nontrivial in Zed's code base, with Opus 4.5 or any other model.

What about a future model? Literally nobody knows. Forecasts about AI capabilities have had horrendously low accuracy in both directions - e.g. most people underestimated what LLMs would be capable of today, and almost everyone who thought AI would at least be where it is today...instead overestimated and predicted we'd have AGI or even superintelligence by now. I see zero signs of that forecasting accuracy improving. In aggregate, we are atrocious at it.

The only safe bet is that hardware will be faster and cheaper (because the most reliable trend in the history of computing has been that hardware gets faster and cheaper), which will naturally affect the software running on it.

> And also - doesn’t that make Zed (and other editors) pointless?

It means there's now demand for supporting use cases that didn't exist until recently, which comes with the territory of building a product for technologists! :)

Snuggly73 2 days ago | parent [-]

Thanx. More of a "faster keyboard" so far then?

And yeah - if I had a crystal ball, I would be on my private island instead of hanging on HN :)

rtfeldman 2 days ago | parent [-]

Definitely more than a faster keyboard (e.g. I also ask the model to track down the source of a bug, ask it questions about the state of the code base after others have changed it, bounce architectural ideas off it, do research, etc.) but also definitely not a replacement for thinking or programming expertise.

kevin42 2 days ago | parent | prev [-]

Trying to one-shot large codebases is an exercise in futility. You need to let Claude figure out and document the architecture first, then set up agents for each major part of the project. Doing this keeps the context clean for the main agent, since it doesn't have to go read the code each time. So one agent can fill its entire context understanding part of the code and then the main agent asks it how to do something and gets a shorter response.

It takes more work than one-shot, but not a lot, and it pays dividends.

dpark a day ago | parent [-]

Is there a guide for doing that successfully somewhere? I would love to play with this on a large codebase. I would also love to not reinvent the wheel on getting Claude working effectively on a large code base. I don’t even know where to start with, e.g., setting up agents for each part.

jaggederest 2 days ago | parent | prev | next [-]

I don't know if you've tried Chatgpt-5.2 but I find Codex much better for Rust, mostly due to the underlying model. You have to do planning and provide context, but 80%+ of the time it's a one-shot for small-to-medium size features in an existing codebase that's fairly complex. I honestly have to say that it's a better programmer than I am, it's just not anywhere near as good a software developer for all of the higher and lower level concerns that are the other 50% of the job.

If you have any open-source examples of your codebase, prompt, and/or output, I would happily learn from it / give advice. I think we're all still figuring it out.

Also this SIMD translation wasn't just a single function - it was multiple functions across a whole region of the codebase dealing with video and frame capture, so pretty substantial.

glhaynes 2 days ago | parent [-]

"I honestly have to say that it's a better programmer than I am, it's just not anywhere near as good a software developer for all of the higher and lower level concerns that are the other 50% of the job."

That's a good way to say it, I totally identify.

andai 2 days ago | parent | prev [-]

Is that a context issue? I wonder if LSP would help there. Though Claude Code should grep the codebase for all necessary context and LSP should in theory only save time, I think there would be a real improvement to outcomes as well.

The bigger a project gets, the more context you generally need to understand any particular part. And by default Claude Code doesn't inject context; you need to use third-party integrations for that.