GolDDranks a day ago

After using Claude 3.7 Sonnet for a few weeks, my verdict is that its coding abilities are unimpressive, both for unsupervised coding and for problem solving/debugging, if you are expecting accurate results and correct code.

However, as a debugging companion, it's slightly better than a rubber duck: at least there's some suspension of disbelief, so I tend to explain things to it earnestly, and because of that, process them better myself.

That said, it's remarkable and interesting how quickly these models are getting better. I can't say anything about version 4, not having tested it yet, but in five years' time things are not looking good for junior developers for sure, and a few years after that, for everybody.

jjmarr a day ago | parent | next [-]

As a junior developer it's much easier for me to jump into a new codebase or language and make an impact. I just shipped a new error message in LLVM because Cline found the 5 spots in 10k+ files where I needed to make the code changes.

When I started an internship last year, it took me weeks to learn my way around my team's relatively smaller codebase.

I consider this a skill and cost issue.

If you are rich and able to read fast, you can start writing LLVM/Chrome/etc features before graduating university.

If you cannot afford the hundreds of dollars a month Claude costs or cannot effectively review the code as it is being generated, you will not be employable in the workforce.

matsemann 17 hours ago | parent | next [-]

But if you had instead spent those "weeks to learn your way around the codebase", that would have paid dividends forever. I'm a bit afraid that by one-shotting features like these, many will never get to the level required to make bigger changes that rely on a deeper understanding.

Of course, LLMs might get there eventually. But until then, I think it will create a bigger divide between seniors and juniors than there traditionally has been.

jjmarr 16 hours ago | parent [-]

I've never been able to one-shot a feature with an agent. It's much easier to "learn my way around the codebase" by watching the AI search the codebase and seeing its motivation/mental model.

Going AFK is a terrible idea anyways because I have to intervene when it's making bad architectural decisions. Otherwise it starts randomly deleting stuff or changing the expected results of test cases so they'll pass.

kweingar a day ago | parent | prev | next [-]

> If you cannot afford the hundreds of dollars a month Claude costs

Employers will buy AI tools for their employees, this isn't a problem.

If you're saying that you need to buy and learn these tools yourself in order to get a job, I strongly disagree. Prompting is not exactly rocket science, and with every generation of models it gets easier. Soon you'll be able to pick it up in a few hours. It's not a differentiator.

jjmarr 16 hours ago | parent [-]

I need side projects and OSS contributions to get hired as a new graduate or an intern. The bar for both of those will be much higher if everyone is using AI.

quantumHazer 14 hours ago | parent [-]

Yes, side projects are for fun and for learning, not for prompting an LLM. Unless you dislike coding and problem solving.

midasz 14 hours ago | parent | prev | next [-]

> make an impact.

To me, a junior dev's biggest job is learning, not delivering value. It's a pitfall I'm seeing in my own team, where a junior is so focused on delivering value that he's not gaining an understanding.

quantumHazer 15 hours ago | parent | prev | next [-]

You're sabotaging yourself though. You are avoiding learning.

What's the point of shipping a Chrome feature before graduating? Just to put on your CV that you've committed to some repo? In the past this would have been a signal of competence, but now you're working towards a future where doing this is no longer a signal of competence.

mrheosuper a day ago | parent | prev | next [-]

Some companies do not like you uploading their code to third parties.

BeefySwain a day ago | parent | prev [-]

I'm curious what tooling you are using to accomplish this?

jjmarr 16 hours ago | parent [-]

I used Cline + Claude 3.7 Sonnet for the initial draft of this LLVM PR. There was a lot of handholding, and the final version was much different from the original.

https://github.com/llvm/llvm-project/pull/130458

Right now I'm using Roo Code and Claude 4.0. Roo Code looks cooler and draws diagrams but I don't know if it's better.

BeetleB a day ago | parent | prev | next [-]

I've noticed an interesting trend:

Most people who are happy with LLM coding say something like "Wow, it's awesome. I asked it to do X and it did it so fast with minimal bugs, and good code", and occasionally show the output. Many provide even more details.

Most people who are not happy with LLM coding ... provide almost no details.

As someone who's impressed by LLM coding, when I read a post like yours, I tend to have a lot of questions, and generally the post doesn't have the answers.

1. What type of problem did you try it out with?

2. Which model did you use? (You get points for providing that one!)

3. Did you consider a better model (e.g., see how Gemini 2.5 Pro compares to Sonnet 3.7 on the Aider leaderboard)?

4. What were its failings? Buggy code? Correct code but poorly architected? Correct code but used some obscure method to solve it rather than a canonical one?

5. Was it working on an existing codebase or was this new code?

6. Did you keep an eye on how many tokens were sent? Did you use a tool that informs you of the number of tokens for each query?

7. Which tool did you use? It's not just a question of the model, but of how the tool handles the prompts/agents under it. Aider is different from Code, which is different from Cursor, which is different from Windsurf.

8. What strategy did you follow? Did you give it the broad spec and ask it to do everything? Or did you work bottom-up and incrementally?

I'm not saying LLM coding is the best or can replace a human. But for certain use cases (e.g. a simple script, written from scratch), it's absolutely fantastic. I (mostly) don't use it on production code, but for little peripheral scripts I need to write (at home or work), it's great. And that's why people like me wonder what people like you are doing differently.

But such people aren't forthcoming with the details.

GolDDranks 21 hours ago | parent | next [-]

Two problems:

1) Writing a high-performance memory allocator for a game engine in Rust: https://github.com/golddranks/bang/tree/main/libs/arena/src (Still a work in progress, so it's in a bit of a messy state.) It didn't seem to understand the design I had in mind and/or the requirements, goes off on tangents, and starts changing the design. In the end, I coded the main code myself and used the LLM for writing tests, with some success. I had to remove tons of inane comments that didn't provide any explanatory value.

2) Trying to fix a Django ORM expression that generates suboptimal and incorrect SQL. It constantly changes its opinion about whether something is even possible or supported by Django, apologizes when I point out mistakes/bugs/hallucinations, but then fails to internalize the implications of said mistakes.
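
For what it's worth, checking what SQL Django will actually emit is straightforward; a generic sketch (the models here are hypothetical, not my actual query):

    # Hypothetical models, just to show the technique: print the SQL
    # Django generates for a queryset before running it.
    from django.db.models import Count

    qs = (
        Author.objects
        .annotate(book_count=Count("books"))
        .filter(book_count__gt=5)
    )
    print(qs.query)  # the SELECT ... GROUP BY ... HAVING ... it will run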

I used the Zed editor with its recently published agentic features. I tried to prompt it with a chat-style discussion, but it often made bigger edits than I would have liked, and failed to share a high-level plan in advance, something I often requested.

My biggest frustrations were not coding problems per se, but the general inability to follow instructions and see implications, and the lack of awareness to step back and ask for confirmation or better directions in "hold on, something's not right" kind of moments. Also, it generally follows through with "thanks for pointing that out, you are absolutely right!" even when you are NOT right. That yes-man style seriously erodes trust in the output.

BeetleB 20 hours ago | parent | next [-]

Thanks for the concrete examples! These seem to be more sophisticated than the cases I use LLMs for. Mostly I'm using them for tedious, simpler routine code (processing all files in a directory in a certain way, outputting them in a similar tree structure with changes to filenames, logging, etc.).
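
To give a flavor: the kind of script I have it write looks roughly like this (a made-up example; the paths and the transform are placeholders):

    # Made-up example: mirror a directory tree, transform each file,
    # rename the outputs, and log progress. Paths/transform are placeholders.
    import logging
    from pathlib import Path

    logging.basicConfig(level=logging.INFO)
    log = logging.getLogger(__name__)

    SRC, DST = Path("input"), Path("output")

    def transform(text: str) -> str:
        return text.upper()  # stand-in for the real per-file processing

    for src_file in SRC.rglob("*.txt"):
        rel = src_file.relative_to(SRC)
        dst_file = DST / rel.parent / f"processed_{rel.name}"
        dst_file.parent.mkdir(parents=True, exist_ok=True)
        dst_file.write_text(transform(src_file.read_text()))
        log.info("processed %s -> %s", src_file, dst_file)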

Your Django ORM may be more complicated than the ones I use. I haven't tried it much with Django (still reluctant to use it with production code), but a coworker did use it on our code base and it found good optimizations for some of our inefficient ORM usage. He learned new Django features as a result (new to him, that is).

> I tried to prompt it with a chat-style discussion, but it often made bigger edits than I would have liked, and failed to share a high-level plan in advance, something I often requested.

With Aider, I often use /ask to do a pure chat (no agents). It gives me a big-picture overview and the code changes. If I like it, I simply say "Go ahead". Or I refine with corrections, and when it gets it right, I say "Go ahead". So far it has rarely changed code beyond what I wanted, and the few times it did, it turned out to be a good idea.

Also, with Aider, you can limit the context to a fixed set of files. That doesn't solve it changing other things in the file - but as I said, rarely a problem for me.

One thing to keep in mind - it's better to view the LLM not as an extension of yourself, but more like a coworker who is making changes that you're reviewing. If you have a certain vision/design in mind, don't expect it to follow it all the way to low level details - just as a coworker will sometimes deviate.

> My biggest frustrations were not coding problems per se, but the general inability to follow instructions and see implications, and the lack of awareness to step back and ask for confirmation or better directions in "hold on, something's not right" kind of moments.

You have to explicitly tell it to ask questions (and some models ask great questions - not sure about Sonnet 3.7). Read this page:

https://harper.blog/2025/02/16/my-llm-codegen-workflow-atm/

I don't follow much of what's in his post, but the first part, where you specify what you want and have it ask you questions, has always been useful! He's talking about big changes, but I sometimes have it ask questions even for minor changes. I just add to my prompts: "Ask me something if it seems ambiguous".

GolDDranks 18 hours ago | parent [-]

Re: Prompting to ask. Thanks, I'll try that. And I'm gonna try version 4 as soon as I can.

drcongo 14 hours ago | parent [-]

I've been using Claude 3.7 in Zed for a while, and I've found that I've been getting better at asking it to do things (including a lot of Django ORM stuff). For each project I work on, I now have a `.context.md` that gives a decent outline of the project and also includes things I specifically don't want it to do, like create migrations or install packages. Then with the actual prompting, I tend to ask it to plan things first, and to stop and ask me if it thinks I've missed any detail. I've been pretty impressed with how right it gets things with this setup.
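
Mine usually looks something like this (a trimmed-down, made-up outline; the real one is project-specific):

    # Project context
    <One-paragraph description of the app, the stack, and key versions.>

    ## Conventions
    - Business logic lives in services modules; views stay thin.
    - Tests use pytest and mirror the app layout.

    ## Do NOT
    - Create or modify migrations.
    - Install or upgrade packages.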

And a tiny tip, just in case you've never noticed it: there's a little + button just above the prompt input in Zed that lets you add files to the context. This is where I add the `.context.md` whenever I start work on something.

jjmarr 15 hours ago | parent | prev [-]

Try Roo Code in Orchestrator mode or Cline in plan mode. It will do tons of requirements analysis before starting work.

sponnath a day ago | parent | prev | next [-]

I feel like the opposite is true, but maybe the issue is that we both live in separate bubbles. Often I see people on X and elsewhere making wild claims about the capabilities of AI, and rarely do they link to the actual output.

That said, I agree that AI has been amazing for fairly closed-ended problems like writing a basic script or even writing scaffolding for tests (it's about 90% effective at producing tests I'd consider good, assuming you give it enough context).

Greenfield projects have been more of a miss than a hit for me. They start out well, but if you don't do a good job of directing the architecture, things can go off the rails pretty quickly. In a lot of cases I find it faster to write the code myself.

osn9363739 a day ago | parent [-]

I'm in the same bubble. I find that if they do link to it, it's some basic, unimpressive demo app. That said, I want to see a video where one of these people who apparently 10x'd their programming goes up against a dev without AI across various scenarios. I just think it would be interesting to watch if they had a similar base skill and understanding of things.

BeetleB a day ago | parent [-]

> That said, I want to see a video where one of these people who apparently 10x'd their programming goes up against a dev without AI across various scenarios.

It would be interesting, but do understand that if AI coding is totally fantastic in one domain (basic automation scripting) and totally crappy in another (existing, complex codebase), it's still a (significant) improvement from the pre-AI days.

Concrete example: A few days ago I had an AI model write me a basic MCP tool: creating a Jira story. In 15 minutes, it had written the API function for me; I manually wrapped it to make it an MCP tool, tested it, created tens of stories from a predefined list, and verified it worked.

Now if you already know the Jira APIs (endpoints, auth, etc.), you could do it with similar speed. But I didn't. Just finding the docs, etc. would have taken me longer.
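
The core of such a function is only a handful of lines against the Jira Cloud REST API. A rough sketch (the site URL, credentials, and project key are placeholders, not what I actually used):

    # Rough sketch: create a Jira story via the Jira Cloud REST API.
    # The site URL, credentials, and project key are placeholders.
    import requests

    JIRA_SITE = "https://your-site.atlassian.net"
    AUTH = ("you@example.com", "your-api-token")  # basic auth with an API token

    def create_story(summary: str, description: str) -> str:
        resp = requests.post(
            f"{JIRA_SITE}/rest/api/2/issue",
            auth=AUTH,
            json={"fields": {
                "project": {"key": "PROJ"},
                "issuetype": {"name": "Story"},
                "summary": summary,
                "description": description,
            }},
        )
        resp.raise_for_status()
        return resp.json()["key"]  # e.g. "PROJ-123"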

Code quality is fine. This is not production code. It's just for me.

Yes, there are other Jira MCP libraries already. It was quicker for me to write my own than to figure out the existing ones (ditto for GitHub MCP). When using LLMs to solve a coding problem is faster than using Google/SO/official docs/existing libraries, that's clearly a win.

Would I do it this way for production code? No. Does that mean it's bad? No.

fumeux_fume a day ago | parent | prev | next [-]

Aside from the fact that you seem to be demanding a lot from someone who's informally sharing their experience online, I think the effectiveness really depends on what you're writing code for. With straightforward use cases that have ample documented examples, you can generally expect decent or even excellent results. However, the more novel the task or the more esoteric the software library, the likelier you are to encounter issues and feel dissatisfied with the outcomes. Additionally, some people are simply pickier about code quality and won't accept suboptimal results. Where I work, I regularly encounter wildly enthusiastic opinions about GenAI that lack any supporting evidence. Dissenting from the mainstream belief that AI is transforming every industry is treated as heresy, so such skepticism is best kept close to the chest—or better yet, completely to oneself.

BeetleB a day ago | parent [-]

> Aside from the fact that you seem to be demanding a lot from someone who's informally sharing their experience online

Looking at isolated comments, you are right. My point was that it was a trend. I don't expect everyone to go into details, but I notice almost none do.

Even what you pointed out ("great for some things, crappy for others") has much higher entropy.

Consider this, if every C++ related submission had comments that said the equivalent of "After using C++ for a few weeks, my verdict is that its performance capabilities are unimpressive", and then didn't go into any details about what made them think that, I think you'd find my analogous criticism about such comments fair.

raincole a day ago | parent | prev | next [-]

Yeah, that's obvious. It's even worse for blog posts. Pro-LLM posts usually come with whole working toy apps and the prompts that were used to generate them. Anti-LLM posts are usually some logic puzzles with twists.

Anyway, that's the Internet for you. People will say with a straight face that LLMs have plateaued since 2022.

csomar 19 hours ago | parent | prev | next [-]

Maybe you are not reading what we are writing :) Here is an article of mine https://omarabid.com/gpt3-now

> But for certain use cases (e.g. simple script, written from scratch), it's absolutely fantastic.

I agree with that. I've found it to be very useful for "yarn run xxx" scripts. It can automate lots of tasks that I wouldn't have bothered with previously, because the cost of coding the automation versus doing them manually didn't add up.

BeetleB 8 hours ago | parent [-]

That was a fun read - thanks.

whimsicalism 16 hours ago | parent | prev | next [-]

I think these developments produce job/economic anxiety, so a certain percentage of people react this way; the percentage is even higher on Reddit, where there is a lot of job anxiety.

jiggawatts a day ago | parent | prev [-]

Reminds me of the early days of endless “ChatGPT can’t do X” comments where they were invariably using 3.5 Turbo instead of 4, which was available to paying users only.

"Humans are much lazier than AIs" was my takeaway lesson from that.

StefanBatory a day ago | parent | prev [-]

Things were already not looking good for junior devs. I graduated this year in Poland; many of my peers were looking for jobs in IT for about a year before they were able to find anything. And many internships were faked, as people couldn't get anything (here you're required to do an internship in order to graduate).

GolDDranks a day ago | parent | next [-]

I sincerely hope you'll manage to find a job!

What I meant was purely from a capabilities perspective. There's no way a current AI model would outperform an average junior dev in job performance over, let's say, a year, to be charitable. Even if it outperformed junior devs during the first week, there's no way it would over a longer period.

However, that doesn't mean that the business people won't try to pre-empt potential savings. Some think that AI is already good enough; others don't, but expect it to be good enough in the future. Whether that happens remains to be seen, but the effects are already here.

v3np a day ago | parent | prev [-]

If I may ask: what university was this? Asking as I am the CTO of a YC startup and we are hiring junior engineers in Berlin!