coffeeri 2 days ago

This video [0] is relevant, though it actually supports your point - it shows Claude Code struggling with non-trivial tasks and needing significant hand-holding.

I suspect videos meeting your criteria are rare because most AI coding demos either cherry-pick simple problems or skip the messy reality of maintaining real codebases.

[0] https://www.youtube.com/watch?v=EL7Au1tzNxE

thecupisblue 2 days ago | parent | next [-]

Great video! It shows a few things: how good Claude Code is with such a niche language, but it also exposes some direct flaws.

First off, Rust represents quite a small part of the training dataset (last I checked it was under 1% of the code in most public sets), so it's got way less training data than languages like TS or Java. You added 2 solid features, backed with tests, documentation, and nice commit messages. 80% of devs would not deliver this in 2.5 hours.

Second, there was a lot of time/token waste messing around with git and commit messages. A few tips I noticed that could help your workflow:

#1: Add a subagent for git that knows your style, so you don't poison the main Claude context and waste tokens/time fighting it.

#2: Claude has hooks. If your favorite language has a formatter like rustfmt, just use a hook to run it after every edit.
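A hook for this can be sketched roughly like the following, based on Claude Code's hook configuration in `.claude/settings.json` (the exact schema may have changed, so check the current docs; `cargo fmt --all` is one choice of formatter command, assuming a Rust workspace):

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          { "type": "command", "command": "cargo fmt --all" }
        ]
      }
    ]
  }
}
```

This runs the formatter after every file edit or write, so the model never has to spend tokens thinking about formatting.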

#3: Limit what they test, as most LLMs tend to write overeager tests, including testing that "the field you set as null is null", wasting tokens.

#4: Saying "max 50 characters title" doesn't really mean anything to the LLM. LLMs have no inherent ability to count characters, so you are relying on probability, which is quite low once your context is as full as it was at that point. If they want to count line lengths, they have to use external tools. This is an inherent LLM design issue, and arguing about it with the LLM doesn't get you anywhere.
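Since counting has to happen outside the model anyway, one option is to enforce the limit mechanically. A minimal sketch of a git `commit-msg` hook (the path and the 50-character limit are illustrative; git passes the commit message file as the first argument):

```shell
#!/bin/sh
# .git/hooks/commit-msg (must be executable)
# Reject commit subjects longer than 50 characters.
subject=$(head -n1 "$1")
len=${#subject}
if [ "$len" -gt 50 ]; then
  echo "Commit subject is $len chars; keep it at 50 or fewer." >&2
  exit 1
fi
```

With this in place the LLM gets a hard, deterministic failure it can react to, instead of a style request it cannot verify.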

newswasboring 2 days ago | parent | next [-]

> #3: Limit what they test, as most LLM models tend to write overeager tests, including testing if "the field you set as null is null", wasting tokens.

Heh, I write this for some production code too (python). I guess because python is not typed, I'm testing if my pydantic implementation works.

komali2 2 days ago | parent | prev [-]

> #1: Add a subagent for git that knows your style, so you don't poison direct claude context and spend less tokens/time fighting it.

I've not heard of this before. What does it mean practically? Some kind of invocation in Claude? Opening another Claude window?

theshrike79 a day ago | parent | next [-]

Agents are basically separate "threads" with their own context window.

So the main claude can tell the test-runner agent "Run tests using `task test` and return the results"

Then the test-runner agent runs the tests, "wastes" its context reading 500 lines of test results, sees that everything passes, and returns "tests ok" to the main context.

This way the main context is spared from the useless chatter and can go on for longer.

thecupisblue 2 days ago | parent | prev | next [-]

Oh you're about to unlock a whole new level of token burning. There is an /agents command that lets you define agents for specific tasks or areas. Each of them has their own context and their own rules.

Then claude can delegate the work to them when appropriate, or you can tell it directly to use the subagent, i.e. a subagent for your frontend, backend, specific microservice, database, etc etc.

Quite depends on your workflow which ones you create/need, but they are a really nice quality of life change.
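For a concrete picture, a subagent is typically defined as a markdown file with frontmatter under `.claude/agents/`. This is a hypothetical git-focused agent (the name, tool list, and style rules are illustrative; check the current Claude Code docs for the exact frontmatter fields):

```markdown
---
name: git-committer
description: Handles all git staging and committing using the project's commit style.
tools: Bash
---
You handle git operations for this repository. Write commit subjects in
the imperative mood, wrap body lines at 72 characters, and never amend
or rewrite published history.
```

The main session then delegates git work to this agent, so the style rules and the noisy `git` output live in the subagent's context rather than the main one.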

Aeolun 2 days ago | parent | prev [-]

You ask claude to use an agent, and it’ll spawn a sub agent that takes a bunch of actions in a new context, then lets the original agent only know a summary of the results.

Aeolun 2 days ago | parent | prev [-]

> I suspect videos meeting your criteria are rare because most AI coding demos either cherry-pick simple problems or skip the messy reality of maintaining real codebases.

Or we’re just having too much fun making stuff to make videos to convince people that are never going to be convinced.

Difwif 2 days ago | parent [-]

I took a quick informal poll of my coworkers and the majority of us have found workflows where CC is producing 70-99% of the code on average in PRs. We're getting more done faster. Most of these people tend to be anywhere from 5-12 yrs professional experience. There are some concerns that maybe more bugs are slipping through (but also there's more code being produced).

We agree most problems are avoided by:

1. Not getting lazy and auto-accepting edits. Always review changes and make sure you understand everything.

2. Writing clear specification documents before starting complex work items.

3. Breaking down tasks into manageable chunks of scope.

4. Clean, digestible code architecture. If it's hard for a human to understand (e.g. poor separation of concerns), it will be hard for the LLM too.

But yeah I would never waste my time making that video. Having too much fun turning ideas into products to care about proving a point.

rhubarbtree a day ago | parent | next [-]

> Having too much fun turning ideas into products to care about proving a point.

This is a strange response to me. Perhaps you and others aren’t aware that there’s a subculture of folks who livestream coding in general? Nothing to do with proving a point.

My interest in finding such examples is exactly due to the posting of comments like yours - strong claims of AI success - that don’t reflect my experience. I want to see videos that show what I’m doing wrong, and why that gives very different results.

I don’t have an agenda or point to prove, I just want to understand. That is the hacker way!

theshrike79 a day ago | parent | prev [-]

Points 2, 3, and 4 are all things human coders need to be efficient too :)

I'm kinda hoping that this LLM craze will force people to get better at it. Having documentation up to date and easily accessible is good for everyone.

Like how we're (over here) better at marking lane lines on the road, because the EU-mandated lane keeping assist needs the road markings to be there or it won't work.