| ▲ | rhubarbtree 2 days ago |
| Does anyone have a link to a video that uses Claude Code to produce clean robust code that solves a non trivial problem (ie not tic tac toe or a landing page) more quickly than a human programmer can write? I don’t want a “demo”, I want a livestream from an independent programmer unaffiliated with any AI company and thus not incentivised to hype. I want the code to have subsequently been deployed in production and demonstrably robust, without additional work outside of the livestream. The livestream should include code review, test creation, testing, PR creation. It should not be on a greenfield project, because nearly all coding is not. I want to use Claude and I want to be more productive, but my experience to date is that for writing code beyond autocomplete AI is not good enough and leads to low quality code that can’t be maintained, or else requires so much hand holding that it is actually less efficient than a good programmer. There are lots of incentives for marketing at the grassroots level. I am totally open to changing my mind but I need evidence. |
|
| ▲ | M4v3R 2 days ago | parent | next [-] |
I've live streamed how I built a tower defense game over the span of a week entirely using AI. I've also written down all the prompts that were used to create this game; you can read about it here: https://news.ycombinator.com/item?id=44463967 Mind you, I'd never written a non-trivial game before in my life. It would take me weeks to do this on my own without any AI assistance. Right now I'm working on a 3D world map editor for Final Fantasy VII that was also almost exclusively vibe-coded. It's almost finished and I plan a write-up and a video about it when I'm done. Now of course you've made so many qualifiers in your post that you'll probably dismiss this as "not production", "not robust enough", "not clean" etc. But this doesn't matter to me. What matters is that I manage to finish projects that I would not otherwise finish if not for the AI coding tools, so having them is a huge win for me. |
| |
| ▲ | hvb2 2 days ago | parent | next [-] | | > What matters is I manage to finish projects that I would not otherwise if not for the AI coding tools, so having them is a huge win for me. I think the problem is in your definition of finishing a project. Can you support said code, can you extend it, are you able to figure out where bugs are when they show up?
In a professional setting, the answer to all of those should likely be yes. That's what production code is. | | |
| ▲ | ffsm8 2 days ago | parent [-] | | I disagree with your sentiment. The difference isn't in what finishing a project means; it's the dissonance between what M4v3R and rhubarbtree understand when talking about "nontrivial production" software. When you're working in enterprise, you usually have multiple stakeholders, each defining sometimes even conflicting requirements for the behavior of your software. And you're required to adhere to these requirements stringently. That's an environment that's inherently a bad fit for vibe coding. It can still be used there, too, but you will not get a 2-3x speed up, because the LLM will always introduce minor behavioral changes - which aren't important in M4v3R's scenario, but a complete deal breaker for rhubarbtree. From my own experience, I don't get a speed up at all via Copilot agentic mode (Claude Code is banned at my workplace). But I have had a significant boost in productivity in projects that don't need to adhere to any specific spec - namely projects I do on my own time (with Claude Code right now). I still use Copilot agentic mode though. While I haven't timed myself, I don't think I'm faster with it whatsoever. It's just less mentally involved in a lot of scenarios, so it's less exhausting - leaving more energy for side projects. | | |
| ▲ | mattmanser 2 days ago | parent [-] | | I don't believe it's to do with the requirements. I think you'll still hit the same problems if those greenfield projects grow. It's still fundamentally about the code. I think you're missing the difference between professional software of 10/100k+ lines of code and a quick 3k-line greenfield project. In a few thousand lines of code you can get away with a massive amount of code bloat, quick hacks and inconsistent APIs. In a program that's anything more than a few thousand lines, you can't. It just becomes too confusing. You have to be deliberate. Code has to follow patterns so the cognitive load is lowered. Stuff has to be split up in a predictable manner. And there's another problem, sensible and predictable maintenance. Changes and fixes have to be targeted and specific. They have to be written to avoid side-effects. For organisation, it's been a huge effort on everyone's part these last 30 years to achieve that: make code understandable by organising it better. From one direction, languages have improved, with authors reducing boilerplate + cross-pollinating ideas between languages, like anonymous methods. On the other, it's developers inventing + describing patterns, or KISS, or the single responsibility principle. Or even seemingly trivial things like choosing predictable folder structures and enforcing indentation rules[1]. I'm starting to feel that's often the main skill a senior dev brings to the table, organising code well. Better code organisation has made it possible for developers to make larger programs. Code organisation is a need that becomes a big problem if you're not doing it well in large projects, but not really a problem if you're not doing it well in small projects. And right now, AI isn't very good at code organisation. We might believe that you have to have a mental model of the whole program in your head, something an LLM is just not capable of right now. And I don't know if that's going to turn out to be a solvable problem, as it seems like a huge context problem. For maintenance, I'm not sure. AI seems pretty terrible at it. It often rewrites everything and throws the baby out with the bathwater. Again, it's a context problem. Both could turn out to be easy to solve for this generation of AI, in the end. [1] Younger programmers will not believe that even 15/20 years ago it was still a common problem that developers did not bother to indent their code consistently. In my first two jobs I'd regularly hit inconsistently indented code. | | |
| ▲ | MGriisser 2 days ago | parent [-] | | I personally find Claude Code has no real issues working and producing code in the 40k LoC Ruby on Rails repo I work in, nor in the 45k LoC Elixir/Phoenix repo I work in. For the last few months I'd say 99% of all changes I make to both are purely via Claude Code; I almost never use my editor anymore at all. It's common that things don't work on the first try or aren't exactly what I want, but usually just giving an error to Claude or further instructions will fix it in an iteration or two. I think the code organization isn't amazing, but it's fine and frankly not much of a concern to me, as I'm usually just reading diffs and not digging around in the code much myself. | | |
| ▲ | ffsm8 2 days ago | parent [-] | | Totally off topic, but the other day I was considering trying out Elixir for a mainly vibe-coded project, mainly because I thought the way you can structure code in it should be pretty much optimal for LLM-driven development. I haven't tried it yet, but I thought Elixir's easily implementable static analysis of code could be highly useful for enforcement whenever the LLM goes off the rails, and an umbrella architecture would make modularity well established. Modules could all define their own contexts via nested CLAUDE.md files, and subagents could be used to give it explicit implementation details. Did you try something like that before, MGriisser? (successfully or not?) | | |
| ▲ | MGriisser a day ago | parent [-] | | Unfortunately I don't do anything nearly that sophisticated. I honestly barely even know Elixir; I had just used it a little bit at a previous job and thought it would be a nice choice to try for the web server part of an application I was building. I mostly use Claude in that repo for controllers, DB access, and front end via HEEx templates, often with LiveView. I find it can get a bit mixed up with HEEx stuff occasionally, given the weirdness of code nested into the HTML and all that, but I think on pure Elixir it usually does a good job. |
|
|
|
|
| |
| ▲ | sksrbWgbfK 2 days ago | parent | prev | next [-] | | Unless you write tower defense games all day long for a living, I don't know how it's interesting. | |
| ▲ | rhubarbtree a day ago | parent | prev [-] | | > Now of course you've made so many qualifiers in your post that you'll probably dismiss this as "not production", "not robust enough", "not clean" etc Sure, my interest is whether it’s suitable for production use on an existing codebase, ie for what constitutes most of software engineering. But - thanks for sharing, I will take a look and watch some of the stream. |
|
|
| ▲ | infamousclyde 2 days ago | parent | prev | next [-] |
Jon Gjengset (of MIT Missing Semester, Rust for Rustaceans, etc.) shared a stream making changes of increasing complexity to a geospatial math library in Rust. He's an excellent engineer, and was able to pick apart AI-suggested changes liberally. The caveat is that the video is a bit long, but it's segmented nicely. I think he had a positive experience overall, but it was clear throughout the stream that he was not going to yield control to a pure-agent workflow any time soon. https://youtu.be/eZ7DVHAK8hw?si=vWW4kz2qiRRceNMQ |
| |
|
| ▲ | ochronus 2 days ago | parent | prev | next [-] |
| I agree. Based on my very subjective and limited experience (plus friends/colleagues), when it comes to producing solutions, what you get from AI is what you get from your 2-day hackathon—then you spend months making it production-ready. And your starry-eyed CEO is asking the same old question: How come everything takes so long when a 2-person team over two days was able to produce a shiny new thing?!. sigh Could be used for early prototyping, though, before you hire your first engineers just to fire them 6 months later. |
| |
| ▲ | jf22 2 days ago | parent [-] | | Yeah, but you get the two days of hacking in 15 minutes. And I highly doubt you spend months, as in 5+ weeks at the least, making it production-ready. What even is "production readiness"? 100% fully unit tested and ready for planetary hyperscale or something? 95% of the human-generated software I work on is awful but somehow makes people money. | | |
| ▲ | ruszki 2 days ago | parent [-] | | First of all, you can rarely write down in English what you want in 15 minutes… It's even common for the specification to be longer than its implementation. Just look at tests. Especially if you want to do something which has never been done before, the disparity can be staggering. Claude Code, for example, is also not that quick at all. It produces some code quickly, but even scaffolding three hello-world-level example projects together definitely takes more than an hour. And that's with zero novelty. The first version of the code is done quickly, but the continuous loop of self-corrections after that takes a long time. Even with Serena, Context7, and other MCPs. And, of course, that's without real code review. That's easily hours even with just a few thousand lines of code, if it uses something which you don't know. But I know that almost everybody has given up on understanding "their" "own" code during vibe coding. Even before AI, it was a well-known fact that real code review is hard, and people rarely did it. AI can make you quicker in certain situations, but these "15 minutes" claims are totally baseless. This is one reason why many people are against AI, vibe coding, etc.: these stupid claims which cannot hold up to even the smallest scrutiny. | | |
| ▲ | jf22 10 hours ago | parent [-] | | I'm not sure if you're using these tools if you think a weekend hackathon project can't be done in 15 minutes. |
|
|
|
|
| ▲ | coffeeri 2 days ago | parent | prev | next [-] |
| This video [0] is relevant, though it actually supports your point - it shows Claude Code struggling with non-trivial tasks and needing significant hand-holding. I suspect videos meeting your criteria are rare because most AI coding demos either cherry-pick simple problems or skip the messy reality of maintaining real codebases. [0] https://www.youtube.com/watch?v=EL7Au1tzNxE |
| |
| ▲ | thecupisblue 2 days ago | parent | next [-] | | Great video! Even more, it shows a few things - how good it is with such a niche language - but also exposes some direct flaws. First off, Rust represents quite a small part of the training dataset (last I checked it was under 1% of the code dataset) in most public sets, so it's got waaay less training than other languages like TS or Java. You added 2 solid features, backed with tests and documentation and nice commit messages. 80% of devs would not deliver this in 2.5 hours. Second, there was a lot of time/token waste messing around with git and git messages. A few tips I noticed that could help your workflow: #1: Add a subagent for git that knows your style, so you don't poison the main Claude context and spend less tokens/time fighting it. #2: Claude has hooks: if your favorite language has a formatter like rustfmt, just use hooks to run rustfmt and the like (see the sketch below). #3: Limit what they test, as most LLM models tend to write overeager tests, including testing if "the field you set as null is null", wasting tokens. #4: Saying "max 50 characters title" doesn't really mean anything to the LLM. They have no inherent ability to count, so you are relying on probability, which is quite low since your context is quite filled at this point. If they want to count the line length, they also have to use external tools. This is an inherent LLM design issue and discussing it with an LLM doesn't get you anywhere really. | | |
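For #2, here's a minimal sketch of what such a hook might look like in .claude/settings.json. The exact keys (PostToolUse, matcher, type: command) are from memory of the hooks feature, so treat them as assumptions and double-check against the current docs:

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          { "type": "command", "command": "cargo fmt --all" }
        ]
      }
    ]
  }
}
```

The idea is that the formatter runs automatically after every file edit, so the model never burns a correction round (or your tokens) on formatting nits.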
| ▲ | newswasboring 2 days ago | parent | next [-] | | > #3: Limit what they test, as most LLM models tend to write overeager tests, including testing if "the field you set as null is null", wasting tokens. Heh, I write this for some production code too (python). I guess because python is not typed, I'm testing if my pydantic implementation works. | |
| ▲ | komali2 2 days ago | parent | prev [-] | | > #1: Add a subagent for git that knows your style, so you don't poison the main Claude context and spend less tokens/time fighting it. I've not heard of this before, what does this mean practically? Some kind of invocation in Claude? Opening another Claude window? | | |
| ▲ | theshrike79 a day ago | parent | next [-] | | Agents are basically separate "threads" with their own context window. So the main Claude can tell the test-runner agent "Run tests using `task test` and return the results". Then the test-runner agent runs the tests, "wastes" its context by reading 500 lines of test results, and sees that it's ok. It returns "tests ok" to the main context. This way the main context is spared the useless chatter and can go on for longer. | |
| ▲ | thecupisblue 2 days ago | parent | prev | next [-] | | Oh you're about to unlock a whole new level of token burning.
There is an /agents command that lets you define agents for specific tasks or areas. Each of them has their own context and their own rules. Then Claude can delegate the work to them when appropriate, or you can tell it directly to use the subagent, e.g. a subagent for your frontend, backend, a specific microservice, the database, etc. It quite depends on your workflow which ones you create/need, but they are a really nice quality-of-life change. A rough sketch of one is below. | |
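To make that concrete, a subagent is just a Markdown file with some frontmatter and a system prompt as the body. This is a minimal sketch from memory of the subagents feature - the .claude/agents/ location and the name/description/tools keys are assumptions to verify against the docs:

```markdown
---
name: git-committer
description: Handles commits and commit messages for this repo. Use after code changes are complete.
tools: Bash, Read, Grep
---

You are responsible for git in this repository.
- Write commit messages in the imperative mood with a short subject line.
- Explain the "why" in the body, not the "what".
- Run `git status` and `git diff --staged` before composing the message.
- Never amend or force-push unless explicitly asked.
```

The body acts as the subagent's own system prompt, so your git style rules live there instead of polluting the main session's context.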
| ▲ | Aeolun 2 days ago | parent | prev [-] | | You ask claude to use an agent, and it’ll spawn a sub agent that takes a bunch of actions in a new context, then lets the original agent only know a summary of the results. |
|
| |
| ▲ | Aeolun 2 days ago | parent | prev [-] | | > I suspect videos meeting your criteria are rare because most AI coding demos either cherry-pick simple problems or skip the messy reality of maintaining real codebases. Or we’re just having too much fun making stuff to make videos to convince people that are never going to be convinced. | | |
| ▲ | Difwif 2 days ago | parent [-] | | I took a quick informal poll of my coworkers and the majority of us have found workflows where CC is producing 70-99% of the code on average in PRs. We're getting more done faster. Most of these people tend to have anywhere from 5-12 yrs of professional experience. There are some concerns that maybe more bugs are slipping through (but also there's more code being produced). We agree the main things to get right are:
1. Not getting lazy and auto-accepting edits. Always review changes and make sure you understand everything.
2. Clearly written specification documents before starting complex work items.
3. Breaking down tasks into manageable chunks of scope.
4. Clean, digestible code architecture. If it's hard for a human to understand (e.g. poor separation of concerns), it will be hard for the LLM too. But yeah, I would never waste my time making that video. Having too much fun turning ideas into products to care about proving a point. | | |
| ▲ | rhubarbtree a day ago | parent | next [-] | | > Having too much fun turning ideas into products to care about proving a point. This is a strange response to me. Perhaps you and others aren’t aware that there’s a subculture of folks who livestream coding in general? Nothing to do with proving a point. My interest in finding such examples is exactly due to the posting of comments like yours - strong claims of AI success - that don’t reflect my experience. I want to see videos that show what I’m doing wrong, and why that gives very different results. I don’t have an agenda or point to prove, I just want to understand. That is the hacker way! | |
| ▲ | theshrike79 a day ago | parent | prev [-] | | 2, 3, 4 are all things human coders need to be efficient too :) I'm kinda hoping that this LLM craze will force people to be better at it. Having documentation up to date and easily accessible is good for everyone. Like how we're (over here) better at marking lines on the road, because the EU-mandated lane keeping assist needs the road markings to be there or it won't work. | |
|
|
|
|
| ▲ | simonw 2 days ago | parent | prev | next [-] |
| Armin Ronacher (long-time Python and Rust open source community figure, creator of Flask and Jinja among others) has several YouTube videos that partially fit the bill. https://www.youtube.com/watch?v=sQYXZCUvpIc and https://www.youtube.com/watch?v=Y4_YYrIKLac and https://www.youtube.com/watch?v=tg61cevJthc |
| |
| ▲ | ku1ik 2 days ago | parent | next [-] | | I watched one of those videos and it was very underwhelming, imho not really selling Claude Code to anyone who isn’t convinced. | | |
| ▲ | theshrike79 a day ago | parent [-] | | What was your expectation? For me LLM coding is 90% going from "hey this kind of tool would be cool" to a workable MVP in an evening. The 10% is me using it at work to debug issues or create boilerplate crap. |
| |
| ▲ | rhubarbtree a day ago | parent | prev [-] | | I’d say they’re a good match, thanks for sharing! Will watch in detail. |
|
|
| ▲ | stared 2 days ago | parent | prev | next [-] |
I wouldn't look for these. Vibe coding is a slot machine - sometimes you get wonderful results on the first prompt; more often than not, you don't. So a cherry-picked example is not proof that it works. If you want me to show an example of vibe coding, I bet I can migrate someone's blog to Astro with Claude Code faster than a frontend engineer could. > It should not be on a greenfield project, because nearly all coding is not. Well, Claude Code does not work best on existing projects. (With some exceptions.) |
|
| ▲ | MontyCarloHall 2 days ago | parent | prev | next [-] |
| Forget a livestream, I want to hear from maintainers of complex, actively developed, and widely used open-source projects (e.g. ffmpeg, curl, openssh, sqlite). Highly capable coding LLMs have been out for long enough that if they do indeed have meaningful impact on writing non-trivial, non-greenfield/boilerplate code, it ought to be clearly apparent in an uptick of positive contributions to projects like these. |
| |
| ▲ | stitched2gethr 2 days ago | parent | next [-] | | This contains some specific data with pretty graphs: https://youtu.be/tbDDYKRFjhk?t=623 But if you do professional development and use something like Claude Code (the current standard, IMO) you'll quickly get a handle on what it's good at and what it isn't. I think it took me about 3-4 weeks of working with it at an overall 0x gain to realize what it's going to help me with and what it will make take longer. | | |
| ▲ | MontyCarloHall 2 days ago | parent [-] | | This is a great conference talk, thanks for sharing! To summarize, the authors enlisted a panel of expert developers to review the quality of various pull requests, in terms of architecture, readability, maintainability, etc. (see 8:27 in the video for a partial list of criteria), and then somehow aggregate these criteria into an overall "productivity score." They then trained a model on the judgments of the expert developers, and found that their model had a high correlation with the experts' judgment. Finally, they applied this model to PRs across thousands of codebases, with knowledge of whether the PR was AI-assisted or not. They found a 35-40% productivity gain for easy/greenfield tasks, 10-15% for hard/greenfield tasks, 15-20% for easy/brownfield tasks, and 0-10% for hard/brownfield tasks. Most productivity gains went towards "reworked" code, i.e. refactoring of recent code. All in all, this is a great attempt at rigorously quantifying AI impact. However, I do take one major issue with it. Let's assume that their "productivity score" does indeed capture the overall quality of a PR (big assumption). I'm not sure this measures the overall net positive/negative impact to the codebase. Just because a PR is well-written according to a panel of expert engineers doesn't mean it's valuable to the project as a whole. Plenty of well-written code is utterly superfluous (trivial object setters/getters come to mind). Conversely, code that might appear poorly written to an outsider expert engineer could be essential to the project (the highly optimized C/assembly code of ffmpeg comes to mind, or to use an extreme example, anything from Arthur Whitney). "Reworking" that code to be "better written" would be hugely detrimental, even though the judgment of an outside observer (and an AI trained on it) might conclude that said code is terrible. |
| |
| ▲ | rhubarbtree a day ago | parent | prev | next [-] | | Yes, this would be really useful. AI coding should be transforming OSS, and we should be able to get a rough idea of the scale of the speed up in development. It’s an ideal application area. | |
| ▲ | brookst 2 days ago | parent | prev [-] | | So what percentage of human programmers, in the entire world, do you think contribute to meaningful projects like those? | | |
| ▲ | MontyCarloHall 2 days ago | parent [-] | | I picked these specific projects because they are a) mature, b) complex, and as a result c) unlikely to have development needs for lots of new boilerplate code. I would estimate the majority of developers spend most of their time on problems encompassing all three of these, even if their software is not as meaningful/widely used as the previous examples. Everyone knows that LLMs are fantastic at generating greenfield boilerplate very quickly. They are an invaluable rapid prototyping/MVP generation tool, and that in itself is hugely useful. But that's not where developers spend most of their time. They spend it maintaining complicated, mature codebases, and the utility of LLMs is much less proven for that use case. This utility would be most easily measured in contributions to open-source projects, since all commits are public and maintainers have no monetary incentive to misrepresent the impact of AI [0, 1, 2, ...]. [0] https://www.businessinsider.com/anthropic-ceo-ai-90-percent-... [1] https://www.cnbc.com/2025/06/26/ai-salesforce-benioff.html [2] https://www.cnbc.com/2025/04/29/satya-nadella-says-as-much-a... |
|
|
|
| ▲ | sirpalee 2 days ago | parent | prev | next [-] |
| I had success with it on a large, established project when using it for refactoring, moving around functions, implementing simple things and writing documentation. It failed when implementing complex new features and horribly went off the rails when trying to debug issues. Almost all its recommendations were wrong, and it kept trying to change things that certainly weren't the problem. |
| |
| ▲ | apercu 2 days ago | parent [-] | | This matches my experience as well. One unexpected benefit is that I learned a couple pieces of hardware inside and out because LLMs make so many mistakes. If I hadn't used an LLM I wouldn't have gone down all these rabbit holes based on incorrect info - I would have just read the docs and solved my use case, but missed out on deeper understanding. It just reinforces my bias that LLMs are currently garbage for anything new and complicated. But they are a great interactive note-taker and brainstorming tool. |
|
|
| ▲ | vincent_builds 2 days ago | parent | prev | next [-] |
| Author here. I think that's a great idea. I've considered live-streaming my work a few times, but all my work is on closed-source backend applications with sensitive code and data. If I ever get to work on an open-source product, I'll ask about live-streaming it. I think it would be a fun experience. Although I cannot show the live stream or the code, I am writing and deploying production code for a brownfield project. Two recent production features: 1. Quota crossing detection system for billable metrics
- Complex business logic for billing infrastructure
- Detects when usage crosses configurable thresholds across multiple metric types
- Time: 4 days while working on other smaller tasks in parallel vs probably 10 days focused without AI 2. Sentry monitoring wrapper for metering cron jobs
- Reusable component wrapping all cron jobs with Sentry monitoring capabilities
- Time: 1 day in parallel with other tasks vs 2 days focused As you can probably tell, my work is not glamorous :D. It's all the head-scratching backend work, extending the existing system with more capabilities or making it more robust. I agree there is a lot of hand-holding required, but I'm betting on the systems getting better as time goes on. We are only two years into this AI journey, and the capabilities will most likely improve over the next few years. |
|
| ▲ | mathieuh 2 days ago | parent | prev | next [-] |
| I actually don't think I've ever had AI solve a non-trivial problem by itself. I do find it useful but I always have to give it the breakthrough which it can then implement. |
|
| ▲ | dewey 2 days ago | parent | prev | next [-] |
It's one of those things where you just have to put in the work yourself for a while and see how it works for your workflow and project. |
| |
| ▲ | rhubarbtree a day ago | parent [-] | | That’s unusual though? I think programming languages, idioms, features - for example - are adopted by consensus, not by every programmer starting out from scratch and evaluating each one. | | |
| ▲ | theshrike79 a day ago | parent [-] | | So if the "consensus" adopts ... Erlang, you will just start using it? And because this "consensus" adopted it, you know what it's good for, what kind of problems it's good at solving, and whether it's a good option for what you specifically are doing? Using LLMs is a skill that's (currently) a bit hard to teach; it's a ball of math and vectors that doesn't work in a deterministic way. Some magic words in the prompt will try to make it do something, but not always. You really need to use one, preferably a few different ones, and get a feel for how they operate. Like driving a car: you can watch 420 hours of videos of people driving cars, but you really need to sit in one to get comfortable doing it. | |
| ▲ | rhubarbtree 13 hours ago | parent [-] | | > So if the "consensus" adopts ... Erlang, you will just start using it? If everyone’s using it I will certainly learn it, yes. |
|
|
|
|
| ▲ | adastral 2 days ago | parent | prev | next [-] |
Every week, PostgresTV livestreams one-hour "vibe coding" sessions by experienced PostgreSQL contributors, implementing small PostgreSQL features with Cursor (mostly the claude-4-sonnet model). [0] is their latest stream. I personally have not watched much, but it sounds just like what you are looking for! [0] https://www.youtube.com/watch?v=3MleDtXZUlM |
| |
| ▲ | rhubarbtree a day ago | parent [-] | | Yes, good enough for me, thanks. I look forward to watching it. This is particularly interesting as it’s a group stream. |
|
|
| ▲ | boesboes 2 days ago | parent | prev | next [-] |
| I've been using it to do all my work for the last month or two and have decided it's not worth it. I haven't made any recordings or anything, so this is purely my subjective experience:
It's ok at greenfield stuff, with some hand-holding to do things properly all the time. It knows the framework well, but doesn't use it correctly and goes off on weird detours to 'debug' things that fail because of it.
But on a bigger refactor of legacy code, one that is well tested and where the 'migration' process to the new architecture is documented, it was just very infuriating. One moment it seems to be doing alright, and then suddenly I'm going backwards for days because it just makes things look like they work. It gets stuck on bad ideas and keeps trying them. It keeps making the same mistakes over and over, despite clear instructions on how to do it correctly. I think it's missing a feedback loop: something that evaluates what went wrong, what works, what won't, remembers that, and can then use that to make better plans. From making sure it runs the tests correctly (instead of trying 5 different methods each time) to how to do TDD and what comments to add. |
| |
| ▲ | sunnyam 2 days ago | parent [-] | | I have the same opinion, but my worry with this attitude is that it's going to hold me back in the long run. A common thread in articles about developers using AI is that they're not impressed at first, but then they write more precise instructions and provide context in a more intuitive manner for the AI to read, and that's the point at which they start to see results. Would these principles not apply to regular developers as well? I suspect that most of my disappointment with these tools is that I haven't spent enough time learning how to use them correctly. With Claude Code you can tell it what it did wrong. It's a bit hit-or-miss as to whether it will take your comments on board (or take them too literally) but I do think it's too powerful a tool to just ignore. I don't want someone to just come and eat my cake because they've figured out how to make themselves productive with it. | |
| ▲ | apercu 2 days ago | parent [-] | | I think of current-state LLMs as precocious but green assistants that are sometimes useful but often screw up. It requires a significant amount of hand-holding, and is still usually a net positive in my workflow, but only a modest (admittedly arbitrary) productivity bump (e.g. 10-15%). I feel like if I can get better at reining in LLMs I can improve this productivity enhancement a bit more, but the idea that we can wholesale replace technical people is not realistic yet. If I were non-technical, a non-specialist, and/or had no business skills/experience and my job was mostly office admin, I would be retraining, however, because those jobs may be over except as vanity positions. | |
|
|
|
| ▲ | ochronus a day ago | parent | prev | next [-] |
| <fun>
Just found the video you're looking for! https://www.youtube.com/watch?v=JeNS1ZNHQs8 </fun> |
|
| ▲ | sunir 2 days ago | parent | prev | next [-] |
| I’ve built an agent system to quality control the output following my engineering know how. The quality is much better but it is much slower than a human engineer. However that’s irrelevant to me. If I can build two projects a day I am more productive than if I can build one. And more importantly I can build projects that increase my velocity and capability. The difference is I run my own business so that matters to me more than my value or aptitude as an engineer. |
|
| ▲ | sdeframond 2 days ago | parent | prev | next [-] |
Does any experienced dev have experience outsourcing to another dev who produces clean, robust code that solves a non-trivial problem (ie not tic tac toe or a landing page) more quickly than she would by herself? I think not. The reason is missing context. Such non-trivial problems have a lot of specific unwritten context, and it takes a lot of effort to share that context - often more than doing the thing oneself. |
|
| ▲ | lysecret 2 days ago | parent | prev | next [-] |
| https://news.ycombinator.com/item?id=44159166 |
|
| ▲ | Kiro 2 days ago | parent | prev | next [-] |
| Very few people want to record themselves doing stuff or have an incentive to convince anyone except for winning internet arguments. |
| |
| ▲ | nosianu 2 days ago | parent | next [-] | | > Very few people .... have an incentive to convince anyone We are already only talking about the subset that writes AI blog posts, not about all of humanity. |
| ▲ | rhubarbtree a day ago | parent | prev [-] | | Streaming coding is a popular and widespread activity, and it is usually nothing to do with “convincing” folks. | | |
| ▲ | Kiro a day ago | parent [-] | | I was talking about people successfully using LLMs having no reason to convince anyone. Their success is not dependent on converting or even informing other people. |
|
|
|
| ▲ | benterix 2 days ago | parent | prev | next [-] |
| I guess someone could make such a video, the question is, would anyone have the patience to watch it. |
|
| ▲ | coverj a day ago | parent | prev | next [-] |
| Are there even non-AI development streams/videos that meet this criteria? |
|
| ▲ | 2 days ago | parent | prev | next [-] |
| [deleted] |
|
| ▲ | 2 days ago | parent | prev | next [-] |
| [deleted] |
|
| ▲ | wooque 2 days ago | parent | prev | next [-] |
| You got it wrong, the purpose of this blog post is not marketing Claude Code, but marketing their company. Writing about AI just happens to get more eyeballs. |
|
| ▲ | sneak 2 days ago | parent | prev | next [-] |
| Most of the code I write is greenfield projects. I’m pretty spoiled, I guess. Claude Code has helped me ship a lot of things I always wanted to build but didn’t have time to do. |
|
| ▲ | FirmwareBurner 2 days ago | parent | prev | next [-] |
| [flagged] |
|
| ▲ | thecupisblue 2 days ago | parent | prev | next [-] |
| [flagged] |
| |
| ▲ | izacus 2 days ago | parent | next [-] | | People live stream their work all the time, it's really not unreasonable to ask for an example/tutorial on how to use the technology in the real world. | | |
| ▲ | thecupisblue 2 days ago | parent | next [-] | | Yes, people who are: - Working on hobby projects/side projects - Working on open-source projects - Selling stuff For someone to create this example, they would either have to do it in a codebase they have no problem open sourcing or one which is already open source, so they do not break NDAs and divulge company info/source code. How many people are ready to do that? The conditions of the OP are: - No demo, independent programmer - Non-greenfield project - Non-trivial problem - Code deployed in production and robust - Code review, tests, testing, PR creation - Person willing to live-stream their work and code while building Which is a pretty unreasonable set of conditions to prove "it works", when the person could read a tutorial and try it themselves. | |
| ▲ | rhubarbtree 2 days ago | parent [-] | | I’m happy for it to be on OSS, so long as that software is reasonably well known (ie user base is not a handful of people) and is used in production. | | |
| ▲ | newswasboring 2 days ago | parent [-] | | What difference does it make how many people use it? Complex software exists all over the world for a handful of users. I personally work in an industry where anything we create will be used by at most 100 people worldwide. Does that diminish the complexity of the code? I think not. |
|
| |
| ▲ | Kiro 2 days ago | parent | prev [-] | | The people live streaming their work are a minuscule percentage of all programmers. And you can ask, but the incentive to make such a video is not there unless you're selling an AI product yourself, which reduces the sample even more. |
| |
| ▲ | ochronus 2 days ago | parent | prev | next [-] | | I think you completely missed the original point :/ | |
| ▲ | mattmanser 2 days ago | parent | prev | next [-] | | Firstly the OP asked has anyone done it, not will they do it. Secondly, extraordinary claims require extraordinary evidence. | | | |
| ▲ | troupo 2 days ago | parent | prev [-] | | > You want someone to spend their time to live-stream their codebase and them working on it using Claude code, which will then make it into production, going through all the processes on a non-greenfield project just so you can be convinced that it is worth it Why not? Plenty of people stream work that later makes it into production. The gaming community, for example, has no end of people building their games publicly. And yet, for all the "amazing one-shot capabilities that obviate the need for programmers", no one streams working with any of the AI tools. All we have are unverifiable claims like yours. | |
| ▲ | thecupisblue 2 days ago | parent [-] | | >amazing one-shot capabilities that obviate the need for programmers "Amazing one-shot capabilities" is not the same as "an extremely useful tool that saves a ton of time" | | |
| ▲ | troupo 2 days ago | parent [-] | | So where are all the streams using this "extremely useful tool that saves a ton of time"? If it's that useful or saves that much time, it would be a no-brainer for people who build in public to use this tool, right? |
|
|
|
|
| ▲ | brookst 2 days ago | parent | prev [-] |
You’re coming at this from a highly biased and even angry position, which means I don’t think you’ll be satisfied with anything people can show you. Which isn’t entirely unreasonable; AI is not really there yet. If you took this moment and said AI will never get better, and tools and processes will never improve to better accommodate AI, and the only fair comparison is a top-tier developer, and the only legitimate scenario is high quality human-maintainable code at scale… then yes, AI coding is a lot of hype with little value. But that’s not what’s going on, is it? The trajectory here is breathtaking. A year ago you could have set a much lower bar and AI still would have failed. And the tooling to automate PRs and documentation was rough. AI is already providing massive leverage to both amateur and professional developers. They use the tools differently (in my world the serious developers mostly use it for boilerplate and tests). I don’t think you’ll be convinced of the value until the revolution is in the past. Which is fine! For many of us (me being in the amateur but lifelong programmer camp) it’s already delivering value that makes its imperfections worthwhile. Is the code I’m generating world class, ready to be handed over to humans at enterprise scale? No, definitely not. But it exists, and the scale of my amateur projects has gone through the roof, while quality is also up because tests take near zero effort. I know it won’t convince you, and you have every right to be skeptical and dismiss the whole thing as marketing. But IMO rejecting this new tech in the short term means you’re in for a pretty rough time when the evidence becomes insurmountable. Which might be a year or two away. Or even three! |