lelanthran 4 days ago

This works until you get to the point that your actual programming skills atrophy due to lack of use.

Face it, the only reason you can do a decent review is because of years of hard-won lessons, not because you have years of reading code without writing any.

sevensor 3 days ago | parent | next [-]

What the article describes is:

1. Learn how to describe what you want in an unambiguous dialect of natural language.

2. Submit it to a program that takes a long time to transform that input into a computer language.

3. Review the output for errors.

Sounds like we’ve reinvented compilers. Except they’re really bad and they take forever. Most people don’t have to review the assembly language / bytecode output of their compilers, because we expect them to actually work.

ako 3 days ago | parent [-]

No, it sounds like the work of a product manager: you're just working with agents rather than with developers.

sarchertech 3 days ago | parent | next [-]

Product managers never get that right though. In practice it always falls back on the developer to understand the problem and fill in the missing pieces.

In many cases it falls on the developer to talk the PM out of the bad idea and then into a better solution. Agents aren’t equipped to do any of that.

For any non-trivial problem, a PM posing the same problem to 2 different dev teams will get drastically different solutions 99 times out of 100.

ako 3 days ago | parent [-]

Agree with the last bit: dev teams are even more non-deterministic than LLMs.

sarchertech 3 days ago | parent [-]

Dev teams are much less non-deterministic than LLMs. If you ask the same dev team to build the same product multiple times, they'll eventually converge on producing the same product.

The 2nd time it will likely be pretty different because they’ll use what they learned to build it better. The 3rd time will be better still, but each time after that it will essentially be the same product.

An LLM will never converge. It definitely won’t learn from each subsequent iteration.

Human devs are also a lot more resilient to slight changes in requirements and wording. A slight change in language that wouldn’t impact a human at all will cause an LLM to produce completely different output.

ako 3 days ago | parent [-]

An LLM within the right context/environment can also converge: just like with humans, you need to provide guidelines, rules, and protocols that instruct how to implement something. Just like with humans, I've used the approach you describe: generate something until it works the way you want, then ask it to document the insights, patterns, and rules, and for the next project instruct it to follow the rules you persisted. That will result in more or less the same project.

Humans are very non-deterministic: if you ask me to solve a problem today, the solution will be different from last week, last year, or 10 years ago. We've learnt to deal with it, and we can also control the non-determinism of LLMs.

And humans are also very prone to hallucinations: remember those 3000+ gods that we've created to explain the world, or those many religions that are completely incompatible? Even if some are true, most of them must be hallucinations just by being incompatible with the others.

sarchertech 3 days ago | parent [-]

That only works with very small projects to the point where the specification document is a very large percentage of the total code.

If you are very experienced, you won't solve the same problem differently day to day. You probably would with a 10-year gap, but you won't still be running today's model 10 years out (even if the technology matures), so there's no point in that comparison. Solving the same problem with the same constraints in radically different ways day to day comes from inexperience (unless you're exploring and doing it on purpose).

Calling what LLMs do hallucinations and comparing it to human mythology is stretching the analogy into absurdity.

didibus 2 days ago | parent | prev | next [-]

If you work at the "product manager" level, that would be vibe coding: you prompt for functional and non-functional changes and you review the behavior and characteristics of the program, not the generated code.

I believe the author was trying to specifically distinguish their workflow from that, in that they are prompting for changes to the code in terms of the code itself, and reviewing the code that is generated (maybe along with also mentioning the functionality and testing it).

sevensor 3 days ago | parent | prev | next [-]

What I described is precisely the reception of early compilers. How is the LLM different? It’s slower? Its input looks more like natural language? Its output is less reliable? It runs on somebody else’s computer? What’s the essential difference between these two technologies that transform one text into another?

skydhash 3 days ago | parent | prev | next [-]

Is it the work of a product manager? I believe the latter only specifies features and business rules (and maybe some other specifications like UX and performance), but no technical details at all. That would be like an architect reviewing the brand of nails used in a house's framing.

Graphon1 3 days ago | parent | prev [-]

Tech Lead, not PM. (in my experience)

MisterTea 4 days ago | parent | prev | next [-]

Coding interview of the future: "Show us how you would prompt this binary sort."

brothrock 3 days ago | parent | next [-]

I think this is better than many current coding interview methods. Assuming you have an agent setup to not give the interviewee the answer directly.

Of course there are times when you need someone extremely skilled in a particular language. But from my experience I would MUCH prefer to see how someone builds out a problem in natural language and verifies that it succeeds. I've been in too many interviews where candidates trip over syntax, pick the wrong language, or are just not good at memorization and don't want to look dumb looking things up. I usually prefer paired-programming interviews where I calibrate my assistance to the expectations of the position. AI can essentially do that for us.

Herring 3 days ago | parent [-]

Yeah, research says the interview process should match day-to-day expectations as closely as possible, even up to a trial day/week/month. All these leetcode and tricky-puzzle interviews are very low on signal. They don't tell you how a person will do on the job at all, not to mention they're bad for women and minorities.

brothrock 2 days ago | parent [-]

Preach. Take-home assignments are another example: they heavily bias toward candidates with free time.

joenot443 3 days ago | parent | prev | next [-]

My understanding is it's already here [1]

[1] https://news.ycombinator.com/item?id=44723289

Graphon1 3 days ago | parent | prev [-]

not a joke.

Also, the future you are referring to is... like... 6 weeks from now.

lenerdenator 4 days ago | parent | prev | next [-]

Agreed.

> Hand it off. Delegate the implementation to an AI agent, a teammate, or even your future self with comprehensive notes.

The AI agent just feels like a way to create tech debt on a massive scale while not being able to identify it as tech debt.

CuriouslyC 3 days ago | parent | next [-]

I have a static analysis and refactoring tool that does wonders to identify duplication and poor architecture patterns and provide a roadmap for agents to fix the issues. It's like magic: just point it at your codebase, then tell the agent to grind away at the output (making sure to come up for air and rerun tests regularly) and it'll go for hours.

lenerdenator 3 days ago | parent [-]

What's it called?

CuriouslyC 3 days ago | parent [-]

Official release is tomorrow; I'm just doing final release-prep cleanup and getting the product page in order. The crate/brew formula is in decent shape, just missing some features I'll be shipping soon. https://github.com/sibyllinesoft/valknut if you want to jump the line, though.

svieira 3 days ago | parent [-]

A quick pass through this repository validates that there is a lot of tech debt in this code too. For a very simple example, consider that the analysis code is doing string searches to attempt to detect various syntactic constructs. https://github.com/sibyllinesoft/valknut/blob/aaf2b818a97b8d...

CuriouslyC 3 days ago | parent [-]

I realize; my marketing page even puts it front and center (I use the tool to analyze the tool, yay dogfood). I'm compute-limited at the moment and don't have cycles to burn refactoring this code base, since it's pretty close to feature-complete right now; I need to put all my compute towards finishing development of a few projects I'm trying to ship this week, plus some exhaustive benchmark matrices.

Regarding that string search: you really have to fight Claude to get it to use tree-sitter consistently; I have to do a search through my codebase to build an audit list for this stuff.
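
For what it's worth, the string-search vs. syntax-tree distinction is easy to demonstrate. A minimal sketch using Python's stdlib `ast` module as a stand-in for tree-sitter (the `source` snippet here is made up for illustration):

```python
import ast

# The string-search pitfall in miniature: a naive substring check for
# function definitions misfires on comments and string literals, while
# walking a real syntax tree does not.

source = '''
def real_function():
    pass

comment = "this mentions def fake_function(): but defines nothing"
# def another_fake(): also not a definition
'''

def count_defs_by_string(src: str) -> int:
    # Counts every line containing "def " -- comments and strings included.
    return sum("def " in line for line in src.splitlines())

def count_defs_by_ast(src: str) -> int:
    # Counts actual function-definition nodes in the parsed syntax tree.
    return sum(isinstance(node, ast.FunctionDef) for node in ast.walk(ast.parse(src)))

print(count_defs_by_string(source))  # 3: matches the comment and the string literal too
print(count_defs_by_ast(source))     # 1: only the real definition
```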

segfaultex 3 days ago | parent | prev [-]

This is what a lot of business leaders miss.

The benefit you get from LLMs rests on being able to discern good output from bad.

Once that's lost, the output of these tools becomes a complete gamble.

bpt3 3 days ago | parent [-]

The business leaders already can't discern good from bad.

CuriouslyC 3 days ago | parent | prev | next [-]

You're right, reviews aren't the way forward. We don't do code reviews on compiler output (unless you're writing a compiler). The way forward is strong static and analytic guardrails plus stochastic error correction (multiple solutions proposed with an LLM as judge before implementation, multiple code-review agents with different personas prompted to be strict/adversarial but not nit-picky), with robust test suites that have also been through multiple passes of audits and red-teaming by agents. You should rarely have to look at the code; it should be a significant escalation event, like when you need to coordinate with Apple due to Xcode bugs.
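
The control flow being described could be sketched roughly as follows. `generate`, `judge`, and the reviewer lambdas here are toy stand-ins, not real LLM calls, so the skeleton is runnable:

```python
import random

# Sketch of a propose / judge / adversarial-review loop: generate several
# candidates, keep the one the judge prefers, then gate acceptance behind
# strict reviewer personas.

def best_of_n(generate, judge, n: int):
    """Generate n candidates and keep the one the judge scores highest."""
    candidates = [generate() for _ in range(n)]
    return max(candidates, key=judge)

def passes_review(candidate, reviewers) -> bool:
    """Accept only if every strict/adversarial reviewer approves."""
    return all(review(candidate) for review in reviewers)

# Toy stubs standing in for LLM calls:
rng = random.Random(0)
generate = lambda: rng.randint(0, 100)            # "quality" of a candidate patch
judge = lambda c: c                               # judge prefers higher quality
reviewers = [lambda c: c > 50, lambda c: c > 70]  # two strict review personas

candidate = best_of_n(generate, judge, n=8)
print(candidate, passes_review(candidate, reviewers))
```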

JackSlateur 3 days ago | parent | next [-]

Static and analytic guardrails??

Unless you are writing some shitty code for a random product that will be used for some demo then trashed, code comes down to a simple thing:

  Code is a way to move ideas into the real world through a keyboard
So, reading that the future is using a random machine with averaged output (by design), and that this output of average quality will be good enough because the same random machine will generate tests of the same quality: this is ridiculous.

Tests are probably the one thing you should never build randomly; you should put a lot of thought into them: do they make sense? Does your code make sense? With tests, you are forced to use your own code, sometimes as your users will.

Writing tests is a good way to force yourself to be empathic with your users.

People who are coding through AI are the equivalent of the pre-2015-era system administrators who renewed TLS certificates manually. They are people who can be (and are being) replaced by bash scripts. I don't miss them and I won't miss this new kind.

CuriouslyC 3 days ago | parent [-]

I actually have a Bayesian stochastic-process model for LLM codegen that incorporates the noisy-channel coding theorem. It turns out that just as noisy communications channels can be encoded to give arbitrarily low error-rate communication, LLM agent workflows can be structured to give arbitrarily low final error rates in their output. The only limitation is when model priors are highly misaligned with the work that needs to be done; in that case you need hard steering via additional context.
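
The coding-theorem analogy can be sketched with back-of-the-envelope arithmetic. A minimal sketch, assuming each review pass is an independent check that catches any given defect with probability p (correlated reviewers would break this assumption, much as correlated noise breaks a repetition code):

```python
# Residual defect probability after k independent review passes.
# Stacking imperfect checks drives the error rate down geometrically,
# mirroring the repetition-code intuition from noisy-channel coding.

def residual_error(initial_error: float, catch_prob: float, passes: int) -> float:
    """Probability that a defect survives every review pass."""
    return initial_error * (1 - catch_prob) ** passes

# e.g. 30% of generations contain a defect, each reviewer catches 60% of them:
for k in range(5):
    print(k, residual_error(0.30, 0.60, k))
```

With those (made-up) numbers, three passes already push the residual defect rate below 2%.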

JackSlateur 3 days ago | parent [-]

Which model gives you creative outputs?

CuriouslyC 3 days ago | parent [-]

Creative outputs start with Gemini (because of long-context support, it can get longform stuff right), with successive refinement passes using Claude for line/copy edits (because it's the least purple).

lelanthran 3 days ago | parent | prev | next [-]

> You should rarely have to look at the code, it should be a significant escalation event

This is the bit I am having problems with: if you are rarely looking at the code, you will never have the skills to actually debug that significant escalation event.

dingnuts 3 days ago | parent | prev [-]

good fucking luck writing adequate test suites for qualitative business logic

if it's even possible it will be more work than writing the code manually

gobdovan 3 days ago | parent | prev | next [-]

For generative skills I agree, but for me the real change is in how I read and debug code. After reading so much AI-generated code with subtle mistakes, I can spot errors much quicker even in human-written code. And when I can't, that usually means the code needs a refactor.

I'd compare it to gym work: some exercises work best until they don't, and then you switch to a less effective exercise to get you out of your plateau. Same with code and AI. If you're already good (because of years of hard won lessons), it can push you that extra bit.

But yeah, default to the better exercise and just code yourself, at least on the project's core.

suddenlybananas 3 days ago | parent [-]

What do you mean you can spot errors much quicker?

gobdovan 3 days ago | parent [-]

I mean that I've read so much AI-generated code with subtle mistakes that my brain jumps straight to the likely failure point, and I've noticed it generalizes. Even when I look at an OSS project I'm not super familiar with, I can usually spot the bugs faster than before. I'll edit my initial response for clarity.

N2yhWNXQN3k9 3 days ago | parent [-]

> subtle mistakes that my brain jumps straight to the likely failure ... I can usually spot the bugs faster than before

doubt intensifies

3 days ago | parent | next [-]
[deleted]
gobdovan 3 days ago | parent | prev [-]

Doubt accepted. A spot-the-bug challenge on real OSS/prod code would be fun.

ankrgyl 3 days ago | parent | prev | next [-]

(Author here) Personally, I try to combat this by synchronously working on 1 task and asynchronously working on others. I am not sure it's perfect, but it definitely helps me avoid atrophy.

iman453 3 days ago | parent [-]

By synchronously working on one task, do you mean coding it with minimal AI?

Nice article by the way. I've found my workflow to be pretty much exactly the same using Claude code.

NooneAtAll3 3 days ago | parent | prev [-]

so... normal team lead -> manager pipeline?