pron 7 days ago

Programmers (and I'm including myself here) often go to great lengths to not think, to the point of working (with or without a coding assistant) for hours in the hope of avoiding one hour of thinking. What's the saying? "An hour of debugging/programming can save you minutes of thinking," or something like that. In the end, we usually find that we need to do the thinking after all.

I think coding assistants would end up being more helpful if, instead of trying to do what they're asked, they would come back with questions that help us (or force us) to think. I wonder if a context prompt that says, "when I ask you to do something, assume I haven't thought the problem through, and before doing anything, ask me leading questions," would help.

I think Leslie Lamport once said that the biggest resistance to using TLA+ - a language that helps you, and forces you, to think - is that thinking is the last thing programmers want to do.

ChrisMarshallNY 7 days ago | parent | next [-]

I do both. I like to develop designs in my head, and there’s a lot of trial and error.

I think the results are excellent, but I can hit a lot of dead ends, on the way. I just spent several days, trying out all sorts of approaches to PassKeys/WebAuthn. I finally settled on an approach that I think will work great.

I have found that the old-fashioned “measure twice, cut once” approach is highly destructive. It was how I was trained, so walking away from it was scary.

rablackburn 7 days ago | parent [-]

> I have found that the old-fashioned “measure twice, cut once” approach is highly destructive. It was how I was trained, so walking away from it was scary.

To be fair it’s great advice when you’re dealing with atoms.

Mutable patterns of electrons, not so much (:

cycomanic 7 days ago | parent | prev | next [-]

> Programmers (and I'm including myself here) often go to great lengths to not think, to the point of working (with or without a coding assistant) for hours in the hope of avoiding one hour of thinking. What's the saying? "An hour of debugging/programming can save you minutes of thinking," or something like that. In the end, we usually find that we need to do the thinking after all.

This is such a great observation, and I'm not quite sure why it happens. I'm not a programmer, but a signal-processing/systems engineer/researcher. The weird thing is that it seems to be the process of programming itself that causes the "not-thinking" behaviour. For example, when I program a simulation and find that I must have a sign error somewhere in my implementation (sometimes you can see this from the results), I end up flipping every possible sign instead of taking pen and paper and comparing theory against implementation. When I do other work, e.g. theory, that's not the case. I suspect we try to avoid the cost of the context switch and try to stay in the "programming flow".

polskibus 7 days ago | parent | next [-]

This is your brain trying to conserve energy/time by recollecting/brute-forcing/following known patterns instead of diving into the unknown. Otherwise known as "being lazy" / procrastinating.

nine_k 7 days ago | parent | prev [-]

There is an illusion that the error is tiny, and its nature is obvious, so it could be fixed by an instant, effortless tweak. Sometimes it is so (when the compiler complains about a forgotten semicolon); sometimes it may be arbitrarily deeply wrong (even if it manifests just as a reversed sign).

PaulHoule 7 days ago | parent | prev | next [-]

Sometimes thinking and experimenting go together. I had to do some maintenance on some Typescript/yum that I didn't write but had done a little maintenance on before.

TypeScript can produce astonishingly complex error messages when types don't match up, so I went through a couple of rounds of showing the errors to the assistant and getting suggested fixes that were wrong. But I got some ideas and did more experiments, and over the course of two days (making desired changes along the way) I figured out what was going wrong. I cleared up the use of types to the point that I was really happy with my code: when I saw a red squiggle I usually knew right away what was wrong, and if I did ask the assistant, it would also get it right right away.

I think there's no way I would have understood what was going on without experimenting.

xyzzy123 7 days ago | parent [-]

Agree. LLMs also change the balance of plan vs. do for me; sometimes it's cheaper to do & review than to plan up front.

When you can see what goes wrong with the naive plan you then have all the specific context in front of you for making a better plan.

If something is wrong with the implementation then I can ask the agent to then make a plan which avoids the issues / smells I call out. This itself could probably be automated.

The main thing I feel I'm "missing" is, I think it would be helpful if there were easier ways to back up in the conversation such that the state of the working copy was restored also. Basically I want the agent's work to be directly integrated with git such that "turns" are commits and you can branch at any point.
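A minimal sketch of that "turns as commits" idea (all function names and the commit-message convention here are made up for illustration, not any real agent's API):

```python
# Hypothetical wrapper: snapshot the working copy after each agent "turn"
# as a git commit, so any earlier conversation state can be branched from.
import subprocess

def git(*args, cwd="."):
    """Run a git command and return its stdout."""
    return subprocess.run(["git", *args], cwd=cwd, check=True,
                          capture_output=True, text=True).stdout.strip()

def commit_turn(n, cwd="."):
    """Record the working copy after agent turn n as a commit."""
    git("add", "-A", cwd=cwd)
    git("commit", "--allow-empty", "-m", f"agent turn {n}", cwd=cwd)

def branch_from_turn(n, branch, cwd="."):
    """Restore the working copy as it was after turn n, on a new branch."""
    sha = git("log", f"--grep=^agent turn {n}$", "--format=%H", "-n", "1", cwd=cwd)
    git("switch", "-c", branch, sha, cwd=cwd)
```

With something like this, "backing up the conversation" becomes `branch_from_turn(3, "retry-idea")`, and the agent's subsequent turns just commit onto the new branch.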

cruffle_duffle 7 days ago | parent | prev | next [-]

I like that prompt idea, because I hate hate hate when it just starts "doing work". These things are much better as a sounding board for ideas and for clarifying my thinking than for writing one-shot code.

panarky 7 days ago | parent | prev | next [-]

> assume I haven't thought the problem through

This is the essence of my workflow.

I dictate rambling, disorganized, convoluted thoughts about a new feature into a text file.

I tell Claude Code or Gemini CLI to read my slop, read the codebase, and write a real functional design doc in Markdown, with a section on open issues and design decisions.

I'll take a quick look at its approach and edit the doc to tweak its approach and answer a few open questions, then I'll tell it to answer the remaining open questions itself and update the doc.

When that's about 90% good, I'll tell the local agent to write a technical design doc to think through data flow, logic, API endpoints and params and test cases.

I'll have it iterate on that a couple more rounds, then tell it to decompose that work into a phased dev plan where each phase is about a week of work, and each task in the phase would be a few hours of work, with phases and tasks sequenced to be testable on their own in frequent small commits.

Then I have the local agent read all of that again, the codebase, the functional design, the technical design, and the entire dev plan so it can build the first phase while keeping future phases in mind.

It's cool because the agent isn't only a good coder, it's also a decent designer and planner too. It can read and write Markdown docs just as well as code and it makes surprisingly good choices on its own.

And I have complete control to alter its direction at any point. When it methodically works through a series of small tasks it's less likely to go off the rails at all, and if it does it's easy to restore to the last commit and run it again.

oblio 7 days ago | parent [-]

1. Shame on you, that doesn't sound like fun vibe coding, at all!

2. Thank you for the detailed explanation, it makes a lot of sense. If AI is really a very junior dev that can move fast and has access to a lot of data, your approach is what I imagine works - and crucially - why there is such a difference in outcomes using it. Because what you're saying is, frankly, a lot of work. Now, based on that work you can probably double your output as a programmer, but considering the many code bases I've seen that have 0 documentation, 0 tests, I think there is a huge chunk of programmers that would never do what you're doing because "it's boring".

3. Can you share maybe an example of this, please:

> and write a real functional design doc in Markdown, with a section on open issues and design decisions.

Great comment, I've favorite'd it!

pjmlp 7 days ago | parent | prev | next [-]

I agree with your comment in general; however, I would say that in my field, the resistance to TLA+ isn't having to think, but rather having to code twice without guarantees that the code actually maps to the theoretical model.

Tools like Lean and Dafny are much more appreciated, as they generate code from the model.

pron 7 days ago | parent [-]

But both Dafny and Lean (which are really hard to put in the same category [1]) are used even less than TLA+, and the problem of formally tying a spec to code exists only when you specify at a level that's much higher than the code, which is what you want most of the time because that's where you get the most bang for your buck. It's a little like saying that the resistance to blueprints is that a rolled blueprint makes a poor hammer.

TLA+ is for when you have a 1MLOC database written in Java or a 100KLOC GC written in C++ and you want to make sure your design doesn't lead to lost data or to memory corruption/leak (or for some easier things, too). You certainly can't do that with Dafny, and while I guess you could do it in Lean (if you're masochistic and have months to spare), it wouldn't be in a way that's verifiably tied to the code.

There is no tool that actually formally ties spec to code in any affordable way and at real software scale, and I think the reason people say they want what doesn't exist is precisely because they want to avoid the thinking that they'll have to do eventually anyway.

[1]: Lean and TLA+ are sort-of similar, but Dafny is something else altogether.

sirwhinesalot 7 days ago | parent | next [-]

Architectural blueprints are very precise. What gets built is a more detailed form of what is in the blueprint.

That is not the case for the TLA+ spec and your 1MLOC Java Database. You hope with fingers crossed that you've implemented the design, but have you?

I can measure that a physical wall has the same dimensions as specified in the blueprint. How do I know my program follows the TLA+ spec?

I'm not being facetious, I think this is a huge issue. While Dafny might not be the answer we should strive to find a good way to do refinement.

And the thing is, we can do it for hardware! Software should actually be easier, not harder. But software is too much of a wild west.

That problem needs to be solved first.

pron 6 days ago | parent [-]

> That is not the case for the TLA+ spec and your 1MLOC Java Database.

That is the case. Of course, nobody bothers to write the TLA+ proof that that is the case, because even if somebody had the resources to do it, the ROI on doing that is not good. If you can avoid 4 major bugs with 10 hours of work, you probably won't want to work an extra 10,000 hours to avoid two additional minor ones. That most people choose to stop when the ROI gets bad and not when they achieve perfection is not a problem.

The question isn't what tool guarantees perfection (there isn't one), but what toolset can reduce the greatest number of (costly) bugs with the least effort, and tools that help you think rigorously about design are a part of such a toolset.

> You hope with fingers crossed that you've implemented the design, but have you?

The same way you always validate that you've implemented what you intended - which is more than just keeping your fingers crossed - except that TLA+'s job is to make sure that what you intend actually works (if implemented).

> While Dafny might not be the answer we should strive to find a good way to do refinement.

TLA+ does refinement in a much more powerful way than Dafny. Neither is able to do it from a high-level design to a large and realistic codebase, certainly not in any affordable way, but nothing can. I guess that is a problem, but it's not a problem we can solve, and there are other big problems that we can.

sirwhinesalot 6 days ago | parent [-]

Too defeatist. If much of the software infrastructure of the world was built on say... Idris, we could do it. That's the promise of dependent types, proof carrying code.

Can we extend that to large scale software? There's no obvious barrier to it, beyond a lack of existing provably correct code to build upon.

I don't expect this to change, however, since the cost/benefit ratio just isn't there. And that makes me sad. We build everything on quicksand.

pron 6 days ago | parent [-]

> There's no obvious barrier to it

It's like saying that there's no barrier to turning lead into gold except for the fact that it's easier to get gold by other means; that's a good thing! The cost of deriving formal deductive proofs is high, while unsound methods are much cheaper and highly effective.

The reason proofs can be expensive is that proving things has an intrinsic computational complexity cost (it's a search problem in a large space). A decade ago I summarised some relevant results of the last three or four decades about the difficulty in proving programs correct: https://pron.github.io/posts/correctness-and-complexity

If software were generally simple enough for all of its interesting properties to be easily proven, that would mean that there wouldn't be much point in the software at all. I think it was a formal methods researcher/practitioner at NASA who once said something like, "computers were built to surprise us; if they didn't, there would be no point in building them in the first place."

> We build everything on quicksand.

You do realise that while an abstract algorithm or a program on the page can be proven correct, it is impossible to prove that a software system is correct, because it's a physical system, and not a mathematical object anymore (you cannot prove that hardware will behave as specified). If mathematical proofs were the only thing that's not "quicksand", then everything in the physical world is quicksand.

pjmlp 6 days ago | parent | prev [-]

Appreciation isn't the same as market share; formal proofs in general are pretty much nonexistent in enterprise, unless there are legal requirements that demand otherwise.

I fail to see how you validate that the TLA+ model is actually correctly mapped onto the written Java code.

pron 6 days ago | parent | next [-]

> formal proofs in general are pretty much inexistent in enterprise, unless there are legal requirements to do otherwise.

Formal proofs are rarely used when specifying with TLA+, too, BTW. Writing formal proofs (as you would in Lean) has a very low ROI, and even formal method fans (like me) would tell you that's a tool you should reach for very rarely, and only if you must.

> I fail to see how you validate that TLA+ model is actually correctly mapped into the written Java code.

You don't (not even with Lean), but that we can't have cars that are completely crash-proof doesn't mean that's the standard for accepting or rejecting a safety measure. With TLA+ you can make sure that the design that you have (and you can't validate is actually implemented in code with or without TLA+) is actually good.

In other words, it lets us think about design rigorously. Maybe that's not all we wish for, but it's a lot, and it's not like there are better, easier ways of doing that. Of course, if the goal is to avoid thinking hard about design, then a tool that helps us think even harder isn't what we want.

hwayne 6 days ago | parent | prev [-]

The main way we validate that now is by using TLA+ models to generate test suites. Mongo came out with a new paper on this recently: https://will62794.github.io/assets/papers/mdb-txns-modular-v...
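The generate-tests-from-a-model idea can be sketched in miniature: enumerate the action traces a tiny spec allows and replay them against an implementation, checking states agree at each step. Everything below is illustrative (a toy bounded counter), not the paper's actual tooling:

```python
# Minimal model-based-testing sketch. spec_step is the "model" of a
# bounded counter; Counter is the implementation under test. Traces
# allowed by the model become the test suite for the implementation.
from itertools import product

LIMIT = 2

def spec_step(state, action):
    """The model: abstract state is just an int, or None if disabled."""
    if action == "inc" and state < LIMIT:
        return state + 1
    if action == "dec" and state > 0:
        return state - 1
    return None  # action not enabled in this state

class Counter:
    """The implementation under test."""
    def __init__(self):
        self.n = 0
    def inc(self):
        if self.n < LIMIT:
            self.n += 1
    def dec(self):
        if self.n > 0:
            self.n -= 1

def traces(length):
    """Enumerate all fully-enabled action sequences of the given length."""
    for actions in product(["inc", "dec"], repeat=length):
        state, trace = 0, []
        for a in actions:
            nxt = spec_step(state, a)
            if nxt is None:
                break  # model forbids this sequence; skip it
            state, trace = nxt, trace + [(a, nxt)]
        else:
            yield trace

def conforms(length=4):
    """Replay each model trace on the implementation, comparing states."""
    for trace in traces(length):
        impl = Counter()
        for action, expected in trace:
            getattr(impl, action)()
            if impl.n != expected:
                return False
    return True
```

Real tools derive the traces from a model checker's state graph rather than brute-force enumeration, but the replay-and-compare step is the same shape.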

creamyhorror 7 days ago | parent | prev | next [-]

> "An hour of debugging/programming can save you minutes of thinking,"

I get what you're referring to here, when it's tunnel-vision debugging. Personally I usually find that coding/writing/editing is thinking for me. I'm manipulating the logic on screen and seeing how to make it make sense, like a math problem.

LLMs help because they immediately think through a problem and start raising questions and points of uncertainty. Once I see those questions in the <think> output, I cancel the stream, think through them, and edit my prompt to answer the questions beforehand. This often causes the LLM's responses to become much faster and shorter, since it doesn't need to agonise over those decisions any more.

alfalfasprout 7 days ago | parent | prev | next [-]

it's funny, I feel like I'm the opposite and it's why I truly hate working with stuff like claude code that constantly wants to jump into implementation. I want to be in the driver's seat fully and think about how to do something thoroughly before doing it. I want the LLM to be, at most, my assistant. Taking on the task of being a rubber duck, doing some quick research for me, etc.

It's definitely possible to adapt these tools to be more useful in that sense... but it definitely feels counter to what the hype bros are trying to push out.

makeitdouble 7 days ago | parent | prev | next [-]

I'm in general agreement about the need to think it through, but we should be careful not to praise the other extreme.

> "An hour of debugging/programming can save you minutes of thinking"

The trap so many devs fall into is assuming the code behaves the way they think it does. Or believing documentation or seemingly helpful comments. We really want to believe.

People's mental image is more often than not wrong, and debugging tremendously helps bridge the gap.

Fanmade 6 days ago | parent | prev [-]

Absolutely! I used Copilot for a few weeks, then stopped when I worked on a machine that didn't have it installed and I immediately struggled with even basic syntax.

Now I often use LLMs as advanced rubber ducks. By describing my problems, the solution often comes to mind on its own, and sometimes the responses I get are enough for me to continue by myself. In my opinion, letting LLMs write the code directly can be really harmful for software developers, because they forget to think for themselves.

Maybe I'm wrong and I'm just slow to accept the new reality, but I try to keep writing most of my code on my own and to improve my coding skills more than my prompting skills (while still using these tools, of course). For me, LLMs are like a grumpy, cynical old senior dev who is forced to talk in a very positive manner and who has fun trickling some completely random bullshit in between his actual helpful advice.