EastLondonCoder | 3 hours ago
I’ve ended up with a workflow that lines up pretty closely with the guidance/oversight framing in the article, but with one extra separation that’s been critical for me.

I’m working on a fairly messy ingestion pipeline (Instagram exports → thumbnails → grouped “posts” → frontend rendering). The data is inconsistent, partially undocumented, and correctness is only visible once you actually look at the rendered output. That makes it a bad fit for naïve one-shotting.

What’s worked is splitting responsibility very explicitly:

• Human (me): judge correctness against reality. I look at the data and the UI and say things like “these six media files must collapse into one post”, “stories should not appear in this mode”, “timestamps are wrong”. This part is non-negotiably human.

• LLM as planner/architect: translate those judgments into invariants and constraints (“group by export container, never flatten before grouping”, “IG mode must only consider media/posts/*”, “fallback must never yield empty output”). This model reasons about structure; it isn’t typing code.

• LLM as implementor (Codex-style): receives a very boring, very explicit prompt derived from the plan. Exact files, exact functions, no interpretation, no design freedom. Its job is mechanical execution.

Crucially, I don’t ask the same model to both decide what should change and how to change it. When I do, rework explodes, especially in pipelines where the ground truth lives outside the code (real data plus rendered output).

This also mirrors something the article hints at but doesn’t fully spell out: the codebase isn’t just context, it’s a contract. Once the planner layer encodes the rules, the implementor can one-shot surprisingly large changes because it’s no longer guessing intent.

The challenges are mostly around discipline:

• You have to resist letting the implementor improvise.

• You have to keep plans small and concrete.

• You still need guardrails (build-time checks, sanity logs) because mistakes are silent otherwise. There’s a rough sketch of what I mean at the end of this comment.

But when it works, it scales much better than long conversational prompts. It feels less like “pair programming with an AI” and more like supervising a very fast, very literal junior engineer who never gets tired, which, in practice, is exactly what these tools are good at.
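
To make the guardrails point concrete, here’s a minimal TypeScript sketch of the kind of sanity check I mean. Everything in it (MediaItem, groupIntoPosts, the media/posts/ path convention) is illustrative of my pipeline rather than any real library; the point is just that each invariant the planner writes down becomes a cheap check that fails loudly instead of rendering garbage silently.

    // Illustrative types for a grouped Instagram export (names are made up).
    interface MediaItem {
      path: string;            // e.g. "media/posts/2021/img_001.jpg"
      exportContainer: string; // the container the export groups it under
      takenAt: number;         // epoch millis
    }

    interface Post {
      container: string;
      items: MediaItem[];
    }

    // Invariant: group by export container, never flatten before grouping.
    function groupIntoPosts(items: MediaItem[]): Post[] {
      const byContainer = new Map<string, MediaItem[]>();
      for (const item of items) {
        const bucket = byContainer.get(item.exportContainer) ?? [];
        bucket.push(item);
        byContainer.set(item.exportContainer, bucket);
      }
      return [...byContainer.entries()].map(([container, grouped]) => ({
        container,
        items: grouped,
      }));
    }

    // Guardrails: cheap checks that crash loudly instead of rendering silently wrong output.
    function assertInvariants(items: MediaItem[], posts: Post[]): void {
      // IG mode must only consider media/posts/*.
      const stray = items.filter((i) => !i.path.startsWith("media/posts/"));
      if (stray.length > 0) {
        throw new Error(`IG mode saw ${stray.length} items outside media/posts/`);
      }
      // Fallback must never yield empty output.
      if (items.length > 0 && posts.length === 0) {
        throw new Error("grouping produced no posts from non-empty input");
      }
      console.info(`[sanity] ${items.length} media items -> ${posts.length} posts`);
    }

In a setup like mine you’d call assertInvariants right after grouping in the dev build, so a bad export layout shows up as a crash with a count in it rather than a blank page you only notice later.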