| ▲ | libraryofbabel a day ago |
| It's a great point and everyone should know it: the core of a coding agent is really simple, it's a loop with tool calling. Having said that, I think if you're going to write an article like this and call it "The Emperor Has No Clothes: How to Code Claude Code in 200 Lines of Code", you should at least include a reference to Thorsten Ball's excellent article from wayyy back in April 2025 entitled "How to Build an Agent, or: The Emperor Has No Clothes" (https://ampcode.com/how-to-build-an-agent)! That was (as far as I know) the first of these articles making the point that the core of a coding agent is actually quite simple (and all the deep complexity is in the LLM). Reading it was a light-bulb moment for me. FWIW, I agree with other commenters here that you do need quite a bit of additional scaffolding (like TODOs and much more) to make modern agents work well. And Claude Code itself is a fairly complex piece of software with a lot of settings, hooks, plugins, UI features, etc. Although I would add that once you have a minimal coding agent loop in place, you can get it to bootstrap its own code and add those things! That is a fun and slightly weird thing to try. (By the way, the "January 2025" date on this article is clearly a typo for 2026, as Claude Code didn't exist a year ago and it includes use of the claude-sonnet-4-20250514 model from May.) Edit: and if you're interested in diving deeper into what Claude Code itself is doing under the hood, a good tool to understand it is "claude-trace" (https://github.com/badlogic/lemmy/tree/main/apps/claude-trac...). You can use it to see the whole dance with tool calls and the LLM: every call out to the LLM and the LLM's responses, the LLM's tool call invocations and the responses from the agent to the LLM when tools run, etc. When Claude Skills came out I used this to confirm my guess about how they worked (they're a tool call with all the short skill descriptions stuffed into the tool description base prompt). Reading the base prompt is also interesting. (Among other things, they explicitly tell it not to use emoji, which tracks as when I wrote my own agent it was indeed very emoji-prone.) |
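For anyone who wants to see just how small that core loop is, here is a minimal sketch against the Anthropic messages API (the single read_file tool and the prompt are made up for illustration; Claude Code's real tool set, system prompt, and UI are much richer):

    import pathlib
    import anthropic  # assumes the official SDK and ANTHROPIC_API_KEY are available

    client = anthropic.Anthropic()

    # One illustrative tool; a real agent adds edit, bash, grep, etc.
    TOOLS = [{
        "name": "read_file",
        "description": "Read a text file and return its contents.",
        "input_schema": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    }]

    def run_tool(name, args):
        if name == "read_file":
            return pathlib.Path(args["path"]).read_text()
        return f"unknown tool: {name}"

    messages = [{"role": "user", "content": "Summarize what main.py does."}]

    while True:
        resp = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=4096,
            tools=TOOLS,
            messages=messages,
        )
        messages.append({"role": "assistant", "content": resp.content})
        if resp.stop_reason != "tool_use":
            # No more tool calls: the model has produced its final answer.
            print("".join(b.text for b in resp.content if b.type == "text"))
            break
        # Run each requested tool and send the results back to the model.
        results = []
        for block in resp.content:
            if block.type == "tool_use":
                results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": run_tool(block.name, block.input),
                })
        messages.append({"role": "user", "content": results})

Everything else discussed in this thread (TODO tools, hooks, subagents, compaction) hangs off a loop shaped like this.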
|
| ▲ | bredren a day ago | parent | next [-] |
| I've been exploring the internals of Claude Code and Codex via the transcripts they generate locally (these serve as the only record of your interactions with the products)[1]. Given the stance of the article, just the transcript formats reveal what might be a surprisingly complex system once you dig in. For Claude Code, beyond the basic user/assistant loop, there's uuid/parentUuid threading for conversation chains, queue-operation records for handling messages sent during tool execution, file-history-snapshots at every file modification, and subagent sidechains (agent-*.jsonl files) when the Task tool spawns parallel workers. So "200 lines" captures the concept but not the production reality of what is involved. It is particularly notable that Codex has yet to ship queuing, as that product is getting plenty of attention and is still highly capable. I have been building Contextify (https://contextify.sh), a macOS app that monitors Claude Code and Codex CLI transcripts in real time and provides a CLI and skill called Total Recall to query your entire conversational history across both providers. I'm about to release a Linux version and would love any feedback. [1] With the exception of Claude Code Web, which does expose "sessions" or shared transcripts between local and hosted execution environments. |
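To give a flavor of the format, here is a rough sketch of walking one of those transcripts and rebuilding the conversation chain from the uuid/parentUuid links (the path and field names here are only illustrative, and the schema changes between releases):

    import json, pathlib
    from collections import defaultdict

    # Claude Code keeps per-session JSONL transcripts on disk; the exact location
    # and schema are undocumented and change between releases, so this path is
    # only a placeholder.
    session = pathlib.Path.home() / ".claude" / "projects" / "some-project" / "session.jsonl"

    records = [json.loads(line) for line in session.read_text().splitlines() if line.strip()]

    # Group records by parentUuid so we can follow the conversation chain.
    by_parent = defaultdict(list)
    for rec in records:
        by_parent[rec.get("parentUuid")].append(rec)

    def walk(parent=None, depth=0):
        # Depth-first walk from the root records (no parentUuid) downwards.
        for rec in by_parent.get(parent, []):
            kind = rec.get("type", "?")  # e.g. "user" or "assistant"
            print("  " * depth + f"{kind}  {rec.get('uuid')}")
            walk(rec.get("uuid"), depth + 1)

    walk()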
| |
| ▲ | jake-coworker a day ago | parent | next [-] | | IMO these articles are akin to "Twitter in 200 lines of code!" and "Why does Uber need 1000 engineers?" type articles. They're cool demos/POCs of real-world things (and indeed are informative to people who haven't built AI tools). The very first version of Claude Code probably even looked a lot like this 200-line loop, but things have evolved significantly from there. | | |
| ▲ | tomtomtom777 a day ago | parent [-] | | > IMO these articles are akin to "Twitter in 200 lines of code!" I don't think it serves the same purpose. Many people understand the difference between a 200-line Twitter prototype and the real deal. But many of those may not understand what the LLM client tool does and how it relates to the LLM server. It is generally consumed as one magic black box. This post isn't to tell us how anyone can build a production-grade claude-code; it tells us what part is done by the CLI and what part is done by the LLM, which I think is a rather important ingredient in understanding the tools we are using, and how to use them. |
| |
| ▲ | d4rkp4ttern a day ago | parent | prev | next [-] | | Nice, I have something similar [1], a super-fast Rust/Tantivy-based full-text search across Claude Code + Codex-CLI session JSONL logs, with a TUI (for humans) and a CLI/JSONL mode for agents. For example, there's a session-search skill and corresponding agent that can do: aichat search --json [search params]
So you can ask Claude Code to use the searcher agent to recover arbitrary context of prior work from any of your sessions, and build on that work in a new session.
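A skill or agent wrapper around it is basically a subprocess call plus JSON parsing, roughly like this (the flag and output shape are illustrative; see the repo linked below for the real interface):

    import json, subprocess

    def search_sessions(query: str):
        # Shell out to the session-search CLI in its JSON mode and collect hits.
        # (Exact flags and output format are illustrative; check the repo.)
        proc = subprocess.run(
            ["aichat", "search", "--json", query],
            capture_output=True, text=True, check=True,
        )
        return [json.loads(line) for line in proc.stdout.splitlines() if line.strip()]

    for hit in search_sessions("auth middleware refactor"):
        print(hit)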
This has enabled me to completely avoid compaction. [1] https://github.com/pchalasani/claude-code-tools?tab=readme-o... | |
| ▲ | dnw a day ago | parent | prev | next [-] | | That is a cool tool. Also, one can set "cleanupPeriodDays" in ~/.claude/settings.json to extend the cleanup period. There is so much information these tools keep around that we could use. I came across this one the other day: https://github.com/kulesh/catsyphon | |
| ▲ | Johnny_Bonk a day ago | parent | prev | next [-] | | This is very interesting, especially if you could then use an LLM across that search to figure out what has and maybe has not been completed, and then reinject those findings into a new Claude Code session. | | |
| ▲ | bredren a day ago | parent | next [-] | | I haven't written the entry yet but it is pretty incredible what you can get when letting a frontier model RAG your complete CLI convo history. You can find out not just what you did and did not do but why. It is possible to identify unexpectedly incomplete work streams, build a histogram of the times of day you get most irritated with the AI, etc. I think it is very cool and I have a major release coming. I'd be very appreciative of any feedback. | |
| ▲ | handfuloflight a day ago | parent | prev [-] | | For that you'd be better off having the LLM write TODO stubs in the codebase and search for that. In fact, most of the recent models just do this, even without prompting. |
| |
| ▲ | lelanthran 15 hours ago | parent | prev [-] | | > So "200 lines" captures the concept but not the production reality of what is involved. How many lines would you estimate it takes to capture that production reality of something like CC? I ask because I got downvoted for asking that question on a different story[1]. I asked because in that thread someone quoted the CC dev(s) as saying: >> In the last thirty days, I landed 259 PRs -- 497 commits, 40k lines added, 38k lines removed. My feeling is that a tool like this, while it won't be 200 lines, can't really be 40k lines either. [1] If anyone is interested, https://news.ycombinator.com/item?id=46533132 | | |
| ▲ | foltik 14 hours ago | parent [-] | | My guess is <5k for a coherent and intentional expert human design. Certainly <10k. It’s telling that they can’t fix the screen flickering issue, claiming “the problem goes deep.” |
|
|
|
| ▲ | misternugget a day ago | parent | prev | next [-] |
| Hey! Thorsten Ball here. Thanks for the shout-out. I was quite confused when someone sent me this article: same "Emperor has no clothes", same "it's only x hundred lines", implements the same tools, it even uses the same ASCII colors when printing you/assistant/tool. Then I saw the "January 2025" in the title and got even more confused. So, thanks for your comment and answering all the questions I had just now about "wait, did I wake up in a parallel universe where I didn't write the post but someone else did?" |
| |
| ▲ | libraryofbabel 9 hours ago | parent | next [-] | | Hi! Thanks again for writing that first Emperor Has No Clothes blog post; like I said, it really inspired me and made everything click early on when I was first dipping my toes into the world of agents. Whenever I teach this stuff to other engineers, I often think back to that moment of realizing exactly how the exchange between LLM, tool call requests, tool functions, and agent code works, and I try to get that across as the central takeaway. These days I usually add diagrams to get really clear on what happens on which side of the model API. I do wonder whether the path here was: 1) You wrote the article in April 2025; 2) The next generation of LLMs trained on your article; 3) The author of TFA had a similar idea, and heavily used LLMs to help write the code and the article, including asking the LLM to think of a catchy title. And guess what title the LLM came up with? There are also less charitable interpretations, of course. But I'd rather assume this was honestly, if sloppily, done. | |
| ▲ | justanotherprof 16 hours ago | parent | prev [-] | | Many thanks for your article, it was one of the true "aha" moments for me in 2025! It is a shame that your work is apparently being appropriated without attribution to sell an online course... |
|
|
| ▲ | aszen a day ago | parent | prev | next [-] |
| The most important part is editing code; to do that reliably, Claude models are trained on their own str replace tool schema, I think. Models find it hard to modify existing code, and they also can't just rewrite whole files because that's expensive and doesn't scale. |
| |
| ▲ | embedding-shape a day ago | parent | next [-] | | Here's where I was hoping openly available models would shine: some community gets together, starts sharing successful/failed runs with their own agent, builds an open dataset for their specific syntax and tooling, and then finally finetunes new variants with it for the community. |
| ▲ | libraryofbabel a day ago | parent | prev [-] | | Yeah, there is definitely some RLVR training going on for the Claude LLMs to get them good at some of the specific tool calls used in Claude Code, I expect. Having said that, the string replacement tool schema for file edits is not very complicated at all (you can see it in the tool call schema Claude Code sends to the LLM), so you could easily use that in your own 200-300 line agent if you wanted to make sure you're playing to the LLM's strengths. | | |
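For reference, the whole edit tool boils down to something like this (the schema below is my own illustration in the shape the messages API expects, not a copy of Claude Code's actual one):

    import pathlib

    # Illustrative tool definition, in the shape the Anthropic messages API expects.
    EDIT_TOOL = {
        "name": "str_replace",
        "description": "Replace one exact occurrence of old_string in a file with new_string.",
        "input_schema": {
            "type": "object",
            "properties": {
                "path": {"type": "string"},
                "old_string": {"type": "string"},
                "new_string": {"type": "string"},
            },
            "required": ["path", "old_string", "new_string"],
        },
    }

    def str_replace(path: str, old_string: str, new_string: str) -> str:
        text = pathlib.Path(path).read_text()
        # Require exactly one match so the model can't silently edit the wrong spot.
        count = text.count(old_string)
        if count != 1:
            return f"error: old_string occurs {count} times; it must occur exactly once"
        pathlib.Path(path).write_text(text.replace(old_string, new_string, 1))
        return "ok"

The agent side really is that small; the reliability comes from the model having been trained to emit a unique old_string.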
| ▲ | aszen a day ago | parent [-] | | Yeah, that's one example, but I suspect they train the model on entire sequences of tool calls, so unless you prompt the model exactly as they do, you won't get the same results. There's a reason they won the agent race: their models are trained to use their own tools. | |
| ▲ | libraryofbabel a day ago | parent [-] | | Agree, the RLVR tasks are probably long series of tool calls at this point doing complex tasks in some simulated dev environment. That said, I think it's hard to say how much of a difference it really makes in terms of making Claude Code specifically better than other coding agents using the same LLM (versus just making the LLM better for all coding agents using roughly similar tools). There is probably some difference, but you'd need to run a lot of benchmarks to find out. | | |
| ▲ | aszen a day ago | parent [-] | | Agreed, it probably contributes to the model improving for all agents, but crucially it is verifiably better against their own agent. So they get a good feedback loop to improve both. |
|
|
|
|
|
| ▲ | alansaber a day ago | parent | prev | next [-] |
| Ah, I just assumed it was the same article reposted. |
|
| ▲ | justanotherprof 16 hours ago | parent | prev | next [-] |
| I am glad you pointed out Thorsten Ball's truly excellent article: I was about to add a comment to that effect! |
|
| ▲ | KellyCriterion a day ago | parent | prev [-] |
| Can you show us the "core of a coding agent" which is, according to your words, "really simple"? And would you mind sharing a URL so I could check it out, then? |
| |