Build a Basic AI Agent from Scratch: Long Task Planning

athrowaway3z an hour ago | parent | next [-]

I've tried most forms of planning - from the basic AGENTS.md guide to keeping ./dev/ plan files, todo list tools, a SQLite DB with both minimal and extensive tracking, etc.

None of them have been worth it. A year ago the models needed to be reminded. Today they can follow a plan from text alone. This is my experience from working on a project alone. In teams, I actually think the same lesson holds in the new AI paradigm.

My current scheme is basically this - in order of the task's complexity:

- Tell an agent to do something

- Tell an agent to make a plan then tell it to execute on it.

- Tell an agent to make a plan, write to a file, have a subagent review it, then execute it.

- Do the above, but instead tell the agent it is in supervisor mode and should have subagents implement as many phases as they can, rolling over with a handoff.md, while it, as the supervisor agent, keeps driving the task to completion.

I keep the latter two under a sigil, so they're prepared prompts I can inject with a few keystrokes.

If I feel very fancy I'll tell them to update the plan with a checklist and add checkboxes, but it just doesn't pay enough to have an 'init-prompt' level planning feature or tools if the same context already has files/read/write.
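
The four tiers above can be sketched as prepared prompts keyed by a sigil. This is only an illustration: the sigil names (`;do`, `;plan`, `;review`, `;super`), the prompt wording, and the default `plan.md` file name are all assumptions, not the commenter's actual setup.

```python
# Hypothetical prepared prompts keyed by a sigil. The sigil names and the
# prompt wording are assumptions for illustration only.
PREPARED_PROMPTS = {
    ";do": "{task}",
    ";plan": "Make a plan for the following task, then execute it.\n\nTask: {task}",
    ";review": (
        "Make a plan for the following task and write it to {plan_file}. "
        "Have a subagent review the plan, then execute it.\n\nTask: {task}"
    ),
    ";super": (
        "You are in supervisor mode. Make a plan for the following task and "
        "write it to {plan_file}. Have a subagent review it. Then have "
        "subagents implement the phases, each rolling over with a handoff.md, "
        "while you keep driving the task to completion.\n\nTask: {task}"
    ),
}


def expand(line: str, plan_file: str = "plan.md") -> str:
    """Expand a leading sigil into its prepared prompt; pass plain text through."""
    sigil, _, task = line.partition(" ")
    template = PREPARED_PROMPTS.get(sigil)
    if template is None:
        return line  # no known sigil: send the text to the agent unchanged
    return template.format(task=task.strip(), plan_file=plan_file)
```

Typing `;super migrate the billing tables` would then expand into the full supervisor prompt, which is roughly what "inject with a few keystrokes" amounts to.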

manishsharan 14 minutes ago | parent [-]

Please don't take offense to this very dumb question:

Why can't you do the planning? Figure out what needs to be done, break it down into small tasks, and then ask the agent to execute those small tasks?

When we executed projects in the past, this is what I would do as a lead: figure out the overall software architecture and delegate the tasks to developers.

This way I always knew how the system worked and could extend it as needed. I am not in a development role anymore, but I am trying to understand why we are delegating planning and software architecture to coding agents.

nnnnico 7 minutes ago | parent [-]

Whatever you delegated in the past probably also required planning by the engineer who went and got it done. Most planning done by agents is at this same level: the agent explores the codebase, understands where to touch, weighs tradeoffs and code-level architecture, and asks the user for more context or balances that with assumptions and other patterns already present in the code.

Havoc 2 hours ago | parent | prev | next [-]

What’s with all the aggression here? Not very HN.

int3trap an hour ago | parent [-]

1. People don't like Medium, rightly so.

2. The content is lower quality.

jdw64 an hour ago | parent | next [-]

I find it hard to agree with the point that the content quality is low. Of course, that design does have some issues. But it is still valuable and worth reading.

The strengths are that the design forces Chain of Thought as a memory buffer and the TODO list in an FSM style. I think those are fine. The recovery strategy is also pretty good.

However, the problem is that the business logic does not run as Python code but lives inside the prompt. And it does not support parallel execution. But as a single run script, it is helpful enough for understanding the concept.

Of course, if I were to do the code properly, I would use a separate storage instead of in memory, and more carefully verify tool constraints and the actual scope limitations of the tools. But still, I think this is helpful enough.
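
To make the "FSM style" point concrete, here is a minimal sketch of a TODO list whose transition rules live in Python rather than in the prompt. The state names, the `TodoList` class, and the transition table are assumptions for illustration; the article's own design may differ.

```python
from enum import Enum


class State(Enum):
    PENDING = "pending"
    IN_PROGRESS = "in_progress"
    DONE = "done"
    FAILED = "failed"


# Allowed transitions (assumed for this sketch).
# FAILED -> IN_PROGRESS models a retry of a task that went wrong.
TRANSITIONS = {
    State.PENDING: {State.IN_PROGRESS},
    State.IN_PROGRESS: {State.DONE, State.FAILED},
    State.FAILED: {State.IN_PROGRESS},
    State.DONE: set(),
}


class TodoList:
    """In-memory TODO list that rejects illegal state changes in code."""

    def __init__(self, tasks):
        self.states = {task: State.PENDING for task in tasks}

    def move(self, task, new_state):
        current = self.states[task]
        if new_state not in TRANSITIONS[current]:
            raise ValueError(
                f"illegal transition for {task!r}: "
                f"{current.value} -> {new_state.value}"
            )
        self.states[task] = new_state

    def next_task(self):
        """Return the first task that is not done, or None when all are done."""
        for task, state in self.states.items():
            if state is not State.DONE:
                return task
        return None
```

With the rules in code, a model that tries to mark a task done without starting it gets a `ValueError` instead of silently corrupting the list.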

hilariously 41 minutes ago | parent [-]

The recovery strategy in my mind would be what to do in case of a crash, which would just wipe out all the context here (scratch pad, todo list, etc) - it doesn't seem very recoverable.

jdw64 33 minutes ago | parent [-]

This is the difficult part of programming debates. What you mentioned is about the TODO list disappearing immediately when Python shuts down, right? What I was talking about is the point where the LLM retries when something goes wrong due to a mistake in the previous task. Actually, that's why I included the sentence 'If I were to do the code properly, I would use a separate storage instead of in memory.' I guess I unintentionally caused some confusion.

cmrdporcupine 8 minutes ago | parent | prev | next [-]

Seems odd that it would get upvoted to the front page then in the first place?

ramon156 an hour ago | parent | prev [-]

I agree with 1; same for Substack. bearblog seems cool tho

I don't think the content is low quality, though.

jdw64 an hour ago | parent | prev | next [-]

I don't understand why people criticize this post. When you run a homepage or a blog, it's unavoidable to write script-style code. Even if the quality is a bit low, that's the limit within a tutorial. If you go into actual design, things like boundaries, policies, error handling, and so on require a lot of prior knowledge. So when that much knowledge is needed, you can only post something as a simple runnable script.

For example, if I were building real software, I would design everything from policy to error logging policies and so on. But when writing a blog post, it's just simplified into a short runnable script.

b800h 2 hours ago | parent | prev | next [-]

Why do people use Medium?

jdw64 an hour ago | parent | next [-]

At least Medium's algorithm shows it to users within Medium. A personal homepage doesn't get picked up well by SEO, and unless it becomes famous, you can't see any comments from people. Just like my homepage (makonea.com) that no one visits.

antonvs an hour ago | parent | prev [-]

Because it gives them a way to post articles for free? What should they use instead, your highness? Why do people post comments like this?

elxr 2 hours ago | parent | prev | next [-]

A code tutorial on Medium (whose formatting is absolutely not meant for this)?

Please stop posting.

preommr 2 hours ago | parent [-]

Elaborate? It's using code blocks that have language highlighting and the appropriate whitespace. What's the problem?

aafaqzahid 2 hours ago | parent | prev | next [-]

Are people using Medium in 2026?

mxkopy 2 hours ago | parent | prev | next [-]

Jesus the terminology is so fucked… compare the contents of this blog post with any RL paper containing the words “long term planning”…

niggischiggi 3 hours ago | parent | prev [-]

Yeah yeah... the world needs even more "aI aGenTz". This will help fight climate change and child starvation.

paulluuk 2 hours ago | parent | next [-]

That seems pretty harsh. How do new frontend frameworks, GPU shaders or another article about how great Rust is (which it is) help fight climate change or child starvation?

trollbridge 24 minutes ago | parent | next [-]

Since the migration from setuptools -> poetry -> uv -> full Rust, I think my computer burns up less energy (not to mention all the CI/CD pipelines) from running slow tools over and over. So that's a win for Rust there.

bcjdjsndon an hour ago | parent | prev | next [-]

> great Rust is (which it is)

They just took undefined behaviour and called it unsafe. They've not really solved anything. Even their own std lib has security bugs in unsafe code. And their only ever retort is "there are thousands of these bugs a day in C code"... Let's wait until Rust gets used seriously in the systems and embedded space first; no point comparing C to minnows like Rust when it comes to total CVEs.

reactordev 2 hours ago | parent | prev [-]

The point they were making sarcastically is that this doesn’t.

pixel_popping 10 minutes ago | parent | prev | next [-]

Yes, the world does need more.

antonvs an hour ago | parent | prev [-]

Go find another website to spew your nonsense on.