Agent design is still hard (lucumr.pocoo.org)
60 points by the_mitsuhiko 3 hours ago | 15 comments
mritchie712 28 minutes ago
Some things we've[0] learned on agent design:

1. If your agent needs to write a lot of code, it's really hard to beat Claude Code (cc) / the Agent SDK. We've tried many approaches and frameworks over the past 2 years (e.g. PydanticAI), but using cc is the first that has felt magic.

2. Vendor lock-in is a risk, but the bigger risk is having an agent that is less capable than what a user gets out of ChatGPT because you're hand-rolling every aspect of your agent.

3. cc is incredibly self-aware. When you ask cc how to do something in cc, it instantly nails it. If you ask cc how to do something in framework xyz, it will take much more effort.

4. Give your agent a computer to use. We use e2b.dev, but Modal is great too. When the agent has a computer, many complex features feel simple.

0 - For context, Definite (https://www.definite.app/) is a data platform with agents to operate it. It's like Heroku for data, with a staff of AI data engineers and analysts.
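A minimal sketch of what "give your agent a computer" can look like (the tool name and wiring here are hypothetical; a real setup would route the command to an isolated sandbox like e2b or Modal rather than the host machine):

```python
import subprocess

def run_shell(command: str, timeout: int = 30) -> dict:
    """Hypothetical 'computer' tool handler exposed to the agent.
    Executes a shell command and returns structured output the model
    can read. In production this would run inside a sandboxed VM,
    never on the host as it does in this local sketch."""
    proc = subprocess.run(
        command, shell=True, capture_output=True, text=True, timeout=timeout
    )
    return {
        "stdout": proc.stdout,
        "stderr": proc.stderr,
        "exit_code": proc.returncode,
    }

# With a single general-purpose tool like this, the model composes
# arbitrary workflows (install a package, run a script, inspect a
# file) instead of needing one bespoke tool per feature.
result = run_shell("echo hello from the agent's computer")
print(result["exit_code"], result["stdout"].strip())
```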
postalcoder an hour ago
I've been building agent-type stuff for a couple of years now, and the best thing I did was build my own framework and abstractions that I know like the back of my hand. I'd steer clear of any LLM abstraction. There are so many companies with open-source abstractions offering the panacea of a single interface, crumbling under their own weight from the sheer futility of supporting every permutation of every SDK evolution, all while those same companies try to build revenue-generating businesses on top of them.
_pdp_ 6 minutes ago
I started a company in this space about 2 years ago. We are doing fine. What we've learned so far is that a lot of these techniques are simply optimisations to tackle some deficiency in LLMs that is a problem "today". These are not going to be problems tomorrow, because the technology will shift, as it has many times over the last 2 years. So yeah, cool, caching and all of that... but give it a couple of months and a better technique will come out, or more capable models.

Many years ago, when disc encryption on AWS was not an option, my team and I spent 3 months coming up with a way to encrypt the discs, and doing it well, because at the time there was no standard way. It was very difficult, as it required pushing encrypted images (as far as I remember). Soon after we started, AWS introduced standard disc encryption that you can turn on by clicking a button. We wasted 3 months for nothing. We should have waited!

What I've learned from this is that oftentimes it is better to do absolutely nothing.
CuriouslyC 19 minutes ago
The 'Reinforcement in the Agent Loop' section is a big deal. I use this pattern to enable async/event-steered agents, and it's super powerful. In long contexts you can use it to re-inject key points ("reminders"), etc.
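A minimal sketch of the reminder re-injection idea (all names and the stub model are hypothetical): between turns, the harness appends a synthetic message re-stating key constraints so they stay in recent context during long runs.

```python
REMINDER = "Reminder: stay within ./workspace and keep diffs minimal."

def agent_loop(messages, call_model, max_turns=10, remind_every=3):
    """Run a simple agent loop, periodically re-injecting a reminder
    as a fresh user-role message so the model treats it as current
    steering rather than stale instructions buried in history."""
    for turn in range(max_turns):
        if turn > 0 and turn % remind_every == 0:
            messages.append({"role": "user", "content": REMINDER})
        reply = call_model(messages)
        messages.append({"role": "assistant", "content": reply})
        if "DONE" in reply:
            break
    return messages

# Stub model to show the mechanics without a real API: it "finishes"
# once the transcript grows past a fixed length.
def fake_model(messages):
    return "DONE" if len(messages) >= 8 else "working..."

log = agent_loop([{"role": "user", "content": "refactor the parser"}], fake_model)
print(sum(1 for m in log if m["content"] == REMINDER))  # reminders injected
```

The same hook point works for event-steered agents: instead of a fixed reminder, the harness injects whatever external event arrived since the last turn.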
srameshc 17 minutes ago
I still feel there is no sure-shot way to build an abstraction yet. Probably that is why Loveable decided to build on Gemini rather than offering a choice of models. On the other hand, I like the Pydantic AI framework and got myself a decent working solution where my preference is to stick with cheaper models by default and only use expensive ones in cases where the failure rate is too high.
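The routing policy described above can be sketched like this (model names, task types, and the threshold are illustrative, not from any real system): default to a cheap model and escalate only for task types whose observed failure rate is too high.

```python
CHEAP, EXPENSIVE = "small-model", "large-model"
FAILURE_THRESHOLD = 0.2

# Observed failure rates per task type, e.g. aggregated from eval logs.
failure_rates = {"summarize": 0.05, "sql_generation": 0.35}

def pick_model(task_type: str) -> str:
    """Route to the expensive model only when the cheap one has been
    failing too often on this task type; unknown task types default
    to the cheap model until there is data."""
    rate = failure_rates.get(task_type, 0.0)
    return EXPENSIVE if rate > FAILURE_THRESHOLD else CHEAP

print(pick_model("summarize"))       # stays on the cheap model
print(pick_model("sql_generation"))  # escalates to the expensive model
```

The useful property is that the escalation decision is driven by measured failures rather than hard-coded per feature, so the default cost stays low as the cheap models improve.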
Fiveplus 21 minutes ago
I liked reading this, but I have a silly question, as I'm a noob at these things. If explicit caching is better, does that mean the agent is just forgetting stuff unless we manually save its notes? Are these things really that forgetful? Also, why is there a virtual file system? So the agent is basically just running around a tiny digital desktop looking for its files? Why can't the agent just know where the data is? I'm sorry if these are juvenile questions.