| ▲ | mccoyb 2 days ago |
| Building agents has been fun for me, but it's clear that there are serious problems with "context engineering" that must be overcome with new ideas. In particular, no matter how large the context window grows, one must curate what the agent sees: agents don't have very effective filters for what is relevant to supercharge them on tasks, and so (a) you must leave *.md files strewn about to guide them and (b) you must put them into roles. The *.md system is essentially a rudimentary memory system, but it could be made significantly more robust, and could involve e.g. constructing programs and models (in natural language) on the fly, guided by interactions with the user. What Claude Code has taught me is that steering an agent via a test suite is an extremely powerful reinforcement mechanism (the feedback loop leads to success, most of the time) -- and I'm hopeful that new thinking will extend this into the other "soft skills" an agent needs to become an increasingly effective collaborator. |
|
| ▲ | blks a day ago | parent | next [-] |
| Sounds like you are spending more time battling with your own tools than doing actual work. |
| |
| ▲ | mccoyb a day ago | parent [-] | | Ah yes, everything has to be about getting work done, right? You always have to be productive! Do you think, just maybe, it might be interesting to play around with these tools without worrying about how productive you're being? | | |
| ▲ | RealityVoid a day ago | parent | next [-] | | No. They're tools, not pets, and since everyone is raving about how good these tools are, I expect to be able to use them as tools. My idea of a good time is understanding a system in depth and building it while trusting that it does what I expect. This is going away, though. | |
| ▲ | rpcorb a day ago | parent | prev [-] | | Yes, actually. Everything is about getting work done. Delivering value. If you're spending more time playing with your tools vs. using your tools to deliver value, that's not success at your job. Play around on your own time. | | |
| ▲ | southernplaces7 a day ago | parent [-] | | Since they didn't mention whether they play around at work or in their own free time, what the hell are you even talking about? And no, not everything is about pushing to be productive and delivering some dry corporate HR turd of a definition of value. If anything, such tedious obsessions can cloud a person's mind against creating something interesting that turns out to also have long-term import. I assume I'm talking to either a troll or an idiot, given the weird rant you replied with, but it's good to remember that value doesn't always come in a specifically molded form. | | |
|
|
|
|
| ▲ | franktankbank 2 days ago | parent | prev | next [-] |
| Is there a recommended way to construct .md files for such a system? For instance, when I make them for human consumption they'd have lots of markup for readability, but that may or may not be consumable by an LLM. Can you create a .md the same as for human consumption that doesn't hinder an LLM? |
| |
| ▲ | artpar 2 days ago | parent | next [-] | | I am using these files (most of them are LLM-generated, based on my prompts, to reduce lookups when working on a codebase): https://gist.github.com/artpar/60a3c1edfe752450e21547898e801... (especially AGENT.knowledge, which is quite helpful) | | |
| ▲ | HumanOstrich 2 days ago | parent [-] | | Can you provide any form of demonstration of an LLM reading these files and acting accordingly? Do you know how each item added affects its behavior? I'd also be interested in your process for creating these files, such as examples of prompts, tools, and references for your research. | | |
| ▲ | artpar a day ago | parent [-] | | Claude doesn't read them reliably and has to be reminded across sessions. I usually do @AGENT.main and @AGENT.knowledge and it figures out the rest. Over the period of doing this, Claude has become able to maintain the "project management" part itself, in terms of "what's the current state of the project" and "what are the next ideal todos and how to go about them". > Can you provide any form of demonstration of an LLM reading these files and acting accordingly Claude does update them at the end of the session (I say "wrap up" in the prompt). The ones you are seeing in that gist are the original forms; they evolve with each commit. |
|
| |
| ▲ | sothatsit 2 days ago | parent | prev | next [-] | | Just writing a clear document, like you would for a person, gets you 95% of the way there. There are little tweaks you can do, but they don't matter as much as just being concise and factual, and structuring the document clearly. You just don't want the documentation to get too long. | |
| ▲ | golergka 2 days ago | parent | prev [-] | | I've had very good experience with building a very architecture-conscious folder structure and putting AGENTS.md in every folder (and, of course, instruction to read _and_ update those in the root prompt). But with Agent-written docs I also have to run doc maintainer agent pretty often. | | |
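As a rough illustration of the "doc maintainer" chore in the comment above, here is a minimal sketch of one way to spot per-folder AGENTS.md files that have fallen behind their code. The staleness rule (doc older than the newest .py file in its folder) and the .py filter are assumptions for the example, not part of any tool's contract:

```python
from pathlib import Path

def stale_agent_docs(root: str) -> list[Path]:
    """Return AGENTS.md files older than the newest Python source
    file sitting in the same folder -- candidates for a doc
    maintainer pass. Purely illustrative heuristic."""
    stale = []
    for doc in Path(root).rglob("AGENTS.md"):
        # Only compare against sources in the doc's own folder,
        # mirroring the one-AGENTS.md-per-folder layout.
        sources = [p for p in doc.parent.iterdir() if p.suffix == ".py"]
        if sources and max(p.stat().st_mtime for p in sources) > doc.stat().st_mtime:
            stale.append(doc)
    return stale
```

The output of something like this could be fed to the doc-maintainer agent as its work list, rather than asking it to re-audit every folder each run.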
| ▲ | troupo 2 days ago | parent [-] | | > and putting AGENTS.md in every folder (and, of course, instruction to read _and_ update those in the root prompt). For me, Claude Code completely ignores the instruction to read and follow AGENTS.md, and I have to remind it every time. The joys of non-deterministic blackboxes. |
|
|
|
| ▲ | zmgsabst 2 days ago | parent | prev | next [-] |
| I’ve found managing the context is most of the challenge: - creating the right context for parallel and recursive tasks; - removing some steps (e.g., editing its previous response) to show only the corrected output; - showing it its own output as my comment, when I want a response; etc. |
| |
| ▲ | mccoyb 2 days ago | parent | next [-] | | I've also found that relying on agents to build their own context _poisons_ it ... that it's necessary to curate it constantly. There's kind of a <1 multiplicative thing going on, where I can ask the agent to e.g. update CLAUDE.mds or TODO.mds in a somewhat precise way, and the agent will multiply my request into a lot of changes which (on the surface) appear well and good ... but if I repeat this process a number of times _without manual curation of the text_, I end up with "lower quality" than I started with (assuming I wrote the initial CLAUDE.md). The obvious conclusion: while the agent can multiply the amount of work I can do, there's a multiplicative reduction in quality, which means I need to account for that (I have to add "time spent doing curation") | |
| ▲ | prmph 2 days ago | parent [-] | | In other words, the old adage still applies: there is no free lunch. More seriously, yes, it makes sense that LLMs are not going to be able to take humans entirely out of the loop. Think about what it would mean if that were the case: if people, on the basis of a few simple prompts, could let the agents loose and create sophisticated systems without any further input, then there would be nothing to differentiate those systems, and thus they would lose their meaning and value. If prompting is indeed the new level of abstraction we are working at, then what value is added by asking Claude: "make me a note-taking app"? A million other people could also issue this same low-effort prompt; so what is the value added here by the prompter? |
| |
| ▲ | ModernMech 2 days ago | parent | prev [-] | | It's funny because things are finally coming full circle in ML. 10-15 years ago the challenge in ML/PR was "feature engineering": the careful crafting of rules that would define features in the data which would draw the attention of the ML algorithm. Then deep learning came along and solved the issue of feature engineering: just throw massive amounts of data at the problem and the ML algorithms can discern the features automatically, without having to craft them by hand. Now we've gone as far as we can with massive data, and the problem seems to be that it's difficult to bring out the relevant details when there's so much data. Hence "context engineering": a manual, heuristic-heavy process guided by trial and error and intuition. More an art than a science. Pretty much the same thing that "feature engineering" was. |
|
|
| ▲ | moritz64 2 days ago | parent | prev [-] |
| > steering an agent via a test suite is an extremely powerful reinforcement mechanism Can you elaborate a bit? How do you proceed? What does your process look like? |
| |
| ▲ | mccoyb a day ago | parent [-] | | I spend a significant amount of time (a) curating the test suite, making sure it matches my notion of correctness, and (b) forcing the agent to make PNG visuals (which Claude Code can see, by the way, and presumably also Gemini CLI, and maybe Aider?, etc.) I'd have to do this anyway if I were writing the code myself, so this is not "time above what I'd normally spend". The visuals it makes for me I can inspect and easily tell whether it is on the right path or the wrong one. The test suite is a sharper notion of "this is right, this is wrong" -- sharper than just visual feedback and my directions. The basic idea is to set up a feedback loop for the agent, then keep the agent in the loop and observe what it is doing. The visuals are absolutely critical -- a compressed representation of the behavior of the codebase, which I can quickly and easily parse to recognize whether there are issues. |
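One pass of the feedback loop described above might be sketched like this. Everything concrete here is an assumption: the project supplies its own test command (e.g. pytest) and its own visual-rendering script; the function just bundles both results into a short report for the agent's context:

```python
import subprocess

def feedback_pass(test_cmd, visual_cmd):
    """One turn of the loop: run the curated test suite, regenerate
    the PNG visuals, and fold both outcomes into a short report.
    The commands are placeholders for whatever the project uses."""
    tests = subprocess.run(test_cmd, capture_output=True, text=True)
    visuals = subprocess.run(visual_cmd, capture_output=True, text=True)
    status = "PASS" if tests.returncode == 0 else "FAIL"
    # This report (plus the regenerated PNGs, which the agent can
    # view directly) is what goes back to the agent each iteration.
    return f"tests: {status}\n{tests.stdout}visuals exit: {visuals.returncode}\n"
```

The human's job in this scheme stays the same as in the comment: curate the test suite so that PASS/FAIL means what you want it to mean, and eyeball the visuals for drift the tests don't catch.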
|