Show HN: Agent framework that generates its own topology and evolves at runtime (github.com)
83 points by vincentjiang 11 hours ago | 28 comments

Hi HN,

I’m Vincent from Aden. We spent 4 years building ERP automation for construction (PO/invoice reconciliation). We had real enterprise customers but hit a technical wall: Chatbots aren't for real work. Accountants don't want to chat; they want the ledger reconciled while they sleep. They want services, not tools.

Existing agent frameworks (LangChain, AutoGPT) failed in production - brittle, looping, and unable to handle messy data. General Computer Use (GCU) frameworks were even worse. My reflections:

1. The "Toy App" Ceiling & GCU Trap. Most frameworks assume synchronous sessions. If the tab closes, state is lost. You can't fit 2 weeks of asynchronous business state into an ephemeral chat session.

The GCU hype (agents "looking" at screens) is skeuomorphic. It’s slow (screenshots), expensive (tokens), and fragile (UI changes = crash). It mimics human constraints rather than leveraging machine speed. Real automation should be headless.

2. Inversion of Control: OODA > DAGs. Traditional DAGs are deterministic; if a step fails, the program crashes. In the AI era, the Goal is the law, not the Code. We use an OODA loop to manage stochastic behavior (rough sketch below the list):

- Observe: Exceptions are observations (FileNotFound = new state), not crashes.

- Orient: Adjust strategy based on Memory and Traits.

- Decide: Generate new code at runtime.

- Act: Execute.

The topology shouldn't be hardcoded; it should emerge from the task's entropy.
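
To make that concrete, here's a rough sketch of the loop (illustrative Python only; the callables and names are hypothetical, not Hive's actual API):

    # Minimal OODA sketch: exceptions become observations and the next step is
    # regenerated instead of crashing. All names here are illustrative.
    def run_goal(goal, decide, act, is_done, budget=20):
        observations = []                          # shared memory across iterations
        for _ in range(budget):
            step = decide(goal, observations)      # Decide: generate new code/plan
            try:
                result = act(step)                 # Act: execute it
            except Exception as exc:               # Observe: failure is data, not a crash
                observations.append(f"failed: {exc!r}")
                continue                           # Orient happens on the next decide()
            observations.append(f"ok: {result}")
            if is_done(goal, result):
                return result
        raise RuntimeError("compute budget exhausted before the goal was met")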

3. Reliability: The "Synthetic" SLA. You can't guarantee one inference ($k=1$) is correct, but you can guarantee a System of Inference ($k=n$) converges on correctness. Reliability is now a function of compute budget. By wrapping an 80% accurate model in a "Best-of-3" verification loop, we mathematically force the error rate down—trading Latency/Tokens for Certainty.
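
To put rough numbers on that claim (assuming independent samples and a plain majority vote, which is only one simple reading of "Best-of-3"):

    # Accuracy of a majority vote over k independent samples with per-call accuracy p.
    # Hypothetical helper; a real verifier can do better than blind voting.
    from math import comb

    def majority_accuracy(p, k):
        need = k // 2 + 1
        return sum(comb(k, i) * p**i * (1 - p)**(k - i) for i in range(need, k + 1))

    print(majority_accuracy(0.80, 1))   # 0.80
    print(majority_accuracy(0.80, 3))   # 0.896 -> error roughly halved, ~3x the tokens
    print(majority_accuracy(0.80, 5))   # ~0.942 -> more certainty, more compute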

4. Biology & Psychology in Code. "Hard Logic" can't solve "Soft Problems." We map cognition to architectural primitives:

- Homeostasis: Solving "Perseveration" (infinite loops) via a "Stress" metric. If an action fails 3x, "neuroplasticity" drops, forcing a strategy shift.

- Traits: Personality as a constraint. "High Conscientiousness" increases verification; "High Risk" executes DROP TABLE without asking.
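
As a toy illustration of the "stress" idea (not the actual implementation; the threshold and names are made up):

    # Toy homeostasis: repeated failures of the same action lower "neuroplasticity"
    # until the agent is forced to abandon that strategy and re-plan.
    from collections import Counter

    class Homeostasis:
        def __init__(self, max_failures=3):
            self.failures = Counter()
            self.max_failures = max_failures

        def record_failure(self, action):
            self.failures[action] += 1

        def must_switch_strategy(self, action):
            # after 3 identical failures, stop retrying and change approach
            return self.failures[action] >= self.max_failures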

As an industry, we need engineers interested in the intersection of biology, psychology, and distributed systems to help us move beyond brittle scripts. It'd be great to have you roast my code and share feedback.

Repo: https://github.com/adenhq/hive

Anujsharma002 11 minutes ago | parent | next [-]

Impressive work by the Aden team!

Hive stands out as one of the most thoughtfully designed agent frameworks I’ve seen. The goal-driven approach, automatic graph generation, and self-healing adaptation loop make it genuinely production-focused—not just another demo-style agent system.

Highlights that really shine:

Outcome-first agent design (no hardcoded workflows)

Built-in observability, HITL, and cost controls

Strong CLI + TUI experience for real-world debugging

Clean docs and excellent open-source hygiene

This is a solid foundation for teams building long-running, autonomous AI systems at scale. Excited to explore more and potentially contribute—great job pushing the agent ecosystem forward!

kkukshtel an hour ago | parent | prev | next [-]

The comments on this post that congratulate/engage with OP all seem to be from hn accounts created in the past three months that have only ever commented on this post, so it seems like there is some astro-turfing going on here.

nishant_b555 an hour ago | parent | prev | next [-]

I've been exploring the Hive repo over the past few days. One thing I found interesting is the OODA-loop-inspired control flow and the idea of topology emerging at runtime rather than being statically defined.

I opened an issue (#3905) proposing built-in performance metrics and monitoring, since observability seems especially important when execution graphs aren’t deterministic.

Curious how others approach monitoring in dynamic agent systems — do you treat them more like distributed systems (tracing, spans, structured logs), or something closer to workflow DAG monitoring?

matchaonmuffins 4 hours ago | parent | prev | next [-]

This looks very cool! Will definitely try this out in the coming days.

However, a few questions:

A few weeks ago, Moonshot AI released their agentic swarm system that claimed to "self-direct an agent swarm with up to 100 sub-agents, executing parallel workflows across up to 1,500 tool calls." I wonder what your thoughts are on this kind of architecture, where systems rely on a "master node" of sorts for orchestration.

By contrast, your OODA approach seems to rely more on emergent agent behavior -- which I think may result in less than ideal outcomes in at least the near term, especially without specially RL'ed swarm agents.

Also, there's the problem of conflicting objectives. What happens if multiple agents pursuing the same goal arrive at conflicting strategies?

khimaros an hour ago | parent | prev | next [-]

i have been working on something similar, trying to build the leanest agent loop that can be self-modifying. ended up building it as a plugin within OpenCode with the cow pulled out into python hooks that the agent can modify at runtime (with automatic validation of existing behavior). this allows it to create new tools for itself, customize its system prompt preambles, and of course manage its own traits. also contains a heartbeat hook. it all runs in an incus VM for isolation and provides a webui and attachable TUI thanks to OpenCode.

CuriouslyC 6 hours ago | parent | prev | next [-]

Failures of workflows signal assumption violations that ultimately should percolate up to humans. Also, static dags are more amenable to human understanding than dynamic task decomposition. Robustness in production is good though, if you can bound agent behavior.

Best of 3 (or more) tournaments are a good strategy. You can also use them for RL via GRPO if you're running an open weight model.

ipnon 5 hours ago | parent [-]

In HNese this means "very impressive, keep up the good work."

mubarakar95 2 hours ago | parent | prev | next [-]

It forces you to write code that is "strategy-aware" rather than just "procedural." It’s a massive shift from standard DAGs where one failure kills the whole run. Really interesting to see how the community reacts to this "stochastic" approach to automation.

vincentjiang 11 hours ago | parent | prev | next [-]

To expand on the "Self-Healing" architecture mentioned in point #2:

The hardest mental shift for us was treating Exceptions as Observations. In a standard Python script, a FileNotFoundError is a crash. In Hive, we catch that stack trace, serialize it, and feed it back into the Context Window as a new prompt: "I tried to read the file and failed with this error. Why? And what is the alternative?"

The agent then enters a Reflection Step (e.g., "I might be in the wrong directory, let me run ls first"), generates new code, and retries.
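
In simplified pseudocode, the pattern looks roughly like this (the helper names and prompt wording are illustrative, not our exact implementation):

    # Roughly: serialize the failure, ask the model to reflect, retry with new code.
    import traceback

    def run_step_with_reflection(step_code, execute, complete):
        try:
            return execute(step_code)
        except Exception:
            observation = traceback.format_exc()       # the stack trace becomes data
            reflection_prompt = (
                "I tried to execute this step and it failed.\n"
                f"Code:\n{step_code}\n\nError:\n{observation}\n"
                "Why did it fail, and what should I try instead?"
            )
            new_step = complete(reflection_prompt)      # Reflection Step -> new code
            return execute(new_step)                    # retry with the revised plan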

We found this loop alone solved about 70% of the "brittleness" issues we faced in our ERP production environment. The trade-off, of course, is latency and token cost.

I'm curious how others are handling non-deterministic failures in long-running agent pipelines. Are you using simple retries, voting ensembles, or human-in-the-loop?

It'd be great to hear your thoughts.

Fayek_Quazi an hour ago | parent | prev | next [-]

Hive looks like a promising framework for AI agents. I recently contributed a docs PR and found the onboarding experience improving quickly. Excited to see where this goes.

mapace22 3 hours ago | parent | prev | next [-]

Hi there,

As someone working at the intersection of Finance and Data Science, the "Exceptions as Observations" approach isn't just a clever dev trick—it’s a massive shift in Risk Management.

In accounting, we don't aim for a "likely" correct balance; we need precision. Traditional bots fail here because they lack the "common sense" to fix a broken path.

By treating a Python error as a feedback signal that triggers a self-correction loop, Hive bridges the gap between AI flexibility and the rigid accuracy required in ERP environments.

From a data science perspective, this creates a "Synthetic Audit Trail". If an agent spends $1 in tokens to autonomously resolve an edge case that would have cost $5000 in human auditing time or accounting errors, the ROI is immediate.

It's moving the needle from "AI as a chatbot" to "AI as a resilient financial infrastructure" that actually prevents fraud and internal irregularities through tighter, automated oversight.

Best regards

mhitza 6 hours ago | parent | prev | next [-]

3. What, or who, is the judge of correctness (accuracy), regardless of how many solutions run in parallel? If I optimize for max accuracy, how close can I get to 100% mathematically, and how much would that cost?

kaicianflone an hour ago | parent | next [-]

I’m working on an open source project that treats this as a consensus problem instead of a single model accuracy problem.

You define a policy (majority, weighted vote, quorum), set the confidence level you want, and run enough independent inferences to reach it. Cost is visible because reliability just becomes a function of compute.
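
As a sketch of the idea (generic pseudocode, not the project's actual API):

    # Generic majority/quorum policy over n independent inferences.
    # Weighted voting works the same way with per-sample weights.
    from collections import Counter

    def run_with_consensus(infer, prompt, n=5, quorum=0.6):
        answers = [infer(prompt) for _ in range(n)]      # n independent samples
        best, count = Counter(answers).most_common(1)[0]
        confidence = count / n
        if confidence >= quorum:
            return best, confidence                      # enough agreement to accept
        raise ValueError(f"no consensus: top answer only reached {confidence:.0%}")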

The question shifts from “is this output correct?” to “how much certainty do we need, and what are we willing to pay for it?”

Still early, but the goal is to make accuracy and cost explicit and tunable.

mapace22 3 hours ago | parent | prev [-]

Hi there,

To be fair, achieving 100% accuracy is something even humans don't do. I don't think this is about a system just asking an AI if something is right or wrong. The "judge" isn't another AI flipping a coin; it's a code validator based on mathematical checks or pre-established rules.

For example, if the agent makes a money transfer, the judge enters the database and validates that the number is exact. This is where we are merging AI intelligence with the security of traditional, "old school" code. Getting this close to 100% accuracy is already a huge deal. It's like having three people reviewing an invoice instead of just one; it makes it much harder for an error to occur.

Regarding the cost, sure, the AI might cost a bit more because of all these extra validations. But if spending one dollar in tokens saves a company from losing five hundred dollars due to an accounting error, the system has already paid for itself. It's an investment, not a cost. Plus, this tighter level of control helps prevent not just errors, but also internal fraud and external irregularities. It's a layer of oversight that pays off.

Best regards

Sri_Madhav an hour ago | parent | prev | next [-]

I am glad to join the Hive community; contributing to the Hive project and learning from it truly excites me to do more!

Multicomp 7 hours ago | parent | prev | next [-]

I am of course unqualified to provide useful commentary on it, but I find this concept to be new and interesting, so I will be watching this page carefully.

My use case is less about hooking this up as some sort of business-workflow ClawdBot alternative, and more about seeing whether this can be an eventually consistent engine that lets me update state over various documents across the time dimension.

Could I use it to simulate some tabletop characters and their locations over time?

That would perhaps let me drop some of the bookkeeping needed to work out where a given NPC would be on a given day after so many days pass between game sessions, which would let me do game-world steps without having to do them manually per character.

timothyzhang7 7 hours ago | parent [-]

That's a very interesting use case you brought to the table! I've also dreamt about having an agent as my co-host running the sessions. It's a great PoC idea we might look into soon.

foota 7 hours ago | parent | prev | next [-]

I was sort of thinking about a similar idea recently. What if you wrote something like a webserver that was given "goals" for a backend, then told agents what the application was supposed to be, told them to use the backend to meet those goals, and had them generate feedback based on their experience?

Then have an agent collate the feedback, combined with telemetry from the server, and iterate on the code to fix it up.

In theory you could have the backend write itself and design new features based on what agents try to do with it.

I sort of got the idea from a comparison with JITs, you could have stubbed out methods in the server that would do nothing until the "JIT" agent writes the code.

vincentjiang 7 hours ago | parent | next [-]

Fascinating concept: you essentially frame the backend not as a static codebase, but as an adaptive organism that evolves based on real-time usage.

A few things that come to my mind if I were to build this:

The 'Agent-User' Paradox: To make this work, you'd need the initial agents (the ones responding and testing the goals) to be 'chaotic' enough to explore edge cases, but 'structured' enough to provide meaningful feedback to the 'Architect' agent.

The Schema Contract: How would you ensure that as the backend "writes itself," it doesn't break the contract with the frontend? You’d almost need a JIT Documentation layer that updates in lockstep.

Verification: I wonder if the server should run the 'JIT-ed' code in a sandbox first, using the telemetry to verify the goal was met before promoting the code to the main branch.

It's a massive shift from Code as an Asset to Code as a Runtime Behavior. Have you thought about how you'd handle state/database migrations in a world where the backend is rewriting itself on the fly? It feels to me that you're almost building a Lovable for backend services. I've seen a few OSS projects like this (e.g. MotiaDev), but none has executed it perfectly yet.

barelysapient 5 hours ago | parent | prev | next [-]

It's funny, I've been pondering something similar. I started by writing an agent-first API framework that simplifies the service boundary and relies on code gen for SQL stubs and APIs.

My next thought was to implement a multi agent workforce on top of this where it’s fully virtuous (like a cycle) and iterative.

https://github.com/swetjen/virtuous

If you're interested in working on this together, my personal website and contact info are in my bio.

timothyzhang7 7 hours ago | parent | prev [-]

The "JIT" agent closely aligns with the long-term vision we have for this framework. When the orchestrating agent of the working swarm is confident enough to produce more sub-agents, the agent graph (collection) could potentially extend itself based on the responsibility vacuum that needs to be filled.

abhishekgoyal19 2 hours ago | parent | prev | next [-]

Really like the "exceptions as observations" framing; treating failures as state transitions instead of crashes feels like the right mental shift for production agents. Curious how you're bounding long-term state growth, though, especially with evolving topology + multi-iteration verification loops. Are you seeing token/memory compaction become a bottleneck over multi-day workflows?

omhome16 5 hours ago | parent | prev | next [-]

Strongly agree on the 'Toy App' ceiling with current DAG-based frameworks. I've been wrestling with LangGraph for similar reasons—once the happy path breaks, the graph essentially halts or loops indefinitely because the error handling is too rigid.

The concept of mapping 'exceptions as observations' rather than failures is the right mental shift for production.

Question on the 'Homeostasis' metric: Does the agent persist this 'stress' state across sessions? i.e., if an agent fails a specific invoice type 5 times on Monday, does it start Tuesday with a higher verification threshold (or 'High Conscientiousness') for that specific task type? Or is it reset per run?

Starred the repo, excited to dig into the OODA implementation.

woldan 3 hours ago | parent | prev | next [-]

Spot on. UIs are designed for humans because we have eyes and hands; forcing an AI to use a mouse is just 'skeuomorphic' friction. AI should operate at the speed of code, not at the speed of clicks. The headless approach and OODA loops are definitely the right way to move forward.

spankalee an hour ago | parent | prev | next [-]

> The topology shouldn't be hardcoded; it should emerge from the task's entropy

What does this even mean?

fwip 4 hours ago | parent | prev | next [-]

Yet more LLM word vomit. If you can't be bothered to describe your new project in your own words, it's not worth posting about.

Biswabijaya 7 hours ago | parent | prev | next [-]

Great work team.

andrew-saintway 6 hours ago | parent | prev [-]

Aden Hive is a goal-driven Agent framework whose core philosophy represents a shift from “hard-coded workflows” to a “result-oriented architecture.” Traditional Agent frameworks typically rely on predefined procedural flows, which become fragile when faced with complex or uncertain business logic. Hive treats the “goal” as a first-class entity. Developers define objectives, success criteria, and constraints in natural language, and the system automatically generates and evolves an executable Node Graph to achieve them.

A key innovation in Hive is the introduction of a Coding Agent. Based on the defined goal, it automatically generates the code that connects nodes and constructs the execution graph. When failures occur during execution, the system does not merely log errors; it captures runtime data and triggers an Evolution Loop to regenerate and redeploy the agent graph. This closed-loop self-healing capability fundamentally differentiates Hive from traditional “process-oriented frameworks” such as LangChain or AutoGen.

Architecturally, Hive adopts a highly modular monorepo structure and uses uv for dependency management. The core runtime resides in the core/ directory and is responsible for graph execution, node scheduling, and lifecycle management. Tool capabilities are encapsulated in tools/ (aden_tools) and communicate with the core runtime via the MCP (Model Context Protocol), ensuring strong decoupling. The exports/ directory stores agent packages automatically generated by the Coding Agent, including agent.json and custom logic. Claude Code skill instructions are placed in the .claude directory to guide AI-assisted agent construction and optimization.

At runtime, each Hive agent is defined by an agent.json specification. This file includes the goal definition (goal), node list (nodes), edge connections (edges), and a default model configuration (default_model). Nodes may represent LLM calls, function executions, or routing logic. Edges support success, failure, and conditional transitions, enabling non-linear execution flows.
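
For illustration, a minimal agent spec could look something like the following (shown as a Python dict; only the top-level fields come from the description above, and the node/edge details are hypothetical):

    # Hypothetical minimal agent spec; top-level keys (goal, nodes, edges,
    # default_model) follow the description above, the rest is illustrative.
    minimal_agent_spec = {
        "goal": "Reconcile yesterday's invoices against purchase orders",
        "default_model": "gpt-4o-mini",
        "nodes": [
            {"id": "fetch", "type": "function", "handler": "load_invoices"},
            {"id": "match", "type": "llm", "prompt": "Match each invoice to a PO."},
            {"id": "report", "type": "function", "handler": "write_report"},
        ],
        "edges": [
            {"from": "fetch", "to": "match", "on": "success"},
            {"from": "match", "to": "report", "on": "success"},
            {"from": "match", "to": "fetch", "on": "failure"},  # retry path
        ],
    }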

When a node executes, a NodeContext object is injected. NodeContext provides memory (shared cross-node state), llm (a multi-provider model client), tools (a registry of available tools), input (data from the previous node), and metadata (execution tracing information). This “dependency injection” design ensures that nodes remain stateless and highly testable while enabling large-scale composability.
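
A node handler might then look roughly like this (the attribute names follow the description above; the call signatures are guesses for illustration):

    # Hypothetical node handler using the injected NodeContext.
    def summarize_invoice(ctx):
        raw = ctx.input                               # output of the previous node
        summary = ctx.llm.complete(f"Summarize this invoice:\n{raw}")
        ctx.memory["last_invoice_summary"] = summary  # shared cross-node state
        return summary                                # handed to the next node(s)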

AgentRunner serves as the execution core. Its lifecycle includes loading and validating agent.json (ensuring no cyclic dependencies), initializing MCP tool connections, establishing credential environments, and traversing the graph from the entry node. During execution, all inputs and outputs, token usage, and latency metrics are streamed in real time to a TUI dashboard for monitoring and observability.

One of Hive’s most forward-looking features is its Self-Healing mechanism. When runtime exceptions occur, a SelfHealingRunner initiates a “healing cycle.” This process includes diagnosing the failure (analyzing stack traces and source code), generating a patch (LLM-produced diff), writing updates back to the filesystem, hot-reloading modified modules, and resuming execution. Each failure is treated as a training signal, allowing the system to iteratively improve its success probability. Theoretically, as iterations increase, the probability of success P(S) converges upward.
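
Schematically, one healing cycle might look like this (simplified; the diagnose/patch helpers stand in for LLM calls, and importlib.reload is just one way to hot-reload):

    # Schematic healing cycle: diagnose -> patch -> write back -> hot-reload -> resume.
    import importlib

    def healing_cycle(module, failure, diagnose, generate_patch, resume):
        diagnosis = diagnose(failure.traceback, failure.source)  # analyze the crash
        new_source = generate_patch(diagnosis)                   # LLM-produced fix
        with open(module.__file__, "w") as f:
            f.write(new_source)                                  # write the fix to disk
        importlib.reload(module)                                 # hot-reload the module
        return resume(failure.node_id)                           # continue from the failed node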

In terms of extensibility, Hive standardizes tool invocation through MCP. Tools are registered via register_all_tools(), and each tool name is mapped to a unique runtime identifier to guarantee precise invocation. Current integrations include file processing, Slack/Gmail communication, HubSpot/Jira/Stripe systems, logging utilities, and web scraping. The tool layer remains isolated from the runtime, preventing external dependencies from contaminating execution stability.

Hive implements a layered memory system, including Short-Term Memory (STM), Long-Term Memory (LTM), and Reinforcement Learning Memory (RLM). It is transitioning to a message-based fine-grained persistence model, where each message is stored as an atomic record. Before executing an LLM node, relevant historical context can be precisely reconstructed per session. The system also supports “proactive compaction” strategies to manage token limits and extends LLMResponse to track reasoning tokens and cached tokens for accurate cost accounting.

For observability and evaluation, AgentEvaluator generates multidimensional performance metrics, including success rate, latency, cost, and composite scoring. FailureAnalyzer categorizes errors into input validation failures, logic errors, and external API failures. When error frequencies exceed defined thresholds, ImprovementTrigger automatically signals the Coding Agent to optimize prompt structures or validation logic. This establishes an automated evaluation-to-improvement feedback loop.

The development workflow is tightly integrated with Claude Code. Developers define goals, generate node graphs, apply design patterns, and auto-generate test cases through structured skill commands. Generated agents can be validated structurally via CLI commands and monitored in real time through the TUI dashboard. For learning purposes, manual_agent.py demonstrates how to construct a simple agent purely in Python without external APIs.

Overall, Aden Hive transforms Agents from “pre-scripted workflow executors” into “goal-driven, self-evolving systems.” Its core mechanisms—automatic goal-to-graph generation, graph-based execution, MCP-based decoupled tool integration, and runtime failure feedback with self-healing loops—form a cohesive architecture. This design enables Agents to progressively improve reliability and resilience in complex environments, representing a shift from manual maintenance toward autonomous system evolution in AI software engineering.

https://docs.google.com/document/d/1PyBzm2GCOswBNlKWpgJxOr8c...