joshuamoyers 3 days ago

I really like this approach. Nice job!

> We also plan to compile solved steps into micro‑policies. If you're running something like a RPA task or similar workflow as before, you can simply run the execution locally (with archon-mini running locally) and not have to worry about the planning. Over time, the planner is a background teacher, not a crutch.

Conceptually, I really like this. Why redo the work of reasoning about an already solved task? Just replay the solution. For a plausibly large majority of tasks, this could speed things up considerably.
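
To make that concrete, here is a rough sketch of how I picture the replay-first idea. None of this is from the actual project: `PolicyCache`, `planner`, and `execute` are names I made up, and a real version would need to handle partial replays and state verification much more carefully.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical action shape: (kind, target), e.g. ("click", "#submit").
Action = tuple[str, str]


@dataclass
class PolicyCache:
    """Replay stored steps for a known task; call the planner only on a miss."""

    planner: Callable[[str], list[Action]]  # assumed: slow, expensive reasoning step
    policies: dict[str, list[Action]] = field(default_factory=dict)
    planner_calls: int = 0

    def run(self, task: str, execute: Callable[[Action], bool]) -> list[Action]:
        steps = self.policies.get(task)
        # Fast path: replay the stored micro-policy without any planning.
        if steps is not None and all(execute(a) for a in steps):
            return steps

        # Miss, or the replay failed a step: fall back to the planner.
        # (A real system would also reset state after a failed replay.)
        self.planner_calls += 1
        steps = self.planner(task)
        for action in steps:
            if not execute(action):
                raise RuntimeError(f"step failed: {action}")
        self.policies[task] = steps  # compile the solved steps for next time
        return steps


def fake_planner(task: str) -> list[Action]:
    # Stand-in for the planning model; always returns the same two steps.
    return [("click", "#open"), ("type", "#name")]


cache = PolicyCache(planner=fake_planner)
first = cache.run("fill form", execute=lambda a: True)
second = cache.run("fill form", execute=lambda a: True)
```

The second `run` never touches the planner, which is the whole point: the planner only gets involved when there is no stored policy or the stored one stops working.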

> In the future we hope to run a streaming capture pipeline similar to Gemma 3. Consuming frames at 20–30 fps, emitting actions at 5–10 Hz, and verifying state on each commit.

I love targets like this. They make you tune the architecture and abstractions to push the boundary of what's possible with a traditional agent loop.

The salience heat map compression is a great idea. I think you could take this a step further and tune a model so that it compresses an image into a textual hierarchy of semantic/interactive elements. This is effectively what browser-use is doing, just using JavaScript instead of a vision model.
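
As an illustration of the kind of output I mean, here is a minimal sketch of an element hierarchy rendered as text. The `Element` type and `render` function are hypothetical, and this is not browser-use's actual format; the assumption is that some upstream model (or DOM walker) has already produced the tree.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """One node of a hypothetical screen hierarchy."""

    role: str
    label: str = ""
    interactive: bool = False
    children: list["Element"] = field(default_factory=list)


def render(root: Element) -> str:
    """Flatten the tree into an indented outline, numbering interactive nodes."""
    lines: list[str] = []
    counter = 0

    def walk(node: Element, depth: int) -> None:
        nonlocal counter
        prefix = ""
        if node.interactive:
            counter += 1
            prefix = f"[{counter}] "  # index the planner can refer to in actions
        text = f"{prefix}{node.role}"
        if node.label:
            text += f' "{node.label}"'
        lines.append("  " * depth + text)
        for child in node.children:
            walk(child, depth + 1)

    walk(root, 0)
    return "\n".join(lines)


screen = Element("form", "Login", children=[
    Element("textbox", "Email", interactive=True),
    Element("button", "Sign in", interactive=True),
])
outline = render(screen)
print(outline)
```

A planner could then say "click [2]" against a few lines of text instead of reasoning over raw pixels.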

This seems like a task that would benefit from narrow focus. I'm aware of the "Bitter Lesson," but my intuition tells me that chaining fit-for-purpose classifiers together as inputs to an intelligent planning system is the way to go.