andai 11 hours ago

Yeah, came here to mention that too!

From the article:

>To manage all this, I built laconic, an agentic researcher specifically optimized for running in a constrained 8K context window. It manages the LLM context like an operating system's virtual memory manager—it "pages out" the irrelevant baggage of a conversation, keeping only the absolute most critical facts in the active LLM context window.

The 8K part is the most startling to me. Is that still a thing? I worked under that constraint in 2023, in the early GPT-4 days. I believe Ollama still has the default context window set to 8K for some reason, but the model mentioned in the laconic GitHub repo (Qwen3:4B) should support 32K. (Still pretty small, but... ;)

I'll have to take a proper look at the architecture; extreme context engineering is a special interest of mine :) Back when Auto-GPT was a thing (think OpenClaw, but in 2023), I realized that what most people were using it for was just internet research, and that you could get better results (cheaper, faster, and deterministically) by just writing a 30-line Python script.

Google search (or DDG) -> Scrape top N results -> Shove into LLM for summarization (with optional user query) -> Meta-summary.
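A minimal sketch of that pipeline, with the search, fetch, and LLM callables as stand-ins rather than any specific API (a real script would use a proper search client and an HTML parser like BeautifulSoup):

```python
import re
from typing import Callable, List

def strip_html(html: str) -> str:
    """Crude tag stripper; good enough for a summarization pipeline."""
    text = re.sub(r"<script.*?</script>|<style.*?</style>", " ", html, flags=re.S)
    text = re.sub(r"<[^>]+>", " ", text)
    return re.sub(r"\s+", " ", text).strip()

def research(query: str,
             search: Callable[[str], List[str]],
             fetch: Callable[[str], str],
             llm: Callable[[str], str],
             n: int = 5) -> str:
    """Deterministic pipeline: search -> scrape top N -> per-page
    summary -> meta-summary. The control flow is fixed in code;
    the LLM never drives."""
    urls = search(query)[:n]
    summaries = []
    for url in urls:
        page = strip_html(fetch(url))[:8000]  # respect a small context window
        summaries.append(llm(f"Summarize w.r.t. '{query}':\n{page}"))
    return llm("Combine into one answer:\n" + "\n---\n".join(summaries))
```

The point of the structure is that escalation, ordering, and termination are all decided by plain code, so the run is reproducible; only the summarization itself is delegated to the model.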

In such straightforward, specialized scenarios, letting the LLM drive was, and still is, "swatting a fly with a plasma cannon."

(The analog these days would be that many people would be better off asking Claw to write a scraper for them, than having it drive Chromium 24/7...)

jstanley 9 hours ago | parent [-]

> (The analog these days would be that many people would be better off asking Claw to write a scraper for them, than having it drive Chromium 24/7...)

Possibly. But possibly you have a very long tail of sites that you hardly ever look at, and that change more frequently than you use them, and maintaining the scraper is harder work than just using Chromium.

The dream is that the Claw would judge for itself whether to write a scraper or hand-drive the browser.

That might happen more easily if LLMs were a bit lazier. If they didn't like doing drudgery they would be motivated to automate it away. Unfortunately they are much too willing to do long, boring, repetitive tasks.

andai 9 hours ago | parent [-]

Yeah, I think the ideal setup is two-tier.

extremely lazy, large model
            +
extremely diligent Ralph

Not sure if the top model should be the biggest one, though. I hear opposing opinions there: a small model that delegates coding to bigger models, vs. a big model that delegates coding to small models.

The issue is you don't want the main driver to be big, but it needs to be big enough to have common sense w.r.t. delegating both up[0] and down...

[0] i.e. "too hard for me, I'll ping Opus ..." :) Do models have that level of self-awareness? I wanna say it can come after a failed attempt, but my usual failure mode is that the model "succeeds" while the solution is total ass.
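A hedged sketch of that escalation idea. Everything here is hypothetical (tier names, the `call` and `verify` hooks); the design point is that an independent check, not the model's own claim of success, is what triggers the ping upward:

```python
from typing import Callable, Sequence

def solve(task: str,
          call: Callable[[str, str], str],
          verify: Callable[[str], bool],
          tiers: Sequence[str] = ("small-driver", "big-model")) -> str:
    """Try the cheap tier first; escalate up only after a verified failure.
    verify() guards against the failure mode where the model claims
    success but the solution is actually bad."""
    for model in tiers:
        answer = call(model, task)
        if verify(answer):
            return answer
    return answer  # best effort from the largest tier
```

Usage is just `solve("fix the bug", call, verify)` with whatever model-calling and checking functions you have; the cheap tier handles everything that passes verification, and only verified failures cost you a big-model call.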

drewstiff 8 hours ago | parent [-]

Re: your footnote, Anthropic certainly seem to think so [0]

[0] https://claude.com/blog/the-advisor-strategy