| ▲ | ogig 12 hours ago |
| When running long autonomous tasks it's quite common to fill the context, sometimes several times. You're out of the loop, so it just happens if Claude goes a bit in circles, needs to iterate on CI reds, or the task was too complex. I'm hoping a long context > small context + 2 compacts. |
|
| ▲ | SequoiaHope 12 hours ago | parent | next [-] |
| Yep I have an autonomous task where it has been running for 8 hours now and counting. It compacts context all the time. I’m pretty skeptical of the quality in long sessions like this, so I have to run a follow-on session to critically examine everything that was done. Long context will be great for this. |
| |
| ▲ | lukan 4 hours ago | parent [-] | | Are those long unsupervised sessions useful? In the sense, do they produce useful code or do you throw most of it away? | | |
| ▲ | brookst an hour ago | parent [-] | | I get very useful code from long sessions. It’s all about having a framework: clear documentation, a clear multi-step plan including validation against docs and critical code reviews, acceptance criteria, and closed-loop debugging (it can launch/restart the app, control it, and monitor logs). I am heavily involved in developing those, and then routinely let Opus run overnight and have a flawless or nearly flawless product in the morning. |
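The closed-loop idea above (launch the app, watch its logs, feed failures back) can be sketched in a few lines. This is a self-contained illustration, not the commenter's setup: a stub subprocess stands in for the real app, and all names are made up.

```python
# Sketch of the closed-loop debugging cycle: launch the app, capture
# its logs, and scan them for failures so the agent can iterate.
# A stub subprocess stands in for the real app (illustrative only).
import subprocess
import sys

# Stub "app": emits a startup line and an error, as a broken run might.
run = subprocess.run(
    [sys.executable, "-c", "print('app started'); print('ERROR: widget failed')"],
    capture_output=True, text=True, check=True,
)

# Monitoring half of the loop: inspect the log and decide the next step.
status = (
    "errors found; agent iterates on a fix"
    if "ERROR" in run.stdout
    else "clean run"
)
print(status)
```

In a real agent loop, the grep-for-errors step is what closes the loop: the agent reruns the app after each patch until the log comes up clean.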
|
|
|
| ▲ | MikeNotThePope 12 hours ago | parent | prev | next [-] |
| I haven't figured out how to make use of tasks running that long yet, or maybe I just don't have a good use case for it yet. Or maybe I'm too cheap to pay for that many API calls. |
| |
| ▲ | ashdksnndck 12 hours ago | parent | next [-] | | My change cuts across multiple systems with many tests/static analysis/AI code reviews happening in CI. The agent keeps pushing new versions and waits for results until all of them come up clean, taking several iterations. | |
| ▲ | tudelo 12 hours ago | parent | prev [-] | | I mean if you don't have your company paying for it I wouldn't bother... We are talking sessions of 500-1000 dollars in cost. | | |
| ▲ | takwatanabe 2 hours ago | parent [-] | | Right. At Opus 4.6 rates, once you're at 700k context, each tool call costs ~$1 just for cache reads alone. 100 tool calls = $100+ before you even count outputs. 'Standard pricing' is doing a lot of work here lol | | |
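The arithmetic behind the ~$1-per-tool-call claim is easy to check. The $1.50 per million tokens cache-read rate below is an assumption based on published Opus-tier pricing; substitute the current rate for your model.

```python
# Back-of-envelope check of the per-tool-call figure in the comment.
CACHE_READ_PER_MTOK = 1.50    # dollars per million cached input tokens (assumed rate)
CONTEXT_TOKENS = 700_000      # context size cited in the comment

# Every tool call re-reads the whole cached context once.
cost_per_call = CONTEXT_TOKENS / 1_000_000 * CACHE_READ_PER_MTOK
print(f"~${cost_per_call:.2f} per tool call in cache reads")
print(f"~${cost_per_call * 100:.0f} for 100 tool calls, before outputs")
```

At that rate, 100 tool calls at a 700k-token context is roughly $105 in cache reads alone, consistent with the "$100+ before outputs" figure above.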
|
|
|
| ▲ | boredtofears 12 hours ago | parent | prev [-] |
| All of those things are smells imo, you should be very weary of any code output from a task that causes that much thrashing to occur. In most cases it’s better to rewind or reset and adapt your prompt to avoid the looping (which usually means a more narrowly defined scope) |
| |
| ▲ | grafmax 12 hours ago | parent | next [-] | | A person has a supervision budget. They can supervise one agent in a hands-on way or many mostly-hands-off agents. Even though there's some thrashing, assistants still get farther as a team than a single micromanaged agent. At least that’s my experience. | | |
| ▲ | not_kurt_godel 11 hours ago | parent [-] | | Just curious, what kind of work are you doing where agentic workflows are consistently able to make notable progress semi-autonomously in parallel? Hearing people are doing this, supposedly productively/successfully, kind of blows my mind given my near-daily in-depth LLM usage on complex codebases spanning the full stack from backend to frontend. It's rare for me to have a conversation where the LLM (usually Opus 4.6 these days) lasts 30 minutes without losing the plot. And when it does last that long, I usually become the bottleneck in terms of having to think about design/product/engineering decisions; having more agents wouldn't be helpful even if they all functioned perfectly. | | |
| ▲ | avereveard 10 hours ago | parent [-] | | I've passed that bottleneck with a review task that produces engineering recommendations along six axes (encapsulation, decoupling, simplification, deduplication, security, reducing documentation drift) and an ideation task that gives, per component, a new feature idea, an idea to improve an existing feature, and an idea to expand a feature to be more useful. These two generate constant bulk work that I move into new chats, where it's grouped by changeset and sent to sub-agents to protect the context window. What I'm doing mostly these days is maintaining a goal.md (project direction) and spec.md (coding and process standards, global across projects). And new macro-task development: I have one in the works that is meant to automatically build PNG mockups and self-review. | | |
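The per-component review task above could be assembled into a reusable prompt along those lines. This is a hypothetical sketch: the axis list is paraphrased from the comment, and the function and component names are invented for illustration.

```python
# Hypothetical sketch of turning the six-axis review task into one
# bulk-review prompt per component. Axis list paraphrased from the
# comment; build_review_prompt and "auth-service" are made up.
AXES = [
    "encapsulation",
    "decoupling",
    "simplification",
    "deduplication",
    "security",
    "documentation drift",
]

def build_review_prompt(component: str) -> str:
    """One bulk-review prompt asking for a recommendation per axis."""
    lines = [f"Review component '{component}'; give one recommendation per axis:"]
    lines += [f"- {axis}" for axis in AXES]
    return "\n".join(lines)

print(build_review_prompt("auth-service"))
```

Each generated prompt can then be dispatched to its own sub-agent chat, which is what keeps the main context window from filling with bulk review output.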
| ▲ | not_kurt_godel 10 hours ago | parent [-] | | What are you using to orchestrate/apply changes? Claude CLI? | | |
| ▲ | avereveard 8 hours ago | parent [-] | | I prefer in IDE tools because I can review changes and pull in context faster. At home I use roo code, at work kiro. Tbh as long as it has task delegation I'm happy with it. |
|
|
|
| |
| ▲ | chrisweekly 12 hours ago | parent | prev | next [-] | | weary (tired) -> wary (cautious) | |
| ▲ | saaaaaam 12 hours ago | parent | prev [-] | | Wary, not weary. Wary: cautious. Weary: tired. | | |
| ▲ | dentalnanobot 6 hours ago | parent [-] | | This is really common, I think because there’s also “leery” - cautious, distrustful, suspicious. |
|
|