| ▲ | tptacek 3 days ago |
| This is a confusing piece. A lot of it would make sense if Weakly was talking about a coding agent (a particular flavor of agent that worked more like how antirez just said he prefers coding with AI in 2025 --- more manual, more advisory, less do-ing). But she's not: she's talking about agents that assist in investigating and resolving operations incidents. The fulcrum of Weakly's argument is that agents should stay in their lane, offering helpful Clippy-like suggestions and letting humans drive. But what exactly is the value in having humans grovel through logs to isolate anomalies and create hypotheses for incidents? AI tools are fundamentally better at this task than humans are, for the same reason that computers are better at playing chess. What Weakly seems to be doing is laying out a bright line between advising engineers and actually performing actions --- any kind of action, other than suggestions (and only those suggestions the human driver would want, and wouldn't prefer to learn and upskill on their own). That's not the right line. There are actions AI tools shouldn't perform autonomously (I certainly wouldn't let one run a Terraform apply), but there are plenty of actions where it doesn't make sense to stop them. The purpose of incident resolution is to resolve incidents. |
|
| ▲ | cmiles74 3 days ago | parent | next [-] |
| There's no AI tool today that will resolve incidents to anyone's satisfaction. People need to be in the loop not only to take responsibility but to make sure the right actions are performed. |
| |
| ▲ | kookamamie 2 days ago | parent | next [-] | | Exactly. There seems to be this fantasy in which you can somehow string different kinds of agents together, one designing and one reviewing, and have that finally produce something superior as output - I just don't buy that. It sounds like heuristics added on top of statistics, trying to remedy some root problem with another hack. | | |
| ▲ | rusticpenn 2 days ago | parent | next [-] | | The whole field of metaheuristic algorithms rests on a similar idea: a lot of stupid "agents" finding a good solution. By metaheuristics I mean genetic algorithms, PSO, ACO, etc. | |
| ▲ | yunohn a day ago | parent | prev [-] | | Hmm, but this provably works right now? All LLMs perform better with roleplay direction and focused scope. Using coding agents with a plan-then-execute workflow yields noticeable quality improvements. | | |
| ▲ | kookamamie 21 hours ago | parent [-] | | Why isn't this the de facto standard then? Is anyone packaging such commercial solutions? | | |
| ▲ | yunohn 21 hours ago | parent [-] | | Most agent solutions already have modes or roles. There’s no standard, but this is already being used IRL. Heck, even system prompts are role play too. |
|
|
| |
| ▲ | tptacek 3 days ago | parent | prev [-] | | Nobody disputes this. Weakly posits a bright line between agents suggesting active steps and agents actually performing active steps. The problem is that during incident investigations, some active steps make a lot of sense for agents to perform, and others don't; the line isn't where she seems to claim it is. | | |
| ▲ | cmiles74 3 days ago | parent [-] | | Understood. To your example about the logs, my concern would be that the AI chooses the wrong thing to focus on and people decide there’s nothing of interest in the logs, thus overlooking a vital clue. | |
| ▲ | tptacek 3 days ago | parent [-] | | You wouldn't anticipate using AI tools to one-shot complex incidents, just to rapidly surface competing hypotheses. |
|
|
|
|
| ▲ | miltonlost 3 days ago | parent | prev | next [-] |
| It's not a confusing piece if you don't skip/ignore the first part. You're focusing on her one example and leaving out the portion about how human beings learn and how AI is actively removing that process. The incident resolution is an example of her general point. |
| |
| ▲ | tptacek 3 days ago | parent [-] | | I feel pretty comfortable with how my comment captures the context of the whole piece, which of course I did read. Again: what's weird about this is that the first part would be pretty coherent and defensible if applied to coding agents (some people will want to work the way she spells out, especially earlier in their career, some people won't), but doesn't make as much sense for the example she uses for the remaining 2/3rds of the piece. | | |
| ▲ | JoshTriplett 3 days ago | parent | next [-] | | It makes perfect sense for that case too. If you let AI do the whole job of incident handling (and leaving aside the problem where they'll get it horribly wrong), that also has the same problem of breaking the processes by which people learn. (You could make the classic "calculator" vs "long division" argument here, but one difference is, calculators are reliable.) Also: > some people will want to work the way she spells out, especially earlier in their career If you're going to be insulting by implying that only newbies should be cautious about AI preventing them from learning, be explicit about it. | | |
| ▲ | tptacek 3 days ago | parent | next [-] | | You can simply disagree with me and we can hash it out. The "early career" thing is something Weakly herself has called out. I disagree with you that incident responders learn best by e.g. groveling through OpenSearch clusters themselves. In fact, I think the opposite thing is true: LLM agents do interesting things that humans don't think to do, and also can put more hypotheses on the table for incident responders to consider, faster, rather than the ordinary process of rabbitholing serially down individual hypotheses, 20-30 minutes at a time, never seeing the forest for the trees. I think the same thing is probably true of things like "dumping complicated iproute2 routing table configurations" or "inspecting current DNS state". I know it to be the case for LVM2 debugging†! Note that these are all active investigation steps that involve the LLM agent actually doing stuff, but none of it is plausibly destructive. | | |
| ▲ | JoshTriplett 3 days ago | parent [-] | | The only mention I see of early-career coming up in the article is "matches how I would teach an early career engineer the process of managing an incident". That isn't a claim that only early career engineers learn this way or benefit from working in this style. Your comment implied that the primary people who might want to work in the way proposed in this article are those early in their career. I would, indeed, disagree with that. Consider, by way of example, the classic problem of teaching someone to find information. If someone asks "how do I X" and you answer "by doing Y", they have learned one thing (and will hopefully retain it). If someone asks "how do I X" and you answer "here's the search I did to find the answer of Y", they have now learned two things, and one of them reinforces a critical skill they should be using throughout their career. I am not suggesting that incident response should be done entirely by hand, or that there's zero place for AI. AI is somewhat good at, for instance, looking at a huge amount of information at once and pointing towards things that might warrant a closer look. I'm nonetheless agreeing with the point that the human should be in the loop to a large degree. That also partly addresses the fundamental security problems of letting AI run commands in production, though in practice I do think it likely that people will run commands presented to them without careful checking. > none of it is plausibly destructive In theory, you could have a safelist of ways to gather information non-destructively. In practice, it would not surprise me at all if people don't. I think it's very likely that many people will deploy AI tools in production and not solve any of the security issues, and incidents will result. I am all for the concept of having a giant dashboard that collects and presents any non-destructive information rapidly. That tool is useful for a human, too. (Along with presenting the commands that were used to obtain that information.) | |
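To make the "safelist of ways to gather information non-destructively" idea concrete, here is a minimal sketch; it is not something the article or either commenter proposes. The command names are real read-only diagnostics, while the wrapper, its allowlist, and the function name are hypothetical:

```python
import shlex
import subprocess

# Hypothetical allowlist of read-only diagnostic command prefixes an agent may run.
# Anything that doesn't match one of these prefixes is refused and left to a human.
READ_ONLY_PREFIXES = [
    ["dmsetup", "table"],
    ["dmsetup", "status"],
    ["lvs"],
    ["ip", "route", "show"],
    ["nft", "list", "ruleset"],
    ["journalctl", "--no-pager"],
]

def run_if_safe(command: str, timeout: int = 30) -> str:
    """Run a diagnostic command only if it matches the read-only allowlist."""
    argv = shlex.split(command)
    if not any(argv[: len(prefix)] == prefix for prefix in READ_ONLY_PREFIXES):
        raise PermissionError(f"refusing non-allowlisted command: {command!r}")
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    return proc.stdout + proc.stderr
```

Even a sketch like this is leaky: flags appended after an allowed prefix can still mutate state (journalctl's vacuum options, for instance), which is exactly the concern above about people deploying these tools without actually solving the security issues.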
| ▲ | tptacek 3 days ago | parent [-] | | Previous writing, Josh, and I'm done now litigating whether I wrote the "early career" thing in bad faith and expect you to be too. I don't see you materially disagreeing with me about anything. I read Weakly to be saying that AI incident response tools --- the main focus of her piece --- should operate with hands tied behind their back, delegating nondestructive active investigation steps back to human hands in order to create opportunities for learning. I think that's a bad line to draw. In fact, I think it's unlikely to help people learn --- seeing the results of investigative steps all lined up next to each other and synthesized is a powerful way to learn those techniques for yourself. | | |
| ▲ | jpc0 3 days ago | parent [-] | | I’m going to butt in here. I think the point the article is making is to observe the patterns humans (hopefully good ones) follow to resolve issues and build paths to make that quicker. So at first the AI does almost nothing; it observes that in general the human will search for specific logs. If it observes that behaviour enough, it then, on its own or through a ticket, builds a UI flow that enables that behaviour. So now it doesn’t search the log but offers a button to search the log with some prefilled params. The human likely wanted to perform that action, and it has now become easier. This reinforces good behaviour if you don’t know the steps usually followed, and doesn’t pigeonhole someone into an action plan if it is unrelated. Is this much, much harder than just building an agent that does X? Yes. But it’s a significantly better tool because it doesn’t cause humans to lose the ability to reason about the process. It just makes them more efficient. | |
| ▲ | tptacek 3 days ago | parent [-] | | We're just disagreeing and hashing this out, but, no, I don't think that's accurate. AI tools don't watch what human operators in a specific infrastructure do and then try to replicate them. They do things autonomously based on their own voluminous training information, and those things include lots of steps that humans are unlikely to take, and that are useful. One intuitive way to think about this is that any human operator is prepared to bring a subset of investigative approaches to bear on a problem; they've had exposure to a tiny subset of all problems. Meanwhile, agents have exposure to a vast corpus of diagnostic case studies. Further, agents can quickly operate on higher-order information: a human attempting to run down an anomaly first has to think about where to look for the anomaly, and then decide to do further investigation based on it. An AI agent can issue tool calls in parallel and quickly digest lots of information, spotting anomalies without any real intentionality or deliberation, which then get fed back into context where they're reasoned about naturally as if they were axioms available at the beginning of the incident. As a simple example: you've got a corrupted DeviceMapper volume somewhere, you're on the host with it, all you know is you're seeing dmesg errors about it; you just dump a bunch of lvs/dmsetup output into a chat window. 5-10 seconds later the LLM is cross referencing lines and noticing block sizes aren't matching up. It just automatically (though lossily) spots stuff like this, in ways humans can't. It's important to keep perspective: the value add here is that AI tools can quickly, by taking active diagnostic steps, surface several hypotheses about the cause of an incident. I'm not claiming they one-shot incidents, or that their hypotheses all tend to be good. Rather, it's just that if you're a skilled operator, having a menu of instantly generated hypotheses to start from, diligently documented, is well worth whatever the token cost is to generate it. | | |
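As a rough illustration of the "dump a bunch of lvs/dmsetup output into a chat window" workflow described here, a minimal sketch under the same assumption made in the footnote above, i.e. the operator still pastes the bundle into an LLM by hand. The commands are standard LVM2/device-mapper and kernel-log tools; the script and its names are hypothetical:

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Read-only LVM2 / device-mapper diagnostics, gathered in parallel so the
# whole bundle is ready to paste at once.
COMMANDS = {
    "lvs": ["lvs", "-a", "-o", "+devices", "--units", "s"],
    "dmsetup table": ["dmsetup", "table"],
    "dmsetup status": ["dmsetup", "status"],
    "dmesg": ["dmesg", "--ctime"],
}

def capture(argv):
    # Capture stdout (or stderr if the command failed) without changing any state.
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=30)
    return proc.stdout or proc.stderr

def collect_bundle() -> str:
    """Run each diagnostic concurrently and label its output for the chat window."""
    with ThreadPoolExecutor() as pool:
        outputs = list(pool.map(capture, COMMANDS.values()))
    return "\n\n".join(f"### {name}\n{out}" for name, out in zip(COMMANDS, outputs))

if __name__ == "__main__":
    print(collect_bundle())
```

The point of gathering everything at once is the one made in the comment above: the model gets every table, status line, and kernel message in a single context and can cross-reference them (mismatched block counts between the lvs and dmsetup table output, say) without anyone deciding in advance where to look.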
|
|
|
| |
| ▲ | 3 days ago | parent | prev [-] | | [deleted] |
| |
| ▲ | mattmanser 3 days ago | parent | prev | next [-] | | I know you carry on to have a good argument down thread, but why do you feel the first part is defensible? The author's saying great products don't come from solo devs. Linux? Dropbox? Gmail? Ruby on Rails? Python?
The list is literally endless. But the author then claims that all great products come from committee? I've seen plenty of products die by committee. I've never seen one made by it. Their initial argument is seriously flawed, and not at all defensible. It doesn't match reality. | | |
| ▲ | tptacek 3 days ago | parent | next [-] | | I just don't want to engage with it; I'm willing to stipulate those points. I'm really fixated on the strange example Weakly used to demonstrate why these tools shouldn't actually do things, but instead just whisper in the ears of humans. Like, you can actually make that argument about coding! I don't agree, but I see how the argument goes. I don't see how it makes any sense at all for incident response. | |
| ▲ | jakelazaroff 2 days ago | parent | prev [-] | | I know the "what you refer to as Linux is, in fact, GNU/Linux" thing has become a sort of tongue-in-cheek meme, but it actually applies here: crediting Linus Torvalds alone for the success of Linux ignores crucial contributions from RMS, Ken Thompson, Dennis Ritchie and probably dozens or hundreds of others. Ruby on Rails? Are we talking about the Ruby part (Matz) or the Rails part (DHH)? Dropbox was founded by Drew Houston and Arash Ferdowsi. The initial Gmail development team had multiple people plus the infrastructure and resources of Google. I'm not sure why people love the lone genius story so much, but it's definitely the exception and not the rule. |
| |
| ▲ | ofjcihen 3 days ago | parent | prev [-] | | [flagged] | | |
| ▲ | bubblyworld 2 days ago | parent | next [-] | | Let's not do this kind of thing here? There's plenty to engage with in their comments without resorting to ad-hominems or similar. (your comment is pretty mild, I'm just worried about the general trend on HN) | | |
| ▲ | jgon 2 days ago | parent [-] | | It's not an ad-hominem. When people are talking their book, you should know that they're talking their book, and that knowledge doesn't have to negate any sound points they're making or cause you to disregard everything they're saying; it just colors your evaluation of their arguments, as it should. I don't think this is controversial, and seeing that comment flagged is pretty disheartening; adding context is almost never a bad thing. | | |
| ▲ | bubblyworld 2 days ago | parent | next [-] | | It is quite literally an ad-hominem, in that it is aimed at the person, not the argument. The issue isn't that more context is bad (I agree with you, it's useful), it's that as a policy for a discussion board I think allowing this kind of thing is a bad idea. People can be mistaken, or lie, and comments get ugly fast when it's personal. Not to mention the fine line between this and doxxing. (e.g. here, the OP has claimed that they do not in fact have a vested interest in AI - so was this "context" really a good thing?) | |
| ▲ | ofjcihen 2 days ago | parent | prev | next [-] | | I appreciate this response and I’m also as confused as you are. It’s information relevant to the conversation, not an accusation (it would be an odd accusation to make, no?) | |
| ▲ | tptacek 2 days ago | parent | prev [-] | | I don't care, in part because the claim is false, but there's literally a guideline saying you can't do this, so I guess it's worth knowing that you're wrong too. Please don't post insinuations about astroturfing, shilling, brigading, foreign agents, and the like. It degrades discussion and is usually mistaken. If you're worried about abuse, email hn@ycombinator.com and we'll look at the data. | | |
| ▲ | ofjcihen 2 days ago | parent [-] | | In this case it’s relevant to the discussion as the user was questioning why you were making the points you were. It’s not an accusation of shilling, it’s context where context was requested. As a test imagine if you changed the context to something good such as “AI achieves the unthinkable” and the responding user asked why someone was so optimistic about the achievement. It’s relevant context to the conversation, nothing else. | | |
| ▲ | tptacek 2 days ago | parent [-] | | It's false context meant to impeach my arguments. Not a close call. | | |
|
|
|
| |
| ▲ | 2 days ago | parent | prev [-] | | [deleted] |
|
|
|
|
| ▲ | TimPC 2 days ago | parent | prev | next [-] |
| I think the problem is whether you want to give the AI access to prod. See the recent example where an AI wiped a DB despite instructions not to (AI sometimes does something more often when you tell it not to, because the negation is not always reliably picked up). |
|
| ▲ | bubblyworld 2 days ago | parent | prev | next [-] |
| I know you've got a subthread about this exact idea, but I do think there is some value in manually performing the debugging process if (and perhaps only if) your goal is to improve your overall programming ability. I guess the chess analogy would be that it makes a lot of sense to analyse positions yourself, even though Leela and Stockfish can do a far more thorough job in much less time. Of course, if you just need to know the best move right now, you would use the AI, and professionals do that all the time. But as a decently strong chess player I cannot imagine improving without doing this kind of manual practice (at least beyond a basic level of skill like knowing how pieces move). Grandmasters routinely drill tactics exercises, for instance, even though they are "mundane" at that level of ability. I guess the crux of it - do you think AI+person learns faster than just person for this kind of thing? And why? It's not obvious to me either way (and another question is whether the skill is even relevant any more... I think so, but I know people who don't). |
| |
| ▲ | kasey_junk 2 days ago | parent [-] | | But you can do that _after_ the incident. When things are not on fire. You don’t run analysis of your chess game when the clock is ticking. | | |
| ▲ | bubblyworld 2 days ago | parent [-] | | Sure, if something is super critical then you should solve the problem as fast as possible. I'm not debating that. But there's probably a middle ground there somewhere for less critical issues. I suspect the process of generating and falsifying hypotheses quickly is the skill, and I don't know if you can effectively train that skill after an incident, when you've already seen the resolution. Chess is maybe not a great analogy, because there are rarely objectively correct answers, only hard trade-offs. For that reason there's still a lot of value in reviewing a finished game. |
|
|
|
| ▲ | otterley 2 days ago | parent | prev | next [-] |
| > There are actions AI tools shouldn't perform autonomously (I certainly wouldn't let one run a Terraform apply), but there are plenty of actions where it doesn't make sense to stop them. I'm curious as to where you would draw the line. Assuming you've adhered to DevOps best practices, most--if not all--changes would require some sort of code commit and promotion through successive environments to reach production. This isn't just application code, of course; it's also your infrastructure. In such a situation, what would you permit an agent to autonomously perform in the course of incident resolution? |
| |
| ▲ | tptacek a day ago | parent [-] | | During incident resolution, most of the actions an operator takes are diagnostic commands, not changes. | | |
| ▲ | otterley 20 hours ago | parent [-] | | The number one cause of incidents is change, and the number one response to them is to initiate a rollback. Maybe you’re right about investigation, which requires no changes, but resolution requires action, which does. In any event, you said: > What Weakly seems to be doing is laying out a bright line between advising engineers and actually performing actions --- any kind of action, other than suggestions (and only those suggestions the human driver would want, and wouldn't prefer to learn and upskill on their own). That's not the right line. So what’s your quibble exactly? Those suggestions would come from autonomous analyses, would they not? What is the right line, in your view? | | |
| ▲ | tptacek 19 hours ago | parent [-] | | I would not in 2025 during an incident response have an agent do speculative changes, or really any changes at all. I would have an agent perform diagnostic steps: dumping devicemapper tables, iproute2 configurations, nftables rules, BGP advertisements, Consul data, and, especially, logs and oTel telemetry. Weakly's article is in large part about not allowing agents to do the things in the second category there. |
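For concreteness, one way to read that split between the two categories, as a minimal sketch rather than anything claimed in the thread; the tools named are real, but which BGP daemon, service catalog, and log pipeline are actually in play is an assumption:

```python
# Example read-only command per diagnostic area named above; assumes BIRD for
# BGP, Consul for service data, and journald for logs.
DIAGNOSTIC_ONLY = {
    "devicemapper": ["dmsetup", "table"],
    "iproute2": ["ip", "route", "show", "table", "all"],
    "nftables": ["nft", "list", "ruleset"],
    "bgp": ["birdc", "show", "route"],
    "consul": ["consul", "members"],
    "logs": ["journalctl", "--no-pager", "--since", "1 hour ago"],
}

# The other category is anything that mutates state, which stays with a human:
# terraform apply, nft add rule, consul kv put, dmsetup remove, and so on.
```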
|
|
|
|
| ▲ | shrumm a day ago | parent | prev | next [-] |
| > But what exactly is the value in having humans grovel through logs to isolate anomalies and create hypotheses for incidents? Agreed! I think about this using Weakly's own reference to "standing on the shoulders of giants." To me, building abstractions to handle tedious work is how we do that. We moved from assembly to compilers, and from manual memory management to garbage collectors. That wasn't "deskilling" - it just freed us up to solve more interesting problems at a higher level. Manually crawling through logs feels like the next thing we should happily give up. It's painful, and I don't know many engineers who enjoy it. Disclaimer: I'm very biased - working on an agent for this exact use case. |
|
| ▲ | phillipcarter 3 days ago | parent | prev | next [-] |
| Lost a bit in the discourse around anomaly detection and incident management is that not all problems are equal. Many of them actually are automatable to some extent. I think the issue is understanding when something is sufficiently offloadable to some cognitive processor vs. when you really do need a human engineer involved. To your point, yes, they are better at detecting patterns at scale … until they’re not. The same goes for knowing whether a pattern is meaningful. Of course not all humans can fill these gaps either. |
|
| ▲ | hiAndrewQuinn 2 days ago | parent | prev [-] |
| >The fulcrum of Weakly's argument is that agents should stay in their lane, offering helpful Clippy-like suggestions and letting humans drive. But what exactly is the value in having humans grovel through logs to isolate anomalies and create hypotheses for incidents? See also: Tool AIs Want To Be Agent AIs. https://gwern.net/tool-ai Predicted almost a decade ago. |