| ▲ | whynotminot 8 hours ago | |
I would wager the main reason for this is the same reason it's also hard to teach these skills to people: there's not a lot of high quality training for distributed debugging of complex production issues. Competence comes from years of experience fighting fires. Very few people start their careers as SREs; it's generally something they migrate into after enjoying it and showing aptitude for it.

With that said, I wouldn't expect this wall to hold up for too long. There has been a lot of low-hanging fruit in teaching models how to code. When that is saturated, the frontier companies will likely turn their attention to honing training environments for SRE-style debugging.
| ▲ | tetha 3 hours ago | parent | next [-] | |
> I would wager the main reason for this is the same reason it's also hard to teach these skills to people: there's not a lot of high quality training for distributed debugging of complex production issues. Competence comes from years of experience fighting fires.

The search space for a cause beyond a certain size can also be big. Very big. Like, at work we're at the beginning of where the power law starts going nuts: somewhere around 700 - 1000 services in production, across several datacenters, with a few dozen infrastructure clusters behind it. For each bug, if you looked into it, there'd probably be 20 - 30 changes, 10 - 20 anomalies, and 5 weird things someone noticed in the 30 minutes around it. People already struggle to triage the relevance of everything in this context. That's something I can see AI start helping with, and there were some talks about Meta doing just that - ranking changes and anomalies in order of relevance to a bug ticket so people don't run after the wrong things.

That's however just the reactive part of ops and SRE work. The proactive part is much harder and oftentimes not technical. What if most negatively rated support cases run into a dark hole in a certain service, but the responsible team never allocates time to improve monitoring, because sales is on their butt for features? Maybe LLMs can identify this, or help the team implement the tracing faster, but those 10 minutes could also be spent on features for money. And what AI model told you to collect metrics about support cases and their resolution to even be able to ask that question?
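(A toy illustration of that ranking idea, since it is easy to sketch: score each change or anomaly by how close it lands to the incident window and whether it touches an affected service. All field names, weights, and service names below are invented, not anyone's actual tooling.)

    # Hypothetical relevance ranking for incident triage. Score recent changes and
    # anomalies by proximity to the incident start and by whether they touch an
    # affected service. Weights and names are made up for illustration.
    from dataclasses import dataclass
    from datetime import datetime, timedelta

    @dataclass
    class Event:
        kind: str            # "change" or "anomaly"
        service: str
        timestamp: datetime

    def relevance(event: Event, incident_start: datetime, affected: set) -> float:
        score = 0.0
        delta = (incident_start - event.timestamp).total_seconds()
        if 0 <= delta <= 1800:              # inside the 30 minutes before the incident
            score += 1.0 - delta / 1800
        if event.service in affected:       # touches an affected service
            score += 1.0
        if event.kind == "change":          # deploys/config changes weigh a bit more
            score += 0.5
        return score

    incident = datetime(2024, 5, 1, 12, 0)
    events = [
        Event("change", "checkout", incident - timedelta(minutes=5)),
        Event("anomaly", "billing", incident - timedelta(minutes=25)),
        Event("change", "search", incident - timedelta(hours=3)),
    ]
    for e in sorted(events, key=lambda e: relevance(e, incident, {"checkout"}), reverse=True):
        print(f"{relevance(e, incident, {'checkout'}):5.2f}  {e.kind:8} {e.service}")

The ranking part is mechanical; deciding which signals to collect in the first place is the hard, proactive half described above.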
| ▲ | hosh 6 hours ago | parent | prev | next [-] | |
I disagree. AI works better as a tool for teaching humans than as a replacement for doing the work itself.

While someone experienced in fighting fires can take intuitive leaps, the basic idea is still to synthesize a hypothesis from signals, validate the hypothesis, and come up with mitigations and longer-term fixes. This is a learned skill, and a team of people/AI will work better than someone solo.

https://hazelweakly.me/blog/stop-building-ai-tools-backwards...
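(A minimal sketch of that loop, with invented names: turn signals into candidate hypotheses, check each one, keep the survivors. The point is the structure, not the code.)

    # Hypothesis-driven debugging as a data structure: propose, test, record.
    from dataclasses import dataclass, field

    @dataclass
    class Hypothesis:
        statement: str                     # e.g. "connection pool exhausted on db-1"
        evidence: list = field(default_factory=list)
        status: str = "open"               # open -> confirmed / rejected

    def investigate(signals, checks):
        """Turn raw signals into hypotheses, then mark each one by its check result."""
        hypotheses = [Hypothesis(statement=s) for s in signals]
        for h in hypotheses:
            h.status = "confirmed" if checks.get(h.statement, False) else "rejected"
        return hypotheses

    for h in investigate(
        signals=["connection pool exhausted", "bad deploy of auth service"],
        checks={"connection pool exhausted": True},
    ):
        print(h.status, "-", h.statement)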
| ▲ | heliumtera 7 hours ago | parent | prev | next [-] | |
There is definitely more to why models fail to perform well at SRE. One, it is not engineering, it is next token prediction, it is vibes. They could call it Site Reliability Vibing or something like that.

When we ask a model to generate an image, any image will do. We couldn't care less. Try to sculpt it, try to rotate it 45 degrees, and all hell breaks loose. The image gets rotated, but the hair color might change as well. Pure vibes!

When you ask it to refactor your code, any pattern will do. You could rearrange the code in infinite ways, rename variables in infinite ways, without fundamentally breaking the logic. You could make as many arbitrary bullshit abstractions as you like and call it good, as people have done for years with OOP. It does not matter at all; any result will do in those cases.

When you want to hit a specific gRPC endpoint, you need a specific address, and the method expects a specific contract to be honored. This either matches or it doesn't. When you wish the LLM could implement a solution that captures specific syscalls from specific hosts and sends traces to a specific platform, using a specific protocol, consolidating records in a specific bucket... you have one state that satisfies your needs and 100 requirements that all need to be fulfilled. It either meets all the requirements or it's no good.

It truly is different from vibing, and LLMs will never be able to do this. Maybe agents will, depending on the harnesses and the systems in place, but a model on its own just generates words, words, words, with no care about anything else.
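(To make the gRPC point concrete: the call below only works if the host:port, the fully-qualified method path, and the message encoding all line up exactly. The host, service, and method names are invented, and this is a bare sketch rather than how you'd normally call gRPC - you'd use the generated stubs.)

    # "It either matches or it doesn't": a raw gRPC call against a made-up endpoint.
    import grpc

    channel = grpc.insecure_channel("tracing-gateway.internal:50051")  # must be the right host:port

    # The method path must match the server's proto definition character for character:
    # "/<package>.<Service>/<Method>". A typo anywhere and the server answers UNIMPLEMENTED.
    submit = channel.unary_unary(
        "/telemetry.v1.TraceIngest/Submit",
        request_serializer=lambda msg: msg,        # stand-in for the generated protobuf serializer
        response_deserializer=lambda raw: raw,
    )

    try:
        submit(b"\x0a\x04test", timeout=2.0)       # payload must also decode as the expected message
    except grpc.RpcError as err:
        # Any mismatch (wrong port, wrong method, wrong message) surfaces as an error,
        # never as a "close enough" result.
        print(err.code())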
| ▲ | lysace 8 hours ago | parent | prev [-] | |
> With that said, I wouldn't expect this wall to hold up for too long.

The models are already so good at the traditionally hard stuff: collecting that insane amount of detailed knowledge across so many different domains, languages and software stacks.