Honestly, this is a little damning of the way Google does things.

>When an app runs on a single machine, you can often trace an error by scrolling through a log file. But when it runs across 50 microservices, that single request gets scattered into a chaotic firehose of disconnected events.

Yep, this is about Google. It's painful for humans to debug, and it's also an extremely bespoke issue to deal with. No one else has quite the same level of clusterfuck, and there's going to be no training data for LLMs on this.

youknownothing 8 hours ago

Isn't that what trace IDs are for?

belval 7 hours ago

Yeah I don't know their stack but I have a service that is a collection of microservices and Opus can debug them fine by aggregating the logs tied to the same faulty request ID.
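A minimal sketch of that kind of aggregation, assuming each service writes JSON log lines that carry a shared request ID. The field names (`ts`, `service`, `request_id`, `msg`) and the sample data are hypothetical, not taken from any particular stack:

```python
import json
from typing import Iterable


def trace_request(log_lines: Iterable[str], request_id: str) -> list[dict]:
    """Collect every event for one request across services, oldest first."""
    events = []
    for line in log_lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not structured JSON
        if isinstance(event, dict) and event.get("request_id") == request_id:
            events.append(event)
    # Assumption: timestamps are ISO 8601 UTC strings in one fixed format,
    # so sorting them as plain strings gives chronological order.
    return sorted(events, key=lambda e: e.get("ts", ""))


# Hypothetical log lines from three services, deliberately out of order.
SAMPLE_LOGS = [
    '{"ts": "2024-01-01T00:00:02Z", "service": "billing", "request_id": "req-1", "msg": "card declined"}',
    '{"ts": "2024-01-01T00:00:00Z", "service": "gateway", "request_id": "req-1", "msg": "POST /checkout"}',
    '{"ts": "2024-01-01T00:00:01Z", "service": "orders", "request_id": "req-2", "msg": "order created"}',
    'plain text line that is not JSON',
    '{"ts": "2024-01-01T00:00:01Z", "service": "orders", "request_id": "req-1", "msg": "order created"}',
]

if __name__ == "__main__":
    for event in trace_request(SAMPLE_LOGS, "req-1"):
        print(event["ts"], event["service"], event["msg"])
```

The point is only that once every service logs the same ID, rebuilding the timeline for one failing request is a filter and a sort, whether a human or an agent does it.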

In general for those tasks though the question is more "How would a human do it". If it's impossible for a human because your tooling is so bad you can't even get the logs across services for a single ID, that seems like a pretty serious design issue.

In general looking at the prompt though, this is also not very representative. You don't have an SOP that you can share with your agent? How do you expect new hires to onboard?

pixl97 6 hours ago

>How do you expect new hires to onboard?

I've seen some places that pretty much say "Good luck, we hope you can swim. Life preserver not provided"

pixl97 7 hours ago

Much like nested errors, managing trace IDs becomes difficult at scale, as you start getting multiple correlation references in complex systems.

tayo42 8 hours ago

It's bespoke to debug across multiple services?

This seems like typical work in any business that isn't trivial.

AnotherGoodName 7 hours ago

Not to the same extent. Microservices aren't actually about making things better for developers in any way. It's simply a way to address a scaling issue.

E.g. Facebook (I've worked at Meta and Google amongst others, so it's a good way to compare extremes) is entirely a monolith. You type a line of code, hit refresh and you see it, running fully in the context of everything else your dev server does. It's still statically typed, so a type error is seen quickly in the full context of everything that the server can do, and in general there's just no impetus to move to microservices since the deployment of the monolith takes no time. Every server running Facebook runs the exact same image.

That's not to say Hack is a perfect language or anything. It's basically PHP made to look and act like Java, which isn't great, but the fact is you never ever think of how the code runs and interacts in the context of a microservice environment. You don't need to. Everyone who's worked at Meta and Google has the opinion that Meta moves faster, and this is part of the reason.

Some companies have architectures that can't deploy like this. This is the reason you move to microservices. It's not at all a developer velocity win. It's just needed if you have frameworks that don't allow you to run and deploy "all the code ever written in the company" in a reasonable way. You need to break it up into modular pieces that have defined boundaries so that you only run the parts you need as you develop (defined boundaries are a dev win, sure, but that can be done without microservices).

Google has gotten to the point where things are getting really fine-grained and honestly chaotic. Moving a portion of code to its own microservice is basically a promo-bait 6-month project, often done without justification other than "everything should be its own microservice". In my time at Google I never heard "what benefit do we get if this is a microservice?" It's just assumed to always be a good thing.

50 interacting microservices to go through in a trace is at the point where the only place I've seen such a thing is Google.