dcminter 8 hours ago

I've generally found that structured logs that include a correlation ID make it quite easy to narrow down the general area or exact cause of problems. Usually (in enterprise orgs) via Splunk or Datadog.

Where I've had problems it's usually been one of:

Nothing was logged in the error block. A comment saying "never happens" often turns up there later :)

Too much was logged and someone mandated dialing the logging down to save costs. Sigh.

A new thread was started and the thread-local details including the correlation ID got lost, then the error occurred downstream of that. I'd like better solutions for that one.
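
To make the thread problem concrete, here is a minimal sketch of how the ID gets lost and of one common workaround: capture the ID when the task is submitted and restore it on the worker. The `CorrelationContext` class, its `propagating` helper, and the `corr-123` value are all made up for illustration; real logging libraries keep this kind of state in something like an MDC rather than a hand-rolled holder.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class CorrelationContext {
    // Hypothetical holder for the current thread's correlation ID.
    private static final ThreadLocal<String> CORRELATION_ID = new ThreadLocal<>();

    static String get() {
        return CORRELATION_ID.get();
    }

    static void set(String id) {
        if (id == null) CORRELATION_ID.remove(); else CORRELATION_ID.set(id);
    }

    // Wraps a task so the submitter's ID is visible on whichever thread runs it.
    static <T> Callable<T> propagating(Callable<T> task) {
        String captured = get(); // read on the submitting thread
        return () -> {
            String previous = get();
            set(captured);
            try {
                return task.call();
            } finally {
                set(previous); // do not leak the ID into the pooled thread's next task
            }
        };
    }

    // Sets an ID on the calling thread, then reads it back from a worker thread.
    static String readOnWorker(boolean propagate) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        set("corr-123");
        try {
            Callable<String> read = CorrelationContext::get;
            return pool.submit(propagate ? propagating(read) : read).get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            set(null);
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("without propagation: " + readOnWorker(false)); // null
        System.out.println("with propagation: " + readOnWorker(true));     // corr-123
    }
}
```

The catch is that every hand-off point has to remember to wrap its tasks, which is exactly where this tends to break down in practice.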

Edit: Incidentally a correlation ID is not (necessarily) the same thing as a request ID. An API often needs to allow for the caller making multiple calls to achieve an objective; 5 request IDs might be tied to a single correlation ID.
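
A rough sketch of that distinction, assuming a hypothetical `forIncomingCall` helper that reuses whatever correlation ID the caller supplies (or starts a new one) and always mints a fresh request ID. The class and record names are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

public class CorrelationVsRequest {
    // One pair of IDs per incoming API call.
    record CallIds(String correlationId, String requestId) {}

    // Hypothetical policy: keep the caller's correlation ID if there is one,
    // otherwise start a new one. The request ID is always new.
    static CallIds forIncomingCall(String callerCorrelationId) {
        String correlationId = (callerCorrelationId == null || callerCorrelationId.isBlank())
                ? UUID.randomUUID().toString()
                : callerCorrelationId;
        return new CallIds(correlationId, UUID.randomUUID().toString());
    }

    public static void main(String[] args) {
        // The first call has no correlation ID yet; the next four reuse the one it got.
        List<CallIds> calls = new ArrayList<>();
        calls.add(forIncomingCall(null));
        for (int i = 0; i < 4; i++) {
            calls.add(forIncomingCall(calls.get(0).correlationId()));
        }
        // Five different request IDs, all tied to one correlation ID.
        calls.forEach(c -> System.out.println(c.correlationId() + " " + c.requestId()));
    }
}
```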

loglog 6 hours ago | parent [-]

Java has a solution for the thread problem: Scoped Values [0]. If only the logging+tracing libraries would start using it...

[0] https://openjdk.org/jeps/506
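
For anyone who hasn't seen the API, a minimal sketch of binding a correlation ID with a scoped value, assuming a JDK where `ScopedValue` is final (JDK 25, per the JEP). The `ScopedCorrelation` class and its methods are made up for the example. As far as I understand it, the binding is only inherited by child threads forked through `StructuredTaskScope`, which is a separate API and not shown here; a plain `new Thread(...)` would not see it.

```java
public class ScopedCorrelation {
    // Hypothetical key for the correlation ID; it holds no value until bound.
    static final ScopedValue<String> CORRELATION_ID = ScopedValue.newInstance();

    // Reads the ID without needing it passed down as a parameter.
    static String describe() {
        return CORRELATION_ID.isBound() ? CORRELATION_ID.get() : "unbound";
    }

    // Binds the ID for the duration of run() and reports what the callee saw.
    static String handle(String id) {
        String[] seen = new String[1];
        ScopedValue.where(CORRELATION_ID, id).run(() -> seen[0] = describe());
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(handle("corr-123")); // corr-123
        System.out.println(describe());         // unbound: the binding ended with run()
    }
}
```

Unlike a thread-local there is nothing to clean up afterwards, since the binding only exists for the lifetime of the `run` call.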

dcminter 6 hours ago | parent [-]

Oh, excellent, these slipped under my radar. Sounds extremely promising and I do mostly work in Java!