Remix.run Logo
hansvm 4 hours ago

If that's an issue (visibility into middle layers) it just means your events aren't wide enough. There's no fundamental difference between log.error(data) and wide_event.attach(error, data), nor similar schemes using parameters rather that context/global-based state.

There are still use cases for one or the other strategy, but I'm not a fan of this explanation in either direction.

yallpendantools 2 hours ago | parent [-]

> If that's an issue (visibility into middle layers) it just means your events aren't wide enough.

I hate this kind of No-True-Scotsman handwaves for how a certain approach is supposed to solve my problems. "If brute-force search is not solving all your problems, it just means your EC2 servers are not beefy enough."

I gotta admit, I don't quite "get" TFA's point and the one issue that jumped out at me while reading it and your comment is that sooner than later your wide events just become fat, supposedly-still-human-readable JSON dumps.

I think a machine-parseable log-line format is still better than wide events, each line hopefully correlated with a request id though in practice I find that user id + time correlation isn't that bad either.

>> [TFA] Wide Event: A single, context-rich log event emitted per request per service. Instead of 13 log lines for one request, you emit 1 line with 50+ fields containing everything you might need to debug.

I am not convinced this is supposed to help the poor soul who has to debug an incident at 2AM. Take for example a function that has to watch out for a special kind of user (`isUserFoo`) where "special kind" is defined as a metric on five user attributes. I.e.,

    lambda isUserFoo(u): u.isA && (u.isB || u.isC) && (u.isD || u.isE)
With usual logging I might find

    <timestamp> : <function> : {"level": "INFO", "requestID": "xxxaaa", "msg": "user is foo"}
Which immediately tells me that foo-ness is something I might want to pay attention to in this context.

With wide events, as I understand it, either you log the user in the wide event dump with attributes A to E (and potentially more!) or coalesce these into a boolean field `isUserFoo`. None of which tells me that foo-ness might be something that might be relevant in this context.

Multiply that with all the possible special-cases any logging unit might have to deal with. There's bar-ness which is also dependent on attributes A-E but with different logical connectives. There's baz-ness which is `isUserFoo(u) XOR (217828 < u.zCount < 3141592)`. The wide event is soooo context-rich I'm drowning.

hansvm 25 minutes ago | parent [-]

Your objection, as I understand it, is some combination of "no true Scotsman" combined with complaints about wide events themselves.

To the first point (no true Scotsman), I really don't think that applies in the slightest. The post I'm replying to said (paraphrasing) that middle-layer observability is hard with wide events and easy with logs. My counter is that the objection has nothing to do with wide events vs logs, since in both scenarios you can choose to include or omit more information with the same syntactic (and similar runtime overhead) ease. I think that's qualitatively different from other NTS arguments like TDD, in that their complaint is "I don't have enough information if I don't send it somewhere" and my objection is just "have you tried sending it somewhere?" There isn't an infinite stream of counter-arguments about holding the tool wrong; there's the very dumbest suggestion a rubber duck might possibly provide about their particular complaint, which is fully and easily compatible with wide events in every incarnation I've seen.

To the second point (wide events aren't especially useful and/or suck), I think a proper argument for them is a bit more nuanced (and I agree that they aren't a panacea and aren't without their drawbacks). I'll devote the rest of my (hopefully brief) comment to this idea.

1. Your counter-example falls prey to the same flaw as the post I responded to. If you want information then just send that information somewhere. Wide events don't stop you from gathering data you care about. If you need a requestID then it likely exists in the event already. If you need a message then _hopefully_ that's reasonably encoded in your choice of sub-field, and if it's not then you're free to tack on that sort of metadata as well.

2. Your next objection is about the wide event being so context-rich as to be a problem in its own right. I'm sympathetic to that issue, but normal logging doesn't avoid it. It takes exactly one production issue where you can't tie together related events (or else can tie them together but only via hacks which sometimes merge unrelated events with similar strings) for you to realize that completely disjoint log lines aren't exactly a safe fallback. If you have so much context-dependent complexity that a wide event is hard to interpret then linear logs are going to be a pain in the ass as well.

Mildly addressing the _actual_ pros and cons: Logs and wide events are both capapable of transmitting the same information. One reasonable frame of reference is viewing wide events as "pre-joined" with a side helping of compiler enforcement of the structure.

It's trivially possible to produce two log lines in unrelated parts of a code base which no possible parser can disambiguate. That's not (usually) possible when you have some data specification (your wide event) mediating the madness.

It's sometimes possible with normal logs to join on things which matter (as in your requestID example), but it's always possible with wide events since the relevant joins are executed by construction (also substantially more cheaply than a post-hoc join). Moreover, when you have to sub-sample, wide events give an easy strategy for ensuring your joins work (you sub-sample at a wide event level rather than a log-line level) -- it's not required; I've worked on systems with a "log seed" or whatever which manage that joinability in the face of sub-sampling, but it's more likely to "just work" with wide events.

The real argument in favor of wide events, IMO, is that it encourages returning information a caller is likely to care about at every level of the stack. You don't just get potentially slightly better logs; you're able to leverage the information in better tests and other hooks into the system in question. Parsing logs for every little one-off task sucks, and systems designed to be treated that way tend to suck and be nearly impossible to modify as desired if you actually have to interact with "logs" programatically.

It's still just one design choice in a sea of other tradeoffs (and one I'm only half-heartledly pursuing at $WORK since we definitely have some constraints which are solved by neither wide events nor traditional logging), BUT:

1. My argument against some random person's choice of counter-argument is perfectly sound. Nothing they said depended on wide events in the slightest, which was my core complaint, and I'm very mildly offended that anyone capable of writing something as otherwise sane and structured as your response would think otherwise.

2. Wide events do have a purpose, and your response doesn't seem to recognize that point in the design space. TFA wasn't the most enjoyable thing I've ever read, but I don't think the core ideas were that opaque, and I don't think a moment's consideration of carry-on implications would be out of the question either. I could be very wrong about the requisite background to understand the article or something, but I'm surprised to see responses of any nature which engage with irrelevant minutea rather than the subset of core benefits TFA chose to highlight (and I'm even more surprised to see anything in favor of or against wide events given my stated position that I care more about the faulty argument against them than whether they're good or bad)..