hinkley a day ago

Once a bug is closed the value of those logs starts to decay. And the fact is that we get punished for working on things that aren’t in the sprint, and working on “done done” stories is one of those ways. Even if you want to clean up your mess, there’s incentive not to. And many of us very clearly don’t like to clean up our own messes, so a ready excuse gets them out of the conversation about a task they don’t want to be voluntold to do.

hnlmorg 21 hours ago | parent | next [-]

In DevOps (et al) the value of those logs doesn’t decay in the same way it does in pure dev.

Also, as I pointed out elsewhere, modern observability platforms offer a way to keep those debug logs available as an archive that can be optionally ingested after an incident, without filling up your regular quota of indexed logs. That gives you the best of both worlds: all the logging, without the expense or the flood of debug messages in your daily logs.
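The two-tier split described above can be sketched with Python's stdlib logging. The file names are hypothetical, and real observability platforms (archive tiers, rehydration jobs) do this server-side rather than in application code; this is just a minimal illustration of the idea.

```python
import logging

logger = logging.getLogger("app")
logger.setLevel(logging.DEBUG)

# Tier 1: "indexed" logs -- only INFO and above count against the daily quota.
indexed = logging.FileHandler("indexed.log")
indexed.setLevel(logging.INFO)

# Tier 2: cheap archive -- everything, including DEBUG, kept around so it
# can be rehydrated into the query tools after an incident.
archive = logging.FileHandler("archive.log")
archive.setLevel(logging.DEBUG)

fmt = logging.Formatter("%(levelname)s %(message)s")
for handler in (indexed, archive):
    handler.setFormatter(fmt)
    logger.addHandler(handler)

logger.debug("cache miss for key=42")   # lands in the archive only
logger.info("request served in 12ms")   # lands in both tiers
```

The point is that the DEBUG firehose never touches the expensive indexed tier, yet nothing is lost: an incident responder can pull the archive into the same querying tools on demand.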

hinkley 2 hours ago | parent [-]

> In DevOps (et al) the value of those logs doesn’t decay in the same way it does in pure dev.

I’ve been on-call, and I think you’re cherry picking. The world has too many devs who still debug with log statements. Those logs never had any value to anyone but the original author.

I’ve also seen too many devs who are perfectly happy writing vastly complex Splunk queries to generate charts, and those charts tend to break during a production incident because a bunch of people load them at once and blow through Splunk’s rate limiting. I’ve almost never had this problem with Grafana. It’s true that you can make a dashboard with long-term trends that will fall over, but you wouldn’t use that dashboard for triage. And if you build one that tries to do both, the solution is to split it into two dashboards.

If you want to make a successfully scaling organization, you need a way for new members to join your core of troubleshooters, without pulling resources away from solving the trouble. So they can’t demand time, resources or attention that are in short supply from the core group.

Grafana fits that yardstick much better than log analyzers.

hnlmorg an hour ago | parent [-]

You’re arguing a different argument.

You’re making a case that cryptic log messages are bad. And I agree.

You’re also making a case that logs are only one piece of the telemetry ecosystem. And I agree there too.

What I’m arguing is that there isn’t a need to filter logs based on cost because you can still work with them in observability platforms in a cost effective way.

Lastly, I didn’t say everything should be instantly available. Long-term logs shouldn’t be in the same expensive storage pool as recent logs. But there should be a convenient way to import older log archives into your immediate log-querying tools. (I’m being intentionally vague here because different observability platforms engineer this differently and call the process by different names.)

As for complex queries, regardless of how easy to use your observability platform is, however many saved queries and dashboards you have built, there’s always going to be a need for upskilling your staff. That’s an inescapable problem.

pstuart a day ago | parent | prev [-]

My approach for this is to add dev logging IN ALL CAPS so that it stands out as ugly and in need of adjusting; the adjustment is to delete it before merging to main.
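A convention like this is easier to enforce if a pre-merge check scans for the leftover markers. The marker string and helper below are assumptions for illustration, not part of any standard tooling; wire something like it into a pre-commit hook or CI step.

```python
import re

# Hypothetical marker for throwaway dev logging; any string your team
# agrees on works, as long as it never appears in legitimate code.
MARKER = re.compile(r"NEED ADJUSTING")

def find_leftovers(paths):
    """Return 'path:line: text' for every line still carrying the marker."""
    hits = []
    for path in paths:
        with open(path) as f:
            for lineno, line in enumerate(f, 1):
                if MARKER.search(line):
                    hits.append(f"{path}:{lineno}: {line.strip()}")
    return hits
```

A CI job would run this over the changed files and fail the merge if the list is non-empty, so the cleanup can’t be forgotten.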

hinkley 2 hours ago | parent [-]

On my last project I was able to convince the team to clean up feature toggles before closing out epics. But I didn’t make much headway on logs. I came at them sideways and got all but one of my coworkers to stop trying to generate charts from Splunk and use Grafana instead. And I squeezed him by adding stats for things he liked to look at.