deathanatos 2 days ago

You expect to not be responsible for what happens to the software you put into production?

(… and I'd like to avoid distracting arguments that amount to "my company does on-call badly". Yes, those problems exist, and we should strive to fix them. But if I'm not to dismiss the argument here as throwing the baby out with the bathwater, then we need something to replace on-call with. Prod goes down on a Saturday afternoon; are you going to tell management "tough cookies" until Monday?)

ggeorgovassilis 2 days ago

> You expect to not be responsible for what happens to the software you put into production?

First: IT seems to be rather the exception; most professions have no on-call. E.g., even if my car mechanic screws up a service job, they'll have me bring the car back to the garage during their normal working hours, no matter where I'm stranded or how badly in the middle of the night.

Second: I'll take responsibility for anything I have built my own way. The reality of software development is that we implement functional requirements we disagree with and non-functional requirements that don't achieve their goal; we are made to use frameworks and tools we're not familiar with, on short timelines, with low budgets and inadequate infrastructure; and we're supposed to take responsibility for code our co-workers wrote.

mikeocool 2 days ago

> IT seems to be rather the exception

I think there’s actually a fair number of jobs where some level of this is expected.

Doctors are one obvious example — they have on call responsibilities often more onerous than IT, and depending on the situation don’t always receive additional compensation for it.

If you manage people who work different hours from you, in a lot of jobs it’s not uncommon to be called in if shit hits the fan when you’re not working (a hotel manager, to name just one example).

I’ve found that any good lawyer I’ve worked with will answer my calls and help me work through things at basically any time of day (their firm might be billing me for the time, but that doesn’t necessarily directly translate to their comp).

Lots of reporters are expected to cover news that breaks on their beat, no matter when it happens.

quicklime 2 days ago

> Doctors are one obvious example — they have on call responsibilities often more onerous than IT, and depending on the situation don’t always receive additional compensation for it.

My doctor (primary care physician) doesn’t work outside of business hours. In an emergency the recorded message says to call an ambulance and go to the emergency department at the hospital, which is staffed by a different set of people.

So it seems they do have at least some separation of the oncall aspect?

Lawyers are another story, there’s a lot of things wrong with that profession and we shouldn’t be trying to copy them.

mikeocool 2 days ago

If you go to most hospitals at 2AM and need a specialist of some kind (say a specific type of surgeon), there’s going to be someone in that specialty on call who’s going to get paged to wake up, come in, and see you.

Even in family practice, it’s not uncommon to be able to get a call back from the on call doctor at the practice on weekends or off hours — if you’ve got a situation that maybe doesn’t warrant the ER, but you’re not sure if it can wait until Monday.

bsder 2 days ago

> If you go to most hospitals at 2AM and need a specialist of some kind (say a specific type of surgeon), there’s going to be someone in that specialty on call who’s going to get paged to wake up, come in, and see you.

Only if you're dying.

Come in late Friday and you're going to be sitting in a bed until Monday even if your gall bladder is about to explode.

chipsa 2 days ago

I went into a hospital just after midnight and had my gall bladder out by noon. No, the surgeon wasn’t called in early, but the radiologist who diagnosed the gall bladder was.

mikeocool 2 days ago

Sorry, you’re right. Doctors have it way easier than software engineers.

quicklime 2 days ago

I definitely don’t think they have it easier. They work hard and the stakes are much higher.

But what you’re talking about is a person whose job it is to be oncall. It’s the equivalent of an SRE, rather than a SWE. They’re not doing it because they believe in “you build it, you run it” or anything like that.

bsder 2 days ago

Sarcasm simply serves to undermine any valid points that you have.

The point was that "on call" is specifically confined as an expectation only to certain types of doctors or under very urgent circumstances.

In addition, doctors have extra special dysfunctions like "too many hours in a shift".

However, many of these exist because doctors themselves have fought various efforts to train more of them, which would allow the required extra labor to be distributed across more people.

doubleg72 2 days ago

Funny, my wife is primary care yet does on call via answering service. You clearly don’t know what you’re talking about.

bsder a day ago

Funny, I chose that gall bladder example precisely because I had it happen to me.

In addition, I had something similar happen where I needed Interventional Radiology to insert a drain, and again I wound up waiting from Friday to Monday morning.

I'm sure some of those doctors were "on call". However, nobody was calling them in unless I started dying.

happymellon 2 days ago

This is wrong on so many levels.

No they don't.

I know plenty of people who have had to sit around for 8+ hours because the particular type of doctor is not available. The on call only really applies if you're bleeding out.

In my 20+ years of development and support, I've been paged only once due to an actual catastrophic failure. Most pages happen because shitty "SREs" want monitoring on everything, even if it's stuff that I have no control over.

shafyy 2 days ago

> Doctors are one obvious example — they have on call responsibilities often more onerous than IT, and depending on the situation don’t always receive additional compensation for it.

I mean.... On call doctors literally save lives. Most on-call software engineers don't. So.

bloppe 2 days ago

But think of the shareholders!

bloppe 2 days ago

Doctors, firemen and cops are obvious examples, but I've called plumbers at 2am because of a burst pipe flooding the basement. I've called locksmiths well past closing time due to lockout. I've called landlords at all hours for apartment emergencies. Society needs on-callers of all kinds. It's not surprising that some people are vociferously against holding the pager, and I sincerely wish those people success in avoiding it. But someone will always have to step up and they should be appropriately rewarded for it (I've been on-call and was considered lucky to have gotten overtime for it, which I think is strange because it's just a well-aligned incentive structure that any smart company should have)

al_borland 2 days ago

My boss recently started an on-call rotation for us. None of the code I have written is customer facing. If everything I wrote breaks at 5:01pm on Friday, external customers will feel 0 impact if I wait to fix it until I show up again on Monday. Worst case, someone internal has to wait to work on something they’ve probably been putting off for months anyway. There are other things they can work on. If it was a constant problem, I’d get it, but a rare instance can be forgiven when no outside impact is felt.

I am responsible for my code, but we need to be realistic about the impact. Not all outages are created equal.

I used to work nights watching over the hardware, operating systems, and applications running on it. We’d do upgrades and break/fix stuff. Some things were worth waking someone up for, but a lot of things weren’t. We’d do what we could to fix it on our own, but for a non-prod environment, if we couldn’t fix it ourselves, it could wait until morning. This idea seems to be lost on people now. I get that 100% uptime of 100% of the systems would be nice, but not at the expense of your employees’ sanity.

I haven’t actually been called yet with the new rotation, but any week I’m on-call I’m a bit on edge. In the past I had some pretty horrible on-call experiences that pushed me close to quitting, which I won’t get into, so I’m preparing for the worst. I worked my ass off to get into a position where I didn’t need to be on-call and put in my time working nights so other people could sleep. Being back on-call feels like a demotion.

Retric 2 days ago

It is a demotion.

Spooky23 2 days ago

“Devops” traded less bureaucracy for more accountability.

Have a generalist ops team that is staffed 24x7, or has paid on call as part of the job. They get run books to respond to whatever goes on.

I’ve set this up twice. The first time, we had a team in the Philippines that would cover overnights.

They could start and roll back deployments and do most things via the runbook they were provided. Callouts to product teams made up about 5% of escalations, and most of those were due to bad or missing documentation.

The US based team did similar work, just during the day. Both could escalate quality issues for the product team to fix.

The other model was all US, on-call based. We used junior and low-skill folks on a rotating on-call schedule. They were paid 20% of their hourly rate as standby pay and had a minimum pay threshold whenever they got called. All of that hit the cost center of the offending product or service, so there was both a financial incentive to avoid calls and a human incentive, since the engineers didn’t want to get called for escalations. Again, documentation is key.
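
A minimal sketch of the charge-back arithmetic described above, in Python. The 20% standby rate comes from the comment; the two-hour call-out minimum, paying call-outs at the full hourly rate, and the `charge_back` helper and service names are hypothetical assumptions for illustration, since the comment doesn't give those details.

```python
from collections import defaultdict

STANDBY_RATE = 0.20      # standby pay as a fraction of the hourly rate (from the comment)
MIN_CALLOUT_HOURS = 2.0  # hypothetical minimum paid per call-out; the real threshold isn't stated


def charge_back(hourly_rate, standby_hours, callouts):
    """Return the on-call cost billed to each service's cost center.

    standby_hours: {service: hours of standby coverage for that service}
    callouts: list of (service, hours_worked), one entry per call-out it caused
    Assumes call-outs are paid at the full hourly rate (an assumption).
    """
    charges = defaultdict(float)
    for service, hours in standby_hours.items():
        charges[service] += STANDBY_RATE * hourly_rate * hours
    for service, hours in callouts:
        # A short call-out still costs the minimum, so every page lands on the offender's budget.
        charges[service] += hourly_rate * max(hours, MIN_CALLOUT_HOURS)
    return dict(charges)


if __name__ == "__main__":
    # One week of off-hours standby (168 - 40 = 128 h) for two hypothetical services,
    # plus one half-hour call-out caused by "billing".
    print(charge_back(50, {"billing": 128, "search": 128}, [("billing", 0.5)]))
```

Billing both the standby and the call-out pay to the service that caused the page is what produces the financial incentive the comment describes: a noisy service costs its own budget more.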

rufus_foreman 2 days ago

>> You expect to not be responsible for what happens to the software you put into production?

I'm responsible for the software I put into production from 9 AM to 5 PM for about 200 days a year. At 3 AM, I am responsible for taking care of myself by getting a good night's sleep.

If you need 24 hour coverage, taking into account vacations and weekends, you need 5 or 6 people.
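
The 5-or-6 figure follows from the comment's own numbers. A quick check in Python, assuming 8-hour days, about 200 working days per person per year (as stated above), and no overlap or handover time between shifts (a simplification); `people_for_24x7` is a hypothetical helper name.

```python
import math

HOURS_PER_YEAR = 365 * 24  # 8,760 hours of round-the-clock coverage


def people_for_24x7(working_days=200, hours_per_day=8):
    """Headcount needed so that one person is working at every hour of the year.

    Ignores handovers, overlap, and sick leave, so real rotations need more.
    """
    hours_per_person = working_days * hours_per_day  # 1,600 hours with the defaults
    return HOURS_PER_YEAR / hours_per_person


exact = people_for_24x7()
print(f"{exact:.1f} people -> hire {math.ceil(exact)}")
```

With the defaults this gives about 5.5, i.e. 6 people once rounded up to whole headcount. Even with no vacation or holidays at all (261 weekdays a year, 2,088 hours), it is still 8,760 / 2,088 ≈ 4.2 people.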

hn_go_brrrrr 2 days ago

"you need 5-6 people" is moving the goalposts. The root comment said nothing about minimum team size.

mikedelfino 2 days ago

If the company has enough people in the team, someone just works the night shifts or on scheduled weekends. No one needs to be on-call because there would be someone taking care of it already.

nosefurhairdo 2 days ago

Is the argument here that every software team should have engineers whose normal working hours have 24/7/365 coverage?

SuperNinKenDo 2 days ago

If you expect your team to provide 24/7/365 assurance, then it's hard to see how that isn't a perfectly reasonable idea. The only counter to it is that keeping people on call moves financial cost off the business and onto its employees, as psychological cost. Not very convincing.

SpicyLemonZest 2 days ago

Would you take the night shift? Everyone I've seen promote this idea seems to expect that they'll be the lucky ones who get to keep a normal schedule. If you have a service that needs 24/7 uptime, and you transition from an oncall model to a shift model, at least 2 out of every 3 engineers on the team are going to have to change shifts or quit. If the entire industry shifts, high-availability software would simply join the ranks of fields like nursing or manufacturing where many people have no realistic option to work normal hours.
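
The "2 out of every 3" figure is shift arithmetic: a day covered by 8-hour shifts has three shifts, and only one of them falls in normal working hours. A small Python check (the function names and the 9-to-5, Monday-to-Friday definition of normal hours are assumptions) also shows why it is "at least": measured over a whole week, an even larger share of coverage falls outside business hours.

```python
from fractions import Fraction


def off_hours_shift_share(shift_hours=8, day_shifts=1):
    """Fraction of each day's shifts that fall outside normal working hours."""
    if 24 % shift_hours != 0:
        raise ValueError("shift length must divide 24 evenly")
    shifts_per_day = 24 // shift_hours
    return Fraction(shifts_per_day - day_shifts, shifts_per_day)


def off_hours_week_share(business_hours_per_week=40):
    """Fraction of a 168-hour week outside business hours (e.g. 9-5, Mon-Fri)."""
    return Fraction(7 * 24 - business_hours_per_week, 7 * 24)


print(off_hours_shift_share())  # 2/3
print(off_hours_week_share())   # 16/21, about 76%
```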

attendant3446 22 minutes ago

I'm the one who wants to do the night shifts. I miss the time when I worked with a 13-hour time difference due to time zones. But now I don't have the option of working at night, everyone has to be at work during 'business hours', and yet the company I now work for has an on-call policy, and they only pay a tiny bonus to people who join the initiative.

andreasmetsala 2 days ago

The sane way to solve that problem is to hire people in different time zones to get coverage. Some still need to do weekends but even those are not the same in every country (e.g. Israel).

[deleted] 2 days ago

Retric 2 days ago

I don’t put things in production, the company does. And it’s the company’s responsibility to deal with problems that show up.

24/7 coverage is expensive, and mandating that someone be on call 24/7 doesn’t actually provide it.

joshuamorton 2 days ago

This is "companies do on-call badly".

For the purposes of this exercise presume that our theoretical on-call process is no worse than Google's SRE structure: You are on-call for a 12 hour shift that is more or less aligned with your waking hours, and you are compensated extra for the time you are on-call outside of normal working hours, whether or not you are called in. You are on-call at most one week per month, on average, and usually less.

tharkun__ 2 days ago

> You are on-call for a 12 hour shift that is more or less aligned with your waking hours

I suppose if you're Google, they can theoretically make it more aligned with your waking hours? Do they actually do it? Most companies don't or can't, i.e. it's _less_ aligned.

> you are compensated extra for the time you are on-call outside of normal working hours, whether or not you are called in

How much? Way too many on-call processes pay nothing but a few dollars, just to be able to say "see, we do pay for this, even when you're not called!" That's nowhere near enough for what being on-call does to how you go about your day: always on edge, always waiting for the call or alert that requires you to drop whatever you're doing, unable to actually do or start certain things.

You haven't even mentioned the expected reaction and resolution time and that alone can make a huge difference.

> You are on-call at most one week per month, on average, and usually less.

Great, only one week out of four. /s That's crazy if you ask me. Going back to my earlier point, it prevents you from going about your day in a normal way. There's no "doing on-call well" the way you describe it.

ksmith14 2 days ago

Google staffs SRE teams as either 8 in one location/TZ or two geographically distributed teams of 6, often some pairwise combination of the U.S., Europe, and Australia to accommodate reasonable on-call shifts.

The on-call compensation varies depending on what tier of service they're offering. Tier 1 (5-minute response time) pays 2/3 of your effective hourly pay for on-call time outside of local business hours, and tier 2 (30-minute response time) pays 1/3. Or you can take time off in lieu.
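
To make those tiers concrete, a minimal Python sketch of the weekly standby pay they imply. The rates (2/3 and 1/3 of effective hourly pay for on-call hours outside local business hours) are from the comment; the hourly rate of 100, the 12-hour daily shift, and the assumption that 8 of each weekday shift's 12 hours are business hours are hypothetical inputs for illustration, as are the function names.

```python
from fractions import Fraction

# Share of effective hourly pay earned per on-call hour outside business hours (from the comment).
TIER_RATE = {1: Fraction(2, 3), 2: Fraction(1, 3)}


def weekly_off_hours(shift_hours=12, business_hours_in_shift=8):
    """Off-hours on-call time in a week of daily shifts (hypothetical schedule).

    Weekday shifts count only the hours outside business hours;
    weekend shifts count in full.
    """
    return 5 * (shift_hours - business_hours_in_shift) + 2 * shift_hours


def on_call_pay(hourly_rate, off_hours, tier):
    """Weekly on-call pay, earned whether or not a page actually arrives."""
    return TIER_RATE[tier] * hourly_rate * off_hours


hours = weekly_off_hours()  # 5 * 4 + 2 * 12 = 44
for tier in (1, 2):
    # Hypothetical effective hourly pay of 100 (any currency).
    print(f"tier {tier}: {round(float(on_call_pay(100, hours, tier)), 2)}")
```

Under these assumptions a tier 1 week works out to 2,933.33 and a tier 2 week to 1,466.67. Using `Fraction` keeps the 2/3 and 1/3 rates exact until the final rounding.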

joshuamorton a day ago

Note that this is a minimum; I know some teams with 10-12 folks per location. That has downsides too: you can end up on-call only once a quarter, which most people in the role don't like, since the extra vacation is nice.
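
The once-a-quarter figure follows from rotation size. A rough Python sketch, assuming a single primary rotation in which each person takes one full week in turn (real rotations often add a secondary, which roughly doubles how often each person is on-call); `on_call_weeks_per_year` is a hypothetical helper.

```python
WEEKS_PER_YEAR = 52


def on_call_weeks_per_year(team_size: int) -> float:
    """Weeks per year each person is on-call in a one-week, round-robin primary rotation."""
    return WEEKS_PER_YEAR / team_size


for size in (6, 8, 10, 12):
    # A team of N puts each person on-call once every N weeks.
    print(f"team of {size:2d}: on-call every {size} weeks, "
          f"{on_call_weeks_per_year(size):.1f} weeks per year")
```

At 12 people that is about 4.3 weeks a year, roughly once a quarter, versus 6.5 weeks a year for a team of 8.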

dheera 2 days ago

I am capable of writing very good software, testing it, and putting it into production, but I am not capable of being responsible for what happens at 3am on a Sunday. Whether that deal is okay is up to you. I'm okay if you don't want to pick me. There are other jobs I can get. I write good software though.

If the customer is awake at 3am on a Sunday, it's the customer's problem that they were awake at 3am on a Sunday. If it's a social network, I frankly couldn't care less; the customer should go to bed. If it's going to be deployed in the emergency room, fine, we should care, but YOU, management, should find people who are actually willing to take that shift (for extra money, or because they are based in other time zones).

deprecative 2 days ago

Why do I give a single shit about the software having an issue at 2am? I don't own the company. I don't care. If they care they can hire night shift triage.

uuddlrlrbaba 2 days ago

I expect to be very responsible during my working hours.

RandomThoughts3 2 days ago

I have never worked for a company where people building the software and people supporting it when it is critical were the same. The idea is weird to me.

Plus, any large enough company should have teams spread across time zones, eliminating the need for on-call if it’s correctly managed.

chairmansteve 2 days ago

Only a dysfunctional company would rely on the programmer who wrote the code.