| ▲ | nik282000 5 hours ago |
| I work at a plant with a site-wide SCADA/HMI (Siemens WinCC) system. Every alarm is displayed on every HMI regardless of its proximity to the machine or even the operator's ability to address the issue. In any given minute a hundred or more alarms can be generated, the majority being nuisance messages like "air pressure almost low" or my favorite, " " (no message set), but scattered among those is the occasional "no cooling water - explosion risk". This plant is operated and designed to the spec of an international corp with more than 20 factories; it's not a mom-and-pop operation. No one seems to think the excessive, useless alarms are an issue, and any damage caused by missed warnings is treated as the fault of the operator. When approaching management and engineering about this, the responses range from "it's not in the budget" to "you're maintenance, fix all the problems and the alarms will go away". The only way for this kind of issue to be resolved is with regulation and safety standards. An operator can't safely operate equipment when alarms are not filtered or sorted in some way. It's like forcing your IT guy to watch web server access logs live to spot vulnerabilities being exploited. |
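A minimal sketch of the kind of filtering and sorting being asked for here, in Python. The severity/area fields and the routing rule are hypothetical illustrations, not anything Siemens WinCC actually exposes:

    from dataclasses import dataclass

    SEVERITY = {"info": 0, "warning": 1, "critical": 2}

    @dataclass
    class Alarm:
        severity: str   # "info", "warning", or "critical" (hypothetical tags)
        area: str       # plant area the alarm originates from
        message: str

    def alarms_for_station(alarms, station_area, min_severity="warning"):
        """Show only alarms this station's operator can act on: critical alarms
        from anywhere, lesser alarms only from the local area, and never
        blank/unset messages."""
        floor = SEVERITY[min_severity]
        kept = []
        for a in alarms:
            if not a.message.strip():
                continue  # drop the "" (no message set) nuisance entries
            if SEVERITY[a.severity] == SEVERITY["critical"] or (
                a.area == station_area and SEVERITY[a.severity] >= floor
            ):
                kept.append(a)
        # Most severe first, so "no cooling water - explosion risk" never
        # scrolls away behind "air pressure almost low".
        return sorted(kept, key=lambda a: -SEVERITY[a.severity])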
|
| ▲ | terminalshort 4 hours ago | parent | next [-] |
| This is a fundamental organizational and societal problem. An engineer would look at the situation and think "what is the best way to get the failure rate below a tolerable limit?" But a lawyer looks at the situation and thinks "how do I minimize liability and bad PR?" and a bureaucrat thinks "how can I be sure the blame doesn't land on me when something goes wrong?" And the answer to both of those questions is to throw an alarm on absolutely everything. So if there is a problem they can always say "our system detected the anomaly in advance and threw an alarm." Overall the system will be less safe and more expensive, but the lawyer's and bureaucrat's problems are solved. Our society is run by lawyers and bureaucrats, so their approach will win out over the engineer's. (And China's society is run by engineers, so it will win out over ours.) |
| |
| ▲ | gopher_space 2 hours ago | parent | next [-] | | Up to a certain point society is run by actuaries. Finding someone at your insurance company who both understands the problem with excess errors and appreciates how easily enumerable they are would be an interesting "whistleblowing" target. | | |
| ▲ | terminalshort an hour ago | parent [-] | | But the actuaries too are constrained by the same societal constructs. Say you work for a large car company and invent a self-driving system that is 10x safer than the average human driver, and the cost is minimal. This system would likely save 36,000 lives annually in the US. The actuary will calculate that with human drivers the accident liability for your car company is $0, but with the self-driving system the liability is potentially in the billions, so it doesn't make sense to include the system in your vehicles. You can argue that since the error rate is a tenth of the average human driver's, this is clearly an acceptable safety level. But your argument will fall on deaf ears, because there is no legal concept of an acceptable level of death. You could also say the company can buy insurance, and the cost will be minimal since accidents will occur at only 10% of the rate of a normal car. But the societal calculation for liability in a death is based mainly on: 1. how much money the responsible party has, and 2. how much media attention and outrage it generates. Since the liability now rests on a massive corporation instead of a single human driver, and the media attention will be massive versus nonexistent for an ordinary crash, the liability will be enormous. The accident rate may be 10x lower, but the cost per death may be 1000x that of a normal human-driver crash. | | |
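A rough back-of-envelope version of that argument. All the numbers are illustrative assumptions (the ~40,000 baseline, the 10x safety factor, and especially the 1000x liability multiplier), not actuarial data:

    # Hypothetical numbers to make the argument concrete; not actuarial inputs.
    baseline_deaths = 40_000          # rough current US road deaths per year
    safety_factor = 10                # "10x safer than the average human driver"
    deaths_with_system = baseline_deaths / safety_factor      # ~4,000
    lives_saved = baseline_deaths - deaths_with_system        # ~36,000

    # Liability as seen by the car company, in normalized units:
    liability_per_death_today = 0          # crashes are currently the driver's problem
    liability_per_death_corporate = 1_000  # "cost per death may be 1000x" once the company is the defendant

    company_liability_today = baseline_deaths * liability_per_death_today              # 0
    company_liability_with_system = deaths_with_system * liability_per_death_corporate # 4,000,000

    # Society is far better off (~36,000 lives saved), but the company's own
    # exposure goes from zero to enormous, so the actuary says "don't ship it".
    print(lives_saved, company_liability_with_system)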
| |
| ▲ | bluGill 32 minutes ago | parent | prev | next [-] | | Courts do accept alarm fatigue, and if there is an injury/death and there were many alarms, you can bet that whichever lawyer's side it benefits will bring in experts to explain the issue. If there are a lot of issues, the lawyers will also ask why they were not corrected first, using that to establish a pattern of bad maintenance. | |
| ▲ | renewiltord 2 hours ago | parent | prev | next [-] | | Is it though? An engineer can optimize along a different manifold, and a company can succeed or fail for different reasons. Getting destroyed in a lawsuit because you didn't place an alarm is small consolation when you did better engineering. After all, read any post-mortem comments on HN. Many of those people can be hired as experts if you like. They will say "I would have put an alert on it and had testing", and you will lose the case. "Oh, but we are trying to keep the error rate low." Yes, but now your company is dead while the high-error-rate company is alive. In revealed preferences, most engineers prefer vendors who practice CYA; this is obvious from online comments. It's not because they are engineers, it's because most people want to believe that an event is a freak accident. Building a system around an error budget is not actually easy, even for engineers who say they want it, because when an error happens they immediately say it should not have happened. The counterfactual errors blocked, and the business continuing to exist, are not considered. Every engineer is a genius in hindsight. Every person is a genius in hindsight. So why do these geniuses never build a failure-proof company? They do not. Who would not pay the same price for 100% reliable tech? | | |
| ▲ | terminalshort 2 hours ago | parent [-] | | > Getting destroyed in a lawsuit because you didn't place an alarm is small consolation when you did better engineering. Indeed it is. That's why I said it's a larger societal problem in how we manage risk and react to failures. > Why do these geniuses never build a failure-proof company? Because this is mostly a matter of unknown unknowns and predicting the future, so even a founder who makes zero mistakes is more likely than not to fail. |
| |
| ▲ | pstuart 3 hours ago | parent | prev | next [-] | | > This is a fundamental organizational and societal problem Absolutely, and we'd collectively be better served if we had tools to deal with it. I think of it as "incentive ecology" -- as noted, everybody has their own incentives which shape their behavior, which causes downstream issues that begin the process anew. Obviously there's no simple one-shot solution to this, but what if we had ways to simplify and model this "web of responsibility" (some sort of game theory exposed as an easily consumed presentation, with computed outcomes that show the cost/ROI/risk/reward) that could be shared by all stakeholders? Obscurity and deniability are the weapons wielded in most of these scenarios, so what if we could render them obsolete? Sure, those in power would not want to yield their advantages, but the overall outcome should reward everybody by minimizing risks and maximizing rewards for the enterprise. Yes, I'm looking at it as an engineer and a dreamer, but if such a tool existed that was open source and easily accessible, this work could be done by rogue participants who could put it out there so it's undeniable. | |
| ▲ | mmooss 3 hours ago | parent | prev [-] | | The first step in problem solving is to look in the mirror. It's not surprising that in an engineering community, the instinct is to blame outsiders - lawyers, bureaucrats, managers, finance, etc. - because those priorities are more likely to conflict with engineering, because it is harder to understand such different perspectives, and because it is easier to believe caricatures of people we don't know personally. Those people have valuable input on issues the engineer may not understand and have little experience with. And engineers are just as likely to take the easy way out, like the caricature in the parent comment: For example, for the manufacturer's engineering team it's much easier, faster and cheaper to slap an alarm on everything than to learn attention management and to think through and create an attention management system that is effective and reliable (and it had better be reliable - imagine if it omits the wrong alarms!). I think anyone with experience can imagine the decision to not delay the project and increase costs for that involved subproject - one that involves every component team, which is a priority for almost none of them, and which many engineers, such as the mechanical engineer working on the robotic arm, won't even understand the need for. > And China's society is run by engineers, so it will win out over ours. History has not been kind to engineers who do non-engineering, such as US President Herbert Hoover, who built dams but also bore significant responsibility for the Great Depression. It's not that engineers can't acquire other skills and do well in those fields, but that other skills are needed - they aren't engineering. Those who accept as truth their natural egocentric bias and their professional community's bias toward engineering are unlikely to learn those skills. | |
| ▲ | terminalshort 2 hours ago | parent [-] | | Your own answer circles right back to the problem I'm talking about: > and it had better be reliable - imagine if it omits the wrong alarms! This is entirely based on the premise that an error due to omitting the wrong alarm is worse than an error based on including too many alarms. That right there is lawyerthink. Also, these priorities don't conflict as you say; they just take different sides of a tradeoff. Managers and finance people are balancing a tradeoff of delivery speed, cost, and quality to maximize business value. And the bureaucrats and lawyers are choosing more expensive and less reliable systems because they better manage the emotions of panicky, anxious people looking for a scapegoat in a crisis. This has a cost. Besides having the bad luck to be president when the stock market crashed, and therefore being scapegoated for it, Herbert Hoover was well regarded in everything he did before and after his term, including many non-engineering-related things. So I think he is a particularly poor example of this. Public blame for things like that tends to be exactly as rational as thinking a hangover has nothing to do with last night. | |
| ▲ | mmooss 2 hours ago | parent [-] | | I don't see how it's 'lawyerthink' at all; engineers also want to prevent bad outcomes, especially from their own work, as does everyone else. Also, I think this ignores the rest of my point to nitpick one part of a complex system, which was part of a larger point. | | |
| ▲ | terminalshort 31 minutes ago | parent [-] | | To your larger point I don't think engineering logic is necessarily superior to financial logic, or manager logic. The problem is that because of the way we have built our society, engineering (and all other fields) must comply with and be subservient to bureaucrat and lawyer logic. The legal defense against an engineering failure is not to prove that your overall failure rate is low and within acceptable limits, but rather to come up with as long a list as possible of safety measures and policies that you followed without any regard to whether they actually have any effect at all. |
|
|
|
|
|
| ▲ | anonymousiam 4 hours ago | parent | prev | next [-] |
The criticality of alerts should be classified and presented with the alert, and users should have the ability to filter non-critical messages on certain platforms. Unfortunately, some systems either don't track criticality, or some of the alerts are tagged with the wrong level. (One example of the latter is the Ruckus WAP, which has a warning message tagged at the highest level of criticality, so about two or three times a month I see the critical alert: "wmi_unified_mgmt_rx_event_handler-1864 : MGMT frame, ia_action 0x0 ia_catageory 0x3 status 0x0", which should be just an informational-level alert, with nothing to be done about it. I've reported this bug to Ruckus a few times over the past five years, but they don't seem to care.) |
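A sketch of what that could look like on the receiving side, assuming a hypothetical alert feed. The local downgrade table is the usual workaround when a vendor mis-tags a message (as with the Ruckus example above) and won't fix it:

    LEVELS = {"info": 0, "warning": 1, "critical": 2}

    # Local overrides for alerts the vendor has tagged at the wrong level,
    # e.g. the Ruckus wmi_unified_mgmt_rx_event_handler message described above.
    DOWNGRADES = {
        "wmi_unified_mgmt_rx_event_handler": "info",
    }

    def effective_level(message, vendor_level):
        """Return the locally corrected criticality for an alert."""
        for marker, level in DOWNGRADES.items():
            if marker in message:
                return level
        return vendor_level

    def should_display(message, vendor_level, threshold="warning"):
        """Filter out anything below the platform's chosen threshold."""
        return LEVELS[effective_level(message, vendor_level)] >= LEVELS[threshold]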
| |
|
| ▲ | miki123211 3 hours ago | parent | prev | next [-] |
Useless warnings are a great CYA tactic. The more of them you have, the more likely it is that there's a warning if something happens. Whether the warning is ever noticed is secondary; what matters is the fact that there was a warning and the operator didn't react to it appropriately, which makes the situation the fault of the operator. |
| |
| ▲ | cucumber3732842 2 hours ago | parent [-] | | This is partly a problem with our workplace laws. In the eyes of regulators and courts, individual low-level employees cannot take responsibility. This is the logic by which they fine the company when someone does something on a step ladder that you shouldn't need to be told not to do, or whatever. What this means is that low-level employees become liability sinks. Show them all the warnings and make them figure it out. Give them all sorts of conflicting rules and let them sort out which ones to follow. Etc, etc. |
|
|
| ▲ | varjag 4 hours ago | parent | prev | next [-] |
I think it's regulated in places, as it has certainly been an HMI concern ever since Three Mile Island. Our customer really grills vendors for generating excessive alarms. Generally, for a system to pass commissioning it has to be all green, and if it starts event-bombing afterwards you're going to get chewed out. |
| |
| ▲ | nik282000 2 hours ago | parent [-] | | I have never seen a piece of new equipment reach an all-green state, before, during, or after commissioning. I frequently recommend that we not allow the commissioning team to leave until they can get it to that state, but it has yet to happen. | | |
| ▲ | varjag 2 hours ago | parent [-] | | I guess it's a matter of setting expectations, on both the SCADA and equipment side. I spent this weekend getting rid of that last sporadic alert… |
|
|
|
| ▲ | CamperBob2 4 hours ago | parent | prev | next [-] |
> The only way for this kind of issue to be resolved is with regulation and safety standards. Are you sure that's not what caused the problem in the first place? Unqualified and/or captured regulators who come up with safety standards that are out of touch with how the system needs to work in the real world? |
| |
| ▲ | AlotOfReading 4 hours ago | parent [-] | | Do regulators come up with SCADA safety standards? I would have assumed it was IEC. Regulators coming up with engineering standards is pretty rare in general. Usually they incorporate existing professional standards from organizations like SAE, IEEE, IEC, or ISO. |
|
|
| ▲ | lostdog 4 hours ago | parent | prev [-] |
I wonder if you could calculate a "probability of response to a major alert" and make it inversely proportional to the number of total or irrelevant alerts. Then you get to ask "our major-alert saliency is only 6%. Why have the providers set it at this level, and what can we do to raise it?" |
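One naive way to put a number on that, as a sketch; the function name and the 6-in-100 example counts are hypothetical:

    def alert_saliency(major_alerts, total_alerts):
        """Crude proxy for 'probability an operator responds to a major alert':
        the fraction of everything on the screen that is actually actionable."""
        if total_alerts == 0:
            return 1.0
        return major_alerts / total_alerts

    # e.g. 6 genuinely major alarms buried among 100 alarms in a minute:
    print(alert_saliency(6, 100))   # 0.06, i.e. "only 6%"

Raising the number means either surfacing major alerts separately or driving down the irrelevant ones.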