Safety and Embedded


http://www.greenspun.com/bboard/q-and-a-fetch-msg.tcl?msg_id=001A1t

This helps explain what I believe to be true: because of the nature of digital computing and embedded systems, and their tendency to fail, critical functions are never left strictly to them.

Unfortunately, a lot of people who feel they can discuss the possibilities of Y2K failures in these sensitive devices lack the basic knowledge that would help them see that the systems were never dependent on these "digital" components alone.

The reason is simple. Where safety is an issue, digital computers, embedded systems and even individual non-embedded chips can and do fail for any number of reasons, none of which has anything to do with Y2K. When a stray bit of static electricity can "blow up" an individual chip internally, nothing is left totally to their responsibility.

In other words, if it could fail due to Y2K, then it could fail for other reasons, and so it would never have been put in a position where failure is unacceptable.

Period. Especially in industries that were in place before the "technology boom".

Same with the so-called unreachable stuff that is discussed. If it cannot be reached to fix Y2K faults, it could not be reached to fix "normal" faults either, so it would not have been used in such a role in the first place.
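To make the "nothing left totally to their responsibility" point concrete, here is a minimal sketch of the idea, purely illustrative and not taken from any real plant: the digital controller can request an action, but an independent interlock (standing in for a hard-wired relay or a mechanical valve) has the final say, so a confused or failed computer cannot act alone.

#include <stdbool.h>
#include <stdio.h>

/* What the software controller wants to do. */
typedef enum { CMD_IDLE, CMD_OPEN_VALVE } command_t;

/* Limits checked against an independent sensor, not against the
   controller's own internal state. Values are illustrative only. */
static bool pressure_within_limits(double psi) {
    return psi > 5.0 && psi < 150.0;
}

/* The interlock: permits a dangerous command only if the physical
   measurement agrees that conditions are safe. */
static bool interlock_permits(command_t cmd, double measured_psi) {
    if (cmd != CMD_OPEN_VALVE)
        return true;                 /* nothing dangerous requested */
    return pressure_within_limits(measured_psi);
}

int main(void) {
    /* The controller "believes" conditions are fine (its clock or state
       may be wrong), but the independent measurement says otherwise. */
    command_t cmd = CMD_OPEN_VALVE;
    double measured_psi = 180.0;     /* out of the safe range */

    if (interlock_permits(cmd, measured_psi))
        printf("valve opened\n");
    else
        printf("interlock blocked the command; operator alerted\n");
    return 0;
}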

I am in no way discussing "information technology". It is a completely different animal, and the situations found in the "information" world do not apply to computing where safety and lives are at stake.

Cherri

***********************

It occurs to me, as we get further and further into this issue, that a minor contribution from my un-technical self may help bring a little "big-picture" perspective to this debate, particularly as it applies to the layman.

On late-night TV the other night (which over here consists of educational programming, mostly output from the Open University), a documentary was running for a post-graduate technology course. The programme was dealing with certain issues involved in the integration of technology into real-life scenarios. It was particularly interesting when viewed by someone with a degree of Y2K interest, although the topic was not touched on directly. A few of the statements by some of the egg-head experts did give me pause for Y2K-related thought, though.

Principally, the part which caught my eye was a technical discussion of the safety, accuracy and fallibility of computer programmes when applied to life-critical applications in the real world. The discussion dealt mainly with the issue of software, but the general message seems to be applicable to hardware concepts also. An example was given of a piece of medical equipment, namely a three-part multi-purpose scanning device, incorporating a low-intensity electron-beam scanner, coupled with an X-Ray device, and a light-sensitive scanner. The software was written so that a patient could be located on the scanning table and positioned accurately using the light-scanner (which uses no accelerator beam and is therefore 100% safe).

The software was then designed to rotate the patient on a turntable, to place them accurately in front of the low-intensity electron scanner, where they received a scan. Finally, the turntable was moved again and the patient would be X-Rayed. Take into account that the X-Ray device is the most dangerous, as it uses high-intensity accelerated beams, and if used wrongly can harm the patient.

Despite extensive logical testing in the lab, the software proved glitchy. In some cases it positioned the patient in front of the wrong device, and then sent the intensity setting to that device based on where it *thought* the patient was, not where they *really* were. This led to some patients receiving an electron scan at X-Ray intensity. This is a BAD idea. The system was decommissioned and the software re-written. There were other examples.
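Loosely sketched in code, the failure mode might look something like this; the names and numbers are mine and purely illustrative, not the actual scanner software. The danger comes from deriving the beam intensity from the software's internal belief about the turntable position rather than from an independent reading of where it actually is.

#include <stdio.h>

typedef enum { STATION_LIGHT, STATION_ELECTRON, STATION_XRAY } station_t;

/* Beam intensity appropriate to each station (illustrative units). */
static double intensity_for(station_t s) {
    switch (s) {
    case STATION_LIGHT:    return 0.0;   /* passive light scan */
    case STATION_ELECTRON: return 1.0;   /* low-intensity beam */
    case STATION_XRAY:     return 25.0;  /* high-intensity beam */
    }
    return 0.0;
}

int main(void) {
    station_t assumed = STATION_XRAY;     /* the software's internal belief */
    station_t actual  = STATION_ELECTRON; /* the turntable never got there */

    /* Buggy design: intensity derived from the assumed station only. */
    double dose = intensity_for(assumed);

    /* Safer design: cross-check an independent position sensor and
       refuse to fire on any mismatch. */
    if (assumed != actual)
        printf("position mismatch: beam inhibited, operator notified\n");
    else
        printf("firing at intensity %.1f\n", dose);
    return 0;
}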

Commenting on this, an eminent professor from the OU stated:

(quoting loosely) "Computer systems are rarely, if ever, allowed to operate without a practical level of human participation and intervention. Where such systems ARE permitted, their function tends to be deemed non-critical to life-supporting processes. In most cases of computer-related failure leading to injury or loss of life, the fault can be traced to a degree of human error in the supervision of or intervention in the process" (this kind of echoes what Cherri was saying about people not bothering to check mechanical valves etc.)

Basically, the concept is that people should not "trust" computer systems to perform perfectly every time. That's not the way they are designed, and few designers or engineers would attempt to claim a 0% failure rate for ANY system. The prof actually went on to state:

"In any computer software program or application developed nowadays using modern, complex programming languages, it is generally accepted as an engineering principle that some kind of informational error will occur within the data at least once in every 200 instructions."

The point being that this kind of error rate is considered acceptable, because systems are designed to incorporate a degree of tolerance for inevitable minor errors; all modern systems are designed with just this kind of tolerance in mind. This explains the need for human participation in computerised processes. Even the engineers and designers know that nothing is ever 100% perfect.

This all sounds like support for the position that Y2K errors have the power to destroy the systems we use and benefit from today, but in fact it's just the reverse.

Bearing in mind that most, if not all, computerised processes operate with a "1 in 200 instructions" error-rate tolerance (or worse), how can we argue logically that one more source of data error could cause irrevocable failure? If computerised systems are designed to be checked, monitored, and maintained by human beings NOW, that suggests to me that most if not all Y2K-based errors should fall within the remit of this safety sequence, and it is only in circumstances where the human factor has been irresponsibly removed from the process that we face a real risk to the continuity of service. This kind of in-built lack of the security of human intervention would, it seems, in the industry's own rule book, represent the unnecessary imposition of an irresponsible risk.
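As a rough sketch of that "safety sequence" idea (my own illustration, assuming a device that reports two-digit years, nothing quoted from the programme): an implausible date reading is treated like any other corrupt input, rejected and referred to an operator rather than silently acted upon.

#include <stdio.h>

/* Last two-digit year accepted from the device: 1999. */
static int last_seen_yy = 99;

/* Generic sanity check: a year that appears to run backwards is treated
   like any other corrupt reading and is never acted on automatically. */
static int reading_plausible(int yy) {
    return yy >= last_seen_yy;
}

int main(void) {
    int reported_yy = 0;   /* at rollover the device reports "00" */

    if (!reading_plausible(reported_yy)) {
        printf("implausible date reading; logging it and deferring to the operator\n");
    } else {
        last_seen_yy = reported_yy;
        printf("reading accepted: year %02d\n", reported_yy);
    }
    return 0;
}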

Bizarre, isn't it, that information pertaining to the inevitable and inherent imperfections, fallibilities and unpredictabilities of complex computer systems can actually make you feel SAFER.

I hope all this was in some way relevant.

Kindest Regards W

-- Cherri (sams@brigadoon.com), September 04, 1999

Answers

Cherri:

I think that we've had this discussion before. I can only speak from my personal experience. We have had no date failures. We have had a load of problems with the update patches; just got another one this week. I am now deciding whether it is better to go with the rollover problems or to try to debug the patches.

Best

-- Z1X4Y7 (Z1X4Y7@aol.com), September 04, 1999.


Cherri:

FOF is looking like the way to go.

Best,

Z

-- Z1X4Y7 (Z1X4Y7@aol.com), September 04, 1999.


Cherri, dear, would you please post some links? I asked you this on the other thread.

http://www.greenspun.com/bboard/q-and-a-fetch-msg.tcl?msg_id=001Lny

I came here to your new thread hoping to find a position paper or something along those lines that would help us validate your position that Paula Gordon is wrong. You are expressing your opinions here, and while that is a perfectly valid approach, Paula Gordon today went on national television, confirmed her identity and her position, and provided source documentation that she believes to be evidentiary. Please--links to some source documents? Thanks. :)

-- FM (vidprof@aol.com), September 04, 1999.


I am reminded of an incident where I checked into a hotel in the evening, went to the assigned room and found myself next to a hospitality suite where there was a disturbing amount of noise. Turned out that they had booked and paid for the room I was assigned, so that it would be empty. Naturally, we headed for the registration desk, and discovered that when this other party had booked the "extra" room, this was written on the room status card but not entered into the computer. When I checked in, the clerk saw the discrepancy between the computer (which marked the room as vacant) and the card, and believed the computer.

A system may be designed for human intervention in the event of irregularity, but people seem to be so conditioned to believe the computer that I wonder how often the desired intervention fails to occur.

-- David L (bumpkin@dnet.net), September 04, 1999.


The first thing a man [or a woman] will do for his [or her] ideals is lie.

-Joseph A. Schumpeter

-- Stan Faryna (info@giglobal.com), September 04, 1999.



"Bizarre" pretty much describes it, Cherri. I have to ask: are you a blonde?

-- King of Spain (madrid@aol.com), September 04, 1999.

"In otherwords, if it could fail due to Y2K then it could fail due to other reasons and would never have been put in a position where failure is unacceptable."

Oh, really?

Does that include, say, geosynchronous satellites, mars rovers, and *earthbound* blackboxen that people have simply *forgotten* about -- several hire-generations after the installers have "moved on"?

-- Ron Schwarz (rs@clubvb.com.delete.this), September 04, 1999.


Off the top of my head, here are some categories of embedded system failures to worry about:

  1. Process control systems that fail and cause some kind of physical disaster. This is the case everyone talks about, and I'm willing to believe that there are few of these systems. Not none, however.
  2. SCADA and other monitoring systems that fail and cause operators to shut down plants, since otherwise they are operating blind, or are in violation of safety codes. Very likely.
  3. Maintenance date recorders which shut down a system if maintenance has not been performed. These have been mentioned in things like fire engine ladders, airport equipment, and medical equipment. I guess the idea was that you can avoid keeping track of maintenance dates, or avoid liability problems when a device has not been maintained: just let the device track it for itself. The problem is that if the recorder has a Y2K error, the device becomes unusable.

The first category impacts safety directly, but the others can do so indirectly. You could have operator mistakes due to bad info from a SCADA system. You could have a disaster because a device shuts down due to maintenance date problems. All three impact the economy. Apparently, it can take months to restart some of these plants (refineries, for example) after a complete shutdown.
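To illustrate the third category, here is a hypothetical sketch of maintenance-lockout logic built on two-digit years; it is not taken from any real device, but it shows how the rollover can make a recently serviced unit decide it is wildly overdue and refuse to run.

#include <stdio.h>

#define MAX_DAYS_BETWEEN_SERVICE 365

/* Crude elapsed-days estimate using two-digit years, the way an old
   firmware image might do it (deliberately simplistic). */
static int days_since_service(int now_yy, int now_doy,
                              int svc_yy, int svc_doy) {
    return (now_yy - svc_yy) * 365 + (now_doy - svc_doy);
}

int main(void) {
    /* Serviced on day 330 of 1999; it is now day 3 of 2000, reported as "00". */
    int elapsed = days_since_service(0, 3, 99, 330);

    /* A nonsensical negative interval (or a huge positive one, depending on
       how the arithmetic is written) trips the lockout even though the unit
       was serviced about five weeks ago. */
    if (elapsed < 0 || elapsed > MAX_DAYS_BETWEEN_SERVICE)
        printf("elapsed=%d days: maintenance \"overdue\", device locked out\n",
               elapsed);
    else
        printf("elapsed=%d days: device available\n", elapsed);
    return 0;
}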

-- You Know... (notme@nothere.com), September 04, 1999.


Useful sites on chemsafety issues:

This is an email from Leon Kappelman:

==========

"First, in case you have doubts that y2k poses chemical safety risks of potentially grave consequence, there are several actual cases of chemical-processing-specific embedded or computer system failures documented in the casebook developed in the UK by their Institute of Electrical Engineers (IEE) and Action 2000 (basically the equivalent of our President's Council). These include case numbers:

#7 - "near catastrophic", #58 - "health and safety implications", #82 - "dangerous chemical spill"; #84 - "inability to treat acid, resulting in shutdown".

[My note: also see #70 -- radiation dose log system and compare to noncompliant systems at US nuke plants. See www.nrc.gov/NRC/Y2K/plantstatus.html (includes "personnel radiation exposure tracking system" & other systems.)]

Details on all these and much more at http://business.bug2000.co.uk/news/index.shtml. To see the actual cases, click on "Research Surveys" then "Embedded systems fault report" and then either the "Non Computer Based Systems" or the "Systems which contain a 'computer'" links.

With the help of the President's Council, the US Chemical Safety Board (CSB), Action 2000, the IEE, and others, further research is now underway to secure and publicize additional examples in order to help late starters focus in on their greatest chemical safety risk areas.

==========

Item #2: The CSB, EPA, and several chemical industry trade associations developed an excellent guide for "Addressing Year 2000 Issues in Small and Medium-Sized Facilities that Handle Chemicals." It provides information on how to prevent Y2K problems at chemical processing plants, and Appendix A contains information on what plant systems and equipment can fail. You can get a copy at http://www.csb.gov/1999/news/smefinal.pdf.

Those at the August 30th round table overwhelmingly agreed that this document needs to be in the hands of every chemical processor and distributor, hazardous materials handler, emergency manager, environmental and work-safety regulator, and related labor organization. Do your local government's officials and local chemical companies have a copy yet?

==========

Item #3: It is good to see environmental groups stepping forward on related Y2K concerns. Three cheers for the Environmental Defense Fund!! At http://www.edf.org/programs/PPA/y2kchecklist.html you will find:

Checklist #1: To Help Identify Hazardous Chemical Processing Plants That Might Have Y2K Problems Which Can Increase the Risk of a Release

Checklist #2: Checklist to Help Evaluate the Severity of a Potential Hazardous Chemical Release Caused by Y2K Problems"

=========

The EDF checklists have a lot of great links & info.

To locate chem plants near you, go to http://www.epa.gov/enviro/zipcode_js.html or http://www.epa.gov/enviro/index_java.html

Also see summaries of facilities' "Risk Management Plans" at www.rtk.net

Find your Local Emergency Planning Committee (LEPC) at www.rtk.net/lepc/webpage/lepc.html.

Hope these links work.

-- d (longtimelurker@firsttimeposter.com), September 05, 1999.


* * * 19990905 Sunday

Re: Paula Gordon and her Y2K "position" ...

I was (pleasantly) astounded to hear Paula Gordon announce on C-SPAN that on the scale of 0-10, she's at "~9.5"--versus Jim Lord, at "~8.5"!

Ms. Gordon was a relative "latecomer"--mid-1998, I believe--to the Y2K mix of trying to determine rational strategies and solutions to the inevitable problems. ( She'll correct me if my memory has failed. ;-) )

I was impressed with her quick pick-up on Y2K ramifications; her excellent background has served her well on that count!

It's too bad she's being relegated to the "background" on Y2K. A lot of people could benefit from her sage advice.

Regards, Bob Mangus

P.S.: I've been an "11" on the Y2K event scale since 1996!

* * *

-- Robert Mangus (rmangus@hotmail.com), September 05, 1999.



Cherri:

I have read many of your posts and it appears that you have much more experience in chemical production, refineries, and heavy industry than I do [I left those fields years ago]. My concerns are not with a large failure that will cause explosions. My concern is with peripherals that will shut down the operation. My experience is that operations become very unstable during shutdown and startup. What is your opinion on that matter?...

Best

-- Z1X4Y7 (Z1X4Y7@aol.com), September 05, 1999.

