President's Council's view on Embedded Systems

greenspun.com : LUSENET : Electric Utilities and Y2K : One Thread

I got this from John Koskinen today in response to an email. I will leave its analysis to those brighter than I.

(To the extent that it says, in part, that problems exist in some embedded systems even though dates aren't expressly used, it undermines the suggestion that untested embedded systems in satellites are OK "because they don't use dates.")

Memorandum from John A. Koskinen

Subject: Summary of Discussion with Embedded Systems Technicians

Attached is a summary of a recent discussion that I organized among technical experts who have been working on testing and fixing embedded systems across the manufacturing, electric power, oil, gas, telecommunications, shipping, bio-medical device, and defense industries. It outlines the discussion at the meeting and makes a number of generally agreed-upon statements concerning the types of embedded systems that have Y2K risk, when embedded systems will fail, difficulties in testing for Y2K problems in embedded systems, and risks of not fixing the problem in advance. Those at the meeting reported that the organizations they are working with have addressed all these facets of testing and fixing the problem, but that they remain vigilant about changes in manufacturers' statements of compliance.

I believe that the discussion and agreed-upon statements are important for all those working on the embedded system problem to hear. Therefore, I am forwarding this summary to you and the other working groups. Feel free to disseminate it further.

Attachment

PRESIDENT'S COUNCIL ON YEAR 2000 CONVERSION

MEETING ON Y2K EMBEDDED SYSTEMS

Tuesday, November 9, 1999

American Society of Association Executives Building, 1575 I Street, Washington, DC

Participants in the meeting included technicians who had done work in the bio-medical, defense, electric power, gas, manufacturing, oil, shipping, and telecommunications industries. To help with the discussion, an agenda was provided with discussion statements concerning the types of embedded systems potentially at Y2K risk, difficulties in testing such embedded systems, and fixes for problems found. Those statements were revised during the meeting, and the agreed-upon final statements are presented below, along with a brief summary of the discussion that led to each final statement.

Types of embedded systems found to have a Y2K risk:

Final Statement: Embedded systems are at risk of problems during Y2K rollover if they conduct a calculation that depends on a representation of the date. The date could be in relative or absolute form.

The participants presented a number of specific cases where they had found Y2K problems in embedded systems. Several of these involved calculations of time increments inside an embedded system without the date being displayed or apparently used. In these instances, an embedded system calculates the time interval by subtracting seconds from seconds, minutes from minutes, hours from hours, and calendar dates from calendar dates.
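The interval calculation described above can be sketched as follows. This is a minimal illustration, not code from any device discussed at the meeting: it assumes a hypothetical controller that stores the year as two digits and silently treats it as 19xx. The names `to_datetime`, `elapsed_seconds`, and the `pivot` parameter are invented for the example; the `pivot` option shows "date windowing," one commonly used remediation, for comparison.

```python
from datetime import datetime

def to_datetime(yy, month, day, hh, mm, ss, pivot=None):
    """Expand a two-digit year into a full datetime.

    pivot=None mimics a hypothetical device that hard-codes the 19xx century.
    With a pivot (date windowing), two-digit years below the pivot map to 20xx.
    """
    if pivot is not None and yy < pivot:
        year = 2000 + yy
    else:
        year = 1900 + yy
    return datetime(year, month, day, hh, mm, ss)

def elapsed_seconds(start, end, pivot=None):
    """Interval between two (yy, mo, dd, hh, mi, ss) tuples, in seconds."""
    return (to_datetime(*end, pivot=pivot) - to_datetime(*start, pivot=pivot)).total_seconds()

start = (99, 12, 31, 23, 59, 0)  # 23:59:00 on 31 Dec 1999
end = (0, 1, 1, 0, 1, 0)         # 00:01:00 on 1 Jan 2000, stored as year "00"

print(elapsed_seconds(start, end))            # large negative interval (year "00" read as 1900)
print(elapsed_seconds(start, end, pivot=50))  # 120.0
```

Across the rollover, the two-digit version returns an interval of roughly minus one hundred years instead of two minutes; how a given controller reacts to such a value depends entirely on its own logic.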

All except one of the examples were large, complex processes where embedded systems interrelate with each other and, in some cases, with external computer systems. The exception was a stand-alone embedded system, unconnected to any others, that did not apparently involve dates. That example led to a discussion about the need for a continuous power source to be available for any such device to function, and it was pointed out that in some sectors there are many such devices, but that few problems had been found in them.

There was considerable discussion of potential failure rates of embedded systems. Estimates ranged from a 1-2% potential failure rate of processes containing embedded systems in some sectors to 4-6% in others, but no conclusion was reached. An important distinction was made between failure of an embedded system, which may not cause a process or device to fail in operation, and failure of a process or device due to an embedded system. The estimates above refer to the former; the latter is much less prevalent.

The remainder of the discussion during the meeting focused on large, complex processes that contain embedded systems. The question of having a real-time clock or access to a clock was discussed, and examples were presented where the time was set by a process controller and transmitted to other embedded processors involved in the process. Other examples of problems were discussed where time was apparently used to calculate relative increments (e.g., day of the week) as opposed to absolute dates.
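A day-of-the-week calculation is one example of relative-increment logic that can fail even when no date is ever displayed. The sketch below assumes a hypothetical device that counts days from a fixed reference date (1 January 1900, a Monday) and assumes the stored two-digit year belongs to the 1900s; `day_of_week_yy` is an illustrative name, not taken from any real product.

```python
from datetime import date

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

def day_of_week_yy(yy, month, day):
    """Hypothetical device logic: count days from 1 Jan 1900 (a Monday),
    assuming the stored two-digit year is 19yy, and take the count mod 7."""
    days = (date(1900 + yy, month, day) - date(1900, 1, 1)).days
    return DAY_NAMES[days % 7]

print(day_of_week_yy(99, 12, 31))             # Friday (correct for 31 Dec 1999)
print(day_of_week_yy(0, 1, 1))                # Monday, because "00" is read as 1900
print(DAY_NAMES[date(2000, 1, 1).weekday()])  # Saturday, the actual weekday
```

A weekly schedule keyed to this result would treat Saturday, 1 January 2000 as a Monday.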

When embedded systems will fail:

Final Statement: Where possible, all mission critical systems should be tested end-to-end, whether or not the systems appear to have date-sensitive functions. Failure to do so means a small level of risk has been assumed that, at minimum, should be addressed with a contingency plan.

The discussion that led to this statement began with a presumption that embedded systems involved in calculating time increments, as well as those that apparently computed dates, are at Y2K risk. During the discussion, a statement that mission critical systems should be tested whether or not they have a date function was nearly agreed to, until it was pointed out that those types of devices can only be tested with end-to-end testing.

This statement focused on mission critical systems because it is difficult and expensive to conduct such testing. The term "mission critical systems" was used to include safety critical systems as well as other systems where the cost of failure would be high. Thus, while the statement says the risk of failure is low, the impact of any such failure would be high. The statement also recommends a contingency plan to help mitigate risk; such a plan should not be viewed as an alternative to testing, because detection of a failure may be difficult and a failure could cause substantial collateral damage before it is detected.

Final Statement: The majority of failures of embedded systems are expected to occur on or about December 31st through January 1st. However, simply turning a system off during that time frame is generally not a solution.

The discussion explored whether the time of primary risk of failure was the rollover period. It was generally agreed that the vast majority of failures in embedded systems are likely to occur over that period. On the specific question of whether midnight Greenwich Mean Time would be a time of high failure, it was stated that most failures would likely occur at 12:00 midnight local time, although some would also occur at midnight Greenwich time. During the discussion, a concern was raised that the statement might lead to the ineffective solution of turning off systems during the rollover period. Therefore, the specific admonition not to rely on that work-around was included in the statement.
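Why local time and Greenwich time give two different failure moments at one site can be shown with a short calculation. The sketch below computes, for a few example locations, the Greenwich (UTC) instant at which a clock kept in local time reaches 00:00 on 1 January 2000, and the local time at which a clock kept in GMT does. It assumes fixed January standard-time offsets via Python's `datetime.timezone`; the zone list and helper names are hypothetical, chosen only for illustration.

```python
from datetime import datetime, timedelta, timezone

# Fixed standard-time UTC offsets in effect in January (an assumption for
# illustration; a real device takes its offset from its own configuration).
ZONES = {
    "Tokyo": timezone(timedelta(hours=9)),
    "London": timezone.utc,
    "New York": timezone(timedelta(hours=-5)),
    "Los Angeles": timezone(timedelta(hours=-8)),
}

def local_rollover_in_utc(tz):
    """UTC instant at which a clock kept in local time `tz` reaches 2000-01-01 00:00."""
    return datetime(2000, 1, 1, tzinfo=tz).astimezone(timezone.utc)

def utc_rollover_in_local(tz):
    """Local time in `tz` at which a clock kept in GMT/UTC reaches 2000-01-01 00:00."""
    return datetime(2000, 1, 1, tzinfo=timezone.utc).astimezone(tz)

for name, tz in ZONES.items():
    print(f"{name:12} local rollover at {local_rollover_in_utc(tz):%Y-%m-%d %H:%M} UTC; "
          f"GMT rollover at {utc_rollover_in_local(tz):%Y-%m-%d %H:%M} local")
```

For a site in New York, for example, a device keeping GMT would roll over at 19:00 local time on 31 December, five hours before the local-time devices around it.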

Final Statement: One can have two apparently identical systems, of which one will not have a Y2K problem but the other will have operating difficulties. However, the chances of this are small.

The likelihood of failure of one of two identical systems, as described in this statement, was considered to be very small, but, again, it was agreed that all mission critical systems needed to be tested.

Difficulties in testing for embedded systems at risk:

Final Statement: Organizations that have relied on a device manufacturer's declaration of Y2K compliance are at risk if they do not keep up with the manufacturer's most recent statements.

The discussion concerned cases where testing had brought into question manufacturers' statements of the readiness of their products. A number of instances were cited where problems had been found both by users who had tested the products and by the manufacturers themselves. While the changes needed to remedy such problems have normally been made available quickly, concern was expressed that many organizations were not aware of or taking advantage of those fixes.

Final Statement: Some interconnection problems among embedded systems can only be revealed by end-to-end testing.

The discussion concerned how to test for problems in embedded systems. There was considerable discussion of difficulties of testing in operational environments and the risks and complexities of end-to-end testing. However, a number of examples were cited to show that one could not find all potential problems in complex, interconnected embedded processes without end-to-end testing.

Fixes:

Final Statement: Anyone taking a fix-on-failure approach for Y2K, particularly with embedded systems, runs a significant risk of collateral damage and a difficult recovery.

There was little discussion leading to this statement. Remedying the kinds of Y2K problems participants had found in embedded systems was difficult and time-consuming.

Final Statement: After a full and careful technical assessment, there may be administrative or operational workarounds to many Y2K problems involving embedded systems.

While simply turning a system off during the rollover is not normally an effective administrative work-around, in some instances it could be. Similarly, setting the year back so that Y2K does not occur may be a work-around in some instances. However, before using these or any other ways to work around the Y2K problem, all agreed that a thorough assessment of the full implications of the work-around was necessary.
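One way to pick a year for the "set the year back" work-around, assuming the goal is to keep day-of-week logic correct, is to choose a past year whose calendar is identical to 2000's: the same weekday for 1 January and the same leap-year status. The memo does not say how far back to set a clock; `same_calendar_years` is a hypothetical helper, and, as the participants stressed, the full implications (interval calculations, logs, interfaces to other systems) would still need a thorough assessment.

```python
import calendar
from datetime import date

def same_calendar_years(target, start=1901, stop=2000):
    """Years in [start, stop) whose calendar matches `target`'s exactly:
    same weekday for 1 January and same leap-year status."""
    t_weekday = date(target, 1, 1).weekday()
    t_leap = calendar.isleap(target)
    return [y for y in range(start, stop)
            if date(y, 1, 1).weekday() == t_weekday and calendar.isleap(y) == t_leap]

print(same_calendar_years(2000))  # [1916, 1944, 1972]
```

Within 1901-2099 the Gregorian calendar repeats every 28 years, which is why the candidates are spaced 28 years apart.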

Final Statement: Even those that have conducted thorough testing need to develop contingency plans for mission critical processes and exercise them.

There was little discussion of this statement, given the earlier statements about the risk of Y2K problems.

-- Anonymous, December 02, 1999

Answers

F. Snyder, an excellent post, thanks. It is refreshing to see Koskinen using the expertise cited. FYI, I find this information very credible overall; I only wish that it offered more explanatory details on several of the subjects (i.e., the basis for some of the statements) for those who aren't quite as familiar with embedded systems. Interestingly enough, I have already seen some of the statements from the above work misinterpreted and exaggerated. I would like to go over this item by item when I get a chance, and add my two cents, for what they are worth...;)

Regards,

-- Anonymous, December 02, 1999


Formatting fixed...

-- Anonymous, December 02, 1999

Snyder - thanks for forwarding this communique from the K man.

As with FF, I want to digest this a bit. At first blush, I see very little that is inconsistent with the recent NIST embedded systems alert (which I admit to not having thoroughly read yet...), Dale Way of IEEE's recent missives, or much of Dr. Mark Frautschi's work. All of these works are consistent with something I've said for a long time:

The embedded systems Y2k issue ain't about chips.

It's about large, complex, multi-level control systems, from ISA level 1 to 4 (which I'm not going to explain again, you can search the archives...;-). Too many people have had a microwave oven or VCR mentality about embedded systems.

Anyway, I'll analyze Snyder's post a bit more, and try to incorporate some comments about this, the NIST alert, and Dale Way's stuff into either a post here or a column on energyland.net in the next few days.

Time grows short.

-- Anonymous, December 02, 1999


To the cleanup crew regarding formatting changes: actually, the formatting I had was in the original. I believe there were separations between the general statements and the explanatory comments that followed. (I tried emailing you the original again, but it bounced back.) Snyder Gokey

-- Anonymous, December 02, 1999

"Therefore, I am forwarding this summary to you and the other working groups. Feel free to disseminate it further...."

Can't help but wonder who this memo was originally distributed to. The whole Senior Advisors Group?

What did they know and when did they know it...

-- Anonymous, December 03, 1999


