Companion document for "The "Real" Y2K Problem - Is it Embedded Systems or is it Software?"
greenspun.com : LUSENET : TimeBomb 2000 (Y2000) : One Thread
Fair use : Education and Research only
I am posting this as a companion document for the thread below:
The "Real" Y2K Problem - Is it Embedded Systems or is it Software?
Strategic Analysis Report
16 August 1999
Year 2000 World Status, 2Q99: The Final Countdown
L. Marcoccio, J. Duggan, M. Hotle, A. Kyte, D. Vecchio, A. Di Maio
5.0 Embedded-System Failure Rates
From the link
Year 2000 World Status, 2Q99: The Final Countdown
GartnerGroup
We present our final assessment of progress toward year 2000
readiness, covering enterprises from many different industry sectors
and geographies. During the two years that we have been performing
this survey activity, we have seen many respondents move from naivete
and denial to realization and self-promotion. Fortunately, we have been
able to make full use of GartnerGroup's extensive research network
(vendors, clients and associates) to enable us to make a broad and
comprehensive assessment of the true state of progress.
5.0 Embedded-System Failure Rates
There are distinct and separate year 2000 problems for information systems and embedded systems. Information systems' problems affected the vast majority of business applications and required that significant percentages of code be remediated or replaced. It is easy to understand why information systems process dates, and therefore why they might have problems. Furthermore, IS problems have already started: they will happen over an extended period of time.
The contrast with embedded systems is marked. Year 2000 problems only affect a small percentage of embedded systems. Many people find it difficult to understand why embedded systems process dates, and therefore why they can be vulnerable. Also, embedded systems' problems have not started occurring in any statistically significant numbers: since embedded systems are largely "real-time" systems, those that do have problems are most likely to experience them at or around midnight on 31 December 1999.
In nearly every enterprise, year 2000 awareness and action start with information systems. This tends to lead to a position where the business believes that the IS department is responsible for year 2000 in general. However, IS staff have very little previous experience of real-time control systems, and, where IS staff have been given responsibility for running embedded-system activities, the projects almost inevitably stall. To emphasize that IS staff are unlikely to be able to make a substantial contribution to the embedded-system project, GartnerGroup defines an embedded system as "any electronic system not acquired with the IS budget."
5.1 Real-Time Clocks
Although there are many different types of electronic devices that can be considered embedded systems, one common feature links all the embedded systems that suffer from year 2000 problems: they must have access to a persistent source of date information. This is almost universally supplied by a real-time clock (RTC). An RTC is a device that uses a battery to oscillate a crystal and then counts the oscillations to maintain time and date. An embedded system may have its own RTC, or it may have access to date information by virtue of being network connected to another device that itself has an RTC. Any device that does not have an RTC and is not connected to another device is incapable of suffering a year 2000 failure.
Most RTCs provide a two-digit year. This is in itself not a problem. The function of the RTC is to supply date and time information, but in order for useful work to be done the information must be interpreted by a program. A program may read a two-digit year and quite correctly interpret "00" as "2000." Equally, a program reading a four-digit year may choose only to access the last two digits and misinterpret "00" as "1900." The key point to note is that there is nothing inherently wrong with RTCs that only provide two-digit year data.
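The distinction between the clock's output and the program's interpretation can be sketched in Python. This is only an illustration: the function names and the pivot value of 70 are assumptions, not taken from the report or from any particular device.

```python
def naive_year(two_digit_year: int) -> int:
    """Noncompliant reading: always assume the 1900s."""
    return 1900 + two_digit_year


def windowed_year(two_digit_year: int, pivot: int = 70) -> int:
    """Windowed reading: values below the (assumed) pivot are taken as 20xx."""
    if not 0 <= two_digit_year <= 99:
        raise ValueError("expected a two-digit year in the range 0..99")
    century = 2000 if two_digit_year < pivot else 1900
    return century + two_digit_year


def truncated_year(four_digit_year: int) -> int:
    """Noncompliant reading of a four-digit source: keep only the last
    two digits, then assume the 1900s."""
    return 1900 + four_digit_year % 100


if __name__ == "__main__":
    print(naive_year(0))         # 1900: the same "00" misread
    print(windowed_year(0))      # 2000: the same "00" read correctly
    print(truncated_year(2000))  # 1900: a four-digit clock does not help
```

The same two-digit value from the RTC gives a right or wrong answer depending only on the program that reads it, and a four-digit source offers no protection if the program discards the century.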
5.2 Microcontrollers
The most numerous ES devices are microcontrollers. The particular characteristic of these devices is that they are not programmable: the program is burnt onto the chip at the point of manufacture. While there are billions of these devices in existence, they are very simple devices and are generally not capable of processing complex data like date and time. When people say that there are "chips" in domestic appliances like coffee machines, toasters and irons, what they actually mean is that there are microcontrollers in some of these machines. They are not at risk of year 2000 failures. Based on information received from many clients who have undertaken extensive research in this area, we believe that, at the century boundary, free-standing microcontrollers will experience a year 2000 failure rate of less than one in 100,000 (0.8 probability).
5.3 Microprocessors
Whereas microcontrollers are pre-programmed devices, microprocessors are considerably more complex. These are effectively "computers on a chip": they provide the ability to execute instructions contained in a program that comes from somewhere else. Therefore, microprocessors are neither compliant nor noncompliant: they are passive devices that need a program in order to become active. The typical configuration for a microprocessor is as the heart of a programmable logic controller (PLC). The program the microprocessor will execute will typically be found on a co-mounted chip such as a programmable read-only memory (PROM). It is the program that must be assayed for year 2000 compliance, not the microprocessor.
A PLC with no RTC and no connection to any other device with an RTC cannot generate a date from thin air and should be considered to have the same potential for year 2000 problems as a microcontroller: one in 100,000.
A PLC with no RTC but that is connected to another device (typically a PC) that does have an RTC and that can therefore theoretically pass date information to the PLC in a network message is slightly vulnerable to year 2000 anomalous processing. Information garnered from many clients suggests that, although it is unusual, some such devices can have problems. The numbers are small: through 2001, fewer than 0.25 percent of microprocessors not co-mounted with RTCs will demonstrate year 2000 anomalous processing (0.8 probability).
We use the term "anomalous processing" advisedly. It simply means that some function or process that should be supported by the device will not be supported in the expected manner. This is very different from "fail." A great many problems with embedded systems are cosmetic or minor in nature. For example, a PLC may be connected to a pressure sensor, and one of its functions may be to open a valve if the pressure reaches a certain threshold. A secondary function could be to write an audit record of the event, where the audit record has date and time as part of the information. If such a PLC has a year 2000 problem, it may well still open the valve under the correct conditions but write the date information in an incorrect format. The device is noncompliant, but it is questionable whether it has "failed." To make such a judgment, it would be necessary to discover the ramifications of the incorrect date format in the audit record.
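The pressure-sensor example can be sketched as follows. Everything here is hypothetical (the threshold value, the record layout and the function names); it assumes a program that builds its audit timestamp by prefixing a hard-coded "19" to the two-digit year read from the RTC.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

PRESSURE_THRESHOLD = 150.0  # hypothetical threshold, arbitrary units


@dataclass
class AuditRecord:
    stamp: str
    pressure: float


def format_stamp(yy: int, month: int, day: int, hour: int, minute: int) -> str:
    # Noncompliant: the century is hard-coded as "19".
    return f"19{yy:02d}-{month:02d}-{day:02d} {hour:02d}:{minute:02d}"


def handle_reading(
    pressure: float, clock: Tuple[int, int, int, int, int]
) -> Tuple[bool, Optional[AuditRecord]]:
    """Primary function: decide whether to open the valve.
    Secondary function: write an audit record when the valve opens."""
    if pressure < PRESSURE_THRESHOLD:
        return False, None
    return True, AuditRecord(format_stamp(*clock), pressure)


if __name__ == "__main__":
    opened, record = handle_reading(160.0, (0, 1, 1, 0, 5))
    print(opened)        # True: the valve still opens
    print(record.stamp)  # 1900-01-01 00:05: only the audit date is wrong
```

In this sketch the control decision never touches the date, so the primary function is unaffected; whether the wrong stamp matters depends entirely on what later consumes the audit record.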
The question inevitably arises: Why would a microprocessor process dates? The most common date-processing function in real-time control systems is "interval timing," that is, calculating the interval in time between two events. For instance, a train goes through a set of points at Time A. Another train goes through a set of points at Time B. The PLC has to make a decision based on the time interval between the two events: e.g., if less than 20 minutes, perform Action X; otherwise, perform Action Y. There are many ways of programming this function. Where an RTC is available, the time of Event A could be captured and stored in a register, so that the time of Event B can be captured and the calculation made. Because nearly all RTCs support date as well as time, the programmer may store date and time in the registers.
Why is this of particular interest when considering year 2000 problems in PLCs? Because GartnerGroup has identified that many of the year 2000 problems in interval-timing algorithms can only ever occur if the first event (Event A) occurs in "99" and the second event (Event B) occurs in "00." We call this type of problem "transient noncompliance," because, although the PLC program may be noncompliant, the noncompliance can only happen once. In such cases, if the system is inactive at midnight (i.e., there is no Event A with a "99" date waiting for an Event B), the noncompliance will not be activated and the algorithm will function satisfactorily for another 99 years. Transient noncompliance is the most common form of noncompliance in microprocessor devices: at least 7 percent of microprocessors co-mounted with RTCs will demonstrate transient year 2000 anomalous processing at the century boundary (0.7 probability).
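A minimal Python sketch of transient noncompliance, assuming a hypothetical program that stores each event as (two-digit year, month, day, hour, minute) and rebuilds a full date by adding 1900. The function names and the tuple layout are illustrative only; the 20-minute limit follows the train example above.

```python
from datetime import datetime
from typing import Tuple

Stamp = Tuple[int, int, int, int, int]  # (yy, month, day, hour, minute)


def to_datetime(stamp: Stamp) -> datetime:
    yy, month, day, hour, minute = stamp
    # Noncompliant: every two-digit year is assumed to be 19yy.
    return datetime(1900 + yy, month, day, hour, minute)


def choose_action(event_a: Stamp, event_b: Stamp, limit_minutes: int = 20) -> str:
    """Return "X" if Event B follows Event A by less than the limit, else "Y"."""
    interval = (to_datetime(event_b) - to_datetime(event_a)).total_seconds() / 60
    return "X" if interval < limit_minutes else "Y"


if __name__ == "__main__":
    # Both events in "99": 30 minutes apart, correctly Action Y.
    print(choose_action((99, 12, 31, 10, 0), (99, 12, 31, 10, 30)))
    # Event A in "99", Event B in "00": really 30 minutes apart, but the
    # computed interval is hugely negative, so Action X is chosen wrongly.
    print(choose_action((99, 12, 31, 23, 50), (0, 1, 1, 0, 20)))
    # Both events in "00": 30 minutes apart, correctly Action Y again.
    print(choose_action((0, 1, 1, 0, 30), (0, 1, 1, 1, 0)))
```

Only the pair of events that straddles midnight is mishandled. Once both stamps carry "00", the subtraction is internally consistent again even though the absolute dates are a century off.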
It is important to note that there are many other miscellaneous reasons why microprocessors can suffer year 2000 problems. Some of these problems will be persistent. However, our research shows that, through 2001, fewer than 2 percent of microprocessors co-mounted with RTCs will demonstrate persistent year 2000 anomalous processing (0.8 probability).
5.4 Large-Scale Embedded Systems
While microcontrollers and microprocessors correspond to the conventional view of embedded systems as "chips," large-scale embedded systems (LSESs) generally look like much more traditional computers. LSESs are typically PCs or other dedicated computers with traditional configurations involving screens, keyboards, processors and disks. The family of LSESs incorporates supervisory control and data acquisition (SCADA) systems on the factory floor, distributed control systems (DCSs) at the heart of process control, and building management systems (BMSs) controlling heating, ventilation and air conditioning, lighting and security access systems in commercial property.
These systems are typically the hub of a network of lower-level devices, such as PLCs. Since they are based on conventional information systems architecture, with an operating system loaded from disk, with multiple programs that can also be loaded from disk, and complex data stores on disk, they have considerably greater complexity than the two other main families of embedded systems, microcontrollers and microprocessors. All LSESs are date sensitive. Our research shows that, through 2001, at least 35 percent of LSESs will demonstrate anomalous date processing (0.8 probability).
5.5 Embedded-System Comparative Failure Rates
The decomposition of the embedded-systems problem into the three families (microcontrollers, microprocessors and LSESs) shows the futility of attempting to provide a percentage failure rate for embedded systems as a whole. There are tens of billions of microcontrollers, but only tens of millions of microprocessors and only millions of LSESs.
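A rough calculation shows why a single blended rate is uninformative. The device counts below are assumed round numbers chosen only to match the orders of magnitude given above, and the microprocessor rate applies the 2 percent persistent figure to every microprocessor as a simplification; none of these inputs is a GartnerGroup estimate of totals.

```python
# Assumed populations (illustrative values within the stated orders of magnitude).
population = {
    "microcontroller": 20_000_000_000,  # "tens of billions"
    "microprocessor": 50_000_000,       # "tens of millions"
    "LSES": 5_000_000,                  # "millions"
}

# Rates taken from the figures quoted in this section, used as point values.
rate = {
    "microcontroller": 1 / 100_000,  # less than one in 100,000
    "microprocessor": 0.02,          # persistent, simplified to all devices
    "LSES": 0.35,                    # at least 35 percent
}

affected = {family: population[family] * rate[family] for family in population}
overall = sum(affected.values()) / sum(population.values())

if __name__ == "__main__":
    for family, count in affected.items():
        print(f"{family}: about {count:,.0f} devices affected")
    print(f"blended rate: {overall:.4%}")
```

Under these assumptions the blended rate comes out at a small fraction of one percent, because microcontrollers dominate the denominator, while most of the affected devices are LSESs and microprocessors. The aggregate figure hides exactly the families where the risk is concentrated.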
Figure 12 illustrates some of the essential differences between the three families of devices.
Source: GartnerGroup
Figure 12. Differences Between Microcontrollers, Microprocessors and LSESs
- Data Storage indicates the mechanisms available for persistent retention of data. LSESs can store data on disks and frequently use databases and complex file systems that are highly likely to contain dates.
- Programmable shows that LSESs load programs from disks, so they can potentially run a large number of different programs.
- Manufacturer's Contribution shows that, whereas the microcontroller manufacturer knows all there is to know about the year 2000 compliance status of the device, the microprocessor manufacturer knows very little because the compliance status is determined by the program that is run on the PLC, not by the basic device itself. LSESs can run many programs from many different sources. The manufacturer may have some information about the basic operating system or firmware compliance, but at the application layer the manufacturer has zero contribution to make.
- Date Sensitive shows that all LSESs have RTCs and process date-related information.
- Real-Time Links Up To shows the integration hierarchy. Microcontrollers tend to communicate with microprocessors, which in turn tend to communicate up to LSESs, which in turn communicate with information systems.
- Inventory shows that, while it is impossible (or, at best, very tedious) to create an inventory of microcontrollers and microprocessors, it is comparatively easy to inventory LSESs. They are large devices, and their whereabouts are known.