Factfinder on Hyatt
greenspun.com : LUSENET : TimeBomb 2000 (Y2000) : One Thread
Thanks for your critique of Hyatt's article on Embedded chips.
As I said before on this forum, the standard design on these chips is to calculate time elapsed based on the real time clock. Dates are for humans, the chip isn't interested in what year it is when it is doing calculations based on millisecond and second time intervals.
Let's be realistic. It is easier to subtract two 8- or 16-bit values than to muck about with subtracting the years, then the months, then the days, then the hours, then the minutes, then the seconds and then the milliseconds and THEN convert that into a single interval value. Give us software engineers some credit please, especially when memory and CPU are not exactly plentiful on these small embedded devices.
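A minimal sketch of that interval arithmetic, written in Python for readability rather than in the C or assembly a small device would actually run. The 16-bit counter width and the name `elapsed_ticks` are assumptions made for illustration, not taken from any particular chip:

```python
TICK_BITS = 16                      # assumed width of the free-running counter
TICK_MASK = (1 << TICK_BITS) - 1    # 0xFFFF

def elapsed_ticks(start: int, end: int) -> int:
    """Ticks elapsed between two readings of a free-running 16-bit counter.

    Masking the difference gives the right answer even when the counter
    wrapped past zero between the two readings. No calendar date is involved.
    """
    return (end - start) & TICK_MASK

print(elapsed_ticks(100, 350))        # 250
print(elapsed_ticks(0xFFF0, 0x0010))  # 32, counter wrapped between readings
```

The result is only meaningful if the interval being measured is shorter than one full wrap of the counter, which is the usual design constraint for this approach.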
I have also learned from this forum that most of the date processing stuff is done from the main control centre computers which are much more remediable.
And saying that date chips were included does not wash - such programs should also provide access to real time parameters as well as calendar time.
I will readily admit that there may have been some engineers stupid enough to take this approach and someone may even cite these lone and unique examples but attempting to apply it to the majority of devices is a different story.
So if there are problems, please do not blame these little critters.
8 days to go, the time for talking is almost at an end.
Have a Merry Christmas.
-- Shuggy (firstname.lastname@example.org), December 23, 1999
Typical polly spin. Ignore.
-- (email@example.com), December 23, 1999.
"Give us software engineers some credit please"
Any word from the Mars Lander yet?
-- a (firstname.lastname@example.org), December 23, 1999.
For the benefit of the newbies that stumble in here... You should know the latest word on the whole embedded chip debate is from the US Department of Commerce and the National Institute of Standards and Technology along with the Century Corporation (a premier Y2K embedded systems consulting firm). The NIST report indicates that the Pollyanna thinking by folks like "Factfinder" is incorrect and invalid. The NIST report, released November 24th, 1999, indicates that proper testing on embeddeds has not been done. Too many false assumptions (like those made by Factfinder, etc.) have been made, providing a false sense of security. This is also the conclusion of other top embeddeds experts such as the anonymous expert that Jim Lord refers to. John Koskinen and the Presidential council on Y2K also held a meeting on this topic on November 9, 1999. This meeting reached the same conclusions as the NIST report.
Therefore, we can conclude that Factfinder either doesn't understand the embeddeds problem OR Factfinder is just another disinformation agent bent on twisting, manipulating and distorting the very thing he claims to find. Here is the NIST Report Summary: http://www.nist.gov/y2k/embeddedarticle.htm
BACKGROUND
There are two primary areas where embedded systems typically have date problems: 1) the calculation of elapsed times, and 2) date information transmitted from one embedded system to another or to an external system. The elapsed time problem centers on two methods for performing these calculations. One may use date information and the other may not. In method one, elapsed time is computed by subtracting a start time from an end time, e.g., 12:15 p.m. - 12:00 noon = 15 minutes. If the elapsed time rolls over midnight, then the start and end dates are also required to complete the calculation, e.g., 12/30/99 12:15 a.m. - 12/29/99 12:00 midnight = 15 minutes. At midnight on December 31, 1999, the rules change. In a 2-digit year, 99 becomes 00 and the computation is now 01/01/00 12:15 a.m. - 12/31/99 12:00 midnight = ? One of several answers may result depending on how the date computation was carried out. On a system programmed to recognize 00 as 2000, the computation may be performed correctly. On many embedded systems, this may not be the case. It is impossible to say what the answer will be since the program controlling the embedded device may not be available.
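The rollover case can be sketched in Python as follows. This is an illustration under stated assumptions, not the logic of any real device: the function names are hypothetical, the start time is taken as 23:45 on 12/31/99 so that both timestamps are unambiguous, and the two century rules are just two of the "several answers" the report mentions.

```python
from datetime import datetime

def always_1900(yy: int) -> int:
    """Century rule of an unremediated device: every 2-digit year is 19YY."""
    return 1900

def windowed(yy: int) -> int:
    """Century rule of a device that reads 00 through 49 as 20YY (assumed window)."""
    return 2000 if yy < 50 else 1900

def elapsed_minutes(start, end, century_for) -> float:
    """Elapsed minutes between two (yy, mm, dd, hh, mi) timestamps."""
    def to_datetime(stamp):
        yy, mm, dd, hh, mi = stamp
        return datetime(century_for(yy) + yy, mm, dd, hh, mi)
    return (to_datetime(end) - to_datetime(start)).total_seconds() / 60

start = (99, 12, 31, 23, 45)   # 12/31/99 11:45 p.m.
end = (0, 1, 1, 0, 15)         # 01/01/00 12:15 a.m.

print(elapsed_minutes(start, end, windowed))         # 30.0
print(elapsed_minutes(start, end, always_1900) < 0)  # True, about a century negative
```

What a real device does with a large negative interval (clamp it, wrap it, or fault) depends on its firmware, which is the report's point about unavailable source code.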
The second method of elapsed time calculation relies on an epoch, or base date, and a time counter, which is added to the epoch to arrive at a particular date and time. An example of this method is presented elsewhere in this report.
Another problem of Year 2000 testing stems from three areas where date information is transmitted between devices. The first area concerns proprietary data encoding used in many embedded devices built as single units operating with other embedded devices from the same manufacturer. In these cases, the data passed from one embedded device to another cannot be read by testing instruments that were not specifically made by the same manufacturer for testing the devices in question. Third party testing instruments often do not detect the presence of dates in data transmissions that are encoded in proprietary codes. Hence, if a date is not detected, the embedded device may not be tested according to the testing policies used by some large organizations.
The second area involves the windowing solution used in remediation. In windowing, a pivot year is defined to express the interpretation of 2-digit years that belong in the 20th or 21st centuries. For example, a pivot year of 1950 states that all 2-digit years in the range 50 to 99 belong in the 20th century and all 2-digit years in the range 00 through 49 belong in the 21st century. If a sending device uses 1950 as the pivot year, but a receiving device uses a different pivot year, say 1990, then problems arise in the interpretation of the 2-digit years transmitted as part of a date. The sending device may see 89 as belonging in the 20th century, but the receiving device may decide that it belongs in the 21st century, thus throwing any date and time calculations off by a wide margin. The effects of this problem may be exhibited immediately as a failure if future dates are involved in the computations.
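A short Python sketch of the pivot-year mismatch described above. The function name `expand_year` is hypothetical; the pivot values 1950 and 1990 and the transmitted year 89 come from the report's own example.

```python
def expand_year(yy: int, pivot: int) -> int:
    """Expand a 2-digit year using a 4-digit pivot year such as 1950.

    Years at or above the pivot's last two digits stay in the pivot's
    century; years below it are moved to the following century.
    """
    century = pivot - pivot % 100
    if yy >= pivot % 100:
        return century + yy
    return century + 100 + yy

sender_view = expand_year(89, pivot=1950)     # 1989
receiver_view = expand_year(89, pivot=1990)   # 2089

print(sender_view, receiver_view, receiver_view - sender_view)  # 1989 2089 100
```

Both devices are internally consistent; the 100-year disagreement only appears when the 2-digit value crosses between them.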
The third area is based on the premise that repairs may have been made to an embedded device to correct its date and time processing, but the repairs may not have been made or may not have been made in a compatible way to a device receiving information from the repaired device. This can happen in systems where one embedded system controls or synchronizes the operation of other embedded devices, even if not all of the embedded devices have real-time clock calendars.
The effect is that data properly formatted to correct for the Year 2000 problem do not line up with the format expected by the receiving embedded device. For example, an embedded device may transmit a 4-digit year to a device that only understands 2-digit years. A corollary to this problem is the data offset problem whereby the last 2 digits of the year may appear to be valid to the receiving device, but the rest of the information is pushed off alignment by the 2 extra digits representing the century in the expanded date. This situation may not be caught immediately and may have long-term consequences weeks or months after the data becomes corrupted.
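One way the offset problem can play out, sketched in Python. The fixed-width record layout (MMDDYYYY or MMDDYY followed by a 4-digit reading) and both function names are assumptions invented for this illustration; real devices use their own, often proprietary, layouts.

```python
def encode_4digit(mm: int, dd: int, yyyy: int, reading: int) -> str:
    """Repaired sender: MMDDYYYY followed by a 4-digit sensor reading."""
    return f"{mm:02d}{dd:02d}{yyyy:04d}{reading:04d}"

def decode_2digit(record: str) -> dict:
    """Unrepaired receiver: still expects MMDDYY followed by a 4-digit reading."""
    return {
        "mm": int(record[0:2]),
        "dd": int(record[2:4]),
        "yy": int(record[4:6]),
        "reading": int(record[6:10]),
    }

record = encode_4digit(1, 1, 2000, 42)
decoded = decode_2digit(record)

print(record)    # 010120000042
print(decoded)   # {'mm': 1, 'dd': 1, 'yy': 20, 'reading': 0}
```

Nothing here raises an error: the receiver sees a plausible-looking year of 20 and a reading of 0 instead of 42, which is why this kind of corruption can go unnoticed for some time.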
There are several layers in which date and time can come into play in an embedded system. These might include the application software controlling the embedded device, the interface between the real-time clock calendar and the operating system on the embedded device, and data transfers between embedded systems and other devices or external systems. An exhaustive check would include each standalone embedded device and all connected embedded devices or external sources of dates in an on-line end-to-end test. In some cases, such as in the interface between the real-time clock calendar and the operating system, special-purpose testers may be required.
Embedded systems testing is not an easy task to accomplish. Various factors play into this including the following:
Unknown embedded devices located in sealed units and components within components.
Devices with known problems that have not yet been remediated.
Difficulty in working on embedded devices because of the environment, such as those located within hazardous areas.
Hard-wired embedded components that cannot be replaced due to design issues or lack of replacement parts.
Firmware or software that has been patched, but not documented.
Lack of source code for software used in the embedded device.
Lack of a means of setting date and time, i.e., no apparent real-time clock calendar or no data entry mechanism.
Date usage that is not apparent and consequently overlooked.
This last factor is especially pernicious since many embedded devices use real-time clock calendars that were developed during the 1960s, 70s, 80s, and early 90s when the date and time were used as a single string consisting of year, month, day, hours, minutes, seconds, etc. in the form YYMMDDhhmmss. Using this type of embedded device, calculating elapsed times required the use of the date in addition to time.
Later devices provided the date in the form of an epoch or base date, and a counter of elapsed units of time, typically seconds, since the epoch. For example, with a base date of 01/01/80 and a counter with the value 31,536,000 seconds, we could compute the date as 01/01/81. Elapsed time calculations using the counter were performed with a straightforward subtraction of the start count from the end count. The date played no part in elapsed time calculations.
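The epoch-and-counter method can be sketched in Python using the report's base date of 01/01/80. The function names are hypothetical. One detail worth noting: 31,536,000 seconds is exactly 365 days, and 1980 was a leap year, so the standard library places that count on 12/31/80; reaching 01/01/81 takes 366 days, or 31,622,400 seconds.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1980, 1, 1)   # base date from the report's example

def counter_to_date(seconds: int) -> datetime:
    """Convert a seconds-since-epoch counter into a calendar date for display."""
    return EPOCH + timedelta(seconds=seconds)

def elapsed_seconds(start_count: int, end_count: int) -> int:
    """Elapsed time is a plain subtraction of counters; no date is involved."""
    return end_count - start_count

print(counter_to_date(31_536_000).date())   # 1980-12-31 (365 days, leap year)
print(counter_to_date(31_622_400).date())   # 1981-01-01 (366 days)
print(elapsed_seconds(31_536_000, 31_536_900))  # 900
```

The calendar conversion is only needed when a human has to read the date; the interval calculation never touches it.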
Embedded devices that do not apparently use dates in elapsed time calculations are being ignored in some embedded systems testing. This is a major oversight in the testing process since there are still embedded devices in existence that do not use the epoch and counter method, but the older date and time method.
TESTING EMBEDDED SYSTEMS
The elapsed time problem and the data transmission problems often cannot be detected in standalone device testing. Unless the tester has designed cases to test specifically for these situations, there is no guarantee that experimental end-to-end testing will detect these problems. The most accurate way to find Year 2000 problems in embedded systems is to perform on-line end-to-end testing. This is not likely to happen for several reasons.
Because there are so many embedded systems in existence, not every system can be tested before January 1, 2000. In addition, testing individual or connected embedded systems and external sources of dates is a very complex proposition. The fear of damaging systems in an on-line test is probably the greatest deterrent to performing embedded systems tests, but there are methods that can be used to provide a sense of the risk involved in not testing. A suggested method entails triage by assigning a very high priority to those embedded systems in mission-critical or safety-critical applications.
A confusing aspect of the embedded system problem is that embedded systems with real-time clocks are often used to monitor and control other embedded systems that may or may not have their own RTCC. If the controlling system fails because of a Year 2000 problem, then the controlled devices fail by definition. The question remains, did the controlled device fail due to the controlling device failure, or did it fail due to a problem in its own RTCC? There is no way to know without testing.
Some guidelines to use in testing embedded devices and sources for dates include the following:
Test embedded systems individually and also in concert with other embedded systems and external sources of dates in on-line end-to-end tests.
In end-to-end tests, be aware of synchronization issues where multiple embedded devices are controlled by an external device or other embedded system.
If complete end-to-end testing is not possible, individual subsystems can be tested to minimize any risk of system downtime.
Physically check for the existence of a real-time clock calendar. We recommend that manufacturers' statements of Year 2000 compliance or readiness be used only as a last resort to make any determinations since a manufacturer's definition of compliance or readiness may not meet the requirements of a particular environment.
Physical testing can be accomplished through several means appropriate to the device in question. This may be accomplished through external testing instruments, signal analyzers, or test software designed specifically to look for date problems.
An indirect method of end-to-end testing involves setting machine test parameters and observing how the functioning of the machine changes after the embedded system senses these settings. If unexpected results occur, one or more devices may have problems.
Embedded systems can communicate or interoperate with other devices or with external date sources, such as PCs, workstations, databases, user input, or LANs and WANs. The data transfers between the embedded system and the external devices, users, or systems must be checked to determine if dates are being sent to or from the device in question.
Both mainframe and embedded systems should be tested including devices with identical model numbers, even if they were manufactured recently.
Mainframe and embedded system date length compatibility should be tested through 10/10/2000.
This is a primary situation in which modifying an embedded system by moving from a 2-digit to 4-digit year may cause problems in alignment of the data read by an application program. Experience in conducting embedded systems repairs found this situation to be one of the major causes of problems after fixes were effected. Different platforms may use different time and date formats and different methods of computing date/time measurements. Therefore, interactions between different types of platforms should be tested. In any of these cases, if a date or real-time clock calendar, or access to either, is found, the next step is to proceed to remediation.
CONCLUSION
The task of finding, testing, and fixing embedded systems with Year 2000 problems is a complex issue. If an organization waits to perform these tasks after December 31, 1999, then the costs can be much greater. These costs can include repairing collateral damage to systems and equipment from cascading problems and the expense in time and resources needed to find the real cause of the problem. Since there is no way to determine what combinations of factors will actually cause a failure, it may be difficult to determine when a failure has actually occurred. If a determination can be made, it may be possible to fix the problem if repair parts and technicians can be located, and the environment is amenable to making the repairs.
Testing embedded systems can be costly and time consuming, but it must be done. Not all systems have to be tested immediately. The priority should be placed on mission-critical and safety-critical systems. Each embedded system should be tested individually and in concert with interoperating systems and external sources of dates. Testing can be accomplished by looking for and physically testing existing real-time clock calendars, date processing routines in application software, device drivers that process dates, and dates from other external sources that may be communicating with the device under test through local and wide area networks. Applying the guidelines described in this article may give organizations a means of achieving a high degree of confidence in their systems.
-- R.C. (email@example.com), December 23, 1999.
The problem with your article is that it does not "actualise" the problem. The authors do not state how many embedded chips use dates and how many use epoch time (my scenario).
If there were only one mission-critical, inaccessible embedded chip in the whole world, then their article would hold true in theory but count for nothing in practise.
As for someone's comment on the Mars Lander: did we claim to be the only perfect profession?
-- Shuggy (firstname.lastname@example.org), December 23, 1999.
Somehow I doubt Factfinder has "misunderstood" anything, nor do I think you honestly believe he has.
As you make clear, this report was prepared by Century, an outfit in the business of testing and remediating embedded systems. While this gives them unquestionable expertise, it also means they are in the business of selling a service. This is important.
Let's say someone is in the business of inspecting and repairing possible fire hazards in your house. They lay out a long list of possible causes of fires, all of them accurate. They explain that few have made the effort, or are qualified to make the effort, to find and address every possible cause. This is also accurate. They point out that if such a cause exists in your house and you don't do anything about it, your house may burn down. Yup, that's true as well.
NOW, along comes Factfinder to point out that houses very very rarely burn down. Why not? Because all of the possible causes of such fires seldom exist, and even where they do exist they seldom lead to fires, and in those rare cases where they do lead to fires, most such fires are noticed and extinguished rapidly and with minor damage. Factfinder goes on to explain WHY the causes of house fires are so rare.
You will notice, I hope, that there is no contradiction between Factfinder and the people selling the anti-fire service. The latter are explaining what CAN happen, while the former is explaining what DOES happen.
The focus of this forum, on the whole, isn't so much on HOW things will go wrong, as on WHETHER they will go wrong. Just how much trouble are we facing? It ought to be clear that the *frequency* of bad events is far more critical to us than the *mechanisms* of those events. And you won't find a word in the NIST/Century report about how common such problems are. This is a highly noteworthy omission! After all, if you're selling a service to prevent something that almost never happens, you aren't about to mention this, are you? Very bad for sales.
-- Flint (email@example.com), December 23, 1999.
Flint, you're prepped to the max because you believe all the polly spin that you have been spewing at this forum all year? Yeah, right.
-- Yeah, Right Flint!!! (firstname.lastname@example.org), December 23, 1999.