Are power plants (and others) testing applications or the embedded chips themselves for y2k problems?

greenspun.com : LUSENET : Electric Utilities and Y2K : One Thread

Would someone please clear up a confusing point?

If a non-compliant chip is used in an application that doesn't need to know the date (i.e. the chip has greater capabilities that are not being used), will the y2k roll-over cause the chip to "freeze", thereby freezing the entire application? In other words, if the chip has an RTC that is not being used by the application because it doesn't need to know the date, would the "00" cause the firmware in the chip to freeze, which in turn would cause the chip to stop servicing the application, which in turn would/may crash the application? If the firmware only spews out a bad date without freezing, then it is a non-event because the application doesn't care what date it is anyway. I suspect the answer lies in how the firmware was coded. But how do you test a chip's firmware for compliance? It is my understanding that firmware cannot be modified like software, so how do you advance the date in the chip's firmware to test it? Is it possible to reset the clock in the chip itself?

Ultimately, my concern is that people testing critical systems, particularly in critical industries such as power, may be focusing on whether an application is date-sensitive as opposed to whether the chip itself will be operational on 01/01/2000.

-- Anonymous, November 20, 1998

Answers

Chips, unlike software, usually have a very detailed spec sheet available in the manufacturer's data book. That's a good place to start.

Jim

-- Anonymous, November 20, 1998


Ralph, I can't answer your specific technical questions. Hopefully someone else will. I can provide this analysis which speaks to the subject of testing embedded systems. It's from the "Embedded Systems and Year 2000 Problem" paper at:

http://www.tmn.com/~frautsch/y2k2.html

"14. Beginning at the "black box" or "device" level it is appropriate to examine the individual embedded system from as many as ten technological viewpoints. These are chips and microcode, pre-manufacture custom functionality, post-manufacture custom functionality, interfacing of devices, drivers, operating systems, vendor-supplied application libraries, user defined functionality, user integration of systems and devices and the business processes associated with system use. In short, the manufacture and configuration of the embedded system and its application contain factors affecting overall Year-2000-compliance. Please see R. Strem and M. Smith: http://www.esofta.com/pdfs/Y2KEmb.pdf 12 December 1997."

-- Anonymous, November 20, 1998


In the following thread from the EUY2K forum: "Are power plants (and others) testing applications or the embedded chips themselves for y2k problems?" 11/20/1998

Yes, all of what you are saying is possible. And, in some percentage of systems it is highly probable.

I can give you this much factual information in a public forum: every single system that I have been personally involved with or about which I have received first hand news from a close colleague, that has involved IBM PC compatible embedded systems or heads, has _REQUIRED_ remediation in order to function correctly. These systems have included: multiple PC based business servers, two telecommunications product lines, 1 telecommunications PC client program, multiple nuclear power plant monitoring systems. And that's just in the last few months.

This by no means represents a meaningful statistical sample, but I don't need a PhD in Applied Math to know that five out of five is alarming.

The following are hypothetical examples, but I guarantee you that a variation of each of these is highly likely to (read: _will_) occur.

Example #1: A PC compatible embedded systems board is being used inside a plant's Programmable Logic Controller (PLC). If the PLC goes down, the particular plant process that it is part of goes down too. This PLC gives no visible clue that it uses dates in any way. It is almost completely stand-alone. In fact, it is so simple and so obviously doesn't use dates or times that it is not included in the Y2K inventory and assessment. The plant technician conducting the inventory scratches his head and wonders why on earth such an expensive piece of equipment is being used for such a mundane task. All it does is measure the temperature of a step-up transformer and raise an alarm if the temperature goes out of range. Unfortunately, the technician isn't aware that this PLC is PC based.

After midnight 12/31/1999 the software clock happily rolls over to 00:00:01 1/01/2000 and the hardware clock happily rolls over to 00:00:01 1/01/1900. The PLC keeps humming along fine. At 00:15 the power goes out due to a deliberate temporary power outage in order to balance disturbances elsewhere in the power grid. A few minutes later, at 00:21, the power comes back on. The PLC starts its boot procedure. During the Power On Self Test (POST) of the Initial Program Load (IPL), the BIOS (Basic Input/Output System -- in firmware) reads the hardware clock and sees that the year is set to "00". The BIOS code has been programmed to "know" that this is not a valid date and drops the PLC into the BIOS setup screen. Of course there's no display monitor or serial terminal attached, so there's no obvious visual clue as to what's wrong. The PLC is simply not booting up. No amount of re-booting will change the behavior.

A PLC (PLC2) further down the line (that has come up successfully) times out on our rogue PLC (PLC1). PLC2 had been programmed to raise an alarm if it had not received a "temperature OK" update from PLC1 within 5 minutes of a power-up reboot (3 minutes for the systems to reboot, 1 minute to initialize and 1 minute to give the first status). At 00:26 the main SCADA computer shuts the plant down in fail-safe mode because it has been unable to get a critical temperature data point (our PLC).
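To make the failure chain in Example #1 concrete, here is a minimal Python sketch of it. It is an illustration only, not code from any real BIOS, PLC or SCADA product; the function names and the "a stored year of 00 is invalid" rule are assumptions taken from the hypothetical scenario above.

```python
from datetime import datetime, timedelta
from typing import Optional


def post_accepts_year(rtc_year_two_digit: int) -> bool:
    """Hypothetical BIOS POST check: a stored year of 00 is treated as invalid."""
    return rtc_year_two_digit != 0


def plc1_boots(rtc_year_two_digit: int) -> bool:
    # If POST rejects the date, the board sits in the setup screen
    # and the temperature application never starts.
    return post_accepts_year(rtc_year_two_digit)


def scada_shutdown_time(power_restored: datetime,
                        rtc_year_two_digit: int,
                        timeout: timedelta = timedelta(minutes=5)) -> Optional[datetime]:
    """Return when the plant is shut down, or None if PLC1 reports in time."""
    if plc1_boots(rtc_year_two_digit):
        return None
    # PLC2 waits the full timeout for a "temperature OK" that never arrives.
    return power_restored + timeout


restored = datetime(2000, 1, 1, 0, 21)
print(scada_shutdown_time(restored, 0))    # 2000-01-01 00:26:00
print(scada_shutdown_time(restored, 99))   # None
```

The point of the sketch is that nothing in the application itself uses the date; the only date check is in the boot path, and it only runs after a power cycle.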

Example #2: We start with the above scenario but make our PLC a little more sophisticated. It is a renovated model that has been declared Y2K compliant by our hypothetical vendor "Surelywell Controls". This enhanced version of our previous PLC integrates the temperature over a period of one minute and then sends the result to the downstream PLC with a timestamp. This system had been inventoried and assessed, and the PLC had been upgraded to a Y2K compliant model from Surelywell Controls. The system had even been _tested_ and _passed_. But it is still going to fail. Why? Two reasons: the local time vs UTC problem (see below) and the "sleeping" code problem.

In this system the PLC operating system is running using local time with the hardware clock using UTC (or GMT) time. So at 00:00:00 UTC it is actually only 19:00:00 12/31/1999 local time in New York. When this system passed the Y2K test, the tester used the BIOS setup to set the date and time. This sets the hardware clock, not the OS software clock. So when the Y2K test was run the internal software time and date was actually 19:00 12/31/1999, NOT 00:00 01/01/2000. This sets up the first part of the failure: insufficient or not well understood testing.

The second part of the failure is caused by "sleeping" code. This PLC is only transmitting the temperature and a timestamp, not the year. Furthermore, it's only integrating the temperature over 1 minute, so how can the year be a factor? Because the code originally transmitted the year and the time, but a code change early in development or an Engineering Change Order (ECO) after deployment required that the year be removed. Fine, the programmer simply changes the piece of the code that used to transmit the date and time to only transmit the time. All the code that did "whatever" manipulation with the date is still there and the code path is still executed; it's just that the final result is not transmitted. When the hardware clock rolls over to 05:00 01/01/2000 the software clock will roll over to 00:00 01/01/2000 EST and the Y2K bug in the "sleeping" code is hit and the PLC faults causing it to fail and hence causing the system to fail.
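A "sleeping" code path of the kind described in Example #2 might look something like the following Python sketch. This is a made-up illustration, not code from any real PLC: the correction table, the `installed_yy` parameter and the message format are all hypothetical. The year is no longer transmitted, but a leftover calculation on the two-digit year still runs on every message.

```python
# Hypothetical per-year sensor correction table, indexed by years in service.
AGE_CORRECTION = [0.0, 0.1, 0.2, 0.3, 0.4]


def build_message(temp_sum: float, samples: int,
                  yy: int, hh: int, mm: int,
                  installed_yy: int = 97) -> str:
    """Build the status message: time of day plus the one-minute average."""
    average = temp_sum / samples

    # "Sleeping" code left over from when the year was transmitted.
    # With two-digit years: 99 - 97 = 2, but 0 - 97 = -97.
    age = yy - installed_yy
    _no_longer_sent = AGE_CORRECTION[age]   # still evaluated; faults when age is -97

    # Only the time and the temperature go out on the wire.
    return f"{hh:02d}:{mm:02d} {average:.1f}"


print(build_message(1200.0, 60, 99, 23, 59))   # 23:59 20.0

try:
    build_message(1200.0, 60, 0, 0, 0)          # first message after rollover
except IndexError as fault:
    print("PLC fault:", fault)
```

In this sketch the fault shows up as a Python `IndexError`; the general point is that inspecting the transmitted message would never reveal that the year is involved.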

In a follow up investigation we find that in the fine print Surelywell Controls only claimed the PLC hardware, PLC firmware and PC BIOS to be Y2K compliant. They specifically spelled out that they could not warrant or be held liable for defective PLC application code. They also gave a polite warning that there was no substitute for thorough end-to-end systems testing after the new Y2K compliant PLC had been deployed.

Variations on a theme: All of the above scenarios are complicated by several subtle factors. If the operating system and/or application is using local time in software with the hardware clock programmed to UTC (GMT) then the "Y2K" problem (in the western hemisphere) will begin to manifest itself some hours _before_ midnight (19:00 12/31 on the east coast of the states and 16:00 on the west coast). If the Y2K test is performed with the hardware clock set to 00:00 01/01/2000 then the local time in software will be something like 19:00 12/31/1999 -- not a valid test by itself. The reverse is also true. If the tester sets the local time to 00:00 01/01/2000 for the test then the internal hardware clock will be set to 05:00 01/01/2000 and the previous condition hasn't been tested. For thorough testing you have to test both scenarios.
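The two test scenarios above can be checked with a few lines of Python. This is only a sketch of the arithmetic, assuming a fixed offset of UTC minus 5 hours for US Eastern Standard Time; the function names are hypothetical.

```python
from datetime import datetime, timedelta

EST_OFFSET = timedelta(hours=-5)   # assumed: local time = UTC - 5 hours


def software_clock(hardware_utc: datetime, offset: timedelta = EST_OFFSET) -> datetime:
    """Local time the OS sees for a given hardware (UTC) clock setting."""
    return hardware_utc + offset


def hardware_clock(local: datetime, offset: timedelta = EST_OFFSET) -> datetime:
    """Hardware (UTC) clock value that corresponds to a given local time."""
    return local - offset


midnight = datetime(2000, 1, 1, 0, 0)

# Scenario A: tester sets the hardware clock to midnight via BIOS setup.
print(software_clock(midnight))   # 1999-12-31 19:00:00

# Scenario B: tester sets the local (software) time to midnight.
print(hardware_clock(midnight))   # 2000-01-01 05:00:00
```

Neither setting exercises both rollovers at once, which is why each clock needs its own rollover test.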

Also, there is absolutely no accounting for an application programmer's creativity. I have witnessed with my own eyes code where the programmer had bypassed all operating system APIs and BIOS calls and written his own "hardwired" library routine that read the hardware clock directly. The routine of course did not correct for the Y2K hardware bug in the RTC. Why did this programmer write this routine instead of using the documented OS or BIOS methods? Who knows? But I can give you two real reasons that happen every day. 1> He couldn't find the OS or BIOS manual that documented the proper call to use, but he did happen to have a copy of the IBM PC/AT tech reference on his shelf which explains -- in gory detail -- how to talk to the RTC hardware. So, it was easier for him to write his own routine (doesn't have to leave his desk) instead of perhaps spending hours trying to locate the proper documentation. 2> Because it's FUN! Programmers get a kick out of developing their own code. In fact, given a choice, many embedded programmers (especially junior ones) would rather code everything themselves than use "someone else's stuff".
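As a rough illustration of what such a "hardwired" routine gets wrong, the Python sketch below simulates decoding the year from a PC/AT-style RTC, which keeps the year as two BCD digits. It does not touch real hardware, and it is not the code described above; the function names are hypothetical, and the pivot of 80 in the corrected version is just one commonly used windowing assumption.

```python
def bcd_to_int(value: int) -> int:
    """Decode one packed-BCD byte, e.g. 0x99 -> 99."""
    return (value >> 4) * 10 + (value & 0x0F)


def naive_year(rtc_year_bcd: int) -> int:
    # What a hand-rolled routine might do: assume the century is always 19.
    return 1900 + bcd_to_int(rtc_year_bcd)


def windowed_year(rtc_year_bcd: int, pivot: int = 80) -> int:
    # One possible correction (assumed pivot): 80..99 -> 19xx, 00..79 -> 20xx.
    yy = bcd_to_int(rtc_year_bcd)
    return (1900 if yy >= pivot else 2000) + yy


print(naive_year(0x99), naive_year(0x00))         # 1999 1900
print(windowed_year(0x99), windowed_year(0x00))   # 1999 2000
```

A routine like `naive_year` works for years of testing and deployment and only goes wrong when the stored year wraps from 99 to 00.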

Although the above examples are hypothetical they are based on real world systems that I am personally aware of. I know for a fact that there are systems deployed that many people depend on that have problems similar to these and they will NOT be fixed. It's not that there isn't time to fix them, or that they can't be fixed, it's simply that there was an executive decision to not fix them. The systems are obsolete, they will not be repaired. If you want to be compliant you have to upgrade.

Conclusions: 1> Lots of stuff is going to break and there's nothing we can do about it. 2> Any scenario you can imagine probably can and will happen.

You may ask, "On what authority do I speak?" I am a software systems engineer with over 15 years experience. For 10 of those years I have been a distributed and embedded systems specialist. What does that mean? It means I am not a specialist in desktop applications. It means I am a specialist in complex multi-processing, multi-tasking, realtime, distributed network systems. I know what I am talking about.

For more info see my comments in these threads:

http://www.greenspun.com/bboard/q-and-a-fetch-msg.tcl?msg_id=000Da8

http://www.greenspun.com/bboard/q-and-a-fetch-msg.tcl?msg_id=000BzM

Regards,
A. J. Edgar
Manager, Systems Software
Centigram Communications Corp.

Disclaimer: In this forum I speak only for myself based on my own personal experience. I in no way, shape or form speak for my employer.

-- Anonymous, November 20, 1998


Jim,

Good information. Unfortunately, it's not very encouraging when the spec sheet says "NOT Year 2000 ready and a fix is NOT planned. A resolution (replacement, upgrade) will NOT be provided".

Ref: http://www.mot-sps.com/y2k/black_prod.html

Many hundreds of millions of embedded systems use those chips.

If you had a million dedicated specialists and they could somehow miraculously fix one embedded system per day, they could not fix everything in the time remaining. In fact, they couldn't even inventory and assess it all in the time remaining.

--AJ

-- Anonymous, November 20, 1998


Following up on my own post again... :-)

Late breaking news:

MAINSTREAM COMPUTER MANUFACTURERS FAIL Y2K COMPLIANCE TEST

In a move that's sure to surprise the Y2K skeptics who think any computer you purchased since 1995 should be OK, Federal Computer Week recently conducted Year 2000 compliance tests on brand new 450 MHz Pentium II PCs.

The results? According to the report: "The systems from Compaq Computer Corp., Micron Electronics Inc., Hewlett-Packard Co. and Gateway Inc. failed the CMOS/RTC test."

Read that again: computers you buy off the shelf *right now* from Compaq, Micron, HP and Gateway fail the CMOS/RTC test (real time clock). They are not fully Y2K-compliant.

Story at: http://www.fcw.com/ref/hottopics/y2k.htm

--AJ

-- Anonymous, November 20, 1998



Mr. Edgar's responses seem particularly important for utilities since some of their embedded systems are PC based. From what I can see:

1. Some of these PC embedded systems will be missed in the inventory phase.
2. Some that are "fixed" might not be fixed for the real time clock (RTC) associated problems.
3. It will be difficult to know the extent of this problem until after the year 2000.

Some utilities, like San Onofre(sp?) in CA, have reported up to 160,000 chips and/or embedded systems. If just one out of 100 is PC based and if one tenth of these were missed in inventory, you could have up to 160 potential failures in one plant alone.
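The arithmetic behind that figure can be written out as a short Python sketch, which also makes it easy to plug in other guesses. The one-in-100 and one-in-10 ratios are the speculative values from the paragraph above, not measured data, and the function name is hypothetical.

```python
def missed_pc_systems(total_systems: int, one_in_pc_based: int, one_in_missed: int) -> int:
    """Estimate PC based systems missed in inventory, using 'one in N' ratios."""
    pc_based = total_systems // one_in_pc_based   # 160,000 / 100 = 1,600
    return pc_based // one_in_missed              # 1,600 / 10 = 160


print(missed_pc_systems(160_000, 100, 10))   # 160
```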

These numbers are speculative. What we need are some estimates for the percent of PC based systems in a utility. It would also help to have an estimate of the percent that could be missed in inventory. Can someone help with these estimates?

-- Anonymous, November 21, 1998


Steve:

You might want to check Mr. Edgar's previous post (go to Embedded systems thread, then click on "Will Software override RTC function"). These posts may give you a better understanding of Mr. Edgar's perspective on the IBM/PC embedded systems issue, in terms of numbers and projections.

-- Anonymous, November 21, 1998


A question for Andrew J. Edgar:

From example #1:

After midnight 12/31/1999 the software clock happily rolls over to 00:00:01 1/01/2000 and the hardware clock happily rolls over to 00:00:01 1/01/1900.

From example #2:

When the hardware clock rolls over to 05:00 01/01/2000 the software clock will roll over to 00:00 01/01/2000 EST and the Y2K bug in the "sleeping" code is hit and the PLC faults causing it to fail and hence causing the system to fail.

Reading thru this I noticed that the hardware clock rolls over to 1900 in the first example, but to 2000 in the second example. Is this a typo? (2000 presumably would not trigger the Y2K bug??)

(This question also posted in the TimeBomb 2000 (Y2000) Q&A Forum, thread: Utility crosspost)

-- Anonymous, November 24, 1998


Tom Carey wrote: "Reading thru this I noticed that the hardware clock rolls over to 1900 in the first example, but to 2000 in the second example. Is this a typo? (2000 presumably would not trigger the Y2K bug??)"

In the first example we've got a faulty hardware RTC hence it rolls over to 1900 and the software OS clock rolls over to 2000 (there is an assumption here that in the first example there is no time zone set or the system is set to GMT -> UTC == local time).

In the second example we're dealing with a "compliant" PLC that has a new and compliant RTC in it. The fault is caused by the clock having been set using the BIOS and the test technician not being aware that the PLC is using time zone information in software. The software clock is 5 hours behind (EST) the hardware clock (UTC) so the bug in the software is not hit until the hardware clock gets to 5:00AM 01/01/2000 at which time the software clock hits 00:00 01/01/2000. One second before, when the hardware clock was 4:59:59 01/01/2000 the software clock was 23:59:59 12/31/1999, still in 1999.

Does that help clear it up a bit?

--AJ

-- Anonymous, November 25, 1998

