Date sensitive chips in electric utilities

greenspun.com : LUSENET : Electric Utilities and Y2K : One Thread

This is a "Mary Had a Little Lamb in Whole Notes" question.
Why is it necessary for date-sensitive chips to be used in electric utilities? I was not able to provide a satisfactory answer to this.

-- Anonymous, July 17, 1998

Answers

Automation. Plain and simple. As with making automobiles, making power is much cheaper (and potentially more efficient) by replacing the human element with technology, particularly in the case of embedded and date-sensitive controls.
The electric company I worked with for 15 years has jettisoned 20 percent of their workforce in the past few years (3000 people). It's going to be very difficult for them to run all required processes without that extra human element.
For more details on the embedded controls problem, please see www.euy2k.com.

-- Anonymous, July 19, 1998

It seems to me that the makers of switches and controls for electric utilities do not make the computer chips that are in these devices. Therefore, the makers of these devices must order the chips from computer chip manufacturers. I find it impossible to believe that each chip ordered by all the customers of chip makers would be custom designed and built to each customer's specifications. It is more likely that the chip maker would design their chips to do a variety of tasks and thereby cut the costs of fulfilling their customers' needs. Certainly, these "do it all" chips would have to contain the capability of keeping track of time and dates. Therefore, it is possible that ALL devices containing a computer chip could fail at 12:00 a.m. 01-01-2000. If there is a flaw in this reasoning, I would really like to hear about it because this is a scary scenario.

-- Anonymous, July 20, 1998

Kelly, If a technical monograph I recently came across is correct, your guess is about half right. Virtually all chips are indeed made with date capability, but not all use that capability in their current application. The "scary" part is that where these chips don't have a date function in their current application, "the clock is still ticking" internally, but not in sync with the calendar. Apparently what happens is that the chip's internal clock starts up from its "born on" date, or some other default date, and begins the march to '00'. It may hit that date in 2000, or 2001, or even 2005. This "expert" said we can expect chip-related failures over a six-year window. The good news is, if he's right, everything won't go haywire all at once. The bad news is, we really have no idea when they WILL go haywire. So... even if we somehow squeeze through 01-01-00 without total meltdown, there may be more surprises in the (black) box(es). I too welcome a more technically-minded person to comment on this. Thanks!

-- Anonymous, July 20, 1998

Why would an embedded system go down just because the (unused) clock rolls over? When a PC's clock rolls over, it just chugs along with the wrong date.
I thought the basis of the problem was that software -- or firmware in this case -- would not process dates correctly. If the firmware isn't paying attention to the clock, what's the problem?
Yes, I think some embedded systems are going to make life rough for a lot of people. But unless shown otherwise, I have to doubt that systems that don't use their clocks will be among them.

-- Anonymous, July 21, 1998

The question "Why would an embedded system go down just because the (unused) clock rolls over?" is a good one. And remember, only a small percentage (1%, 3%, 5% seem to be touted a lot) will fail.
One needs to understand that when an event occurs that the "computer" (such as an embedded system) cannot handle, this is then treated as what is known in the computer biz as an "exception". What happens then depends on how the "exception handler" -- the software (or firmware) that is responsible for dealing with this event -- starts executing.
Apparently, in the great majority of embedded systems, the exception will be handled by simply CONTINUING (as if the exception had not occurred). In a few systems, the result will be a COMPLETE FAILURE. In others still, the result cannot be predicted -- i.e., the system will exhibit unintended behavior.

-- Anonymous, July 22, 1998

Joe, do any microcontrollers/embedded processors actually flag an exception when the clock rolls over? I'm ASSuming they don't, but....
From my playing around in the 8-bit computer days, I think what you've called an exception (isn't that a mainframe term?) we called a "hardware interrupt." The 6502 processor, one I'm fairly familiar with, would (upon such an interrupt) read an address from high memory and jump to that address.
So if a clock rollover *does* generate an interrupt/exception, and the firmware doesn't provide a valid handler (which could be simply a Return from Interrupt instruction), it would indeed go wandering off into some who-knows-where part of memory. Most likely, the system would hang pretty quickly -- BUT could be revived by a reset.
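To make the vector lookup described above concrete, here is a minimal sketch in Python. It only models the address lookup, not a real processor: the 6502 reads its IRQ handler address from $FFFE/$FFFF (low byte first), and RTI is opcode $40. The handler address $C000 and the helper names are assumptions chosen for illustration.

```python
IRQ_VECTOR = 0xFFFE   # 6502 IRQ/BRK vector: low byte at $FFFE, high byte at $FFFF
RTI_OPCODE = 0x40     # 6502 "Return from Interrupt" instruction


def irq_target(memory):
    """Read the 16-bit little-endian handler address stored at the IRQ vector."""
    return memory[IRQ_VECTOR] | (memory[IRQ_VECTOR + 1] << 8)


def handler_is_bare_rti(memory):
    """True if the first instruction of the handler is RTI (a do-nothing handler)."""
    return memory[irq_target(memory)] == RTI_OPCODE


memory = bytearray(0x10000)   # 64 KiB address space, zero-filled
memory[0xFFFE] = 0x00         # vector low byte
memory[0xFFFF] = 0xC0         # vector high byte -> $C000 (hypothetical handler address)
memory[0xC000] = RTI_OPCODE   # minimal valid handler: return immediately

print(hex(irq_target(memory)))
print(handler_is_bare_rti(memory))
```

If the vector pointed at memory that holds no real handler, the processor would start executing whatever bytes happen to be there, which is the "wandering off" case.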

-- Anonymous, July 23, 1998

Larry, having had mainly a software background, and mainframes at that, I am sure that your description is much more technically on target. However, the important thing to note is that, CONCEPTUALLY, they pretty much amount to the same thing: 1) Computers -- including embedded ones -- can have a timing function "in progress". This is independent of whether the application(s) that we use them for uses this timing function in any way. 2) If the timing function, due to the inability to represent the time due to an overflow type condition, causes an undesirable event (exception, hardware interrupt, whatever), this may cause a failure (possibly curable with a re-set -- though not always obvious or convenient!) or worse (incorrect action).
This is a concept that folks who are not familiar with these things often have trouble comprehending....

-- Anonymous, July 23, 1998

Joe, I think we're using different words & approaches to say the same thing. :-)
Given your scenario, I'd put odds on the system taking some kind of incorrect action rather than just hanging. That's where the fun begins....

-- Anonymous, July 24, 1998

Another possible outcome of a hardware register rollover (overflow) may be the following: The interrupt-handling routine takes care of it ok, but the date now has 00 for the year. A subsequent calculation (of elapsed time, for example) using the 00 date and a previous 99 date gives a negative result. How the software handles this depends on the programmer's style and thoroughness, but in any case it would cause undesirable effects.
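The elapsed-time case can be shown with a few lines of Python. This is only a sketch: the function names are made up for illustration, and the windowing fix with a pivot of 50 is one common remediation approach assumed here, not something taken from any particular utility's firmware.

```python
def elapsed_years_two_digit(start_yy, end_yy):
    """Naive elapsed-time calculation done directly on two-digit years."""
    return end_yy - start_yy


def elapsed_years_windowed(start_yy, end_yy, pivot=50):
    """Same calculation after expanding each year with a fixed window.

    The pivot of 50 is an assumption: values below it are read as 20xx,
    values at or above it as 19xx.
    """
    def expand(yy):
        return 2000 + yy if yy < pivot else 1900 + yy

    return expand(end_yy) - expand(start_yy)


print(elapsed_years_two_digit(99, 0))   # -99
print(elapsed_years_windowed(99, 0))    # 1
```

Whether a -99 ends up as a skipped maintenance interval, a tripped alarm, or nothing at all depends on what the surrounding code does with the value.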
Another thought: I also read the monograph referenced before; it is by Mark A. Frautschi, Ph.D., of Shakespeare and Tao Consulting, at the following URL.
http://www.tmn.com/~frautsch/y2k2.html
The point in time when an embedded processor rolls over its date, depends on the "epoch date" which was hard-coded in, and how long it's been since the processor was powered up. It could roll over now, or anytime in the next few years. So why not cycle power on all of them to reset the date; wouldn't that give the system a new lease on life? I'm sure that this resetting procedure would not be simple, but maybe it's easier than trying to fix the basic problem, or than suffering the consequences of unexpected sporadic failures.
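As a rough illustration of the arithmetic, here is a Python sketch. It assumes the simple model described in this thread: the internal clock starts at a hard-coded epoch date each time power is applied and then runs continuously. The function name and the example dates are hypothetical, and real devices may behave differently.

```python
from datetime import date


def calendar_date_at_rollover(epoch, power_up, rollover=date(2000, 1, 1)):
    """Real calendar date on which a free-running internal clock reaches `rollover`.

    Assumes the clock was set to `epoch` at `power_up` and has run
    without interruption since then.
    """
    running_time = rollover - epoch   # how long the clock must run to reach 2000
    return power_up + running_time


# Hypothetical device: epoch of 1 Jan 1990, first powered up on 15 June 1993.
print(calendar_date_at_rollover(date(1990, 1, 1), date(1993, 6, 15)))  # 2003-06-15

# Cycling power on 1 Aug 1998 restarts the count from the epoch,
# pushing the rollover further out rather than removing it.
print(calendar_date_at_rollover(date(1990, 1, 1), date(1998, 8, 1)))
```

Under this model a power cycle buys time but does not fix anything; the same rollover is still waiting at the end of the new count.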

-- Anonymous, July 24, 1998

Michael,
You say:
>The point in time when an embedded processor rolls over its date, depends on the "epoch date" which was hard-coded in, and how long it's been since the processor was powered up.
This brings up a question:
How many of these devices have NOT been reset by a power down/power up process sometime since their initial installation?
Perhaps the process occurred during normal scheduled downtime, perhaps it occurred as a result of a weather related power outage. If true, this may indicate that we could expect few failures before the year 2000, but then have them spread out for years afterwards, even if we do nothing.

-- Anonymous, July 25, 1998

I believe that there are two technical articles that can answer these questions. The first is a discussion of several levels between system design and application where Year-2000 faults can occur, and with varying severity. The second article includes discussion of two timing chips that were not compliant and the engineering (firmware in one case and both firmware and hardware in the other) that produced the Year-2000 compliant versions. In all four cases, time is represented as a string which may be called as a unit and later parsed as necessary. Thus, when one needs hours, seconds or minutes, one also gets years.
http://www.esofta.com/pdfs/Y2KEmb.pdf http://www.dalsemi.com/TechBriefs/tb8.html
While these articles do not provide one answer to this question, they are the best illustrations that I know of for the basis for performing an analysis of a given system and its application.

-- Anonymous, July 25, 1998

Here is some relevant information from a y2k project manager for a NZ power company. It takes the form of a Q & A session: my questions, his answers.
Like many people in IT (business computing), I have worked on application software for business systems; I am therefore quite aware of the associated y2k issues. What is a complete mystery to me is the possible y2k problem of embedded control systems/chips on equipment used to generate/distribute the power supply. As the prospect of power cuts would have greater immediate visible repercussions than the effect of business software failure or malfunction, I would be pleased if you could provide clear answers to a few questions.
Do you really think there is any risk of embedded control systems failing, thereby causing the associated machines to malfunction or stop, as a result of an internal date/time clock moving to 2000?
[Brian Donnelly] Yes, there is. So far we have found several problems, primarily in the control & monitoring systems, which if left unchecked would leave the control network unable to respond to system events or failures. Actual embedded systems are not in themselves a large risk; in general they (if equipped with a clock) will simply roll over to /00. The risk is in the control software recognising how to interpret this.
Are power control systems internationally (at least in the non-ex-communist world) fairly similar in design, i.e. would we expect the same sort of outcome in 2000 all over the world, whatever that might be? Is the concern about the power supply completely misplaced?
[Brian Donnelly] To my knowledge, power systems in UK / N America / Australia / NZ / and parts of Asia are all fairly similar in nature and deployment (I think this probably extends world-wide, but on that I cannot comment). I don't think concern is misplaced. I am aware of a number of industry problems world wide which are similar in nature. Fortunately, the industry is fairly well prepared for handling system failures, as electrical generation and distribution is a volatile business subject to frequent "corrections".
I think that most people would find it hard to believe that there would be a likelihood of y2k power cuts. If there were a risk it is also likely that any bad news would be suppressed to prevent panic.
[Brian Donnelly] I think there are a number of "watchdog" agencies out there depending on the country in question, so I don't think the problems (or potential) would be covered up. When I'm asked if the lights will go out, my typical reply is "that they may not go out... but I'd expect them to flicker".
The reason why I think there will be problems is because: (a) I don't think it is possible to identify all embedded components (we operate 29 power stations of both thermal and hydro variety with ages ranging from 10 to 80 years old); (b) the supply chain, though not long, is complex (i.e. fuel supplier to generator to grid/grid operator to distributor/retailer), with failures at any level creating potential for wider spread of failure.
Why I don't think the problems will be catastrophic is because: (a) the diversity of the generation capability (i.e. it is unlikely that a single problem will affect all stations, Y2K included); (b) there is a current state of over supply (i.e. generation capability exceeding demand by in excess of 10 % and rising); (c) control systems across the supply chain are similar or in some cases identical; (d) the level of Y2K awareness in the industry; (e) the sharing of Y2K information across the industry.

-- Anonymous, July 29, 1998

I was glad to see Brian Donnelly's comments (via Richard Dale), since when it comes to the technical side of how we get our "juice", it gets pretty mysterious to most of us. At the same time, I could not help but observe that the reasons stated as to why Brian does not see Y2K problems as being "catastrophic" seemed to be somewhat on the curious side. For "(a)": is not that the Big Worry with Y2K, that in fact this single problem WILL AFFECT ALL, and at roughly the same time, regardless of backup systems, etc.? For "(b)": I don't follow what our "current" supply state has to do with what happens in Year 2000, unless it introduces a trivial time delay of some sort (?? Like I said, I am missing this one). For "(c)": would not the fact that control systems across the supply chain are similar or identical almost guarantee a catastrophe if one is not Y2K compliant? For "(d)"/"(e)": if this were 1993, these would look like better reasons; to even consider these as positive statements in mid-1998 (as opposed to what one would like to wish were presented, like "the level of Y2K remediation in the industry" and "the level of successful Y2K testing across the industry") is really stretching.

-- Anonymous, July 30, 1998

ON THE EMBEDDED FRONT (Mark Frautschi)

[Mark] added some background effects to http://www.tmn.com/~frautsch/y2k2.html and collected some links of noncompliant embedded systems under the third reference. In on-air interviews, several callers have asked (and in one case demanded) that [he] name a single example of a failed embedded system...

~C~

-- Anonymous, January 26, 1999
