Are We Traveling Through A Cyber Mine Field?

greenspun.com : LUSENET : Electric Utilities and Y2K : One Thread

Life can be profoundly unfair, bringing misery and hardship to some and good fortune to others. Further, both situations have been known to change in an instant. Do I and the rest of the "informed public" feel lucky? You bet!
How lucky can you get? Eighty billion embedded chips out there.....at least some of which are directly or indirectly controlling critical processes......and nothing happens, despite the fact that at least some of these chips are measuring elapsed time in negative numbers!
Once I learned how the date and time of day were typically integrated (e.g., the NIST paper previously posted to this forum), I couldn't buy the "divide by zero command" hypothesis... zero could never be reached. I am hard pressed, however, to ignore the "elapsed time negative number" hypothesis. Could it be that the chips are measuring elapsed time in absolute numbers? Would this way of measuring explain all forms of process control? If not, are we sitting on a powder keg waiting for these critical chips to de-energize?
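To make the "elapsed time negative number" hypothesis concrete, here is a minimal Python sketch. It assumes a hypothetical device that keeps only a two-digit year and computes elapsed time by plain subtraction; the function names are illustrative and are not taken from any real firmware or from the NIST paper.

```python
# Hypothetical illustration: elapsed time computed from two-digit year fields.
# None of these functions come from real firmware; they only show the arithmetic.

def elapsed_signed(start_yy: int, now_yy: int) -> int:
    """Plain subtraction: goes negative when the year field rolls from 99 to 00."""
    return now_yy - start_yy


def elapsed_absolute(start_yy: int, now_yy: int) -> int:
    """Absolute value hides the sign but still gives the wrong magnitude."""
    return abs(now_yy - start_yy)


def elapsed_wrapped(start_yy: int, now_yy: int) -> int:
    """Modulo-100 arithmetic treats the field as a counter that wraps at 100."""
    return (now_yy - start_yy) % 100


if __name__ == "__main__":
    # Process started in '99 and checked in '00: one real year has passed.
    print(elapsed_signed(99, 0))
    print(elapsed_absolute(99, 0))
    print(elapsed_wrapped(99, 0))
```

Under these assumptions the signed version reports -99 years, the absolute version reports 99, and only the wrapped version reports the true figure of 1. Which of the three a given chip resembles, if any, is exactly the open question.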
If ALL power (i.e., normal facility power, back-up generator power, individual or shared capacitor, internal or attached individual lithium or rechargeable battery) is removed from these critical chips and subsequently restored, will we get what "common sense" and the "law of averages" suggest was possible immediately after the rollover?

-- Anonymous, January 02, 2000

Answers

Mr. DeFranza, is there something about a date-sensitive chip counting for 100 hrs. before it fails? I came across this idea on another board. I'm not a programmer, just a clerical. Anyone know anything about a 100 hour 'boundary' with embedded chips?

-- Anonymous, January 02, 2000

I have not heard about the 100 hour time delay and would like to know a credible source for this information. Meanwhile, I believe I was able to answer most of my questions from the following URL: http://www.tmn.com/~frautsch/y2k2.html
It will serve most participants in this forum well to read and re-read this paper...... It appears to imply that we're not out of the woods with regard to infrastructure damage potential yet....or anytime soon. I will try to obtain other credible supporting documentation.

-- Anonymous, January 02, 2000

Thank you, Mr. DeFranza.
I do apologize for not being able to provide a credible source regarding the "100 hr. boundary" for embeddeds, if there is such a thing. However, the post (on another board) which led to my question was rather scanty in information. The person posting there stated only, and I'm paraphrasing in single quotes, trying to convey the idea without compromising the other poster in any way:
'All right, said the embedded chip. I've experienced an error condition. But I'm not able to merely do a core dump like a mainframe computer. I have to keep performing. I'm real-time, I must be fault tolerant. So, I issue an alert of the error condition to a SCADA system, and then I restart the program that I run.'
The reply (also paraphrased here) was: 'Yeah, I'll keep counting...until I arrive at 100 hours.'
That's all I know, Mr. DeFranza.
Thank you for providing the link to Dr. Frautschi's beautifully written paper. I still had it on file; it was one of the first authoritative works that got my butt moving on y2k, still a valid statement. In the section on Timing, for example, Dr. Frautschi states: "For embedded systems that are explicitly date dependent the minority of systems that are non-compliant will experience a peak of failures at the rollover point, midnight on 1 January 2000." The minority? Perhaps we should not be surprised, at this point, by the apparent smoothness of the rollover...

-- Anonymous, January 02, 2000

Frank, the reason lower level embedded system devices had very few problems that would have affected operation is quite simple. Most do not use dates for anything other than date stamping information. Dates are rarely used in these types of devices for calculations or controlling functions. This is why hard failures of PLCs due to y2k bugs were not a serious threat: for functions that require timing, the ladder logic programming uses timers that work off the microprocessor clock cycles, and the RTC is used for date stamping.
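The distinction between timers that run off clock cycles and an RTC used only for date stamping can be sketched in a few lines of Python. This is an illustration under stated assumptions: `TickTimer`, `date_stamp`, and the tick values are hypothetical names and numbers, not the API of any actual PLC.

```python
# Hypothetical sketch: a control timer that runs off a free-running tick
# counter, next to an RTC that is used only to label log records.

class TickTimer:
    """On-delay timer driven by clock ticks, never by the calendar date."""

    def __init__(self, preset_ticks: int) -> None:
        self.preset_ticks = preset_ticks
        self.start_tick = None  # not running until start() is called

    def start(self, tick: int) -> None:
        self.start_tick = tick

    def done(self, tick: int) -> bool:
        if self.start_tick is None:
            return False
        return tick - self.start_tick >= self.preset_ticks


def date_stamp(yy: int, mm: int, dd: int, reading: float) -> str:
    """The RTC date only decorates the record; no control decision uses it."""
    return f"{yy:02d}/{mm:02d}/{dd:02d} reading={reading}"


if __name__ == "__main__":
    timer = TickTimer(preset_ticks=50)
    timer.start(tick=1000)            # started shortly before the rollover
    print(timer.done(tick=1049))      # one tick short of the preset
    print(timer.done(tick=1050))      # preset reached, whatever the RTC says
    print(date_stamp(0, 1, 1, 4.2))   # the stamp shows year 00; the timer never sees it
```

In this sketch the timer expires after 50 ticks no matter what the year field holds, so a rollover to 00 can at worst produce an odd-looking stamp on a record.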
Higher level PC, Sun, and other computer based systems, such as MMIs and operator consoles, that interfaced with lower level embedded systems were quite a different case. These had more potential for problems, since they sometimes did use the dates for more important functions.
The embedded system peak was indeed at the rollover. In fact, of the problems turned up in the assessments and testing I saw, 90 to 95% or more occurred right at the transition (although some systems also had problems at the leap year, almost always minor). Problems on other dates were in the minority.
By the way, I have often heard experts (especially those with IT backgrounds) quoted as saying that the "divide by zero" error was not a real threat. Well, that's funny to me, since I saw this exact failure in a portable data acquisition system I tested. So maybe it was unusual, but it certainly occurred in this instance.
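For readers wondering how such a failure could arise at all, here is one hypothetical route, sketched in Python. It assumes a program that uses the two-digit year field itself as a divisor; the function names are made up, and this is not a reconstruction of the data acquisition system mentioned above.

```python
# Hypothetical illustration only: a two-digit year used directly as a divisor.

def scale_by_year(total: float, yy: int) -> float:
    """Divide a value by the two-digit year field, with no guard."""
    return total / yy  # raises ZeroDivisionError when yy == 0


def scale_by_year_guarded(total: float, yy: int) -> float:
    """Same calculation, with year 00 mapped to 100 before dividing."""
    divisor = yy if yy != 0 else 100
    return total / divisor


if __name__ == "__main__":
    print(scale_by_year(990.0, 99))
    try:
        scale_by_year(990.0, 0)
    except ZeroDivisionError:
        print("divide by zero at year 00")
    print(scale_by_year_guarded(990.0, 0))
```

A seconds or minutes field that rolls from 59 back to 0 could be misused in the same way, so the failure depends on how the value is used in arithmetic and not on the clock hardware.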
Regards,

-- Anonymous, January 02, 2000

I would just like to add: if you have not read about it already, study the "Crouch-Echlin Effect". This does directly impact all xx86 processors, especially after multiple cycling on and off. I mean cold restarts: after a certain number of restarts, these chips WILL lock up. This was proven. So, we will have this stuff happening over the next few weeks, months, even years, until all the old xx86 stuff is gone. Happy New Millennium. The power is working in Arizona (as I expected), and I had a fantastic celebration in a remote private resort surrounded by hot mineral springs. It even snowed today, after a record 99 days with no precipitation, state-wide. I am very fortunate to be in Arizona! Thanks Rick and Factfinder and everyone else, I can't wait to hear from Bonnie next week! Tim

-- Anonymous, January 02, 2000

Factfinder,
Electronic components are a fickle lot. There is much reiterative adjustment going on, particularly after a cold start... much more so with brand new equipment. That being said, I don't doubt you witnessed a "divide by zero" command in process. The problem is that the phenomenon is apparently not reproducible under controlled conditions. Logic, therefore, takes the lead and says that when you roll over to one from 60 (minutes, seconds, nanoseconds), there is no "room for zero."
Tim,
The "Crouch-Echlin Effect" appears to fit in the "non-reproducible" category also. Here is one article for those of you on the Forum not familiar with it. http://www.albertaweb.com/year2000/docs/doc1858.html
Again, the fickleness of electronic components has to take the lead here. Clearly, it was not possible to test every microcircuit of every xx86 manufactured to see if all of them operated properly. Add this to the interaction between associated components at varying stages of their operational lives and you get many, if not infinite, possible outcomes per unit of time.....most of which are acceptable by designed tolerances.......falling out of tolerance gets you a directly or indirectly observable event... a real "glitch."
NOTE: Despite the pervasive use of the term "glitch," which I first heard in 1969 during a missile launch simulation session, the date rollover problem is no "glitch." The term really minimizes a major systemic problem. I don't know of any glitches that required remedial and precautionary expenditures of a half-trillion dollars......and counting.
Anyway, returning to my reason for asking the lead question, the post-rollover possibility appears to exist that a number of embedded chips that have lost their backup power sources over the years are now totally dependent on normal facility power to retain their function. Given the suspect chips' post-rollover state, it appears possible that with a loss of normal facility power and subsequent restoration of power these chips may fail... with significant consequences. Needless to say, these problems are a whole lot easier to attack with the lights on. While my concern regarding this matter is more for the impact on chemical processes and the associated environmental health and safety risks of unintended creation of hazardous materials, it is likely that the precautionary measures would be applicable to all industrial and infrastructure entities.
If my hypothesis is correct, a local post-rollover black-out may set the stage for multiple problems upon power restoration that simply could not have arisen under pre-rollover conditions. Again, if my hypothesis is correct, de-energizing embedded chip circuitry (e.g., for preventive maintenance purposes) could result in post-power restoration consequences that could not have developed in the pre-rollover state.
Unless anyone out there can demonstrate with reliable, reproducible results that this hypothetical scenario cannot occur, I would suggest that our inexplicably good fortune to date should not breed complacency. Here in Chicagoland, power outages were frequent last year... we had two in my neighborhood the last week of December... I would be less than amused if I had to become a chemical, biological and/or radiological refugee. Any thoughts?

-- Anonymous, January 04, 2000
