Old info but apparently still being ignored...

...by all of the Yourdonites. I can't believe how many responses on this site still refer to the "power grid failing" based on what Big Ed told them in his book.

ANP

-------------------------------------------------------------------

This page is part of Dick's comments on Fallback. See the introduction.

Chapter 5

Power Control Principles

The book makes one serious misstatement in Chapter 5.

".. the computer software for electrical generating units has been written more carefully, and tested more thoroughly, than the business software in most companies. But these systems do have date calculations embedded within them (e.g., to regulate electrical generation or distribution in accordance with traditional hourly, daily, or seasonal variations in demand)"

The misstatement is important because it implies that date calculations are at the core of the principles of operation. Such is not the case.

The generation of electricity is regulated according to actual demand. There is no practical way to store massive amounts of electric energy, so it must be generated at the instant of demand. When you flip on the light switch in your bedroom, you actually cause the nation's electric system to increase the generation just enough to power the lights. The details are too technical to discuss here, but if you think about it, if the electricity can't be made in advance and stored, there's simply no other way.

The point is, the calculations for regulating power generators and power systems are in general not date sensitive. Indeed, the regulating computer programs don't even need access to the date for any purpose. Hourly, daily or seasonal variations in demand have nothing to do with it. That's terribly important.

What is date sensitive are the forecasts of what might happen tomorrow, next week, or in six months. We use this information to plan. On days of low forecast demand, the workers at some plants might be told to stay home. During weeks of high demand, we make sure that no schedulable maintenance is scheduled. These plans are certainly date sensitive, but at the same time they are much more subject to human reasonableness checks than are automatic controls. If, for example, a Y2K bug causes an outlandish forecast that says in the next year New York City will use no electricity while Sandusky, Ohio will use seven zillion times its previous use, what will the effect be? Do we expect humans to actually act on such forecasts uncritically? Of course not. The safest of all automated systems are those which retain a human in-the-loop as the final filter. I encourage anyone who doubts the truth of this to book his flights on airlines that no longer put pilots onboard their airplanes.

To be completely candid, there is one specific kind of date sensitivity that all embedded continuous regulating systems share; electric power related or not. That is the calculation of control action as a function of elapsed time. In most cases, elapsed time is calculated from a series of pulses generated at regular intervals, like ticks of a clock. For example, a program that runs once per second may assume one second's worth of action per execution, or it might measure the time now, and subtract the time at the previous calculation. In theory, some controllers might do the subtraction with date sensitive calculations, and thus be subject to one-time massively large errors in the amount of elapsed time. Even so, most of the errors would be benign in that nothing would happen, or that independent checking algorithms or physical limits restrict the effect of the error. Nevertheless, programmers will have to dig through the elapsed time calculations to find out just how the elapsed time calculations do work.
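The two ways of measuring elapsed time described above can be sketched in a few lines of Python. This is an illustration only, not code from any real controller: the function names, the one-second tick period, and the plausibility limit are all assumptions made for the example.

```python
from datetime import datetime

TICK_SECONDS = 1.0       # assumed period of the hardware tick
MAX_PLAUSIBLE_DT = 5.0   # assumed sanity limit on one control interval


def elapsed_from_ticks(prev_ticks: int, now_ticks: int) -> float:
    """Elapsed time from a free-running tick counter; no calendar date involved."""
    return (now_ticks - prev_ticks) * TICK_SECONDS


def elapsed_from_two_digit_dates(prev: tuple, now: tuple) -> float:
    """Elapsed time from (yy, month, day, hour, minute, second) stamps.

    Treating yy as 19yy is the classic two-digit-year flaw: a stamp for
    year "00" lands in 1900 and the difference becomes hugely negative.
    """
    def to_datetime(stamp: tuple) -> datetime:
        yy, month, day, hour, minute, second = stamp
        return datetime(1900 + yy, month, day, hour, minute, second)

    return (to_datetime(now) - to_datetime(prev)).total_seconds()


def sane_dt(dt: float) -> float:
    """Independent check: fall back to the nominal interval if dt is implausible."""
    if 0.0 < dt <= MAX_PLAUSIBLE_DT:
        return dt
    return TICK_SECONDS
```

The tick-based version never sees a date at all. The date-based version produces one absurd interval at the rollover, and a simple plausibility check of the kind sketched in `sane_dt` turns that into a single nominal step.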

Use your home's computerized thermostat as an example. Suppose it goes crazy and orders the furnace to use zero fuel for the next minute, then returns to normal? Not much damage, right? Suppose it orders minus 50 gallons to be burned or plus 50 million gallons to be burned in the next minute? The fuel pump is probably only capable of delivering 0 GPM off, or 0.2 GPM on. The number of ways the thermostat can go bad, or how erroneous the thermostat's order can be, has no bearing on the actual consequence.
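The thermostat argument amounts to a physical clamp on whatever the controller orders. A minimal sketch, using the 0.2 GPM figure from the text; the function name is hypothetical, and treating the pump as a simple upper bound is a simplification of an on/off device.

```python
PUMP_MAX_GPM = 0.2   # figure from the text: the pump delivers 0 GPM off or 0.2 GPM on


def delivered_fuel_gallons(ordered_gallons: float, minutes: float = 1.0) -> float:
    """Fuel actually burned in the interval, whatever the thermostat ordered."""
    ceiling = PUMP_MAX_GPM * minutes
    # The pump cannot run backwards and cannot exceed its rated flow.
    return min(max(ordered_gallons, 0.0), ceiling)
```

An order of minus 50 gallons delivers nothing, and an order of 50 million gallons delivers 0.2 gallons in the minute. The size of the error in the order does not change the outcome.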

Try the sanity check yourself. Think of a cruise control in your car that adjusts the throttle every 1/20th of a second to hold constant speed. What is the worst-case consequence you would expect from a bogus calculation of a single 0.05-second elapsed-time interval?

Regulation Versus Protection

A second serious misconception in Fallback is that it seems to confuse regulation of equipment with protection of equipment. Chapter 5 says,

"While it's unpleasant to think about the power outages that could be caused by the computers shutting down, things could be even worse: an aberrant computer could, in theory, overload and burn out the generating hardware, or cause other irreparable damage."

Actually, regulating devices and protection devices are highly separated. This is true in all process control systems, not just electricity.

For example, a generator, or a power transmission line has a maximum current it can handle without damage. It is protected against this by a device which measures the current and opens the circuit breaker if the current is too high. This calculation is totally non date sensitive. It can not be overridden by any orders from an aberrant central computer. In many cases, the protective devices are not even computers, they are electromechanical relays.

In recent years, electromechanical relays have been replaced by microprocessor-based solid state equivalents, called intelligent electronic devices, or IEDs for short. IEDs take on the added functions of communicating with the central computer and creating historical archive records. These functions might be date sensitive. However, the basic protective functions are still not date sensitive, and there is no reason to assume Y2K vulnerabilities. Further, each protective device tends to be free standing, with its own computer in its own box. Further still, the different devices overlap in function. Thus if the overcurrent protection fails, the protected device may overheat, and a completely separate overtemperature device will provide backup protection. Even if the central computer goes belly up, these independent boxes keep on doing their thing.
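The overlapping protection described above can be sketched as two independent threshold checks. This is a toy illustration, not a model of any real relay: the limits, names, and the `overcurrent_relay_healthy` flag are hypothetical. What matters is what the functions take as input: measured current and temperature, and nothing else.

```python
from dataclasses import dataclass


@dataclass
class Measurements:
    current_amps: float
    temperature_c: float


# Hypothetical limits for one piece of equipment.
MAX_CURRENT_AMPS = 1000.0
MAX_TEMPERATURE_C = 120.0


def overcurrent_trip(m: Measurements) -> bool:
    return m.current_amps > MAX_CURRENT_AMPS


def overtemperature_trip(m: Measurements) -> bool:
    return m.temperature_c > MAX_TEMPERATURE_C


def breaker_should_open(m: Measurements, overcurrent_relay_healthy: bool = True) -> bool:
    """Either device can open the breaker; neither takes a date or a central-computer order."""
    primary = overcurrent_relay_healthy and overcurrent_trip(m)
    backup = overtemperature_trip(m)
    return primary or backup
```

If the overcurrent relay is out of service, the overload eventually shows up as heat and the separate overtemperature check opens the breaker instead.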

It's hard to explain the difference between regulation and protection without getting overly technical. However, let me just point out that it is power component and power system protective devices (the non-date sensitive kind) that actually activate most blackouts. When things get out of whack in a power system, each device and each subsystem of each system protects itself from damage. If dangerous conditions occur, it opens the circuit breakers. If enough breakers open the result can be a blackout, but the equipment is protected. In fact, you can say that the whole purpose of protective devices is to force blackouts when necessary in order to prevent serious or permanent damage.

The reason I'm explaining this here is that Fallback's premise, that Y2K-caused problems might damage generators and other equipment and thus result in outages of a month or longer, is not a reasonable extrapolation. In my opinion, Fallback's statement,

"The most likely scenario, in our opinion, is the blackout that lasts for a couple days; a less likely scenario, but one we feel should not be ignored, is the one-month blackout. Why? Because it could take that long to fix whatever Y2000 problems are discovered in the hours after midnight on December 31, 1999; and it could take that long to restart the system.",

is unfounded. It shows ignorance of protection versus regulating functions.

Robust Systems

The book makes the point:

"How Could Such Failures Happen? The first thing to realize is that such failures already have happened, on numerous occasions -- the only difference is that they weren't caused by Y2000 bugs"

That isn't the only difference. There has never been a widespread month-long blackout in more than 100 years of experience. I think the statement also implies that the less-than-perfect history makes Y2K extrapolations more plausible. Actually, just the opposite is true. It would serve the public better to point out that it is the system which has never failed, and which thus remains largely untested, that is most vulnerable to complete collapse. The more frequently a system fails, the more real-life experience we have about how it fails and the consequences of failure.

The non-technical reader reading the non-technical descriptions of the ripple effect and descriptions of the power system might reasonably come away with the impression that the systems are fragile. In other words, they might believe that nearly everything has to work right, or else it blacks out. The reality is just the opposite. Steam power generating plants are enormously complex, and the power system itself is enormously complex and dependent on all these power plants. The power system has operated for more than 100 years, and all during that time there has never been an instant when there were not thousands of malfunctioning or aberrant devices participating. It's also probably true that there has never been an instant when some isolated customer wasn't inadvertently without power. There has never been a case, hurricanes and tornadoes notwithstanding, of a national blackout, or a regional blackout lasting much more than 24 hours. That knowledge puts quite a different light on things.

I can indulge in a little hyperbole of my own. Given the many millions of components in the national power grid, and the thousands of ways that each may fail (including Y2K induced), the normal state is that the power system is challenged by failures thousands of times every day and survives most of them. Stressed by weather, especially ice storms and hurricanes, the failure rates rise millions of times higher. The ripple feedback loops are in effect continuously. I challenge anyone to come up with a substantial calculation that shows that the morning of January 1, 2000 will stress the power grid even as much as a typical ice storm.

Seven Zillion Dollar Utility Bills

There's one major difference between a financial institution and an electric utility. There is no natural upper limit on the size of a financial transaction, but you can only use a finite amount of power. My own house is wired for 200 amp service. The maximum I can use without melting the fuses is about $1,500 worth of energy in 30 days.
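The physical ceiling on a household bill can be worked out roughly as follows. The essay does not say how the $1,500 figure was derived, so the 240 V service voltage and the assumption of continuous full load for all 30 days are mine, used only for illustration.

```python
def max_energy_kwh(service_amps: float, volts: float, days: float) -> float:
    """Energy drawn if the service ran at its full rating for the whole period."""
    kilowatts = service_amps * volts / 1000.0
    return kilowatts * 24.0 * days


# Assumed: 240 V service, running flat out for 30 days.
ceiling_kwh = max_energy_kwh(200, 240, 30)

# Price per kWh that would make the quoted $1,500 the ceiling under these assumptions.
implied_rate = 1500 / ceiling_kwh
```

Under these assumptions the ceiling is 34,560 kWh, and $1,500 would correspond to a little over 4 cents per kWh. Whatever the exact rate, the bill has a hard upper bound set by the wiring, which is the point being made.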

Actually, there are many levels of defense against getting your power shut off erroneously. Most of them are relatively Y2K immune.

1. As far as I know, there are only two ways for the public to pay for electricity.

   a. A fixed amount per kilowatt-hour (kWh). Your bill is calculated from the meter reading. No date calculations of any type are involved.

   b. The "budget" plan. A levelized fixed amount per month. Date calculations may, but need not be involved in calculating the monthly amount. My local utility adjusts mine every year by taking the total kWh

2. Anti-theft algorithms. To catch those who might be tempted to bypass the electric meter, utilities run programs that look for anomalous unexplained changes in usage. These can just as well catch outrageous miscalculations as they can catch electricity thieves. Indeed this happens from time to time in real life. Electric meters do get misread, and the customers do call the utility to complain and the utilities are used to rectifying the errors without too much ado.

3. Social connections. In my part of the country it is cold in the winter. Because of several unfortunate incidents in which elderly people froze after having their power cut off, laws were passed mandating that the utility must first contact the department of social services and arrange for a visit by a social worker before an order goes out to cut the power.

4. On-scene human beings. If you have a private home, your power can only be cut by a line crew disconnecting a switch at the nearest pole or underground vault. The people who do this will normally make contact with the residents, and they are perfectly capable of reading a meter and inspecting the service. At my house they can see that any bill for more than $1,500 for a 30 day period can not be correct. They should be able to readily detect an erroneous seven zillion dollar bill, and to override the computer orders with human intelligence. Especially in the glare of publicity following January 1, 2000.
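The anti-theft check in item 2 can be sketched as a simple anomaly test. This is not any utility's actual algorithm; the function name, the median baseline, and the factor of three are assumptions chosen for illustration.

```python
from statistics import median


def is_anomalous_bill(history_kwh: list, new_kwh: float,
                      physical_ceiling_kwh: float, ratio_limit: float = 3.0) -> bool:
    """Flag a reading for human review instead of acting on it automatically."""
    if new_kwh < 0 or new_kwh > physical_ceiling_kwh:
        return True   # physically impossible for this service
    typical = median(history_kwh)
    if typical == 0:
        return new_kwh > 0
    # Unexplained jump or drop relative to the customer's own history.
    return new_kwh > ratio_limit * typical or new_kwh < typical / ratio_limit
```

A check like this does not care why a reading is wrong. A bypassed meter, a misread dial, and a date bug in the billing run all produce the same symptom and get routed to a person.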

Dick Mills, http://www.albany.net/~dmills/ West Charlton, NY 8/12/97

-- Another NORMal Person (Sam Malone@BettyFord.com), March 22, 1999

Answers

A couple of things to keep in mind:

1. Dick Mills wrote those comments in the summer of 1997, in response to the first draft of the first edition of our book, which we had posted, in manuscript form, on my web site. We responded to as many of these comments as possible, and were careful to incorporate many of Mills' comments in the final version of the first edition of the book, so that readers could see that there were strongly differing opinions on the subject.

2. We updated the chapter on utilities substantially while preparing the second edition of the book, during the summer of 1998. That chapter was also posted on my web site, and I don't recall hearing any fundamental disagreements or objections about the material.

3. There are now many sources of information about the utility situation, including Mills, Cowles, Roleigh Martin, the testimony from Congressional and Senate hearings, the NERC report, etc. We referred to as many of these as possible in the second edition, because they all have relevant things to say, and one has to decide for oneself which report and which predictions have the most credibility.

Ed

-- Ed Yourdon (ed@yourdon.com), March 22, 1999.


My goodness, it looks like we have a more advanced NORM system now, with the ability to link the thread titles to what it actually is posting. Wow, this one could really knock the socks off of the Doomers. Wooooooooooo!!!!!

-- King of Spain (madrid@aol.com), March 22, 1999.

Na, we don't use old info. euy2k <:)=

-- Sysman (y2kboard@yahoo.com), March 22, 1999.

Uh, oh. The hoping-for-the-end-of-the-world gang will not like this! Facts!

-- Y2K Pro (2@641.com), March 22, 1999.

An interesting tome if somewhat nieve:

~~Snip~~

"That is the calculation of control action as a function of elapsed time. In most cases, elapsed time is calculated from a series of pulses generated at regular intervals, like ticks of a clock."

~~Snip~~ Even if the central computer goes belly up, these independent boxes keep on doing their thing.

~~Snip~~

One problem that exists in several chips is that when they roll over to "00" they detect this as an error. This can have several effects: the chip may try to restart with an epoch date, it may stop functioning as it was programmed to do on failure (even though this is not a true failure), or it may not care.... Let's hope that all of the DONT CARE chips are in devices I need to use!!

-- Chip (InBed@Computer.com), March 22, 1999.



Just one note from Mr. Cowles' site (euy2k):

"Within a typical electric utility, embedded logic control is prevalent in every facet of operation; from load dispatch and remote switchyard breaker control to nuclear power plant safety systems and fossil plant boiler control systems. Whole generating units (generally, gas turbines) are controlled from miles away by personnel adjusting system loads in response to peak demands. Embedded logic control is the dirty little Y2K secret of all production facilities (manufacturing and utilities) that has the most significant potential to bring whole companies to their knees."

<:)=

-- Sysman (y2kboard@yahoo.com), March 22, 1999.


The last time I saw an official government comment on power, it said something like "major, nationwide power failure is not expected." That could mean minor problems happening nationwide, or it could mean major problems happening in various localities.

I don't think most people on this forum expect the grid to go down and stay down. On the other hand, if you live in a rural area that has harsh winters, electricity problems could be a major issue for you.

-- Linkmeister (link@librarian.edu), March 22, 1999.


Comments to Chip:

What is naive (I assume that's what 'nieve' was supposed to be) is the assumption that the date rolling over to '00' will be detected as a failure. The only way it is detected as a failure is if there is an instruction that checks the date for a 00 year. If there is no reason to use the date in calculations, why check for it? The answer is you wouldn't, and any marginally competent programmer would not have a routine looking for dates and causing system halts or restarts when the date was in no way significant to the function of the program in the first place.

Comments to Sysman:

Once again, your conclusions are erroneous. Your argument runs: systems have embedded chips, therefore they will fail when the date changes. You need to revisit Logic 101 because you missed a whole lot of steps in between there. Once again, just because a chip is not 'compliant' by definition and just because it is not 'fixed' or replaced before Jan. 1, 2000, this does not mean that it will fail or cause other catastrophic problems. The overwhelming majority of these non-compliant chips will not even know that Jan. 1, 2000 occurred because they have no need to reference an absolute date and don't care whether it is 2000, 1900, or 0000.

ANP

-- Another NORMal Person (Sam Malone@BettyFord.com), March 22, 1999.


Ok ANP, this is not my area, so once again I'll use a quote from the expert: <:)=

A midwestern US fossil facility was testing a boiler feedwater control loop for date rollover to Year 2000. The control console date was set in a fashion similar to testing a PC - it was changed to 12/31/99, 23:58, and then powered down. A few minutes later, it was powered back up - with the only resultant problem being the year shown as 1980 (a typical older BIOS response). The logic loop (PLC and other instrumentation) continued to function normally. Boiler levels were simulated up and down to drive feedwater regulating valves; again, no problem. Then, the technicians reset the console clock to 12/31/99, 23:58, and did NOT power down. When the clock rolled over to 01/01/2000, there was no problem. The technicians powered down the console and then restarted it - and guess what happened? The console rebooted with a date of 01/04/80, the downstream PLC (which had not been powered down) apparently saw this as a significant mismatch with its own clock (time as a function of integers rather than actual date), and interpreted this condition as a gross control failure. The feedwater regulating valves were driven shut, and the boiler trip logic was initiated (the 'fail safe' condition for the boiler). In a 'live' situation, the plant would have tripped.

Embedded logic is really the wildcard in the whole Y2K scheme of things for any industry where process control is utilized. No one knows or can even guess how much embedded control has the potential for failure on 01/01/2000. And even if all non-compliant embedded logic and controls were identified in every industrial process that used them, there's absolutely no assurance that the controls industry or chip manufacturers would be able to meet demands for upgrades or replacements. Even getting support from the vendor who installed your system(s) is going to be a crapshoot.

-- Sysman (y2kboard@yahoo.com), March 22, 1999.


Comments to Sysman:

"Ok ANP, this is not my area, so once again I'll use a quote from the expert: <:)= "

Who is the 'expert'? Where is this alleged story from? Anecdotes and urban legends make good filler for Reader's Digest but that is about all they are worth (send this in, you might get $75 if they publish it!).

"A midwestern US fossil facility ... the downstream PLC (which had not been powered down) apparently saw this as a significant mismatch with its own clock (time as a function of integers rather than actual date), and interpreted this condition as a gross control failure. The feedwater regulating valves were driven shut, and the boiler trip logic was initiated (the 'fail safe' condition for the boiler). In a 'live' situation, the plant would have tripped."

I don't believe a word of this! Why would a difference in date constitute a 'gross control failure'? Answer: it wouldn't because the date is absolutely meaningless to a regulatory control loop. These types of controls and valves are incremental by design, that is the system calculates changes to the output, not absolute outputs. In the event of a 'gross error' or power failure, everything simply maintains its previous setpoint and previous output. No boiler control system was ever designed that would initiate a shutdown because the date in the PLC did not match the date in the central host. In fact, in most of these systems, there is a designated master timekeeper which sets the time in all of the systems attached to it so the PLC would have been reset in due time. Unless you have any 'real' proof of the above, I stand by my original assertion.
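The incremental design described above can be sketched as a velocity-form PI controller. This only illustrates the claim being made and is not evidence about any actual plant: the class name, the gains, and the 10-second plausibility limit on the interval are all hypothetical.

```python
import math


class IncrementalPI:
    """Velocity-form PI controller: each step computes a change to the output."""

    def __init__(self, kp: float, ki: float, out_min: float, out_max: float,
                 output: float = 0.0):
        self.kp, self.ki = kp, ki
        self.out_min, self.out_max = out_min, out_max
        self.output = output
        self.prev_error = 0.0

    def step(self, setpoint: float, measurement: float, dt: float) -> float:
        # On a bad input or a nonsensical interval, make no change: hold last output.
        if not (math.isfinite(measurement) and math.isfinite(dt)) or dt <= 0.0 or dt > 10.0:
            return self.output
        error = setpoint - measurement
        delta = self.kp * (error - self.prev_error) + self.ki * error * dt
        self.prev_error = error
        # The valve position is also bounded by its physical travel.
        self.output = min(max(self.output + delta, self.out_min), self.out_max)
        return self.output
```

Because the controller only ever adds a bounded increment to the previous output, a step it cannot trust becomes "add nothing", and the valve stays where it was.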

"Embedded logic is really the wildcard ... there's absolutely no assurance that the controls industry or chip manufacturers would be able to meet demands for upgrades or replacements."

Again, the assumption is that every embedded chip is non-compliant and everyone utilizes date related functions in a non-compliant way and that every one will cause a system failure at midnight 12/31/99. This simply is not the case.

ANP

-- Another NORMal Person (Sam Malone@BettyFord.com), March 22, 1999.



Why are we still speculating about this? Shouldn't there be hard information by now? Is it too much to ask for the Senate (or NERC or NRC) to hire a consulting group to pick a plant at random, and go in and test for Y2K problems? With enough money (and done early enough) the utility would not object (hey, free Y2K remediation!)

And for those utilities which have fixed their problems, can't they say something straightforward like "Senator, a plant identical to ours but unremediated (in Europe, say) would have crashed for a month. No doubt about it." or the opposite ("it would have been a minor glitch.")

Come on, already! There are only 9 months left! Why can't we get some hard info? If it's just legal hassles, someone should spend some money and get these guys to divulge info!

-- Michael Goodfellow (mgoodfel@best.com), March 22, 1999.


Hi ANP. This information is from Mr. Rick Cowles' site at euy2k, linked above. Here's a snip from his bio. <:)=

Mr. Cowles began his career in the commercial electric utility industry with Stone and Webster Engineering Corporation in 1980, after serving six years on nuclear submarines in the U.S. Navy. In 1983, he joined the Operations Staff at Public Service Electric and Gas Hope Creek Nuclear Generating Station. Over the next 15 years, he worked in the power generation, regulatory and business ends of the electric industry. He's spent time on the shop floor, in the board room, and control room. His information systems and instrumentation / controls experience (ISA certifications, 1983) span that entire timeframe, from System 38 and Tandem NonStop II system operations to an SAP enterprise resource planning implementation.

-- Sysman (y2kboard@yahoo.com), March 22, 1999.


PS - By the way ANP, what are your credentials? <:)=

-- Sysman (y2kboard@yahoo.com), March 22, 1999.

Ok Sysman, we are getting closer, but Cowles' site also shows it as an anonymous submission. I simply can't accept that as fact given my own experience and background in control systems (BS/MS in Systems & Control Engineering, 15 years in the Automation & Control Industry).

As far as his bio is concerned, let's say I am not convinced of his expert status regarding control systems and embedded systems in general.

Mr. Cowles began his career in the commercial electric utility industry with Stone and Webster Engineering Corporation in 1980, after serving six years on nuclear submarines in the U.S. Navy.

Q: Doing what? Since no college credentials were included, my guess is that he enlisted in the Navy out of high school and the fact that he was on a nuclear sub versus a garbage scow has no relevance to his qualifications. What was his position with Stone and Webster, design engineer or custodian?

In 1983, he joined the Operations Staff at Public Service Electric and Gas Hope Creek Nuclear Generating Station.

According to PSE&G's home page (www.pseg.com) "Hope Creek Generating Station began operating in 1986." Hmmm...

Over the next 15 years, he worked in the power generation, regulatory and business ends of the electric industry. He's spent time on the shop floor, in the board room, and control room. His information systems and instrumentation / controls experience (ISA certifications, 1983) span that entire timeframe, from System 38 and Tandem NonStop II system operations to an SAP enterprise resource planning implementation.

Again, doing what? I have 'experience' with a lot of things but certainly not 'expertise'. His 'ISA Certification' essentially means he attended a 2-3 day short course on some topic.

So, nice try but, again, how about some real evidence?

-- Another NORMal Person (Sam Malone@BettyFord.com), March 22, 1999.


The title of chapter five in my copy of "Time Bomb 2000" (bought June 1998) is "Year-2000 Impact on Banking/Finance".

Here's what Ed Yourdon had to say about electricity in an on-line chat last month:

http://204.202.137.113/sections/tech/DailyNews/chat_990212yourdon.html

[snip]

Shedel from [207.4.188.144], at 1:11pm ET

Mr. Yourdon, My big question is this: What do you really think the odds are that the power grid could go down for a significant period of time? In my opinion, this is the one big factor that could lead to a doomsday scenario. Everything else, we'll recover from... eventually.

Ed Yourdon at 1:11pm ET

Most experts now believe that we will not suffer a nationwide power failure. But we may experience localized power disruptions in various cities, perhaps lasting as long as a few days or a week.

[snip]

-- Linkmeister (link@librarian.edu), March 22, 1999.



Well ANP, sounds like you know what you're talking about. I've got to leave work now, but I'll be back on-line when I get home in about an hour. Maybe Dr. Robert D. Watson, Ph or Mr. Robert A. Cook, P.E. will pick-up on this thread. They're much more qualified to discuss this issue. Nice chatin' with ya! Later. <:)=

-- Sysman (y2kboard@yahoo.com), March 22, 1999.

Sysman:

The Cowles example you gave bothers me as well. This doesn't sound like a y2k problem at all. It sounds like this situation can happen whenever the console and PLC clocks get sufficiently out of sync. It was simply coincidence that this problem was uncovered as an artifact of y2k testing.

And clocks drift. And clock chips break. And the BIOS code that reads the clock chips (at least at the PC end - the console) very commonly has bugs in the RTC access code - indeed, almost universally! Of course, these errors are extremely rare events, but all combined causes of loss of sync ought to trip plants pretty regularly if this story is true.

And the story makes it clear that there is no internal enforcement to keep these two clocks in sync with one another - it was easy to change one and not the other.

Either this is a real design flaw in the system, or it is an intentional safety feature not understood by the testers, or the actual sequence of events became oversimplified in the summary, or (I think very unlikely) this event never happened. In any case, this is NOT a y2k bug.
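To make the point concrete, here is a guess at the kind of check that could produce the reported behavior. Nothing in the story says how the PLC actually compared clocks, so the function name and the 60-second tolerance are purely hypothetical.

```python
from datetime import datetime

MISMATCH_LIMIT_SECONDS = 60.0   # hypothetical tolerance


def clocks_mismatched(console: datetime, plc: datetime) -> bool:
    """True when the two clocks disagree by more than the tolerance."""
    return abs((console - plc).total_seconds()) > MISMATCH_LIMIT_SECONDS
```

A console that reboots to 01/04/80 while the PLC reads 01/01/2000 trips this check, but so would a five-minute disagreement on any ordinary day in 1995. The year 2000 appears nowhere in the logic.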

-- Flint (flintc@mindspring.com), March 22, 1999.


Thank you for your input Mr. Yourdon.

ANP and Flint, like I said, this is not my area, and I don't think I can add anything more to this topic. I'll keep an eye out for Dr. Watson and Mr. Cook and see if I can get them to comment. Here's more from Mr. Cowles' bio: <:)=

Rick Cowles is one of the leading experts on the Year 2000 (Y2K) computer problem, specializing in Y2K impact on the electric utility industry and microprocessor based control systems. He is a founding member of the Computer Professionals for Social Responsibility's Y2K Working Group, and has been featured on NPR's "Morning Edition", "All Things Considered", Tony Keyes' "Y2K Advisor" radio program, NBC's "Today Show", ABC's "Nightline", CTV's "Dini Petty Show", and other North American based broadcast media outlets. Rick has also testified on the Y2K problem in the electric industry before the U.S. House of Representatives Science Committee, Subcommittee on Technology. He has chaired Y2K conferences, and speaks frequently on the topic at seminars worldwide.

-- Sysman (y2kboard@yahoo.com), March 22, 1999.


The amazing thing about the 10Q Reports from the electric utilities is the fact they mention checking embedded systems. I suppose they do this for fun and to spend money to reach the Y2K budget goals. I wonder why they keep talking about "Islanding". This must be a new dance. The problems in the safety systems of the Nuclear Plants are just ways to obtain rate increases. I am certainly pleased that someone is willing to guarantee that there will be no national blackout. Unfortunately I would prefer such a promise from the electric utilities... any one of the 7800 or so of them would be just fine.

-- Mike Lang (webflier@erols.com), March 22, 1999.

ANP said:

"Again, the assumption is that every embedded chip is non-compliant and everyone utilizes date related functions in a non-compliant way and that every one will cause a system failure at midnight 12/31/99. This simply is not the case."

This is typical pollyanna blather. It reveals that ANP is not focusing on the issue and is instead setting up strawmen. No veteran to this forum, or to the y2k problem in general, believes that outlandish crap.

What folks like ANP refuse to acknowledge, and what folks like Ed Yourdon perfectly understand, is that you don't need to have many problems before the situation gets out of hand. Two or three times the usual number of problems, if not remedied in a timely manner, are all that is needed to upset the apple cart. And solving them in a timely manner will be difficult in a world racked by other y2k problems.

Y2K will not happen in a vacuum. It will happen in parallel with major economic, technical, and political upheavals worldwide. Get with the program, ANP.

-- a (a@a.a), March 22, 1999.


Fuel problem? Transportation of fuel problem? Nuke safety problem? Communications problem?

-- Wiseguy (got@it.gov), March 22, 1999.

Well ANP, following is a snip from the Chicago Tribune dated March 21 1999. You can thank Mr. Drew Parkhill, CBN news. Here's a link to the story here, that has another link to the Tribune site. Notice the number of Y2K stories on the Chicago site. <:)=

Power plant example

...

As part of an experiment last year, technicians at the huge Xingo hydroelectric dam on Brazil's Sao Francisco River set the dates on the plant's main computer forward to Jan. 1, 2000.

What happened next is still sending chills through Latin America.

"When they put the date forward, the whole control board went haywire," remembers Marcos Ozorio, one of the members of Brazil's presidential Year 2000 commission. "Twelve thousand warning lights flashed all across the board, with all kinds of alarm information."

Technicians quickly switched back the date, and are now ferreting out the plant's Y2K bugs. But "if you had been surprised by a situation like this, what you'd have had to do is shut down the plant until you found where the failures were," Ozorio said. "Automatically you'd be taking off the energy board 30 percent of northeast Brazil."

-- Sysman (y2kboard@yahoo.com), March 22, 1999.


To Sysman:

Now, that's a little more like it, although if it happened last year I would have thought that it would have been reported elsewhere by now. But I'll give you that one until I am able to verify/dispute it. However, the existence of alarms does not in itself cause the power grid to go down. If you have spent any amount of time in a control room in any industry, you would know that alarms are a normal occurrence. Control engineers overdesign alarm systems so they alert you to everything, whether it is significant or not. What I did not see in the article was what the post mortem of the exercise showed: were they really in danger of losing the power, or was it just a series of nuisance alarms that, once acknowledged, had no detrimental effect? Although you haven't convinced me yet, this gives me another lead to check out.

To the a_a imbecile, no I am not a pollyanna. You have added nothing to this as you only reiterated what Ed has written and what NO ONE -- repeat NO ONE -- has been able to show one concrete example of. I stand by my original premise, which is that embedded systems and non-compliant chips are not a problem of themselves. Only when the application programming uses a non-compliant routine to do date calculations is it POSSIBLE for there to be a problem, and even then it is likely not to cause anything more than an administrative and bookkeeping problem, certainly not enough to bring the US power industry to its knees.

ANP

-- Another NORMal Person (ANP@BettyFord.com), March 22, 1999.


Are any of us claiming here that the US power industry will be brought to its knees?

-- Linkmeister (link@librarian.edu), March 22, 1999.

Damn ANP, you're tough! This is good though, I'm enjoying our discussion even if I am in waaaaayyy over my head! I still hope to find one of the above noted Roberts to get their view, since they've been "nuke guys" for decades.

However, you have presented an area that I do have 31 years experience in, application programming. Well, more systems programming. I worked for IBM for 9 years, and coded part of the DOS/VSE operating system. I also worked on a "black box" Z-80 based terminal controller for about 3 years. All ASSEMBLY language. I love Assem, 360/370/390, x86, 6502, etc. Yea, I do COBOL, FORTRAN, BASIC, JAVA, CICS, etc. but ASSEMBLY is the hobby that I get paid for.

So only one question. An embedded system is usually a CPU, some sort of input/output and a PROGRAM, usually ROM based, but maybe floppy based, etc. So how do you stand behind your statement "Only when the application programming uses a non-compliant routine to do date calculations is it POSSIBLE for there to be a problem and even then it is likely not to cause anything more than an administrative and bookkeeping problem"??? After all, this is a SOFTWARE problem, be it on a mainframe, PC or "embedded system". <:)=

-- Sysman (y2kboard@yahoo.com), March 22, 1999.


Sysman:

My statement merely means that a non-compliant date or date function in and of itself is not a problem. Example: I have a dumb chip with a 2 digit date representation that will change to 00. I also have a date calculation routine that reads in the numerical day, month and year, and displays it at the top right corner of my display by looking up the month in a table and concatenating a '19' in front of the year. So, on 1/1/2000, my screen will show "January 1, 1900". I have a non-compliant embedded chip and a non-compliant date function, but it has no effect on how my system operates. That is why most people are using the term "Y2K ready" now instead of "compliant". Compliant puts a much larger and often unnecessary burden on the application.
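A minimal sketch of the display routine described above, assuming a hypothetical month table and function name (neither comes from any real control system). It shows that the bug is cosmetic: the banner shows the wrong century, but nothing downstream consumes the string.

```python
# Hypothetical names for illustration only; not taken from any real product.
MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def format_display_date(day: int, month: int, two_digit_year: int) -> str:
    """Build the banner date by prefixing a hard-coded '19' to a 2-digit year."""
    return f"{MONTHS[month - 1]} {day}, 19{two_digit_year:02d}"


print(format_display_date(31, 12, 99))  # December 31, 1999
print(format_display_date(1, 1, 0))     # January 1, 1900
```

The routine is non-compliant, yet it only feeds a label on the screen. Whether a real system is this benign depends on what else reads the year, which this sketch does not model.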

Like I said, I am no pollyanna and I am very aware of the real Y2K issues and the problems they can cause. But the concerns over industrial control systems are extremely exaggerated and based on supposition and unsubstantiated anecdotal evidence, as far as I can tell.

Now, back to the other thread!

-- Another NORMal Person (ANP@BettyFord.com), March 23, 1999.


Ok ANP. I'm going to sleep on your point (no comments from the peanut gallery please!). 12:45 here in NJ, time to hit the sack. I've put up a post seeking professional help (again, no comments please). Continue this Tuesday? Happy new year! <:)=

-- Sysman (y2kboard@yahoo.com), March 23, 1999.

ANP

Perhaps you can explain the logic flaw here. I have had several maintenance techs and installation engineers give me this as the reason they are vacationing somewhere warm for DEC through April or later:

Embedded control processor sends status data update, including time and date stamp, to mainframe. One side or the other is less than fully compliant (probably the embedded control system). The control system waits for mainframe handshake or confirmation (depends on how tightly coupled/controlled the process is). If the handshake does not arrive in a given amount of time, the error process occurs, which is usually a series of iterations of "send . . . wait". After a specified number of iterations the controller goes into an "Error sensed" state, which may be open or closed.

If the handshake comes back with a different date, the controller goes into the error state.

If the handshake comes back ok and then the mainframe goes down due to handling the non-compliant date, the NEXT control interval will not get a handshake.

If the mainframe goes down in the middle of a control iteration, where does the controller fail to?
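The sequence above can be sketched roughly as follows. This is a simplified model under stated assumptions, not any vendor's protocol: the `send` callable, the `State` names, and the retry count are all hypothetical, and a reply of `None` stands in for a handshake that never arrived within the timeout.

```python
from enum import Enum
from typing import Callable, Optional


class State(Enum):
    OK = "ok"
    ERROR_SENSED = "error sensed"


def report_status(
    send: Callable[[str], Optional[str]],
    stamp: str,
    max_tries: int = 3,  # hypothetical retry limit
) -> State:
    """Send a date-stamped update; retry on silence, error on a date mismatch."""
    for _ in range(max_tries):
        reply = send(stamp)  # None models a handshake that timed out
        if reply is None:
            continue  # "send . . . wait" again
        # A handshake echoing a different date is an immediate error.
        return State.OK if reply == stamp else State.ERROR_SENSED
    return State.ERROR_SENSED  # retries exhausted


print(report_status(lambda s: s, "000101").name)           # OK
print(report_status(lambda s: "19000101", "000101").name)  # ERROR_SENSED
print(report_status(lambda s: None, "000101").name)        # ERROR_SENSED
```

What the controller does once it reaches the error state (fail open, fail closed, or hold) is exactly the open question in the post, and the sketch leaves it unanswered.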

Chuck, Night Driver ( I transport coinsulting tech's and BIG 3 Management Consultants. the Tech's are ALL VERY WORRIED, and the McKinBoozlings have to ask me "What's up with Y2K?")

-- Chuck, a night driver (reinzoo@en.com), March 23, 1999.


wups "coinsulting Tech's" don't actually trade insults!! LOL They are "consulting Tech's"

-- Chuck, a night driver (reinzoo@en.com), March 23, 1999.

They all must work correctly to operate safely. Most must work correctly to operate at all, with no margin of safety or error.

But it only takes one failure (in the wrong place) to shut it down. Completely. Until the right problem is found, isolated, tagged out, replaced (if available), restarted, and operation begun again - if possible. Until the next change.

Not true (as in the beginning was argued) that there have not been long power failures. Auckland NZ lost power for several weeks - and they have 4 cables in parallel as backup against failure. If one controller (or one process controller) at a paper mill, power plant, airport, or chemical refinery goes down due to controller failure - chances are several will fail at the same time, or at different times.

It isn't the known errors that will shut things down - it's the unknown ones left over after you THINK you have taken care of all the known ones that will shut the processes down. The only step that counts is full-up integrated testing, and retesting the system.

Then drill to train the operators. And the fossil plants and the grid aren't doing this. Nor are the chemical and steel industries, and rubber, and cement and coal, and .........

It isn't the

-- Robert A. Cook, P.E. (Kennesaw, GA) (cook.r@csaatl.com), March 23, 1999.


OK Chuck, I'll give it a try.

The so-called handshaking between regulatory and supervisory control systems is typically just an acknowledge (ACK) or negative acknowledge (NAK). Let's say a tank level controller gets its setpoint from a higher level controller every 5 seconds and transmits its current feedback at the same interval. The supervisory controller may also be getting its target from a batch scheduling or other optimizing controller which it is communicating with on, say, a 30 second interval. The actual control logic for a PID controller is typically about 10% of the total lines of code used to implement it. The rest is control mode handling (local to remote, auto to manual, etc.) and error handling logic (excessive error, unexpected change, etc.). This is the subtlety that most people who have not had direct experience with industrial control systems miss. To address your question, if communication is lost between the different levels in the hierarchy, the controllers automatically 'shed' their control mode to the next lower mode. A controller that was in cascade mode switches to auto, and one that was in auto switches to manual. What this means is that each controller will either control to the last valid setpoint it received or hold its output at the last valid position. Since industrial processes are largely static in nature, this is the correct fail-safe mode.
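As a rough illustration of mode shedding, here is a minimal sketch. The `Mode`, `shed`, and `Loop` names are hypothetical, and a remote setpoint of `None` is assumed to mean the link to the supervisory controller is down; real controllers have many more modes and checks.

```python
from enum import IntEnum
from typing import Optional


class Mode(IntEnum):
    MANUAL = 0   # hold the output at its last valid position
    AUTO = 1     # control to the last valid local setpoint
    CASCADE = 2  # setpoint supplied by a higher-level controller


def shed(mode: Mode) -> Mode:
    """Drop one level in the mode hierarchy; MANUAL is the floor."""
    return Mode(max(mode - 1, Mode.MANUAL))


class Loop:
    def __init__(self, setpoint: float) -> None:
        self.mode = Mode.CASCADE
        self.setpoint = setpoint

    def on_scan(self, remote_setpoint: Optional[float]) -> None:
        """One scan; remote_setpoint is None when the supervisory link is down."""
        if self.mode is Mode.CASCADE:
            if remote_setpoint is None:
                self.mode = shed(self.mode)  # keep the last valid setpoint
            else:
                self.setpoint = remote_setpoint


loop = Loop(setpoint=50.0)
loop.on_scan(55.0)   # normal cascade update
loop.on_scan(None)   # link lost: shed to AUTO, hold 55.0
print(loop.mode.name, loop.setpoint)  # AUTO 55.0
```

The point of the design is that losing the upstream computer changes who supplies the setpoint, not whether the loop keeps running.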

Now, let's assume that some moron used an absolute date implicitly in the calculation of an operating point target at some point in this chain. When the year changes to '00' you will likely get a calculation of zero or infinity, depending on whether you divide or multiply. All control loops have high and low limits on inputs and outputs. If an output limit is reached, an alarm is triggered. If an input limit is reached, the input is rejected, the last valid input is held, and the control loop sheds to its fail-safe mode. If you are familiar with analog instrumentation, you may know that the standard range of measurements is 1 to 5 volts or 4 to 20 milliamps. The reason for this is so that a feedback reading of zero volts or zero milliamps is (correctly) interpreted as a failure and not a valid measurement of zero.
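The two safeguards described above, input limits and the "live zero" of a 4 to 20 mA signal, might look like this in outline. The function names, the limit values, and the 3.8 mA fault threshold are assumptions chosen for illustration, not figures from the post.

```python
import math
from typing import Optional, Tuple


def accept_input(
    value: float, last_valid: float, low: float, high: float
) -> Tuple[float, bool]:
    """Return (value_to_use, rejected); bad inputs fall back to last_valid."""
    if not math.isfinite(value) or not (low <= value <= high):
        return last_valid, True   # reject and hold the last valid input
    return value, False


def read_4_20ma(
    current_ma: float, span_low: float, span_high: float
) -> Optional[float]:
    """Scale a 4-20 mA signal to engineering units; None means a failed signal."""
    if current_ma < 3.8:          # assumed fault threshold below the live zero
        return None
    return span_low + (current_ma - 4.0) / 16.0 * (span_high - span_low)


# A date bug that yields zero or infinity is caught by the input limits.
print(accept_input(float("inf"), 42.0, 10.0, 90.0))  # (42.0, True)
print(accept_input(0.0, 42.0, 10.0, 90.0))           # (42.0, True)
print(read_4_20ma(12.0, 0.0, 100.0))                 # 50.0
print(read_4_20ma(0.0, 0.0, 100.0))                  # None
```

A rejected input would then trigger the same shed-to-fail-safe behavior as a lost communication link.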

I would also disagree with Mr. Cook's assertions that all things must work correctly to operate safely and most must work to operate at all. Most industrial processes are designed to be inherently stable (remember, all of these industries were in operation long before computer control came into being), and at any given time in any plant there are dozens of loops in manual mode or instrumentation requiring calibration or replacement. This is called 'graceful degradation' and is inherent in any control system design. At some point, the number of failures will cause a problem, but no one failure of a control valve or transmitter will shut down a plant. As far as his examples and how they relate to Y2K, TMI was caused by mechanical and human error, and the others all sound like mechanical defects or failures, not the result of a software bug. Tragic cases but not really relevant.

Finally to Sysman, sorry for leaving last night. When I got your sign off, nobody else had offered anything of substance up to that point so I bagged it as well.

ANP

-- Another NORMal Person (Sam Malone@BettyFord.com), March 23, 1999.


Score so far on embedded systems in electric power generation plants:

"Yes, they matter" -- (1,2,3, ...,n)
"No, they don't" --- (1,2,3, ...,n)

where n is any positive integer.

With just over 8 months to go in the bout, the contest so far is a draw.

-- Tom Carey (tomcarey@mindspring.com), March 23, 1999.

