NERC's 2nd report states only 4/10ths of one percent have done full integrated y2k testing at a power unit level

greenspun.com : LUSENET : Electric Utilities and Y2K : One Thread

ftp://ftp.nerc.com/pub/sys/all_updl/docs/y2k/secondfinalreporttodoe.pdf, p.21

"Of particular interest are the results of integrated tests involving the entire power station. More than 40 units at more than a dozen utilities have been tested while operating on-line and producing power. These tests consist of simultaneously moving as many systems and components as possible forward or backward to various critical dates. These tests require an extraordinary level of preparation and coordination to ensure the safety of all systems and that the impact to the electric system would be minimal should a unit trip during the test."

What do the above numbers mean? Take the total number of units in the US per the US DOE data, which is 10,421.

http://www.eia.doe.gov/cneaf/electricity/ipp/t1p01.txt -- Electric Utilities as of 1/1998 (Number of Units): 10,421

It means that:

Only 4/10ths of 1% of Power Units have been fully tested; with 11 months left!
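
As a quick check of that figure, here is the arithmetic in a few lines of Python. It takes NERC's "more than 40" as exactly 40, which is an assumption, so the result is a lower bound on the share tested rather than an exact number.

```python
# Figures quoted above: NERC's "more than 40" units (taken as exactly 40,
# which is an assumption) against the DOE count of 10,421 units as of 1/1998.
units_tested = 40
units_total = 10_421


def percent_tested(tested: int, total: int) -> float:
    """Return the share of units tested, expressed as a percentage."""
    return 100.0 * tested / total


share = percent_tested(units_tested, units_total)
print(f"{share:.2f}% of units tested")  # prints "0.38% of units tested"
```

That rounds to roughly 4/10ths of 1%.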

I know there are some very optimistic utility insider experts on this forum. I don't see how optimism can be so high with so little integrated testing. I realize that NERC says that the above tests encountered no significant problems, but to assert that the rest of the class does not need to do integrated testing because the "A" students did it without problems is, to me, absurd. I work in a very large Fortune 500 company and the most expensive part of our Y2K project is the integrated Y2K test lab. I know the utilities do not want to spend the time or the money to do integrated testing, yet I've seen several project teams go into the lab only to find out they were not in fact done, and they had to go back to development and then go back to the lab again. At our company one cannot go into the integrated Y2K test labs without certifying that at the component level everything has been found to be Y2K okay.

This is one of the major reasons that I'm only giving my own utility an 80-90 percent chance of providing me power on January 3rd, 2000. I'm giving no odds for Jan 1/2 because if the grid is messed up, my local utility could be thrown a punch that knocks them out for a day or two. But since my local utility is a net generator of electricity, they have 20 percent excess capacity in the winter time (which means they could tolerate a 20 percent loss of power), and because they have backup for telecommunication failure, and because they have the knowledge and experience to act as an island, I think things are good enough to give them odds this high. I know that many inside the industry want 99+ percent odds but I don't think they deserve it: they're not doing the integrated testing; they are doing type testing at the embedded systems level (it may or may not be a reliable statistical sample, they don't provide that information; they only stated they are not testing all identical pieces of suspect equipment); and they downgraded their goals from Y2K compliance to Y2K readiness.

For those anonymous people on this forum who work inside the industry, I ask you, have your own utilities done a full unit remediation and unit-scale integrated, rollforward Y2K simulation? Are they in that number of 40 units tested that NERC talks about?

How can everybody inside the industry be so confident without doing this integrated testing?

By the way, when I talk about embedded systems, I use the classic IEE definition, which, succinctly stated, refers to any software-driven apparatus which is an integral part of operating equipment or a facility; so I include SCADA and EMS systems as part of the Y2K embedded systems problem. I realize none of the utilities have found showstopper embedded chip problems. I'm talking about doing integrated Y2K testing involving the remediated SCADA and EMS and the facility they're dealing with.

Thanks! (I've been mostly a reader on this forum, I've followed the postings for about 2 years now.)

Roleigh Martin http://ourworld.compuserve.com/homepages/roleigh_martin

-- Anonymous, April 17, 1999

Answers

Roleigh, Since I am quite aware that you and Rick are two of the original Y2K gurus and noted celebs, I am honored to address your questions. After all, you guys helped to bring awareness to a significant problem that has ended up employing me in y2k in nuclear power (the significant problem being in mission critical software with important date functions, and in rare cases, in embedded systems).

The short answer: we have done integrated testing of systems, not of the entire plant. Read the long answer below to find out why.

A quick comment here - I believe the potential y2k impact on power plants of the EMS and SCADA used for the grid is way overblown. A list of pretty good articles by Dick Mills on the grid, SCADA, EMS can be found at http://www.y2ktimebomb.com/PP/RC/index.htm. I have noticed a number of incorrect statements in some of these articles and haven't read them all, but most information presented appears to be fairly accurate and he has done a much better job than I could on the grid issues. By the way, some of Rick's articles are here as well since he used to write this column. When it comes to Y2K and nukes, as you know, Rick and his forum are the knowledge base winners hands down.

The long answer: To the best of my knowledge, most power plants don't even have what is typically referred to as a SCADA or EMS. EMS is a term used in regard to the power grid, and SCADA, when used, can be found in grid distribution (remote operation of circuit breakers for example) or in newer vintage power plants as an interface to the devices that actually do the controlling of the plant. By definition, the Supervisory Control and Data Acquisition System has only "supervisory", not direct control, although I believe this line is getting blurred a bit. In a power plant with modernized controls, you might find DCS, or SCADA used with PLCs or other lower level devices that actually perform the automatic control functions. In a good design, the SCADA could be shut down and the plant would do fine, perhaps requiring manual operation. For primary plant control of large fossil/nuclear, if there is a digital system involved it is likely to be a dedicated Distributed Control System such as Westinghouse WDPF, Bailey Net90, etc.

In hydro plants with modernized controls and in combustion turbines, SCADA is often implemented as a computer (PC, Sun, etc.; a variety of platforms) used as a Man Machine Interface (MMI) to the control systems/devices.

Integrated testing is a good idea, however when you look at the y2k problems found in a typical power plant, you see dozens or hundreds of devices that may have absolutely no interface with anything else. As an example, a plant may have:

* Distributed Control System: primary control for the plant
* 6 PLCs, each a stand-alone, controlling a certain process, not connected to the DCS
* 50 digital recorders with date stamps (analog signal inputs)
* 5 portable test instruments with date functions
* Mainframe computers (VAX, etc.)
* PCs (desktop, MMIs, etc.)
* Software: business processes, engineering, desktop, etc.

So what do we test here, for our "integrated test"? Typically, most plants that have tested their plant have in fact tested their primary control system's (DCS in the example above) ability to operate and properly process dates in the year 2000. Very few are setting the dates up on everything in the plant, for a good reason: they are performing integrated testing only of components that are truly integrated.

If the six PLCs above are stand-alone, it is very unlikely that they have real time clocks or date functions, but let's assume one does and sends data to a PC based MMI (if your design allowed you to operate devices through the MMI/PLC, you might call this a SCADA system). If this SCADA does not interface in any way whatsoever to the DCS, you can hardly perform integrated testing of the two: they don't talk to each other. Setting the dates on both for a test is fine if you want to do it, but this is basically coordinating two separate tests. Setting the dates on everything at the same time, including the recorders, is a bit silly, unless you plan to just leave everything that way to reassure the public (and a few plants have indeed done this).

Most plants are testing their components/systems individually, since they are not truly integrated, and this testing is just as valid for y2k testing as the coordinated testing.

I believe that if you are testing, you should test all the interfaces as well, it's just that many things don't interface!
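
To make the grouping idea concrete, here is a minimal Python sketch, not anything a plant actually runs. It treats the inventory as a graph and finds which devices exchange data, directly or indirectly; only those groups are candidates for an integrated test, and everything else can be tested on its own. The device names and the single PLC-to-MMI link are hypothetical, loosely based on the example list above, and the DCS is treated as one node because it is tested as a single system.

```python
from collections import defaultdict


def find_test_groups(devices, links):
    """Group devices into sets that exchange data, directly or indirectly."""
    neighbours = defaultdict(set)
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen, groups = set(), []
    for device in devices:
        if device in seen:
            continue
        # Depth-first walk over everything reachable from this device.
        stack, group = [device], set()
        while stack:
            current = stack.pop()
            if current in group:
                continue
            group.add(current)
            stack.extend(neighbours[current] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


# Hypothetical inventory: one PLC reports to a PC-based MMI, nothing else talks.
devices = ["DCS", "PLC-1", "PLC-2", "MMI-PC", "Recorder-1"]
links = [("PLC-1", "MMI-PC")]

groups = find_test_groups(devices, links)
integrated = [g for g in groups if len(g) > 1]       # need a joint test
standalone = [g[0] for g in groups if len(g) == 1]   # test individually

print(integrated)  # [['MMI-PC', 'PLC-1']]
print(standalone)  # ['DCS', 'PLC-2', 'Recorder-1']
```

In this toy inventory only the PLC and its MMI form a group worth testing together; the rest have no interface to exercise.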

Now for the diehards who insist on full integrated AND coordinated millennium testing, you would also have to set all of your mainframe computers and desktop computers for the year 2000 and run your business/engineering software as well. And even then, this is not the same as passing through the year 2000, since you are really not changing the space/time continuum at all...;)

Now type testing, that's a whole 'nuther subject, I have written some about it here....let's save that one for later.

Regards,

-- Anonymous, April 17, 1999


FactFinder:

Would you be willing to describe in more detail why you think that the SCADA problem on the grid is "way overblown?"

-- Anonymous, April 19, 1999


Here is one more utility insider to back up parts of what FactFinder said. Many generating stations have nothing that needs an integrated Y2K test since they have no true DCS/SCADA. The other generating plants we have that have a true DCS/SCADA were just tested (all passed with flying colors). Why was the integrated test just performed now? You don't want to do a test like that until everything is 100% remediated and you have a 100% correct test plan (look what happened to Peach Bottom when they did a Y2K test incorrectly).

The more I see the Y2K status of the rest of the electric industry, the more I believe that NO ONE will suffer a loss of power for Y2K reasons (at least in North America).

bob

-- Anonymous, April 20, 1999


bob,

could you elaborate on your last statement- that what you've seen leads you to believe that no one will lose power due to y2k, at least in north america? (this is not an antagonistic question, i'm just wondering what your basis for that statement is)

thanks.

-- Anonymous, April 21, 1999


I don't pretend to know very much about the utility industries, but I do feel the need to ask another question that hasn't seemed to receive much attention. Recently, NASA announced a huge solar storm brewing on the surface of the sun. They compared it variously to "a huge hurricane hitting the entire continent all at the same time" and the "explosion of several thousand tons of TNT." The comparison was also made to a smaller solar storm back in 1989 that took the city of Montreal from fully functioning to total blackout in 90 seconds. Now, they're calling this the "solar storm of the century," and have warned that this storm could incinerate the insides of satellites and fry the power grid to a point where repairs could take months. Does anyone have any information on this? Can anyone give me a reason to still be worrying about Y2K problems in the utility companies if this storm can and/or will basically destroy the necessary equipment in December?? Thank you so much for answering. I bow to the wisdom and intelligence and training of the actual experts.

-- Anonymous, April 21, 1999


Drew,

Why do I think that the most probable scenario for the nation's electric grid is that no one suffers a loss of power from Y2K effects?

First, let's forget all the possible Y2K failure modes that were proposed in the last few years, and only consider the failures we have seen in the real world. Too many of those so-called examples were nothing more than possibilities. Possibilities that had to be investigated, but which almost always proved to be false.

Let's consider the facts.

1) Real Time Clocks (RTCs) never just freeze up due to Y2K. Never. They just roll over just like an odometer in your car. The car doesn't magically halt operation. The only reason why most RTCs are considered Y2K non-compliant is that they only display the year with two digits.

2) Stand-alone digital devices never just freeze up due to Y2K*. We found none in our utility. Industry working groups have found none. Every time I confirm our Y2K testing with vendor Y2K testing, I look at all of the testing results for all of that vendor's devices. Not a single vendor that I have had contact with has anything stronger to say than the date might be wrong, and if you use that date for other purposes you might have other problems. It is extremely rare for the date to be used in a manner that will affect the operation of the device, except in the most limited of ways. Entire web sites ( http://www.borderlands.com/y2k/y2kchal.htm ) are devoted to trying to find a single example of a device that just freezes due to Y2K. They just don't exist. Discard the idea that unused clock/calendars in devices (that are there because the manufacturer bought a cheap off-the-shelf chip) will lock up. If the clock is unused, then you are back to point 1). *Here I truthfully have to say that someone, somewhere, might find one, but that will just be the exception to prove the rule. And look at other household examples. You don't still believe that your car will stop working due to Y2K or your microwave will malfunction, do you?

3) At the highest level of control, the DCSs and SCADAs definitely have the possibility of tripping a plant. But even here, the data is mighty thin that this happens in real life. I concede that this could happen, but my point here is that these types of systems are the fewest in number, the easiest to identify, and the easiest to fix, typically just an upgrade to a new version of the OS.

But let's now expand from individual devices and systems to the power plants and the grid.

4) It is still possible that some electric generating plants might trip. However, with the Y2K contingency plans already in planning and existence, the grid will be so stable that it will take trips of massive proportions to affect the grid. Most everywhere will have 2 to 5 times the normal spinning reserve on line. Normal maintenance will not be performed at this time, and extra staffing will already be in the field. Monitoring and response time will be at the highest and fastest. Since most plants did not have any Y2K problems serious enough to cause a plant trip in the first place, you would have to assume that the Y2K fixes caused a net increase in the total number of Y2K problems to cause enough plant trips to cause blackouts or brownouts.

5) The Transmission and Distribution (T&D) side of the house is the least affected of all portions of the grid. I assume I don't need to convince you of that.

6) So why do we keep hearing reports that we might still lose power? There are many simple reasons, and here are a bunch. No electric utility will ever guarantee you have power at any time. No one group will ever have the data that will "prove" the grid will stay up. Speculation of grid problems by those in high places is taken as gospel by others. Lawsuits are minimized by telling people that their power might be out for a short time, because if they don't take reasonable actions in face of a known problem, their legal recourses are more limited (I can also guarantee someone will lose power on 12/31/1999 for reasons that have nothing to do with the technical Y2K problem. Knowing that this might happen will reduce panic in those individuals). Just the fact that electric utilities are preparing Y2K contingency plans is enough "proof" to some that power outages will occur. Utilities can only obtain their budgets to address and eliminate the Y2K problem by convincing people there might be a Y2K problem. People incorrectly believe that every single Y2K problem must be fixed or we will still have serious problems. Etc. etc. etc.

7) I apologize if I was too brief in any area. You asked a straightforward question that is (obviously) not easy to answer.

bob

-- Anonymous, April 21, 1999


y2kguru wrote:

> 1) Real Time Clocks (RTCs) never just freeze up due to Y2K. Never.
> They just roll over just like an odometer in your car. The car doesn't
> magically halt operation. The only reason why most RTCs are considered
> Y2K non-compliant is that they only display the year with two digits.

Oh come on, this is a strawman argument. Nobody who knows anything about embedded systems ever said an RTC would "freeze up" when it rolls over. And RTCs do not display anything, they're just a little chip soldered to a board.

Real Time Clocks are worthless pieces of silicon unless they are interfaced to a micro-controller or microprocessor. The danger in RTCs with two-digit years is the software that is running in the micro-controller may misinterpret the data and cause the micro-controller to crash or do a hard reset or over-write some bytes in memory or whatever.

Yes, sometimes the micro-controller is also interfaced to an LCD, and the date is displayed incorrectly on the LCD. That is a cosmetic problem, and not one we need to worry about. The real problems come when the program running in the micro-controller doesn't do the right thing when the date is interpreted as either 0 or 100...
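
A minimal Python sketch of the kind of misinterpretation described above. It is not taken from any real device firmware: the function names are hypothetical, and the pivot year of 50 used for the windowing fix is an assumption chosen only for illustration.

```python
def elapsed_years_naive(last_yy: int, now_yy: int) -> int:
    """Interval computed straight from two-digit years, with no century."""
    return now_yy - last_yy


def elapsed_years_windowed(last_yy: int, now_yy: int, pivot: int = 50) -> int:
    """Same interval after expanding each year with a fixed window.

    Years at or above the pivot are read as 19xx, the rest as 20xx.
    The pivot value is an assumption, not a standard.
    """
    def expand(yy: int) -> int:
        return 1900 + yy if yy >= pivot else 2000 + yy

    return expand(now_yy) - expand(last_yy)


# The RTC read 99 at the last event and reads 00 after the rollover.
naive = elapsed_years_naive(99, 0)        # -99: time appears to run backwards
windowed = elapsed_years_windowed(99, 0)  # 1: the intended answer

print(naive, windowed)  # -99 1
```

Whether a negative interval is harmless or not depends entirely on what the surrounding program does with it, which is the point in dispute here.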

Jon

-- Anonymous, April 22, 1999


bob:

Thanks for your detailed reply. This is very helpful at providing reasoning behind your feelings of confidence. It is good to know that so many industry insiders have not found anything that trips plants, but before I feel confident, I would like to get more clarification.

You mentioned the Peach Bottom incident. You state that everything should be 100% complete before integrated testing. By citing Peach Bottom you seem to admit that having a part of the system non-compliant could cause major havoc at a plant. Am I understanding you correctly?

Also, you mentioned that although some devices provide bad date information, they continue to operate. Have you determined that this incorrect date information is not passed to any systems that might cause data to be corrupted? Are you sure that it is only cosmetic? If so, would you be willing to explain in some detail?

Can all automated systems in a plant be overridden manually? A while back I asked this question; can you explain how DCS can be overridden manually?

I was unaware that distribution was the least of our worries. I have heard almost exact opposite reports on the distribution issues. Would you be willing to explain your views?

I have more questions, but these would be a good start.

-- Anonymous, April 22, 1999


Catlyn:

I was unaware of this solar storm. Where did you get your info? Do you have an Internet link?

-- Anonymous, April 22, 1999


bob,

thanks for your answer. i am rushing out of the office today to go out of town, so can't review it in depth. will look at it tomorrow. thanks again.

-- Anonymous, April 22, 1999



Reporter,

According to NOAA's Space Environment Center (SEC), the sun is quiet, and is expected to be quiet. No major storms or surface disruptions expected; no alerts or warnings have been issued for the last 24 hours. In the last 7 days, the only events that have occurred worth watching are K indices of 4 and above (not nearly enough to warrant concern about the power grid), and A indices of 20 and above (same comment). Each of these watches and/or observations addressed a 24 hour period, the last of which occurred 2 days ago.

IOW, the sun is behaving as predicted; no unusual activity is expected beyond that which is normal for this period of solar cycle 23.

You can go to this page, and click on the link to get the latest information on geosolar and coronal activity, and predictions:

gopher://solar.sec.noaa.gov/11/forecasts

Hope this helps.

-- Anonymous, April 22, 1999


Catlyn and Reporter, yes, solar flare activity can have some detrimental effects on power grids and communications. (And has in the past.) And yes, solar flare activity does follow a kind of high/low cycle, which is known by scientists. We are coming into a period of expected intense coronal activity, the peak of an 11 year cycle which will happen to correspond roughly with --you guessed it-- the general time frame of possible Y2K events. For some links and more in-depth comments on the upcoming high-active phase, see the former thread on this site at:

http://www.greenspun.com/bboard/q-and-a-fetch-msg.tcl?msg_id=000ai0

There is also an article on the ABC News Science online titled, "Those Pesky Solar Flares" which has good information. Go to:

http://www.abcnews.go.com/sections/science/DailyNews/sunspots980805.html

See also the NASA site at: http://www.spaceweather.com/

Any of these links should help in understanding the solar flare issue. Although scientists are getting a lot better at understanding and predicting certain things about the sun's coronal activity, we are still in the position of being in a crap shoot in regards to the potential effects. There are a lot of things in life we don't have any control over; certainly solar flare activity can be added to that list. It is rather nasty cosmological timing that the peak flare period will be during a period when there are other things of import to be concerned about, but c'est la vie! Nothing may happen of import, or major problems could occur. Sound familiar? (smile)

-- Anonymous, April 22, 1999


bob,

only now (saturday night at 7:05 pm) did i finally get to really read your answer. again, many, many thanks. unfortunately, i'm walking out the door, so i can't respond til tomorrow night or monday, but i do have some thoughts. much of what you have told is in line with what others have said (on the positive side). i really will post a reply as soon as i can. again, my thanks. i like it when people respond with specifics.

-- Anonymous, April 24, 1999


Reporter,

I think this will start to answer some of questions you had.

1) Peach Bottom. As I understand the Peach Bottom Y2K testing problem, there was no Y2K problem, there was a testing problem. Imagine this. You have a Y2K compliant PC. You have a Y2K compliant version of Quicken (with the feature that will electronically pay your bills and you have set yourself up to use that function). You set the date ahead as part of a Y2K test. What happens? Your program tries to pay many months of bills all at once. Is this a problem? Yes! Is this a Y2K problem? No! It is a testing problem. At Peach Bottom (this is second hand info, so I can't attest to it personally) they apparently use VMS as the OS for the Rod Worth Minimizer program. There is a right way and a wrong way to set the date ahead on this OS. They did it the wrong way, which caused the backup system under test to stop operation. Then, to compound matters, they did not notice that the backup system they were testing had failed to the primary. So they entered the date again, and then both were down.

2) DCSs - If your DCS is running a feedwater control system, and you lose the DCS, you probably have tripped the plant within seconds/minutes. This is because it is a relatively fast moving system that can easily become unbalanced from an apparent steady state (this is a statement on the effect of a loss of a DCS on this type of system, not a statement on Y2K problems found in this type of system). The grid, on the other hand, is a much slower moving system. The typical pattern can be described as 2 cycles of an uneven sinusoid over a 24 hour period. Your generating utility's Energy Management System (like a SCADA system) is controlling overall power output of a number of power plants to match this curve. Because this curve is moving so slowly, you probably can operate with this in "manual".

3) Wrong Dates, Cosmetic or Communicated - Yes, it is important to know if that "cosmetically wrong" date from one system is communicated elsewhere and what is done with it. The point still remains, in the electric industry, it is extremely rare to find a situation where this wrong date can ever affect production, transmission or distribution.

4) Distribution. - Here is a short explanation of why distribution is the least affected of the generation, transmission and distribution triad. If you turn on your light switch, and are now asking for more power, not a single thing in the distribution system makes any changes to give you that extra bit of power (just like water flowing down hill, it just happens due to physics). There is nothing but electro-mechanical devices between you and the substation. There might be monitoring equipment that is digital in nature. But this won't affect output. There may be some digitally controlled breakers, but these are either open or closed, and they typically don't change position unless an electrical fault is detected or maintenance is going to be performed. Although this is not my primary area, I am not aware of any Y2K situation where control of these breakers is lost, much less a Y2K problem where the device just suddenly switches position.
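
The bill-paying analogy in point 1) can be sketched in a few lines of Python. This is only an illustration of a test-procedure side effect, not a model of Quicken or of anything at Peach Bottom; the payees and due dates are made up.

```python
from datetime import date

# Hypothetical recurring bills as (payee, due date). Not real data.
schedule = [
    ("electric", date(1999, 5, 1)),
    ("phone", date(1999, 8, 1)),
    ("mortgage", date(1999, 12, 1)),
    ("insurance", date(2000, 2, 1)),
]


def payments_due(schedule, today):
    """Return payees whose due date is on or before the clock's 'today'."""
    return [payee for payee, due in schedule if due <= today]


# With the real clock, nothing is due yet.
real_clock = payments_due(schedule, date(1999, 4, 25))

# Roll the clock forward for a test and every bill in between fires at once.
test_clock = payments_due(schedule, date(2000, 1, 1))

print(real_clock)  # []
print(test_clock)  # ['electric', 'phone', 'mortgage']
```

The date handling is correct in both runs; the surprise comes from moving the clock on a live system, which is why the test plan matters as much as the remediation.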

I hope this helps.

bob

-- Anonymous, April 25, 1999


Jon,

You said

"Real Time Clocks are worthless pieces of silicon unless they are interfaced to a micro-controller or microprocessor. The danger in RTCs with two-digit years is the software that is running in the micro-controller may misinterpret the data and cause the micro-controller to crash or do a hard reset or over-write some bytes in memory or whatever.

Yes, sometimes the micro-controller is also interfaced to an LCD, and the date is displayed incorrectly on the LCD. That is a cosmetic problem, and not one we need to worry about. The real problems come when the program running in the micro-controller doesn't do the right thing when the date is interpreted as either 0 or 100... "

My response.

Name one.

Give me the manufacturer, model and version of just one microcontroller that fails hard in the manner you described. If the embedded Y2K problem was as bad as some would like you to believe, we should be finding these things right and left. Where are they? In all my testing, I found zero. In all the industry databases that I have access to, I have found zero. In all the vendor Y2K sites I have been to, no one describes a failure like you have. Not 0.0001%, but zero. Even Rick Cowles' example of a few weeks ago is just a battery charger/UPS whose trending program stops trending. The charger still worked.

Name me just one.

bob

-- Anonymous, April 25, 1999



y2kguru wrote:

> Give me the manufacturer, model and version of just one
> microcontroller that fails hard in the manner you described.

I can't. I don't work in Y2K remediation. I do, however, write software for both PC's and embedded systems, and I know the scenarios I have talked about are possible.

Now it may be that there may not be any micro-controllers that crash hard on rollover. They don't have to. If they initiate a domino-effect failure because some sensor stops returning the right answer, then that is a problem.

Are you willing to tell me that there is not one single embedded system out there you've seen in person or in the database that doesn't do its job? By doing its job, I mean that it analyzes whatever input it gets, and provides the correct output. I'm not talking about seeing 01/01/100 on a display, I'm talking about getting the wrong answer for a critical reading, or doing the wrong thing based on the input it gets.

If a valve in a chemical plant (or a utility) closes when it is supposed to be open, bad things can happen.

Jon

-- Anonymous, April 26, 1999


bob,

This is my reference on the Peach Bottom incident:

"To simulate Jan. 1, the technicians had intended to connect the Rodworth to another computer that would serve as a clock. But instead of connecting the unit to the external clock, a programmer inadvertently reset the date on the backup and primary operations monitoring systems, which are not yet Y2K compliant, said Joseph Clepp, an information systems manager at Peco Energy Co., the Philadelphia-based utility that runs Peach Bottom. As soon as the date was reset, the screens in the control room went blank." - Washington Post
From this it sounds to me like they performed an accidental Y2K test. They rolled the wrong clocks forward. In other words, there was a problem that had not yet been fixed. This does not sound like a testing failure. Do you have a different reference?

You stated:

"...it is extremely rare to find a situation where this wrong date can ever affect production, transmission or distribution."
Do you have something to back this statement up?

Also, are you saying that DCS cannot be manually overridden but SCADA can? This still isn't clear to me.

-- Anonymous, April 26, 1999


FactFinder & Bob,

You are hitting the nail on the head. The purpose of an integrated test is to check the integration of the systems. In our power plants we have found just one fully integrated system, the Eems power plant. This is one of the latest power plants in the world. Older power plants are less integrated. Two of our oldest power plants are not performing integrated tests because there is no integration between the systems at all.

Because we have already tested components and systems on all critical dates, the integrated test takes place on one critical date: the rollover date. The integrated test must last more than a day, so daily reports etc. can be printed.

Thanks for your contribution.

-- Anonymous, April 27, 1999


Bob,

First, thanks again for your answer. Second, sorry for the delay in replying. I am just way behind, and am having to cut back on some of my efforts to get basics taken care of.

Going point by point:

Re your first point about not worrying about "possible Y2K failure modes" which had largely been just theoretical - yet, clearly there have been some genuine problems. Senator Bennett said at last week's Washington DC Year 2000 Group meeting that power plant personnel *themselves* told him they couldn't get one system (or plant- he wasn't clear) to work. He didn't provide details. However, I do know of other tests where systems have failed Y2K tests.

Moving on:

1) The RTC issue. That one I will leave, at least for now, to your discussions with Jon.

2) I might have a few questions about this point later on, but at the moment, I'm too worried about my car & microwave :) Just kidding.

3) Are "new versions" of an OS always compliant? Or will they end up with "issues" like Win98, NT, etc?

4) Even if generating plants run into problems and don't cause problems for the overall grid, you're still going to have localized power problems, right? Doesn't that go against what you said about "NO ONE" having power problems?

5) T&D having the least problems- actually, no, this is not my understanding. Are there concerns about substations, phase balances, load balances, etc? Or am I just way off base here?

6) Why do we keep hearing reports about the possibility of losing power? Several reasons. One, although it's true an electric utility will not guarantee power, they can still say "Jan 1 should be a non-event" or "like any other day." Some apparently don't even want to do that. Two, speculation in high places is often fueled by the industry itself. For instance, Bennett's example of the personnel who admitted to him another plant or system would not pass a Y2K test. In addition, Dodd's remark last June that it was only a question of how serious the disruptions would be, not whether or not they would happen, came because of confidential info from the industry itself. Contingency plans alone don't mean a plant expects problems, any more than the fact that I have homeowners or car insurance means I expect a burglary or car accident. And I suppose some people still think every Y2K problem must be fixed to avoid difficulties (although I'm not one of them).

No, I think the main reason people think we may have problems is because people in the industry itself *keep*saying*so*. They sure don't advertise it, but, for instance, take a look at the "California & the Western Grid" thread, and all the stories there. Or the "Unequal Equation: Public Posts = Private E-mails," where the poster said she receives private e-mails telling her of problems that people can't discuss publicly. I have heard numerous stories (all from people I consider at least credible, and some from flat-out *unimpeachable* sources)- but of course I can't repeat them.

Now, I don't have any doubt that a lot of stories, maybe even most of them, are rumors, etc. But I can tell you that, at least in the case of the info I hear, it *is* valid. And whether or not much of the info other people are hearing is valid, the fact remains that it's still coming from *within* the industry. And perception is reality- i.e., even if the problems aren't real, the fact that these stories are coming from people in the industry is a serious concern to those who hear them. That, more than anything else, I think is the reason for so much concern. Some Y2K optimists in the industry like to blame "doomers," but that's just ridiculous in the extreme. "Doomers" didn't write the Sept 11 NERC report ("we don't know what the impact will be," etc); "doomers" weren't buying generators (as people in the industry have been, and still are, to this day); "doomers" weren't the only ones telling friends and family members to prepare for 6-8 months of dirty power- industry workers have been too.

For myself, I would be perfectly happy to accept a Y2K scenario wherein no one in the US, or even Canada & Mexico, faced power problems. And, as I have said before, I do not have a problem accepting the information provided by you, FactFinder, CL, Murph, Dan (mostly on Yourdon) & anyone else who reports positive results (as long as I can verify them if I want to).

But at the same time, I've got to be intellectually honest about this, and acknowledge that I am also hearing negative reports as well, at least at this point. And from sources which require that I take those reports seriously. Now, granted, in some of those cases, even those people supplying the negative reports say it's *possible* the problems can be sufficiently fixed in time (although one person is not so optimistic, at least at this point).



-- Anonymous, April 27, 1999


Thanks everybody for your insights into why you do not think unit-wide integrated Y2K testing is essential. My column in Westergaard Year 2000 today basically has my response. In essence, it is that in NERC's 3rd quarterly report, NERC has dropped its doubt that unit-wide integrated testing is important and has instead recommended it. I'm reproducing the relevant part of my article here via a copy and paste function -- if the hyperlinks do not show up correctly, please refer to the original URL. Would love to hear your take on why NERC has changed their tone to be more towards what I think is good on this issue.

I should also add another comment not provided in my column -- another benefit of integrated testing is that different logic paths are often exercised in an integrated test, and consequently errors that would not show up otherwise may suddenly show up in such a test. That seems to be the case in the 100-plus integrated tests done to date (although among these "A" students, the errors are not showstopper errors).

http://www.y2ktimebomb.com/Tip/Lord/rmart9919.htm

Two Recent Major Reports by ITA and NERC Indicate Additional Need for Preparations
By Roleigh Martin
May 10, 1999

[snip]

The other essential report that has been recently released is Preparing the Electric Power Systems of North America for Transition to the Year 2000, April 30, 1999, prepared for the U.S. Department of Energy by the North American Electric Reliability Council (NERC). Although NERC presents their third quarterly report in a very optimistic manner, a very "big nail" sticks out of the floor of their optimism.

On page 21 of their report, the report observes (emphasis is mine):

"Testing of non-nuclear generators continues to indicate a minimal number of failures that might cause an unremediated unit to trip. Fully remediated units are all expected to be able to operate into the Year 2000.

Of particular interest are the results of integrated tests involving the entire generating unit. It is estimated that more than 100 units at dozens of utilities have been tested while operating on-line and producing power. These tests consist of simultaneously moving as many systems and components as possible forward or backward to various critical dates. These tests require an extraordinary level of preparation and coordination to ensure the safety of all systems and that the impact to the electric system would be minimal should a unit trip during the test.

Of all the integrated unit tests reported to date, not one test of a fully remediated unit has resulted in a Y2K failure that caused the unit to trip...Although these results are encouraging, thorough testing is required and in some cases components must be replaced or fixed. Examples of components that require particular attention include:
First of all, they are stating that fully remediated units have not tripped in an integrated Y2K test -- which is not the same as stating that unremediated units will not be tripped by Y2K.

Integrated Y2K testing is a very expensive, time consuming process. I work for a Fortune 500 company and I headed the Y2K project for only a small client-server application which interacts with Unix and two different brands of mainframe databases. (My Y2K project finished final testing in December 1998.) Our company's integrated Y2K test lab cost in the million-dollar range and took months to put together. Projects can only go into the Y2K test lab if they have been certified as being Y2K compliant and tested at the component level. Still various projects find during the lab tests that changes have to be made while in the lab and in some cases projects have to cancel their test and spend considerable time back in development and do a reschedule weeks later to go back into the integrated Y2K test lab.

Can you imagine how expensive an integrated Y2K test lab would be for an electric utility with their DCS, SCADA, EMS, PLCs and so forth? It is obviously very expensive and it is very obvious that most utilities are resisting doing this. What does the above number of 100-plus units fully tested mean? According to D.O.E., the number of units at electric utilities as of 1/1998 is 10,421. If you generously assume that NERC is talking about fewer than 125 units nationwide, that means that only slightly more than one percent (about 1.2%) of power units have been fully tested -- with 9 months left!

Consider these early testing Y2K project teams to be your "A" students. They are to be expected to have their "act" together and it does not surprise me that they basically got an "A-" on their tests. Nothing occurred in the tests to shut their units down; still the above list of failures did occur with these "A" students. Now what about the other 99 percent of the class? Are we all to expect they will automatically get an "A-" too? I doubt it and I bet some of the NERC experts have doubts too. For instance, note their statement (my emphasis), "Although these results are encouraging, thorough testing is required and in some cases components must be replaced or fixed."

Of strong importance is that this quoted sentence replaces a paragraph downplaying the issue, which appeared in their second report (page 20) but was eliminated in the third report:
"One issue moving forward is how much of this integrated generator testing is appropriate. The answer is not simple because the preparations to conduct such a test on a unit are extensive and the results continue to indicate that a unit properly tested at the component level does not exhibit problems at the overall unit level. The experience with this type of testing will continue to increase in the next quarter."
I am purely speculating here but could this be one of the reasons why for the first time NERC included the following advice on page 6 in their report?

"...the risk of electrical outages caused by Y2K appears to be no higher than the risks we already experience. Electrical outages may occur throughout the year due to severe wind, ice, snow, floods, earthquakes, and other natural events. Electrical outages may also occur due to equipment failures, traffic accidents, or a power shortage during an extremely hot or cold period. Electricity customers should review their risk exposure to everyday events that could impact electric service and historical experience with their service provider.

Customers should:
  1. Identify the possible impacts of a service interruption on their business or home and initiate actions necessary to assure safety and business continuity. Power supply decisions should be based on the risk exposure of a customer on a year-round basis, rather than the anticipation of any single event, such as Y2K.
  2. Check the Y2K information provided by your local electricity provider on the Internet or through literature mailings.
  3. Customers with electrical demands essential to safety and public well-being, such as hospitals; emergency services; public communications; gas, water, and sewage facilities; and hazardous materials handlers should review their emergency power supply provisions and procedures, and coordinate their needs with the local electricity provider.
  4. Large commercial and industrial customers that would be impacted by an electrical outage should review their emergency power supply provisions and procedures. Large customers who are contacted by their energy provider should cooperate with requests for information about plans for use of electricity during Y2K transition periods."
Items 2 through 4 appeared in NERC's second report but the lead-in paragraph and item 1 did not. The only comparable advice given in NERC's first report is the sentence "Think about the impacts of Y2K in your business or home."

NERC's recent advice matches my own. Consider whether you should purchase a backup power generator for ongoing needs, not just for Y2K. That's because Y2K outages are not a definite event, whereas outages from other causes are almost certain to happen; we just do not know when they will next occur. Last summer, I lost electricity for four days during severe storms -- I shared this fate with tens of thousands of metro residents. I will cover my preferences about generators in a future article.

Another finding is that the plurality (46.8%) of reasons for exceptions is "Vendor availability" -- see NERC's third quarterly report, Appendix A. In my six-part review of the first NERC report, I expressed strong concern over the demand-supply imbalance of vendor support to handle all utilities in this short time frame. This Appendix validates my concern, and I wonder: if the very large utilities that directly report to NERC are having this problem, what about the smaller utilities who will most likely have "less pull" with vendors in this rushed period of time?

The last concern about the NERC report is found on page iv of their third report: "Over half (53%) of the bulk electric systems reporting to NERC indicate use of an external contractor to audit their Y2K program." I would like to know if my own utility is in that half and I would also like to know what were the findings in those audits -- there is zero discussion of that. How come? If it was good news, should we not have heard about it? The U.S. government has reportedly awarded NERC money to study the "information that is self-reported to NERC" -- the excuse that NERC did not bother to investigate this seems odd to me. Furthermore, I realize that NERC has pledged confidentiality of individual utility company reports, but when they wrote on page iv that "many state utility commissions are conducting reviews of electric utility Y2K readiness programs" -- why did they not inform the readers of which states by name are doing that? The citizens in those states would like to know that.

Since NERC did not provide the names of those states, readers can get some help from the National Regulatory Research Institute web page on "State Actions on Utility Preparedness for the Year 2000." If you don't find your state discussed there, consult the CIO Magazine survey of individual state Y2K readiness using their interactive map.

Before going on to read anything else though, be sure and read the ITA report raved about above. It will give you a better picture of what you should definitely prepare for. The reviewed NERC report covers a low-probability but still significant possible event to be concerned about. Statistical significance commonly starts at a 5 percent possibility and even page 6 of an earlier NERC Y2K contingency planning report assumes a "High probability of less than 10% loss of generation; low probability of 10% to 25%" that an "Increased risk of generator trips/near coincident unplanned outages" could have a "high impact; location dependent impacts" that "may extend days." (Remember though, NERC is expecting an excess capacity of 40 percent on January 1st -- local problems are particularly foreseen where and if regional outages are concentrated or significant.)

If you want to know why and what you should be preparing for, these two recent reports are worth studying.



-- Anonymous, May 10, 1999
