Electric Utility Independent Verification - Been There, Done That


A lot of good, factual information regarding electric utility test results has been provided here by engineers actually involved in the test/remediation program. For the most part, this info (and the summary data reported to NERC) has been ignored or ridiculed due to the lack of independent audits to validate/verify the results.

I contend that independent validation/verification has been achieved to a large extent. Large utilities that participated in the EPRI Y2K program have performed isolated, independent testing. The EPRI website database has provided a forum for the utilities to review/discuss best test practices and view test data obtained by other utilities in totally independent tests. For each critical device that I have tested, I have downloaded and stored test procedures/results from other utilities that corroborate my results. In addition, vendor test results are usually available as well.

Independent corroboration by technical experts in the same field is of greater value than hiring a consultant to review test results. I doubt I could find a consultant as technically knowledgeable as the vendor design/test engineers and my peers at other utilities.

A final note on independent audits. The NRC performed independent audits of nuclear stations and found positive results. Has this made an impact on public perception of nuclear plant readiness? Based upon the feedback on this forum, I would have to say NO. Why should I spend money on a consultant to review my test program? (I can hear the cries: "Who paid the 'independent auditor'? The utility; they are just another flunky for utility management, paid to put out the same old 'spin'.") If you want an independent audit of my test program, you pay for it.

-- cl@sky.com (cl_sky@excite.com), November 17, 1999

Answers

[Fair Use]

As part of the Y2k readiness reporting process, NERC has provided four quarterly reports to DOE on the status of Y2k efforts in the electric power industry. The most recent report was delivered to DOE on August 3, 1999. A copy of that report may be found at the NERC web site at http://www.nerc.com/y2k/. This latest NERC report provides these findings and others:

On-site Review Process DOE contracted a team of independent consultants to perform on-site audits at a randomly selected sample of electric power organizations. The 36 entities selected represent slightly more than 1% of the 3089 organizations in North America. The on-site review teams conducted their audits during June and July 1999 and prepared this report based on their findings. The audits included interviews with Y2k program personnel, review of Y2k program documentation, and rerunning of selected Y2k tests in the presence of the on-site review team.

The results presented in this report are intended to be representative of the industry and are not intended to disclose the status of an individual organization. The results are presented in detail for each site visit. However, the identity of each organization participating in the on-site reviews is masked.

Below, I have listed a few of what I deemed to be the most significant "Highlights" from this report. Enjoy.

(page 20)

"* 50% of the large municipal utilities and 40% of the cooperatives implemented a low level test method, usually only involving the December 31, 1999 to January 1, 2000 rollover test with power on. These utilities were either unaware of their potentially problematic dates or thought that this level of testing was sufficient evidence that the tested devices would not encounter Y2k problems."

(page 22)

"No utility reviewed was determined to be using all test dates known to the on-site review team."

(page 23)

"Testing methods depended greatly on the sophistication of the entity reviewed."

(page 26)

"Remediation and testing encountered difficulties, principally as a matter of vendor reliance rather than flaws in the program. Remediation gaps were found primarily in SCADA and customer information systems where vendors slipped on their delivery and installation schedules, moving readiness into the third quarter of 1999. Testing difficulties were either a matter of over reliance on the vendors' testing or a lack of clarity of the suitable testing methods necessary to verify vendors' claims."

(page 38)

"* Of the five municipals that were assessed at level 2, two reported in March that they would not be Y2k ready by June 30. [Self-reported data and on-site assessment were in agreement.]"

"* The third municipal of the five receiving Level 2 ratings indicated 85% completion in the APPA data in March with no delayed readiness noted. This same utility, however, reported directly to NERC and indicated itself Y2k Ready with a Limited Exception."

"* The fourth municipal receiving a level 2 rating was listed in the June APPA survey data as being 100% complete with no delayed readiness. At the final on-site review presentation, the general manager of the utility stated that he disagreed with the Level 2 rating because the utility's contingency plan covered the possible unavailability of the SCADA. This did not conform to the criteria for the on-site review, however."

"* The fifth municipal of the five receiving a Level 2 rating showed 100% completion in March with no delayed readiness noted, and was not listed in the June survey. The reason for the Level 2 rating by the on-site review team was an issue discovered during the on-site review (failed to test an aspect of the SCADA hardware.) Many utilities expressed concern about the possibility of missing something in their Y2k readiness process, but this was the only instance during the 36 reviews in which a utility was found to have missed a test of a mission critical nature. [Discrepancy between self-reported data and on-site assessment due to incomplete testing.]

(page IV-E-3)

"* All equipment that could be tested was selected for testing by Utility E personnel. In case of non-digital equipment, which is not testable, the utility relied on manufacturers' information."

"1 It was noted that transition from year 1999 to 2000 with power off was not executed. This led to scheduling of some retesting which is expected to be complete before 6/30/99. 2 Leap year not tested for a capacitor bank controller. This will be tested."

(page IV-E-4)

"* There have been no internal or external audits of the electric department's Y2K program."

(page IV-E-5)

"* The existing plan is for load shedding based upon discussions with their major customers. It needs to be expanded for to include the following Y2K considerations.

- Cooling water is available to run generators until power can be made available to Utility E's water system to resume supply of water.

- Diesel fuel is available to run the generators with 100% diesel fuel for about 3 days and storage of additional diesel fuel is being discussed (3). The fuel supplier is 45 miles away."

- Availability of natural gas would extend the horizon for self- generation. The diesel fuel would last a week if gas were available. The gas supplier is 15 miles away.

- Load shedding (4) is planned if the generation capacity is insufficient to cover an extended outage or if peak load exceeds generation capacity.

- Some capacitor banks for reactive power compensation may be taken out of service as a precaution to avoid irregular switching of the banks if the controllers malfunction in spite of having passed year 2000 tests.

- Communication system for internal communication is available even if the phone system is inoperable, but communication would be severed without public telephone (5) connections being available. However, a radio link to the power supplier is available from a cooperative two blocks away."

"(5) Overload of the public phone systems is a recurring phenomenon in other types of disasters and could be likely for the Year 2000 transition."

(page IV-E-6)

"* Two pieces of equipment were retested. One was a totalizer, which is used to check the loading of the system prior to a black start following a major outage. The other was a capacitor switch controller. Some expanded testing was done on these systems that uncovered some idiosyncrasies in the software."

"* Will rerun the Y2K tests based on the testing discussed in 4.3, and shared the Superintendent's concerns as noted above."

(page IV-AJ-2)

"* An outside consultant was hired to help with the project. They did an inventory, which was matched against the inventory done by plant personnel. This did not work well (inventories did not match up.)"

(page IV-AJ-3)

"* Inventory items were prioritized A, B, or C. Any device or system deemed to be mission critical was a priority A. All A devices and systems were tested on-site or for those that could not be tested on- site, the vendor was hired to develop and conduct tests at the vendors facilities. The vendor testing was witnessed by plant personnel. The EPRI database was also used to some extent to determine Y2K status of certain devices."

-- GoldReal (GoldReal@aol.com), November 17, 1999.


I do not recall seeing ANY single-date test procedures on the EPRI database. These are the big players with the resources and know-how to do the testing. Some typical test dates (some utilities used more, some less), most available from the GM test plan that was the basis of most utilities' customized test plans, are listed below (a sketch of how such a matrix might be tabulated follows the list):

- Rollover - 1999 to 2000 - Power On
- Rollover - 1998 to 1999 - Power On
- Rollover - 2000 to 2001 - Power On
- Rollover - 2027 to 2028 - Power On
- Reboot Date Retention Rollover Test - Power Off - 1999 to 2000
- Manual Date Set Test - 1 Jan 2000
- Manual Date Set Test - 29 Feb 2001
- Manual Date Set - 29 Feb 2000
- Rollover 2/28/2000 - Power On
- Rollover 2/28/2001 - Power On
- Rollover 2/28/2004 - Power On
- Leap Year Reboot 2/29/2000
- Leap Year Reboot 2/29/2004
- Rollover 2/29/2000 - Power On
- Rollover 2/29/2004 - Power On
- Rollover - 09/08/99 to 09/09/99 - Power On
- Manual Invalid Date Entry - 00/00/00 - Power On
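Purely as an illustration, here is a minimal sketch (in Python) of how a test matrix like the one above might be tabulated in a device test log. The class and field names and the pass/fail bookkeeping are hypothetical, not from any utility's actual procedure; only the dates and power states are taken from the list above.

# Hypothetical sketch of a Y2K device test matrix built from the dates above.
# Field names and result tracking are illustrative only.
from dataclasses import dataclass
from typing import Optional

@dataclass
class TestCase:
    description: str               # what the case exercises
    power_on: bool                 # whether the device is powered through the transition
    passed: Optional[bool] = None  # filled in after the test is run

ROLLOVER_CASES = [
    TestCase("Rollover 1999-12-31 to 2000-01-01", power_on=True),
    TestCase("Rollover 1998-12-31 to 1999-01-01", power_on=True),
    TestCase("Rollover 2000-12-31 to 2001-01-01", power_on=True),
    TestCase("Rollover 2027-12-31 to 2028-01-01", power_on=True),
    TestCase("Date retention across the 1999-2000 rollover (reboot)", power_on=False),
    TestCase("Manual date set to 2000-01-01", power_on=True),
    TestCase("Manual date set to 2000-02-29", power_on=True),
    TestCase("Rollover 2000-02-28 to 2000-02-29 (leap year)", power_on=True),
    TestCase("Rollover 2004-02-28 to 2004-02-29 (leap year)", power_on=True),
    TestCase("Rollover 2001-02-28 to 2001-03-01 (non-leap year)", power_on=True),
    TestCase("Rollover 1999-09-08 to 1999-09-09 (9/9/99)", power_on=True),
    TestCase("Manual invalid date entry 00/00/00", power_on=True),
]

def outstanding(cases):
    """Return the cases that have not yet been run or that failed."""
    return [c for c in cases if c.passed is not True]

if __name__ == "__main__":
    for case in outstanding(ROLLOVER_CASES):
        state = "power on" if case.power_on else "power off"
        print(f"TODO: {case.description} ({state})")

Keeping the matrix in one place like this makes it easy to see, per device, which rollover and manual-set cases are still outstanding.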

Smaller utilities and cooperatives use the same embedded devices (protective relays). These are closed-system devices, and tests by one utility verify them for all, since there is no chance for the user to introduce errors at the application level.

Municipals and co-ops should thoroughly test all "open" devices that can be user-programmed at the application level (hence providing the opportunity for a programmer to introduce a date-dependent error).
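As a purely hypothetical illustration of the kind of application-level, date-dependent error being described (not code from any actual relay or SCADA application), consider user logic that orders event records by two-digit year:

# Hypothetical example of a user-introduced, date-dependent error in the
# application-level logic of a programmable ("open") device.

def is_newer(event_year_a, event_year_b):
    """Return True if event A is more recent than event B."""
    return event_year_a > event_year_b

# Buggy usage: two-digit years stored by the user's application logic.
# A January 2000 event (stored as 0) wrongly sorts as older than a
# December 1999 event (stored as 99), so the application could purge
# or misorder it.
assert is_newer(0, 99) is False          # wrong ordering with two-digit years

# Corrected usage: expand to four-digit years (here with a simple pivot
# window) before comparing.
def expand_year(yy, pivot=70):
    """Map a two-digit year to four digits: 70-99 -> 19xx, 00-69 -> 20xx."""
    return 1900 + yy if yy >= pivot else 2000 + yy

assert is_newer(expand_year(0), expand_year(99)) is True   # correct ordering

Testing the closed-box firmware alone would never catch a mistake like this, which is why the user-programmed layer needs its own test pass.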

-- cl@sky.com (cl_sky@excite.com), November 17, 1999.


You might be right; I hope you are. The problem is that NERC is a MARKETING organization. So they do what marketing organizations do, and lie. They got caught in lies. The fact that they have little to do with the actual fixing is no comfort to those who want to know reality but are getting their information from NERC.

As for NRC, I've read most of those audit reports - pretty thin gruel. Couple of years ago, the former CEO of an organization that makes nuclear devices had me meet with their team of engineers on Y2K. They had found problems, reported them up the chain, been told that was the customers' problem, and their lawyers weren't letting them tell the customers. Pretty gloomy group of engineers. Not to say that is the situation now.

People are getting very suspicious of spin. That said, there has been more pressure on the electric industry over the last couple of years than any other industry except finance and banking. I expect some places will lose power, but after 2000 I don't think power is what we will be worrying about.

-- ng (cantprovideemail@none.com), November 17, 1999.


The purpose of true computer/software systems-level testing isn't to convince me, the public, the regulators, or anybody else that the testing was done.

The purpose of thorough software/system-level testing is to FIND PROBLEMS so they can be eliminated BEFORE actual use. Then, you retest to find additional problems.

The scope and purpose of the grid testing was to provide two public relations "drills" on key dates to allow positive PR to be released from pre-staged arenas. There has been no other publicly released testing of the grid, its distribution centers, or other inter-utility functions.

There has been almost NO publicly released testing of any fossil power plants, and what has been released of the nuclear testing indicates that, where systems are tested (after remediation), they break, must be re-repaired, and then (sometimes) break again.

I agree, the individual items that you may have tested may have worked under the specific conditions that you tested them under: no dispute. What you have not tested may (or may not) work. That they might work under actual conditions remains open to conjecture and assumptions about the quality of your testing simulations.

So, the burden of proof is on you to show you have tested everything, and that the system (as a whole) will work. The burden of proof (if I have prepared for disturbances) is on you to show the grid will operate correctly and reliably, not on me to show that it will fail.

----

I don't want an independent audit of your test program. I want your system - as an integrated whole - to work properly and reliably.

What has been found is that an independent audit will (1) cause you to work more carefully and justify your assumptions more clearly to the auditor; (2) make you actually perform tests in a way they can be checked, which tends to find more trouble spots; (3) make you present your information more carefully so errors can be detected and prevented; (4) standardize your test procedures and accounting methods for the results, which tends to prevent errors and allows checks of your results against other similar facilities; and (5) give you credibility with your supervisors and clients......

-- Robert A. Cook, PE (Marietta, GA) (cook.r@csaatl.com), November 17, 1999.


Robert,

An attempt to answer your post. Sorry for the lack of sophistication in formatting. Here goes...

The purpose of true computer/software systems-level testing isn't to convince me, the public, the regulators, or anybody else that the testing was done.

>> Yes, this problem has always been an engineering problem. I am convinced; you are not. You will not accept my data, you question my thoroughness and ability, and some even question my honesty and integrity.

The purpose of thorough software/system-level testing is to FIND PROBLEMS so they can be eliminated BEFORE actual use. Then, you retest to find additional problems.

>> Yes, this has been done in almost all cases.

The scope and purpose of the grid testing was to provide two public relations "drills" on key dates to allow positive PR to be released from pre-staged arenas. There has been no other publicly released testing of the grid, its distribution centers, or other inter-utility functions.

>> You are absolutely WRONG on this count. The drills were always unrelated to testing. Nothing was pre-staged except to install and troubleshoot the satellite phones and to train drill personnel on them and on the communications protocols.

There has been almost NO publicly released testing of any fossil power plants, and what has been released of the nuclear testing indicates that, where systems are tested (after remediation), they break, must be re-repaired, and then (sometimes) break again.

>> As you stated above, that was (and in a very few cases IS) the entire purpose for testing in the first place. Does this surprise you? Aren't tests where the results are pre-ordained public relations drills? Of course things needed to be repaired, and the remediated systems re-tested. This has been completed. We both should be happy.

I agree, the individual items that you may have tested may have worked under the specific conditions that you tested them under: no dispute. What you have not tested may (or may not) work.

>> I have tested 100% of my mission-critical equipment (my working definition: equipment capable of causing an outage or injury). My utility has completed all but one F&H unit that is scheduled for an outage soon. This unit has a nearly identical sister unit that was already completed, and the lessons learned there will be applied, so all should go relatively smoothly.

>> I have tested 100% of my non-mission-critical equipment and have remediated all but one type of device (this is nearly complete, even though the recorders won't fail until the 2000-to-2001 rollover). This is similar for other utilities, as reported to NERC and from conversations with peers at EPRI conferences.

That they might work under actual conditions remains open to conjecture and assumptions about the quality of your testing simulations.

>> No, it is open to YOUR conjecture. My test conditions, and those industry-wide, simulated real power system faults. No auditor could do this. If my methods were inadequate, my peers in the industry (through EPRI database review) and my supervision would not have been shy about pointing this out to me.

So, the burden of proof is on you to show you have tested everything, and that the system (as a whole) will work. The burden of proof (if I have prepared for disturbances) is on you to show the grid will operate correctly and reliably, not on me to show that it will fail.

>> I have provided the proof via NERC and EPRI. This has been reported to all. There have been insider reports here. The data is discounted because of a perceived lack of independent verification. That was the entire motive for this post: there HAS been independent verification.

----

I don't want an independent audit of your test program. I want your system - as an integrated whole - to work properly and reliably.

>> We both should be happy then. All should be happy, at least with electrical systems.

What has been found is that an independent audit will (1) cause you to work more carefully and justify your assumptions more clearly to the auditor; (2) make you actually perform tests in a way they can be checked, which tends to find more trouble spots; (3) make you present your information more carefully so errors can be detected and prevented; (4) standardize your test procedures and accounting methods for the results, which tends to prevent errors and allows checks of your results against other similar facilities; and (5) give you credibility with your supervisors and clients......

>> Any utility insider who believes they will establish credibility in this forum is nuts. The culture of the electric utility engineer is very conservative; we have backup coffee mugs just in case of a second contingency. We have reviewed one another, EPRI has reviewed us, the NRC has reviewed us, and we have conducted internal audits. What more???

-- cl@sky.com (cl_sky@excite.com), November 17, 1999.



I am not an engineer, but I write and edit. I know that you can take a fine and flawless essay and replace a few significant words (sometimes even insignificant ones) with what seem to be perfect synonyms, and when you're done you no longer have a fine and flawless essay, but a clumsy work of ambiguous meaning. Every word interacts with every other in ways which can neither be described nor understood. Read the Gettysburg Address. Try to make a single change in it without making it worse. Follow my metaphor here? I don't care how much IV&V was done. When you put those remediated, compliant pieces back together again, they aren't going to work right. How about we do full end-to-end testing of everything, all at once. January 1 sound okay?

-- StanTheMan (heidrich@presys.com), November 17, 1999.

I know absolutely nothing about airports or how to maintain them; I'm a homemaker and a registered nurse. A friend of mine, however, is the director of maintenance at our local airport. (City size is 250,000) After he assured me about the compliance of the airport, I asked him questions about specific areas of the airport that he might not have considered. Within two minutes, I had named an area which had not occurred to him to check for compliance. Also, he did not know at that time that the FAA was non-compliant, as he had been assured by the local rep that the FAA was fine.

Lesson? Get an outside auditor. If you're in the job, you probably can't see the forest for the trees.

-- Ann M. (hismckids@aol.com), November 18, 1999.

