Are Testing Methods Good Enough?

greenspun.com : LUSENET : Electric Utilities and Y2K : One Thread

Rick,
The following snips are from the Central and Southwest (CSW) Y2K FAQ at http://www.csw.com/y2k/FAQ.htm. Are these types of testing good enough? The maddening part of this whole issue for me is, "How much and what kind of testing will prove that Y2K compliance has been reached?" I look at this CSW FAQ and see a lot of comments about "industry standards" and wonder if the testing is good enough that I can go back to focusing on my real business instead of preparing for Y2K failures.
[begin snip] General/Overall Test Strategy: CSW's test methodology includes a requirement, when technically and operationally feasible, to minimally test a system's ability to function properly when the following system dates are encountered: [end snip]
"when technically and operationally feasible, to minimally test" I know it's difficult to test 24x7 systems, but does this mean that physical testing is least likely to occur on the most critical components?
[begin snip] Generation: The fossil generation teams are testing the systems in place (not lab bench). As part of the tests, the system date is set near the respective critical rollover date(s) and allowed to rollover. Results are monitored and compared to the expected results. To the extent possible, this is being done to simulate actual operation of the system(s). Our generating units operate on a 7X24 basis. They are being tested during regularly scheduled outages. Some will be tested at low load as they return on line after their scheduled outage. We believe our testing in power generation plants is consistent with what the industry is doing. [end snip]
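The rollover procedure in the snip above (set the date near a critical rollover, let it roll, compare observed to expected) can be sketched in a few lines. This is only an illustration under stated assumptions: `legacy_year` and `windowed_year` are hypothetical stand-ins for a date routine inside a device, not anything CSW describes, and a real test would drive the actual equipment clock rather than a simulated one.

```python
from datetime import datetime, timedelta

def legacy_year(two_digit_year):
    """Hypothetical legacy routine that assumes every year is 19xx."""
    return 1900 + two_digit_year

def windowed_year(two_digit_year, pivot=50):
    """Hypothetical fix: two-digit years below the pivot are read as 20xx."""
    return (2000 if two_digit_year < pivot else 1900) + two_digit_year

def rollover_check(expand_year, start, steps=3, step=timedelta(seconds=1)):
    """Advance a simulated clock across a rollover and collect mismatches.

    Each mismatch is (timestamp, year the routine reported).
    """
    mismatches = []
    now = start
    for _ in range(steps):
        observed = expand_year(now.year % 100)
        if observed != now.year:  # expected result is the true year
            mismatches.append((now.isoformat(), observed))
        now += step
    return mismatches

# Start one second before the century rollover and let the clock run.
start = datetime(1999, 12, 31, 23, 59, 59)
print("legacy:  ", rollover_check(legacy_year, start))
print("windowed:", rollover_check(windowed_year, start))
```

The comparison step is the point: a test that only watches for a crash would miss a routine that keeps running with the wrong year.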
They are testing fossil plants during scheduled outages, presumably leaving the testing of coal and nuclear on the lab bench. How do you test a plant even if it is down? I understand manual date manipulation of the EMS/SCADA software, but how do you adequately test the data collection units/sensors that are located throughout the facility? Are they assuming that if the date needs to be changed in the sensor that the controlling software will set the date? They say this is industry standard, so what is the industry doing? Are they really doing comprehensive tests? Are they chasing down all the data collection units and resetting the dates (if possible)? If they are testing on the lab bench, are they testing units and then putting them into service?
An earlier post about SWEPCO (part of the CSW family) made an interesting statement.
>I was sent by my municipality to a y2k information
>meeting that Southwestern Electric Power Company
>was holding. They said that about 25% of their
>embedded system work was done since they could
>not take the plants offline until after the summer heat.
Does this mean that 75% of the embedded controls (G as well as T&D) are going to be tested next fall? If so, then how are they going to be "Y2K Ready" by June? Further, are all fossil generating facilities (60 units) going down for scheduled maintenance before June? Another source who buys wholesale power from SWEPCO said that they only planned TYPE TESTING BASED ON GENERATING UNIT DESIGN! So if fossil unit A is like fossil unit B, if fossil unit A is OK, then fossil unit B is OK. Is it industry standard to use identical maintenance routines/parts on similar plant designs?
The earlier post went on to say,
>They told us about a central computer in Dallas/Ft. Worth
>area that receives about 8000 data inputs every 2 seconds.
>It communicates with 4 SCADA systems as well. They said
>there is a backup computer that receives the same
>inputs and can be instantly used to take over the first
>ones job if it crashes. They said that they are
>setting up a 3rd one that will be receiving and
>processing the same inputs, but with the date
>advanced forward exactly one year. It rolled over to
>the year 2000 on Jan. 1, 1999.
Can you accurately test a SCADA system such as the one described without rolling all 8000 input sources over to year 2000? What is the purpose of 3 redundant systems? In the space shuttle, they used to have 7 redundant systems running different software and collecting data from some shared and some independent sensors. The 7th system mediates disputes among the 6 others. That makes sense. But why put in three systems that collect data from the same sensors? How do you decide which of the three systems is telling you the truth? Testing should be able to force Y2K dates into any system for simple date checking, making this setup nonsensical to me.
[begin snip] Electric Delivery: For transmission, distribution, substation and system protection we assess the risk of respective systems not being Y2K ready. If risk is moderate or high, then equipment by model/type is being field tested for Y2K readiness. We classify the risk as follows: High Risk- Not Y2K ready will cause loss of power-interruption to customers. Moderate Risk- Not Y2K ready will cause loss of system protection coordination or loss of revenue. Electric Delivery is also field testing all microprocessor based controls for Y2K readiness during routine scheduled maintenance. This is typical of our industry's testing methodology. We also utilize an Electric Power Research Institute web site that provides information from other utilities about the testing of their systems and equipment. [end snip]
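The risk rule in the snip above amounts to a small decision procedure, sketched here as a reading aid. The "low" category, the field names, and the model names are assumptions for illustration; the FAQ defines only the high and moderate classes.

```python
def classify_risk(causes_customer_outage, affects_protection_or_revenue):
    """Classify per the FAQ wording; 'low' is an assumed catch-all."""
    if causes_customer_outage:
        return "high"
    if affects_protection_or_revenue:
        return "moderate"
    return "low"

def models_to_field_test(inventory):
    """Return each model (not each installed unit) rated moderate or high."""
    selected = set()
    for item in inventory:
        risk = classify_risk(item["causes_customer_outage"],
                             item["affects_protection_or_revenue"])
        if risk in ("moderate", "high"):
            selected.add(item["model"])
    return sorted(selected)

# Hypothetical inventory; two installed units share the model "recloser-X".
inventory = [
    {"model": "recloser-X", "causes_customer_outage": True,
     "affects_protection_or_revenue": False},
    {"model": "relay-Y", "causes_customer_outage": False,
     "affects_protection_or_revenue": True},
    {"model": "meter-Z", "causes_customer_outage": False,
     "affects_protection_or_revenue": False},
    {"model": "recloser-X", "causes_customer_outage": True,
     "affects_protection_or_revenue": False},
]
print(models_to_field_test(inventory))
```

Because the result is keyed by model, the two installed "recloser-X" units collapse into one test, which is exactly the type-testing assumption being questioned in this post.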
Obviously they are type testing "microprocessor based" controls? Does this include PLCs as well? They are relying heavily on EPRI. Where does the EPRI data originate? Does it come from the vendor or from other utilities? Are testing methods posted at the EPRI site as well? How does a utility know that they can rely on the EPRI data?
[begin snip] System Control/Operation: The EMS/SCADA team has constructed a test lab that fully simulates the real-time EMS/SCADA environment. Live data are passed from the real-time system to the test system. Testing is being performed on that test system. We believe this method exceeds industry test standards. We are fully testing the EMS/SCADA system for Y2K readiness using live data, which more closely simulates the real-time system than static testing would be able to do. [end snip]
I assume that they are using real data and changing the date! Do industry standards include simulations of sub-system failures, i.e. what happens when unit X fails and the system is loaded in a certain way?
[begin snip] Part of our inventory and assessment process is to identify equipment that may contain embedded systems. This is accomplished primarily by contact with the equipment vendors to obtain their recommendations on testing/replacement. We depend on the vendor to be source of the Y2K ready replacement part (embedded system) if one is needed. If there is an external output available to change the dates on the system or component we use that for testing the minimum set of dates as follows:
Each Y2K Readiness Team has the responsibility to assess the Y2K status of their computer systems or components with the respective vendors. It has been a major portion of their inventory and assessment process. All critical systems have been inventoried and assessed to determine which have Y2K exposure, and remediation plans were developed in accordance with those findings. [end snip]
Is it industry standard to call up the vendor to assess Y2K compliance?
[begin snip] What are your findings so far with regard to generation? Inventory and assessment of all critical systems is complete and remediation and certification processes are well underway. The assessment findings have revealed that about half of the generation units have pneumatic and/or analog controls and therefore have no date functions, which significantly reduces the remediation and test effort. Risk is low in the Control Systems operation since all critical functions are either certified by the vendors or have fixes available. Less than 15% of the installed controls in the transmission and distribution systems have micro-processors, and very few have date logic (95% of those with date logic process new century dates correctly).
Provide specific information about your plans for addressing issues with transmission lines, substations and switchyards, and any other assets that may directly or indirectly impact the power supply. High risk equipment includes relays, reclosers, and circuit breakers. Although the date information is used only for date history recording of operations, we are testing various models of this equipment for verification. The date function does not affect the operation of the equipment. Models tested thus far have all processed the date functions correctly. Contingency plans for critical systems are in development. We have received Y2K ready letters from Schweitzer Engineering Labs (SEL) and General Electric GE for the microprocessor based relays. We have tested the GE digital line protector (DLP) relay with good results. Metering equipment records dates only for historical purposes and does not involve the operating function of the device. [end snip]
Do you think that embedded systems with time based calculations are on the radar screen? Surely they are, but I find not one mention of them here. Does this mean that most sensors are analog? I'm not an electronics expert, but I've tinkered a bit, designed and built a computer-based controller for a strictly manual machine. That being said, I'm not sure that any digital sensors can operate without a sense of time either based on an oscillator and a counter or on real time clock ticks. Most use a sense of time to average samples, e.g. A/D Converters. Are tests being done to check for delta time values going negative or zero?
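To make the delta-time concern concrete, here is a minimal sketch of how an interval can go negative across the rollover. It assumes a hypothetical device that stores a two-digit year and treats it as 19yy; nothing in the CSW FAQ says any particular device works this way.

```python
from datetime import datetime

def legacy_stamp(yy, month, day, hour, minute, second):
    """Hypothetical legacy timestamp: two-digit year always read as 19yy."""
    return datetime(1900 + yy, month, day, hour, minute, second)

def delta_seconds(earlier, later):
    """Elapsed seconds between two readings, as the device would compute it."""
    return (later - earlier).total_seconds()

def check_delta(dt):
    """Flag intervals that would break a rate or averaging calculation."""
    if dt < 0:
        return "negative"
    if dt == 0:
        return "zero"
    return "ok"

# Two samples taken two real seconds apart, straddling the rollover.
before = legacy_stamp(99, 12, 31, 23, 59, 59)
after = legacy_stamp(0, 1, 1, 0, 0, 1)
dt = delta_seconds(before, after)
print(check_delta(dt))

# Any rate computed as change / dt now has the wrong sign and magnitude,
# and a zero interval would raise a division error outright.
```

A device that times its samples from a free-running counter rather than calendar fields would not show this behavior, which is why the answer depends on how each device keeps time.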
What can account for the non-frantic activity by utilities to certify their systems? Do the assumptions as expressed here by CSW data mean that utilities presume that if outages cascade across the system that no significant damage has been done to the system so that restarts are possible? Do they know that the under-frequency sensors will in fact take the generators off line instead of leaving them on to implode under increased load? Ditto for interchanges and substations with respect to transformers? It looks like, if one malfunctioning device could damage a generator or a large substation, they would want to be sure that each device operates properly (not type testing). Furthermore, are system designs "Y2K friendly"? For example, will an under-frequency relay depend on a central command to disconnect a load or will it act on its own? I assume that it would act on its own, but if it does not, then another level of system testing complexity is added.
This is my big question. I have NO doubt that the electrical utilities are very dedicated and sincere about their Y2K projects. However, it appears to me that despite all their best efforts, they are not taking the problem seriously enough to tear down each piece of equipment and test it. Is this type of testing required? If not, then isn't our basic assumption on this forum flawed? If it is necessary to test each piece of equipment, then it is pretty obvious to me that we are going to be doing most of the testing in 2000. The utilities certainly don't seem to be doing the testing now. Mostly we are getting "It is OK because GE said so" or "It's OK because we tested a couple of them" or "This generating unit is OK because that one over there is OK." Should we be alarmed at this behavior or reassured?
I'm getting pretty tired of this.
David

-- Anonymous, February 02, 1999

Answers

[begin snip] General/Overall Test Strategy: CSW's test methodology includes a requirement, when technically and operationally feasible, to minimally test a system's ability to function properly when the following system dates are encountered: [end snip]
"when technically and operationally feasible, to minimally test" I know it's difficult to test 24x7 systems, but does this mean that physical testing is least likely to occur on the most critical components?
"Technically and operationally feasible" can mean several things. To me, devices with embedded chips that have no date function, and no means of accepting a date from or sending a date to an external device, are not feasible to test. The oft-cited paper on embedded chips addresses these. In general it is impossible to make generalizations about criticality. I personally have not found any of these that would be critical. At any rate, these types of devices would, in my opinion, most likely appear as random failures, not related to the Y2K critical dates. What say you, Rick?
They are testing fossil plants during scheduled outages, presumably leaving the testing of coal and nuclear on the lab bench. How do you test a plant even if it is down? I understand manual date manipulation of the EMS/SCADA software, but how do you adequately test the data collection units/sensors that are located throughout the facility? Are they assuming that if the date needs to be changed in the sensor that the controlling software will set the date? They say this is industry standard, so what is the industry doing? Are they really doing comprehensive tests? Are they chasing down all the data collection units and resetting the dates (if possible)? If they are testing on the lab bench, are they testing units and then putting them into service?
Speculation, but I don't think that many of the sensors in a fossil plant will be date-aware. Probably mostly A/D or straight analogs. It sounds to me like they are testing the brains and those intelligent devices that it sets the date for. Our generation's DCS tests were on the bench and ongoing beyond 2000, not put in service. The benefit of bench testing is reducing on-line risk.
Can you accurately test a SCADA system such as the one described without rolling all 8000 input sources over to year 2000? What is the purpose of 3 redundant systems? In the space shuttle, they used to have 7 redundant systems running different software and collecting data from some shared and some independent sensors. The 7th system mediates disputes among the 6 others. That makes sense. But why put in three systems that collect data from the same sensors? How do you decide which of the three systems is telling you the truth? Testing should be able to force Y2K dates into any system for simple date checking, making this setup nonsensical to me.
In a SCADA system the 8000 inputs are mostly contact in/outs and analogs (not Y2K type of digital). Only very few devices communicate electronically, and these are going to be in cutting edge substation automation applications. These are few in number compared to the older contact/coils controls.
Obviously they are type testing "microprocessor based" controls? Does this include PLCs as well?
From the language they used, I'd say YES.
They are relying heavily on EPRI. Where does the EPRI data originate? Does it come from the vendor or from other utilities?
Other utilities' test results. Archives of vendor tests and certifications are also available, but I will be using these only for non-critical systems and as additional support for my testing on critical ones (you can't have too much due diligence).
Are testing methods posted at the EPRI site as well? How does a utility know that they can rely on the EPRI data?
Many excellent test plans are available there. Some are collaborative efforts of EPRI member utilities. Most all are based on the GM test plan: good methodology, many dates beyond just Y2K, and they also verify that no bogus dates are accepted.
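As a rough illustration of what "many dates beyond just Y2K" plus a bogus-date check might look like, here is a minimal sketch. The date lists are my own assumption of commonly cited Y2K test dates, not the contents of the GM or EPRI plans, and `parse_date` is a hypothetical stand-in for the device's date-setting interface.

```python
from datetime import date

def parse_date(year, month, day):
    """Return a date, or None if the combination is not a real calendar date."""
    try:
        return date(year, month, day)
    except ValueError:
        return None

# Assumed examples of dates a device should accept.
VALID_DATES = [
    (1999, 9, 9),    # "9/9/99", sometimes used as an end-of-data marker
    (1999, 12, 31),
    (2000, 1, 1),
    (2000, 2, 29),   # 2000 is a leap year
    (2000, 12, 31),  # day 366
    (2001, 1, 1),
]

# Assumed examples of bogus dates a device should reject.
BOGUS_DATES = [
    (1900, 2, 29),   # 1900 was not a leap year
    (1999, 2, 29),
    (2000, 2, 30),
    (2001, 2, 29),
    (2000, 13, 1),
]

def run_date_checks():
    """Return the list of cases handled incorrectly (empty means all passed)."""
    failures = []
    for ymd in VALID_DATES:
        if parse_date(*ymd) is None:
            failures.append(("rejected valid date", ymd))
    for ymd in BOGUS_DATES:
        if parse_date(*ymd) is not None:
            failures.append(("accepted bogus date", ymd))
    return failures

print(run_date_checks())
```

Against real equipment, `parse_date` would be replaced by whatever sets and reads back the device clock, with the same two lists driving the comparison.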
Do you think that embedded systems with time based calculations are on the radar screen? Surely they are, but I find not one mention of them here. Does this mean that most sensors are analog? I'm not an electronics expert, but I've tinkered a bit, designed and built a computer-based controller for a strictly manual machine. That being said, I'm not sure that any digital sensors can operate without a sense of time either based on an oscillator and a counter or on real time clock ticks. Most use a sense of time to average samples, e.g. A/D Converters. Are tests being done to check for delta time values going negative or zero?
The relays mentioned are a transmission protection relay and those of a vendor with a wide range of protection types. Neither uses dates for protection. Both can have the date set by a local PC and are easy to test thoroughly. Separate functions use dates for SER and fault locating. Not even remotely related to protection.
PS: I'm getting tired of this too!

-- Anonymous, February 02, 1999
