Just What is "Mission Critical"

greenspun.com : LUSENET : Electric Utilities and Y2K : One Thread

As I am fairly new to this Forum, please forgive me if I inadvertently cover items that have already been hacked to death.

One of the topics that keeps coming up is the focus on systems that are "Mission Critical", yet something that an IT person or a PR manager sees as critical may be only incidental to an engineer or an operator. In our company we have taken all systems that may be considered critical (whether hardware, software, PLCs etc.) and broken them down into four levels of criticality.

Level 1: Generation will cease on rollover to 2000.
Level 2: Generation will fail at some time in the future, but will still continue on rollover.
Level 3: Generation will continue, but there may be a loss of information systems or data recording.
Level 4: Generation, information and data will not be affected. Administration and business applications may fail.

We discovered over 5000 critical items which had to be evaluated. Most were Y2K compliant, but there were 490 which may have had a date/time issue. If we could not determine whether a device or application was compliant, then we treated it as though it would fail. *Most* of these have now been checked and either remediated or replaced. Our new SCADA is the last of the Level 1 items to be addressed.

However, not everything has been easy. One piece of equipment which we knew was not Y2K compliant was considered by management (but not by operations staff) to be a Level 1 item. (It measured water flow through the station.) We spent over $20,000 on an upgrade only to find that the upgrade did not work. The support from the local distributor of the equipment was very poor, and now it has been decided that perhaps it wasn't a critical piece of equipment anyway, as there are other and cheaper ways to measure the same water flow.
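The four levels and the "unknown means assume failure" rule described above can be written down as a small triage sketch. This is only an illustration: the `Level`, `Item`, `needs_action` and `work_queue` names and the sample inventory are hypothetical and are not taken from any utility's actual tooling.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Optional


class Level(IntEnum):
    """The four criticality levels described in the post (1 = most severe)."""
    GENERATION_STOPS_AT_ROLLOVER = 1
    GENERATION_FAILS_LATER = 2
    INFORMATION_LOSS_ONLY = 3
    BUSINESS_SYSTEMS_ONLY = 4


@dataclass
class Item:
    name: str
    level: Level
    compliant: Optional[bool]  # None means compliance could not be determined


def needs_action(item: Item) -> bool:
    # Anything not positively known to be compliant is treated as a failure.
    return item.compliant is not True


def work_queue(items: List[Item]) -> List[Item]:
    # Items needing remediation or replacement, most severe level first.
    pending = [item for item in items if needs_action(item)]
    return sorted(pending, key=lambda item: (item.level, item.name))


# Hypothetical inventory, for illustration only.
inventory = [
    Item("payroll application", Level.BUSINESS_SYSTEMS_ONLY, False),
    Item("data logger", Level.INFORMATION_LOSS_ONLY, True),
    Item("legacy VB program", Level.GENERATION_FAILS_LATER, None),
    Item("SCADA", Level.GENERATION_STOPS_AT_ROLLOVER, None),
]

for item in work_queue(inventory):
    print(item.level.value, item.name)
```

Note that the sketch only orders the work. Deciding which level an item belongs to is still a judgment call, which is where the disagreements described in this thread come from.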

Another one is a piece of software that hasn't been used for almost a year. Our company policy says that it must be available for use (even though no-one actually uses it), so management have listed it as a Level 2 item. It actually has nothing to do with generation, yet I had to spend quite a lot of time checking the code (Visual Basic) to see that there were no date issues, then testing it to make sure it would roll over OK. It would have made much more sense just to delete the program from the network.

These are the types of issues that have slowed down our remediation, but on the brighter side, another week and we will be ready.

How have other facilities decided what is critical and what isn't? And how is progress going?

Malcolm

-- Anonymous, June 17, 1999

Answers

Malcolm,

Personally, I'm glad to see you posting here with information. I like your straightforward way of discussing what you are doing, with some good detail and examples. Could you tell me what the general feeling is within your own area regarding problems that could crop up? By this I mean, how confident are you that the Y2K exposure will be no worse than some natural problem from lightning, wind storms, etc.? Are you looking at this as having a higher level of danger to the smooth supply of power? Are there plans within your company to have extra staff on site for the rollover, and if so why? Thanks.

-- Anonymous, June 18, 1999


"We are going to discover -- 'we', the government and the corporate world; 'we', employees and customers and suppliers; 'we', the general population as a whole and as individuals and families -- we are going to discover in the next twelve months just how realistic we are being now in our handling of non-critical systems. Or how unrealistic. Not just in how we are progressing in fixing them, but also in how we are determining what is mission-critical and what is not."

FAA's Progress and Mission-Critical Systems

Nefariously snatching brief quotes from a webpage, to horrify the uninitiated, yet providing a link for the truth-starved masses, I remain, cordially,

-- Anonymous, June 18, 1999


Malcolm, I am also grateful for the postings you have made here. The information you've supplied has been helpful and thought provoking. I can only speak about what I have read from various industry statements, such as filings to the SEC, or other reports. In those, most utilities defined a method by which they were classifying their systems, such as "mission critical", "important", and "non-essential" or variations thereof. However, to my knowledge, you are the first to mention any sub-classifications of critical systems. I would think it's quite probable that U.S. utilities did do a further breakdown of their general system classifications -- I just haven't seen any description of such. An example of one nuclear plant's overall definitions is as follows:

"Mission Critical:

Components, systems, or groups of similar devices that, in the event of failure, degradation, or other adverse impact due to Y2K, would significantly challenge the ability of the nuclear business unit to:

Operate in a safe manner (including nuclear and industrial safety considerations),
Provide service to customers (electrical distribution),
Generate revenue (power generation) or control costs, or
Avoid adverse regulatory or legal exposure

Important:

Components, systems, or groups of similar devices that, in the event of failure, degradation, or other adverse impact, would:

Present unanticipated challenges to plant operators,
Cause the NBU to incur remedy or recovery costs >$100,000 but <$1,000,000,
Adversely impact compliance with regulatory requirements or commitments

Non-Essential:

Components or systems that provide a useful function but are not classified as either mission-critical or important."

Some utility definitions I've seen are more detailed, and others have been stated in a generic way. For instance, PGE's definition of "mission critical" in their SEC reports is, "Mission-critical functions are those critical functions whose loss would cause an immediate stoppage of or significant impairment to major business areas." The Southern Company stated only that "Mission-critical refers to devices or software that are required to maintain operations." Of course, the breakdowns which are used in actual plant projects could well be much more detailed, but only someone working on the project would be able to give that info because internal details are not showing up in public statements.

Various types of utilities have said they were proceeding in the same manner as your utility when it comes to being unable to determine the status of a system -- any unknowns are presumed to need replacement on the principle of "better to be safe than sorry". I have also seen some definitions of how systems were determined to be critical which include regulatory safety items: those which are required by law but which do not necessarily impact on actual generation. (Perhaps your company's "must be available for use" policy re that software would fit into that category.) And I have seen reports about the remediation process for utilities which did include "retiring" a system or deleting it from the program, as you mentioned, rather than trying to replace or repair it.

Overall, I would imagine that every individual utility has certain internal variations in how they chose to define their priorities after they inventoried systems. The date they began their project, the availability of funds and human resources, and the size and type of utility would all present different options to be considered. I haven't seen any reports about what group of people (or individual) was responsible for the final determinations. As you said, "something that an IT person or a PR manager sees as critical may be only incidental to an engineer or an operator". Certainly, an office manager in charge of the payroll system, purchasing, or accounts receivable would be looking at critical systems in a different way than an engineer involved with generation, or the legal rep responsible to the federal, state, or community regulatory oversight agencies. All of the different sections are going to have their own priorities, but in the long run they all have to be functioning to ensure the continued viability of the business. I'll bet there have been more than a few heated "discussions" in management levels, trying to decide how to balance the various business aspects when it comes to prioritizing!

-- Anonymous, June 18, 1999


Gordon and Bonnie,

Thank you both for your kind comments. I will continue to post information on matters that I have first hand knowledge of, and I shall ask questions about items that are of interest, but that I may not be so familiar with.

Here in New Zealand, we are not expecting any disruption to supply on rollover, but we are still taking precautions just in case something unpredictable does occur.

The company I work for operates two hydro power stations, 2 geothermal stations (I don't believe that USA has any of this type), 1 gas fired thermal station, 3 gas turbine stations, and is in the process of building a new combined cycle station. We provide around 28% of the country's needs and are the 2nd largest power company over here.

I have had personal experience in a number of hydro stations, but only direct involvement with one thermal station. I have also had extensive experience in our system control center on both generation dispatch and T&D. So I feel very confident that I do know New Zealand's power system.

Most of our hydro power stations are quite old, and do not rely on any form of electronic control. However, over the last ten years there has been a lot of modification and enhancement whereby the old stations have had electronic and/or computer add-ons to allow for remote SCADA control. In each of these cases a failure in the electronics will not affect the power station's operation, but will just mean that the station will continue to run at whatever its previous load setpoint happened to be. A single operator on duty at the station can manually change the output at any time, or he can manually start or stop the generators. Our newer hydro stations do have some PLCs to control start/stop sequences etc., but even with these a failure will not trip the plant, but will simply lock in the previous setpoint. We have proved that manual operation is also easy (and even quicker than SCADA).

The thermal stations are a bit different. In most cases a failure of any of the systems will result in the generators running quite happily at their previous load, but there are some items (such as burner management) that could cause the plant to trip. Naturally these items were the first to be investigated and, if necessary, replaced.

Our transmission and distribution system is very robust, and most line protection relays are of the older electromechanical type rather than electronic. SCADA is used for remote operation of most circuit breakers, but in every case the protection systems are completely independent of SCADA.

Our biggest fear is that the communications systems will not be as robust as the power system, and hence it may be difficult for co-ordinated dispatch instructions to be sent to the various power stations. If this happens then the power system may be less stable than is normal, and there could be larger than normal frequency and voltage swings. All operators are being reminded of (and are practicing) the methods of taking local control and independent action if required.

We will have every power station manned during the rollover, and even the local manual control rooms will be manned as well as our central control centers. Initially we were not going to allow any leave to be taken during the rollover, but we have relented slightly and have set a manning roster which will allow for full manual control of all sites and still allow some staff to have time off. My own hours of duty will be 7:00 am to 7:00 pm on Dec 31st and again on Jan 1st. (So I can't even party up during the night).

We have tested the black start capability of our stations, and as an extra precaution we will have two generators at each station running on "Speed-no load". That is to say that the generator will be running, but not connected to the grid. This way if there is any failure in the system then we can close these generators in and be supplying our local areas within minutes.

From what I have been reading here I am glad that our country is mainly hydro based which is a much simpler technology than required for your nukes.

Malcolm

-- Anonymous, June 18, 1999


Thanks to everybody on the thread.

I had inquired lately (rather clumsily, I guess) on at least one other thread as to why the same general rules (as exemplified in the CA White Paper) that would normally apply to all business operations don't apply to electrical utilities. I was told, basically, that I didn't know enough even to be asking such questions. I see now that others in the industry recognize that, in some ways, businesses are businesses are businesses.

-- Anonymous, June 19, 1999



NERC definition (from the monthly compliance survey found at NERC website):

"Mission-critical means that misoperation of the referenced device or software could directly contribute toward the loss of a 50 MW or larger generating resource, the loss of a transmission facility, or interruption of system load."

In other words, a fault recorder used to analyze operations cannot trip circuit breakers or cause load to be dropped, so it is not mission critical.
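The NERC wording quoted above reduces to a three-way test, which can be sketched as follows. The `Misoperation` record, its field names and the sample values are hypothetical, chosen only to illustrate the rule; only the 50 MW figure and the three conditions come from the quoted definition.

```python
from dataclasses import dataclass

# Threshold taken from the quoted NERC survey definition.
GENERATION_THRESHOLD_MW = 50.0


@dataclass
class Misoperation:
    """Worst direct consequence if a device or program misoperates."""
    generating_resource_lost_mw: float = 0.0  # size of any generating resource lost
    loses_transmission_facility: bool = False
    interrupts_system_load: bool = False


def is_mission_critical(effect: Misoperation) -> bool:
    # Mission critical if misoperation could directly contribute to any of
    # the three outcomes named in the definition.
    return (
        effect.generating_resource_lost_mw >= GENERATION_THRESHOLD_MW
        or effect.loses_transmission_facility
        or effect.interrupts_system_load
    )


# A fault recorder only records; it cannot trip breakers or drop load.
fault_recorder = Misoperation()

# Hypothetical example: a control system whose failure trips a 120 MW unit.
unit_control = Misoperation(generating_resource_lost_mw=120.0)

print(is_mission_critical(fault_recorder))
print(is_mission_critical(unit_control))
```

The first call reports the fault recorder as not mission critical and the second reports the unit control as mission critical, matching the reading given above.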

-- Anonymous, June 24, 1999

