BC Hydro has 250,000 date sensitive devices, yet only 333 didn't work. Should we be worried by these logistics...

greenspun.com : LUSENET : TimeBomb 2000 (Y2000) : One Thread

That's an awful lot of devices, 250,000. Multiply that by the 8000 or so utilities in the US alone, and hey, we're talking serious numbers. I hope they didn't miss any of them...

-- samIam (dr@seu.ss), November 24, 1999

Answers

333 "didn't work." But the real-time real-life test has not occurred yet. DON'T FORGET Year 2000 WILL BE ACTUALLY ARRIVING SHORTLY.

-- Fasten Your Seatbelts (allaha@earthlink.net), November 24, 1999.

Cost and test prohibitive to test 250,000 embeddeds. Did they rely on manufacturers' statements? Probably.

Please refer to the NIST paper on a preceding thread. It stated that these statements should be relied upon only as a last resort. Also, the system needs to be tested end-to-end because, of course, embedded devices share data with one another.

-- PJC (paulchri@msn.com), November 24, 1999.


spell check: Cost and time prohibitive .....to test 250,000 embeddeds.

-- PJC (paulchri@msn.com), November 24, 1999.

That is ~31 per utility. I think they can be fixed in time.

-- Laura (Ladylogic@aol.com), November 24, 1999.

"I think they can be fixed in time." Sure, if you KNOW WHICH ONES NEED FIXING!!!!!!! Ever hear the expression, "needle in a haystack"?????

-- King of Spain (madrid@aol.cum), November 24, 1999.


It is so refreshing to witness the results of public school education. But math is not exact. It is what you feel it should be. There are no right answers or wrong answers, just answers.

Laura, you should submit your services to find the "~31" you claim. This could be a real feather in your dunce cap.

-- enough is (enough@enough.com), November 24, 1999.


If that is all that actually failed - under realistic testing conditions, then that is all that need be replaced.

The 3% failure rate rule-of-thumb is not a law, merely an average across all industries.......

If, however, these 333 were replaced based only on vendor data (reference databases are still changing regularly, even as late as last week, at a rate of 12-16% of the total listed), then they might be facing difficult times.

Remember - all devices and processes face the date change; there is no way to avoid it. What is hard to test (with 100% confidence) is what the result of the date change is going to be on all the devices affected.

Some will fail unexpectedly, some will work okay - only to face failure due to "outside impacts" from other processes; and some will fail internally. If testing is incomplete at any stage of the affair - then the user is more likely to face problems.

Above all, remember that it is the entire process - from beginning to end - that must work efficiently for any company to stay in business over the period: failure in the middle isn't going to improve profits, and will increase the likelihood of recession/depression due to reduced productivity/poor productivity/no productivity.

-- Robert A. Cook, PE (Marietta, GA) (cook.r@csaatl.com), November 24, 1999.


As for the logistics I am pretty sure UPS is compliant or close. Could take a while for the driver to get a bicycle up to you with those critical chips.

Got Gas?

-- squid (Itsdark@down.here), November 24, 1999.


Laura,

There are about 3,200 bulk power producers in the United States. Of these, hydro power plants are the simplest in terms of design and operation. But since you're a Polly, let's say each of the 3,200 bulk power producers has only 31 "mission critical" embedded systems per plant that must be replaced.

Do the math:

3,200 x 31 = 99,200 (embedded chips in mission critical systems that will fail if not replaced)

Now, apply the "human factor" = the part of the equation that says "humans are not perfect". Say 1% of these chips are 1) never physically inventoried (didn't know it was there), 2) listed as "compliant" when in fact, it isn't (someone screwed up), 3) botched in remediation/replacement.

Do the math:

99,200 x .01 = 992

Now we have 992 "mission critical" embedded systems that fail on or about 2000/01/01. That's about 1/3 of the bulk power production in the United States. When these fail, the grid will become unstable and trip the remaining 2/3 of our bulk power producers off-line.

It's a fact that the United States has never had to "black start" all four grids. I wonder how simple that will be?

Now, say the telecommunications industry is statistically the same as the bulk power producers. (This is unrealistic, I know. They are worse off, from all reports I have read.) When the telcos lose power, they will have to go to back-up. This will take time. Now, add these two factors into the problems the bulk power producers will be dealing with when the grid goes down.

And finally, let's say 1% of all 66,000 toxic chemical production/holding facilities in the United States have a "mission critical" embedded systems failure resulting in the release of clouds of toxic gasses requiring the evacuation of anyone within 5 miles of the toxic gas release.

Do the math:

66,000 x .01 = 660

Let's say 20 bulk power production plants have to be evacuated because they are located within 5 miles of the toxic gas clouds. Also, let's say 40 telcos have to be evacuated at the same time. More problems for the bulk power producers.

Laura, this is a "Best Case" scenario.

Get it?

-- GoldReal (GoldReal@aol.com), November 24, 1999.
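
For readers who want to check the arithmetic in the post above, here is a minimal Python sketch. Every input is one of GoldReal's stated assumptions, not measured data. The function name and the worst-case reading (each missed device sits at a different plant, and any single miss takes that plant off-line) are hypothetical, added only to show what the "1/3" claim would require.

```python
# Inputs are the poster's stated assumptions, not measured data.
PLANTS = 3_200            # bulk power producers (poster's figure)
DEVICES_PER_PLANT = 31    # "mission critical" devices per plant (poster's figure)
MISS_RATE = 0.01          # 1% missed or botched (poster's "human factor")
CHEMICAL_SITES = 66_000   # toxic chemical facilities (poster's figure)


def expected_misses(units: int, miss_rate: float) -> int:
    """Expected number of missed units, rounded to a whole unit."""
    return round(units * miss_rate)


total_devices = PLANTS * DEVICES_PER_PLANT
missed_devices = expected_misses(total_devices, MISS_RATE)
chemical_failures = expected_misses(CHEMICAL_SITES, MISS_RATE)

# Hypothetical worst case (an assumption for illustration): every missed
# device is at a different plant AND any single miss trips that plant.
worst_case_plant_share = missed_devices / PLANTS

print(total_devices, missed_devices, chemical_failures)
print(f"worst-case share of plants affected: {worst_case_plant_share:.0%}")
```

The multiplication itself checks out (99,200, then 992, then 660). The step from 992 devices to "about 1/3 of the bulk power production" only follows under the worst-case assumption flagged in the comment, which the post does not establish.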


KOS:

They've already *found* those needles. Where do you think these numbers came from? Guessing?

I believe these are counts of *total* systems not counts of *different* systems. So for all we know, those 333 systems that "didn't work" were 333 of the same device. My guess is that it represents a fairly small number of different devices. If a fix is developed that works for one, in most (not all) cases the same fix can be applied to every copy. The task is not at all insurmountable.

I too hope they didn't miss anything important.

-- Flint (flintc@mindspring.com), November 24, 1999.



Let's say they found 333 different devices that didn't work. Nothing in the statement said they were critical devices. They could have 100 DFRs (digital fault recorders) and 233 SERs (sequence-of-events recorders). Nothing there that is critical. We found DFRs and SERs that didn't work in terms of dates. Worked fine in terms of information.

-- The Engineer (The Engineer@tech.com), November 24, 1999.

Yes, Flint, there is A LOT of guessing that goes into these sorts of efforts. Like the ESTIMATE that there are 50-70 billion chips floating around. Like the ESTIMATE that AN AVERAGE of 1-3% will fail. That's life in the "nobody knows" lane, dude.

OK, so here it is, the day before Thanksgiving 1999, and the power industry is STILL at the level of fixing/replacing these buggers. But I'm sure that doesn't bother you, does it?

It bothers a lot of people. Got candles??

-- King of Spain (madrid@aol.cum), November 24, 1999.

Flint wrote:

KOS:

They've already *found* those needles. Where do you think these numbers came from? Guessing?

I believe these are counts of *total* systems not counts of *different* systems. So for all we know, those 333 systems that "didn't work" were 333 of the same device. My guess is that it represents a fairly small number of different devices. If a fix is developed that works for one, in most (not all) cases the same fix can be applied to every copy. The task is not at all insurmountable.

I too hope they didn't miss anything important.

-- Flint (flintc@mindspring.com), November 24, 1999.

I don't know about the believing, but I *think* most of the forum is guessing and hoping with you.

-- (cujo@baddog.byte), November 24, 1999.


Folks, you are getting your chain jerked around by Sam (whoever he is).

In a previous thread from yesterday posted by "Sam", FactFinder posted a response from BC Hydro to clarify a few points. (This is on Sam's thread, so he should have this information.)

A snip

 BC HYDRO Ad - don't worry, we'll have staff on hand to operate MANUALLY if necessary....

3. Does the 220 (223) represent total ("high impact"?) devices remediated,
       or does the 223 represent device types? If types, how many devices?

       BC Hydro: The 223 figure represents physical units remediated, not types.

4. Does the 8,000 figure represent both mission critical and non-mission
       critical devices? (also is this total devices, or device types)

       BC Hydro: The figure represents the approximately 8000 devices, out of our total
       inventory of 250,000 devices, that have a date-sensitive component. It is
       total devices, not device types, and is a sum of both mission critical and
       non-mission critical devices.

-- Brian (imager@home.com), November 24, 1999.


GoldReal, you wrote "Now we have 992 "mission critical" embedded systems that fail on or about 2000/01/01. That's about 1/3 of the bulk power production in the United States. When these fail, the grid will become unstable and trip the remaining 2/3 of our bulk power producers off-line.

It's a fact that the United States has never had to "black start" all four grids. I wonder how simple that will be?"

But you have failed to take into consideration the fact that not one single embedded device found by BC Hydro would have resulted in any plant tripping off line. So can you please explain why 1/3 of the power production will fail.

Next, you have failed to consider how much spinning reserve will be carried during the rollover. We will be operating on 50% that night, but I believe USA will be on 25 - 30%. This means that the grid could lose up to 30% of all generation and still maintain regular supply. If more than this was to trip then there would be some loss of supply due to load shedding. So even if your postulation that 1/3 of the grid would trip is correct, that does not mean that the other 2/3 WILL trip as well.

On your question of Black Start, it is not a difficult procedure to black start any grid, but it can be time consuming to ensure that each part is brought on in the correct order, and with the correct load balance. I have been involved in an effective, unplanned, black start, and I have assisted with the planning for two potential black starts. (one of them may occur in 37 days, but we are confident it won't happen).

Malcolm

-- Malcolm Taylor (taylorm@es.co.nz), November 24, 1999.
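
The spinning-reserve argument above can be sketched as a simple static comparison. This is a deliberately crude model, assuming reserve and lost generation are both expressed as fractions of total generation; it ignores frequency dynamics, protection settings, and where on the grid the loss occurs. The function names are hypothetical, and the 30% and 50% figures are the ones quoted in the post.

```python
def can_hold_supply(lost_fraction: float, reserve_fraction: float) -> bool:
    """True if spinning reserve covers the lost generation (static view only)."""
    return lost_fraction <= reserve_fraction


def shortfall(lost_fraction: float, reserve_fraction: float) -> float:
    """Fraction of generation that would have to be met by load shedding."""
    return max(0.0, lost_fraction - reserve_fraction)


US_RESERVE = 0.30   # upper end of the 25 - 30% the poster believes the USA will carry
NZ_RESERVE = 0.50   # the poster's own system on rollover night

one_third = 1 / 3
print(can_hold_supply(one_third, US_RESERVE), shortfall(one_third, US_RESERVE))
print(can_hold_supply(one_third, NZ_RESERVE), shortfall(one_third, NZ_RESERVE))
```

Under these assumptions a one-third loss against a 30% reserve leaves a gap of a few percent to be shed as load, which is the post's point: partial load shedding, not an automatic trip of the remaining two thirds.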



Malcolm,

"This means that the grid could lose up to 30% of all generation and still maintain regular supply. If more than this was to trip then there would be some loss of supply due to load shedding. So even if your postulation that 1/3 of the grid would trip is correct, that does not mean that the other 2/3 WILL trip as well. "

That is exactly what the representative from Arizona Public Service (APS) said at the AZ. Millennium meeting last week. He asked us to leave our electricity on so that we wouldn't trip the grid. He then echoed everyone else stating that Y2K will be a "people" problem. Funny, it seems to me that people without electricity, and safe water WOULD make it a people problem. Which came first? The chicken or the egg...

-- (Ladylogic@aol.com), November 24, 1999.


LOL! LOL! LOL!

This thread is really a keeper. And I mean, as in print it out in hardcopy form. Paper you can count on.

Here it is, the day before Thanksgiving 1999, and the pollies are talking about doing a black start of power grids as nonchalantly as you would talk about carving a turkey. I mean, thirty-seven frigging days. Does that mean anything to anyone??

In a word: gawd.

-- King of Spain (madrid@aol.cum), November 24, 1999.

KOS:

These estimates are meaningful, though vague, if properly interpreted. First, you give an estimate of number of chips. Then, you give an estimate of percentage of *systems* that will fail. Each system contains many chips. Finally, you apply system failure percentages to chip estimates and come up with something awful.

Testing has shown that power will stay up. Much as you hate to hear it, you're going to have to learn to live with it. Reverting to the misapplication of 2-year-old pre-remediation estimates ought to suggest the weakness of your case.

-- Flint (flintc@mindspring.com), November 24, 1999.


Malcolm,

You wrote:

"But you have failed to take into consideration the fact that not one single embedded device found by BC Hydro would have resulted in any plant tripping off line. So can you please explain why 1/3 of the power production will fail."

My response:

1) Then the BC Hydro plant doesn't have any "mission critical" embedded systems. This is unrealistic.

2) Malcolm, please define "mission critical". No, I'll quote it for you.

Here is the definition of "Mission Critical" as defined by the North America Electric Reliability Council in their August 3, 1999 report:

"Mission Critical - Mission critical describes a system, component, or application whose misoperation could directly contribute toward the loss of a 50 MW or larger generating resource, the loss of a transmission facility, or interruption of system load."

You wrote:

"Next, you have failed to consider how much spinning reserve will be carried during the rollover. We will be operating on 50% that night, but I believe USA will be on 25 - 30%. This means that the grid could lose up to 30% of all generation and still maintain regular supply. If more than this was to trip then there would be some loss of supply due to load shedding. So even if your postulation that 1/3 of the grid would trip is correct, that does not mean that the other 2/3 WILL trip as well."

My response:

The grid is super-sensitive to fluctuations, according to the Senate 100 Day Report. Should 1/3 of the grid "suddenly" go down, it WOULD cause massive fluctuations in the grid causing the remaining 2/3 to trip off line.

As for the "reserve" power, it is just as susceptible to "critical mission" embedded systems failures as the primary power sources. Logically, if the "back up power supply" is MORE reliable than the primary supply source, you would be using the back up supply as your primary source and your primary source as your back-up. Further, the bulk of testing and remediation efforts have been spent fixing the primary power sources. Logically, the back-up power supply has not received the same level of scrutiny as the primary sources.

You wrote:

"On your question of Black Start, it is not a difficult procedure to black start any grid, but it can be time consuming to ensure that each part is brought on in the correct order, and with the correct load balance. I have been involved in an effective, unplanned, black start, and I have assisted with the planning for two potential black starts. (one of them may occur in 37 days, but we are confident it won't happen)."

My response:

Where did you get your initial power source to provide the power to black start your plant? From another bulk power supplier?

Also, how easy was it to communicate with them? What if the phones were dead? How easy would it be to co-ordinate the sequence for bringing the grid back up?

What if your back-up power source was unable to provide the initial power in order to black start your plant because everyone had to be evacuated due to a toxic gas cloud caused by a "mission critical" embedded system failure at a nearby chemical plant? Who would you call on (assuming the phones are working) to provide the initial power "boost" to start up? At what point do you scrap all "contingency plans" (because "we didn't think about THAT happening"), re-evaluate the situation and start over from scratch? How long will it take to reach the conclusion that the contingency plans are useless?

You see Malcolm, there's MORE to it than you are willing to admit. This is called being "short-sighted". You can't grasp the BIG PICTURE. It's also the reason this problem exists in the first place.

Your type seem to NEVER learn. We don't exist in a vacuum. My problems are your problems and your problems are my problems. But somehow, the power industry thinks it is somehow immune from this fundamental law of the universe.

-- GoldReal (GoldReal@aol.com), November 24, 1999.


GoldReal:

I enjoy your tendency to lecture people who know everything there is to know about things you have no more than a feeble grasp of. And the Brass Balls to tell *them* they don't know. Well, the beauty of stupidity is, you don't have what it takes to realize it!

Meanwhile, the actual numbers were that there were about 250,000 total embedded systems (all systems, not different systems). Of these, about 8,000 used the date. Of those that used the date, 223 had a date issue of some kind. That's what we know. We do NOT know what these date issues were, or whether ANY of these 223 were critical, or if so, whether the date issue affected the functionality of those systems.

From this, you concluded that there were NO mission critical systems AT ALL. How about, just maybe, some of the 242,000 systems that don't use any dates being critical? You seem to be concluding that any system that doesn't use the date can't be critical. The reverse happens to be the case, according to those who actually know what they're talking about.

If you really wish to build a false case, you need to be much more subtle than you have been.

-- Flint (flintc@mindspring.com), November 24, 1999.
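
The proportions behind the figures in the post above are easy to lose track of, so here is a minimal sketch that works them out. The three counts are the ones quoted in the post (they are approximate in the source); the helper name `pct` is hypothetical.

```python
INVENTORY = 250_000       # total embedded systems (approximate, per the post)
DATE_SENSITIVE = 8_000    # systems that use the date (approximate, per the post)
DATE_ISSUE = 223          # date-sensitive systems with a date issue of some kind


def pct(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return part / whole * 100


no_date_use = INVENTORY - DATE_SENSITIVE

print(f"use the date:            {pct(DATE_SENSITIVE, INVENTORY):.2f}% of inventory")
print(f"date issue:              {pct(DATE_ISSUE, DATE_SENSITIVE):.2f}% of date-sensitive")
print(f"date issue:              {pct(DATE_ISSUE, INVENTORY):.4f}% of inventory")
print(f"do not use the date:     {no_date_use} systems")
```

The counts say nothing about criticality on their own, which is the point being argued: a device can be critical without using the date, and a date issue need not affect function.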


To quote Diane: "sigh."

No, GoldReal, it isn't Malcolm that doesn't understand, it's you.

First: look at the thread Brian posted above this, called "People Get Your Act Together". It will put all of the discussion above in perspective.

Second: Why is it unrealistic that they don't have any mission critical embedded system? Please state what you know, not what you suppose. Remember, its failure would have to result in the LOSS of generation or transmission.

Third: The grid (or grids, to be more correct) is not super sensitive to any fluctuations. You should be aware that they also aren't the same. The Eastern Grid is what we would call stiff and consists of (mostly) short lines, with generation closer to loads. The Western Grid is not nearly as stiff. It has long transmission lines, and the loads are more remote from the supply. While the grid is aware of what happens on it (we got reports of a temporary frequency change this AM from 60 Hz to 59.939 Hz), it responds automatically to those changes.

Fourth: It depends what you mean by a loss of 1/3 of the grid. A loss of 1/3 the generation, 1/3 the load? Way before that happened the grids would island. Some islands would be black, some wouldn't.

Fifth: You don't understand what Malcolm means by spinning reserve. Your logic about back up power supply being more reliable is faulty. Utilities tend to run base load machines. These are usually the cheapest to run. Then as more power is called for, more costly power is put on line. Due to agreements a number of machines have to be in a configuration where they can be added to the supply by closing a breaker. They are already up and spinning but not on line. During the roll over a lot of these machines that would normally not be even running will be placed on either standby or in spinning reserve. It's not a back up power supply. And yes, they would have to get the same scrutiny as the other machines.

Sixth: Your understanding of a black start is also wrong. After the famous black out of NYC in 1965, when plants did run into black start problems, small gas turbines were added to many plants. Partly for peak loading, partly to give plants black start capability. With Hydro it's even easier. You just open up the gates.

Seventh: Your "what if" is just that. What if a giant meteor hits the plant and knocks it out during Y2K? What makes you think a lot of power plants are located near chemical plants? Why should contingency plans be useless?

And no, your problems are your problems. If there is "more than you want to admit", please tell what it is. Facts only, not what ifs. But I will agree with KOS. Save this thread. Read it after the roll over. Then you can say: "Son of a gun, I guess they did know what they were doing."

-- The Engineer (The Engineer@tech.com), November 24, 1999.
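
To put the frequency excursion mentioned in the post above in proportion, here is a small sketch of the arithmetic. The two frequencies are the ones reported in the post; the variable names are hypothetical, and no claim is made here about what deviation any particular grid tolerates.

```python
NOMINAL_HZ = 60.0      # nominal North American grid frequency
observed_hz = 59.939   # temporary reading reported in the post

deviation_hz = NOMINAL_HZ - observed_hz
deviation_pct = deviation_hz / NOMINAL_HZ * 100

print(f"deviation: {deviation_hz:.3f} Hz ({deviation_pct:.2f}% of nominal)")
```

The excursion is about a tenth of one percent of nominal, and according to the post the grid corrected it automatically.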


Is anyone here familiar with the "6 Sigma" method of analysis? (And have the time to work it with me?)

Sorry, King. If the numbers are minuscule, I'm going to quit having nightmares. I'll keep preparing, but at least I'll get some sleep.

-- (Ladylogic@aol.com), November 24, 1999.


To both Flint and The Engineer:

I hear you and understand your responses. Yes, it's okay for you two to be scared. I understand your insecurity with your positions.

Just know that history proves time and again, tunnel vision people such as yourselves who obtain positions of power and influence (due to the fundamentally flawed concept of "Might Makes Right", or in this modern version, "Wealth Makes Right") end up self-destructing and taking everyone under their influence down with them.

You are "right". However, I am "correct". There is a difference.

-- GoldReal (GoldReal@aol.com), November 24, 1999.


Malcolm is correct. Most Y2k bugs cause minor problems, not actual functional failures. BC Hydro's experience is typical of the US electric industry findings.

Read the BC Hydro response to my email query closely (Brian posted a link to the thread above). The 223 devices in mission critical applications were remediated, but the failure modes were benign.

From BC Hydro's response:

1. Regarding the quote from the poster at the www.euy2k.com forum,

"At the end of the day 223 high impact devices were remediated. While Hydro does not specifically say what kind of failures would occur, the way they defined the problem tells us that these devices would have caused serious problems if they had not been repaired."

"This is a third party's interpretation of information posted on our web site and does not represent BC Hydro's viewpoint. What we have stated is that in our survey of all our 250,000 devices, we found about 8000 that were date-sensitive. Of these, about 3300 were "mission-critical", or potentially high-impact devices within our system. After these were tested, we found 223 that needed remediation. None of these failures would have caused a generator trip or transmission line trip. Even though these 223 devices are 'potentially' high impact devices, their failure modes were benign."

2. Can you give a number or estimate of how many of the 223 "high impact" devices had y2k bugs that were severe enough that a loss of power generation or distribution would have occurred? (Any information you could provide regarding manufacturer/model of such equipment would be appreciated.)

None would have caused a generation or distribution interruption.

----------------

From the BC Hydro website comes the following press release: http://eww.bchydro.bc.ca/html/news_releases/1999/nov/nov99-02b.html

November 2, 1999 - Year 2000 test has all computerized BC Hydro generating plants operating in Year 2000

Vancouver: BC Hydro has advanced its generating plants to Year 2000 without a single glitch. All applicable generating plants (those that use date-dependent computerized technology) have operated successfully in "Year 2000 mode" for periods of two weeks each. BC Hydro's Year 2000 enterprise coordinator, Seiki Harada, says this province-wide initiative is one of many integrated tests the utility has undertaken as part of its Year 2000 preparedness efforts.

"During the past three months, 12 power plants generated Year 2000 power into the provincial power grid for a two-week period each," Harada said. "Not one single Year 2000-related incident was experienced during the testing period, and we expect the same performance during the actual rollover."

Out of 34 generating plants across the province, 12 use mission critical date-sensitive computerized technology to operate. Technology in the other plants is either electromechanical (not computerized) or not date-dependent.

Computers, microprocessors and embedded controls in each of the 12 plants were advanced to January 1, 2000 and operated there for a week. They were then advanced to February 28, 2000 and operated there for another week. February 28, 2000 was chosen because it was identified as a potential problematic date related to the leap year. For both date periods, each plant rollover test was a success.

"As a precautionary measure, we staggered the 12 plant rollovers over three months," Harada explained. "In case we encountered any unexpected problems, we wanted to limit them to one or two plants. Fortunately, and just as we expected, Year 2000 incidents were not an issue."

BC Hydro has been preparing for Year 2000 since 1994. A comprehensive inventory, assessment, remediation and testing program revealed 223 mission critical devices requiring Year 2000 remediation. These were fixed in May 1999, and BC Hydro does not expect service disruptions due to Year 2000-related issues arising from its systems.

CONTACT: Nadine Cahan, Public Affairs Coordinator. Phone: (604) 623-3998

------------

These guys have already operated their plants using y2k dates, you really can't ask for much more testing than that. I believe BC Hydro has done a great job of providing information, and their response to my query was very thorough. I only wish US utilities had been as . This is a problem with the US "legal" driven information society, and this is, IMHO, a bigger problem than y2k.

See BC Hydro's y2k site at: http://eww.bchydro.bc.ca/html/lib_news_art_year2000.html

Regards,



-- FactFinder (FactFinder@bzn.com), November 24, 1999.
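
The press release quoted above says the plants were advanced to January 1, 2000 and then to February 28, 2000 because of the leap year. As a rough illustration of why those two dates matter, here is a Python sketch. The functions `naive_is_leap` and `two_digit_elapsed` are hypothetical examples of the kind of logic that could go wrong; they are not taken from any BC Hydro device.

```python
import calendar
from datetime import date, timedelta


def next_day(d: date) -> date:
    """Calendar-correct successor of a date."""
    return d + timedelta(days=1)


def naive_is_leap(year: int) -> bool:
    """Hypothetical buggy rule: forgets that years divisible by 400 ARE leap years."""
    return year % 4 == 0 and year % 100 != 0


def two_digit_elapsed(start_yy: int, end_yy: int) -> int:
    """Hypothetical elapsed-years calculation on two-digit years."""
    return end_yy - start_yy


# Rollover 1: the century change.
print(next_day(date(1999, 12, 31)))          # correct four-digit arithmetic
print(two_digit_elapsed(99, 0))              # two-digit arithmetic goes negative

# Rollover 2: the day after February 28, 2000.
print(calendar.isleap(2000), naive_is_leap(2000))
print(next_day(date(2000, 2, 28)))
```

A device using the naive rule would skip February 29, 2000, which is why running through the end of February is a useful second test after the January 1 rollover.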

