Canadian Phone Failure Update: Cascading Effects

greenspun.com : LUSENET : TimeBomb 2000 (Y2000) : One Thread

PHONES GO DEAD, TORONTO PUT ON HOLD

Fire in Bell Canada switching station knocks out more than 100,000 lines

Saturday, July 17, 1999

Toronto -- An accidentally dropped tool was the beginning of a chain-reaction disaster that led to a communications meltdown for Canada's biggest city yesterday.

For Toronto, it was a day when the phones didn't ring, credit cards didn't work, and countless plans went out the window, from ordering an airline ticket to closing a real estate deal.

The breakdown, which lasted for most of the business day, had repercussions across the country. Credit-card transactions as far away as Vancouver couldn't be processed, and hundreds of bank machines went out of service.

Toronto found itself having what amounted to an electronic nervous breakdown. Some brokerages couldn't process trades. Customers found themselves unable to call an ambulance, order a taxi, or pay with plastic.

At the Art Gallery of Ontario, extra guards were required for Old Masters paintings after security lines went down. Traffic-light sequencing was knocked out. Travel agents couldn't book flights -- or take calls from customers.

And if you were hoping to get rich quick, there was bad news: Ontario Lottery Corp. terminals were out of service, making it impossible to buy last-minute tickets for last night's unusually high Super 7 draw.

"People are passing up $10-million," said H.W. Chan, manager of the Sun Wa Book Store on Spadina Avenue.

The cause of the problems was a telephone system breakdown that began with an early-morning fire at a Bell Canada switching centre on Simcoe Street downtown. The fire reportedly began after a repairman dropped a tool.

The tool landed on electrical equipment and the fire spread quickly. At its peak, more than 70 firefighters were on the scene.

What followed was a series of failures that revealed the fragility of the complex communication systems society takes for granted. Although backup batteries were in place to power the switching system, they were designed to last only a few hours.

The backup plan called for the use of diesel emergency generators after the batteries failed, but officials decided that wasn't safe because of the water left by emergency sprinklers.

When the batteries began failing, at around 10:30 a.m., service to approximately 113,000 Bell phone lines was wiped out. Most of those lines were in Toronto's downtown core, the most communications-intensive patch in Canada.

The breakdown left Barry Gutteridge, Toronto's commissioner of works and emergency services, shaken about the city's vulnerability. Mr. Gutteridge said there will be an investigation into the accident, with a view to reducing the city's exposure in the future.

Other than a repairman injured at the site of the explosion, Mr. Gutteridge said, the disruption injured no one. Instead, it caused a series of potential crises that were averted only through luck and improvisation.

Mr. Gutteridge, for example, had to co-ordinate the city's emergency fire and ambulance services from the radio room at Fire Hall No. 1 on Adelaide Street when both his office line and cellphone failed. When the backup system conked out at Bell Canada, ambulance services managed to notify Mr. Gutteridge about the magnitude of the problem by E-mail.

Despite the frustrating communications problems, Mr. Gutteridge said, fire and ambulance services responded efficiently. The host of complications included the failure of all telephone lines to the Hospital for Sick Children. A mobile radio unit was sent to the hospital to handle emergency calls.

Bell Canada spokesman Don Hogarth said the 911 emergency service was maintained, although its capacity to handle calls was impaired. Mr. Gutteridge said that if 911 had gone down, mobile radio units would have been sent to affected areas to give members of the public a way of calling in emergencies.

Mr. Hogarth said the area most affected was between College Street to the north, Queen Street to the south, Bathurst Street to the west and Bay Street to the east. The shutdown "zigzagged" to areas outside that core zone as well, he said, depending which phone lines they relied on.

Several investigations are being conducted, including a Labour Ministry investigation into the industrial accident, the fire marshal's investigation, and Bell Canada's investigation, in which Mr. Gutteridge said the city will be involved.

The effects of the outage were widespread, ranging from the institutional to the personal.

Nancy Tarek of Oakville sat in the lobby of Toronto General Hospital for several hours yesterday, pumped full of painkillers, Valium and other sedatives after a medical procedure.

Because of her condition, Ms. Tarek wasn't allowed to go home by herself. But because the phones were out, she couldn't call her family to come pick her up.

"I'm just sitting here half-medicated," she said.

Some Torontonians found virtually all their communication options cut off: Telephones, fax machines, pagers and cellphones routed through the Adelaide Street Bell switching system were all knocked out.

Many securities dealers had problems communicating trades and relied on cellphones until phone lines were back up, but the Toronto Stock Exchange kept operating.

At University Avenue Funds, mutual-fund sales people who couldn't make calls simply went home.

A skeleton staff remained, processing transactions made before the phones crashed.

"I'm trying to fax over trades to the bank and they won't go," accountant Shelina Dossa said.

Almost one-tenth of the cash machines operated by the country's six big banks were out of service for parts of the day, the Canadian Bankers Association said. The Toronto-Dominion Bank was hardest hit.

Hundreds of bank branches lost access to their systems. Many simply shut their doors and referred customers to other areas where phone lines were still working.

The electronic failure created a short-lived bonanza for couriers, who suddenly found themselves in high demand. At the Printing House copy centre on University Avenue, manager Chris Gennings said the cost of a courier had been driven up by the briefly altered market conditions.

"You go out on the street and offer them $10 and they say the going rate's $20," he said. "And if you argue, suddenly the going rate's $25."

The breakdown created a nightmare for retailers, who were unable to authorize debit or credit card transactions. Some, including Loblaws, accommodated customers -- and created a bankers' nightmare -- by taking customers' debit card numbers and phone numbers so banks could call them back to confirm the transaction.

Some businesses decided to do credit card transactions even though they couldn't get them approved.

"I hope and pray a lot of trustworthy people are shopping today," one retail manager said.

Hospitals and other medical services were seriously affected. Phone service was out at the Hospital for Sick Children, Mount Sinai Hospital, Toronto Western Hospital and Toronto General Hospital. Hospitals were also affected by the failure of pagers, which they use to track down specialists, surgeons and doctors on call.

Sick Children's poison-information and medical-information lines were shut down. Those two lines usually receive nearly 400 calls a day from all over the city and province.

The failure created chaos for many law offices, which found they were unable to close real-estate deals because the main clearinghouse for title searches was unavailable, putting millions of dollars worth of potential transactions in jeopardy.

Travel agents were particularly hard hit. David Gallie, manager of the Flight Centre on Queen Street West, said the day was a wipeout for his business.

"I've lost $30,000 worth of sales," he said. "Clients can't order tickets. I can't call the airlines, and I can't book a seat. And I can't sell a ticket because the credit-card authorization system is down."

For Toronto police, the failure meant the loss of phone and computer systems, although their radios and most cellphones still worked.

Constable Don Petrie, who works in the Eaton Centre, said the phone failure gave police "a bit of a taste" of what could happen if the millennium bug wipes out computer systems on January 1.

"It's a bucket of cold water," he said. "It shakes you back to what it was like when we didn't have these services."

http://www.globeandmail.ca/gam/National/19990717/UMAINN1.html

=====================================================

"we should also obviously expect that we will have a large number, possibly, of what would be manageable failures taken one at a time, which will overwhelm the normal emergency response processes when they happen all at once." ...

"we've asked FEMA to... make clear to the state and local emergency managers ... that those local governments should not assume that the federal government and FEMA will be able to come to their assistance no matter what their problem is, because we may have so many problems in localities across the country that we can't be everywhere at once."

John Koskinen, Chair, President's Council on Y2K Conversion
Transcript, APEC Summit, May 4, 1999, United States Information Agency
http://pdq2.usia.gov/scripts/cqcgi.exe/@pdqtest1.env?CQ_SESSION_KEY=YLWXNVIGNNZM&CQ_QUERY_HANDLE=123990&CQ_CUR_DOCUMENT=1&CQ_PDQ_DOCUMENT_VIEW=1&CQSUBMIT=View&CQRETURN=&CQPAGE=1

=====================================================

CBS, July 15, 1999

CITIES NOT READY FOR Y2K
- Only 2% of 21 Biggest U.S. Cities are Prepared
- 9 States say they're less than 70% ready
- Computer glitches could disrupt city operations

The nine states that reported having completed work on less than 70% of their most important systems are New Hampshire, Ohio, Alabama, Louisiana, Colorado, Wyoming, New Mexico, California and Hawaii ...

"Completing Y2K activities in the last months of the year increases the risk that key services will not be Y2K-ready in time for 2000 because there will not be enough time to deal with unanticipated complications," Willemssen said ...

Sen. Robert Bennett, a Utah Republican who heads the special Y2K committee, said he feared that many state and local governments were "leaving little room for testing, contingency planning and unexpected problems."

"Only very efficient executive-level management and contingency planning can sustain us through the upcoming historic date change." Sen. Christopher Dodd, D-Conn.

http://www.cbs.com/flat/story_168939.html

=====================================================

WASHINGTON - Dozens of towns and cities across Texas were faulted at a Senate hearing Thursday for ignoring inquiries about their readiness for year 2000 computer problems.

One hundred of the state's municipalities were contacted by staffers with the Senate's special committee on the year 2000 problem, but only 25 responded.

"What are you going to do about people who insist on remaining asleep?" asked Sen. Robert Byrd, D-W.Va. "What are they going to do in the face of a dire emergency - sleep through it?" ...

Of the 25 Texas municipalities responding to the Senate survey, two-thirds said their emergency services were ready for the year 2000. Fewer than half said they have a written contingency plan in case of failures, and hardly any reported independent verification of repaired equipment.

http://www.dallasnews.com/texas_southwest/0716tsw3y2klocal.htm

167 Days until 2000



-- Cheryl (Transplant@Oregon.com), July 17, 1999

Answers

Welcome to our future.

Thanks for the grim tour, Cheryl. Maybe one more person somewhere will read your posts and start to understand.

-Greybear

All the King's horses and all the King's men couldn't put Humpty Dumpty together again.

-- Greybear (greybear@home.com), July 17, 1999.


"electronic nervous breakdown"
how apropos

-- bad day (technology@not.answer), July 17, 1999.

No injuries, Gutteridge said? While this article does not link the stuck elevator to the phone problems, other media reports have. Some of these folks had to be hauled off to hospital because of heat exhaustion.

temporary link

(for educational purposes only)

"Updated: July 17, 11:40 am

CN Tower/Elevator

Some visitors to the CN Tower in Toronto got quite a scare after getting stuck in its glass elevator more than 70 storeys up the building. The 30 tourists were trapped for 45 minutes because of a malfunction. The problem was compounded by the searing heat gripping Toronto this Summer. Power was eventually restored and the elevator was lowered safely to the ground. Five Japanese tourists were sent to hospital to be treated for heat exhaustion. There's still no word on what caused the problem."

-- Rachel Gibson (rgibson@hotmail.com), July 17, 1999.


There appear to be a lot of clumsy workers out there, judging from the recent problems we are seeing.

-- Linda A. (adahi@muhlon.com), July 17, 1999.

I expect Flint will have a rationale for this event, perhaps along the lines of -- "this just shows what people can do in a crisis."

Including -- of course -- the observation that the Toronto phone crash had nothing at all to do with Y2K.

Perhaps I'm being uncharitable....

The story reminds me of the 1965 blackout on the East Coast. Some hospital in New York City had thoughtfully installed backup diesel generators for just such an event. The lights went out, battery-powered emergency lighting came on, maintenance rushed to the generator room -- and discovered that the diesel generators were equipped with electric starting motors. Not their finest day.

-- Tom Carey (tomcarey@mindspring.com), July 17, 1999.



As an exercise in linked systems: we have one failure (a dropped tool) causing a single effect (a small fire in one room in one building) that affected one service (telephones).

I'd like people to try to count all the "systems" eventually affected. The result will be very sobering. Post your counts below - winner gets to mud wrestle the "queen of spain."

-- Robert A Cook, PE (Kennesaw, GA) (cook.r@csaatl.com), July 17, 1999.
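[Editor's sketch] The counting exercise Robert proposes above can be tried as a graph walk. The dependency map below is an assumption, assembled loosely from the services named in the Globe and Mail article; it is illustrative, not a real inventory, and the function name is made up for this example.

```python
from collections import deque

# Hypothetical dependency map, loosely based on services named in the article.
# Each key lists the systems that directly depend on it.
DEPENDENTS = {
    "dropped tool": ["switching-centre fire"],
    "switching-centre fire": ["phone lines"],
    "phone lines": [
        "card authorization", "bank machines", "pagers", "fax",
        "lottery terminals", "security lines", "title searches",
        "airline bookings",
    ],
    "card authorization": ["retail sales", "ticket sales"],
    "airline bookings": ["ticket sales"],
    "pagers": ["on-call doctors"],
    "title searches": ["real-estate closings"],
}


def affected_systems(root, dependents):
    """Return every system reachable from `root`, excluding `root` itself."""
    seen = {root}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for nxt in dependents.get(current, []):
            if nxt not in seen:  # count each system once, even if hit twice
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(root)
    return seen


if __name__ == "__main__":
    hit = affected_systems("dropped tool", DEPENDENTS)
    print(len(hit), "systems affected:", sorted(hit))
```

Even this small made-up map shows one root event reaching more than a dozen downstream systems; a real count would depend entirely on how finely "system" is defined.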


You mean the King is taken???????? Obviously, The Queen must NOT mud wrestle.

-- Will continue (farming@home.com), July 17, 1999.

Listen, I'm not sure what's wrong with Flint, but he sounds like he is at a snapping point on some of the other threads. If he comes over here, go easy on him. Flint, take the weekend off, seriously.

-- BigDog (BigDog@duffer.com), July 17, 1999.

Fuck that condescending little bastard! He deserves everything he gets.

-- Asshole Spotter (...@...), July 17, 1999.

Where are all the pollys who refuse to see the big picture?

-- FLAME AWAY (BLehman202@aol.com), July 17, 1999.


Actually, I expect a lot of this sort of thing. There are many single points of failure with widespread ramifications, and this is a great illustration. I predicted a thousand such events worldwide, and see no reason why that prediction shouldn't stand.

To add to Tom Carey's list, I should point out to Robert Cook that there was a great deal of improvisation. People aren't that rigid after all. I should also point out that although the effects of the explosion and fire can't be repaired in a day, the problems were solved one way or another in a day. It seems clear that if there are too many similar failures, the repair/workaround rate of each will diminish rapidly.

Without question, there are key facilities, critical points in our systems. We haven't identified them all, but some are sure to identify themselves bigtime. In critical places, single bugs can have amazing impacts. I consider this case Exhibit A for preparation -- as the article says, a taste of what might be coming.

-- Flint (flintc@mindspring.com), July 17, 1999.
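[Editor's sketch] Flint's point above, that the repair rate for each failure drops when many failures happen at once, can be illustrated with a toy model. Everything here is an assumption: a fixed pool of interchangeable repair crews, every repair taking the same time, and the function names are hypothetical.

```python
import math


def days_to_clear(failures, crews, days_per_repair=1.0):
    """Days until the last failure is fixed when `crews` teams work in parallel.

    Toy assumption: every repair takes the same time and any crew can
    fix any failure.
    """
    if crews <= 0:
        raise ValueError("need at least one crew")
    rounds = math.ceil(failures / crews)
    return rounds * days_per_repair


def average_wait(failures, crews, days_per_repair=1.0):
    """Average days until a failure is fixed, under the same assumptions."""
    if failures == 0:
        return 0.0
    if crews <= 0:
        raise ValueError("need at least one crew")
    total = 0.0
    for i in range(failures):
        # Failure i is fixed at the end of round (i // crews) + 1.
        total += (i // crews + 1) * days_per_repair
    return total / failures


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(n, days_to_clear(n, crews=10), average_wait(n, crews=10))
```

With ten crews, one failure or ten are cleared in a day, but a thousand simultaneous failures take a hundred days under these assumptions. Real repairs would vary in length and crews would not be interchangeable, so this only shows the direction of the effect.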


Robert,

TO's mayor said today that the telephone outage cost the business community "millions of dollars" in lost business. And the technology journalist who has been making the links between this and y2k says Bell won't have to pay a cent, other than a small portion of phone bills for the time the phones were not working. Don't bother with "frivolous" lawsuits.

A pair of buzzwords you'll see more of down the road: the Bell spokeswoman kept saying "that's proprietary information" to pretty well every question asked of her. Forget the "public's right to know," "proprietary information" takes precedence.

Question: how many electrical engineers or electricians would be so careless as to drop a tool around a high voltage electrical panel? Reports consistently say the worker received a jolt; I'm wondering if he got the jolt before or after the tool was dropped.

-- Rachel Gibson (rgibson@hotmail.com), July 18, 1999.


Rachel:

You certainly are harsh here. Consider:

1) If every supplier of any service were responsible for all indirect, ancillary or collateral costs estimated by all customers for the disruption of that service, nobody could afford to provide the service in the first place. There must be limits of liability.

2) There needs to be a distinction between what something does, and how it does it. There are countless ways to do almost everything. Those who invent the best ways are rewarded for their cleverness (we hope). Do you really think the public has a right to know how something works? The only members of the public who can really use that information are the competitors!

3) People have accidents. Even specialists. Even the best programmers write bugs [g]. This isn't carelessness, it's misfortune. So how many electricians might suffer a misfortune? ALL of them. These things happen.

You sound a bit like the jury members who feel sorry for the victim, and determine to find somebody, anybody, at fault to pay for it. Even if nobody did anything intentional. In this case there was no hint of fraud or deceit anywhere. These things, bad as they are, just happen.

-- Flint (flintc@mindspring.com), July 18, 1999.


Gee Flint. Too bad people won't give a flying -bleep- about that theory of yours WTSHTF and failures begin to touch people on an individual basis, know what I mean?

Reality check.

-- Will continue (farming@home.com), July 18, 1999.


Okay, Flint, I'm harsh. What happens when it's a life lost, instead of money lost? No problem...just as long as it's not personal, eh?

-- Rachel Gibson (rgibson@hotmail.com), July 18, 1999.


Rachel:

I'm not sure I understand your reply. If phone service didn't exist, then nobody could disrupt it by accident, nobody would suffer any losses from this disruption, and nothing about it would be proprietary. But even more lives would be lost.

Suggestions like yours, while undeniably compassionate, tend to experience the Law of Unintended Consequences with thumping regularity. Some people seem quite content to sacrifice several lives to save one, provided the life saved is identified, and the lives lost aren't reported, or happened indirectly (for example, how many lives have been lost because a cure for cancer has not been found, because someone somewhere decided not to fund the research that would have found it?)

When none of the choices are very good, the optimal tradeoff leaves a lot to be desired.

-- Flint (flintc@mindspring.com), July 18, 1999.


Bringing this discussion back around to what analogies can be drawn between this incident and Y2K....

Yes, it did have some cascading features because it happened at probably the most important phone centre in Canada. If this bonehead worker had dropped his tool in a phone centre in Flin Flon, Manitoba, then it wouldn't have even made the news in Winnipeg.

*Nobody* was *expecting* this tool to drop his tool. Come the rollover, you don't think that Bell Canada (and all the other utilities in N America) will have personnel stationed at all important locations watching like hawks for any signs of trouble? It's not gonna be a surprise as the clock ticks down on 31 Dec 99. [Don't forget that the roll-over happens on the Friday night of a long weekend - not at the beginning of a busy work day.]

I concede that if enough sh*t happens at the roll-over then we are in for a long road repairing everything. But I think that the combination of repair/testing and the contingency plans in place for an event whose arrival is known precisely in advance will mean that the sh*t won't hit the fan. Just an opinion, but one that I am more confident in than I was 9 months ago.

-- Johnny Canuck (j_canuck@hotmail.com), July 18, 1999.


Johnny:

You may be missing the point. y2k won't cause tools to drop, and critical facilities will be fully staffed at a high state of alertness. But the illustration here is that there are single points of failure that can have impacts all out of normal proportion. My gut feeling is that nobody has identified a few of these, and ripple effects will often be surprising.

While I feel the domino theory is a bit neater than reality, and there are gaps and redundancies and people will improvise to create firebreaks, enough individual chains will break to make their effects felt by all of us to one degree or another. I don't see how this can be avoided. Few of these will deprive us of things we can't do without, but most will deprive us of something we'd strongly prefer not to do without either.

-- Flint (flintc@mindspring.com), July 18, 1999.


How "long" emergency responses can be maintained is uncertain. In dire straits, people have lived in tents during emergencies too, but do you consider that an adequate substitute for their regular housing?

-- Robert A Cook, PE (Kennesaw, GA) (cook.r@csaatl.com), July 18, 1999.

[ For Educational Purposes Only ]

A Taste of Y2K?

July 20, 1999


As we've moved inexorably toward that Y2K moment when computers watch '99 become '00 and get confused, the reassurances have risen like a great cloud.

We'll be fine. Investments are safe. Elevators won't stall between floors. Planes will land. Phones will ring. Lights, banking machines, emergency services, traffic lights, life support systems - all will function.

And the Internet, of course, was designed to withstand nuclear war.

But not, it turns out, a monkey wrench.

Last Friday, someone dropped a spanner in the works at Bell Canada in Toronto and we all saw ghosts of Y2K to come. The telephone chaos cost at least $1 billion in lost business, stock trades, purchases halted in mid-swipe.

The ripples were pure idiosyncrasy. They missed much of Montreal, but hit Halifax, Vancouver, Chicago. They knocked out touchtone phones but not rotary dial phones - for those who remember what they are.

Along with the billion, we took an even bigger hit in our confidence about Y2K. And even before our vulnerability could fully flower, we got hit again. A Bell equipment "glitch" in Peel left a million people without 911 service for much of Sunday.

And - oh, devious technology - many of those who dialled 911 without result found they couldn't call anyone else for help, either.

That's because the 911 system is designed to maintain connections to trace calls, even if callers hang up.

Clearly, the Y2K problem is different. But not that different.

In fact, the main difference is that Y2K doesn't need a real monkey wrench to blow the system. A virtual monkey wrench is just fine.

So there are at least two lessons in this: For those in charge of fixing the Y2K problem, please check it all again.

For everyone else, better hit the flea markets and garage sales before the rotary dial phones all disappear.

-- Ashton & Leska in Cascadia (allaha@earthlink.net), July 21, 1999.

