Flight control center problems may be Y2K related

greenspun.com : LUSENET : TimeBomb 2000 (Y2000) : One Thread


Friday, January 7, 2000


Flight control centre problems may be Y2K related


A software "patch" that the Federal Aviation Administration (FAA) installed on its host computers before the Y2K rollover may have been responsible for the computer system crash that grounded planes all over the United States East Coast yesterday.

Last week, workers from the Professional Airways Systems Specialists (PASS), the union representing workers who fix and maintain the FAA's computer systems, said the FAA had ordered an eleventh-hour patch to be applied to all of its host computer systems around the country, in an apparent effort to stave off complications from a possible Y2K glitch.

The system failure at the Washington Air Route Traffic Control Centre yesterday comes after a similar breakdown at an air traffic control centre in Boston earlier this week, where a crashed computer hard drive held planes in limbo for hours and delayed flights at nearby airports.

PASS national assistant Mike Perrone said that although the two problems involved malfunctions in different types of equipment, the FAA might not have thoroughly tested the patch in its rush to fix its systems before the New Year.

"I'm not saying these two situations are identical, but when all of a sudden you've got two problems pop up just a few days after you've put a patch in ... it's kind of hard to say that's just a coincidence," Mr Perrone said.

Mr Perrone said a new host computer at the Leesburg air traffic centre became overloaded with flight information when Wednesday's data was not automatically cleared from its memory.

The resulting shutdown forced the FAA to ground planes at all three Washington, D.C. area airports, causing backups at airports in Boston, Philadelphia, New Jersey, and Raleigh, North Carolina, and at all three New York airports.

FAA spokesman William Shumann said Mr Perrone's statement "borders on the irresponsible", adding that the cause of the glitch was not yet clear.

Mr Shumann said the only thing that was clear was that the problem was not related to the software patch.

"The patch contains about 16 lines of code, inserted into a system with hundreds of thousands of lines of code," Mr Shumann said.

The patch was put in place on December 30 to deal with the "very rare chance that something could happen exactly at the rollover to the New Year. There is no evidence that the patch has anything at all to do with this morning's outage" or the outage in Boston, Mr Shumann said.

Mr Shumann did say there seemed to be at least a "superficial resemblance" between the incidents at Boston and Washington in that "both apparently involved a problem in a peripheral unit that led to a problem in the main computer itself".

The equipment that broke down at the Washington centre was installed by the FAA in March.

-- Homer Beanfang (Bats@inbellfry.com), January 07, 2000.


Homer, thanks! Keep 'em coming. Looks like news is picking up.

-- silver ion (ag3@interlog.com), January 07, 2000.

"'The patch contains about 16 lines of code, inserted into a system with hundreds of thousands of lines of code,' Mr Shumann said."

What a LAME thing to say.

A bug -- even a *tiny* bug -- even if it's ONE STINKING BYTE LONG -- can hobble an entire organization.

All it takes is the right bug in the right place.

Methinks Mr. Shumann is sweating bullsh^H^Hets.

-- Ron Schwarz (rs@clubvb.com.delete.this), January 07, 2000.

Isn't it funny how this post has been up for a while now and no "pollys" have commented on it? When it was unconfirmed yesterday, they were all over it. My guess is that as the reports of failures mount, the "pollys" will begin to disappear from this board. I'm still not quite sure why they're here anyway. I guess they're the self-appointed Y2K policemen whose duty it is to make sure the "doomers" pay for their huge mistake of preparing responsibly. Hey, whatever makes you feel good about yourself, do it, right?

-- EricE (ready@for.anything), January 07, 2000.

Well, okay, I can comment on this, for two reasons. First, I'm a programmer; and second, I'll be flying into the U.S. Northeast later this week. So I guess I'll find out.

This article is quite well-balanced. Unfortunately, a lot of relevant information (that I'd like to have) is not here.

For example, was the Boston problem really a crashed hard drive? If so, it's almost certainly a fluke. Hard drives are unreliable; they fail all the time without any help from buggy code. And crashed hard drives get replaced, obviously, so the problem shouldn't recur soon.

Another question: What happened in Boston when the failure occurred? I'd like to know if their radar screens went black. The article says planes were "in limbo." Does that mean they had to circle in the air? If so, how much of a safety risk is that? I haven't heard or found anything on the net. We don't know. We just know there weren't any crashes.

Another question: What happened in D.C.? It sounds like a lot of planes were grounded and delays accumulated -- but no safety risk.

Now, the key question: Was the patch relevant to the problems or not? The FAA says no, but let's set that aside for a moment (including the debatable "lameness" of Mr. Shumann's comments). Is it likely that the software patch caused the two hardware failures, regardless of what the FAA says?

No. The most common cause of hardware failures is probably age-related decay; after that, manufacturer defects; after that, physical misuse, abuse, or neglect (computers don't like dust, or Gatorade). After that, the fourth-most-common cause is probably bad or misconfigured video drivers (these can ruin monitors). But only *very* unusual code will cause hardware failures. It seems a great stretch that 16 lines of code would cause failures in two different kinds of devices.

-- Jason Orendorff (jorend@yahoo.com), January 11, 2000.
