Original letter from IBM re:3083 mainframes ATC

greenspun.com : LUSENET : TimeBomb 2000 (Y2000) : One Thread

Note the date(s) on this letterhead and in the text. This may well end up in a courtroom someday. (Picture sobbing families and friends of air disaster victims due to FAA negligence.)

Thought this deserved a thread. Started commentary at the end of

http://www.greenspun.com/bboard/q-and-a-fetch-msg.tcl?msg_id=000a0L

IBM
Office of the General Manager
Global Government Industry
670 Rockledge Drive
Bethesda, MD 20817

October 2, 1997

Mr. Robert J. Stevens
President
Lockheed Martin Air Traffic Management
9211 Corporate Boulevard
Rockville, MD 20850-3202

Dear Mr. Stevens,

I received your September 18 letter regarding a Year 2000 review on the 3083 hardware and microcode for the FAA HOST contract. I share your concern for the effective operations of the 3083 into the year 2000. IBM presented this concern and the need to replace the 3083s, primarily due to the lack of parts, age of engineering design, and the lack of year 2000 compliance to Lockheed Martin in June 1996. In addition, the FAA requested that we document these concerns in a letter in February 1997. The Lockheed Martin and IBM teams have jointly presented to the relevant FAA departments the need to replace the 3083 systems as quickly as possible.

Both Lockheed Martin and the FAA have requested that IBM assist in the assessment of Year 2000 compliance issues on the 3083. Analysis of 3083 microcode involves reviewing hundreds of thousands of lines of microcode written in several different protocols. This code was written in the 1970s to support an architecture that has changed dramatically over the ensuing years and new processor generations. IBM does not have the skills employed today that understand the microcode implemented in the 3083 well enough to conduct an appropriate Year 2000 assessment. In addition, the tools required to properly analyze the microcode do not exist. We have expressed these concerns to your team and the FAA.

The FAA has requested consulting support from IBM to better understand IBM's testing methodologies on other hardware platforms. We are prepared to review the FAA's Year 2000 testing procedures for the 3083 and make recommendations on how to improve that process. We are willing to provide a similar consulting service to Lockheed Martin.

IBM remains convinced that the appropriate skills and tools do not exist to conduct a complete Year 2000 test assessment on the 3083s. IBM believes it is imperative that the FAA replace this equipment prior to the Year 2000. IBM has invested considerable resource to research this situation and communicate needed action to Lockheed Martin and the FAA. We will continue to support Lockheed Martin's efforts to do the same. I look forward to working with you to achieve these results.

Sincerely,
Kenneth R. Thornton

-- RD. ->H (drherr@erols.com), March 07, 1999

Answers

Certainly seems clear - as far as IBM goes. Don't use 'em, we can't fix 'em, we don't trust 'em to do anything more complex than heating up a cold room with perfectly good electricity.

-- Robert A. Cook, P.E. (Kennesaw, GA) (cook.r@csaatl.com), March 07, 1999.

Perfectly clear and utterly baffling at the same time. IBM tells FAA to ditch those old dogs immediately. They may fail, IBM doesn't know how, and can't find out. *Imperative* that FAA replace them. Strong language indeed.

FAA turns around and spends at least the last 18 months trying to use them anyway. Why? Most likely because FAA themselves no longer have the knowledge of exactly what they do site by site, or how they've been tweaked over the years. And they're wired into a huge, complex realtime system that simply cannot be brought down even momentarily for any reason. And replacement of the entire interdependent system at the typical government pace would mean almost no air travel in the US for a decade, and a slow bug-ridden rampup after that. The only alternative is a desperate race to find something as similar as possible, not quite as obsolete, and see how close they can come. Last I heard, FAA was trying with limited success to get a single site simulation working.

Kind of like being told you have AIDS. Always fatal, no cure, but you might live for quite a while if you're lucky.

-- Flint (flintc@mindspring.com), March 07, 1999.


But here, the fatality won't happen to the administrators running the FAA computers, but to the plane passengers and crews using them.

Like an AIDS doctor - sorry, that's an expensive fix - guess we're gonna die. But only the patient has AIDS - there's no "we" involved.

-- Robert A. Cook, P.E. (Kennesaw, GA) (cook.r@csaatl.com), March 07, 1999.


See much earlier post on the "Chinese y2k Cure". Put the FAA officials in the air 12/31/1999. Start in NYC about 11:45, and head West. Let's see if they can beat the clock, and land somewhere before the rollover. This should be mandatory for all officials in charge of public safety. NRC chairman should be in a ComEd nuclear plant (Zion?) for New Year's Eve party, if they don't shut unsafe (non-compliant) plants down. As for Ko-skin-em, let him ring in the New Year in any public housing complex in any major urban area.

-- Bill (y2khippo@yahoo.com), March 07, 1999.

I'm confused by the 18 month delay myself, except that the gov can't even blow its nose without a half-dozen meetings on where to get the Kleenex. The 3083 is an early member of the System/370 family, and as such has the same instruction set and the same bus/tag channels as all other 370s. I read the other thread and RD brings out several good points about timing and customized hardware. Even so, all that old hardware has only one place to plug into the big box, and any timing problems would be a relatively simple thing to fix. Any other tech views on the problem? <:)=

-- Sysman (y2kboard@yahoo.com), March 07, 1999.


Aside from other factors, there is a good news, bad news couplet for the 3083s. The good news was an announcement that the FAA found one or more retired IBMers who looked into the 3083 microcode and concluded that its equivalent of a real time clock had a 32 year cycle and a base date of 1975. While this does not solve all potential 3083 Y2K problems, it does offer some hope. The bad news was a report that they were down to a few spares of a key part.
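For anyone who wants to check the arithmetic on that reported clock, here is a rough sketch. It assumes the cycle starts on January 1, 1975 and runs exactly 32 calendar years; the post gives only the year and the cycle length, so the January 1 start and the function names are hypothetical, and nothing here reflects how the 3083 microcode actually counts time.

```python
from datetime import date

BASE_YEAR = 1975    # base date reported in the post (month and day are assumed)
CYCLE_YEARS = 32    # cycle length reported in the post


def rollover_year(base_year: int = BASE_YEAR, cycle_years: int = CYCLE_YEARS) -> int:
    """Year in which a counter with this base and cycle would wrap around."""
    return base_year + cycle_years


def inside_first_cycle(target: date,
                       base_year: int = BASE_YEAR,
                       cycle_years: int = CYCLE_YEARS) -> bool:
    """True if target falls before the wrap, assuming a January 1 base date."""
    start = date(base_year, 1, 1)
    end = date(base_year + cycle_years, 1, 1)
    return start <= target < end
```

Under those assumptions the wrap lands in 2007, so January 1, 2000 sits inside the first cycle. That only speaks to the clock equivalent, not to any other date handling in the microcode.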

The following is from:

http://www.newc.com/natca/publicsafety/faay2k.html

"The processor uses Thermal Conduction Modules that contain processing chips.

Module failures can have consequences because they cool the processing chips. There is a shortage of spare parts for five types of these modules and they are failing at an increasing rate. For these 5 modules, there were 4 failures in 1995 and 12 failures in 1997. In addition to age, one factor which may be contributing to the increasing failure rate is that refurbishing after 7 years, as recommended by IBM, was not done. Despite a worldwide search to acquire additional units, there are only six spares left in the inventory for a key module. When the spare modules are no longer available, FAA will have to obtain parts by cannibalizing HOST systems at its two support facilities."

Jerry

-- Jerry B (skeptic76@erols.com), March 07, 1999.


And by the way, don't worry about those o-rings on the booster rockets for the Shuttle Challenger, they'll be just fine. So much for the government experts! It may be a different government agency but you get my drift! Tman

-- Tman (Tman@IBAgeek.com), March 08, 1999.

For those interested, here are some Cory H comments over the last year on the subject:

formations." It is machine code but for the underlying machine. The underlying machine is some kind of simple, fast, computing engine that runs the microcode, the microprogram. At IML or IMPL time, if we're looking at a S/370 145, we load the control store with a microprogram that *is* a S/370 emulator. The microprogram then interprets S/370 opcodes.

Microprograms, microcode, while the purists will say this is different from code for microcomputers, it really isn't. It's all just code. If you can read and write S/370 machine language, you can read and write microcode, it's just different low level language and specific to the underlying hardware.... Except, some people are using C, PL/M and in our world, PL.8, as microprogramming languages. In theory, (again to a purist), you're a microprogrammer if you drive the hardware gates directly, opening and closing paths through the hardware logic. In practice, as you pointed out, you're probably programming in a high level language and using a cross compiler anyway... so what's the diff?

My guess on the FAA 3083 issue is that IBM has some customized microcode in the IO part of the 3083s. Those things had channel directors as I recall. IBM probably implemented back level support to maintain compatibility with either the RADAR or the displays. I don't think the Y2K problem is in the channel. I read last year that it was in the system control and data acquisition subsystem. There's that word again, SCADA.

Given the vintage, the 3083 microcode development environment probably was a macro-cross-assembler running on S/370 TSO. This is speculation but how I'd do it. It could even have been IFOX00 with a custom SYS1.MACLIB, again, that's how they did it in those days. But hey, what do I know, we have people here who are happy to mouth off and call me clueless. ...so I suppose, clueless I am... even though I've seen teams do exactly this kind of programming... in mythic times, that is.

cory hamasaki (JUNE 1998)

==================
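To make the "microprogram that is a S/370 emulator" idea above a little more concrete, here is a toy sketch of an interpreter loop. It is only an illustration: the names `CONTROL_STORE` and `run` are made up, the "micro-routines" are ordinary Python functions rather than anything resembling real microcode, and it handles just three register-to-register opcodes (LR, AR, SR) while ignoring condition codes, overflow, and everything else a real machine does. It says nothing about what is actually inside a 3083.

```python
MASK32 = 0xFFFFFFFF  # S/370 general registers are 32 bits wide


def op_lr(regs, r1, r2):
    """LR: load register r1 from register r2."""
    regs[r1] = regs[r2]


def op_ar(regs, r1, r2):
    """AR: add register r2 into r1 (wraps at 32 bits; no condition code here)."""
    regs[r1] = (regs[r1] + regs[r2]) & MASK32


def op_sr(regs, r1, r2):
    """SR: subtract register r2 from r1 (wraps at 32 bits; no condition code here)."""
    regs[r1] = (regs[r1] - regs[r2]) & MASK32


# Hypothetical "control store": maps an opcode byte to the routine that emulates it.
CONTROL_STORE = {0x18: op_lr, 0x1A: op_ar, 0x1B: op_sr}


def run(program: bytes, regs=None):
    """Interpret a sequence of 2-byte RR instructions and return the registers."""
    regs = list(regs) if regs is not None else [0] * 16
    pc = 0
    while pc < len(program):
        opcode, operands = program[pc], program[pc + 1]
        r1, r2 = operands >> 4, operands & 0x0F  # two 4-bit register numbers
        CONTROL_STORE[opcode](regs, r1, r2)      # unknown opcodes raise KeyError
        pc += 2
    return regs
```

Swap the table for a different set of routines and the same loop "becomes" a different machine, which is roughly the point being made about loading the control store at IML time.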

Jerry B,

I haven't found any published reference to your ex-IBMer opinion. How about a link to same? In any event, the date handling appears to be scattered through the microcode for data acquisition and not solely a problem with a virtual RTC.

Sysman,

The I/O seems to have evolved from the electro-mechanical to the virtual environment. The original 360/50's had physically separate controllers for each device. On the 3083, my understanding is that a common I/O channel(?s) is used but the microcode was tweaked to simulate separate (physical) inputs and make the application code believe it was still on the 360/50. Because there might be dozens of radar inputs from different kinds of radar, I think this is where everything gets very foggy. In addition (if that weren't enough), the 3083's were among the first self monitoring/diagnosing computers. Apparently, there was also a microcode problem in the internal management system of the cooling system (something to do with pump controls).

-- RD. ->H (drherr@erols.com), March 08, 1999.

