How do we find out the root causes of Y2K failures?


The FAA sends in an accident review team every time there is an aircraft crash. They determine the root causes of accidents and make the causes public to prevent the same "accident" from recurring. Aircraft crashes cause deaths, which justifies the need for, and cost of, this review.
How do professionals in the software industry, and the public, find out the root causes of information system failures that may involve multiple vendors/sites/users? This is particularly relevant to Y2K failures because lessons learned from accurate analysis could prevent similar failures in other industries and in other locations.
Ian

-- Ian Wells (wells@jymis.com), August 26, 1999

Answers

Ian ... Could the answer to that question possibly be that the management of company A, which spent half a million to solve their problems, does not want company B (manufacturing a similar product) to gain any profit advantage that could be used, say, in R&D, thereby gaining an advantage in product AND bottom line? I am certainly not the first to mention this reason for NOT sharing info. Also, stockholder watchdogs/whistleblowers might question the outlay, especially if they don't see the big picture and/or have NOT A CLUE about Y2K. Another possibility is that management may have hidden these costs in some obscure area of their semiannual reports, for the very reason I mentioned above. It is this greed of people/companies which, IMHO, will bring us to the 8, 9, or 10 level we most wish to avoid. Eagle

-- Hal Walker (e999eagle@freewwweb.com), August 26, 1999.

A much more likely explanation is that if manufacturer A says a device built by company B will fail, they get sued. And if they admit to unfixed Y2K problems, their stock drops. With all the happy-face reports, some may think they are the only ones with serious problems.
Better to just fix it (or try to) and say nothing.
As for reviews by an NTSB-like organization for software, you're dreaming. Unlike airplanes, where no change goes in without thorough testing and documentation, software is a house of cards, changed frequently by anyone the company lets work on their code. No controls, no certification, no reviews, lightly tested.
I'm not exaggerating here. I've worked as a contract programmer for years, and more than once, changes I've made have gone right into the product in the field on nothing more than my say-so.
Hopefully, process control, banking, etc. are held to a higher standard, but I doubt it.

-- You Know... (notme@nothere.com), August 27, 1999.

The root cause is simply that 100 isn't a 2-digit number. The Gregorian calendar is reasonably complicated - different length months, leap years - and many businesses also work on "week" numbers, which aren't always standardised. Programmers over the years have had to face and address many of these issues whenever they used date-related commands. Their inherent belief that time always goes forwards, perhaps coupled with the idea that the year 2000 was "a long way away", allowed them not to worry about what might happen if the computer's time suddenly went backwards, or if their representation of it allowed for that. I remember wondering, when I first started school and they explained the date to me, what would happen when we got to 2000 - especially when people write the year down as two digits.
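To make the "100 isn't a 2-digit number" point concrete, here is a minimal sketch in Python. The function names are made up for illustration and are not taken from any real system; the only assumption is a program that stores the year as two digits, as described above.

```python
def next_year_two_digit(yy: int) -> int:
    """Advance a two-digit year field; 99 wraps to 0 because 100 does not fit."""
    return (yy + 1) % 100


def age_two_digit(birth_yy: int, current_yy: int) -> int:
    """Naive age calculation that trusts two-digit years (hypothetical legacy logic)."""
    return current_yy - birth_yy


# In 1999 the arithmetic still looks sane.
print(age_two_digit(65, 99))  # 34

# After the rollover the stored year is 0, so time appears to run backwards.
rolled = next_year_two_digit(99)
print(rolled)                     # 0
print(age_two_digit(65, rolled))  # -65
```

Any calculation that subtracts or compares such fields (ages, interest periods, expiry dates) inherits the same problem.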
I still get different people telling me 2000 is or isn't a leap year - depending on their understanding of it. I know it is, of course, since it is divisible by 400.
The only way the public would be sure to know about potential problems would be if there was a mandated testing regime combined with a public ledger for bug reporting.
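For anyone unsure about the leap year question, the Gregorian rule can be sketched in a few lines of Python. The second function is a hypothetical example of the half-remembered rule that gets 2000 wrong; it is not taken from any particular product.

```python
import calendar


def is_leap(year: int) -> bool:
    """Full Gregorian rule: every 4th year, except centuries not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def is_leap_incomplete(year: int) -> bool:
    """Hypothetical buggy version that forgets the 'divisible by 400' exception."""
    return year % 4 == 0 and year % 100 != 0


print(is_leap(2000))             # True
print(is_leap(1900))             # False
print(is_leap_incomplete(2000))  # False, which is the mistake

# The standard library agrees with the full rule.
print(calendar.isleap(2000))     # True
```

The two versions agree for every year from 1901 to 2099 except 2000, which is why the shortcut could sit unnoticed in code for decades.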
I regret to venture this opinion, but software that has the largest market share is likely to be the most tested and proven - simply because more people are asking about it.
Lance

-- Lance Roberts (lance@2000solved.com), August 27, 1999.

Root Cause: Failure to take a long term view.

-- Mad Monk (madmonk@hawaiian.net), August 27, 1999.
