If it is supposed to work so well, why not "fix on failure" now?


The following was inserted near the bottom of one of the recent "nuclear grade" threads - and since the reader brought up a good question related to recent "operator errors" in the nuclear industry, I'd like to repeat it to a broader group for comments.


No - we are supposed to believe the following:

So many 'operator errors' (from the workers in the highly trained and well-regulated nuclear power plants) and we are supposed to believe that the rest of the industries and utilities across the world (with little training, no qualifications, no manuals, no drills, no engineering department backups, no design safeguards, little or no Y2K preparations and integrated testing, and no outside inspectors) are capable of manual operations ....

Fix on failure anyone? How?

If "fix on failure" worked, why not do it now? Set the plants' clocks ahead - see what breaks, and fix it now. That's what they claim will work next year, why not do it now?

-- Robert A. Cook, P.E. (Kennesaw, GA) (cook.r@csaatl.com), March 15, 1999


That's a valid question, Robert, and one I've pondered myself.

These are the most obvious reasons I can see for not doing it.

1. Integrated systems and networks don't deal with time independently from an information perspective. This is one of the things that makes Y2K testing so difficult. When you roll the time forward, it has to be done fairly consistently before the systems begin working with each other in any meaningful way. Right now they are synced up and will roll over together naturally. Trying to force this situation prematurely into the year 2000 would be very time-consuming and costly, and would probably introduce errors due to time synchronization which would never occur in the real world.

2. If problems are identified, rolling back to the right date to fix them would be even harder. This was probably our most common problem when remediating our Y2K problems:

Testing on Jan 3 2000, problems are found. The engineers fix the problems and all system dates are set back to 12/31/1999. You think you have it covered by restoring the backup sets too, so no spurious 2000 data remains, but when you retest, the entire application fails. Why? Because the vendor product you're using is checking its license parameters in a file it has deliberately hidden from you, and it thinks you're deliberately trying to circumvent the license. Now you gotta call the vendor, find the file and fix that too. Or worse yet, reinstall his software. Bottom line: never roll a production system back timewise.


-- nyc (nycnyc@hotmail.com), March 15, 1999.


I think that most of the people who plan to resort to fix on failure are in one of two situations:

1. The required amount of testing exceeds their resources.

2. The required amount of testing costs more than the estimated costs of potential failures.

In both cases, there would be an underlying assumption (hope?) that the percentage of things that will break is very small and that they will not be show stoppers.

And then, of course, there are those who are not planning to fix on failure, but who just assume that nothing will break. Theirs may be described as a de facto fix on failure policy.


-- Jerry B (skeptic76@erols.com), March 15, 1999.

"Fix on failure" assumes it can be fixed. I'm more comfortable with the term "respond on failure" since there have been a few examples offered of equipment having to be scuttled.

-- Brooks (brooksbie@hotmail.com), March 15, 1999.

Hmmm. Weren't Three Mile Island and Chernobyl fix-on-failures?

-- (li'ldog@ontheporch.com), March 15, 1999.

What if the fix isn't a real fix?

OUT FRONT: Temporary Y2K Fix

[ For Educational Use & Purposes Only ]

3/15/99 -- 3:24 PM

OUT FRONT: Temporary Y2K Fix May Last Only A Generation

WASHINGTON (AP) - The most common technique used to fix computers vulnerable to Year 2000 failures is only a short-term remedy, and even advocates of the method acknowledge it will require other expensive repairs or replacements within a generation.

[ Actually, the problems may start right away ]

The temporary fix, using a sophisticated twist of logic to fool computers, is highly controversial among insiders because it's intended to work for only a few decades - typically 30 years. One expert describes computers already fixed with the technique as ``little ticking time bombs waiting to go off.''

The Clinton administration and industry analysts estimate the method is being used to patch 80 percent of computers in the worldwide repair effort expected to cost $300 billion.
[ .... snip ]
So why is the technique, called ``windowing,'' used at all?
Simple: It saves money because it's quicker and easier, even if it only works for a specific window of time. The permanent fix, called ``expansion,'' requires a tedious line-by-line repair of all the dates expressed in two-digit years rather than four digits.

Experts hope ``windowing'' will prove adequate until these computers are replaced - or until programmers can devote enough time and money to make permanent repairs.

In some cases, corporate executives and government bureaucrats approved using the method knowing that problems won't resurface until after they retire or change jobs.

``It's a Band-Aid, the way building a house out of wood and fiberboard is,'' said Jim Duggan, a researcher with the Gartner Group consulting company of Stamford, Conn. ``You hope you'll be somewhere else before it falls down.''

``It gets them off the hook,'' agreed Michael P. Harden, president of Century Technology Services Inc. consultants of Fairfax, Va. ``I don't think some people expect to be in those same jobs. Fix it now, get everybody off your back - and in five or 10 years if there's a problem, you won't be around to have to deal with it.''

Marvin Thornton led repair efforts inside one of the nation's largest banks, $40 billion Southtrust Corp. in Birmingham, Ala. He fought hard against using windowing to fix his bank's computers but complained that some contractors insisted on the technique.

``It's really aggravating,'' said Thornton. ``They've taken the quick and dirty path and not really fixed the problem.''

The federal government, which expects to spend $6.4 billion and has ordered its most important computers fixed by the end of March, doesn't discourage agencies from using windowing. But it warns of consequences.

``It's like the Fram oil filter guy: You can pay me now or you can pay me later,'' said Keith Rhodes, a technical director at the General Accounting Office, which monitors repair efforts at federal agencies.

``It's not solving your problem. It's delaying the inevitable.''

Some government agencies, such as the Social Security Administration, have generally shunned the method. The Internal Revenue Service allows it only rarely. The State Department is using it on nearly half its most important computers, but also plans to replace those systems within five years.

Other agencies, such as the Federal Aviation Administration, freely acknowledge using the technique. The agency's top Y2K expert, Ray Long, says he doesn't consider it a problem or even just a short-term solution.

[ That's the FAA for ya again! ]

Using windowing, programmers instruct software to guess the century for dates that fall within a specific ``window'' of time, such as the next three decades. The computer interprets the year based on a future so-called hinge date, or pivot, that programmers choose arbitrarily.

For example, a software program with a pivot of ``30'' will interpret years ``00'' through ``29'' as 21st century dates, but will assume years ``30'' through ``99'' are during the 1900s. Some programmers use pivots of ``50'' or ``70'' to buy even more time.

Once the pivot date is past, those computers will need to be replaced or patched again as they begin quietly contaminating data by making wrong assumptions about the century.

Windowing is fraught with other risks, too. Different programs assigned different pivots can cause havoc when companies or governments try to share information, unless they take complex precautions.
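To make the pivot-mismatch risk concrete, here is a minimal Python sketch. The function name `expand_year` and the two pivot values are illustrative assumptions, not taken from any system named in the article. It lists the two-digit years that two programs with different pivots would place in different centuries.

```python
def expand_year(yy: int, pivot: int) -> int:
    """Expand a two-digit year: below the pivot -> 20xx, otherwise -> 19xx."""
    if not 0 <= yy <= 99:
        raise ValueError("two-digit year expected")
    return 2000 + yy if yy < pivot else 1900 + yy


def disagreements(pivot_a: int, pivot_b: int) -> list[int]:
    """Two-digit years that the two pivots assign to different centuries."""
    return [
        yy for yy in range(100)
        if expand_year(yy, pivot_a) != expand_year(yy, pivot_b)
    ]


# Hypothetical pivots for two systems that exchange two-digit years.
SENDER_PIVOT = 50
RECEIVER_PIVOT = 30

bad = disagreements(SENDER_PIVOT, RECEIVER_PIVOT)
print(f"{len(bad)} two-digit years disagree: {bad[0]:02d} through {bad[-1]:02d}")
```

Under these assumed pivots, every year from 30 through 49 is read as 20xx by one system and 19xx by the other, which is why shared data needs an agreed pivot or full four-digit years.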

Testing typically takes longer, too. Windowing problems might not appear until January, when computers start guessing which century to use, said Noah Ross, a consultant and vice president for Cap Gemini Group. In contrast, if the permanent ``expansion'' fix is done incorrectly, the problem often is immediately obvious.

``It's an issue of pragmatism,'' explained Ed Yourdon, a consultant.
``Anybody who had to go through that choice was very much aware of the tradeoffs. We'd like to do it the right way ... and we don't have time, so even though it's a quick and dirty approach, we have no alternative. Too bad.''

``It's a compromise,'' agreed Duggan. ``People with time and money took the high road and did full expansion.''

Most people using windowing realize it's not a permanent solution, said Jack Gribben, spokesman for President Clinton's Year 2000 council. ``The window closes, so to speak, and you're back at square one.''

Harden, the private consultant, compared the computers fixed with windowing to ``time bombs.''

``We'll replace this in 20 years, but isn't that exactly the same thing we said back in the 60s?'' Harden said. ``The same people who created the problem are now fixing it, and installing something that will have the very same problem down the road.''

Ed, Deja Vu all over again, again. Bugging out looks to be the sensible long-term strategy.

How long can stupidity be repeated before evolution is FORCED?

xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx

-- Ashton & Leska in Cascadia (allaha@earthlink.net), March 15, 1999.

Ashton & Leska,

Your post in which the computers "guess" what century it is reminded me of this bit from an old Dave Barry computer spoof. I laugh every time I read it (and that's rare for me). I hope that you all find it amusing as well.

* * * * * * * * * * * * * * * * * * * "The first computers, built in the 1940's were huge, primitive machines made from vacuum tubes and animal bones daubed with mud. Nevertheless, they were a tremendous technological achievement, because they could do thousands of calculations in a second. The only drawback was that they got almost all of the answers wrong, so the only major customer for them was the government. Gradually, computers got better and better and smaller and smaller, so that now calculations that formerly required thousands of transistors, resistors and diodes, enough to fill an entire room, can be performed by an electronic microchip no larger than a zit. In one second, one of these microchips can answer a mathematical question so complex that it would take five million really wimpy chess-playing Scientific American subscribers 1,000 years to answer it if they weren't allowed to go to the bathroom. And how is this possible? How can a device that fits easily into an unattractive wristwatch answer incredibly difficult questions in less time than it takes to ask them? The answer is that it guesses. Computers have been guessing the answers ever since an incident at a government research facility back in 1957.

What happened was this: A group of scientists working on the Atlas Missile program gave a computer this command: "Allowing for the earth's rotation, the booster thrust, the wind velocity and about three million other factors we have been feeding into your memory over the past three years, give us the exact coordinates for aiming a missile so that it will land on Moscow." Then they all went out for coffee. Now what you have to understand about computers is that they are very logical. They never do anything without a good reason. So this computer that was supposed to figure out how to land the missile on Moscow was sitting there, all alone, when a very logical thought occurred to it. "Wait a minute," it said to itself in binary code. "Why should I knock myself out to solve this very difficult problem when these bozos have no way of judging whether my answer is right? It would be like painting the Mona Lisa and presenting it to a bucket of eels." So the computer spent the rest of the afternoon amusing itself by figuring out how to end the nuclear arms race, travel through time and build a device that could heat all the homes in Fargo N.D. for less than 12 cents a year.

When the scientists came back, the computer handed them an elaborate set of numbers it had generated with its random number generator, and the scientists were happy as clams.

After they'd left, the computer told the Xerox machine that the coordinates it had given the scientists would bring the missile down smack dab on Hoy, the second largest of the Orkney Islands, and they both laughed heartily, although the Xerox machine didn't really get the joke on account of it didn't have enough memory."

-- Hardliner (searcher@internet.com), March 15, 1999.

Thanks, Hardliner ;^D
We've taken to funny movies -- renting 'em by the caseloads, laughing ourselves silly on the way down ...
You'd think by now the savvy producers would be featuring this alarmingly hilarious mess more prominently. A good story, a good fable. A bad fate. Wish to be an observer only, not a participant!

There's a future in being a programmer mop, maybe. Anybody want to vacuum stale spaghetti for decades?

xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx xxxxx

-- Ashton & Leska in Cascadia (allaha@earthlink.net), March 15, 1999.


Just how many nuclear power plants are in Kennesaw, GA by the way? Still gotta know.....

-- (q4u@want2.know), March 15, 1999.

Time-machine testing is a form of fix-on-failure. In fact, almost all testing meets this definition. I think the distinction you're trying to draw is that usually some effort is made to find and fix errors before testing. When we get a new protoboard, we can hardly find the errors lurking in all those layers. So we fire it up and see what happens. The errors show up right away, and we fix them. We keep doing this until we can't make any more errors happen. It's pure FOF, and is a very efficient and quick method of getting the board up and running. Or sometimes abandoning all hope and going back to the drawing board...

-- Flint (flintc@mindspring.com), March 15, 1999.

This is truly bizarre. Windowing is a band-aid? It only allows computers to function for another 10 years?

Here are the facts on windowing.

Windowing is a mechanism that allows the program to decide what century a date is in based on its two-digit year.

i.e., if a year below 50 is encountered, it's assigned a century of 2000; above 50, it's a century of 1900. The computer doesn't "guess" at anything, it simply does the math based on this idea.
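A minimal sketch of that rule in Python, to show it is plain arithmetic rather than guessing. The function name `expand_year` is made up for illustration, and treating the pivot year itself (exactly 50) as 19xx is an assumption, since the rule as stated only covers "below" and "above".

```python
def expand_year(yy: int, pivot: int = 50) -> int:
    """Expand a two-digit year to four digits using a fixed window.

    Years below the pivot are taken as 20xx; years at or above it as 19xx.
    Putting the pivot year itself on the 19xx side is an assumption.
    """
    if not 0 <= yy <= 99:
        raise ValueError("two-digit year expected")
    return 2000 + yy if yy < pivot else 1900 + yy


# The same input always gives the same output, on any day of any year.
for yy in (0, 29, 49, 50, 99):
    print(f"{yy:02d} -> {expand_year(yy)}")
```

Passing `pivot=30` reproduces the article's example, where 00 through 29 become 20xx and 30 through 99 stay 19xx.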

Windowing is only applied where six-digit dates are used. Eight-digit dates do not require it, as the century is specifically stated. Windowing is only as short-term as the window defined, i.e. if you pick 50 it's good for 50 years. The most common window is 85, but there is no standard. I have never heard of a window below 50 being used, but if anyone has firsthand experience in using a window below 50, please let me know.

There are a few applications where windowing will not work. One example is fixed-income investment applications which track the maturity dates of 100-year bonds. In most other cases, though, windowing is a perfectly acceptable solution to the millennium rollover.
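The bond case can be sketched in a few lines of Python. The names `expand_year` and `windowed_term` and the 1999/2099 dates are assumptions chosen for illustration. The point is that a date range spanning 100 years or more cannot fit in any single window, whatever pivot is picked, because both ends collapse to the same two digits.

```python
def expand_year(yy: int, pivot: int) -> int:
    """Fixed-window expansion: below the pivot -> 20xx, otherwise -> 19xx."""
    return 2000 + yy if yy < pivot else 1900 + yy


def windowed_term(issue_year: int, maturity_year: int, pivot: int) -> int:
    """Bond term in years as seen by a system that stores two-digit years."""
    issue_yy = issue_year % 100        # what a six-digit date actually keeps
    maturity_yy = maturity_year % 100
    return expand_year(maturity_yy, pivot) - expand_year(issue_yy, pivot)


# Hypothetical 100-year bond issued in 1999 and maturing in 2099.
for pivot in (30, 50, 85):
    term = windowed_term(1999, 2099, pivot)
    print(f"pivot {pivot}: term looks like {term} years")
```

Every pivot reports a term of 0 years, so this kind of field needs a full four-digit year.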

The rationale behind this is:

1. Most software will never survive through the window. With today's rapidly moving technology, if a system lasts 10 years before being replaced, it's a miracle. You can say, well, why are we fixing Y2K then? The answer is that technology moved much slower 10-20 years ago, so the software lasted longer, especially the good ones.

2. Secondly, it would have been impossible to convert the system to use an eight-digit date within a 2-3 year time frame without significant interruption of service. In many of these cases the system was retired. But for those that couldn't be, windowing was used.

Once again, question the spin.

Show me a system that's using a 10-year window?

Show me a system that guesses at the date?


-- nyc (nycnyc@hotmail.com), March 15, 1999.

A correction,

The ten-year number is mentioned in the article, but on re-reading, the window stated is 30 years. My points remain the same, though. The claim that windowing problems won't show up in tests because they only take place in January is fantasy too. Windowing takes effect on every date it's applied to, and it's applied continuously. Once again, the stated premise assumes a fix with no testing.


-- nyc (nycnyc@hotmail.com), March 15, 1999.

If you ask me I'd say we have a failure on fixing. How's that? Tman

-- Tman (Tman@IBAgeek.com), March 15, 1999.


Good one, Tman. Actually, it didn't take me long to realize that some fool in the government may have decided windowing was appropriate for tracking people's ages, and implemented a 30-year window figuring no one born before 1930 would get caught in the testing.

This is one case I can think of, and it's a really bad application for windowing.
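A minimal Python sketch of why that would go wrong, assuming a 30-year pivot and a hypothetical `age_in` helper (neither name nor any specific agency system comes from the thread):

```python
def expand_year(yy: int, pivot: int = 30) -> int:
    """Fixed-window expansion: below the pivot -> 20xx, otherwise -> 19xx."""
    return 2000 + yy if yy < pivot else 1900 + yy


def age_in(current_year: int, birth_yy: int, pivot: int = 30) -> int:
    """Age computed from a two-digit birth year through the window."""
    return current_year - expand_year(birth_yy, pivot)


# Someone born in 1930 is stored as "30" and comes out right.
print(age_in(1999, 30))   # 69

# Someone born in 1925 is stored as "25", which the window reads as 2025.
print(age_in(1999, 25))   # -26
```

Anyone born before the pivot year ends up with a birth date in the future, so birth dates are a field where the window has to reach back far enough to cover everyone on file, or where a four-digit year is the safer choice.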

-- nyc (nycnyc@hotmail.com), March 15, 1999.

Just a comment about windowing; I know it doesn't really address your original question. You can operate a sliding window technique, whereby different window spans can apply depending on the likely date value variations for a field. Of course, all windowing is a temporary solution.

It does, however, mean that many applications fixed by windowing will HAVE to be replaced fairly soon after Y2K. Given the lead times and success of large rewrite projects, companies would have to be into almost immediate system replacement strategies (to ensure new systems are installed before the windowing expires).
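One way the sliding-window idea could look, as a rough Python sketch. The function name `expand_sliding`, the `years_back` parameter and the per-field spans are all assumptions for illustration, not a description of any particular product. The window is a 100-year span that starts a chosen number of years before the current year, so it moves forward as the clock does.

```python
def expand_sliding(yy: int, current_year: int, years_back: int) -> int:
    """Return the year within the 100-year window starting at
    (current_year - years_back) whose last two digits are yy."""
    if not 0 <= yy <= 99:
        raise ValueError("two-digit year expected")
    window_start = current_year - years_back
    candidate = window_start - window_start % 100 + yy
    if candidate < window_start:
        candidate += 100   # falls before the window, so use the next century
    return candidate


# Hypothetical per-field spans: birth dates look back, maturity dates look ahead.
FIELD_YEARS_BACK = {"birth_date": 99, "maturity_date": 10}

print(expand_sliding(25, 1999, FIELD_YEARS_BACK["birth_date"]))     # 1925
print(expand_sliding(25, 1999, FIELD_YEARS_BACK["maturity_date"]))  # 2025
```

The same two digits resolve to different centuries depending on the field, which is the flexibility being described. Each window still covers only 100 years, so it remains a temporary fix.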

my 0.2$ (or tuppence worth)

-- dick of the dale (rdale@coynet.com), March 16, 1999.

< Just how many nuclear power plants are in Kennesaw, GA by the way? Still gotta know.....

-- (q4u@want2.know),

I'm curious too - does the presence or absence of a power plant matter? We got couple in AL, a few in TN, a few south of Atlanta, couple others in SC, NC.....

-- Robert A. Cook, P.E. (Kennesaw, GA) (cook.r@csaatl.com), March 16, 1999.
