For Brent -- Comments about a "year for testing"


First, I need to address the claim commonly made on the forum that testing during 1999 doesn't count. We know that organizations are returning remediated code to production in large quantities, and it's working OK. To counter this observation, some people argue that until that code encounters actual current 2000 dates, remediation errors (and gaps) won't show up.

There is some justification for this viewpoint, but it's inconsistent to claim on the one hand that they didn't "have a whole year for testing" and claim on the other hand that even if they did, it wouldn't have counted because this testing would have taken place during the wrong year! If not enough of the remaining bugs would have cropped up during this "year of testing" to prevent disaster, then what difference can it make if they spent 1999 testing or not?

Second, we get to the heart of the matter. As a preface, my understanding is that the "full year for testing" essentially derives from a Gary North shibboleth. In fact, relatively few organizations made such a claim (though North chanted it over and over). And those that did were simply looking for an easy way to make the point that they were (in their opinion) ahead of the game.

Anyone who has been involved in large projects such as remediation knows that North's implication is false. That is, the implication that all remediation is done first without testing, and all testing is done afterwards without remediation.

In practice, systems (and modules) tend to be in all phases of the process at once. An initial assessment identifies all systems whose failure represents a threat to the health of the organization, and these systems are roughly ordered by importance. Remediation efforts start with the systems at the top of the list and work down.

If you take a snapshot of the process somewhere in mid-project, you find that the most critical systems have been unit and system tested (possibly in a time machine) and returned to production. The next group down is undergoing unit and system testing (with introduced bugs being cleaned out), and the group below that is being remediated, with routines and modules tested concurrently -- when a programmer changes code, that little piece of the code is tested immediately to make sure nothing awful happens; if it does, fix it and try again. Systems of least importance (the "office coffee machine" systems) might still be awaiting assessment to see if they're worth bothering with.
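(Purely for illustration -- a toy sketch of my own, not anyone's actual remediation code -- here is the kind of tiny change-and-check cycle I mean, assuming a "windowing" fix for two-digit years with a pivot at 50:)

# Hypothetical windowed two-digit-year fix.
# Assumption: 00-49 means 2000-2049, 50-99 means 1950-1999.
PIVOT = 50

def expand_year(two_digit_year):
    """Expand a two-digit year using a fixed pivot window."""
    if not 0 <= two_digit_year <= 99:
        raise ValueError("expected a two-digit year")
    if two_digit_year < PIVOT:
        return 2000 + two_digit_year
    return 1900 + two_digit_year

# The "test it immediately" step: quick checks run right after the change.
assert expand_year(99) == 1999   # old data still interpreted correctly
assert expand_year(0) == 2000    # rollover no longer collapses to 1900
assert expand_year(49) == 2049   # upper edge of the window

The point is only that each little change carries its own immediate sanity check; the more thorough unit and system testing happens later, further up the pipeline.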

OK, this still raises the question of WHY organizations didn't complete the process of working all the way down the list last year, and spend all of 1999 either with ALL their remediated code returned to production, or with whole new software systems (like SAP) already in place, with a year to get over the aches and pains of switching.

The answer to this question, as always, isn't nearly as simple as just saying they underestimated the size of the task, started too late, and now find themselves behind the 8-ball. This is surely true of *some* organizations, but by all indications this is a small minority.

What we're really dealing with here is diminishing returns, a concept apparently foreign to many on the forum. You can safely assume that not every organization is managed by total idiots, else they wouldn't be successful businesses in the first place.

Remediation and testing are very expensive. Breakdowns, especially of critical systems, are also very expensive. It's inefficient to spend more on remediating and testing a system than that system could possibly cost you if it failed completely. This equation applies almost right down to the module level. The goal has never been to find and fix ALL date bugs, but rather to make damn sure that you haven't missed any bugs that can really hurt, and therefore end up costing you more than finding and fixing them would have cost. Systems you can live without can be postponed until later, and fixed at your own convenience (that is, after they break and when you can get around to them).
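(To make that equation concrete -- the numbers and system names below are invented, a toy sketch of the tradeoff and nothing more -- the triage decision amounts to fixing a system now only when the expected cost of letting it fail exceeds the cost of remediating and testing it:)

# Hypothetical triage sketch: fix now only if the expected loss from failure
# exceeds the cost of remediation and testing. All figures are made up.
systems = [
    # (name, cost to fix, probability of failure, cost if it fails)
    ("billing system",        500000, 0.30, 20000000),
    ("shop-floor control",    200000, 0.20,  5000000),
    ("office coffee machine",   5000, 0.90,      500),
]

for name, cost_to_fix, p_fail, cost_of_failure in systems:
    expected_loss = p_fail * cost_of_failure
    decision = "fix now" if expected_loss > cost_to_fix else "postpone (fix it after it breaks)"
    print("%-22s expected loss %10.0f vs fix cost %7.0f -> %s"
          % (name, expected_loss, cost_to_fix, decision))

Run against these made-up figures, the billing system gets fixed first and the office coffee machine waits until somebody gets around to it -- which is exactly the triage I'm describing.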

Even critical systems do many noncritical things. And even bugs in the parts that do critical things may be noncritical bugs. In the vast majority of remediation environments, those involved have a very good idea which systems (and which parts of those systems) MUST work properly. The incremental dollar is more wisely spent re-testing such systems even more rigorously, rather than spending it even looking at the office coffee machine.

Beyond this, we must deal with the issue of code freezes. Once you're finished remediating, the last thing you want is to introduce new date bugs. However, normal maintenance activities can't be totally suspended for very long, nor can the periodic software upgrades that are a fact of life everywhere. The sooner you freeze your code base, the longer you must operate with no (or minimal) normal maintenance and upgrades. Nobody can afford to just stand pat for a full year of testing. Even 2 or 3 months is a strain. Code undergoes constant change, if you want to remain competitive.

One of the main problems with the usual superficial level of analysis is the underlying assumption that "not finished" means business failure. In reality, code is never finished. It's in a permanent state of flux, and businesses would fail if it weren't. Nobody will finish remediation and testing either. There is a point of diminishing returns with testing also. Testing cannot find all date bugs, but it CAN establish that the organization can function normally without undue interruption. This is close enough -- as close as we ever come.

How many key industries have failed to reach this point? We can only guess. I personally doubt we can guess very accurately if we equate "close enough" with certain failure because "close enough" isn't 100% compliant. This is a classic case of perfect being the enemy of excellent.

So I continue to expect a manageable level of screwups, with some newsworthy exceptions. I don't foresee many dominoes. It's always true that countless bad things *can* happen, but never true that they all *do* happen. Equating the worst possible case with the most likely case is a conceptual error very commonly made on the forum. I expect when nothing close to the worst case happens, many will say we were "very lucky". But it isn't luck, it's normal. It only looks like luck to those whose expectations were irrationally inflated to begin with.



-- Flint (flintc@mindspring.com), December 10, 1999

Answers

So how come a can of nuts has a price tag of $5+ at my local store? (A small can of cashews, too.)

-- Broke (Purse@insideout.com), December 10, 1999.

Flint,

As someone who's "done software" for about 20 years, I must tell you that I have never read so many misleading (you're too good to be obviously "wrong") statements, delivered in such a marvelously persuasive fashion.

You're good. You're wrong, but you're good. If you're not being paid by some government agency for these postings, you certainly should be.

I bow to your prowess with prose, but fundamentally disagree with your conclusions, and shudder at the ease with which you blithely toss in what you must think (or wish us to believe?) are "truisms" of software development. They aren't...except maybe in your mind; sorry.

-- joe (joe@adeveloper.net), December 10, 1999.


Isn't it fun to do a well-thought-out post and get a "can of nuts" post in response?

You seem to think computer "screw-ups" are no big deal. They can be a big deal if errors become visible to the general public or to your customers ... whose phone calls overwhelm your organization's ability to respond. Go to this Futurist magazine link for a true story about how a computer date-related screw-up in 1971 affected life in Washington D.C. -- a very well written article by a systems guy who was there ... and he seems worried about Y2K:

http://www.wfs.org/gouchen.htm

-- Richard Greene (Rgreene2@ford.com), December 10, 1999.


joe:

It's not my intention to be misleading, but of course I can only speak from my own experience (as long as yours). During that time, it may well be that I've worked for much better-managed organizations than you have.

If I've been misleading compared to your experience, please fill us in. Just saying I'm wrong over and over doesn't give anyone a hint as to how and why.

Richard:

Computer screwups come in all sizes. Most are very small, some are newsworthy. I explicitly said I expected some newsworthy screwups. I also expect life to be hectic and unpleasant for programmers for some time to come, even though the problems may not have grave impacts outside the organization. One of the key points I tried to make was that some people here describe the worst screwups as being *typical*, and you seem to be doing the same. At least you don't bother to mention the million small problems for every one of the size you cite.

-- Flint (flintc@mindspring.com), December 10, 1999.


Flint:

Thanks for taking the time to write that post. I don't know if you are right or not, but I suspect that there is some truth in your argument. Maybe not all companies will be lucky enough to get off with a few days of inconvenience, but I've been wondering why some of the well-known "incidents" that we have seen recently (Hershey's, Deutsche Bank) had problems that were only temporary.

Now, what about those embedded chips? :-)

-- impala (impala@wild.com), December 10, 1999.



>> Just saying I'm wrong over and over doesn't give anyone a hint as to how and why.

you wrote:

>> my understanding is that the "full year for testing" essentially derives from a Gary North shibboleth. In fact, relatively few organizations made such a claim (though North chanted it over and over).

You're, of course, correct that "few organizations" made that claim... because "few organizations" made ANY claim! But many very large corporations did make that claim, which is the more important point. Thinking back, I remember Citibank making the claim... there were many more; I'm sure other people here could name many more names.

A shibboleth? I don't think so. They were optimistic and they were wrong. What's new?

you wrote:

>> We know that organizations are returning remediated code to production in large quantities, and it's working OK.

"We" do? I don't! This is the "royal 'We'"? The stories of failures are increasing.

Alan Greenspan made a famous comment to Congress, regarding the banking industry, that "99% compliance isn't enough. It has to be 100%." You obviously disagree and make your argument well. We'll soon see...

-- joe (joe@adeveloper.net), December 10, 1999.


My experience is with operations. I have witnessed and helped plan and implement software patches, system installations, and the little tweaks to fix the inevitable little problems that stem from installing large fixes. Those little problems -- and many did indeed stop production -- were frequently "fixed" by throwing the enterprise's and the software firm's MIS support at the problem. My question is: how much support can be expected if most of that software company's customers are experiencing problems at the same time? Let's tip a hat to Murphy and add communication and the occasional power problem.

This BITR could easily rip out the undercarriage of the world's economic machinery.

-- Squid (Itsdark@down.here), December 10, 1999.


You don't seem to get it that grocery prices have risen at alarming rates. Hubby had seven rolls of film developed: $75.00. What happened to $5.00 per roll? I stopped to buy simple metal brackets for shelving. The cost was $6.00 for two brackets; Hells Bells, I need at least $100.00 worth, so I walked away. We, who are making almost 50K (before the deductions), are having a hard time. Guess owning a Great Motor Home isn't going to be in my life's picture.

-- Broke (Purse@insideout.com), December 10, 1999.

Sorry JOE, but I need more from you than your statement that Flint is wrong. He outlined his views on remediation and testing. Understand, I'm not sure Flint is right. I would like to hear another programmer's argument explaining why two professionals performing the same service would come to such different conclusions regarding remediation and testing.

Counter-point PLEASE......

-- Tommy Rogers (Been there@Just a Thought.com), December 10, 1999.


joe:

[They were optimistic and they were wrong. What's new?]

Yep, I agree with this. The point I was trying to make (poorly, maybe) was that this claim implied a shape and focus of remediation projects at variance with the reality. Almost nobody was able to keep to their original schedule. But I don't believe anyone really intended to freeze their code a year ahead of time and spend a full year just running more and more comprehensive test suites against this frozen code base. Was that your understanding?

["We" do? I don't! This is the "royal 'We'"? The stories of failures are increasing.]

??? This has been hashed out in detail. There have been extensive debates on csy2k about the wisdom and cost-effectiveness of slamming remediated code back into production before adequate testing -- nobody is claiming that this *isn't* being done. A few of the stories we've seen have been of the "this was supposed to have been remediated and LOOK what it did" variety. Which wouldn't have happened had the code not been returned to production.

Most of the stories of failures we've seen have concerned switchovers to whole new implementations. These failures happened because the switch was actually *made*, and Hershey et al. must sink or swim with the consequences.

(As a footnote, I believe a large part of the perception of increasing failures is an artifact of the media being much more alert to such stories and willing to publicize them, and of this forum carefully combing for any story of failure it can find. NOT ALL of the increase, mind you.)

BUT, you undermine your own case here, don't you? I have yet to see a single story about date bugs that have crippled any unremediated system. If these stories reflect a real increase in computer problems, that increase is due to new or remediated systems being returned to production. You can't have it both ways.

-- Flint (flintc@mindspring.com), December 10, 1999.



Mr. Flint's arguments are well constructed. Are they true? We will find out in three weeks. But since time is so short, who really cares? If they are wrong, no one will remember, because there will be far more critical worries. If they are right, they will soon be forgotten. For myself, I think they are wrong. A new game is about to begin, Mr. Flint. Think you will be invited to play?

-- Noone (Noone@none.com), December 10, 1999.

Sorry, their rant hurts my head. Gonna watch my own money.

-- Those Stupido Geek Folks (Hellsbells@disapperingmoney.com), December 10, 1999.

Noone:

Sure, why not? Come next year, we all get to play it where it lies, regardless of who expected what.

-- Flint (flintc@mindspring.com), December 10, 1999.


There are software glitches and then there are Y2K software glitches.

Flint, you have erroneously made them one and the same.

Flint, you have to add Y2K glitches to the "normal" everyday glitches. The "normal" everyday glitches will ALWAYS be there. Y2K glitches are like piling on. You can't seem to comprehend this.

The total number of glitches is increasing. At some point, the number of glitches will surpass the programmers' ability to keep up.

We shall soon see how "good" programmers really are. Programmers have only begun to fix glitches, compared to "Avalanche 2000" about to befall them...in twenty-one days.

-- GoldReal (GoldReal@aol.com), December 10, 1999.


Sheesh Flint, sure hope your predictions are on target!

-- Need Lots of Hot Water (Coldshowers@aren'tmuchfun.com), December 10, 1999.


Flint:

I have been reading your posts practically from day one. You continually use the PROCESS of remediation as if it were EVIDENCE that most Y2k projects have been successfully (or to use your term, "successfully enough") completed. Where is the EVIDENCE that this statement is true, other than your belief that (allow me to paraphrase your argument) "in order for a Y2k remediation project to have any chance of success, it must be put into production BEFORE the year 2000, and since we haven't observed any nationwide problems so far, there is little chance we will have significant problems during the new millennium"? Of course, this argument falls flat on its face if a significant percentage of businesses and other institutions choose to do nothing. The latest statistical evidence I've seen leads me to believe that this is exactly what's happening.

But even before debating this last point, you must convince forum participants that your "in production before 2000" contention holds water, and without reliable statistical evidence for support (does ANY such evidence really exist?), it is much more sensible to prepare for Y2k induced problems as best we can.

-- Dr. Roger Altman (rogaltman@aol.com), December 11, 1999.


Damnit, Flint...I read your post and thought "Finally there's another person on this forum who has been involved in remediation and has seen the same things I've seen. WHO could be stepping up to the plate at the 11th hour?"

You really got my hopes up...until I saw your name.

Thanks for posting this anyway. It would have carried more weight on this forum had it been posted by an anonymous programmer, but you can't reduce yourself to that any more than I can.

-- Anita (notgiving@anymore.com), December 11, 1999.


Folks:

I have learned to read all of Flint's posts before commenting. That way he doesn't threaten me with attack from his friends :-). His analysis (I have always assumed that Flint is a he; could be wrong) matches my own experience. Is my experience too limited to generalize from? SURE. Does his analysis answer the question; what will happen? Not for me. Does it deal with the problem of insertion of late arriving [but not needed] patches into the software stream? Don't think so. Has this discussion solved any of the issues that I deal with? Not really. Flint, thanks for your thoughts.

Best wishes

-- Z1X4Y7 (Z1X4Y7@aol.com), December 11, 1999.


"At the beginning of 1998, virtually every enterprise working on the year 2000 problem was committed to being ready by the end of 1998, allowing a full year for the remediated systems to be in live use before the century date change. The need to use the end of 1998 as a target was widely accepted. Government agencies and regulators all gave this as their key objective."

Year 2000 World Status, 2Q99: The Final Countdown (August 16, 1999)

-- Lane Core Jr. (elcore@sgi.net), December 11, 1999.


As I understood that "year of testing" mandate, it was driven by the need to test systems that do electronic data transfer. For example, to ensure that the banking system, which clearly exchanges a lot of data that needs to be extremely accurate (no tolerance for error with financial data), would be ready.

I don't know if anyone has literally assembled a list of Fortune 1000 companies that claimed they would be Y2K compliant by 12/31/1998 with all of 1999 for testing, but I do know that it is a significant number. It certainly was not something that Gary North made up out of thin air, as one should be able to verify by simply researching the applicable pages at his web site (www.garynorth.com).

20 days.

Y2K CANNOT BE FIXED!

-- Jack (jsprat@eld.~net), December 11, 1999.

Lane Core Jr: Thanks, dude. Glad someone has once again deflated Flint's blathering.

-- King of Spain (madrid@aol.cum), December 12, 1999.

And, for once, Flint appears to be wordless. You-da-man, Lane!

-- King of Spain (madrid@aol.cum), December 13, 1999.
