Why is testing required?


I've got a rather simple question: Why is testing of a system or program's Y2K remediation necessary? The time needed for this has been estimated at 40-70% of the whole project.

Does the testing process typically involve simulating the 2000 rollover to discover bugs, fixing each offending bug as it is found, and then repeating the simulation until all obvious bugs are repaired? Is that how it goes? If so, if that's the basic way to do the testing, and if people are saying that 40-70% of the Y2K project is testing, then doesn't it follow that an untested, or partially tested, system will *crash* or be *useless* precisely to the extent to which it has not been adequately tested? Is that a good inference?

If so, I've just gotten a lot more scared. Please help me out here!

-- Larry Sanger (Sanger.3@osu.edu), May 16, 1998

Answers

Larry,

Here are my views based on programming a wide variety of computers and using at least 15 programming languages over the last 35 years. Anytime you make a change to a program, you risk introducing an error, simply because most programmers are not capable of thinking of all the ramifications of what they have done.

If you change a program to use 4-digit year dates instead of 2-digit year dates, you also then have to change the format of the data, which may be an extensive database. This introduces a whole additional series of opportunities for making errors, because you will have to create some new programs that read the old format and change it into the new format. You may have to change the formats of input forms. You will change the format of printed reports, and the data may no longer fit the way it used to. You will have to change the methods by which you control the reusability of data storage tapes. You may have to write programs to generate testing data for your revised system.

You will have to think of all the different situations in which you may run programs - for example, beginning of the year, end of the year, beginning of the month, middle of the month, end of the month, months of 28 days, months of 30 days, months of 31 days, leap years, non-leap years. And for real-time programs such as air traffic control, the difficulties of testing would be geometrically more challenging. And even after a system has been in use for years, situations arise in which a bug is discovered when some particular set of conditions arises for the first time.
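
A minimal sketch (hypothetical, not part of the original post) of the kind of two-digit-year comparison error being described here, together with a sample of the boundary dates a test data generator would have to cover:

def is_expired_2digit(expiry_yy, current_yy):
    """Old-style comparison on two-digit years: 00 < 99, so something
    expiring in 2000 looks as if it expired long before 1999."""
    return expiry_yy < current_yy

def is_expired_4digit(expiry_yyyy, current_yyyy):
    """Remediated comparison on four-digit years."""
    return expiry_yyyy < current_yyyy

# The old logic calls a year-2000 expiry "already past" in 1999 -- wrong.
assert is_expired_2digit(0, 99) is True
assert is_expired_4digit(2000, 1999) is False

# A sample of the boundary dates the generated test data would need:
BOUNDARY_DATES = [
    (1999, 12, 31),  # last day before the rollover
    (2000, 1, 1),    # the rollover itself
    (2000, 2, 29),   # 2000 is a leap year (divisible by 400)
    (2000, 12, 31),  # end of the first post-rollover year
    (2001, 2, 28),   # a non-leap February
]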

The testing scenario will also be made more challenging if the program being tested is one of those undocumented or poorly documented programs for which there is nobody around who remembers what the program was really supposed to do. In that case, how would you be able to think of all the test data you should generate and run?


-- Dan Hunt (dhunt@hostscorp.com), May 16, 1998.


Larry - From 20 years experience with a major computer-driven phased array radar system:

1. Programmers make errors. Testing, on a stand-alone basis, helps uncover these errors.

2. Most systems involve more than one computer. The computers talk to each other. Interfaces are involved. Errors may be involved in the handling of the interface data. The only way to find this is by testing the combination of computers after the individual elements have been tested (see the sketch below).

3. Elements may or may not handle data and requests from an outside source in the same manner as an internally generated request. Testing is needed to check out all possible variations.
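
A minimal sketch, with hypothetical component names, of the interface problem in point 2: each side can pass its stand-alone tests and still disagree about the date format it exchanges with the other.

from datetime import date

def sender_encode(d):
    """Component A serializes dates as YYMMDD (pre-remediation format)."""
    return d.strftime("%y%m%d")

def receiver_decode(raw):
    """Component B expects YYYYMMDD after its own remediation."""
    return date(int(raw[0:4]), int(raw[4:6]), int(raw[6:8]))

# Each side looks fine on its own; only an integration test across the
# interface reveals that B misreads A's messages.
msg = sender_encode(date(2000, 1, 1))   # "000101"
try:
    receiver_decode(msg)
except (ValueError, IndexError) as exc:
    print("interface mismatch:", exc)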

As a result, the work up to final testing of a system of multiple computers involves days, weeks, and months of carefully orchestrated testing. We used to begin with simulators, but rapidly moved to the real world. Each computer element would be tested stand-alone, and each in turn would be tested with each mating element, and then the whole ball of wax would be tested in the final configuration.

This, by the way, was for a relatively small overall system in terms of the number of individual computers and total lines of code, and involved only modest amounts of change to the working system that we started with. Multiply the difficulties by a factor of 100 or 1000 to get to business systems that must rely on interfacing with outside terminals.

It's a mess. Hope this helps.

-- DeAlton Lewis (0mb00782@mail.wvnet.edu), May 17, 1998.


Let me try to re-ask my question. I don't think I was clear enough the first time.

Basically, I'm looking for an explanation of the fact (assuming it is a fact) that the testing process requires 40-70% of all time devoted to Y2K problem-solving.

Is the reason (1) that it is only after spending 40-70% of the time that a system no longer experiences errors that would essentially render it useless? Or is it (2) that it takes this long simply to (somehow or other) *review* a system or program to check it for bugs?

(1) is essentially a debugging process. (2) is, rather, a bug-*checking* process. If (1) were the explanation for that percentage estimate, then a system whose testing was not complete would not be functional (to the extent to which its testing/debugging was incomplete). If (2) were instead the explanation, then it might (miraculously) be the case that a system could be tested and no bugs (that need repairing) would be found; and that same system, were it not tested at all, would experience no bugs (that need repairing).

I hope that's a little clearer. Do you see why I think this is an important question? If it's (1), then we can practically assume that those businesses which say they're halfway done with their testing on December 31, 1999 will definitely be resorting to contingency plans on January 4, 2000. If (2), then we cannot assume this.

-- Larry Sanger (Sanger.3@osu.edu), May 17, 1998.


Geez, I think I still failed to be clear enough...

Insert the following paragraph in the appropriate place in my last post:

Is the reason that "testing" consumes 40-70% of the Y2K correction process that (1) it is only after performing Y2K simulations for *that* length of time, on average, that a system no longer experiences errors that essentially render the system useless? Or is the reason instead that (2) it takes this long simply to (somehow or other) *review* a system or program to check it for bugs?

-- Larry Sanger (Sanger.3@osu.edu), May 17, 1998.


The time spent testing is used to correct errors that are found. Begin with a simple test, test A. Run it. Find problems, pore over data, find errors, correct errors. Repeat test A, find more problems, repeat the cycle until A runs clean. Move to B, repeat.

Theoretically, by the time testing gets to the final test it should have exercised all possible branches in the code under all conditions. That's the key -- all possible branches, all conditions. It's easy to do with a simple program and one computer, and gets more difficult as the number of possible logic paths increases, and gets worse when you begin to assemble a system with multiple computers (each of which may perform a different function).
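
To make "all possible branches, all conditions" concrete, here is a hypothetical illustration (not from the original post): even a four-line leap-year routine has four paths, and the year 2000 exercises the one that pre-Y2K test data rarely touched.

def is_leap_year(year):
    if year % 400 == 0:      # branch 1: century divisible by 400 -> leap
        return True
    if year % 100 == 0:      # branch 2: other centuries -> not leap
        return False
    if year % 4 == 0:        # branch 3: ordinary leap year
        return True
    return False             # branch 4: ordinary non-leap year

# One test case per branch; an untested branch is exactly the kind of gap
# that surfaces later "when some set of conditions arises for the first time."
assert is_leap_year(2000) is True    # branch 1
assert is_leap_year(1900) is False   # branch 2
assert is_leap_year(1996) is True    # branch 3
assert is_leap_year(1999) is False   # branch 4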

As nearly as I can tell, this answer corresponds to (1) in your revised question: it takes this long in order to test to make certain all the problems have been resolved.

-- DeAlton Lewis (0mb00782@mail.wvnet.edu), May 22, 1998.



As to question #2, I believe it refers to what is called "desk-checking" of code, which is where you sit at your desk and look at your code. This usually has the result of putting the programmer to sleep, and is not generally done.

As to question #1, every time a bug is found and allegedly fixed, regression testing should be performed. Regression testing means going back and testing everything that passed before to make sure that the bug fix didn't break something else. This is time consuming, especially if there are a lot of things to test.
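
A minimal sketch of the regression-testing idea, using a hypothetical two-digit-year windowing routine as the code under test; the point is that the whole suite, old cases and new, is re-run after every fix.

import unittest

def expand_year(yy, pivot=30):
    """Windowing routine under test (an assumed remediation strategy):
    two-digit years below the pivot become 20xx, the rest become 19xx."""
    return 2000 + yy if yy < pivot else 1900 + yy

class RegressionSuite(unittest.TestCase):
    def test_old_behaviour_still_holds(self):
        # Cases that passed before the latest fix.
        self.assertEqual(expand_year(99), 1999)
        self.assertEqual(expand_year(75), 1975)

    def test_new_fix(self):
        # The behaviour the latest change was meant to add.
        self.assertEqual(expand_year(0), 2000)
        self.assertEqual(expand_year(29), 2029)

if __name__ == "__main__":
    unittest.main()   # re-run everything, old and new, after each change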

-- Amy Leone (aleone@amp.com), June 26, 1998.


How many times did you have to ask your question to get an answer?

How many times did you have to ask it to get it phrased so you were asking what you wanted to ask in the first place?

That's why testing is so hard...

Each trial requires a feedback loop: run test -> find problem(s) -> rewrite code -> rerun test -> some things are fixed, others are newly "broken," still others are discovered. Fix code. Repeat until shaken, not stirred. (Oops, got Bonded there.) Exit loop. Release that version. Find the second problem. Repeat loop.

-- Robert a. Cook, P.E. (cook.r@csaatl.com), August 31, 1998.


In addition to what was previously mentioned: just figuring out how you're going to test everything and setting up the test environments with all the necessary data can be a very big, error-prone task, requiring many iterations - especially when you're testing links between systems. And even if you didn't find a single bug (which is not going to happen), people will make lots of mistakes just running the tests. And all of the results have to be reviewed and validated, which takes longer than you might think.
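
As an illustration of that validation step (hypothetical, with placeholder file names): even a clean test run still has to be compared against a known-good baseline before anyone can sign it off.

import difflib
from pathlib import Path

def validate_run(baseline, actual):
    """Return the differences between a baseline report and the report
    produced by the remediated system (empty list = run validated)."""
    expected = Path(baseline).read_text().splitlines()
    produced = Path(actual).read_text().splitlines()
    return list(difflib.unified_diff(expected, produced,
                                     fromfile=str(baseline),
                                     tofile=str(actual),
                                     lineterm=""))

# Usage (paths are placeholders): any diff lines mean the run needs a
# human review before the test can be signed off.
# diffs = validate_run("baseline_report.txt", "run_20000101.txt")
# print("\n".join(diffs) or "run validated")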

-- Deborah Barr (debbarr@concentric.net), September 01, 1998.

Larry,

I think the generic 40-70% estimate range answers your question.

It's been my experience that, on the programmer's side alone, about 50% of his productive time goes into setting up, executing, and analyzing test scenarios.

If things proceed with few bugs discovered, then your #2 scenario is the case and, with luck, the time expended falls in the 40% area.

However, if, while testing, many bugs are uncovered which require additional repairs and further testing, the situation is more like your scenario #1, which could easily add 70% to the coding effort.

Naturally, the more experienced, better educated, more meticulous, more productive programmers come out more often on the 40% side than the 70% side. Now, guess how many programmers fall into this category?

-- Nathan Hale (nospam@all.org), September 07, 1998.

