Complexity, Testing, and Y2K: Why Some Engineers Still Don't Get It


A quote from Yardeni's recent report on Y2K prompted me to write this. In particular, it was a quote concerning the thoroughness of the FAA's analysis and software testing:

Mr. Willemssen [GAO] was worried that the FAA's analysis "may not have found all date processing code in the Ultra assembly language programs that run in the UNIVAC processor."

It occurred to me that the brief "testing" that is being done (an hour here with an approach radar, an hour there with a Boeing 737) is probably not substantial enough to uncover any defects.
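
To make that concrete, here is a small made-up sketch - in Python, obviously not the Ultra assembly in question - of the kind of two-digit-year defect that an hour of spot checking is unlikely to exercise, because it only shows up when the data straddles the century boundary:

# Hypothetical illustration; not anyone's actual code.
def days_overdue(filed_yy, current_yy):
    # Naive two-digit-year arithmetic: "years late" times 365.
    return (current_yy - filed_yy) * 365

# A brief test run in 1999 looks fine:
assert days_overdue(98, 99) == 365

# The rollover case silently goes negative - a defect a quick spot check
# will almost never trip over:
print(days_overdue(99, 0))   # -36135, not the expected 365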

Another telling quote was posted today by our own Flint, where he said (in reference to the banks) "Don't worry. Everything will be tested. Thoroughly."

In the software development world, it is common knowledge that the first casualty of shrinking budgets and diminishing schedules, after Training and Documentation, is Testing. In my profession, I have seen the complexity of systems (function points, lines of code, number of interfaces, etc.) rise exponentially, without a corresponding increase in testing. The result has been buggier code, more severe slippage, and higher costs (sound familiar, Mr. Gates?). Although I started warning my customers and managers about this phenomenon about five years ago, I was always greeted with disbelief, ridicule, and laughter. They aren't laughing anymore, but they still haven't grasped the concept.

                                          O    
C                                      O       
O                                    O         
M                                  O           
P                                 O            
L                                O |           
E                               O  |           
X                              O   |           
I                             O    |           
T                           O      |           
Y                         O        |           
                       O           |           
                    O              |           
                O                  |           
            O                      |           
       O                           |           
  O                                |           
 
1990             1995             2000
Figure 1. System complexity over time, projected without the Y2K factor

Figure 1 depicts the way I expected this effect to manifest itself before I understood the systemic nature of Y2K. I expected a continued rise in complexity (with a continued decline in testing) until an Omega Point was reached and the curve would start to plateau. This corresponds to the cusp in the graph around 2000.

                                               
C                                              
O                                              
M                                  O O         
P                                 O    O       
L                                O |      O    
E                               O  |         O 
X                              O   |           
I                             O    |           
T                           O      |           
Y                         O        |           
                       O           |           
                    O              |           
                O                  |           
            O                      |           
       O                           |           
  O                                |           
 
1990             1995             2000
Figure 2. System complexity over time, projected with the Y2K factor

Figure 2 is my revised projected complexity graph, which takes the Y2K effect into consideration. My point in relating all this is that most people (and I'm talking about engineers here at work) could not see the effect I described in Figure 1 coming, and are still not sure what it is they are experiencing. So the fact that there are engineers who do not understand the consequences depicted in the second graph is not so surprising.



-- a (a@a.a), April 14, 1999

Answers

[snip]

Principle 120
Use the McCabe Complexity Measure
Although many metrics are available to report the inherent complexity of software, none is as intuitive and easy to use as Tom McCabe's cyclomatic number measure of testing complexity. Although not absolutely foolproof, it results in fairly consistent predictions of testing difficulty. Simply draw a graph of your program, in which nodes correspond to sequences of instructions and arcs correspond to nonsequential flow of control. McCabe's metric is simply e - n + 2p, where e is the number of arcs, n is the number of nodes, and p is the number of independent graphs you are examining (usually 1). This metric can also be "calculated" by the cookie-cutter analogy: imagine pressing a cookie cutter shaped like the program graph into rolled-out dough. The number of cookies produced (the number of regions in the graph) is the same as e - n + 2p. This is so simple that there is really no excuse not to use it.

Use McCabe on each module to help assess unit testing complexity. Also, use it at the integration testing level where each procedure is a node and each invocation path is an arc to help assess integration testing complexity.
Ref: McCabe, T., "A Complexity Measure," IEEE Transactions on Software Engineering, 2, 12 (December 1976), pp. 308-320. [/snip]

From Davis, Alan M., 201 Principles of Software Development.
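
For anyone who wants to try the principle on their own code, here is a minimal sketch (mine, not Davis's or McCabe's) that computes e - n + 2p in Python from an edge list of a control-flow graph; the example graph is a made-up routine with one if/else and one loop:

# Minimal sketch of McCabe's cyclomatic number v(G) = e - n + 2p.
def cyclomatic_complexity(edges, p=1):
    # e = number of arcs, n = number of nodes, p = connected components.
    nodes = {node for edge in edges for node in edge}
    return len(edges) - len(nodes) + 2 * p

# Made-up control-flow graph: an if/else followed by a while loop.
flow_graph = [
    ("entry", "if"), ("if", "then"), ("if", "else"),
    ("then", "while"), ("else", "while"),
    ("while", "body"), ("body", "while"),   # loop back-edge
    ("while", "exit"),
]

print(cyclomatic_complexity(flow_graph))   # 8 - 7 + 2 = 3 (two decisions + 1)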



-- Doc & Test (retrofit@critt.com), April 14, 1999.

So simple, eh?

-- Doc & Test (oivey@critt.com), April 14, 1999.

a. Yes, exactly. Why did you expect the curve to plateau originally (limits on complexity related to the ability to ship in light of inadequate testing, or aspects related to the inherent nature of complex systems, or both)? Nice that we'll have, uh, less complex systems post-Y2K.

With respect to Flint, I found his remarks about banking (believe it or not, it's not his intelligence I've ever questioned!) weird, since he has often agreed with me that testing is the one thing that will unquestionably be shorted this year. Plus, he's planning to take out most/all of his own bucks from banks. That didn't seem to add up. Flint, what say ye?

As a software guy, I have found the recent threads on hardware intriguing since the nature of testing seems to be rather discontinuous with the kind I'm familiar with. a., do you agree?

By the way, regulars, I think a's software experience has been too little noticed on threads where he comments. This guy knows his stuff.

-- BigDog (BigDog@duffer.com), April 14, 1999.


OK, a. defends Slick but still ......

-- BigDog (BigDog@duffer.com), April 14, 1999.

Big Dog: Thanks for your support. To answer your questions:

Both.

Flint never ceases to amaze me (that's one of the main reasons I'm addicted to this place: I find it immensely entertaining). But I don't fault him for his beliefs and perceptions. I am starting to see some really strange things in the workplace as the predicament we're in evolves.

I think that up until now, software testing has been harder than hardware testing, because software is a virtual device and hardware is a physical one. But, the embedded issue changes that. I think testing is turning out to be a real bitch on systems that are already installed and in place, and hence the happy face "we'll fix on failure" mentality for the hardware grunts (sorry, Flint).

As I've said before, I am technically more a fit for c.s.y2k, but I chose to call this place home for several reasons. After some early correspondence with Cory Hamasaki and Paul Milne last year, I decided they had the USENET scene covered pretty well.

BTW...as soon as someone worth electing comes along, I'll ditch Clinton pronto. But it always seems the election thing is rigged, or a choice of the lesser of two evils. I did the Perot thing for a while, got egg on my face. Thought about Libertarianism, but their party is not electable, etc. etc.

-- a (a@a.a), April 14, 1999.



Yes, this is OT but what-the-hay, it's the YOURDON forum, right? I flirted with Perot in 1992 until he went gonzo. Moreover, though I despise Slick, I have found the behavior of congressional republicans entirely pusanianimous or however-you-spell-it APART from anything having to do with Clinton-this or that. Sigh.

-- BigDog (BigDog@duffer.com), April 14, 1999.

'a' and Big Dog:

I meant exactly what I said, but you couldn't see the expression on my face. Everything will be thoroughly tested. No question about it. No question that this testing mostly *won't* happen before rollover. But I didn't say it would happen before rollover, and I was very careful not to say that. The code will be tested 'live'. Thoroughly.

-- Flint (flintc@mindspring.com), April 14, 1999.


Geez guys!

Give Flint some credit, guys....his original post indicating that the code would be tested thoroughly was, IMO, SARCASM! He means that it will be "tested" when it is already in PRODUCTION.... AFTER Year 2000....and to my dismay, he is probably quite right...

-- seagreen (seagreen@seagreen.com), April 14, 1999.


" The code will be tested 'live'. Thoroughly."

At last. Something I understand, and can believe.

Once you realize this, you can't help thinking about gathering supplies of food and water and (in cold climates) finding alternate means of not freezing. Which leads to thoughts such as, "how long is the testing phase going to last?"

From there on, you're on your own. Except for this forum.

-- Tom Carey (tomcarey@mindspring.com), April 14, 1999.


folks, give flint a break. he was obviously being sarcasted when he said the code would be tested. he meant post-1/1/00, after the rollover. i burst out laughing when i first read it, 'cause he's right. we - the entire world - are about to become the biggest focus group in history.

[for those of you who are blessedly ignorant, a focus group is a group of people used to test a new product or service. they are exposed to the product in its rough form - like a software beta test. for instance, a closed movie preview is a focus group. and based on their reactions, producers will make changes in the film.]

-- Drew Parkhill/CBN News (y2k@cbn.org), April 15, 1999.



oops, i meant "sarcastic." duh.

-- Drew Parkhill/CBN News (y2k@cbn.org), April 15, 1999.

Drew --- Thanks, that makes sense, at least until/if Flint clarifies. My whole point was it seemed out of character with other things he has said. I do think, though, software and hardware guys have, in general, a different take on testing .....

-- BigDog (BigDog@duffer.com), April 15, 1999.

The following might be a bit "lower level" than the "real" software experts in this forum need - they are already pros in what they do and in the specific systems they test - but it is extracted from a project report intended for a more general audience, to begin to get that audience acquainted with certain parts of testing.

...........

I've used the analogy of physically testing a structure before - like the Tacoma Narrows Bridge, which collapsed under relatively light winds with only one car on it. There is, of course, a difference in intent and method between a physical test on a physical system, a computer test sequence on hardware, and a computer test sequence on software. This suspension bridge failed tragically because it was subjected to more stress than its members could withstand. And any physical structure will fail under excess stress. The entire point of training engineers - and certifying them professionally - is to allow the public to go into a building, or drive on a highway, or cross under a railroad bridge, with a degree of confidence that they will not die - that the structure will not fall down while they are inside.

The first kind of engineering test - a physical test of a physical structure, bridge pier, or pipe, for example - cannot be a test to destruction, or else you would destroy the very products you are trying to verify. Instead, you can only test to the specific design limit you want to verify, and do that test under controlled conditions. Once tested, you post limits to keep your design conditions from being exceeded. A speed limit may be imposed. A bridge may be posted to limit the size of trucks that can cross. Other conditions are implied through logic - a train engine is heavier than a truck, but trains are not expected to cross truck bridges. So when a truck carries an extra-heavy load (like the diesel engine for a train), the bridges it crosses are specially surveyed to make sure they will hold the load.

Other physical tests are one-of-a-kind. A brand-new steel pipe, for example, will be hydrostatically tested to 150% of its design pressure to verify there are no leaks and no bad spots in the welds, valves, and fittings. Once "hydro'ed," you don't need to hydro it again to check for steel defects (under most circumstances), and so you only need to raise pressure to 100% to verify that a repaired valve doesn't leak at the bolted joints. In this case, you can never really "claim" what pressure is needed to break the pipe; at best you can say, "Well - it didn't break at 150% rated pressure, so don't let it get over 115% rated pressure in the future. To be safe, we will set a relief valve at 115% rated pressure."
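
To put rough numbers on that (the rated pressure below is made up for illustration; only the percentages come from the description above):

# Illustration only - assumed rated pressure, percentages as described above.
rated_psi = 1000.0                   # assumed design (rated) pressure
hydro_test_psi = 1.50 * rated_psi    # one-time hydro test at 150% -> 1500 psi
relief_psi = 1.15 * rated_psi        # relief valve setpoint at 115% -> 1150 psi
print(hydro_test_psi, relief_psi)    # 1500.0 1150.0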

Electrical testing is fundamentally different. A new motor will be run at regular voltage, never at 150% rated voltage - and it will not be run at 150% rated load, nor at excess speeds.

The "test" for a motor is to check whether it runs at all, whether it is installed and hooked up to the right controller, and whether it rotated the correct direction if it runs at all. Some times a "burn-in" test is used as well to check for bearing wear and vibration; if so, the test manager can only certify - "This motor behaved correctly in this test - it probably wil keep on running successfully if the same voltage, frequency, and amps are used the next time."

Electrical components and actuators or sensors usually keep running steadily. Unlike pipes, they aren't usually subject to gradual failure or deterioration, and aren't subject to gasket leaks and corrosion. So routine operation after shutdown or repairs is usually a good assumption if the same quality input power is available, and if that power is properly connected. Thus, if the motor controller is fixed, then only a short running test is needed - to be sure the wires got hooked up correctly again.

Even a motor is relatively easy to test. Bearings and insulation resistance may break down over time in a motor, and so may need to be checked. But the actual rotor, windings, and motor casing will likely last longer than the pump it is attached to.

Likewise, if you test electronic hardware (and implicitly the embedded hardware associated with control processors, process-internal computers, or embedded chips), you can't "stress test" it like the pipe that is "hydro'ed" to 150% rated pressure. If you do, you only risk burning out the components, shortening their lives, and getting the warranty voided.

So hardware test engineers have to be careful about what is tested - and this introduces plenty of opportunity to make errors: to miss things that should have been tested but were skipped, to double-test other things, and to fail to test for certain conditions - particularly those that are rare, hidden, or associated with intermittent (start-up/shutdown) operation. Therefore, even the best hardware testing can logically be expected to miss some things.

In any case (as I was recently and correctly reminded), you really can cause additional failures in electrical and electronic components by testing improperly. But you have to test all expected conditions - so what is "too much" testing? Hard to tell sometimes - especially when you combine electronics with controllers with software with hardware in a complex system. All-up or end-to-end system testing cannot really be simulated under all possible conditions - there are too many unknowns. Single-component testing - checking just one thing - won't determine whether an error condition caused in one place can cause an unexpected problem elsewhere. Isolated testing - where the component is isolated and its controller(s) and software are tested - can let a person know whether the single component responds correctly to the input conditions, but again, only the responses that have been tested are known to work under the specific conditions under which they were tested.
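
A hypothetical sketch of that last limitation (made-up names and values, in Python for brevity): a component passes its isolated test against a stub, because the stub only produces the values the tester thought of, and then misbehaves on an input the real device can actually generate:

# Hypothetical illustration of isolated testing's blind spot.
def valve_position(sensor_reading):
    # Map a 0-100 sensor reading to a valve position; no fault handling.
    return min(100, sensor_reading)

def stub_sensor():
    return 42                  # the isolated test only feeds "nice" values

assert valve_position(stub_sensor()) == 42   # isolated test passes

def real_sensor():
    return -1                  # the real device can also signal a fault code

print(valve_position(real_sensor()))   # -1: an untested input commands an
                                       # impossible valve position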

And even then, you have manipulated the results by isolating the system and its input conditions from the "real" world.

Software testing - that is completely different.

There - the entire goal becomes to "break" the software - to find as many "bugs" as possible. The unknown dangers become not the physical destruction of a pipe or piece of steel through excessive stress caused by testing, but the danger of NOT testing some particular combination of inputs and decisions that will result in a failure in the output. That is, the danger involved in software testing becomes UNDERstressing the software and thus allowing failures to remain in the code. These failures, of course, will remain in the code until found later.

Some failures will likely never be found - there are too many "decision points" in any software package of even small complexity to run every combination of possible events. The result of these "hidden" flaws in existing software, and of the impossibility of finding and removing every flaw under every condition, is that virtually any software change may "create" new failure modes in many other locations of the program.
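
A quick back-of-the-envelope shows why every combination can't be run: with k independent two-way decision points there are on the order of 2**k distinct paths (the snippet below is just that arithmetic, nothing more):

# Path counts for k independent two-way decision points.
for k in (10, 20, 40, 64):
    print(k, "decisions ->", 2 ** k, "possible paths")
# 10 -> 1,024   20 -> 1,048,576   40 -> ~1.1e12   64 -> ~1.8e19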

The second danger of software testing lies in the ability of most software to "run" most of the time and "succeed" most of the time under "most" of the input conditions. The result is "flawed" software that is released and put into service. Outright "failures" are relatively rarely purchased and installed, but "flawed" software - as all here have pointed out - is almost universal in commercially released packages. Flawed software is equally common (perhaps more common) in customized packages installed in only limited areas. Worse, failures (bad data or incorrect outputs) may come from flawed programs, and such failures may not be recognized as such by the people reviewing the outputs - particularly if the output is hidden in vast data streams or arrives as "invisible" commands to remote units at sites not under the immediate control of a human observer.

What is unknown - now - is the cumulative effect of a large number of simultaneous failures and flaws occurring in several different programs at the same time in many different areas. That "unknown" is what we may find out in a few months.

Third - the results of software testing are very "subjective." Not in the sense that the results themselves are subjective - it can sometimes be very clear when a mistake shows up in the output. Rather, software testing is subjective in that the original programmer has a vested interest in verifying that his product "works" - he has an emotional and economic reason to be able to tell his or her boss, "I tested it and it works." Obviously, the faster and more often he can deliver "tested software," the higher his salary and the better his standing with his boss.

The temptation (in software) to shortchange testing is very, very strong. Testing slows the release date. It may find new problems that need more time to fix; the new fixes need testing of their own and may themselves create new errors. It is expensive - the customer wants to pay for new software that works, with all the new features he ordered; the customer does not want software endlessly stuck in "testing" and "debugging" cycles. Most software testing must (by its nature) occur after programming is completed. If/when it finds an error, the result is to require the programmer to stop his latest job and return to a section that was previously "finished." Just the time to stop something, start something new, finish it (again), wait for the second round of test results ... is expensive.

The result of these "cultural disincentives" is generally poor-quality software that is released with flaws and undiscovered errors.

-- Robert A Cook, PE (Kennesaw, GA) (Cook.R@csaatl.com), April 15, 1999.


Thanks, Robert. This is trite, of course, but as a software manager (and I'd like to think I was a good one), a key was realizing that great programmers and great testers "hate" each other and SHOULD be recruited and selected with that "trait" in view. Programmers view testers as "those who can't" and testers view programmers as ego-driven babies. Genius testers are as critical as genius programmers, however.

-- BigDog (BigDog@duffer.com), April 15, 1999.

big dog,

my general observation is that you're right, the software & hardware folk do seem to be coming from different universes (sliders? :)

-- Drew Parkhill/CBN News (y2k@cbn.org), April 15, 1999.



I've never had the privilege of working in the kinds of settings you folks seem to routinely inhabit.

So, I wonder about the expertise, competence and reliability of quality assurance within smaller companies (1-5Mil annual revenue) who build and deploy mission-critical systems. That is, systems like those interconnecting to make up your county's E911 system.

~C~

-- Critt Jarvis (middleground@critt.com), April 15, 1999.

It's common sense, Critt, as you'd expect. That is, mileage varies. Some small orgs are superb and/or have folks in them with dedication and smarts. Others lack rudimentary competence. Others fall between. Sorry, not very profound, I know!

-- BigDog (BigDog@duffer.com), April 15, 1999.
