Tracking mean time between failures/fixes


Both Ed Yourdon and, I believe, Infomagic have noted the importance of the relationship between mean time between failures (MTBF) and mean time to repair (MTTR). Are any of the IT industry analysts tracking these numbers? It would seem logical for each enterprise to generate this data, but is anyone collecting and collating it (under the standard Y2K cloak of anonymity)? This relationship should give us a way to quantify the situation.
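
(For reference, the textbook reliability formula that ties these two numbers together is steady-state availability: Availability = MTBF / (MTBF + MTTR). With made-up numbers: a system that fails every 100 hours and takes 4 hours to fix is up about 96% of the time; stretch the repair time to 50 hours and availability falls to about 67%.)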

-- Pinkrock (aphotonboy@aol.com), August 09, 1999

Answers

Pinkrock,

There's a subtle nuance here: MTBF involves measurements of the product of a software development effort; a more common version of such a measurement is defects per function point, or defects per thousand lines of code.
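
To make that concrete, here is a minimal sketch (in Python) of the kind of product metric Ed describes; every figure in it is invented purely for illustration:

    # Product metrics: defect density per KLOC and per function point.
    # All project figures below are hypothetical.
    lines_of_code = 250_000    # total LOC in the delivered system
    function_points = 1_800    # estimated size in function points
    defects_found = 425        # defects logged against the release

    defects_per_kloc = defects_found / (lines_of_code / 1_000)
    defects_per_fp = defects_found / function_points

    print(f"Defects per KLOC:           {defects_per_kloc:.2f}")   # 1.70
    print(f"Defects per function point: {defects_per_fp:.3f}")     # 0.236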

The typical software development organization measures very little of anything -- and whatever they do measure is associated with the product. Thus, they measure how many lines of code they produced, how many programmers were required, how many person-months, and how many dollars it cost to produce all of it.

Meanwhile, mean-time-to-repair (MTTR) is more concerned with the process of software development, which VERY few organizations bother measuring. Your MTTR figures will be better if your design, coding, and testing processes are good ones; they'll be better if you have good testing tools, good configuration management processes, etc.
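
As a sketch of what measuring the process even looks like: assuming you log when each failure is reported and when its fix ships (the timestamps below are invented), MTTR is simply the average of the repair intervals:

    from datetime import datetime

    # Hypothetical repair log: (failure reported, fix deployed) pairs.
    repairs = [
        (datetime(1999, 8, 2, 9, 0),  datetime(1999, 8, 2, 17, 30)),
        (datetime(1999, 8, 4, 14, 0), datetime(1999, 8, 6, 10, 0)),
        (datetime(1999, 8, 7, 8, 15), datetime(1999, 8, 7, 11, 45)),
    ]

    hours = [(fixed - reported).total_seconds() / 3600
             for reported, fixed in repairs]
    mttr = sum(hours) / len(hours)
    print(f"MTTR: {mttr:.1f} hours")   # ~18.7 hours for this toy log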

A full-fledged metrics initiative, in which both product and process are measured carefully, is usually associated with SEI level-4 organizations. I haven't seen the figures from the SEI lately, but I believe that less than 10% of U.S. software organizations are at this level of sophistication.

Bottom line: with the exception of one or two organizations that I've worked with, the Y2K-related MTBF and MTTR figures are not available. And if they're not available, it means that organizations are simply guessing (or hoping) that as of Jan 1st, they won't be overwhelmed with a large quantity of bugs, and that they'll be able to fix "old" bugs faster than "new" bugs occur.
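
That race between new bugs and fixes can be put in back-of-the-envelope terms: if failures arrive roughly every MTBF hours and each one consumes MTTR hours of repair effort, the bug backlog is stable only while the repair rate (staff / MTTR) at least matches the arrival rate (1 / MTBF). A toy simulation, with every parameter invented, shows how it tips over:

    # Toy bug-backlog model; all parameters are invented for illustration.
    mtbf = 2.0    # hours between new failures (arrival rate = 0.5/hour)
    mttr = 5.0    # hours of effort to fix one failure
    staff = 2     # parallel maintenance programmers (repair rate = 0.4/hour)

    backlog = 0.0
    for hour in range(1, 241):   # simulate ten days
        backlog += 1.0 / mtbf                        # new bugs appear
        backlog = max(0.0, backlog - staff / mttr)   # old bugs get fixed
        if hour % 48 == 0:
            print(f"day {hour // 24}: backlog ~ {backlog:.0f} bugs")

With these made-up numbers the backlog grows by about 2.4 bugs a day and never drains; bump staff to 3 (repair rate 0.6/hour) and it stays near zero. That, in miniature, is the race the maintenance programmers will be running.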

At the very least, it will take a lot of heroic effort from the maintenance programmers; and if we're lucky, maybe 80-90% of the large organizations will be able to overwhelm the problem with brute force. As for the other 10-20%, not to mention the SMEs who are clueless about all of this ... well, rather than speculate, let's just wait another 144 days and see what happens.

Ed

-- Ed Yourdon (HumptyDumptyY2K@yourdon.com), August 09, 1999.


Thanks, Ed. I thought it sounded too simple. In one of my previous lives, I was responsible for maintenance of a heavy-equipment fleet at a mine. We tracked time between services, time between failures, time to repair, productivity, and so on. We treated those machines as if our lives depended on them, because they DID. Now our lives depend on a bunch of machines which have not been maintained or serviced properly, and it is UNKNOWN what will happen. I, like you, now have my own small assortment of machines on which I will stake my life (and my small children's lives). The STAKES will guide my diligence in maintaining these systems. Thanks for all your hard work, Ed. And the rest of you too; you know who you are.

-- Pinkrock (aphotonboy@aol.com), August 09, 1999.

There is another subtle nuance in Ed's posting that explains why so few organizations collect these kinds of numbers, let alone manage to them.

Ed mentions that such metrics are an indicator of an SEI level-4 organization, and that few companies have achieved level 4 (or, rarer still, level 5) on the SEI scale. The implication is that most of the remaining organizations not only don't collect these metrics but most likely don't know how to collect them in an accurate and useful manner.

I've always wondered which is worse: no metrics or the wrong metrics ("wrong" meaning either measurements that have no bearing on work quality or productivity, or measurements that are so poorly collected that they give erroneous results). Frankly, I've always suspected that wrong metrics are probably more damaging than no metrics. Ed, off the top of your head, have you seen anything from any of the metrics gurus that addresses that question?

-- Paul Neuhardt (neuhardt@ultranet.com), August 09, 1999.

