This is sort of an availability/SL management question. We are running into a wall identifying a suitable metric that helps us and our business unit stakeholders determine SL compliance.

Problem Statement: Currently, uptime reports do not accurately reflect the true customer impact, and there is concern that some of these numbers might mislead management.

Example: We reported 99.95% uptime during one month; however, there were probably 10 events that had varying impact on customers. Some outages may have impacted only a small fraction of users, who were still able to log right back into the system and continue their work. One such outage lasted 20 minutes and was centered on a single web server in a configuration that involves 6 others.

Question: What is the industry norm for measuring total uptime of an e-commerce system? NOTE: One idea we are tinkering with is the notion of customer impact minutes. This treats each minute as a failure opportunity, and the total customer down minutes are factored in to get a percentage of "success".
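
As a rough illustration of the customer impact minutes idea, here is a minimal sketch. It rests on several assumptions that are not in the post: a 30-day month, load spread evenly across the 7 web servers (so the 20-minute single-server outage touches about 1/7 of customers), and a roughly constant customer population. The names `Incident` and `customer_impact_availability` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    duration_minutes: float
    fraction_of_customers_affected: float  # 0.0 to 1.0, an estimate


def customer_impact_availability(incidents, period_minutes):
    """Fraction of customer-minutes in the period that were not impacted."""
    if period_minutes <= 0:
        raise ValueError("period_minutes must be positive")
    # Weight each incident's duration by the share of customers it touched.
    impacted = sum(
        i.duration_minutes * i.fraction_of_customers_affected for i in incidents
    )
    return 1.0 - impacted / period_minutes


MINUTES_IN_30_DAYS = 30 * 24 * 60  # assumed reporting period

# Assumed: one of seven evenly loaded servers was down for 20 minutes.
incidents = [Incident(duration_minutes=20, fraction_of_customers_affected=1 / 7)]

weighted = customer_impact_availability(incidents, MINUTES_IN_30_DAYS)
# Plain time-based uptime counts the same 20 minutes as a full outage.
unweighted = 1.0 - 20 / MINUTES_IN_30_DAYS

print(f"customer-weighted: {weighted:.4%}")
print(f"time-based:        {unweighted:.4%}")
```

The gap between the two figures is the point: the time-based number treats a partial outage as total, while the weighted number scales it by how many customers were actually affected.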

Regards, SteveR SLM Analyst

Steve Rodriguez


Steve, my initial impression is that you may be making this harder than it needs to be. Some basic statistical methods using synthetic transactions ought to give you a reasonable estimate of overall availability, taking into account partial outages, etc. The bottom line is whether or not there was an impact on a user.
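
One possible way to turn synthetic transaction results into an availability estimate is sketched below. It assumes each probe is recorded simply as success or failure and that probes are spread across servers and time, so partial outages show up as a proportionate share of failed probes. The Wilson score interval is one common choice of "basic statistical method", not necessarily the one intended here, and the function name and sample counts are hypothetical.

```python
import math


def availability_estimate(successes, total, z=1.96):
    """Return (point estimate, lower bound, upper bound) for availability.

    Uses the Wilson score interval; z=1.96 gives roughly 95% confidence.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= successes <= total:
        raise ValueError("successes must be between 0 and total")
    p = successes / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half_width = (
        z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    )
    return p, max(0.0, center - half_width), min(1.0, center + half_width)


# Hypothetical month of probes: 10,000 synthetic transactions, 10 failures.
estimate, low, high = availability_estimate(successes=9990, total=10000)
print(f"availability ~ {estimate:.3%} (95% interval {low:.3%} to {high:.3%})")
```

Reporting the interval alongside the point estimate helps show how much of the figure is sampling noise, which matters when the target and the measurement differ only in the second decimal place.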

Rick Sturm President Enterprise Management Associates

Rick Sturm

