FLINT: How do different curves of recovery look to you?

greenspun.com : LUSENET : TimeBomb 2000 (Y2000) : One Thread

Do you think you are often arguing with "doomers" over two different areas of y2k scenario envisioning? You are largely thinking of one; they of the other. I wonder if you could compare the two areas for us.
Here's a quote from you off another thread:
"Realistically, I can't imagine *any* large company or government organization ever reaching compliance, much less reaching it in time. The 'compliance' issue is born of binary thinking. Instead, what remediation and testing do is reduce an organization's exposure. This can't be reduced to zero, and no amount of compliance declarations or lack of such declarations is going to change this. In practice, the goal of remediation and testing is to contain the bugs within manageable limits. Fix on failure and contingency planning are intended to boost the containment capability.
"Don't be fooled by the Milne presumption that nothing lies between 100% compliant and 100% defunct. All of life falls between these extremes. Few of us scored 100 on every test in school, yet most of us graduated. And we continue through life the same way, doing our best and falling short of perfect but not so far short that we can't function at all.
"So 'compliant' is a very relative term, much like 'clean' or 'tall' or 'healthy'. And if we achieve our goal, and the bugs are manageable for the most part, we've come close enough. Yes, times might be hard for a while. But 'hard' is relative too.
-- Flint (flintc@mindspring.com), June 04, 1999. "
Obviously, the remediation efforts will have some effect on functioning within organizations. You are saying, of course, that their workaround and contingency plans will keep them mostly functioning if they have prioritized their mission critical efforts correctly.
I wish we could draw curves here, but I imagine that you picture a fairly robust "curve of recovery" for an organization hit by y2k errors internal to it. In other words, 98% compliant gets you 99% functionality. 95% gets you 97%. 90% gets you 95%, etc. (For the sake of picturing the curve, 50% gets you 70% functionality, etc.)
People or departments within an organization work together daily, know each other to some extent, have a command and decision structure, and have a specific mission that individuals can work toward with or without IT accompaniment. That would be the principal argument for a functionality level above the remediation success level.
I think most of us would picture this as a curve of recovery or functionality bending UPWARD, as a function of the thoroughness of remediation and testing.
The societal interplay of organizations may have a different curve of recovery, and I think most thoughtful TEOTWAWKI scenarios depend on the idea of that curve being not-so-robust as the organizations' internal curves.
Organizations compete, modulated by economic success in the marketplace. Organizations rely on JIT supply lines from other organizations. Organizations are vulnerable to failures of external "utilities" (power, banking, telecom) interrupting their ordinary business processes. Few workarounds exist any longer to replace these utilities. "Islanding" can cut off normal business interrelationships. There are many possible points of "breakage" in the robust functioning of society. Which losses would feed upon one another, to "snowball" into a lower societal level of functioning than even the average organization had dropped to?
For lack of certain knowledge, two principal scenarios compete on this forum and elsewhere: Major breakdowns of those supporting utilities (power, banking, telecom), and minor "local" interruptions.
(And I think we principally wonder whether there would be societal breakdown and major loss of life under the first, and how likely the event of a recession or depression under the second.)
But what I imagine we're almost all seeing is that the curve of recovery is less, or negatively sloped, for society as a whole, than it is for organizations. 95% of overall functioning by organizations -- DUE TO WHATEVER CAUSE (IT failure, banking failure, employee demoralization, etc.) -- gets you 90% societal functioning. 90% gets you 80%. And, for the sake of picturing the graph, 70% gets you 50%.
So, the individual organization's curve of functioning would look convex -- drops slowly (at first anyway, like going over a waterfall), and the societal curve concave (like a ski jump, drops fast, then bottoms). (I know I've just reversed the upward/downward images of the curves, but, without drawing it out, I'd imagine that "recovery" and "functionality" would be two different curves pictured -- sorry to be fuzzy about it, but I bet you'll get your own images pretty quickly.)
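For anyone who would rather compute the two curves than draw them, here is a minimal sketch. The sample points are the ones given above; the (100, 100) endpoints, the straight lines between points, and the function names are assumptions made only for illustration, not a model anyone in this thread has proposed.

```python
# Sample points quoted in the post. The (100, 100) endpoints and the
# straight-line interpolation between points are assumptions for illustration.
ORG_CURVE = [(50, 70), (90, 95), (95, 97), (98, 99), (100, 100)]
SOCIETY_CURVE = [(70, 50), (90, 80), (95, 90), (100, 100)]


def interpolate(points, x):
    """Piecewise-linear lookup; clamps to the end values outside the sample range."""
    pts = sorted(points)
    if x <= pts[0][0]:
        return float(pts[0][1])
    if x >= pts[-1][0]:
        return float(pts[-1][1])
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def org_functionality(compliance_pct):
    """Percent compliant -> percent functional, inside one organization."""
    return interpolate(ORG_CURVE, compliance_pct)


def societal_functioning(org_functioning_pct):
    """Average organizational functioning -> overall societal functioning."""
    return interpolate(SOCIETY_CURVE, org_functioning_pct)


if __name__ == "__main__":
    for compliance in (98, 95, 90, 50):
        org = org_functionality(compliance)
        soc = societal_functioning(org)
        print(f"{compliance}% compliant -> {org:.0f}% org -> {soc:.0f}% society")
```

Chaining the two lookups shows the point of the question: the organization's curve gives back more than was put in, and the societal curve takes it away again.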
I think that people who debate with you carry one picture in their minds, probably the more "doomly" of the two, while you carry the other. Any thoughts on those different imaginings, and their relative validities?
BTW, I think the biggest single factor in relieving the early TEOTWAWKI scenarios of last year is that electric utilities have not reported finding major embedded systems problems that threaten a grid shutdown.
I believe the biggest new negative is the mere occurrence of discussing bank runs and central banks' cash stockpiling in the context of y2k and fractional reserve banking, a confidence problem only needing y2k as a spark of doubt.

-- jor-el (jor-el@krypton.uni), June 05, 1999

Answers

I'm heading to Philadelphia for the weekend, but this could be a fascinating thread. Let's not flame Flint too much (just enough to keep his edge up) --- wish I could contribute, have lots of ideas. Anything we come up with will be very "soft" but that's the nature of the beast.
Flint --- where DO you stand, as of today, on the 1 to 10 scale? U.S.? World? If the proverbial gun was being held to your head? There is a usefulness to these scales and we ALL know they are soft and dynamic.
Back Monday....

-- BigDog (BigDog@duffer.com), June 05, 1999.

I assume you've seen Dick Mills's S-Curves which represent levels of robustness of a given system, but newbies should check them out at http://www.albany.net/~dmills/scurves.htm and of course the humorous ones are worth looking at too.
Where it all gets complicated is that every link in a supply system has a slightly different level of robustness, and substitute sources have their own level of fragility, so if you plan on supplier B, if A fails, and B's own supplier fails, things get interesting and are impossible to quantify.

-- Ken Seger (kenseger@earthlink.net), June 05, 1999.

This presents a better than usual opportunity for mental gymnastics, for those who have time for them (DGIs, DWGIs, those prepped for TEOTW). Those who don't have time (like children looking wistfully through the window when they should be doing homework) are advised to do something that will physically advance the decisions they have made about 2K.
The clock is ticking. Time will tell. Have you asked the right questions, and found the correct answers? There will be a test... .

-- Lee (lplapion@hotmail.com), June 05, 1999.

jor-el:
You raise good points. Let me start by saying I can't see the future. I can try to generalize from past events and experiences, recognizing that what we face isn't quite analogous to any of them.
Lane Core wrote recently of an exhaustive assessment/test in a large refinery. The number of noncompliances actually identified was relatively small (perhaps 60) although of course it took a lot of time and resources to determine this. Out of these 60, the number of noncompliances that would have at the very least reduced production significantly was 6. Of those 6, 2 would have shut it down.
Now, what would have been the real-life experience at this refinery if NO remediation had been done, nor any assessment or testing at all? Well, it would have shut down for the more proximate of the two reasons. Since I don't know what those reasons were, I can't guess at the possibility/difficulty/speed of any workaround. Nor can I know whether an unexpected shutdown would have caused collateral damage, or to what extent.
But a couple of things stand out here: To return to full production, six problems would need to be addressed, and the rest could wait or be ignored. It seems entirely possible that the losses experienced by an untouched refinery would be bearable. Also, there is a commonality of systems among refineries (quite different from site-customized mainframe code). This point is often missed. Chances are very good that many, perhaps most, refineries use effectively the same two failing systems, in the same way. Find the problem and fix for those two, and this is communicated widely by the vendor.
Where I work, most of our equipment is basically off-the-shelf, with no one-of-a-kinds. Some of those big machines would have failed, but we were notified which ones and what to do to prevent it. Someone else had done the buttwork, of course. We were the beneficiaries. And this has been happening throughout many industries, piece by piece. A lot of transference goes on among industries like manufacturing, utilities, refining, etc. This has a bit of a snowball effect on remediation in these areas.
So we have two issues: how quickly can physical plants be brought up to speed based on what we've learned so far, and how quickly can physical plants recover, based on error rates and magnitudes we've learned so far. This end of things doesn't look so bad. So when I read stories saying "When we looked, we found some big problems", unlike others here, I find this positive. It means someone looked, it means someone learned, it means many others who didn't look won't have to.
To me, reports of testing are gratifying indeed. I know from my own experience that a little testing goes a long way. The killer bugs show up very quickly, and hopefully can be repaired quickly (yes, there are exceptions. If you missed a field in a file and have to recompile every program that uses that file, then oops). Also, most tests, even simple ones, require a great deal of preparation -- test data properly aged, test environments properly set up, etc. So when I read about any IT testing, I assume that in most cases all the preliminary stuff has been done. From there, getting from 70% compliant to 99.9% compliant can go relatively quickly.
I'm not talking here so much about demo-tests and PR events, but about the day-to-day remediation and testing process and progress going on all the time. It isn't glamorous, nobody writes news stories about it, it generates no public announcements for us to debate over, and it's absolutely critical. Recognize also that there is a kind of bug half-life in testing. Find 100 the first week, 50 the second week, 25 the third week, 12 the fourth week, etc. Eventually, the bugs you're finding are both infrequent and minor (or only happen in obscure or degenerate logic paths). But you *never* reach a point where you stop finding bugs altogether. So at what point should you declare compliance? (And a footnote about declaring compliance -- it's become a lose-lose proposition. If you don't do it, the loonies will jump up and down about the lack of declarations. If you *do* declare compliance, then it's self-reported and doesn't count anyway.)
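The half-life picture can be put into a few lines. This is only a sketch: it assumes a clean one-week half-life starting from 100 finds, which real test cycles will not follow so neatly, and the function names and the one-bug-per-week threshold are invented for the example.

```python
def expected_finds(week, first_week=100.0, half_life_weeks=1.0):
    """Expected bugs found in a given week (week 1 is the first week of testing).

    Assumes finds halve every `half_life_weeks`; the rate shrinks toward zero
    but never actually reaches it.
    """
    return first_week * 0.5 ** ((week - 1) / half_life_weeks)


def first_week_below(threshold, first_week=100.0, half_life_weeks=1.0):
    """First week in which expected finds drop below `threshold` bugs."""
    if threshold <= 0:
        raise ValueError("threshold must be positive; the rate never reaches zero")
    week = 1
    while expected_finds(week, first_week, half_life_weeks) >= threshold:
        week += 1
    return week


if __name__ == "__main__":
    for week in range(1, 9):
        print(f"week {week}: about {expected_finds(week):.1f} bugs found")
    print("first week under 1 bug/week:", first_week_below(1.0))
```

Under those assumptions, "compliant" is whatever threshold you choose to stop at, which is exactly why the declaration is a judgment call and not a measurement.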
I admit I can't speak from experience about big IT shops. If Cory Hamasaki says they'll all fall over and can't be set back up, and if he finds no disagreement about what terrible shape they're all in, then I have to believe him.
When we start getting into the issue of just what we can live without, and how much efficiency we can lose and still function, all I can say is we'll find out, one way or the other. I know that I've come to regard some things as essential that I never needed before, and learned that they weren't (again) when I lost them.
I also tend to feel that if you're still alive and it looks like you can stay that way for a while, everything else is an inconvenience to one degree or another. We *will* have inconveniences. But people are problem solvers. Problems we'll have, no question. Not all solutions will be optimal, and many will be temporary. I've been placing my economic recovery time estimate at 5 years. I regard this as worst case.

-- Flint (flintc@mindspring.com), June 05, 1999.

Good start. I'll print and take it out with me.
Sick kid. Mustn't be caught at this by GI wife who sounds a lot like Lee above. (I work while they sleep.) But I love those gymnastics! I also think the topic has a lot to do with the doomer-polly dynamic on this forum. And therefore, its usefulness to many of us. That's why I addressed it to you, because I trust you to work at responding thoughtfully, and help move us in a good direction.

-- jor-el (jor-el@krypton.uni), June 05, 1999.

Flint and jor-el
Thought I would throw this little tidbit in. The link below is from testimony from power - telcos - health reps. in Canada. It gives a somewhat technical explanation of the problems and solutions in those industries.
My province is at the bottom of the page and the power co. has completed its remediation and is planning to roll the clocks over in the summer. The regional Telco is "service ready" which seems to indicate that they are functioning in a four digit environment already. Our new phone bills indicate there has been a major change in their systems.
As you can imagine this is good news. The health industry though has major problems.
INDYEV124-e

-- Brian (imager@home.com), June 05, 1999.

I guess I'm disappointed. You didn't really tackle my original question. Now I'm trying to think back to what I've read of your posting career over the last four months and remember whether you've ever written much about the odds for problems in the overall society.
I was taking off from two points -- your comments about the need for organizations to "contain" the problem without being perfectionist about it, and another thread that asked for people's 1-10 scale of expectations currently. It seemed like so many had an 8-10 and gave comparatively trivial reasons to justify it. And I think you may have chimed in on that thread and been poorly received.
(I'm going to be interrupted here any minute, so I'll post and run when it happens, and return later. I also need to re-read the S-curves that Ken pointed to.)
So I thought that you may speak mainly of INTERNAL states of readiness, and others focus mainly on the OVERALL effect of many organizations operating at a lower functioning level than they normally do. And as Ken said, since they're "impossible to quantify", I was just looking for a curve that shows our intuitive expectation about how things will work (or not.)
My first job as a s/w engineer newbie was Bug Librarian for a database project, and then liaison with QA, then writing the documentation that others left undone. So I know about the cycle of bug-hunting.
My thought last year about y2k was that this would be the first time in history that EVERY running app in the world would have to be re-compiled and re-installed on EACH platform using it by a fixed date. The universality of that requirement, WHATEVER THE NUMBER OF BUGS found and fixed or still remaining, would be a challenge to perform successfully without introducing new errors. Apparently, all those old manuals have been found?
So at this point, we are dealing with the remediation AS IT HAS BEEN DONE. (And whether Gary North should receive the President's Medal for good citizenship as a modern Paul Revere is another discussion.) Everyone is doing SOME program -- and whether they have found enough of the bugs to be worth the risk of opening up their systems to re-compile and re-install depends on the extent of their individual search effort.
I'm surprised! You end with an "economic recovery time estimate at 5 years"! (But then you call it "worst case".) That's no polly prediction, 5 years. So you MUST assume that society (in the economy, at least) is more fragile than the IT functioning of individual organizations.
BD talked about gun to the head forecast. (Oops, sick kid. Gotta go.)

-- jor-el (jor-el@krypton.uni), June 06, 1999.

Maybe Lee's right. Mental masturbation. My thought is that the forum wastes so much time fending off troll attacks, and personal stuff back and forth (some of which is wonderful, most neutral or negative), that a little y2k discussion couldn't hurt.
Discussing the BASIS of all the troll/polly back and forth. A majority probably don't care to discuss anyway.
Anyway, if I had to bet all I had on one number, I'd probably pick 4, with 5 a second choice.
Another basis for disagreement on this forum (and elsewhere) is what you are PREDICTING vs. what you are PREPARING for. Since the cost of this "insurance" is cheap, and edible, I would say I'm probably preparing for 8. Already moved out of city, now readying the farm in so many ways -- that's overwhelming enough without y2k considerations.
The difference between BETTING and INSURING. I don't think people get that either, and so they argue, and flame. I can bet 4, while insuring 8. I can cover more numbers with a small additional bet. Stakes vs. odds.
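A rough sketch of that distinction, for those who like to see it worked. The odds table below is entirely made up (nobody here has real odds), and the function names and the 5% tolerance are assumptions chosen only so the example lands on a bet of 4 and insurance to 8.

```python
# HYPOTHETICAL odds, in percent, for each point on the 1-10 scale.
# These numbers are invented for illustration and sum to 100.
HYPOTHETICAL_ODDS_PCT = {1: 5, 2: 10, 3: 18, 4: 22, 5: 18,
                         6: 10, 7: 7, 8: 5, 9: 3, 10: 2}


def bet(odds_pct):
    """Betting: pick the single most likely outcome."""
    return max(odds_pct, key=odds_pct.get)


def insure(odds_pct, tolerated_tail_pct):
    """Insuring: the lowest level to prepare for such that the chance of
    anything worse is no more than `tolerated_tail_pct`."""
    for level in sorted(odds_pct):
        worse = sum(p for lvl, p in odds_pct.items() if lvl > level)
        if worse <= tolerated_tail_pct:
            return level


if __name__ == "__main__":
    print("bet on:", bet(HYPOTHETICAL_ODDS_PCT))
    print("insure to:", insure(HYPOTHETICAL_ODDS_PCT, tolerated_tail_pct=5))
```

The same odds give two different numbers because the two questions are different: one asks what is most likely, the other asks how much bad luck you are willing to leave uncovered.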
I also think people's arguments divide unthinkingly between infrastructure vs. economic breakdowns. (oops, gotta go -- tractor's here to mow our field -- back later)

-- jor-el (jor-el@krypton.uni), June 06, 1999.

jor-el:
Sorry I can't address your question as clearly as you'd like. I do enjoy the difference between betting and insurance. Like you, I'm insuring against about an 8. I'm betting on about a 3.
I don't think these bell curves are very descriptive. I see every organization's remediation efforts as hyperbolic curves, asymptotic to zero (or really, to some arbitrary bug-incidence rate, which we never get below because systems never hold still).
But remediation percentages make me nervous. On another post, I said that I could probably take a million-line program and introduce 10,000 errors nobody would ever notice or care about, or I could introduce one single error that would kill the organization before they found it (and cleaned up after it). So what's important is what these bugs *do*, not how many there are.
And that's why I emphasized the importance of testing. BIG y2k bugs show up quickly and obviously. If we can squash those, we can live (not well, but live) with all the little ones. And it doesn't take much testing to find the big ones.
I also feel that the difference between NO remediation and partial remediation will translate into a big difference in functionality later. By definition (as I understand it) we can live (not well, again, but live) with big errors in noncritical systems, and noncritical errors in critical systems. What we can't live with is critical errors in critical systems. Those are where the firefighting will be focused. And everything depends on what those errors cause, and how well they can be handled.
At this point, I think it's nugatory to argue about perfection (if indeed it ever wasn't). To use a military metaphor, we will take casualties. The goal isn't to avoid all casualties. The goal is to win the battle. Keeping casualties to a minimum is one way to win battles, but not the only way. And I think we will win this battle, after which we'll mourn those lost. There will be losses.
Sometimes I think of y2k like I think of automobile travel. We (in the US) suffer over 40,000 deaths a year on our roads. Yes, we do little things to ameliorate this (speed limits, seat belts, crash testing). And maybe without these, the statistics would be worse. What's important is, we consider such losses *acceptable*. We live with them. We all continue to drive every day despite them. And I think most of us will make it through y2k OK, even without preparations. Some people's preparations will prove necessary. For others, they won't be adequate. Luck of the draw.

-- Flint (flintc@mindspring.com), June 06, 1999.

Peter de Jager made a comment in this article that surprised me a bit...
http://www.pcworld.com/current_issue/article/0,1212,10673,00.html
[snip]
END IN SIGHT: PC World asked several Y2K heavyweights when the problem will pass. They unanimously agreed that most--but not all--of the headaches will be behind us by the first days of 2001. Countdown Y2K coauthor Peter de Jager predicted that problems will trickle through to 2003, "the year tax returns will cover the 2001-to-2002 time period." Gartner Group analyst Lou Marcoccio says "lawsuits could last for three to ten years." But Greenwich Mean Time's Karl Feilder proved most pessimistic, predicting that "around 10 percent of businesses will fail as a result [of Y2K]; thus the impact to them will be permanent."
[snip]

-- Linkmeister (link@librarian.edu), June 06, 1999.

Now it's sick wife, too. I'm beset! y2k is upon me TODAY! OK, back here to my "sanctuary". (I'm looking up "Symptoms Online" or something like that, for her. -- I really did, but quicker than she's expecting.....)
DAMN! I just read that "Pollies will cause Panic" thread before it was about to go off the menu screen. Exhausting! Why do you do it, Flint? The lack of distinction between concepts is amazing. (One early thought is that the troll sites are invading disguised as dogmatic GIs now. Anything to dim the quality of discussion.)
Apparently, it is important to dumb down sufficiently so that NO ONE misunderstands the need to prepare due to a lack of certainty over the likelihood or extent of the event necessitating preparation. Otherwise, there will be "blood" on someone's hands, for providing excuses for non-preparation.
Also, I've seen my share of cultish thinking, inside and outside of religions. No thank you.
There will be a test, as Lee said above. Unfortunately, for some people, it will be the rest of their lives (long after 2000), having to live without the use of logical faculties. [I've deleted a lot more I ranted on about -- Flint, some people just come here to argue. You and I probably come to hear ourselves talk, and pat ourselves on the back for being "smarter" than a few others. I oughta quit right now and get back to being the family nurse, but....]
To finish one more concept, infrastructure vs. economic breakdown. A lot that gets cited as a potentially fatal Infomagic-type breakdown is really economic second-order effects. Sure, we read in The Grapes of Wrath how people starved in a Depression. But TEO-whatever came on originally as more than just an economic slump, but a threat to your life's underpinnings.
For example, the blocking of ships carrying cheap clothing from China is an economic blow; even ships bringing oil from the Gulf. But your town water system failing to screen e. coli (the bacteria, not the yourdonite) from your water can deal a truly fatal blow. People's arguments sloppily lump all these together, when at least these two major categories ought to be kept separate.
(I would put a maybe-interesting economic distinction in governmental readinesses. Social Security messing up its check-writing would hurt some recipients, but in the absence of societal breakdown, we would probably help the needy elderly to get by until a fix was complete. But what of the IRS, if it were known to be unable to effectively police its tax collecting functions? There go T-bonds, and the money markets (besides the highly-remediated Social Security's source of funds). Fragilities upstream of other functions are more interesting than a recital of all the derivative functions' states of readiness.)
For me, it's down to the level where y2k presents only somewhat more of a threat than an economic meltdown (which has grown). But then, the stock market has shot up even higher before its coming plunge, public psychology over economics has become even more lunatic, and I've been reading about -- YES! -- U.S. Banking HISTORY! The Federal Reserve -- Its Origins and Powers! How MONEY is CREATED (and destroyed!) F-R-R-R-R-R-R-ACTIONAL Reserve (hah!) Banking!
The economic stability now looks to me as mysterious a black box as the overall y2k outcome, all interlaced with human psychology. It could all end quietly, but it could all come unglued, and it seems people can sometimes turn about in the blink of an eye. We better not presume to be watching them and knowing all manner of things about them and what they won't do.
And now this thread, too, goes over the edge of time, into the great dustbin of yourdonite verboplasm.

-- jor-el (jor-el@krypton.uni), June 07, 1999.
