Readiness, Compliance and What Errors Were Actually Found!


My question is, for all those organisations that have spent big $$$$ and say they're compliant, ready, etc.: when you checked your company after the Y2K compliance work was done, what happened?

Did it work perfectly, or what types of errors still occurred? If you could share this with the rest of the world, then we would all know what types of glitches could take place and could focus in on them. It could also lead to better contingency plans.

Is the world still too short of programmers to fix the problem? Will telecommunication problems left unfixed in other countries bring down global telecommunications?

What percentage chance do you give a world recession?

-- Tracy Rice (price@sia.net.au), April 29, 1999

Answers

You have a great question. I hope a lot of folks post a reply with their knowledge on this matter. The front page of the business section of the Chicago Tribune recently ran a story about one of the FAA computer systems. It explained how, after fixing the computer and labeling it "compliant," they turned up the clocks and CRASH! They ended up finding over 150,000 lines of bad code! These were mistakes introduced by the very people doing the fixing!

-- BigGuy (supersite@acronet.net), April 29, 1999.

What we found...

1. It was tedious to look for all of the problems. We used a scanning tool on our code that we thought was already clean. We got an awful lot of "false positives" and stumbled across a couple of "false negatives" (there's a toy sketch of why, after item 3 below). Tedious, tedious work.

2. Setting up a test environment was WAAAYYY more difficult than anyone imagined. We thought you just had to buy a bunch of parallel equipment. Moving all the components, getting things configured, copying databases, and setting permissions took a lot of effort. In fact, setting up the test environment took as much time as the remediation work itself (less effort, because there were fewer people working; as much time, because the work was linear, not parallel).

3. Once we ran the tests, not much happened. Most problems were cosmetic (report dates, truncated fields). Even if we hadn't remediated, we found very few critical errors. Even the critical ones weren't any worse for us than a typical production problem.
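For anyone who hasn't lived through this, here's a toy sketch of what I mean in items 1 and 3. It's Python purely for illustration (our real code is mainframe stuff, and the field names are made up), but it shows why a name-based scanner throws false positives and negatives, and what a "cosmetic" report-date failure looks like:

    import re

    # A crude name-based scanner, like the kind of tool we ran: flag any
    # field whose name smells like a date (item 1).
    date_like = re.compile(r"DATE|YEAR|YY", re.IGNORECASE)

    fields = ["ORDER-DATE", "UPDATE-FLAG", "EXPIRY"]   # hypothetical field names
    for name in fields:
        if date_like.search(name):
            print("flag for review:", name)
    # Flags ORDER-DATE (correct) and UPDATE-FLAG (false positive: "DATE" is
    # buried inside "UPDATE"), and misses EXPIRY even though it holds a
    # two-digit year (false negative).

    # A "cosmetic" failure (item 3): a report header built by gluing "19"
    # onto a two-digit year.  Ugly on the page, but nothing downstream breaks.
    def report_header(year):
        return "REPORT FOR 19%02d" % (year - 1900)

    print(report_header(1999))   # REPORT FOR 1999
    print(report_header(2000))   # REPORT FOR 19100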

I should point out that we do very little transaction processing...that's the typical heavy lifting that most of the heavy iron guys are concerned about (e.g. reservations, accounting, rail car dispatch).

On the one hand, I know that many companies won't get the testing done, just because of the time, money and effort required. On the other hand, there may not be too much consequence.

It seems that application software won't be too bad. We have lots of people working on it; they have fixed production problems before, and the people maintaining this stuff know what they're doing and how to fix things that do break.

At this point, I'm more concerned that we've overlooked something from an infrastructure point of view. What if a compiler isn't compliant? Then we depend on a third-party company to fix it. If it broke for us, it broke for all the other companies using their product. How long will it be before it's our turn? There are several choke points that could be affected...LAN, 3270 emulation software, compilers, DBMS, and of course the big ones like electricity, water and telephone.

-- Jim Smith (JDSMith1@hotmail.com), April 29, 1999.


It is a tough question to answer and leads to the classic Y2K testing problem. You will not be able to get an answer, for a very simple reason. Code that is being "tested" (unit, system or acceptance) is running in a fairly controlled environment, and scripted "tests" are only as good as the people who think them up. Lots of logic is not even touched during regular testing; there are just too many variables based on _combinations_ of data. Once that code has moved up the testing chain and is finally moved into production, the time warp disappears: production is what runs the organization, so the date must remain the current date (no look-ahead to 2000 or beyond). The changed code will now (for the most part) remain dormant until the date naturally moves into Y2K.
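To make the "dormant until the date naturally moves" point concrete, here is a toy sketch of one common style of fix, a "pivot window." It's Python purely for illustration (the real fixes live in mainframe code, and the pivot value here is made up):

    # Pivot windowing: two-digit years below the pivot are read as 20xx,
    # everything else as 19xx.
    PIVOT = 50   # hypothetical cutoff; every shop picks its own

    def expand_year(yy):
        # yy is the two-digit year exactly as it sits in the old records (0-99)
        if yy < PIVOT:
            return 2000 + yy
        return 1900 + yy

    assert expand_year(99) == 1999   # the path every record takes today
    assert expand_year(0) == 2000    # dormant until real data shows up with yy = 00

In a test region you warp the clock and feed it 00s and 01s, and that second branch runs fine. In production it sits untouched until the calendar itself rolls over.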

It always amazes me how talking heads mention "testing" as though you simply walk up, turn a test dial, and "bam," it's done. Testing is incredibly complex, and unless you've spent many a night/morning babysitting test runs, you haven't got a clue.

Organizations can state whatever they please but the digit heads know that the big show happens when the dates creep into production naturally. I've seen amazing (that's the only way to describe it) things happen at 4 am. If you love puzzles and magic, try big iron IT.

-- br14 (br14@comeback.now), April 29, 1999.


Dear Brother14:

There's an anecdote I filed away from the '70s, waiting for it to undergird an appropriate post.

In the late '70s I was using a medical records system that generated clinical encounters in the ER at a Kaiser-Permanente facility in the San Fran East Bay (Walnut Creek). It was S-100-based hardware (a predecessor of IBM's PC) which I designed and assembled, including touch-panel data entry; the software was an 80-menu, menu-driven package that I wrote in structured assembly language. I used the system successfully on a daily basis for two years.

The entire system was on a rolling table: box, CRT, KB, printer & linear power supply --- every nite I stored it in a nearby utility closet / every morning I rolled it out into the hallway --- that was my 'open-air' office. It also gave high visibility to my project, on purpose. Whom did I attract that way? Lots and lots of programmer geeks & IT guys (our group serviced the Lawrence Livermore Labs' nerdheads). Got lots of comments & discussions out of it when they found out their doc had designed it.

The most impressive of my conversations was with a guy who took a hard look at my system as the nurse guided him past it into one of my treatment rooms. After I took care of his (not too serious) medical problem he warmed up to a real interesting conversation. Seems he's the guy who developed the airlines' reservations system (Help a doddering Alzheimic here, folks -- is the name of it Sphere, or some such system?). We had a real nice geek talk. But burned into my memory, by far, was the intensity of his words as he described

HOW LONG IT TOOK TO DEBUG THE SYSTEM: ----- TEN YEARS.

A postscript: I spent 5 years working part-time in the HMO's computer R&D dep't in Oakland, helping introduce PCWeenie solutions after the devastating experience of million-dollar failures using S/360 and 370 iron (Newbies: that's geek for IBM mainframes). I pored over stacks and stacks of green/white computer printouts and the old protocol and testing manuals of the leftover project carcass I found in the library, and became friends with the ex-headgeek. All of this postmortem work simply confirmed what my patient above had described, and what you've detailed to the readers of this Y2K forum.

One last tidbit: the head of our R&D dep't was an MD who never held in his hand either a soldering iron or a code sheet. Apropos of that, I've got a personal anecdote illustrating the impact of know-nothing management on project deadlines -- but maybe in another, later post.

Br14, can one successfully persuade readers of what's involved in Y2K mainframe remediation if they've never experienced what you have?

---- Tnx for your words,

Bill

-- William J. Schenker, MD (wjs@linkfast.net), April 29, 1999.


Dr. Bill, I believe you're referring to SABRE.

-- (mass@delusions.com), April 29, 1999.


There have been some very interesting comments on this subject: real meat! I have concluded that most of the systems out there will, after all the management wordsmithing, wind up operating in a 'fix-on-failure' mode, because realistic testing is too expensive and time-consuming. Well hey, that's what the big iron guys do every night now! The difference is that they don't get an abend (abnormal end) every five minutes. So the potential is there for the system to mush down over a period of weeks of incessant interrupts, with loss of productivity. One thing is for sure: we are going to see a hell of a lot more abends in operations departments than we have ever seen before.

-- roy scruggs (rscruggs@triada.com), April 29, 1999.

Buried in some of the latest health care Congressional testimony is some interesting HCFA info on remediation, testing, and errors introduced into the code.

Diane

Statement of Mr. Joel Willemssen
Director, Accounting and Information Management Division
General Accounting Office
04/27/99

http://com-notes.house.gov/cchear/hearings106.nsf/768df0faa6d9ddab852564f1004886c0/8227de35056cef1885256760005ff369?OpenDocument

Or see this thread ...

Medicare Providers Still Aren't All Ready For Y2K

http://www.greenspun.com/bboard/q-and-a-fetch-msg.tcl?msg_id=000lzG

(Warning ... lots to wade through).

-- Diane J. Squire (sacredspaces@yahoo.com), April 29, 1999.


Stomp it off....

-- Blue Himalayan (bh@k2.y), April 29, 1999.

--and to go further--all that groovy fixin' going on from way back when happened while the power was on, all the water worked, the family wasn't sitting home in a dark cold house with zip protection, etc. It's the TOTALITY of zillions of little to medium failures all happening at the same time.

Here's a test for all the IT guys: go to your mainframe and fix it after you've pulled the breakers and turned the water and city gas off to your building. Oh ya, walk to work, too. And before you go, turn off the power at your house, and the water there too, and set fire to the neighbor's house next door, and tell your spouse to just "deal with it, do a workaround". Now THAT might be a realistic test, one that doesn't exist in a vacuum of "just the code" or just the hardware.

Sure, you can do amazing things. I grew up a puter brat; dad was a pig iron mainframe guy almost his whole adult working life, and I remember the multiple-day shifts and the endless schooling and bringing home work and stuff. It's HARD! And it requires an amazing amount of brain power. But it has to be done in an environment where you have a guarantee of everything working around you, where the only broken "stuff" is the immediate problem at hand. We had just a blizzard once, and I can guarantee you that that puter suffered serious neglect after the power was off and the old genny ran out of diesel. A zillion bucks to fix that bad boy. And there are ZIP guarantees of everything else being intact; in fact the opposite appears to be more likely, as far as I can determine.

That's why fix on failure ain't gonna work. You can change a flat if you've got a level road and a place to safely pull over. You can't change a tire on your car when it's stuck on the side of a hill slipping in the mud. You've got to have stability around you and all the other infrastructure support. I see all the work being done now as heroic, but not enough, not in time, at least for any sort of continuation of "normal" society as we know it now. And I am quite content to deal with that; my workaround is being a Survivalist, and that's a mindset you get in advance and always work at, not something you hurry up and do at a "failure". It's just not good enough to rely on big companies' or big governments' words; the bosses dictate what gets said and done and worked on, and when and how hard, and sometimes (a lot, actually) the Peter Principle applies as to bosses.....

-- zog (zog@avana.net), April 29, 1999.
