My response to Chicken re. silent data corruption

greenspun.com : LUSENET : TimeBomb 2000 (Y2000) : One Thread

On the "I'd love to see some proof" thread, Chicken Shit asked for "proof" of my statement that

" Problems encountered, if we are lucky, would be immediately obvious. However, if they are not obvious, it usually means they are more serious, and are silently corrupting the global data network."

Chicken, for the sake of your customers, I hope you know a lot more about wiring houses than you do about computers.

First of all, I ask "Would you rather your PC be hit by a virus that prevents it from booting, or one that slowly and silently corrupts your entire hard drive over the course of a month or two?"

Second, my quote referred to current and future y2k problems. And note that the stream of y2k problems "officially began" 1 July, barely a week ago. [Gartner Group, Jan 99]

Lastly, take a look at what the experts think. This is from the recent Air Force Magazine Midnight Crossing article:

Deputy defense secretary John J. Hamre, the Pentagon's point man on the Y2K problem, referred to uncertainty in a press interview. "Probably one out of five days I wake up in a cold sweat, thinking [Y2K] is much bigger than we think," said Hamre, "and then the other four days, I think maybe we really are on top of it. Everything is so interconnected, it's very hard to know with any precision that we've got it fixed." Hamre testified last year, "Frankly, I think we'll be lucky if on Jan. 1, 2000, the system just doesn't come on, because then we'll know we have a problem. Our bigger fear is going to be that the system seems to work fine but the data is unreliable. That's a far worse problem."

-- a (a@a.a), July 11, 1999

Answers

Thanks 'a' for frying Chicken Little.

Didn't know she was in the electrical trade. I've been a contractor for 20 years and have some programming experience, but the majority of tradespeople I talk to haven't a clue, any more than the rest of the population.

There was a very interesting thread in regard to an MIT study on learning, and how 4 categories seem to fit most all people. If you could dig that up, it would be most helpful.

As for Chicken, time is wasted and counterproductive. Not that I wouldn't hope that she would see the problem and its implications, but he/she doesn't WANT to see them. It's like talking Catholicism to a Jew. The Talmud to a Hindu.

Bob P

-- Bob P (rpilc99206@aol.com), July 11, 1999.


Mr. Chicken!! At last I have found your calling! A romex hand! LOL! An Out of class/with two years apprenticeship going for you (if that). I had been wondering about why you wouldn't talk to a Journeyman Inside Wireman about jobs!

Yes, I would imagine that a 4 way switch would be a little difficult for you. (you have to use a pair of 12/2 WG cables to achieve the number of wires needed to effect the hook-up.)

Shoot! And here I was actually sweating, wondering if I could hold up my end of a debate with you. After all, I have only worked on the simpler (I thought) parts of a power generation complex. (Actually, though, I have worked at the power houses long enough to be able to say "been there, done that" when asked to do something on one of them.)

Of course I did start out on the bigger jobs in '72 on the PAR/MSR site in N. Dakota (the anti-ballistic missile system that Nixon killed in the first SALT treaty).

And I dabbled a little time away back in '74 at the Greyhound Towers in Phoenix, Az., helping to install the first phases of the internet system we are using right now.

And of course, in both places and so many others I have been around, you would find all that "Big Iron" that Mr. Cory Hamasaki and the other programmers try to explain to you about. But you will not find any of these things around a housing complex. That is for certain and for sure.

Incidentally, you do have me beat in one respect! I have only "roped" in three houses. And I hated each of them equally. But to be fair... the housing contractors would not want to pay me my scale, not when they can pick 'em up off the street for about 7-9 dollars an hour. Just about a fourth the scale of a journeyman inside wireman.

Let's see now...How does that old saying go.."Two wires Hook'em up! Three wires F..'em up! Four wires drag'em up...

Boy do I feel sorry for the Journeyman Wire man that was stuck with you for a partner. It must have been like keeping Cheeta out of the ball of string; keeping you from making an A$$ out of yourself in front of the pros.

HaaaaHaaaaHaaaaHaaaaHaaaa ROTFLMAO Geez, Mr. Chicken Little! You sure are a riot. Shakey

-- Shakey (in_a_bunker@forty.feet), July 11, 1999.


Boy you guys really gave it to chicken little, lol! But I have to be honest here aaaa, your statement also baffles me: " Problems encountered, if we are lucky, would be immediately obvious. However, if they are not obvious, it usually means they are more serious, and are silently corrupting the global data network."

Having worked in y2k on several projects involving both software and embedded systems, I know of no such examples myself, even in reading all of the available industry information. Could you guys therefore humor me, and actually answer Chicken Little's question? (You never really did, you see...)

Regards,

-- FactFinder (FactFinder@bzn.com), July 11, 1999.


Dear Mr. Fact Finder, Sir, I tried to find out just what it was that Mr. Chicken wanted to know in the field of embedded systems, as they relate to valves, solenoids and PLC's. Without success, I am afraid.

Being somewhat in contact with people who look for craftsmen to fill new jobs, I sometimes get the REAL information about jobs. Not the pablum that is pandered to the public for its contented consumption.

The embedded systems' problems are already occurring and have been occurring for the past about ten months now.

Where the RTC in a chip is concerned, it can let go at any time now, and far into the unforeseeable future. And whether or not the PLC or the computer/mainframe was put on a time clock and artificially run forward causes this sliding scale of possible future/present/and past failures of the embedded chips.

I catalogue the Hamilton, Ind. power house as the first casualty... The vibration sensors embedded in the turbine pedestal failed to shut the turbine down, causing the turbine to blow its cooling envelope.

Another occurred earlier this year just outside of K.C. when a safety valve on a power house's boiler failed to open and the boiler blew its top off.

The Ford Dearborn explosion and fire was due to a valve closing on a fuel line when it should not have in their on-site power house.

There are more such coincidences, too many for me to type. Just remember: an embedded system, if it fails, does so with sometimes dynamic results.

Am I going to work on any of these jobs I have been asked to fill? No way. I remain

Shakey in a bunker @ forty.feet

-- Shakey (in_a_bunker@forty.feet), July 11, 1999.


FactFinder: I posted a list of 55 problem dates from 1997 to 2002. We have barely passed 18 of them. I am not going to try to give an example of a programming error that will be caused by these dates that A) remains hidden long enough to B) inflict severe damage. This is similar to Hoff's request to give one example of contaminated data in the financial sector or Poole's request to name one non-compliant chip that will cause a malfunction. Conversely and in retrospect, I can easily give examples of how even more bizarre computer foulups end up torching teenagers, locking mayors in elevators, or dispersing feces on parade grounds. Don't try to use human logic with computers; they'll outsmart you every time.

What I will say, however, is that the code now being readied for production is gonna be full of bugs. Serious bugs. Guaranteed. Even the stuff that has been fully tested and QA'd still has an average of 15% of the bugs intact. And the stuff that is not tested, well, it'll be much worse. And the stuff that is not even fixed (fix on failure), well, I think you get the point.

Now remember that all of this code will be going live, in every company, in every industry, in every country in the world, in the same time period, in parallel with the other effects of the Year 2000 and its associated economic problems.

-- a (a@a.a), July 11, 1999.



factfinder... fancy meeting you here.

precisely which question of cl's are you referring to... he became a 'tad' confused towards the end of that thread.

but, you have answers. why don't you click on the "i'd love to see the proof" thread and scroll down to simon richard's explanation *and* examples. i believe that it is the fifth from the bottom.

that should assuage your 'curiosity.'

marianne

-- marianne (uranus@nbn.net), July 11, 1999.


In my experience, when data gets corrupted, it is not something like changing $98.45 to $18.45. It is changing $98.45 to 4@$%^%. Such data corruption does not go about "silently corrupting" any other data it comes in contact with; it just causes an immediate wreck. As some people pointed out on the original thread, a virus can go about corrupting other data, but aren't a virus and corrupted data two different things?

-- walt (walt@lcs.k12.ne.us), July 12, 1999.

Walt,

yes, a virus [cause] and corrupted data [effect] are two different things.

-- J (jart5@bellsouth.net), July 12, 1999.


FactFinder:

To paraphrase 'a's answer to you:

"No"

-- Hoffmeister (hoff_meister@my-deja.com), July 12, 1999.


"Our bigger fear is going to be that the system seems to work fine but the data is unreliable. That's a far worse problem." - John J. Hamre, the Pentagon's point man on the Y2K problem,

"Data corruption is a non-problem." - Hoffmeister, resident pollanna, PooleFoole, NormNanny, and all around SAPsucker

-- a (a@a.a), July 12, 1999.



"still no"

-- Hoffmeister (hoff_meister@my-deja.com), July 12, 1999.

OK fellas...here's a hypothetical.

We're performing an attack on N. Korea forces Jan 5 2000. A weapons control computer is interfacing to a C4I net to obtain order of battle information. The windowing approach seems to have worked, since the system is operational, Built In Test returns nothing but the usual anomalies, and the targeting system has a green light. Missile away!

But a small routine in one of the interface modules that validates Time of Day for intercomputer communications was not windowed correctly, and is actually reporting a value based on Jan 5 1900. The error propagates, an incorrect telemetric value is computed by a Time Difference Of Arrival mechanism, and the missile is instead launched toward a battalion of S. Korean and American troops. 455 people die. A million N. Korea soldiers, propelled by the resulting confusion in the Allied war room, advance on Seoul.
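A minimal sketch of the kind of mis-windowed time check described above. The pivot value, the function names, and the five-day gap between the two time stamps are assumptions made for illustration; none of this is taken from a real weapons or C4I system.

```python
from datetime import date

PIVOT = 50  # assumed pivot: two-digit years 00-49 map to 20xx, 50-99 to 19xx


def expand_windowed(yy: int, pivot: int = PIVOT) -> int:
    """Expand a two-digit year using a fixed window around the pivot."""
    if not 0 <= yy <= 99:
        raise ValueError("two-digit year expected")
    return 2000 + yy if yy < pivot else 1900 + yy


def expand_unwindowed(yy: int) -> int:
    """The routine that was missed: it still assumes every year is 19xx."""
    return 1900 + yy


def elapsed_days(sent, received, expand) -> int:
    """Days between two (yy, month, day) stamps, using the given expander."""
    s = date(expand(sent[0]), sent[1], sent[2])
    r = date(expand(received[0]), received[1], received[2])
    return (r - s).days


sent = (99, 12, 31)   # 31 Dec '99
received = (0, 1, 5)  # 5 Jan '00

good = elapsed_days(sent, received, expand_windowed)
bad = elapsed_days(sent, received, expand_unwindowed)

print(good)  # 5
print(bad)   # a large negative number: the reply appears to predate the request
```

Neither call raises an error. The unwindowed path simply hands back a number, and whatever consumes it next has no way to tell it is wrong unless it checks the sign or the range.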

I could sit here all day and spout out meaningless examples such as these in banking, telecom, energy, and every other sector. Apparently you missed my point Hoff. If we knew where the bugs were going to crop up, they wouldn't crop up, now would they?

BTW - if you think this scenario is far-fetched, check the friendly fire (FF) casualties and computer mishaps during the Gulf War and think again.

-- a (a@a.a), July 12, 1999.


Several of you really took the time to respond to my post above with sincere replies, and I appreciate that. To be honest, I didn't expect that. I therefore want to take the time to respond sincerely.

I too take exception to a statement like this:

" Problems encountered, if we are lucky, would be immediately obvious. However, if they are not obvious, it usually means they are more serious, and are silently corrupting the global data network."

Specifically, I disagree with the following:

1. Y2K problems that are not obvious are usually more serious.

In performing y2k testing in software and embedded systems, I have found quite the opposite to be true. Other industry information I have read concerning test results is consistent with my findings. A typical example is the Westinghouse RVLIS/ICCM system used in some nuclear plants. Initial indications were that there were no y2k problems with the system; however, more extensive testing by a utility indicated that an obscure date usage of a day of the week/date stamp by a maintenance terminal failed a y2k date test (actually, this test failed EVERY year, as a second utility discovered). FYI, this was a very trivial bug that has been there since installation, and has absolutely no effect on the system's ability to function. Another problem that I saw was a minor date error in a log file or date stamp on a report, not always obvious unless you looked at these files and printed out all the reports available. Generally, the harder the bug is to find, the more trivial the bug. I say generally of course, because there are no absolutes in Y2k because there are no absolute programmers.

2. Not-obvious Y2k problems could corrupt the global data network.

I take exception to this because I do not know what the global data network is :) Is it the Internet? LOL Seriously, I interpret that you mean corruption of some larger data network. I have no problem if you are saying that corrupted date data can be passed along to another system (a software program is the most likely candidate); I have seen this too. It's the old garbage in, garbage out thing. FYI, I have found that usually all the devices/programs continue to function fine with the date errors; not always of course, it depends on how dates are used. For plant control systems, dates are almost never used for controlling or anything other than date stamps. For software applications, often the dates are used for searching, sorting, etc., and so some functions of the software become corrupted. For an Access database connected to an SQL database, the errors can thus be transmitted. These are localized networks, not global.
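To make the searching/sorting case concrete, here is a small sketch, assuming date stamps stored as YY-MM-DD strings. The sample stamps, the pivot of 50, and the helper name `windowed_key` are made up for illustration and do not come from any of the systems mentioned in this thread.

```python
def windowed_key(stamp: str, pivot: int = 50):
    """Sort key that expands YY with an assumed pivot before comparing."""
    yy, mm, dd = (int(part) for part in stamp.split("-"))
    century = 2000 if yy < pivot else 1900
    return (century + yy, mm, dd)


stamps = ["98-12-31", "99-07-11", "00-01-05"]

# Plain string comparison: '00' sorts ahead of '98' and '99'.
naive_order = sorted(stamps)
naive_latest = max(stamps)

# Same data, compared on the expanded year.
windowed_order = sorted(stamps, key=windowed_key)
windowed_latest = max(stamps, key=windowed_key)

print(naive_order)      # ['00-01-05', '98-12-31', '99-07-11']
print(naive_latest)     # 99-07-11
print(windowed_order)   # ['98-12-31', '99-07-11', '00-01-05']
print(windowed_latest)  # 00-01-05
```

The program keeps running either way; only the "most recent record" answer changes, which is why this sort of error tends to show up in reports rather than as a crash.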

What I really take exception to are several claims (not by you, aa@a) I have seen on various y2k sites, in several embedded system white papers and frequently in the Y2K forums:

1. Y2k bugs can act like viruses and propagate over wide networks.
2. Minor Y2k bug failures can cascade through systems and build, eventually causing significant problems.

The above two claims are, IMNSHO, myths without basis. But that's a topic for another time.

Several other comments from others warrant response:

Shakey - "The embedded systems' problems are already occurring and have been occurring for the past about ten months now."

Yep. The first one I know of occurred on January 1, 1999 and had the potential to cause date stamp problems with hundreds of control room recorders if power was lost for a period of time (Westronics model 2100 and DDR10). Fortunately, the problem was minor (the data was unaffected and the recorders would have continued to function with the wrong date), and corrected with an EPROM upgrade.

By the way, embedded system y2k bugs are almost always minor date stamp problems that do not affect the device's primary functions. I have found this in testing, in vendor testing, and in industry resources I have used in my Y2k work. I have found the same comment from authorities in other industries as well (if challenged, I can provide links). On the other hand, there are some rare actual embedded system device functional failures (yes, lol, I can produce manufacturer and model numbers, but I'm not gonna go to the trouble unless Poole personally challenges me ;)

As far as identifying embedded system chips, it just doesn't work that way. I haven't seen a system that uses only one chip - an embedded system device is made up of several chips to get a whole working device (complete with firmware programming, and infrequently dates may be used in the program). Yes, the RTC clock chip is a single chip that has a y2k bug, but show me where it's used by itself with no insies and outsies lol.

Shakey - "Just remember: an embedded system, if it fails, does so with sometimes dynamic results."

Can't disagree here. Can say that I haven't found a y2k bug that causes a serious problem with power plant safety, operation, or performance. Can say that I haven't found evidence of such in all of the industry reports I have read. I HAVE read articles indicating that some Canadian hydro plants had some y2k bugs that may have affected operation - I just haven't seen the evidence (and I do love the facts). So yes, there may be y2k bugs that cause serious problems in embedded systems with serious results, in fact I would be surprised if there aren't some, but they are obviously very, very rare (yes, I have seen all of the unidentified failure reports, but please, give me a manufacturer and model, lol. I actually checked in on one of the Y2k failure reports posted here months ago, an IV drip pump, and found it to be absolutely false based on the vendor's information and testing. I wrote the website that was listed as the source with my documentation, and they revised the site).

aa@a's example hypothetical y2k failure - "But a small routine in one of the interface modules that validates Time of Day for intercomputer communications was not windowed correctly, and is actually reporting a value based on Jan 5 1900. The error propagates, an incorrect telemetric value is computed by a Time Difference Of Arrival mechanism, and the missile is instead launched toward a battalion of S. Korean and American troops. 455 people die. A million N. Korea soldiers, propelled by the resulting confusion in the Allied war room, advance on Seoul."

Won't happen; telemetry will be using timing pulses/clock, not a date calendar algorithm.

Please continue to check on what I say, feel free to challenge (as though you need for me to tell you that, lol) because I am human, and therefore will be wrong on occasion. As a matter of fact, I have admitted to such in at least one post. You can be sure that I will check on your Y2K information as well. And please, don't quote those silly white papers that are full of myths. Those myths just keep on cascading and cascading, sigh..;)

Regards,

-- FactFinder (FactFinder@bzn.com), July 12, 1999.


FF: The TDOA schema we use employs GPS, which is subject to WNRO and Y2K processing error. Other components of the system do indeed receive latent TOD data via IP sockets.
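For readers who have not run into WNRO (week number rollover): the legacy GPS navigation message carries the week count in a 10-bit field, so it wraps every 1024 weeks counted from 6 January 1980. A rough sketch of what that does to a receiver that does not account for the wrap; the function names and the `rollovers` parameter are assumptions for illustration, not anything from our system.

```python
from datetime import date, timedelta

GPS_EPOCH = date(1980, 1, 6)  # start of GPS week 0
WEEK_MODULUS = 1024           # 10-bit week counter wraps here


def broadcast_week(true_week: int) -> int:
    """Week number as it appears in the 10-bit field."""
    return true_week % WEEK_MODULUS


def week_start(week: int, rollovers: int = 0) -> date:
    """Calendar date on which the given broadcast week begins."""
    return GPS_EPOCH + timedelta(weeks=week + rollovers * WEEK_MODULUS)


true_week = 1024                  # first week after the first rollover
wk = broadcast_week(true_week)    # wraps back to 0

naive = week_start(wk)                   # receiver unaware of the wrap
corrected = week_start(wk, rollovers=1)  # receiver that counts one rollover

print(naive)      # 1980-01-06
print(corrected)  # 1999-08-22
```

Whether a given receiver or downstream consumer handles the wrap depends entirely on its firmware; the sketch only shows the arithmetic.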

-- a (a@a.a), July 13, 1999.

Dear Friend Fact Finder,

Sir, I bow to your obvious superior knowledge about both software and computers. But as I said once before, the power generation complexes circa the 70's and 80's were built with international effort and equipment. This means also that in some areas of a plant there were PLC's and their attendant systems which were/are of both Asian and European design and manufacture.

I am sure that you are aware of "sub sets" of PLC systems... which are interlocked with each other so that if one has a problem or failure, it will "talk" to another PLC (or PLCs) and tell it to shut down also. And the "new" perceived condition is moved at the speed of light (the speed of an electrical impulse on a wire, or in the more modern PLC's at the speed of light literally, as they now use fibre optics) to "talk" between themselves and their sector controller, which in turn either talks to the main frame in the "ivory tower" (a term used by construction workers for the main control room overlooking the turbine deck), or else lights up a "baby spot" (a term used, again by electricians, for an alert light on the main console), or else causes a meter to give off the pertinent readings of the newly developing conditions in that subset area.

But most of the readers here have never seen the inside of a 750 megawatt coal-fired generator. So I'll try and give them some idea of the size of this precision built machine.

The Intermountain project located at Delta, Utah is my example. Comstock Electric out of California bid and received the contract for pulling the cables (power, sensor, etc.) for the main building and the scrubbers. (This still leaves out the switch yard, coal handling and other outlying buildings.)

-- Shakey (in_a_bunker@forty.feet), July 13, 1999.



(Sorry for the interruption... my machine went crazy and posted before I was finished.)

The amount of wire bid on to be pulled was 93,000,000 miles!!!!! (That's ninety-three million miles.) The actual wire pulled was a bit more, say 17 thousand miles worth.

The number of sensors, PLC's, etc. is almost uncountable unless you have the information programmed into a computer.

The differing types of controllers (so-called black boxes) came from three continents. But the Intermountain power project also has one highly notable distinction... It was the first US built power generation complex to transmit pulsating D/C on its transmission lines. All three phases were put on one, repeat one, wire, a 750 MCM if I remember correctly.

The concept of this pulsating D/C system was/is patented by a Swiss/Swede company. Basically they invented a way to cut the bottom off a sine wave A/C generated electrical system and send only the top of the sine wave downstream... 1,200 miles downstream to Los Angeles, California, where it is converted back to A/C. Why go to all this trouble? Amazing, for me at least at that time: by using the pulsating D/C method, the powers that be/the owners were able to send the total output from the plant on one wire, with no substations needed between the plant site and LA, at an exceptional savings. This of course leaves the plant unable to put its power into the conventional national grid system.

Intermountain leased the D/C conversion from the Swiss, and it was they who built the system, hired their own subcontractors, etc. Jelco Electric received the awarded contract for doing the electrical, which included all computer hook-ups, sensor arrays, and of course their own embedded systems... It was/is their design, you see. Included in the contract agreement was that the Swiss would be the only ones to replace or upgrade the system for the duration of said contract.

But I have written enough, at least to give those who have not been in or around a large power complex an idea of the scope of just one of them. Combined, there are about 7,800 of them, in various megawattages, in place around the country.

Shakey

-- Shakey (in_a_bunker@forty.feet), July 13, 1999.

