Beach - a little clarity goes a long way : LUSENET : TimeBomb 2000 (Y2000) : One Thread

Ok, I know this topic has been posted and cross-examined ad nauseam. However, I would like to briefly summarize the Bruce Beach article and the counterarguments. I am hoping to achieve at least the illusion of clarity about this topic. Mostly, I am hoping that if I missed a point someone will enlighten me.

BB says that thanks to buggy RTCs (secondary clocks) lurking in some embedded chips, we may experience serious system failures.

Rebuttal says that RTCs would not fail because: 1. If the chip is not receiving power it cannot keep time and will therefore reset itself to baseline. 2. The clock may be started at any time, so you can't assume it would go bad on 1/1/2000. 3. Most systems, when they do use a chip's time functions to monitor an event, would use what Beach called the primary clock (which as I understand it works somewhat like a metronome) and which is blissfully unaware of what a year is.

All in all, the gist of the argument seems to boil down to this: do chips with RTCs have lithium batteries, and do they know what time it really is?

-- R (, April 13, 1999



Pretty close. As a general rule, any time/date used by any system that must be kept in sync with the outside world must have a way to set that time and date. Clocks must get an initial setting, and they tend to drift quite a bit over time. Both require resetting, so this must be possible.

Event monitoring can use any of a boggling variety of methods, which can include multiple clock sources -- many heartbeats. Which time source is used depends on many things, like the length of the interval, the price of parts, and access to appropriate dividing mechanisms. I have kept the date for periods of years using a clock that ticked 6 million times a second, because the hardware allowed me to 'sample' that clock at intervals of my choice.
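The sampling approach described above can be sketched in C. This is purely illustrative -- the 6 MHz rate comes from the post, but the 16-bit counter width and all names are assumptions, not details from any real device. Software keeps long-term time by periodically reading a free-running hardware counter that itself knows nothing about dates or years.

```c
#include <assert.h>
#include <stdint.h>

#define TICKS_PER_SECOND 6000000ULL  /* 6 MHz, as in the example above */

static uint64_t total_ticks = 0;  /* software accumulator */
static uint16_t last_sample = 0;  /* previous raw counter reading */

/* Sample the free-running 16-bit counter. As long as we sample before
 * the counter can wrap twice, unsigned subtraction recovers the true
 * number of ticks even across a single wraparound. */
void sample_counter(uint16_t raw_counter)
{
    uint16_t delta = (uint16_t)(raw_counter - last_sample);
    last_sample = raw_counter;
    total_ticks += delta;
}

uint64_t total_tick_count(void) { return total_ticks; }
uint64_t elapsed_seconds(void)  { return total_ticks / TICKS_PER_SECOND; }
```

Note that the hardware counter never holds a date; any calendar lives entirely in the software on top, which is why a clock used this way has no year to get wrong.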

Some RTCs are supported with batteries (not always lithium). Others are not, so when power is removed they forget everything. But this doesn't matter (despite what Beach says) because in such an application you don't care about the date, only about the interval. I'm not familiar with any battery-backed RTCs that are not settable. If the clock isn't settable, it can't be relied on to know what time it really is outside. And if you can't rely on it for that purpose, you don't *use* it for that purpose.

-- Flint (, April 13, 1999.

If you read Dean's post in the "Gary North significant post" thread, you'll see that a small capacitor is all that is needed to keep a timer going for a long while. These devices need only tiny amounts of power to continue to run. <:)=

-- Sysman (, April 13, 1999.


What you (and Dean) say is both true and irrelevant. For 'battery' substitute 'any continuous power source'. The same arguments hold just as well.

-- Flint (, April 13, 1999.

Good evening Flint. I'm only pointing out that a battery is not always needed, hence not as obvious if you're talking about continuous power.

By the way, I've been chasing you for a few days (grin) to get your comment on the "external clock" Beach discussed. Please see your other reply and my question at the end of this thread, and give me your opinion. Thanks. <:)=

Straight Info on Refineries

PS - I'm watching new answers, so you can post here or there.

-- Sysman (, April 13, 1999.


After reading Beach's 'clarification' I'm more confused than ever about that external clock. Apparently he's referring to an RTC chip somewhere on the board external to the processor itself. So OK, the 'primary' clock is the CPU's oscillator input. The 'external' clock is the RTC. But now, what's the 'secondary' clock? Would this be an external timer chip of some kind? Would this be the effective result of a processor sampling mechanism (I've kept time by programming a divisor off the CPU clock, enabled internal interrupts every Nth processor clock after division, and 'ticked' a software clock every Nth interrupt. Is this a 'secondary' clock?).
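The divide-and-interrupt scheme mentioned in passing above can be sketched roughly as follows. This is a hypothetical illustration -- the 100 Hz divider setting and all names are assumed: a hardware divider raises an interrupt every Nth CPU clock, and the handler ticks a purely software clock.

```c
#include <assert.h>
#include <stdint.h>

#define INTERRUPTS_PER_SECOND 100  /* assumed divider setting: 100 Hz */

struct soft_clock {
    uint32_t interrupts;  /* interrupts counted toward the next second */
    uint32_t seconds;     /* seconds since the clock was started */
};

/* Called from the timer interrupt service routine. The CPU oscillator
 * is just a metronome here; any notion of a date would have to be
 * layered on top of 'seconds' by still more software. */
void timer_isr(struct soft_clock *clk)
{
    if (++clk->interrupts >= INTERRUPTS_PER_SECOND) {
        clk->interrupts = 0;
        clk->seconds++;
    }
}
```

Whether a construct like this counts as a 'secondary' clock in Beach's terminology is exactly the question the post is asking.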

So I'm frustrated. In general, if you can't set it, it doesn't keep real time -- it's an interval timer in practice. But I'd dearly love it if Beach could find an offending device and send me the schematic. THEN I could explain what the hell he's referring to.

-- Flint (, April 13, 1999.

RTCs are only some of what Beach was dealing with in his article. The other (more subtle) issue was about 'soft' clocks and clock data structures which may be spawned from ROM upon initialization. The simultaneous processes are what form the RTOS and the program or system which it supports. More tomorrow.

-- David (C.D@I.N), April 13, 1999.


You're confused, and you do this stuff! Now you know how I feel, let alone the average non-geek John Doe. I'm not sure yet that Beach knows what he's talking about, but he did open a can of worms here, and he's got a few people agreeing with him.

My concern with the external clock is that it is being used to synchronize other clocks, and hence may cause synchronized overflows and possible failures. But as you say, I'd sure like to see an example, so we can at least take a shot at figuring out how critical it is. Maybe we should send him an e-mail asking him to be more specific. He did respond pretty quickly when I asked him for a copy of his reply to Paul. <:)=

-- Sysman (, April 13, 1999.

I feel as if I've wandered into the e=mc squared forum...

'Can't wait for the eventual translation!

In the meantime, I'll just go back to scratching my head over a recent article from the "Oil and Gas Journal."

The Oil and Gas Journal

Year 2000 glitch presents problem of unprecedented scope for petroleum industry.

BY Anne Rhodes


As petroleum firms around the world work, both literally and figuratively, against the clock to make their operations Year 2000 compliant, a number of obstacles remain.

Software portfolios, operational control systems, and facilities must be readied for the date change. These tasks involve a huge testing and remediation workload, and they are complicated by a scarcity of both the resources and the authority needed to perform them.

Complicating matters is the necessity of preparing for the ubiquitous threat of legal liability in the event that a company's lack of preparedness causes problems for customers or suppliers.

The Year 2000 problem reaches well beyond information technology (IT). It is a business viability issue. Critical systems must continue to work during the millennium change in order to ensure that operations, and therefore profits, do not cease.

Time is running out. Only 15 (huh?) months remain in which to deal with the Year 2000 issue--even less time, in some cases, as data that contain dates in 2000 and beyond can be introduced into computers and other business systems at any time.


Petroleum companies are at high risk for experiencing a failure of some sort because they use a broad range of software, some of which was designed in-house, and because they operate literally thousands of devices that contain embedded computer chips.


It is unlikely that any firm will be able to find every single line of software code and every microchip that contains date sensitivity. It is therefore necessary that each company find a level of business risk it can live with.


Note: One of the references in this article was the following piece by: Freeman, Leland G., "Year/2000 Black Holes: What You Can't See Can Hurt You," Year/2000 Journal, May/June 1998, p. 75.

'Don't know if this relates to anything. Just passin' it on.

-- FM (, April 13, 1999.

The trouble with Beach's "essay" is that he doesn't use the terminology that someone who knows what they're talking about would use. At best, it's a layman's essay about something he doesn't fully understand.

-- Doomslayer (1@2.3), April 14, 1999.

Embedded systems have requirements which are not reflected in PCs. Most of them run operating systems from ROM. Upon initialization these OSs rebuild themselves in RAM. There can be facilities to 'suspend' the system, kind of like suspended animation, into a low-power operational state. They awake when external circuitry calls to them. The operating systems are typically RTOSs, or Real-Time Operating Systems, and now tend to be object-oriented in nature. It's all machine-language instructions flying around inside the box... very orderly, but not very friendly to people peering inside to figure out what is going on.

RTCs are Real-Time Clocks and tend to be modules on the board which keep things going, including system state, time/date, etc. But the programs which make the 'box' behave in its special way must have data structures which can grab and hold these real-time clock inputs. There is the possibility of using software processes as timers as well. Typically the programs which run the 'box' are crammed into as small a memory space as they can be (or alternately they expand to fill the space available).

If the data structures which handle the date are not created from the outset to be large enough to handle a four-digit year, then you have the exact same problem as typical software. The only difference is that there is often no source code, no technical expertise to reprogram, no equipment to reprogram with, and no memory space to do it in.
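The kind of too-small data structure described above might look something like this. This is purely illustrative -- the field names and layout are invented, not taken from any real device or from Beach's article.

```c
#include <assert.h>
#include <stdint.h>

/* A ROM-resident timestamp that allotted only two digits for the year. */
struct rtc_stamp {
    uint8_t year;   /* two-digit year: 99 means 1999; 00 means... 1900? */
    uint8_t month;
    uint8_t day;
};

/* Firmware that assumes every year belongs to the 1900s computes a
 * negative interval across the rollover -- the classic Y2K failure. */
int years_elapsed(const struct rtc_stamp *then, const struct rtc_stamp *now)
{
    return (int)now->year - (int)then->year;
}
```

With no source code and no spare ROM, a field like this cannot be widened after the fact, which is exactly the bind described above.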

Yes, you can ascertain whether date functions are on the chip set, but can you guarantee that that is all there is?

Look at network routers. They are pure black boxes. They contain tens or hundreds of processors. They run like a swarm of bees in a closed hive. Do you want to tell me how you will reprogram that sucker???

-- David (C.D@I.N), April 14, 1999.

David, Thank you for the great explanation. The more I hear the more I realize that this is such a VAST industry that no one person can possibly be an expert on all of it. Near as I can tell there is an unknowable number of buggy chips (somewhere between 25 million and 17.5 billion) that are being used in every possible application. These chips will fail in an almost infinite variety and severity of ways.

In addition, although many will fail on 1/1/2000, many more will fail at random over the coming decade.

-- R (, April 14, 1999.

A better summary of the Beach article is that someone posing as a microprocessor expert makes an unsupported statement that there are embedded systems which have clocks in them which either cannot be reset and/or cannot be tested and thus they will fail. The numerous posts have shown this is not the case.

Part of the issue here is the assumption that people are working in a vacuum with no outside assistance. If you think about electronic devices, you can lump them into two main categories: products and systems. My differentiation of the two is based on programmability -- systems can be, products cannot. Now programming covers a wide range of things, but I am trying to keep this simple. Your coffee maker is a product. Even though the manual says you can "program it" to turn on at any time, you are not really programming it. You are merely changing the setpoint of when it should turn on. The embedded chips and processors inside are not accessible to you. But they are identical to those in thousands of other coffee makers, and the maker has already tested them in the factory and verified that they are Y2K compliant.

Now, a coffee maker is not going to get anyone upset if it fails nor instill confidence if it doesn't. What about a control valve on a natural gas pipeline? Again, this is a product, not a system, so you simply ask the manufacturer (or check their web page) for Year 2000 compliance information to see if your valve is OK or not. If you have several hundred, you probably will test one or two to confirm what the manufacturer told you, but not every single one. So, even if Beach had a point about embedded systems and secondary clocks hidden in ROM or PROM that could not be accessed or tested by the end user, it doesn't matter, because the device is the same as the day it left the factory and the manufacturer will be able to test it and provide the information needed.

A system is programmable and thus accessible to be tested, so, again, Beach's premise is invalid here. In any kind of programmable system, there are tools available to search for date references and functional calls to date routines to identify the potential problem areas, which can then be reviewed closely to see exactly how the date is used. Just because a system uses a 2-digit year, it is not destined to fail at the rollover. Unless you have an application which does some sort of calculation using the difference between 2 absolute dates, there is no reason for the system to fail. If a program reads the date and puts it in the header of a graphic display or printed report, it doesn't matter whether it says 00, 2000, or 1900. The human operator looking at the display or the manager reading the report knows what year it is. Most of the so-called "Y2K failures" reported are nothing more than these types of cosmetic issues, which can be fixed in due time, before or after the rollover, with no impact to normal system operation.
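The cosmetic-versus-computational distinction drawn above can be sketched like this (hypothetical code; the function names and report format are invented for the example):

```c
#include <assert.h>
#include <stdio.h>
#include <stddef.h>

/* Cosmetic use: a two-digit year in a report header. After rollover it
 * prints "00", which looks odd but breaks nothing downstream. */
void print_header(char *buf, size_t n, int yy, int mm, int dd)
{
    snprintf(buf, n, "DAILY REPORT  %02d/%02d/%02d", mm, dd, yy);
}

/* Computational use: a duration derived from two-digit years. This is
 * the dangerous case -- the result silently goes negative at rollover. */
int days_between_years(int yy_start, int yy_end)
{
    return (yy_end - yy_start) * 365;  /* naive; wrong across 99 -> 00 */
}
```

The first function produces an odd-looking header that a human reads correctly; the second feeds a wrong number into whatever logic consumes it, which is where real failures come from.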

There are lots of systems and applications out there with very real Y2K compliance problems which may indeed cause serious problems if not rectified. That is where everyone's energies and efforts should be focused, not on red herrings such as the Beach report.


-- RMS (, April 14, 1999.

RiverSoma, if that was you on csy2k a ways back, just want to say enjoyed your posts. Glad you're here now.

-- Leska (, April 14, 1999.

Let's walk through an example, intended to clarify some of what's being discussed here. This is only one, very simple example, for illustrative purposes only.

OK, say you go out and purchase a 'doorbell box', a little (hypothetical) box that fits in the palm of your hand. You wire your doorbell lines to terminals on one side of the box, and RS232 (serial port) DC levels come out the other side. You plug this box into the serial port connector in the back of your PC.

So far, it won't do anything, because you need a driver. This is a piece of software that programs the serial port to enable interrupts, reads the serial data when the interrupt occurs, notifies the application of what happened, and passes up the data. (The driver also typically checks for various kinds of errors in the serial port hardware, and might handle multiple types of input, etc.)
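A driver of the sort described might be sketched like this. Everything here is hypothetical -- the status bit, the callback shape, and all names are invented for illustration, not taken from any real doorbell box.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

#define STATUS_ERROR 0x01  /* assumed error bit in the status register */

typedef void (*rx_callback)(uint8_t byte);

static rx_callback app_notify = NULL;  /* the application's handler */
static uint8_t last_byte = 0;

/* Sample application handler: just remember the most recent byte. */
static void record_byte(uint8_t b) { last_byte = b; }

uint8_t last_received(void) { return last_byte; }

void driver_init(void) { app_notify = record_byte; }

/* Runs when the serial port raises a receive interrupt: check the
 * status register for errors, then pass good data up to the app. */
void serial_isr(uint8_t status_reg, uint8_t data_reg)
{
    if (status_reg & STATUS_ERROR)
        return;                /* drop bad bytes; a real driver logs too */
    if (app_notify)
        app_notify(data_reg);  /* notify the application */
}
```

Nothing in this layer touches a date at all -- which is the point of the example that follows: the date handling lives in the application above it.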

In addition to the driver, the 'doorbell box' came with a Windows application. This application installs the driver, displays a picture (icon) of a doorbell on the screen, and waits for a ring. When the doorbell rings, maybe the bell icon changes color and blinks, and the speakers make some noise.

Bingo! Your PC has now become the controller for an embedded system! NOTE that your PC is now PART of an embedded system.

Now, this doorbell application program probably has lots of bells and whistles (so to speak). It might have a monitoring program to give you data on how often the bell rings, and the time/date stamp for every ring it has ever received, and how long it's been since the bell last rang, and so on.

So let's say there's a y2k bug in this application. This bug causes the monitoring program (after rollover) to tell you that the bell hasn't rung for -3,245,682 years, and that on average the bell rings 5,342,111 times per second. And the last time it rang was 1000 years from now!

Is this a noncompliant embedded system? Yes, it is. Is there any hardware problem? Not at all. Are clocks involved? Probably several. Has your embedded system become worthless? Not entirely. The doorbell still rings and the icon still blinks, and the message box telling you which bell has been pressed is still OK. But your history is hosed.

From the user's perspective, this is an embedded system failure. From the engineer's perspective, this is lousy programming unrelated to the device itself. From the remediator's perspective, if the compliant upgrade requires that the 'doorbell box' be replaced because the compliant version of the software only applies to a different box, then it's a hardware issue even if no problem existed in the original hardware per se.

Are you in trouble? That depends entirely on what you were using that doorbell history for.

Now, let's say the box had a clock (and some ROM and a microcontroller) in it, and when you installed the software originally, the application told the driver to tell that microcontroller to sync the box time to the computer time. And let's say the PC handled the rollover properly and the box did not, and this is the cause of the weird history. This is a Beach scenario, I think. Two clocks were involved and you could only set one.

In this case, the solution may be to reinstall the software that came with the package, and you're back to normal, except that your bell history has one very strange data point, and you need to delete that one. Or the solution may be to get a compliant box, because after rollover the ROM in the microcontroller doesn't work right anymore. Of course, the consequences of these problems, and their solutions, are widely variable.

-- Flint (, April 14, 1999.


Nice analogy with the doorbell. It is definitely the way I understood some embedded systems to work. I expect that in many many systems chip failures will cause bizarre little problems. Like having gremlins running loose through your infrastructure.

However the systems which seem most at risk are fail safe systems. The power plant industry is one of the most safety conscious in the world. As are many industries which use toxic chemicals. They may still pollute the planet but they do have regulatory agencies breathing down their necks from time to time checking on the safety of their facilities.

As a result, these industries often have elaborate safety-check and back-up systems. It would make a great deal of sense that a system monitoring events for safety would have a date function. In fact, it would really have to. It would be of monumental importance to know exactly when a possible breach in safety occurred.

It is these safety systems that seem to me to have the greatest potential to make our lives way too exciting. There are thousands of those little chips with the awesome responsibility of initiating shutdown when they detect an anomaly.

The fact is that public and industrial safety has become a huge multi-billion-dollar industry and the technology is comparatively cheap. Most people would prefer not to wipe out their customer base, and also no one wants to get sued. So we are riddled with these fail-safe systems. Even in areas where it is not so critical, the cost of the technology is still a deal if it lowers your insurance premiums and prevents liability.

Another area rife with insurance costs and liability risks is the medical field. So here too, chips are used in safety monitoring applications.

So taking the best-case scenario (25 million buggy chips in mission-critical systems), let's say only 1 million of those are part of safety monitoring systems that have the ability to shut down an entire operation. Let's say only 100 thousand of those actually initiate shutdown. That is still 100,000 mission-critical systems SHUT DOWN!!!

OK, forget 100,000. Let's say it's 10,000. Feel any better? Heck! Let's call it 1000. Imagine that only 1000 buggy chips out of the entire 50 billion in use will shut down a "mission critical system" in America.

Somehow I don't find even those extremely conservative numbers comforting.

-- R (, April 14, 1999.


There is one problem with your scenario. Yes, safety systems do use a clock to pinpoint errors and faults but they use a master clock in the control system, not individual clocks in each sensor or actuator connected to the process. When a valve fails, it does not tell the control system when it failed, just that it failed. The control system reads the input and logs an alarm with a time tag indicating when the fault indication was received.
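The arrangement described above can be sketched as follows (illustrative only; the tag names and structure are invented): the field device reports only a fault bit, and the control system attaches the time tag from its own master clock when it logs the alarm.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct alarm_entry {
    char     tag[16];      /* device identifier, e.g. a valve tag */
    uint32_t master_time;  /* master-clock seconds when input was read */
};

#define MAX_ALARMS 64
static struct alarm_entry alarm_log[MAX_ALARMS];
static int alarm_count = 0;

/* The field device reports only "I failed"; the time tag comes from
 * the control system's master clock, not from the device itself. */
int log_fault(const char *device_tag, uint32_t master_clock_now)
{
    if (alarm_count >= MAX_ALARMS)
        return -1;  /* log full */
    strncpy(alarm_log[alarm_count].tag, device_tag,
            sizeof alarm_log[alarm_count].tag - 1);
    alarm_log[alarm_count].tag[sizeof alarm_log[alarm_count].tag - 1] = '\0';
    alarm_log[alarm_count].master_time = master_clock_now;
    return alarm_count++;
}

uint32_t alarm_time(int index) { return alarm_log[index].master_time; }
```

In a design like this, only the one master clock needs to be Y2K-correct for the alarm history to make sense; the sensors and actuators carry no dates to get wrong.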


-- RMS (, April 14, 1999.


While some systems work in the manner you describe, quite a few use the consistency between the microchip clock and the control clock as a way of detecting anomalies. Particularly in cases which include a parts/systems servicing schedule as part of a safety routine.

These functions are more obvious in medical equipment but are just as useful to lawyers in any industry. Any time you have liability involved, you want date functions.

-- R (, April 14, 1999.


But now we are talking about something completely different. Scheduled service is handled differently than on-line fault detection. If a device has an internal timestamp to remember its last calibration interval, then that could generate a false alarm indicating that calibration or servicing was past due. But it would not trigger a shutdown just because the service level had been exceeded. In these cases, the scheduled dates or service alarms are used to generate work orders for the maintenance department. If you are talking about liability issues, they could also be used to document that proper maintenance procedures (according to the device manufacturer) have been followed. But in no case will a device shut itself off or trigger a partial or complete system shutdown just because it thinks it needs servicing.


-- RMS (, April 14, 1999.

RMS, While some systems do work in the manner you describe, many others do not. This has been the main issue in failed Y2K testing: fail-safe systems which initiated plant-wide shutdowns when an anomaly was detected. It sounds like the systems you are describing may make it just fine through the rollover. That is, if the plant can still function with whatever other buggy ware is rambling around its bowels.

-- R (, April 15, 1999.

Like I said before, an anomaly between the date in a final control device and the master system clock will NOT cause a control system to initiate a shutdown sequence. If you have evidence otherwise, I would love to see it.

-- RMS (, April 15, 1999.
