Bugs Found in Products Using Intel Chipset

greenspun.com : LUSENET : TimeBomb 2000 (Y2000) : One Thread

Wonder what's causing the Pentium III shortage?

Michael Kanellos CNET News.com 2/22/2000

Intel executives can be forgiven for wondering, what next?

The Santa Clara, Calif., chipmaking giant has discovered a bug that affects some server and workstation computers incorporating recently released Intel chipsets, just when it seemed like the company was digging out from a series of manufacturing snafus in 1999. Though the glitch occurs somewhat rarely, three circuit board ("motherboard") designs have been canceled in response.

Intel is moving to correct the error and will work with computer makers to resolve any current product issues, according to spokesman Dan Francisco.

Still, the bug is not likely to sit well with hardware manufacturers, who had to endure shortages and product delays last year. The shortcoming also could prove a boon to chipset start-up ServerWorks, which makes components that compete with the problematic Intel parts.

Some server and workstation makers are experiencing data corruption errors with systems containing Intel's high-end 840 and 820 chipsets and also one of two ancillary chips, the Memory Repeater Hub (MRH) or the Memory Translator Hub (MTH). The latter two chips are actually to blame, Intel said.

The two chipsets were designed to "talk" to an advanced memory design called Rambus, and are typically put into computers that also contain Rambus memory. Rambus, however, has been at the center of controversy because of its higher price, and manufacturers have resisted using the design despite Intel's endorsement. The MRH, which is paired with the 840, and MTH, which comes with the 820, essentially let computer makers use standard computer memory because the chips can take signals from standard memory and translate them to Rambus signals for the chipsets.

Further, the error only occurs when a system also incorporates Error Correction Code (ECC) technology. Server makers generally adopt ECC, which helps prevent data from becoming corrupted as it shuttles between the processor and other components, while workstation vendors occasionally use it.

The number of systems affected by the bug is limited, asserted Intel's Francisco. The majority of customers that have adopted the 820 and 840 chipsets are using Rambus memory, and therefore aren't using the translator chips. In addition, "the majority of the 820 customers do not enable ECC," he said.

Intel said it will correct the defect the next time it revises its manufacturing process, in the next "spin," to use industry parlance.

Nonetheless, some will likely be unhappy. Hewlett-Packard is one high-profile company that makes products within the danger zone. HP and others also had to deal with last year's problems with the 820 and 840 chipsets, which were delayed more than once for different reasons, forcing manufacturers to alter their road maps.

Intel also had difficulties manufacturing adequate volumes of the high-end "Coppermine" Pentium IIIs. The shortage continues to linger, although the company maintains that the end is in sight. "In Q1 we will catch up on everything," said Pat Gelsinger, vice president of Intel's desktop products group.

While painful for Intel, ServerWorks may benefit. The start-up specializes in chipsets for Intel-based servers. The key difference is that the company's chipsets don't speak Rambus. Instead, they are designed to work with standard memory and don't require a translator chip.


-- Carl Jenkins (Somewherepress@aol.com), February 22, 2000


Don't look up, Carl. L.L. from Hell has been unleashed again. By the way, I thank you for all the information you have provided and continue to provide. Thank you very much.

-- Instant (Bre@kfast.com), February 22, 2000.

"generally adopt ECC, which helps prevent data from becoming corrupted"

Yup, I'd say that's true. Doesn't sound too good with:

"experiencing data corruption errors"

So a user spends big bucks on a top-of-the-line machine, ECC memory, and RAID disks just to protect his data, and gets hit with this? Boy, I wouldn't be a very happy camper. Ain't technology grand?

I heard a story years ago about ECC memory. Don't know if it's true or an urban myth. You decide:

ECC memory has been used on big mainframes for decades. It is pretty nice, because it keeps enough extra bits with each "word" that it is able to "correct" any single-bit error, including a "parity bit" error. "Multi-bit" errors are still "fatal," but these "hardly ever" occur.

The story goes that in the early days of "big" mainframes, those with 1 Meg. or more of main memory (yes, 1 meg.), IBM had trouble keeping all of the memory running for more than a few hours at a time. A "single bit" memory error would stop the multi-million $$$ box cold in its tracks.

So they "invented" ECC memory in order to fix these "single bit" errors on the fly. And since "multi bit" memory errors did in fact turn out to be very rare, the solution worked well.
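To make the "correct any single bit, but treat multi-bit errors as fatal" idea concrete, here is a toy Python sketch of an extended Hamming (8,4) code, the textbook SECDED (single-error-correct, double-error-detect) construction. It is an illustration under stated assumptions, not IBM's actual design: real memory ECC protects much wider words (commonly 64 data bits plus 8 check bits), and the names `encode` and `decode` are hypothetical, made up for this example.

```python
from __future__ import annotations


def encode(nibble: int) -> list[int]:
    """Encode 4 data bits as an 8-bit SECDED codeword (list of 0/1 bits)."""
    d = [(nibble >> i) & 1 for i in range(4)]  # d[0] = least significant data bit
    c = [0] * 8
    # Data bits live at the non-power-of-two positions 3, 5, 6, 7.
    c[3], c[5], c[6], c[7] = d
    # Check bit at position 2**k covers every position whose index has bit k set.
    c[1] = c[3] ^ c[5] ^ c[7]
    c[2] = c[3] ^ c[6] ^ c[7]
    c[4] = c[5] ^ c[6] ^ c[7]
    # Position 0 is an overall parity bit that makes the whole word even.
    c[0] = sum(c[1:]) % 2
    return c


def decode(word: list[int]) -> tuple[str, int | None]:
    """Return (status, nibble); status is "ok", "corrected" or "uncorrectable"."""
    c = list(word)  # work on a copy so the caller's word is untouched
    # XOR of the indices of all 1 bits in positions 1..7: 0 for a valid
    # codeword; after exactly one flipped bit it equals that bit's index.
    syndrome = 0
    for pos in range(1, 8):
        if c[pos]:
            syndrome ^= pos
    odd = sum(c) % 2  # 1 means an odd number of bits flipped
    if syndrome == 0 and odd == 0:
        status = "ok"
    elif odd == 1:
        # Assume exactly one flip and undo it (syndrome 0 means bit 0 itself).
        c[syndrome] ^= 1
        status = "corrected"
    else:
        # Even number of flips but a nonzero syndrome: two bits are bad.
        return "uncorrectable", None
    nibble = c[3] | (c[5] << 1) | (c[6] << 2) | (c[7] << 3)
    return status, nibble


if __name__ == "__main__":
    word = encode(0b1011)
    one = list(word)
    one[6] ^= 1
    print(decode(one))   # ('corrected', 11)
    two = list(word)
    two[2] ^= 1
    two[5] ^= 1
    print(decode(two))   # ('uncorrectable', None)
```

Flipping any one bit of a codeword gives back the original data with status "corrected"; flipping any two gives "uncorrectable", which is the point where a real machine would stop rather than hand back bad data. Three or more flips can be silently mis-corrected, which is why it matters that multi-bit errors are rare.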

"Data Integrity" <:)=

-- Sysman (y2kboard@yahoo.com), February 22, 2000.

Stay with Pentium II or AMD. Pentium III has a "spy ID number" burned into each chip, so your machine can be identified remotely.

-- A (A@isA.com), February 23, 2000.

Thanks for the information and the comments

-- Carl Jenkins (Somewherepress@aol.com), February 23, 2000.

Various things.

CNET are spouting out of the wrong orifice. It's old news (2 weeks or 2 months depending on which of two bugs they are referring to). It affects Intel's processor support chipsets. They have not been "cancelled", merely put on hold until the problem is fixed. Other companies such as Via also make chipsets. Intel's bigger problem is that they can't make enough fast Pentium IIIs and are rationing supply; AMD are slaughtering them on price now. Check out http://www.theregister.co.uk for accurate, timely reporting of the silicon business.

ECC protects you against one-bit errors within the DRAM chips. Quite apart from faults, there is an ineradicable minimum level of memory errors caused by cosmic rays passing through the DRAMs and transiently short-circuiting 1 bits to 0 bits. About one bit per fortnight per chip; rare, but how much is your data worth? The ideas of ECC and parity go WAY back. Check digits were in use in the days of manual (or abacus-based) book-keeping, and the maths of ECC codes predates computers.
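As one example of checking numbers without a computer, "casting out nines" verifies a hand-added column total by comparing digit sums, a check on arithmetic that long predates electronic machines. The Python below is a hypothetical sketch (the function names are invented for illustration); it is a simpler relative of parity rather than an ECC code, since it can detect many slips but corrects nothing.

```python
from __future__ import annotations


def nines_residue(n: int) -> int:
    """Repeatedly sum the decimal digits of a non-negative n, as done by hand.

    The result is n mod 9 (a final 9 is read as 0, i.e. "divisible by nine").
    """
    while n >= 10:
        n = sum(int(ch) for ch in str(n))
    return n % 9


def total_passes(items: list[int], claimed_total: int) -> bool:
    """True if claimed_total agrees with the items under casting out nines."""
    expected = nines_residue(sum(nines_residue(x) for x in items))
    return expected == nines_residue(claimed_total)


if __name__ == "__main__":
    column = [1234, 5678, 910]                 # true total is 7822
    print(total_passes(column, 7822))          # True: correct total
    print(total_passes(column, 7832))          # False: one digit miscopied
    print(total_passes(column, 7282))          # True: swapped digits slip through
```

A single wrong digit changes the residue and is caught (unless a 0 and a 9 are confused), but swapping two digits never changes a digit sum, so transpositions slip through. Weighted check-digit schemes, such as the mod-11 digit used in ISBN-10, do catch adjacent transpositions.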

ECC does absolutely nothing to protect you against faulty logic in the processor or memory controller. Some old computer designs used to put parity checks on internal data busses, but AFAIK no microprocessors do. Disk buses such as SCSI still do, so a faulty disk cable can't scramble all the data passing down it without that being noticed. And ECC codes are heavily used for storage and retrieval of the data on the disk's platters themselves. You probably couldn't make a disk work without them.
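The point that a check code only protects the path between where the check bits are generated and where they are verified can be shown with a toy model. The sketch below assumes one odd-parity bit per byte, as on the parallel SCSI data bus; `send`, `receive` and `flip_bit` are hypothetical names for illustration, not any real driver API.

```python
from __future__ import annotations

from typing import Callable, Optional


def odd_parity_bit(byte: int) -> int:
    """Parity bit that makes the count of 1s in (byte + parity bit) odd."""
    return 1 - bin(byte & 0xFF).count("1") % 2


def flip_bit(byte: int, bit: int) -> int:
    """Simulate a single-bit fault on one data line."""
    return byte ^ (1 << bit)


def send(byte: int, logic_fault: Optional[Callable[[int], int]] = None) -> tuple[int, int]:
    """Drive a byte onto the bus together with its parity bit.

    logic_fault models buggy logic *upstream* of the parity generator: it
    corrupts the byte first, and parity is then computed over the bad value.
    """
    if logic_fault is not None:
        byte = logic_fault(byte)
    return byte, odd_parity_bit(byte)


def receive(byte: int, parity: int) -> bool:
    """True if the parity check passes (no error detected)."""
    return odd_parity_bit(byte) == parity


if __name__ == "__main__":
    data = 0x5A
    # Cable fault: a bit flips *after* parity was generated, so it is caught.
    b, p = send(data)
    print(receive(flip_bit(b, 3), p))           # False
    # Logic fault: the sender corrupts the byte, then generates parity: missed.
    b, p = send(data, logic_fault=lambda x: flip_bit(x, 3))
    print(receive(b, p), b == data)             # True False
```

The same reasoning applies to ECC memory: the check bits are generated and checked by the memory controller, so anything that corrupts data before it is encoded, or after it is checked, is outside what ECC can see.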

-- Nigel (nra@maxwell.ph.kcl.ac.uk), February 24, 2000.
