Last week's mini-Y2K: What went wrong?

greenspun.com : LUSENET : Y2K discussion group : One Thread

COMMENTARY--The Y2K Bug was so-called, of course, because it triggered on a date. What made it so dangerous was that it was in very old code--written when the trigger date was so far in the future that it couldn't possibly be a risk.

Last week saw two other date-sensitive events that seemed to cause widespread disruption. Among the problems: users of a Microsoft Navision Axapta ERP system saw response times soar; online banks in Singapore reportedly went offline, or at least refused to do any banking; some Java applications crashed; Norton AntiVirus had a breakdown; and at least one person thought he had been fired by e-mail.

The problems stemmed from VeriSign's certificate business. Certificates are arguably as crucial to the sound working of the Internet as are notions of date and time. Without certificates we would have no way of knowing that the site we are using is secure, or even its true identity.

That little yellow padlock that appears at the bottom of your browser every time you access a secure part of a Web site hides a wealth of information, and in particular the certificate path. Indeed, the only way we can trust the certificate itself is by knowing its genealogy. Who issued it? How do we know that the issuer is trustworthy? Who issued the certificate that says the issuer itself can be trusted?
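The genealogy questions above amount to following issuer links upward until we reach a certificate we already trust. What follows is a minimal sketch in Python of that walk, using a toy in-memory model rather than real X.509 parsing. The `Cert` class, the `build_path` function, and every name in the example are hypothetical, and no cryptographic signatures are checked here.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Cert:
    subject: str  # who the certificate identifies
    issuer: str   # who vouches for it (same as subject for a self-signed root)


def build_path(leaf: str, certs: Dict[str, Cert], trusted_roots: Set[str]) -> List[str]:
    """Follow issuer links from `leaf` up to a trusted root.

    Returns the subjects on the path, leaf first.
    Raises ValueError if the chain is broken, loops, or ends at an untrusted root.
    """
    path: List[str] = []
    current = leaf
    while True:
        if current in path:
            raise ValueError(f"loop in certificate path at {current!r}")
        cert = certs.get(current)
        if cert is None:
            raise ValueError(f"no certificate found for {current!r}")
        path.append(current)
        if cert.issuer == cert.subject:  # self-signed: top of the chain
            if current not in trusted_roots:
                raise ValueError(f"root {current!r} is not trusted")
            return path
        current = cert.issuer


if __name__ == "__main__":
    # Hypothetical three-level chain: site -> intermediate -> root.
    demo = {
        "shop.example": Cert("shop.example", "Example Intermediate CA"),
        "Example Intermediate CA": Cert("Example Intermediate CA", "Example Root CA"),
        "Example Root CA": Cert("Example Root CA", "Example Root CA"),
    }
    print(build_path("shop.example", demo, {"Example Root CA"}))
```

The point of the sketch is that trust is only as good as the top of the path: if the root is missing from the trusted set, or any link is absent, the whole path is rejected.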

These are important questions, and we rely on the answers for certainty and peace of mind that the applications and Web sites we are using can be trusted. Suppose, for instance, you use antivirus software which every so often downloads a new batch of virus signatures. You want to be absolutely confident that these virus signatures really do come from the antivirus software publishing company which they purport to come from, and just as sure that they have not been tampered with on their way to you. Certificates hold the answer.

Last week, we saw what can happen when that chain of trust, from the certificate issuing authority, through to the software publisher (or Web site) and over the Internet to our servers and PCs, breaks down.

On Wednesday morning, when Symantec's Norton AntiVirus product--installed on thousands if not millions of PCs--trundled off across the Internet to pick up the latest load of virus signatures, it came back behaving in a distinctly odd manner. Users reported instances of their PCs locking up or slowing down so much as to be unusable; Symantec itself said that Microsoft Word and Excel were refusing to start.

Elsewhere on the Internet at about the same time, users began noticing other strange behavior.

"I had my Outlook crash for no good reason," wrote one ZDNet UK reader. "There was a cryptic message saying that my credentials were no longer valid, which is pretty scary in a corporate environment. You don't know if you have a virus or are being laid off!" To add insult to injury, this correspondent found himself locked out of his company's Microsoft Navision ERP system, and unable to log in to his online bank. Other correspondents found their Java applications simply failed, and yet more found themselves presented with odd error messages when trying to access secure areas of their own corporate Web sites or other e-commerce sites.

There seem to have been two distinct problems, both of which were date-related, both of which occurred on Jan. 7, 2004, and both of which should have been flagged far more prominently than they were.

VeriSign, which according to its own figures issues some 25 percent of digital certificates in Europe and indeed a good number worldwide, holds the key to both problems.

The Norton AntiVirus problem was caused, says VeriSign, by the expiration of a certificate revocation list (CRL) called Class3SoftwarePublishers.crl, on Jan. 7. As applications--including Norton AntiVirus--attempted to check the list so they could verify that the certificates they were checking were still valid, they got little help, and so tried again. The effect of all those copies of antivirus software--and other applications that we didn't hear about--repeatedly checking whether the certificates were valid, was to increase traffic to VeriSign's CRL server one-hundredfold. In effect, VeriSign suffered from a self-inflicted denial of service attack.
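The retry storm described above is a familiar failure mode: many clients notice the same expiry at the same moment and all retry immediately. As a rough sketch only, and not a description of how Norton AntiVirus or Windows actually behaved, a client could treat a CRL as stale once its "next update" time has passed and space out its retries with capped exponential backoff plus random jitter. The function names and default values below are hypothetical.

```python
import random
from datetime import datetime
from typing import Optional


def crl_is_fresh(next_update: datetime, now: datetime) -> bool:
    """A CRL states when its replacement is due; from that time on it is stale."""
    return now < next_update


def retry_delay(attempt: int, base: float = 1.0, cap: float = 3600.0,
                rng: Optional[random.Random] = None) -> float:
    """Seconds to wait before retry number `attempt` (0-based).

    The ceiling doubles with each attempt up to `cap`, and the actual delay
    is drawn uniformly below that ceiling so that many clients do not all
    retry at the same instant.
    """
    rng = rng or random.Random()
    # Clamp the exponent so very large attempt counts cannot overflow a float.
    ceiling = min(cap, base * (2 ** min(attempt, 30)))
    return rng.uniform(0.0, ceiling)


if __name__ == "__main__":
    # Illustrative dates only: a list that was due for replacement on Jan. 7.
    next_update = datetime(2004, 1, 7, 0, 0)
    now = datetime(2004, 1, 7, 9, 30)
    if not crl_is_fresh(next_update, now):
        for attempt in range(5):
            print(f"retry {attempt}: wait up to {min(3600.0, 2.0 ** attempt):.0f}s, "
                  f"chosen {retry_delay(attempt):.2f}s")
```

With a cap and jitter, a server that is briefly unavailable sees retries spread out over time instead of arriving in a single wave.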

It is arguable that had Norton AntiVirus been designed better, this denial of service attack on VeriSign's servers would not then have backfired and stalled PCs across the world, but there is not space to get into that debate here.

The second problem was again caused by an expiration date, this time that of one of VeriSign's own root certificates (also known as root certificate authorities). Root certificates are the parents of those certificates used to sign secure Web sites and other code. Normally this all works fine, but if a root certificate expires then, in effect, so do all its children. The result is that any code requiring a certificate to run or to prove its authenticity cannot do so if the parent of the code's certificate has lapsed.
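The parent-and-child relationship can be pictured as a simple rule: a chain is usable at a given moment only if every certificate in it is within its validity period at that moment. Here is a simplified sketch of that rule in Python. The chain layout, the `first_invalid` helper, and all names and dates in the example are hypothetical, and real validators apply many more checks.

```python
from datetime import datetime
from typing import List, Optional, Tuple

# Each entry is (name, not_before, not_after); leaf first, topmost parent last.
Chain = List[Tuple[str, datetime, datetime]]


def first_invalid(chain: Chain, at: datetime) -> Optional[str]:
    """Return the name of the first certificate not valid at time `at`, or None."""
    for name, not_before, not_after in chain:
        if not (not_before <= at <= not_after):
            return name
    return None


if __name__ == "__main__":
    # Hypothetical chain: the child is in date, but its parent has lapsed.
    chain = [
        ("signed-app.example", datetime(2003, 10, 1), datetime(2004, 10, 1)),
        ("Example Parent CA", datetime(1997, 1, 1), datetime(2004, 1, 7)),
    ]
    print(first_invalid(chain, datetime(2004, 1, 6)))  # whole chain still in date
    print(first_invalid(chain, datetime(2004, 1, 8)))  # parent has expired
```

In this model the child certificate has months left to run, yet the chain fails the day after the parent lapses, which is the situation the correspondents below describe.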

At first glance, the idea of a VeriSign root certificate expiring sounds laughable--comparable even to Microsoft letting its .com domain name lapse--but in this case VeriSign knew that the root certificate was to be pensioned off. The company says that all global server IDs issued since December 2001 have carried a new root certificate, and that it has been providing instructions on how to install it manually. Obviously it is in VeriSign's best interest to ensure that its customers are using the latest (valid) certificate authority, just as it is in its interest to ensure that the CRL server is accessible at all times, whatever the level of traffic. However, e-mails we have received suggest that the company didn't do enough.

"I purchased my Java Code Signed certificates from Verisign in October 2003," wrote one correspondent. "There were no warnings I received indicating any action on my part was necessary.

"Additionally, it was more than error messages for users trying to access secure areas, JAVA applications that relied on these Verisign Code Signed Certificates simply failed." Another wrote: "Two of three certificates we purchased in 2003 had this problem. Neither my network admin nor I were ever notified by [VeriSign], and since the SSL information is sent via e-mail, they obviously had our e-mail addresses."

VeriSign says that the expiration of the Certificate Revocation List was unrelated to the expiration of the root certificate. There is no reason to doubt that. However, the company could and should have done more to warn its customers and the Internet community at large of both issues. On Friday, VeriSign was notably reticent about the issue, only posting an advisory to address the Norton AntiVirus issue late in the day. Indeed the only forewarning I'm aware of came via Cryptonomicon, who noticed an incidental entry on Jupiter Research's Microsoftmonitor Weblog by senior analyst Joe Wilcox.

In his blog, Wilcox indicated that this may not be the first time we've seen such problems in recent history, pointing to the problems experienced by Microsoft SharePoint customers back in November 2003. According to Microsoft, the problem that affected installations of SharePoint Services on Nov. 24 was due to "code that verifies the signatures of the dynamic-link libraries (DLL) that are installed with Windows SharePoint Services." At the time Microsoft said this was due to an error in the verification algorithm that did not permit the signatures of the DLLs to be verified, but as Wilcox noted after some poking around, certificates issued to Microsoft for code signing expired on Nov. 24, exactly the same time as SharePoint Services decided it no longer wanted to be installed.

Obviously, Microsoft, Symantec, Java writers, online banks, and managers of ERP systems all need to be more proactive about certificates. I'd like to imagine that both Microsoft and Symantec have learnt their lessons. But the onus must really fall on those organizations who want to be the most trusted of the trusted. Remember, nobody issues certificates to VeriSign--it issues its own, indicating that we should trust it implicitly. Trust is something that has to be earned, and the only way to do that is to be open.

Next time, VeriSign needs to be even more proactive, and work with its partners and customers ahead of time. Just like we all did with Y2K.



-- Anonymous, January 14, 2004

Answers

VeriSign at fault in Norton glitch

Symantec on Friday blamed VeriSign for problems with its security software products that left users' PCs unresponsive and unstable.

The problems caused a flurry of angry posts to support forums from users saying they would ditch Symantec's Norton AntiVirus. Some users of the Norton products reported that their PCs locked up or slowed down after downloading the latest virus definitions on Wednesday and Thursday. Symantec itself reported that "after January 7, 2004, your computer slows down, and Microsoft Word and Excel will not start."

But the glitch is not down to Norton AntiVirus, according to Symantec. The Cupertino, Calif.-based company said in a statement on its Web site that the problem "appears to be related to VeriSign receiving an unusual number of requests by Windows-based clients to download a certificate revocation list (CRL) on January 7-8, 2004. This increase in traffic resulted in intermittent VeriSign CRL server availability."

Norton AntiVirus products routinely verify the integrity of system components that use VeriSign-issued certificates. Neither Mountain View, Calif.-based VeriSign nor Symantec could immediately explain the exact sequence of events, but according to the statement on the security software maker's Web site, copies of Norton AntiVirus installed on PCs were unable to achieve the authentication they required because of the unavailability of VeriSign's server. "Therefore, customers experienced delays and instabilities," Symantec said.

Hinting that it was not the only company whose products were affected, Symantec said it "and other vendors" were "cooperatively working with VeriSign to mitigate this situation."

Symantec issued a quick fix for the problem. The fix involves deselecting the option to check for a publisher's certificate revocation in the Internet Explorer browser.

Despite Symantec's protests that it is not to blame, the episode may have created bad publicity for its Norton AntiVirus product. "I am now strongly tempted to trash Norton AV in favour of something more user-friendly and which doesn't slow down the opening of every damned thing in sight!" one poster wrote. "I have been having 16-plus-second delays if I right-clicked on anything--even after a system reboot," another wrote. "I am not happy and have installed Sophos, instead." This individual also expressed discontent with Sophos, "as updates seem incredibly confusing...I shall now try McAfee."

Late on Friday, VeriSign posted an explanation on its Web site, saying the problem with the Certificate Revocation List, which affected Norton AntiVirus, was not connected to an Intermediate Certificate Authority expiration issue, which caused problems for secure Web sites at about the same time last week.

The company said requests to its server suddenly increased a hundredfold, as a result of Windows clients trying to download the CRL. "We immediately took steps to increase capacity and determine the root cause," VeriSign said. It said that within 24 hours, it had increased capacity on crl.verisign.com tenfold to handle the increased request load.

"VeriSign regrets any inconvenience that may have resulted from this period of increased demand," the company said in its online statement. "In addition to increasing capacity, VeriSign has made certain modifications to the CRL distribution logic to more effectively handle subsequent widescale CRL downloads and continues to work with those that may have experienced response delays as a result of the increased demand. We also continue to work with industry leaders, partners and the technical community to encourage promulgation of the use of alternative validity determination mechanisms, such as the online certificate status protocol, which may be less susceptible to these kinds of periodic events."

C/NET

-- Anonymous, January 14, 2004

