Even in the computer world, sometimes the cure can be worse than the disease.
Networking Services and Information Technologies (NSIT) learned this lesson when it discovered that the anti-virus software intended to prevent a dangerous influx of corrupted e-mails was in fact the cause of Thursday's service interruption.
Bob Bartlett, director of Enterprise Network Services and network security director of NSIT, said that on Friday NSIT diagnosed the anti-virus software as the apparent cause of the shutdown. The software has blocked approximately 1.4 million viral messages since September 2003, Bartlett said, but last week it encountered a viral message of a kind it had not previously processed, triggering a type of bug known as a "memory leak."
As a result, the filtering program kept claiming memory without releasing it on Plaisance, the machine that allows members of the University community to read their mail.
"Eventually, so much memory was taken up that the system could no longer function, and we could not submit commands to the system to either observe the reason for the system problems or shut the system down in a way that would ensure that data was not lost," Bartlett said. "As a result, we had to handle the situation much more slowly than we would have liked."
Because of NSIT's cautious approach, the mail system was unavailable for several hours while officials and engineers worked to regain control of it. Bartlett said that no inboxes were lost and that fewer than 30 of about 20,000 showed any indication of a problem.
Bartlett added that NSIT is taking several precautions against future service interruptions.
"First, the software that was being used has been reviewed and fixed. Second, we have been looking at other anti-virus software. And lastly, we are in the process of redesigning the mail system," said Bartlett. "In addition to an architecture designed by the systems staff, we are investigating several commercial options."
Bartlett said a vendor would install a new system on campus around March 22, and NSIT would test it as a possible replacement for the current system. Bartlett noted that both systems are designed to minimize the potential damage from an incident like the one that occurred last Thursday.
Greg Jackson, vice president and chief information officer at NSIT, said he was not sure such incidents could be completely prevented in the future, so long as "idiots" were sending around viruses. "The clear thing we can do is learn from things like this," he said. "The question is: Do you let viruses take down the University or do you act? You most definitely act."