Unplanned Outage – Email – January 28, 2012

We have a report of e-mail trouble. We are investigating.

UPDATE 10:05am: E-mail is "up" but running extremely slowly, leading to time-out problems. We are continuing to work the issue.

UPDATE 11:08am: We are repairing the e-mail system's database. The system will continue to be slow for a while. The problem appears to be a latent fault introduced when our previous storage system failed on January 12, 2012. That hardware has since been replaced, but a flaw in the data had migrated to the new system.

UPDATE 11:48am: E-mail service is actually down at the moment, while data is being repaired. We are working to restore service ASAP.

UPDATE 14:00: E-mail service is running again, but still very slow. It may be turned off again at some point, as troubleshooting is still underway.

UPDATE 16:00: E-mail service has been restored. We have located a source of delays in our storage and eliminated it for the time being. Our apologies for the extended inconvenience.

Unplanned Outage – January 16, 2012

We are still having trouble with the storage array that hosts our virtual infrastructure. We had a short outage on the CS websites but those sites are all back up again as of 12:35pm. Here is a list of things that are currently down:

1) The submission server that runs the check submit script on dropbox.cs
2) opus – the public server

We will post updates to this page as more information becomes available.

UPDATE 1:32pm: The submission server that runs the check submit script on dropbox.cs is now online again.

UPDATE 1:33pm: opus is now online again.

E-mail / Web – Unplanned Outage – January 12, 2012

As of 7:00am, E-mail and the main web pages are down. We are investigating.

UPDATE 8:05am: Problem is isolated to an old storage server that is used by our virtual machine servers.

UPDATE 8:35am: Storage server is coming back online. After that, all the virtual machines will need a clean restart.

UPDATE 8:45am: Storage server is still having trouble. No ETA yet.

UPDATE 9:35am: We are waiting for a return call from the storage server vendor.

UPDATE 10:05am: Systems are beginning to come back online. Simultaneously, the storage system is rebuilding and remirroring the data. During this time, we expect that systems will be slower. We are also working to stabilize the systems.

UPDATE 11:20am: Systems are basically online. Storage system rebuilding (and associated slowness) continues. E-mail was delayed, but none was lost.

Downtime: Tue, December 20, 2011

On Tuesday, December 20, 2011, we will have a scheduled downtime from 4:00am to 8:00am EST.

This downtime affects all users of the department's computing and networking infrastructure.

Scheduled work includes:

  • Moving several infrastructure file systems to our new storage system. This requires a configuration change to most of our servers that can only be done when the systems are idle.

Downtime: Saturday, October 29, 2011

On Saturday, October 29, 2011, OIT will have a scheduled network outage from 5:00am to 11:00am EDT.

OIT will be replacing a core component of the campus network infrastructure the morning of Saturday, October 29, 2011, from 5:00am to 11:00am. This is the first Saturday of fall recess.

During the work, the CS network will be disconnected from the outside world. If you are in the CS building during this time, you will be able to access local CS resources. E-mail in or out of the department will be queued and delivered when connectivity is restored.

For details, see the OIT posting.

Downtime: Tue, November 1, 2011

On Tuesday, November 1, 2011, we will have a scheduled downtime from 4:00am to 8:00am EDT.

This downtime affects all users of the department's computing and networking infrastructure.

Scheduled work includes:

  • Security/bug-fix updates to Zimbra E-mail systems
  • Security/bug-fix/configuration updates to Apache/PHP web servers
  • Minor configuration update to support connection to new HPCRC facility
  • Minor configuration update to facilitate ongoing transition to new file server

During some of the 4am-8am window, most services (e.g., E-mail, web, cycle servers) will be unavailable. E-mail destined for the department will be queued and then delivered at the end of the maintenance window.

IMAP / Webmail Outage – Unplanned – 2011/08/26

Due to a hardware problem with our VMware infrastructure, the IMAP and Webmail service for the department is currently offline. We are actively working to remedy the situation, but do not presently have an ETA for a fix. More information will be posted as we have it.

Update (23:54): The broken host has been successfully removed from operation, and things should be working on another host now. IMAP and Webmail service should be restored in the next few minutes.

Update (00:00): IMAP and Webmail service have been restored. It may take a short while for queued e-mails to be delivered, but no messages should be lost.
