Princeton CS Status

Brief Emergency Downtime Today, July 2, 2009

All / scott

To apply a critical security patch, we will bring down the e-mail server for a few minutes at 1:30pm today.

The downtime is expected to be roughly 5 minutes.

Update: 1:36pm – E-mail is back up.


Summer 2009 Downtime Schedule

All / scott

Here is the maintenance schedule for the summer:

Tue, Jun 16, 2009, 4:00am-8:00am
Tue, Jun 30, 2009, 4:00am-8:00am
Tue, Jul 14, 2009, 4:00am-8:00am
Tue, Jul 28, 2009, 4:00am-8:00am
Tue, Aug 11, 2009, 4:00am-8:00am
Tue, Aug 25, 2009, 4:00am-8:00am

During these times we will be performing a variety of update, installation, and maintenance tasks.
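
The dates listed above fall on every other Tuesday. As a rough illustration only (the function name `maintenance_windows` and the 14-day step are assumptions inferred from the listed dates, not an official tool), the schedule can be regenerated like this:

```python
from datetime import date, timedelta


def maintenance_windows(first, last, step_days=14):
    """Return every date from first to last (inclusive), step_days apart.

    Hypothetical helper: the 14-day spacing is inferred from the posted dates.
    """
    days = []
    current = first
    while current <= last:
        days.append(current)
        current += timedelta(days=step_days)
    return days


# First and last dates are taken from the posted summer schedule.
windows = maintenance_windows(date(2009, 6, 16), date(2009, 8, 25))
for day in windows:
    # The 4:00am-8:00am window is the one given in the schedule.
    print(day.strftime("%a, %b %d, %Y") + ", 4:00am-8:00am")
```

If the pattern holds, the helper returns six dates, each a Tuesday.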


Outgoing Mail is Down

Because our mail server was sending large amounts of spam, we have temporarily turned off the SMTP portion of the system while we identify, isolate, and correct the problem. As a result, outgoing mail is disabled. Users can still receive mail and create draft messages, but drafts must be sent manually after the SMTP server is re-enabled.

Update 9:07am, April 3, 2009:
Outgoing mail is re-enabled. Downtime was approximately 10 minutes.

Downtime: Tuesday, March 17, 2009

All / scott

On Tuesday, March 17, 2009, we will have a scheduled downtime from 4:00am to 8:00am EDT.

This downtime affects all users of the department's computing and networking infrastructure.

Scheduled work includes:

Moving data in user home directories to a new disk array
Updating the software for our e-mail system
Rebooting the c2 cluster to pick up some recent configuration changes

This work is part of normal maintenance.


Downtime: Tuesday, January 27, 2009

All / scott

On Tuesday, January 27, 2009, we will have a scheduled downtime from 4:00am to 8:00am EST.

This downtime affects all users of the department's computing and networking infrastructure.

Scheduled work includes:

Reconfiguring the wireless network
Updates to our NetApp filer
Updates to the OS on the cycles machines
Updates to our e-mail system

This work is part of normal maintenance. Note that this downtime overlaps the OIT shutdown of all campus network services. For details, see their posting.

The change that will affect users the most is the reconfiguration of our wireless infrastructure. After the change, all wireless users should use the "csvapornet" SSID. For details see the CS Guide.


Connectivity Problems

All / scott

Due to a yet-to-be-identified source, we are seeing very large bursts of connections to large numbers of outside IP addresses. These hour-long bursts occurred at approximately 1:00am and 7:00pm on Sunday, and 1:00am and 7:00am on Monday. Each event filled the firewall connection table and disrupted connections for about 3 hours.

Update: While the source has been identified, we have not been able to reach the user. The traffic began again at 1:00pm today. We have disabled that port. You may notice some delays for a few more minutes while the network settles.


Downtime: Thursday, December 18, 2008

All / scott

On Thursday, December 18, 2008, we will have a scheduled downtime from 8:00am to 10:00am EST.

This downtime only affects direct and indirect users of the project file server. This includes the web servers, cycle servers, c2 cluster, ftp server, and the ftp mirror.

Note that e-mail, networking, the CVS server, and the database machines will remain operational during this time.

As one of the steps to clean up the file system mess, we will do a final sync between our temporary storage and our rebuilt production storage.


Downtime: Thursday, December 11, 2008

All / scott

On Thursday, December 11, 2008, we will have a scheduled downtime from 4:00am to 8:00am EST.

This downtime only affects direct and indirect users of the project file server. This includes the web servers, cycle servers, c2 cluster, ftp server, and the ftp mirror.

Note that e-mail, networking, the CVS server, and the database machines will remain operational during this time.

As one of the steps to clean up the file system mess, we will do a final sync between our problematic storage and temporary storage. We will then put the temporary storage into production until we rebuild our storage pool.


Downtime: Wednesday, December 10, 2008

All / scott

On Wednesday, December 10, 2008, we will have a scheduled downtime from 4:00am to 8:00am EST.

This downtime only affects users of the beowulf clusters (c2, c3, and hbar). All other services (e.g., e-mail, web, databases, file servers, and cycle servers) will remain operational.

Scheduled work includes:

Move the nodes in the test cluster (c3) into the production cluster (c2).
Upgrade the production cluster (c2) to Rocks 5.
Decommission the hbar cluster with its single compute node.

With the participation of Jennifer Rexford, Fei-Fei Li, and David Blei, we are adding 14 additional nodes to the cluster. Eleven of these nodes have 16 GB RAM (instead of 8 GB RAM) and 8 cores per node (instead of 4 cores per node).
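
As a back-of-the-envelope tally of what the expansion adds, here is a small sketch. It assumes that the three new nodes not described above use the standard configuration (8 GB RAM, 4 cores); that figure is an assumption, as are the constant and function names.

```python
# Figures from the announcement: 14 new nodes, 11 of them with 16 GB RAM and 8 cores.
NEW_NODES = 14
LARGE_NODES = 11
LARGE_RAM_GB, LARGE_CORES = 16, 8

# Assumption: the remaining new nodes match the standard 8 GB / 4-core configuration.
STANDARD_RAM_GB, STANDARD_CORES = 8, 4


def added_capacity():
    """Return (cores, ram_gb) added by the new nodes under the assumption above."""
    standard_nodes = NEW_NODES - LARGE_NODES
    cores = LARGE_NODES * LARGE_CORES + standard_nodes * STANDARD_CORES
    ram_gb = LARGE_NODES * LARGE_RAM_GB + standard_nodes * STANDARD_RAM_GB
    return cores, ram_gb


cores, ram_gb = added_capacity()
print(f"Added capacity: {cores} cores, {ram_gb} GB RAM")
```

Under that assumption the expansion comes to 100 cores and 200 GB of RAM.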

The hbar cluster was created specifically so that users could experiment with an 8-core machine. The expansion of c2 makes hbar obsolete. As a result, we will decommission hbar.


Emergency Downtime: Tue, Nov 25, 2008

All / scott

TODAY, Tuesday, November 25, 2008, we will have an emergency downtime from 1:00pm to 2:00pm EST (note time).

During this time we will be shutting down our infrastructure so that we can (1) revert to a previous version of the software on our faulty file system, and (2) move some of our service/infrastructure file systems to a temporary volume on an alternate file server.

We are taking these actions to help alleviate some of the file system problems we are experiencing. These changes should make the department web sites, FC 010 lab, moodle, and CVS more stable. Accessing the project file space should be no worse than it is now; our expectation is that it will be better, but still below par.

Update 1:42pm: Systems are back on-line. With additional information and upon further consideration, we opted to only perform part (2) above at this time. The department web sites, FC 010 lab, moodle, and CVS should be more stable and more responsive. We postponed reverting the file system software to allow our vendor additional time to debug the problem. We are likely to revert to a previous state early tomorrow morning.
