CS Ionic Cluster Downtime, Thursday, October 19, 2023, 07:30-09:30

Date: Thursday, October 19, 2023 (07:30-09:30)

Who is affected:
All users of the CS Department Beowulf high performance computing cluster,
known as ionic.

What is happening:
CS Staff will upgrade Slurm, the cluster management and job scheduling
system, and its database. No reboot will be necessary, so we expect to
finish before the end of this window. However, the wide time frame
acknowledges the uncertainties involved.

Why is it happening:
The upgrade is necessary to patch against an urgent security bug.

We will post updates to the status page: www.csstaff.org
as necessary.

If this downtime will cause you undue hardship, please contact
csstaff@cs.princeton.edu immediately, so we can discuss options to reduce
any negative impact. Your patience is appreciated.

Sincerely,
CS Staff

CS Storage System Downtime, Tuesday, August 29, 2023, 07:30-10:30

Date: Tuesday, August 29, 2023 (07:30-10:30)

Who is affected:
All users of CS Department storage systems, including project spaces, home
directories, and web spaces.

What is happening:
The central file storage cluster will be rebooted a few times during this
window in order to facilitate physical upgrades. During the reboots, file
services will be interrupted, but will resume after the cluster finishes
its boot. This will affect access to the cycles login hosts, CS Department
web services, and CS Department SMB/CIFS services. Email services should
not be affected.

Actual outage time is not expected to encompass the full 3 hour window, but
may occur sporadically during this period.

A reservation has been placed on the ionic cluster to hold all jobs that
would overlap with this maintenance. Jobs will automatically start again
after completion of the work.

Why is it happening:
The storage cluster will be upgraded during this window. The upgrades will
modernize the backend network of the cluster, as well as add new all-flash
nodes to speed up front-end operations.

This, combined with recent front-end network upgrades, will continue the
improvements to the cluster necessary to prepare for the arrival of the new
SEAS HPC cluster that will be hosted in CS.

We will post updates to the status page: www.csstaff.org
as necessary.

If this downtime will cause you undue hardship, please contact
csstaff@cs.princeton.edu immediately, so we can discuss options to reduce
any negative impact. Your patience is appreciated.

Sincerely,
CS Staff

CS System Downtime – Project Web Server, Wednesday, July 26, 2023, 08:00-10:00

Date: Wednesday, July 26, 2023 (08:00-10:00)

Who is affected:
All users of the CS Department project web space service.

What is happening:
CS Staff will upgrade the web project servers to the latest Springdale 9
distribution.

PHP on this server will be upgraded from version 8.0.13 to 8.1.14, and
Phusion Passenger, the system which allows for support of web application
frameworks, will be upgraded from version 6.0.14 to 6.0.18. PHP 8.1
introduces several backward-incompatible changes, and some project
websites will need code changes to work properly on the new server. You
can read more about the changes between the PHP versions on these pages:

www.php.net/manual/en/migration81.php
www.php.net/manual/en/migration81.deprecated.php
www.php.net/manual/en/migration81.incompatible.php

Note the “Backward Incompatible Changes” link, which is worth reviewing to
prepare for your site update.
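
As a rough first pass before reading the migration pages in detail, a site
owner could search their project files for a few of the names that PHP 8.1
deprecates. The sketch below is only an illustration, not a CS Staff tool:
the function name find_deprecated and the short list of names are our own
choices, the list is far from complete, and a plain text match can report
false positives (for example, a name that appears inside a comment).

```python
import re
from pathlib import Path

# Partial, illustrative list of functions and constants deprecated in PHP 8.1.
# The official migration pages remain the authoritative reference.
DEPRECATED_IN_81 = (
    "strftime",
    "gmstrftime",
    "strptime",
    "date_sunrise",
    "date_sunset",
    "FILTER_SANITIZE_STRING",
)

# Whole-word match, so "strftime" is not reported inside "gmstrftime".
PATTERN = re.compile(r"\b(" + "|".join(DEPRECATED_IN_81) + r")\b")


def find_deprecated(root: Path):
    """Yield (file, line number, name) for each match in .php files under root."""
    for path in sorted(root.rglob("*.php")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            for match in PATTERN.finditer(line):
                yield path, lineno, match.group(1)
```

Calling list(find_deprecated(Path("path/to/your/site"))) with your own
site directory returns the matches to review by hand.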

We don’t anticipate any breaking changes in Phusion Passenger; however, if
you’d like to read about its newest features, please see the following
link:

blog.phusion.nl/2023/06/12/passenger-6-0-18/

CS Staff is performing a basic review of each project website on the
upgraded web server, and /most/ sites appear to be in good working order.
We will contact site owners directly for sites with obvious compatibility
issues to advise on expected changes. However, as it is impossible for us
to review all possible aspects of your site, we strongly encourage you to
review the PHP changes before the upgrade to anticipate changes you may
need to make, and to check your site after the upgrade on July 26 to
ensure it is working as expected.

Please note that the above changes apply ONLY to the project websites.
Personal (“tilde”) sites and any other content hosted under
“www.cs.princeton.edu” are not yet affected by this upgrade. If you are
concerned that your site may need substantial change and would like to
review it using the new web server before the upgrade, please contact
csstaff@cs.princeton.edu for assistance.

Why is it happening:
This is part of the routine maintenance of the web servers and will bring
newer versions of installed tools and software.

We will post updates to the status page: www.csstaff.org
as necessary.

If this downtime will cause you undue hardship, please contact
csstaff@cs.princeton.edu immediately, so we can discuss options to reduce
any negative impact. Your patience is appreciated.

Sincerely,
CS Staff

CS Ionic Cluster Downtime, Wednesday, July 26, 2023, 05:00-17:00

Date: Wednesday, July 26, 2023 (05:00-17:00)

Who is affected:
All users of the CS Department Beowulf high performance computing cluster,
known as ionic.

What is happening:
CS Staff will upgrade the ionic cluster to the latest Springdale 9
distribution. In addition, Slurm, the cluster management and job
scheduling system, and its database will be upgraded.

SPECIAL NOTE: As we are reloading the Linux servers, all local disk storage
will be wiped, thus resulting in a loss of any data stored in the /scratch
partition. If you have data in /scratch that needs to survive the reload,
please ensure it is copied somewhere safe before the start of the
maintenance.
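
One way to copy a directory out of /scratch ahead of the maintenance is
sketched below. This is a minimal example under our own assumptions: the
function name copy_out_of_scratch is hypothetical, and you would need to
choose a destination that is not on the node's local disk and has enough
free space.

```python
import shutil
from pathlib import Path


def copy_out_of_scratch(src: Path, dest: Path) -> int:
    """Copy the directory tree at src into dest; return the file count under dest."""
    # dirs_exist_ok=True lets the copy be re-run into an existing destination,
    # overwriting files that have the same relative path.
    shutil.copytree(src, dest, dirs_exist_ok=True)
    return sum(1 for p in dest.rglob("*") if p.is_file())
```

For example, copy_out_of_scratch(Path("/scratch/your_netid"),
Path.home() / "scratch-backup") would copy a (hypothetical) per-user
scratch directory into your home directory.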

Please note that the downtime window is significantly longer than our usual
windows due to the high-touch nature of OS reinstallations. We expect to
finish the upgrades before the end of this window, but the wide time frame
acknowledges the uncertainties involved.

Why is it happening:
This is part of the routine maintenance and will bring newer versions of
installed tools and software.

We will post updates to the status page: www.csstaff.org
as necessary.

If this downtime will cause you undue hardship, please contact
csstaff@cs.princeton.edu immediately, so we can discuss options to reduce
any negative impact. Your patience is appreciated.

Sincerely,
CS Staff

CS Cycles Downtime, Wednesday, July 26, 2023, 08:00-10:00

Date: Wednesday, July 26, 2023 (08:00-10:00)

Who is affected:
All users of the CS Staff-managed public login systems, including the
cycles, courselab, and armlab systems.

What is happening:
CS Staff will upgrade the user-accessible servers in our infrastructure,
including cycles, courselab, and armlab. The systems will be upgraded to
the latest Springdale 9 distribution for the x86_64 architecture and the
Rocky Linux 9 distribution for the aarch64 architecture (i.e., armlab).

To help ensure a smooth transition, we currently have the new distribution
installed on the following servers for your testing. Please keep in mind
that these servers are only reachable from inside the CS network.

cycles-test
courselab-test
armlab-test

SPECIAL NOTE: As we are reloading the Linux servers, all crontabs will be
deleted. If you have crontabs that you wish to persist, you will need to
back up your crontabs before the downtime and restore them after.
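
The backup and restore can be done with the standard crontab command
("crontab -l" prints the current crontab, and "crontab FILE" installs one
from a file). The sketch below wraps those two calls; the function names
and the list_cmd/install_cmd parameters are our own, added only so the
commands can be swapped out for testing. Save the backup file somewhere
that will survive the reload, not on the server's local disk.

```python
import subprocess
from pathlib import Path


def backup_crontab(dest: Path, list_cmd=("crontab", "-l")) -> bool:
    """Save the output of `crontab -l` to dest; return False if nothing was saved."""
    result = subprocess.run(list(list_cmd), capture_output=True, text=True)
    # `crontab -l` exits non-zero when the user has no crontab installed.
    if result.returncode != 0 or not result.stdout.strip():
        return False
    dest.write_text(result.stdout)
    return True


def restore_crontab(src: Path, install_cmd=("crontab",)) -> None:
    """Install the crontab saved in src, replacing any current crontab."""
    subprocess.run([*install_cmd, str(src)], check=True)
```

Run backup_crontab(Path.home() / "crontab.bak") on each host before the
downtime, and restore_crontab(Path.home() / "crontab.bak") on the same
host afterwards. Crontabs are per host, so a separate backup file per host
may be needed.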

Why is it happening:
This is part of the routine maintenance of the publicly-accessible systems
and will bring newer versions of installed tools and software.

We will post updates to the status page: www.csstaff.org
as necessary.

If this downtime will cause you undue hardship, please contact
csstaff@cs.princeton.edu immediately, so we can discuss options to reduce
any negative impact. Your patience is appreciated.

Sincerely,
CS Staff

CS Storage Maintenance, Wednesday, June 14, 2023, 08:15-10:15

Date: Wednesday, June 14, 2023 (08:15-10:15)

Who is affected:
Users of CS Department Disk Storage Facilities, including home directories,
project spaces, and web servers.

What is happening:
During this window, the CS Department’s primary storage cluster will have
its network relocated and upgraded.

NO OUTAGE is expected, but some users may experience brief pauses in
service or the need to disconnect and reconnect mounted filesystems.

Why is it happening:
This is part of a larger project upgrading the capacity of the CS
Department’s research network. This change will vastly increase the
available network bandwidth to the department’s central Isilon storage.

We will post updates to the status page: www.csstaff.org
as necessary.

If this downtime will cause you undue hardship, please contact
csstaff@cs.princeton.edu immediately, so we can discuss options to reduce
any negative impact. Your patience is appreciated.

Sincerely,
CS Staff

[downtime] Major Outage – Unplanned Power Failure in HPCRC

Good morning.
You may have noticed that CS systems had a significant outage overnight. This was due to major power problems at the campus data center facility at Forrestal. All of our systems in the HPCRC were unexpectedly powered off and suffered various other power effects throughout the evening and night, and some did not recover on their own.
CS Staff members have been on site at the data center and also working remotely to recover systems this morning. At this time, all major systems are expected to be online again. Some ionic cluster nodes are still down, but are being brought back.
If you notice any persistent issues with CS systems, please let us know and we will do our best to address them. Thank you for your patience.
-CS Staff

[downtime] CS Network Downtime, Wednesday, May 31, 2023, 08:00-10:00

Date: Wednesday, May 31, 2023 (08:00-10:00)
Who is affected:
All users of CS Department computing facilities and services, including cycles, ionic, web services, email, DNS, and wired networking.
What is happening:
During this window, the core switch handling CS Department network traffic at the HPCRC will be replaced. The actual outage time for any particular service or network access point should be only a few seconds, and the total outage window is likely to be shorter than announced. However, owing to the uncertain nature of technological change, outages may occur throughout this window and may be up to several minutes in length.
Why is it happening:
This upgrade will replace a 12-year-old core switch with a new, much faster device. This is the first stage of further upgrades planned for this summer, primarily focused on increasing network capacity for the ionic HPC cluster, the Department’s central storage cluster, and other related systems.
We will post updates to the status page: www.csstaff.org as necessary.
If this downtime will cause you undue hardship, please contact csstaff@cs.princeton.edu immediately, so we can discuss options to reduce any negative impact. Your patience is appreciated.
Sincerely,
CS Staff

Email Service Outage *Unplanned Delay*

The email server upgrade scheduled for this morning has run into unexpected issues. As a result, email service is not working properly. We are working to correct the situation as quickly as possible, and will update here as new information becomes available.

Update 08:57 – We continue to work to recover the mail systems, but they will not be ready in the original scheduled window. We apologize for the inconvenience.

Update 10:02 – We are working with the vendor to recover the mail systems. We apologize for the inconvenience.

Update 15:18 – We now believe the service is back to normal operation. Most incoming emails were likely queued and have probably been delivered by now. If you have ongoing issues, please reach out to CS Staff. Thank you for your patience!
