CS Ionic Cluster Downtime, Wednesday, July 26, 2023, 05:00-17:00

Date: Wednesday, July 26, 2023 (05:00-17:00)

Who is affected:
All users of the CS Department Beowulf high performance computing cluster,
known as ionic.

What is happening:
CS Staff will upgrade the ionic cluster to the latest Springdale 9
distribution. In addition, Slurm (the cluster management and job scheduling
system) and its database will be upgraded.

SPECIAL NOTE: Because we are reloading the Linux servers, all local disk
storage will be wiped, and any data stored in the /scratch partition will be
lost. If you have data in /scratch that needs to survive the reload, please
copy it somewhere safe before the start of the maintenance.
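
As an illustration of the note above, here is a minimal Python sketch of
copying a directory out of /scratch before the reload. The function name
backup_tree and both paths in the usage comment are hypothetical; the
destination is assumed to be on storage that is not wiped during the
reload. This is one possible approach, not an official CS Staff procedure.

```python
import shutil
from pathlib import Path


def backup_tree(src: Path, dest: Path) -> int:
    """Copy the directory tree at src to dest and return the number of
    non-directory entries (files and symlinks) copied.

    copytree raises FileExistsError if dest already exists, so an earlier
    backup is never silently overwritten.
    """
    # symlinks=True copies symbolic links as links instead of following them.
    shutil.copytree(src, dest, symlinks=True)

    def count_entries(root: Path) -> int:
        # Count everything that is not a real directory.
        return sum(1 for p in root.rglob("*") if p.is_symlink() or not p.is_dir())

    expected = count_entries(src)
    copied = count_entries(dest)
    if copied != expected:
        raise RuntimeError(f"expected {expected} entries, found {copied}")
    return copied


# Hypothetical usage; both paths are placeholders, adjust them to your setup:
# backup_tree(Path("/scratch/myproject"), Path.home() / "scratch-backup" / "myproject")
```

The entry count is only a basic sanity check. It does not compare file
contents, so large or critical data sets may deserve a checksum comparison
as well.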

Please note that the downtime window is significantly longer than our usual
windows due to the high-touch nature of OS reinstallations. We expect to
finish the upgrades earlier than this window, but the wide time frame
acknowledges the uncertainties involved.

Why is it happening:
This is part of the routine maintenance and will bring newer versions of
installed tools and software.

We will post updates to the status page: www.csstaff.org
as necessary.

If this downtime will cause you undue hardship, please contact
csstaff@cs.princeton.edu immediately, so we can discuss options to reduce
any negative impact. Your patience is appreciated.

Sincerely,
CS Staff

CS Cycles Downtime, Wednesday, July 26, 2023, 08:00-10:00

Date: Wednesday, July 26, 2023 (08:00-10:00)

Who is affected:
All users of the CS Staff-managed public login systems, including the
cycles, courselab, and armlab systems.

What is happening:
CS Staff will upgrade the user-accessible servers in our infrastructure,
including cycles, courselab, and armlab. The systems will be upgraded to the
latest Springdale 9 distribution for the x86_64 architecture and RockyLinux 9
distribution for the aarch64 architecture (i.e., armlab).

To help ensure a smooth transition, we currently have the new distribution
installed on the following servers for your testing. Please keep in mind
that these servers are only reachable from inside the CS network.

cycles-test
courselab-test
armlab-test

SPECIAL NOTE: As we are reloading the Linux servers, all crontabs will be
deleted. If you have crontabs that you wish to persist, you will need to
back up your crontabs before the downtime and restore them after.
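
The backup and restore described above can be sketched in Python as
follows. It relies on the standard crontab command ("crontab -l" prints the
current crontab, "crontab FILE" installs one from a file). The function
names and the backup file location are hypothetical, and the backup file is
assumed to be stored somewhere that survives the reload. Treat this as a
sketch rather than an official procedure.

```python
import subprocess
from pathlib import Path


def backup_crontab(dest: Path, run=subprocess.run) -> bool:
    """Save the current user's crontab to dest.

    Returns False when `crontab -l` exits non-zero, which is what happens
    when the user has no crontab (or when the command fails).
    """
    result = run(["crontab", "-l"], capture_output=True, text=True)
    if result.returncode != 0:
        return False
    dest.write_text(result.stdout)
    return True


def restore_crontab(src: Path, run=subprocess.run) -> None:
    """Install the crontab saved in src, replacing any existing crontab."""
    run(["crontab", str(src)], check=True)


# Hypothetical usage; the file name is a placeholder:
# Before the downtime:  backup_crontab(Path.home() / "crontab.backup")
# After the downtime:   restore_crontab(Path.home() / "crontab.backup")
```

The run parameter exists only so the functions can be exercised without
touching a real crontab. Back up and restore on each host where you have a
crontab, since crontabs are normally stored per machine.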

Why is it happening:
This is part of the routine maintenance of the publicly-accessible systems
and will bring newer versions of installed tools and software.

We will post updates to the status page: www.csstaff.org
as necessary.

If this downtime will cause you undue hardship, please contact
csstaff@cs.princeton.edu immediately, so we can discuss options to reduce
any negative impact. Your patience is appreciated.

Sincerely,
CS Staff

CS Storage Maintenance, Wednesday, June 14, 2023, 08:15-10:15

Date: Wednesday, June 14, 2023 (08:15-10:15)

Who is affected:
Users of CS Department Disk Storage Facilities, including home directories,
project spaces, and web servers.

What is happening:
During this window, the CS Department’s primary storage cluster will have
its network relocated and upgraded.

NO OUTAGE is expected, but some users may experience brief pauses in
service or the need to disconnect and reconnect mounted filesystems.

Why is it happening:
This is part of a larger project upgrading the capacity of the CS
Department’s research network. This change will vastly increase the
available network bandwidth to the department’s central Isilon storage.

We will post updates to the status page: www.csstaff.org
as necessary.

If this downtime will cause you undue hardship, please contact
csstaff@cs.princeton.edu immediately, so we can discuss options to reduce
any negative impact. Your patience is appreciated.

Sincerely,
CS Staff

[downtime] Major Outage – Unplanned Power Failure in HPCRC

Good morning.
You may notice that CS systems have had a significant outage overnight. This was due to major power problems at the campus data center facility at Forrestal. All of our systems in the HPCRC were unexpectedly powered off and suffered various other power effects throughout the evening and night, and some did not recover on their own.
CS Staff members have been on site at the data center and also working remotely to recover systems this morning. At this time, all major systems are expected to be online again. Some ionic cluster nodes are still down, but are being brought back.
If you notice any persistent issues with CS systems, please let us know and we will do our best to address them. Thank you for your patience.
-CS Staff

[downtime] CS Network Downtime, Wednesday, May 31, 2023, 08:00-10:00

Date: Wednesday, May 31, 2023 (08:00-10:00)

Who is affected:
All users of CS Department computing facilities and services, including cycles, ionic, web services, email, DNS, and wired networking.

What is happening:
During this window, the core switch handling CS Department network traffic at the HPCRC will be replaced. The actual outage time for any particular service or network access point should be only a few seconds, and the total outage window is likely to be shorter than announced. However, because hardware replacements can be unpredictable, outages may occur at any time during this window and may last up to several minutes.

Why is it happening:
This upgrade will replace a 12-year-old core switch with a new, much faster device. This is the first stage of further upgrades planned for this summer, primarily focused on increasing network capacity for the ionic HPC cluster, the Department’s central storage cluster, and other related systems.

We will post updates to the status page: www.csstaff.org as necessary.

If this downtime will cause you undue hardship, please contact csstaff@cs.princeton.edu immediately, so we can discuss options to reduce any negative impact. Your patience is appreciated.

Sincerely,
CS Staff

Email Service Outage *Unplanned Delay*

The email server upgrade scheduled for this morning has run into unexpected issues. As a result, email service is not working properly. We are working to correct the situation as quickly as possible, and will update here as new information becomes available.

Update 08:57 – We continue to work to recover the mail systems, but they will not be ready in the original scheduled window. We apologize for the inconvenience.

Update 10:02 – We are working with the vendor to recover the mail systems. We apologize for the inconvenience.

Update 15:18 – We now believe the service is back to normal operation. Most incoming emails were likely queued and have probably been delivered by now. If you have ongoing issues, please reach out to CS Staff. Thank you for your patience!

[downtime] TONIGHT: CS Building Power Outage, Monday, March 13,

Today is the day for this scheduled power shutdown. Please see the announcement below and remember to power down and unplug all equipment you control in the CS Building or Friend Center before ending your day today!

Thanks for your time and attention.

Sincerely,
CS Staff

----- Forwarded Message -----
From: "csstaff"
To: "downtime"
Sent: Friday, February 3, 2023 10:06:44 AM
Subject: [downtime] CS Building Power Outage, Monday, March 13, 2023, 22:00-02:00

Date: Monday, March 13, 2023 (22:00-02:00)

Who is affected:
ALL users and occupants of the CS Building (35 Olden St) and Friend Center

What is happening:
From 22:00 (10PM) until 02:00 (2AM) on the night of Monday, March 13, 2023,
ALL power to the Computer Science building and the Friend Center will be
cut.

Emergency generator power will remain, so emergency lighting and the
building network will remain powered. ALL OTHER POWER will be off. It is
VERY IMPORTANT that all equipment, including that in labs and in Room 002,
be powered off on Monday evening prior to the shutdown. This includes
computers, printers, copiers, or anything else that runs on electricity.
Sensitive equipment will further benefit from being unplugged or physically
switched off in order to avoid any effects from possible fluctuations in
power quality during the work.

Why is it happening:
In August of 2022, a sprinkler head in the basement power vault opened,
flooding the main power feeds for the building with water and causing
damage to a main breaker. Since that time, the building has been operating
on a single breaker from a redundant set while the damaged breaker was sent
away for repairs. This outage will be used to re-install the repaired
breaker and return the building to normal operating status. Until this
repair is completed, the building power is more vulnerable than usual to a
long-term outage in the event the single remaining breaker is compromised.

We will post updates to the status page: http://www.csstaff.org
as necessary.

Note that this outage DOES NOT affect the CS computing infrastructure,
which is housed in the Forrestal Campus data center. All departmental
computing and network services are expected to continue unaffected.

If you have questions or concerns about this outage, please contact
csstaff@cs.princeton.edu immediately, so we can discuss options to reduce
any negative impact. Your patience is appreciated.

Sincerely,
CS Staff
_______________________________________________
downtime mailing list
downtime@lists.cs.princeton.edu
https://lists.cs.princeton.edu/mailman/listinfo/downtime

[downtime] CS Email Downtime, Thursday, March 16, 2023, 07:00-09:00

Date: Thursday, March 16, 2023 (07:00-09:00)

Who is affected:
Users of CS Department email services

What is happening:
During this window, the CS Department email servers will be upgraded. The
actual outage for any given account will be relatively brief, but the
overall work may be longer. The outage for any particular account may occur
at any time during the scheduled window.

The expected outage behavior is that you may be unable to read email on
your account for several minutes. Sending email may also be interrupted for
some configurations. If you find your account behaving strangely, wait a
few minutes and reconnect or reload, which should reestablish expected
behavior.

Why is it happening:
This upgrade will apply security and maintenance updates to the mail
servers as part of routine system hygiene.

We will post updates to the status page: http://www.csstaff.org
as necessary.

If this downtime will cause you undue hardship, please contact
csstaff@cs.princeton.edu immediately, so we can discuss options to reduce
any negative impact. Your patience is appreciated.

Sincerely,
CS Staff

[downtime] REMINDER: CS Building Power Outage, Monday, March 13,

This is a reminder that we are ONE WEEK away from this scheduled power shutdown. Please see the announcement below and ensure you have a plan to power down and unplug all equipment you control in the CS Building or Friend Center before the start of the outage on Monday night.

Thanks for your time and attention.

Sincerely,
CS Staff

----- Forwarded Message -----
From: "csstaff"
To: "downtime"
Sent: Friday, February 3, 2023 10:06:44 AM
Subject: [downtime] CS Building Power Outage, Monday, March 13, 2023, 22:00-02:00

Date: Monday, March 13, 2023 (22:00-02:00)

Who is affected:
ALL users and occupants of the CS Building (35 Olden St) and Friend Center

What is happening:
From 22:00 (10PM) until 02:00 (2AM) on the night of Monday, March 13, 2023,
ALL power to the Computer Science building and the Friend Center will be
cut.

Emergency generator power will remain, so emergency lighting and the
building network will remain powered. ALL OTHER POWER will be off. It is
VERY IMPORTANT that all equipment, including that in labs and in Room 002,
be powered off on Monday evening prior to the shutdown. This includes
computers, printers, copiers, or anything else that runs on electricity.
Sensitive equipment will further benefit from being unplugged or physically
switched off in order to avoid any effects from possible fluctuations in
power quality during the work.

Why is it happening:
In August of 2022, a sprinkler head in the basement power vault opened,
flooding the main power feeds for the building with water and causing
damage to a main breaker. Since that time, the building has been operating
on a single breaker from a redundant set while the damaged breaker was sent
away for repairs. This outage will be used to re-install the repaired
breaker and return the building to normal operating status. Until this
repair is completed, the building power is more vulnerable than usual to a
long-term outage in the event the single remaining breaker is compromised.

We will post updates to the status page: http://www.csstaff.org
as necessary.

Note that this outage DOES NOT affect the CS computing infrastructure,
which is housed in the Forrestal Campus data center. All departmental
computing and network services are expected to continue unaffected.

If you have questions or concerns about this outage, please contact
csstaff@cs.princeton.edu immediately, so we can discuss options to reduce
any negative impact. Your patience is appreciated.

Sincerely,
CS Staff
