CS Cycles/Ionic/Neuronic System Downtime, Tuesday, January 7, 2025, 07:00-15:00

Date: Tuesday, January 7, 2025 (07:00-15:00)

Who is affected:
All users of the CS Department Beowulf high-performance computing clusters,
known as ionic and neuronic.

All users of the CS Staff-managed public login systems, including the
cycles, courselab, and armlab systems.

What is happening:
Ionic and neuronic nodes will have Nvidia, CUDA, and kernel drivers updated
to fix GPU-related failures. In addition, the cluster management and job
scheduling system, Slurm, and its database will be upgraded. No data loss is
anticipated. After the upgrade, the machines will be rebooted.

Cycles, courselab, and armlab machines will be rebooted during this window
to clear some defunct user processes interfering with research work.

Why is it happening:
Ionic nodes are experiencing various GPU-related failures. To address these
problems, we will be updating Nvidia, CUDA, and kernel modules.

Additionally, some user processes have entered a defunct state, hindering
research activities. To resolve this, a system reboot is necessary to clear
these processes.

We will post updates to the status page: www.csstaff.org
as necessary.

If this downtime will cause you undue hardship, please contact
csstaff@cs.princeton.edu immediately, so we can discuss options to reduce
any negative impact. Your patience is appreciated.

Sincerely,
CS Staff

CS Cycles/Ionic/Neuronic System Downtime, Tuesday, July 2, 2024, 06:00-17:00

Date: Tuesday, July 2, 2024 (06:00-17:00)

Who is affected:
All users of the CS Department Beowulf high-performance computing cluster,
known as ionic.

All users of the CS Staff-managed public login systems, including the
cycles, courselab, and armlab systems.

What is happening:
During this window, all CS-managed systems (cycles, ionic, neuronic,
courselab, and armlab) will be upgraded to the latest Red Hat Operating
System – 9.4. In addition, the cluster management and job scheduling system,
Slurm, and its database will be upgraded. No data loss is anticipated.

SPECIAL NOTE:
As we are reloading the Linux servers, all crontabs will be deleted. If you
have crontabs that you wish to persist, you will need to back up your
crontabs before the downtime, and restore them after.
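The backup and restore described above could be sketched as follows. This is
a minimal illustration, not a CS Staff tool: the function names and the
destination path are hypothetical, and it assumes the standard `crontab -l`
(list) and `crontab FILE` (install) commands are available and that the
backup file is written to storage that survives the reload.

```python
import subprocess
from pathlib import Path


def backup_crontab(dest: Path, list_cmd=("crontab", "-l")) -> bool:
    """Save the current user's crontab to dest.

    Returns False (and writes nothing) if the list command fails, which is
    what `crontab -l` does when the user has no crontab installed.
    """
    result = subprocess.run(list_cmd, capture_output=True, text=True)
    if result.returncode != 0:
        return False
    dest.write_text(result.stdout)
    return True


def restore_crontab(src: Path, install_cmd=("crontab",)) -> None:
    """Install the crontab saved in src, replacing any existing crontab."""
    subprocess.run([*install_cmd, str(src)], check=True)
```

Before the downtime, one might call `backup_crontab(Path("crontab.bak"))`
from a directory that is not on local disk, and afterwards call
`restore_crontab(Path("crontab.bak"))` from the same directory.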

In addition, all local disk storage will be wiped, thus resulting in a loss
of any data stored in the /scratch partition. If you have data in /scratch
that needs to survive the reload, please ensure it is copied somewhere safe
before the start of the maintenance.
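One way to copy a /scratch directory and confirm that nothing was missed is
sketched below. The function name and the example paths are hypothetical, and
the check is deliberately simple (it compares file presence and size, not
checksums); it requires Python 3.8 or later for `dirs_exist_ok`.

```python
import shutil
from pathlib import Path


def copy_and_verify(src: Path, dest: Path) -> list:
    """Copy the tree under src into dest and return the source files that
    are missing from dest or differ in size (an empty list means success)."""
    shutil.copytree(src, dest, dirs_exist_ok=True)
    problems = []
    for path in src.rglob("*"):
        if path.is_file():
            target = dest / path.relative_to(src)
            if not target.is_file() or target.stat().st_size != path.stat().st_size:
                problems.append(path)
    return problems
```

For example, `copy_and_verify(Path("/scratch/myuser"), Path("/path/to/safe/copy"))`
returns an empty list when every file was copied; both paths here are
placeholders for your own locations.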

Why is it happening:
This is part of regular maintenance to keep systems up-to-date.

We will post updates to the status page: www.csstaff.org
as necessary.

If this downtime will cause you undue hardship, please contact
csstaff@cs.princeton.edu immediately, so we can discuss options to reduce
any negative impact. Your patience is appreciated.

Sincerely,
CS Staff

CS Mailman Upgrade, Monday, June 10, 2024, 07:00-10:00

Date: Monday, June 10, 2024 (07:00-10:00)

Who is affected:
All email recipients of the CS mailing lists.

What is happening:
CS Staff will upgrade the CS mailing list server as well as the Mailman
Suite to the latest version.

The web interface for the list server will undergo significant changes.

We do not expect any loss of data or mailing list configurations.

Why is it happening:
Mailman will be upgraded from version 2.1.12 to 3.3.9.

This is part of maintenance to enhance software performance and security.

We will post updates to the status page: www.csstaff.org
as necessary.

If this downtime will cause you undue hardship, please contact
csstaff@cs.princeton.edu immediately, so we can discuss options to reduce
any negative impact. Your patience is appreciated.

Sincerely,
CS Staff

2024-01-29 – Unplanned Outage

Several services are suffering an unplanned outage this morning, including DNS and web services. Staff are aware, en route, and looking into the issues. More updates will be published as we learn more.

07:50 Update – A problem was located and mitigated with the DNS servers. All services should be returning to normal at this time.
08:10 Update – We are still having issues with the CS DNS servers. We are still working on the issue.
08:59 Update – We are still working on the CS DNS issues. You can use the wireless EDUROAM network to connect to things outside CS.
09:38 Update – We have tracked down the issue with the CS DNS server, and things should start returning to normal. The CS clusters remain offline while we track down an issue.

10:02 Update – The clusters are back online. All services should be returned to normal.

CS Database Downtime, Monday, January 8, 2024, 07:00-10:00

Hello, everyone.

Reminder for the upcoming scheduled maintenance.

Thank you, and Happy New Year,
CS Staff

----- Forwarded Message -----
From: "CS Staff" <csstaff@cs.princeton.edu>
To: "downtime" <downtime@lists.cs.princeton.edu>
Sent: Wednesday, December 20, 2023 3:26:21 PM
Subject: [downtime] [rescheduled] CS Database Downtime, Monday, January 8, 2024, 07:00-10:00

Due to an unforeseen scheduling conflict, this downtime, previously
announced for Tuesday, is being rescheduled by one day to Monday,
January 8th, 2024.

Please contact CS Staff if it causes you undue hardship.

Thank you,
CS Staff

----- Original Message -----
From: "csstaff" <csstaff@cs.princeton.edu>
To: "downtime" <downtime@lists.cs.princeton.edu>
Sent: Wednesday, December 20, 2023 11:10:35 AM
Subject: [downtime] CS Database Downtime, Tuesday, January 9, 2024, 07:00-10:00

Date: Tuesday, January 9, 2024 (07:00-10:00)

Who is affected:
All users of the CS Department "publicdb" database server, including
any dependent web properties, and all users of the CS Department Beowulf
high-performance computing cluster, known as ionic.

All users of CS Department administrative web properties (Dropbox, CS
Guide, the Main website, etc.)

What is happening:
During this window, the "publicdb" database server will be replaced
with a newer server. All existing MariaDB databases will be migrated to the
new server, so no data loss is anticipated. However, while Slurm jobs will
continue, new jobs cannot start during the migration.

In addition, the database server underlying the administrative systems
will be upgraded and replaced. During the upgrade, all database-dependent
administrative systems will be unavailable. This includes the CS Dropbox
service, the main website, the CS Guide, ADM, and any content feeds
provided by CS Staff.

Why is it happening:
The old servers running MariaDB 10.1.24 will be replaced with newer ones
running MariaDB 10.5.22.

The phpMyAdmin web interface will be upgraded from version 4.4.14 to 5.2.1.

This is part of regular maintenance to enhance system performance and
security.

We will post updates to the status page: www.csstaff.org
as necessary.

If this downtime will cause you undue hardship, please contact
csstaff@cs.princeton.edu immediately, so we can discuss options to reduce
any negative impact. Your patience is appreciated.

Sincerely,
CS Staff

CS Ionic Cluster Downtime, Thursday, October 19, 2023, 7:30-9:30

Date: Thursday, October 19, 2023 (7:30-9:30)

Who is affected:
All users of the CS Department Beowulf high-performance computing cluster,
known as ionic.

What is happening:
CS Staff will upgrade the cluster management and job scheduling system
Slurm and its database. No reboot will be necessary, so we expect to
finish the upgrade before the end of this window. The wide time frame
allows for the uncertainties involved.

Why is it happening:
The upgrade is necessary to patch an urgent security bug.

We will post updates to the status page: www.csstaff.org
as necessary.

If this downtime will cause you undue hardship, please contact
csstaff@cs.princeton.edu immediately, so we can discuss options to reduce
any negative impact. Your patience is appreciated.

Sincerely,
CS Staff

CS Storage System Downtime, Tuesday, August 29, 2023, 07:30-10:30

Date: Tuesday, August 29, 2023 (07:30-10:30)

Who is affected:
All users of CS Department storage systems, including project spaces, home
directories, and web spaces.

What is happening:
The central file storage cluster will be rebooted a few times during this
window in order to facilitate physical upgrades. During the reboots, file
services will be interrupted, but will resume after the cluster finishes
its boot. This will affect access to the cycles login hosts, CS Department
web services, and CS Department SMB/CIFS services. Email services should
not be affected.

Actual outage time is not expected to span the full three-hour window, but
outages may occur sporadically during this period.

A reservation has been placed on the ionic cluster to hold all jobs that
would overlap with this maintenance. Jobs will automatically start again
after completion of the work.
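To estimate whether one of your jobs would be held, you can check whether
its start time plus its requested time limit reaches into the maintenance
window. The sketch below illustrates that interval check only; it is not
Slurm's implementation, the function name is hypothetical, and the window
constants are simply the times announced above.

```python
from datetime import datetime, timedelta

# Maintenance window taken from this announcement.
WINDOW_START = datetime(2023, 8, 29, 7, 30)
WINDOW_END = datetime(2023, 8, 29, 10, 30)


def overlaps_window(start: datetime, time_limit: timedelta,
                    window_start: datetime = WINDOW_START,
                    window_end: datetime = WINDOW_END) -> bool:
    """Return True if a job starting at `start` with the given time limit
    could still be running at any point inside the window."""
    job_end = start + time_limit
    return start < window_end and job_end > window_start
```

Under this check, a job that would start at 20:00 on August 28 with a 12-hour
time limit overlaps the window (it could run until 08:00 on August 29), while
the same job with an 8-hour limit does not.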

Why is it happening:
The storage cluster will be upgraded during this window. The upgrades will
modernize the backend network of the cluster, as well as add new all-flash
nodes to speed up front-end operations.

This, combined with recent front-end network upgrades, will continue the
improvements to the cluster necessary to prepare for the arrival of the new
SEAS HPC cluster that will be hosted in CS.

We will post updates to the status page: www.csstaff.org
as necessary.

If this downtime will cause you undue hardship, please contact
csstaff@cs.princeton.edu immediately, so we can discuss options to reduce
any negative impact. Your patience is appreciated.

Sincerely,
CS Staff
