On Tuesday, June 10, 2008, we will have a scheduled downtime of the entire CS computing and networking infrastructure, beginning at 6:00am and extending into normal business hours. We don't have an exact completion time, but we anticipate that everything will be back up before 2:00pm.
Who is affected:
- This downtime will impact all users of the departmental infrastructure. We will power down all equipment in room 218 including the network (wired and wireless), web servers, mail servers, compute servers, and file servers.
What is happening:
- We are upgrading the battery-backup power system for room 218. This includes replacing the existing UPS (40kVA, 208V, 3φ) with a larger unit (80kVA, 480V, 3φ). Because we are reconfiguring the system to operate at 480V and installing a new bypass switch, we must power down the entire machine room to perform the work.
Why it is happening:
- Due to the continued growth of the department's infrastructure, we have reached the capacity of our current power configuration. The new configuration will allow us to install additional infrastructure equipment to meet the department's needs.
Update 2:10pm: While the work is moving along smoothly, it is taking longer than anticipated. Our new estimate for being back online is 5:00pm.
Update 5:05pm: A circuit breaker has failed in the new UPS. We are working with the vendor to get a replacement as soon as possible, and we are hopeful that it will arrive first thing in the morning. We'll know the specific ETA later tonight. We do not anticipate the systems coming back online tonight.
Update 6:25pm: The field engineer was able to track down a replacement circuit breaker in Virginia. It will be shipped overnight and is due to arrive in the building by 8:30am. The earliest we anticipate being back online is 11:00am. However, there are still several unknowns, so this is only a lower bound.
Update Wednesday, 12:25pm: The field engineer installed the replacement circuit breaker. Unfortunately, it exhibited the same problem, and he is now debugging the system. We put in a call to the vendor, and the engineer's supervisor is on his way (with additional spare parts, if needed) to assist with the troubleshooting. We are simultaneously working to get the new UPS online and weighing our options in case things continue to drag out. One option is to bring the systems back online without any protection from power hits. This is risky: without protection, a power event can bring down the room and damage equipment. The last time this happened, it degraded our systems and led to a series of failures over several weeks, some of which resulted in the permanent loss of user data. In addition, we would need to bring the room back down later to complete the UPS installation in any event. We understand that we must get the systems back online as soon as possible.
Update Wednesday, 5:25pm: Our new UPS is up and running. We have begun our normal start-up procedures. We anticipate being online at approximately 6:00pm.