Article #882: Degraded performance of several systems
We have seen a significant wave of these events this morning, September 21. For the most part, this wave seems to have been linked to a storage probl...
UPDATE: As of 5:30 pm Friday, 5 August 2016, we believe the problem affecting access to the Data Depot has been corrected. Thank you for your patienc...
As of 3:20 pm, the self-service tool is back in action. An issue with the database backing authentication was discovered and repaired. Original messag...
As of 7:30 pm, all methods for connecting to Data Depot have been restored to working order. All connections with Samba (Network Drive mappings: datad...
Engineering Computing Network (ECN) will be performing scheduled maintenance this weekend on several ECN servers, resulting in their unavailability for...
The underlying storage has been fixed, and all these clusters have been returned to normal operations as of 10:00pm EDT. As of Tuesday, June 7th, 201...
Networking to and from campus, and around large parts of campus, is down. Many services are unreachable at the moment. We will provide updates as they...
The problem is now RESOLVED after the reboot of a router. ======= The network serving Snyder is currently experiencing issues. Attempts to log in to t...
The Isilon filesystem was restored to normal service and all affected clusters had it remounted as quickly as was sustainable by the filesystem. This...
UPDATE: The issue with Carter's scratch filesystem has been resolved. The filesystem is now available. Job scheduling on the cluster has been resum...
Job scheduling on Hansen has returned to normal. This concludes the outage. Original Message: Hansen is not currently scheduling any new jobs. A file...
The scratch storage on Carter and Scholar has been returned to normal operations. The rebuild process will be continuing in the background, so we wil...
Update: As of 5:20pm, the standby and standby-c queues have been started and their jobs are being scheduled for execution. The standby queues on Hansen...
Outage RESOLVED: A misconfiguration that caused an unneeded IB driver to be loaded was fixed. Peregrine-1 is back online. Job scheduling is on. Origi...
As of Monday, March 7th, 2016 at 12:30pm EST, the Peregrine1 cluster is unavailable due to a failed network switch in its datacenter. This switch is...
Engineering Computing Network (ECN) will be performing staged patching and reboots of all of ECN's RedHat Linux workstations and servers to protect ag...
The Depot filesystem checks have all completed cleanly and the Depot has been fully returned to normal operations. All queues on all clusters are sch...
As of 9:15 PM, the Snyder and Rice clusters have been brought back into service after cooling was brought back online. Front-ends are operational and...
There was an issue with the cluster's gateway switches that prevented IP over InfiniBand traffic from working. This also caused an instabil...