Article #1021: Email notifications from Research Computing website broken
Email notifications are up and running again as usual. Original Message As of 5pm Thursday evening, email notifications from the Research Computing we...
Nodes have continued to gradually reboot into the new image as jobs complete. At this point, more than 80% of Halstead has completed this process, an...
Engineers have restored failed core servers back to a functional state. Data Depot is up and running as normal and job scheduling resumed. Should you...
As of 8:48pm, the issue has been resolved. Original message: The Research Data Depot is experiencing a system-wide slowdown. Engineers have isolated t...
*** Update *** As of 7:00 pm, the problem on the scratch system has been corrected, and scheduling has resumed on all three affected clusters - Rice,...
As of 2:35 pm, the Conte cluster has been returned to service. Scheduling has resumed in all queues. Update: The source of the problem has been identified and the...
The Data Depot file system was sporadically available for 2 hours today. Some jobs running on the Community Clusters paused during the instability but...
Halstead nodes continue to come back online. While the cluster is operating normally, the total number of available nodes is not yet at full capacity...
The Fortress archival storage system is currently experiencing intermittent connectivity. We expect the situation to be resolved by approximately 1pm....
The scratch filesystems serving Carter, Hammer, Rice, Scholar, and Snyder started behaving abnormally this morning. This may have affected some jobs,...
The Research Data Depot has been restored to service. A portion of the systems serving the Research Data Depot has suffered a failure. Some systems u...
The scratch filesystem serving Hammer, Rice, and Snyder is currently unavailable. Both currently running jobs and attempts to access files in scratch...
Following the security updates on Halstead, an issue was discovered that prevented multi-node MPI jobs from running properly. Scheduling on Halstead h...
The scratch filesystem serving Conte is currently unavailable. Both currently running jobs and attempts to access files in scratch will block until th...
System monitoring has revealed intermittent issues connecting to the Research Data Depot on Thursday January 19. When this issue occurs, users will ex...
Following the restoration of power to the affected building, the EXRC cluster has been returned to service on Thursday, December 22nd, 2016 at 2:45pm...
UPDATE As of 7:50 pm, Wednesday, 14 December 2016, this issue is completely resolved. UPDATE As of about 6:00 pm another problem has been found in the...
Update: Engineers were able to isolate the problem and restart the necessary systems. The Data Depot should be available again. Halstead users should...
Job scheduling was paused on Radon between 6 pm and 7 pm this evening. Node monitoring processes marked most nodes offline around 6 pm, preventing new...
This issue has been resolved. Original Message: A portion of the systems serving the Research Data Depot has suffered a failure. Some systems using D...