Media/Filesender/FTP outage

Ticket Details

Ticket ID 1312119
Subject Media/Filesender/FTP outage
Status open
Date Start Mon Apr 09 15:00:00 2018
Date End Not set
Scheduled NO

Problem Description

Due to a disk failure in the Media Hosting storage cluster, the Media Hosting and Filesender services are offline.

Problem Effects

Both the Media Service and Filesender will be offline until the storage cluster has fully recovered.

Last Update Message

Thu May 03 12:25:35 2018

Media has been restored to full availability, and we continue to monitor the service.

Affected Clients

n/a

Affected Services

Video Streaming

Transaction History

Date Created Message
Mon Apr 09 17:09:17 2018 A disk failure in the Media Hosting storage cluster has degraded performance of the Media Hosting service. Live and popular content held in the caching layer is not affected. Uploads of new content and access to content not viewed recently will be unreliable until the storage cluster has fully recovered. The estimated fix time is 6pm on 9/4/2018. Updates will be added as new information becomes available.
Mon Apr 09 19:02:23 2018 Due to ongoing issues with the storage cluster the HEAnet Media and FTP services are off-line. Restoration work is ongoing, but there is no estimated time for service recovery at this point.
Mon Apr 09 22:52:31 2018 The issues with the storage system have resulted in a loss of service to Media Hosting, Filesender and the HEAnet FTP Service. Service will be resumed as soon as is possible on Tuesday 10th April 2018.
Tue Apr 10 10:14:56 2018 Disks within the storage system were replaced overnight, and reboots are taking place currently. We will have an update to follow.
Tue Apr 10 10:46:11 2018 Work is continuing to restore services, with an estimated time of return of 1400. We will update this ticket again this afternoon.
Tue Apr 10 14:02:43 2018 We are still working to restore services after a disk failure in our storage array. Work is continuing and we will update with progress on the hour.
Tue Apr 10 15:16:10 2018 We are continuing to investigate issues with our disk array which are causing the outage of service to Media, FTP and Filesender. We apologise for any inconvenience that the lack of availability of services has created.
Tue Apr 10 17:34:30 2018 We have identified a hardware issue with one of the servers on our storage array. Our hardware vendor has been called in. Work will continue overnight and we will have a further update in the morning.
Wed Apr 11 09:57:12 2018 Troubleshooting is proceeding with the hardware support vendor. At this time, we do not anticipate full service restoration before close of business today. We will have a further update on progress before noon. Our apologies for any inconvenience this outage may cause.
Wed Apr 11 11:46:14 2018 We are continuing to troubleshoot this issue with Dell. We have no update on service restoration at this time. The next update will be before 1400.
Wed Apr 11 12:39:05 2018 We are currently working on hardware replacement on our storage cluster with the vendor. We are still unable to provide an estimate of when service will be restored. Our next update will be circa 1600.
Wed Apr 11 16:13:55 2018 Hardware replacement has been completed; however, the service remains offline. We are continuing to troubleshoot and still have no estimated time of recovery. We will have another update before close of business.
Wed Apr 11 17:13:17 2018 We have brought the Filesender service back online; however, the prior content is currently unavailable. Users will have to re-upload any content that they wish to share at this time. We are continuing to work to restore the original Filesender content and to get Media and FTP online. We are making software changes overnight to achieve this and will have a further update tomorrow morning.
Thu Apr 12 09:55:04 2018 Reconfiguration is proceeding but the rebuild is running slower than anticipated. We do not currently expect this to complete before the weekend. In parallel, we are working on restoring the media service using a new storage backend. This will allow new content uploads. Access to existing content is still pending the storage rebuild above. Filesender is now available for new file transfers. We apologise for the inconvenience caused by this outage, and continue to work toward resolution as quickly as possible.
Thu Apr 12 12:50:37 2018 We expect to have media service operating with a new storage backend by the end of today. Note that while this will allow new content uploads, access to existing content is still pending the storage rebuild. Work continues on the storage rebuild but we do not have an estimate for restoration at this time.
Thu Apr 12 16:34:33 2018 Media storage is now available for new uploads. Work continues on restoration of existing content but unfortunately we are still without an expected time for resolution. FTP is still unavailable. Our next update will be in the morning.
Fri Apr 13 09:36:27 2018 Recovery process for existing content has been proceeding overnight and continues to be monitored. We do not have an ETA for completion but expect to have a further update early this afternoon.
Fri Apr 13 16:41:20 2018 Media and Filesender services have now been restored including access to previous content. Please notify the NOC if you encounter any further problems. We will continue proactive monitoring and maintenance over the weekend, which we do not envision will be service affecting. ftp.heanet.ie will remain offline over the weekend.
Fri Apr 13 17:04:52 2018 Further problems have been observed with Media and Filesender. We are investigating and will update when we have further information.
Fri Apr 13 18:03:14 2018 Services have been restored and we continue to proactively monitor.
Mon Apr 16 10:10:53 2018 Media and Filesender services have been stable over the weekend. We are monitoring these closely as we work to undertake further proactive maintenance to ensure the stability of these services. Work continues to restore FTP.
Mon Apr 16 11:12:22 2018 Our NFS server rebooted, causing an outage to Media and Filesender services this morning from 10:09 – 10:28. We’re looking at the root cause of this as part of our proactive maintenance.
Mon Apr 16 15:23:48 2018 Due to the proactive nature of the maintenance, the Media and Filesender services may have limited availability. Further maintenance to the upload and cached media services will be carried out today to restore stability to this service. We're migrating the Filesender service to an alternate platform to restore the service and minimise any further impact due to storage cluster issues. Filesender service will be available again before 23:00 tonight.
Mon Apr 16 17:46:34 2018 Maintenance and monitoring are still ongoing. The next update will be tomorrow at 10:00.
Tue Apr 17 10:10:37 2018 Filesender service has been migrated to an alternate platform and the service is available for use. Media service still has limited availability, using alternate storage, whilst HEAnet continues work on fully repairing this service. We will be able to fully restore the FTP service once the storage cluster issue is resolved.
Wed Apr 18 10:43:08 2018 There is no further change in status at this time. It is not expected that FTP will be available before next week.
Thu Apr 26 11:32:53 2018 FTP has been restored with a reduced number of mirrors, which we expect to retain for the foreseeable future.
Thu May 03 12:25:01 2018 Media has been restored to full availability, and we continue to monitor the service.