Dear friends,
Our new server "LION" went down on 09/16/2019 around 23:00 EST. The downtime was caused by two NVMe disks in a RAID-1 array: the data on them somehow became corrupted. Last night we tried virtually every tool we had: rebooting, running FSCK (file system check), etc. Nothing worked. A hardware check did point us in the right direction: hardware replacement and a complete account restore.
1. How did you restore all accounts? Did you move them again from the old "Panther" server?
No, we restored the accounts from a local backup drive attached to the server. It was not affected because it was outside the RAID array. We were lucky: the server had generated its weekly backups just 1 day before the crash, on September 15th, 2019.
2. Did you lose account data during the restore?
All accounts have been restored successfully.
3. Someone sent an email to me on 09/15 but I didn't receive it. Can I recover it?
When a server goes down, email is affected too. Most messages sent to the affected server get bounced back to the sender. However, since we still had a working server, email went to "panther". Please log in to the panther server to see if there are any messages waiting. You can do this via webmail: https://panther.westnic.net:2096
4. How can I prevent email downtime?
You need a secondary (backup) MX record with a different hosting provider (for example, Google Apps). A single MX record is a single point of failure. Most web hosting providers offer only 1 MX record.
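As a rough illustration of the "single point of failure" idea, here is a minimal Python sketch that checks whether a set of MX records is spread across at least two providers. The host names are made-up examples, and treating the last two labels of a mail host as "the provider" is a simplifying assumption of ours (it will misjudge names such as example.co.uk). It works on a list you type in yourself and does not query DNS.

```python
def provider_of(exchange: str) -> str:
    """Naive provider key: the last two DNS labels of the mail host (assumption)."""
    labels = exchange.rstrip(".").lower().split(".")
    return ".".join(labels[-2:])


def has_backup_mx(records: list[tuple[int, str]]) -> bool:
    """Return True if the MX records span at least two distinct providers."""
    providers = {provider_of(exchange) for _, exchange in records}
    return len(providers) >= 2


# Hypothetical MX sets as (preference, mail host) pairs.
single = [(0, "mail.example.com.")]
redundant = [(0, "mail.example.com."), (10, "aspmx.l.google.com.")]

print(has_backup_mx(single))     # False
print(has_backup_mx(redundant))  # True
```

Two MX records that both point to the same provider still count as one here, because they would most likely go down together.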
5. I upgraded my WordPress script on 09/15 but there are no changes.
Unfortunately, you need to run the upgrade again. We restored websites from the most recent backup: 09/15 (generated between 2 and 6 AM EST).
6. How will you prevent such downtime in the future?
Firstly, we won't be using software RAID on heavy servers (with 100+ cPanel websites). Damaged data can be written to the secondary disk, which may leave the server unstable or unbootable. We're currently using RAID-0 with local backups + offsite backups (all NVMe disks with excellent read/write speed). This should be sufficient for basic data protection.
Please be advised that you must maintain your own backups on local computers or in the cloud. Please do not store them on the same server! It's extremely important. While we haven't lost any servers in the past 17 years, there is a chance that a backup file could be corrupted, lost or overwritten.
If you have a large website (over 10 GB of data), please contact us for assistance. We'll generate the backup between 4 and 6 AM EST (off-peak hours). Note: backups over 25 GB are not processed automatically! If your account is larger than 25 GB, you need to maintain your own offsite backups at all times.
Secondly, for the next 10 days, we'll keep an eye on the new server. If there are any critical issues, we'll know at once. All accounts will be moved back to "panther" until we resolve all issues with our vendors.
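To make the backup size limits from this answer easier to apply, here is a small Python sketch that sorts an account into one of three groups by its size. The 10 GB and 25 GB thresholds come from the text above; the function name, the group labels and the exact handling of the boundaries are our own illustrative assumptions, not a description of the server's actual backup software.

```python
def backup_handling(size_gb: float) -> str:
    """Classify an account by size, using the 10 GB and 25 GB limits above."""
    if size_gb > 25:
        # Not processed automatically: keep your own offsite backups.
        return "own-offsite-backups"
    if size_gb > 10:
        # Large website: contact the Help Desk for an off-peak backup.
        return "contact-help-desk"
    # Assumed to be covered by the regular automatic backups.
    return "automatic"


for size in (3, 18, 40):
    print(size, "GB ->", backup_handling(size))
```

Whatever group an account falls into, keeping an independent copy away from the server is still recommended.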
7. I am getting a 500 error on my website, it doesn't load!
Please check your configuration files and logs. If you see a /west/username path, please change it to /home/username (replace west with home). Secondly, please log in to cPanel > MultiPHP and select an older version of PHP: 5.6. Most likely your scripts or plugins are outdated and still depend on PHP 5.6. You should upgrade them because cPanel will drop 5.6 completely by the end of 2020.
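If many files contain the old path, the replacement described above can be sketched in Python as follows. This is only an illustration: the function name and the sample lines are hypothetical, and the pattern rewrites /west/ only when it starts a path, so something like /var/west/ is left alone. Please back up any file before changing it.

```python
import re

# Match "/west/" only when it begins a path, not inside a longer one.
LEGACY_PREFIX = re.compile(r"(?<![\w/.-])/west/")


def fix_legacy_paths(text: str) -> str:
    """Replace a leading /west/ path prefix with /home/."""
    return LEGACY_PREFIX.sub("/home/", text)


# Hypothetical configuration lines for a user named "alice".
print(fix_legacy_paths("ErrorLog /west/alice/logs/error.log"))
# ErrorLog /home/alice/logs/error.log
print(fix_legacy_paths("cache_dir=/var/west/tmp"))
# cache_dir=/var/west/tmp
```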
If you have any questions about the server transfer or the downtime, please don't hesitate to contact us at any time via the Help Desk.