How The Server Offers Resilience To Failure

Introduction

The servers are designed to achieve very high levels of reliability and resilience. This article sets out the types of failure and disaster that can occur and explains how the service and data will be restored following that type of failure or disaster. 

Service and Data Loss Scenarios and the Restoration Process

The most common service and data loss scenarios, and the method by which service and data are restored for the customer after each event, are as follows:

What happens if a disk fails?

If a disk fails, the service continues to operate as normal without interruption and users still have access to all the data on the server. The data stored on each disk is fully mirrored in real time onto a separate disk, so an up-to-date copy of all data is always available.

Our Support Team is alerted to any disk failure immediately through the Cloud Management Platform and then initiates the replacement of the failed disk. Prompt replacement virtually eliminates the risk of a second disk failing before the first failed disk has been replaced and resilience re-established.
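The mirroring behaviour described above can be illustrated with a small Python sketch. It is a toy model only, not the server's actual storage implementation; the class name MirroredPair and its methods are hypothetical and exist purely for illustration.

```python
class MirroredPair:
    """Toy model of two disks that each hold a full copy of the data."""

    def __init__(self):
        # Each dict stands in for one physical disk (an assumption for illustration).
        self.disks = [{}, {}]
        self.healthy = [True, True]

    def write(self, key, value):
        # Every write goes to all healthy disks, so the copies stay identical.
        for i, disk in enumerate(self.disks):
            if self.healthy[i]:
                disk[key] = value

    def read(self, key):
        # Any healthy disk can serve the read, so one failure is not noticed by users.
        for i, disk in enumerate(self.disks):
            if self.healthy[i]:
                return disk[key]
        raise IOError("both disks have failed")

    def fail(self, index):
        # Mark one disk as failed.
        self.healthy[index] = False

    def replace(self, index):
        # The replacement disk is rebuilt from the surviving copy.
        survivor = 1 - index
        if not self.healthy[survivor]:
            raise IOError("no surviving copy to rebuild from")
        self.disks[index] = dict(self.disks[survivor])
        self.healthy[index] = True
```

In this model, data remains readable after one disk fails, and resilience returns once the failed disk has been replaced and rebuilt. Data is lost only if the second disk fails before the replacement, which is why prompt replacement matters.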

What happens if any other hardware component fails?

A single physical server is not resilient to hardware component failures (such as motherboard, RAM, etc.) other than the failure of a single disk, as discussed above. If a hardware component fails in a single server solution, the server ceases to function entirely until the failed component has been replaced. The Support Team detects such occurrences immediately and initiates the replacement of the failed component. Once the failed component has been replaced, the server can be restarted immediately.

However, a High Availability (HA) solution is also available. This is a cluster of two physical servers, typically called nodes, which is resilient to the failure of any hardware component or even of an entire physical server.

This configuration runs half the VMs on one node, and the other half on the other node, with all data transparently replicated between them in real time. If one node fails, the VMs running on that node will be started up automatically on the other node, and normal service will be resumed for those VMs within 5 minutes.
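As a rough illustration of the failover described above, the following Python sketch moves the VMs from a failed node onto the surviving node. The function name fail_over and the node and VM names in the usage note are hypothetical; the real cluster software is not described in this article and will differ.

```python
def fail_over(placement, failed_node):
    """Return a new VM placement after one node of a two-node cluster fails.

    placement maps each node name to the list of VMs running on it.
    """
    if len(placement) != 2 or failed_node not in placement:
        raise ValueError("expected a two-node cluster containing the failed node")
    # The survivor is whichever node did not fail.
    survivor = next(node for node in placement if node != failed_node)
    return {
        failed_node: [],
        # The survivor keeps its own VMs and also starts the failed node's VMs.
        survivor: placement[survivor] + placement[failed_node],
    }
```

For example, with vm1 and vm2 on node-a and vm3 and vm4 on node-b, a failure of node-a leaves all four VMs running on node-b. Because data is replicated between the nodes in real time, the restarted VMs can pick up from the surviving node's copy.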

What happens if data becomes corrupt?

The corruption of data on a server is a rare event but may occur in extreme circumstances as a result of a defect in software running on the server.  In such circumstances, the data can be restored to its state at the time of its most recent local backup.  The mirrored copy of the corrupt data is of no benefit as the mirrored copy will itself contain the corruption.  As a result, the only option is to revert to the most recent backup of the data which means that changes made to the data in question since that most recent backup will be lost. 

The backup copy of the data from which the restoration ideally occurs is the most recent local backup of the data in question.  In some circumstances, however, the corruption may have found its way into that local backup of the data, in which case it is necessary to restore a backup of the data that is held in the Cloud (assuming of course that the Cloud Backup and Recovery option has been purchased).  If the local backup of the data has become corrupt and the Cloud Backup and Recovery option has not been purchased, it may not be possible to restore the data to any previous state. The customer would then need to create the data again from scratch or restore it from another external copy of the data that they may have created and retained.
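The choice of backup source described above can be summarised as a short decision function. This is a sketch only; the function name choose_restore_source and its parameters are hypothetical and do not correspond to any real tool.

```python
from typing import Optional


def choose_restore_source(local_backup_clean: bool,
                          cloud_option_purchased: bool) -> Optional[str]:
    """Pick the backup to restore from after data corruption.

    Returns "local", "cloud", or None when no usable backup exists.
    """
    if local_backup_clean:
        # Preferred: the most recent local backup.
        return "local"
    if cloud_option_purchased:
        # Fall back to a backup held in the Cloud.
        return "cloud"
    # No usable backup: the customer must recreate the data or use their own copy.
    return None
```

The ordering reflects the text: the local backup is tried first, and the Cloud backup is used only when the local backup is itself corrupt.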

We endeavour to restore data to its state prior to the corruption by the end of the next working day provided that the restoration can be performed using the local backup of the data.  Should it be necessary to revert to a Cloud backup of the data to find a non-corrupt copy, this timescale may increase depending on the extent of the investigation that is required to find a non-corrupt copy of the data.

What happens if a disaster occurs on the customer’s premises? 

A disaster is defined as an event (such as theft, flood or fire) which renders a server unusable and/or inaccessible for an indeterminate period.  Disaster recovery (DR) is the process of reinstating the server to its operational state prior to the occurrence of the disaster.  The activities that happen when a disaster occurs depend on whether the Cloud Backup and Disaster Recovery option has been purchased (which nearly all customers do purchase).

Servers without the Cloud Backup and Disaster Recovery option

If this option has not been purchased, only standard software images are backed up to the Cloud; that is, neither the customer’s data nor the software images associated with any Custom VMs are backed up to the Cloud. Following a disaster and the replacement of the physical hardware on the customer’s premises, we will load our standard software images, but none of the customer’s software images and none of their data. Responsibility for bearing the purchase cost of the replacement server hardware is described in our End User Licence Agreement.

Servers with the Cloud Backup and Disaster Recovery option

If this option has been purchased, the customer’s data and all the software images on the server (the standard software images as well as the images associated with any Custom VMs) are backed up regularly from the server to the Cloud. Following a disaster and the replacement of the physical hardware on the customer’s premises, the software images associated with any Custom VMs and all of the customer’s data will be restored to the server. The data is restored to the state it was in when the most recent backup prior to the disaster commenced. Responsibility for bearing the purchase cost of the replacement server hardware is described in our End User Licence Agreement.
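To make the restore point concrete, the sketch below picks the backup whose start time is the latest one before the disaster. The function name recovery_point is hypothetical, and the sketch assumes that every listed backup completed successfully, which this article does not state.

```python
from datetime import datetime


def recovery_point(backup_start_times, disaster_time):
    """Return the start time of the most recent backup begun before the disaster.

    Returns None if no backup started before the disaster.
    Assumes every listed backup completed successfully (an assumption for illustration).
    """
    earlier = [t for t in backup_start_times if t < disaster_time]
    return max(earlier) if earlier else None
```

Any changes made between that backup's start time and the disaster are not included in the restored data.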

With the Cloud Backup and Disaster Recovery option, we will also provide interim access in the Cloud immediately after the disaster, whilst the server is being replaced on the customer’s chosen premises. This gives the customer’s users access to their files that were previously stored on the server, to any Managed Applications (and their data) that were included in the subscription, and to any Custom VMs. We will restore the Custom VM software images in the Cloud, and the Service Provider will be responsible for recreating the data store for any Custom VMs from the backups of this data retained in the Cloud. This access to what is effectively a ‘Cloud instance’ of the server will remain in place until the server has been restored on the customer’s chosen premises.
