Fault Tolerance is the ability of a system (computer, network, cloud cluster, component, etc.) to continue functioning correctly without interruption during failures. Fault Tolerant systems (or components) prevent disruptions to a system that is considered Safety-Critical System (SCS), Life-Critical System or Mission Critical System. Usually, this requires an understanding of the single points of failure through the multiple critical execution paths in a running system.
The system characteristics of Fault Tolerance High Availability are related in that to achieve high availability, a system must address Fault Tolerance of components on the systems critical paths.
Fault Tolerant systems use redundant (i.e;, spare, backup) components to automatically become available in the event of a component failure to ensure there is no loss of service or data. The ability to use Failover mechanisms to quickly, smoothly and transparently transition to the redundant or backup systems requires a well designed system, with contingency plans and special management processes, hardware or software to ensure the transition. There are some Failover components which are acquired. For example:
Fault Tolerance needs to be considered in all disaster recovery plans or strategies. For example, Fault Tolerant systems can use the cloud for backups allowing critical systems to quickly be restored. Although these backups are not true immediate failovers, they can offer a longer timeline for fault tolerance recovery. Note: often these backup plans are not geographically local which is particularly important during natural or even human disasters. 1)