dido:public:ra:1.4_req:2_nonfunc:14_reliability:04_faulttolerance

Differences

This shows you the differences between two versions of the page.

dido:public:ra:1.4_req:2_nonfunc:14_reliability:04_faulttolerance [2020/11/18 19:27]
52.45.51.99 ↷ Links adapted because of a move operation
dido:public:ra:1.4_req:2_nonfunc:14_reliability:04_faulttolerance [2021/08/11 13:25] (current)
murphy
Line 1: Line 1:
-====== 4.2.2.3 Fault Tolerance ======
+====== 4.3.2.3 Fault Tolerance ======
 [[dido:public:ra:1.4_req:2_nonfunc:14_reliability | Return to Reliability ]]
  
Line 14: Line 14:
   * **Power sources** are ruggedized as fault tolerant by incorporating alternative sources and backups like [[dido:public:ra:xapend:xapend.a_glossary:u:ups]] and backup generators. A good description of this is provided in the [[https://www.omgwiki.org/ddsf/doku.php?id=ddsf:private:cookbook:03_user:03_tms | Tactical Microgrid Standard (TMS) ]] [[dido:public:ra:xapend:xapend.a_glossary:u:use_case|use case]],
   * **Hardware systems** are made Fault Tolerant by deploying identical or equivalent systems that can either be used instead of the original system or use in conjunctions with the original system used as an alternative. For example, a [[dido:public:ra:xapend:xapend.a_glossary:s:server|server]] can be made fault tolerant by using an identical server running in parallel, with all operations mirrored to the backup server.
-  * **Networks** designed as Fault Tolerant by supporting multiple networks paths between any two [[dido:public:ra:xapend:xapend.a_glossary:e:endpoint|endpoints]] within the [[dido:public:ra:xapend:xapend.a_glossary:l:lan]] or [[dido:public:ra:xapend:xapend.a_glossary:w:wan]] are possible but the actual endpoint also needs to be duplicated (i.e., two Network Interface Cards (NICs)). It is also possible to use two different networks such as a [[dido:public:ra:xapend:xapend.a_glossary:w:wired | wired]], [[dido:public:ra:xapend:xapend.a_glossary:w:wifi]], [[dido:public:ra:xapend:xapend.a_glossary:b:bluetooth]], or [[dido:public:ra:xapend:xapend.a_glossary:z:zigbee]].
+  * **Networks** designed as Fault Tolerant by supporting multiple networks paths between any two [[dido:public:ra:xapend:xapend.a_glossary:e:endpoint|endpoints]] within the [[dido:public:ra:xapend:xapend.a_glossary:l:lan]] or [[dido:public:ra:xapend:xapend.a_glossary:w:wan]] are possible but the actual endpoint also needs to be duplicated (i.e., two Network [[dido:public:ra:xapend:xapend.a_glossary:i:interface|Interface]] Cards (NICs)). It is also possible to use two different networks such as a [[dido:public:ra:xapend:xapend.a_glossary:w:wired | wired]], [[dido:public:ra:xapend:xapend.a_glossary:w:wifi]], [[dido:public:ra:xapend:xapend.a_glossary:b:bluetooth]], or [[dido:public:ra:xapend:xapend.a_glossary:z:zigbee]].
-  * **Software** systems or components become fault tolerant when multiple instances of the software are running in parallel using either operating system threads or even more modern [[dido:public:ra:xapend:xapend.a_glossary:c:container|containers]] such as [[dido:public:ra:xapend:xapend.a_glossary:d:docker]] or orchestration software such as [[https://kubernetes.io/ | Kubernetes]]. For example, a database can be continuously replicated to other machines. If the primary database goes down, operations can be automatically redirected to the second database. Another example, would be use of orchestration software such as Kubernetes to automatically use an alternate application container on the same or different machine.
+  * **Software** systems or components become fault tolerant when multiple instances of the software are running in parallel using either [[dido:public:ra:xapend:xapend.a_glossary:o:os|operating system]] threads or even more modern [[dido:public:ra:xapend:xapend.a_glossary:c:container|containers]] such as [[dido:public:ra:xapend:xapend.a_glossary:d:docker]] or orchestration software such as [[https://kubernetes.io/ | Kubernetes]]. For example, a database can be continuously replicated to other machines. If the primary database goes down, operations can be automatically redirected to the second database. Another example, would be use of orchestration software such as Kubernetes to automatically use an alternate [[dido:public:ra:xapend:xapend.a_glossary:a:app_container|application container]] on the same or different machine.
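
The database example in the **Software** item (redirecting operations to a second database when the primary goes down) can be sketched as follows. This is only an illustrative sketch, not part of the DIDO text: the names ''call_with_failover'', ''AllInstancesFailed'', ''primary'' and ''replica'' are hypothetical, the two "databases" are stand-in functions, and a real deployment would normally rely on the database's or orchestrator's own failover mechanism.

```python
from typing import Callable, Sequence


class AllInstancesFailed(Exception):
    """Raised when every redundant instance has failed (hypothetical name)."""


def call_with_failover(instances: Sequence[Callable[[str], str]], request: str) -> str:
    """Try each redundant instance in order and return the first successful reply."""
    failures = 0
    for instance in instances:
        try:
            return instance(request)
        except ConnectionError:
            # This instance is unavailable; fall through to the next one.
            failures += 1
    raise AllInstancesFailed(f"{failures} instance(s) failed")


# Stand-ins for a primary database and its continuously replicated copy.
def primary(request: str) -> str:
    raise ConnectionError("primary database is down")


def replica(request: str) -> str:
    return f"replica handled: {request}"


result = call_with_failover([primary, replica], "SELECT 1")
print(result)
```

The caller never needs to know which instance answered, which is the property the text describes: the redirect happens automatically, without human intervention.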
     ​     ​
  
-Fault Tolerance needs to be considered in all disaster recovery plans or strategies. For example, Fault Tolerant systems can use the cloud for backups allowing critical systems to quickly be restored. Although these backups are not true immediate failovers they can a longer horizon fault tolerance. **Note:** often these backup plans are not geographically local which is particularly important during natural or even human disasters.
+Fault Tolerance needs to be considered in all disaster recovery plans or strategies. For example, Fault Tolerant systems can use the cloud for backups allowing critical systems to quickly be restored. Although these backups are not true immediate failovers, they can offer a longer timeline for fault tolerance recovery. **Note:** often these backup plans are not geographically local which is particularly important during natural or even human disasters. ((
+Mariah Timms,
+__AT&T outage: Internet, 911 disrupted, planes grounded after Nashville explosion. Get the latest updates__,
+Nashville Tennessean,
+5 January 2012,
+Accessed: 8 January 2021,
+[[https://www.tennessean.com/story/news/local/2020/12/25/att-outage-internet-down-hours-after-nashville-explosion/4045278001/]]
+))
  
-===== DDS Specifics =====
+===== DIDO Specifics =====
 [[dido:public:ra:1.4_req:2_nonfunc:14_reliability:04_faulttolerance | Return to the Top]]
  
-[[dido:​public:​ra:​xapend:​xapend.a_glossary:​d:​dds]] is [[dido:​public:​ra:​xapend:​xapend.a_glossary:​m:​mom]] software and as such can not directly help with power source, hardware or networks. However, because it is a many-to-many,​ [[dido:​public:​ra:​xapend:​xapend.a_glossary:​p:​p2p]],​ [[dido:​public:​ra:​xapend:​xapend.a_glossary:​p:​publish-subscribe]] [[dido:​public:​ra:​xapend:​xapend.a_glossary:​m:​midware|middleware]],​ it can be used to help monitor these components and can help make informed decisions regarding the proper operations of these components. For example, there can be redundant heat sensors on a chemical mixing tank, both publishing the current temperature of the tank. If one [[dido:​public:​ra:​xapend:​xapend.a_glossary:​s:​sensor|sensor]] fails, the [[dido:​public:​ra:​xapend:​xapend.a_glossary:​m:​monitorsw|monitoring software]] component for the tank can automatically use the the backup sensor without human intervention by configuring the [[ddsf:​private:​cookbook:​06_append:​02_quality_of_service | Quality of Service (QoS)]] parameters on a [[dido:​public:​ra:​xapend:​xapend.a_glossary:​t:​topic]] correctly. 
- 
-  * **Note:** There are several excellent examples provided in section [[ddsf:​private:​cookbook:​03_user]]. 
  
  
dido/public/ra/1.4_req/2_nonfunc/14_reliability/04_faulttolerance.1605745656.txt.gz · Last modified: 2020/11/18 19:27 by 52.45.51.99