Maintainability is the characteristic that represents the degree of effectiveness and efficiency with which a product or system can be modified to improve it, correct it or adapt it to changes in its environment and requirements. This characteristic is composed of the following sub-characteristics:1)
Maintainability is the ability of a system to resume full operation after a failure in one of its components. Components that are mission critical for the system can include equipment, machines, power, air conditioning, software, etc. Maintainability is expressed as the probability of recovery within a specified time frame, usually quoted as a number of "nines" (e.g., 99%, 99.9%, 99.99%, and 99.999%, the last being "Five Nines"). For example, 99.999% maintainability corresponds to roughly 5 minutes, 15 seconds or less of downtime in a year. The downtime must include all the steps required to recover full operational capability, so the time must include removal, diagnostics, assembly of the resources required to perform the maintenance (e.g., parts, bays, tools, personnel) and re-installation of the failed component.2) 3)
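The relationship between a "nines" percentage and the yearly downtime budget can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming a 365-day year; the function names `allowed_downtime_seconds` and `format_duration` are illustrative and not part of any standard or library.

```python
# Assumption: a non-leap, 365-day year is used as the reference period.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def allowed_downtime_seconds(percent: float) -> float:
    """Return the yearly downtime budget, in seconds, for a given percentage."""
    return SECONDS_PER_YEAR * (1.0 - percent / 100.0)


def format_duration(seconds: float) -> str:
    """Format a duration as days, hours, minutes and whole seconds."""
    total = int(seconds)  # fractional seconds are truncated
    days, rem = divmod(total, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{days}d {hours}h {minutes}m {secs}s"


if __name__ == "__main__":
    for level in (99.0, 99.9, 99.99, 99.999):
        budget = allowed_downtime_seconds(level)
        print(f"{level}% -> {format_duration(budget)} per year")
```

Each additional nine divides the downtime budget by ten, which is why the step from 99.99% to 99.999% leaves only about five minutes per year for removal, diagnostics, resource assembly and re-installation combined.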
Two main aspects of Maintainability need to be addressed for all systems or components:
Maintainability is a projection of the downtime of a system. Given that it is, by its very nature, a projection, it should not be viewed as a guarantee that a system will only be down for the projected amount of time. Maintainability must therefore rely on models that calculate the probability of component failures, based either on the actual failure rates of those components in the past or on component test results.
All these models are abstractions of reality; therefore, at best they are only approximations of reality. To the extent they provide useful insights, they are still very valuable. The more complicated the model, the more data is necessary to develop precise estimates. The greater the extrapolation required for a prediction, the greater the imprecision. Also, obtaining all the data required as input to the models is difficult and time-consuming, and the data itself may not be very accurate.
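As an illustration of such a model, the sketch below estimates a failure rate from past data and projects the probability of at least one failure over a future period. It assumes a constant failure rate (an exponential model), which is a common simplification but only one of several possible models and not one prescribed by this section; the function names and the sample figures are hypothetical.

```python
import math


def failure_rate(failures: int, operating_hours: float) -> float:
    """Estimate failures per operating hour from historical data."""
    if operating_hours <= 0:
        raise ValueError("operating_hours must be positive")
    return failures / operating_hours


def probability_of_failure(rate_per_hour: float, hours: float) -> float:
    """Probability of at least one failure within `hours`.

    Assumption: failures follow an exponential model with a constant rate.
    """
    return 1.0 - math.exp(-rate_per_hour * hours)


if __name__ == "__main__":
    # Hypothetical history: 4 failures observed over 20,000 operating hours.
    rate = failure_rate(4, 20_000.0)
    p = probability_of_failure(rate, 1_000.0)
    print(f"Estimated probability of a failure in 1,000 hours: {p:.3f}")
```

The projection is only as good as the history behind it: with few observed failures the estimated rate is uncertain, and projecting far beyond the observed operating period compounds that uncertainty.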
Maintainability is measured using the Mean Time To Repair (MTTR), which is also used as part of Availability (see 4.3.2.2 Availability).
The MTTR identifies the average time needed to restore a system or component after a failure or breakdown under the expected (i.e., specified) operating conditions. The formula for MTTR is:
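A commonly used definition, stated here as an assumption about the intended formula, divides the total time spent on repairs by the number of repairs performed. The symbols $T_{\text{repair},i}$ (duration of the $i$-th repair) and $N$ (number of repairs) are introduced only for this illustration.

```latex
\mathrm{MTTR} = \frac{\sum_{i=1}^{N} T_{\text{repair},i}}{N}
```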
A lower MTTR value corresponds to a higher level of maintainability, which means that more maintainable systems take less time to repair.
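The comparison can be made concrete with a small calculation. This is a minimal sketch, assuming MTTR is computed as total repair time divided by the number of repairs; the function name `mttr_hours` and the repair logs are hypothetical.

```python
def mttr_hours(repair_durations_hours):
    """Return the mean time to repair, in hours, for a list of repair durations."""
    if not repair_durations_hours:
        raise ValueError("at least one repair is required")
    return sum(repair_durations_hours) / len(repair_durations_hours)


if __name__ == "__main__":
    # Hypothetical repair logs (hours per repair) for two systems.
    system_a = [1.5, 2.0, 0.5]
    system_b = [4.0, 6.0, 5.0]
    print(f"System A MTTR: {mttr_hours(system_a):.2f} h")
    print(f"System B MTTR: {mttr_hours(system_b):.2f} h")
```

With these sample figures, System A has the lower MTTR and would therefore be considered the more maintainable of the two.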
All systems, regardless of their level of complexity, require maintenance. One way to reduce the impact of maintenance is to use physical (hardware) modules (components) that require the fewest repairs in a given time frame and to choose hardware and designs that require the least downtime. Another way is to design a system that anticipates maintenance and provides redundancy to cover the downtime that maintenance requires. Nevertheless, managing redundancy is not a trivial task and adds complexity to the overall system. The more components that require redundancy, the more complex the system becomes, unless the middleware can manage the transition seamlessly, easily and transparently.
There are a few major forms of redundancy DDS can help with:
Form of redundancy | Description |
---|---|
Hardware redundancy | Multiple modules (components) that provide the same functionality are available in the system at the same time. For many systems, simple dual modular redundancy (i.e., two components) is sufficient. Life-critical systems, however, sometimes rely on triple modular redundancy. These systems should have zero to minimal loss from downtime. |
Information redundancy | Multiple information sources are available, so that modules can use whichever source is active if there is an interruption in information. In a system that has only one network available to it, information redundancy is of little use if the network needs maintenance. |
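The idea behind information redundancy, using whichever source is currently active, can be sketched in a few lines. This is an illustrative, middleware-independent sketch and does not use or represent any DDS API; the `Source` class, the `read_from_first_active` function and the sample sources are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Source:
    """A hypothetical information source with a liveness check and a reader."""
    name: str
    is_active: Callable[[], bool]
    read: Callable[[], str]


def read_from_first_active(sources: Sequence[Source]) -> Optional[str]:
    """Return data from the first active source, or None if all are down."""
    for source in sources:
        if source.is_active():
            return source.read()
    return None


# Hypothetical setup: the primary source is down for maintenance,
# so the consumer falls back to the backup source.
primary = Source("primary", lambda: False, lambda: "position from primary")
backup = Source("backup", lambda: True, lambda: "position from backup")
value = read_from_first_active([primary, backup])

if __name__ == "__main__":
    print(value)
```

In this sketch the consumer handles the switch itself, which is the kind of complexity that grows with every redundant component. If both sources share a single network, taking that network down for maintenance makes every source inactive at once and the function returns `None`.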