Intermittent fault


An intermittent fault, often called simply an "intermittent", is a malfunction of a device or system that occurs at intervals, usually irregular, in a device or system that functions normally at other times. Intermittent faults are common to all branches of technology, including computer software. An intermittent fault is caused by several contributing factors, some of which may be effectively random, which occur simultaneously. The more complex the system or mechanism involved, the greater the likelihood of an intermittent fault.
Intermittent faults are not easily repeatable because of their complicated behavioral patterns. These are also sometimes referred to as “soft” failures, since they do not manifest themselves all the time and disappear in an unpredictable manner. In contrast, “hard” failures are permanent failures that occur over a period of time. They have a specific failure site, mode, and mechanism, and there is no unpredictable recovery for the failed system. Since intermittent faults are not easily repeatable, it is more difficult to conduct a failure analysis for them, understand their root causes, or isolate their failure site than it is for permanent failures.
Intermittent failures can be a cause of no-fault-found occurrences in electronic products and systems. NFF implies that a failure occurred or was reported to have occurred during a product’s use. The product was analyzed or tested to confirm the failure, but “a failure or fault” could be not found. A common example of the NFF phenomenon occurs when your computer “hangs up”. Clearly, a “failure” has occurred. However, if the computer is rebooted, it often works again. The impact of NFF and intermittent failures can be profound. Due to their characteristics, manufacturers may assume a cause rather than spend the time and cost to determine a root cause. For example, a hard drive supplier claimed NFFs were not failures and allowed all NFF products to be returned to the field. Later it was determined that these products had a significantly higher return rate, suggesting that the NFF condition was actually a result of intermittent failures in the product. The result was increased maintenance costs, decreased equipment availability, increased customer inconvenience, reduced customer confidence, damaged company reputation, and in some cases potential safety hazards.
A simple example of an effectively random cause in a physical system is a borderline electrical connection in the wiring or a component of a circuit, where two conductors may touch subject to a minor change in temperature, vibration, orientation, voltage, etc. In computer software a program may fail to initialise a variable which is required to be initially zero; if the program is run in circumstances such that memory is almost always clear before it starts, it will malfunction on the rare occasions that the memory where the variable is stored happens to be non-zero beforehand.
Intermittent faults are notoriously difficult to identify and repair because each individual factor does not create the problem alone, so the factors can only be identified while the malfunction is actually occurring. The person capable of identifying and solving the problem is seldom the usual operator. Because the timing of the malfunction is unpredictable, and both device or system downtime and engineers' time incur cost, the fault is often simply tolerated if not too frequent unless it causes unacceptable problems or dangers. For example, some intermittent faults in critical equipment such as medical life support equipment could result in killing a patient or in aeronautics causes a flight to be aborted or in some cases crash.
If an intermittent fault occurs for long enough during troubleshooting, it can be identified and resolved in the usual way.

Troubleshooting techniques

Some techniques to resolve intermittent faults are: