Page 124 - DCAP103_Principle of operating system
Unit 4: Process Management-III
A bad system call should never be able to take down the kernel. An RTOS should, therefore,
employ opaque handles for kernel objects. It should also validate the parameters to all system
calls.
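
As an illustration, the following minimal sketch in C shows one way an opaque handle scheme with parameter validation might look. All names here (sem_handle_t, sys_sem_create, sys_sem_post, sys_sem_destroy, the status codes) are hypothetical and are not taken from any particular RTOS. The handle is assumed to be an index into a kernel-owned table combined with a generation counter, so that the caller never holds a kernel pointer and a stale or forged handle is rejected rather than dereferenced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; not the API of any real RTOS. */
#define MAX_SEMAPHORES 8

typedef uint32_t sem_handle_t;           /* opaque to the caller */
#define SEM_HANDLE_INVALID 0u

typedef enum { SYS_OK = 0, SYS_EINVAL = -1, SYS_ENOSPC = -2 } sys_status_t;

typedef struct {
    int      in_use;
    uint16_t generation;   /* bumped on every reuse so stale handles fail */
    int      count;
} ksem_t;

static ksem_t sem_table[MAX_SEMAPHORES];  /* lives in kernel space only */

/* Handle layout (an assumption): generation in the high 16 bits,
 * slot index + 1 in the low 16 bits, so slot value 0 is never valid. */
static sem_handle_t make_handle(size_t idx, uint16_t gen) {
    assert(idx < MAX_SEMAPHORES);         /* internal invariant */
    return ((uint32_t)gen << 16) | (uint32_t)(idx + 1);
}

/* Translate a handle to a kernel object, or NULL if it is bogus. */
static ksem_t *lookup(sem_handle_t h) {
    uint32_t slot = h & 0xFFFFu;
    uint16_t gen  = (uint16_t)(h >> 16);
    if (slot == 0 || slot > MAX_SEMAPHORES) return NULL;
    ksem_t *s = &sem_table[slot - 1];
    if (!s->in_use || s->generation != gen) return NULL;
    return s;
}

sys_status_t sys_sem_create(int initial, sem_handle_t *out) {
    /* A real kernel would also check that 'out' points into the
     * caller's own address space before writing through it. */
    if (out == NULL || initial < 0) return SYS_EINVAL;
    for (size_t i = 0; i < MAX_SEMAPHORES; i++) {
        if (!sem_table[i].in_use) {
            sem_table[i].in_use = 1;
            sem_table[i].generation++;
            sem_table[i].count = initial;
            *out = make_handle(i, sem_table[i].generation);
            return SYS_OK;
        }
    }
    return SYS_ENOSPC;
}

sys_status_t sys_sem_post(sem_handle_t h) {
    ksem_t *s = lookup(h);
    if (s == NULL) return SYS_EINVAL;     /* bad handle: error, not a crash */
    s->count++;
    return SYS_OK;
}

sys_status_t sys_sem_destroy(sem_handle_t h) {
    ksem_t *s = lookup(h);
    if (s == NULL) return SYS_EINVAL;
    s->in_use = 0;
    return SYS_OK;
}
```

With this arrangement, a call such as sys_sem_post(0xDEADBEEF) or a call on a handle that has already been destroyed simply returns SYS_EINVAL. The kernel's own data structures are never touched on behalf of an invalid request.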
4.4.3 Fault Tolerance and High Availability
Even the best software has latent bugs. As applications become more complex, performing more
functions for a software-hungry world, the number of bugs in fielded systems will continue to
rise. System designers must, therefore, plan for failures and employ fault recovery techniques.
Of course, the effect of fault recovery is application-dependent: a user interface can restart itself
in the face of a fault, whereas a flight-control system probably cannot. One way to do fault recovery is
to have a supervisor thread in an address space all its own. When a thread faults (for example,
due to a stack overflow), the kernel should provide some mechanism whereby notification can
be sent to the supervisor thread. If necessary, the supervisor can then make a system call to
close down the faulted thread, or the entire process, and restart it. The supervisor might also
be hooked into a software “watchdog” setup, whereby thread deadlocks and starvation can be
detected as well.
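
The supervisor arrangement described above can be sketched roughly as follows. This is a simplified, single-file model in C, not the mechanism of any specific RTOS. The function names (sup_register, sup_checkin, sup_notify_fault, sup_poll) are hypothetical; sup_notify_fault stands in for whatever fault notification the kernel provides, and restart_thread stands in for the system call that closes down and restarts a thread. Time is passed in as a tick count so that the logic can be followed without a real scheduler.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical supervisor model; all names are illustrative only. */
#define MAX_THREADS 4

typedef enum { T_UNUSED = 0, T_RUNNING, T_FAULTED } tstate_t;

typedef struct {
    tstate_t state;
    uint32_t last_checkin;  /* tick of the latest watchdog check-in */
    uint32_t timeout;       /* max ticks allowed between check-ins */
    unsigned restarts;      /* times the supervisor has restarted it */
} supervised_t;

static supervised_t threads[MAX_THREADS];

static int valid_id(int id) { return id >= 0 && id < MAX_THREADS; }

/* Place a thread under supervision. Returns 0 on success, -1 on bad input. */
int sup_register(int id, uint32_t now, uint32_t timeout) {
    if (!valid_id(id) || timeout == 0) return -1;
    threads[id] = (supervised_t){ T_RUNNING, now, timeout, 0 };
    return 0;
}

/* Called periodically by a healthy thread ("kicking" the watchdog). */
void sup_checkin(int id, uint32_t now) {
    if (valid_id(id) && threads[id].state == T_RUNNING)
        threads[id].last_checkin = now;
}

/* Stand-in for the kernel notifying the supervisor of a thread fault,
 * for example a stack overflow. */
void sup_notify_fault(int id) {
    if (valid_id(id) && threads[id].state == T_RUNNING)
        threads[id].state = T_FAULTED;
}

/* Stand-in for the system call that closes down and restarts a thread. */
static void restart_thread(int id, uint32_t now) {
    assert(valid_id(id));
    threads[id].state = T_RUNNING;
    threads[id].last_checkin = now;
    threads[id].restarts++;
}

/* One pass of the supervisor loop. Restarts every thread that has
 * faulted or has missed its check-in deadline (deadlock, starvation).
 * Returns the number of threads restarted. */
int sup_poll(uint32_t now) {
    int restarted = 0;
    for (int id = 0; id < MAX_THREADS; id++) {
        supervised_t *t = &threads[id];
        if (t->state == T_UNUSED) continue;
        /* Unsigned subtraction stays correct across tick wraparound. */
        int hung = (uint32_t)(now - t->last_checkin) > t->timeout;
        if (t->state == T_FAULTED || hung) {
            restart_thread(id, now);
            restarted++;
        }
    }
    return restarted;
}
```

Note that the same polling pass handles both cases from the text: an explicit fault reported by the kernel, and a thread that is still nominally running but has stopped checking in. Whether a restart is an acceptable response remains an application-level decision.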
In many critical systems, high availability is assured by employing multiple redundant nodes in
the system. In such a system, the kernel running on a redundant node must have the ability to
detect a failure in one of the operating nodes. One method is to provide a built-in heartbeat in the
interprocessor message passing mechanism of the RTOS. Upon system startup, a communications
channel is opened between the redundant nodes and each of the operating nodes. During normal
operation, the redundant nodes continually receive heartbeat messages from the operating nodes.
If the heartbeat fails to arrive, the redundant node can take control automatically.
Figure 4.3: Redundancy via System Heartbeats (node labels: Active, Active, Redundant)
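
The heartbeat scheme can be illustrated with a short sketch of the logic that might run on the redundant node. This is an assumed design, not the built-in mechanism of any particular RTOS: the names (standby_t, standby_on_heartbeat, standby_tick), the heartbeat period of 100 ticks, and the rule of taking over after three missed heartbeats are all hypothetical choices made for the example. Two operating nodes are assumed, matching the two active nodes in Figure 4.3.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical failover sketch; constants and names are assumptions. */
#define NUM_ACTIVE        2      /* operating nodes being monitored */
#define HEARTBEAT_PERIOD  100u   /* expected ticks between heartbeats */
#define MISSED_LIMIT      3u     /* missed heartbeats before takeover */

typedef enum { ROLE_STANDBY, ROLE_ACTIVE } role_t;

typedef struct {
    uint32_t last_seen[NUM_ACTIVE];  /* tick of last heartbeat per node */
    role_t   role;
    int      replaced_node;          /* -1 while still on standby */
} standby_t;

/* Called at system startup, once the channels to the operating
 * nodes have been opened. */
void standby_init(standby_t *s, uint32_t now) {
    assert(s != 0);
    for (int i = 0; i < NUM_ACTIVE; i++) s->last_seen[i] = now;
    s->role = ROLE_STANDBY;
    s->replaced_node = -1;
}

/* Called whenever a heartbeat message arrives from an operating node. */
void standby_on_heartbeat(standby_t *s, int node, uint32_t now) {
    if (node < 0 || node >= NUM_ACTIVE) return;  /* ignore bogus sender */
    s->last_seen[node] = now;
}

/* Called periodically on the redundant node. If any operating node has
 * been silent for longer than the allowed window, take control. */
role_t standby_tick(standby_t *s, uint32_t now) {
    if (s->role == ROLE_ACTIVE) return s->role;   /* already took over */
    for (int i = 0; i < NUM_ACTIVE; i++) {
        uint32_t silent = now - s->last_seen[i];  /* wrap-safe */
        if (silent > HEARTBEAT_PERIOD * MISSED_LIMIT) {
            s->role = ROLE_ACTIVE;
            s->replaced_node = i;
            break;
        }
    }
    return s->role;
}
```

Tolerating a few missed heartbeats before switching over is a common design choice, since a single delayed message should not trigger a takeover. The appropriate window depends on how quickly the system must recover.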
LOVELY PROFESSIONAL UNIVERSITY 117