# American Journal of Engineering Research (AJER), 2016. e-ISSN: 2320-0847, p-ISSN: 2320-0936. Volume-5, Issue-3, pp-165-170. www.ajer.org. Research Paper. Open Access

# Overcoming Errors in TMR Systems Using Scan-Chain Methods

# PARAMESHAPPA.G<sup>1</sup>, MADHUKAR.G.N.MALIGERA<sup>2</sup>, Dr. D. JAYADEVAPPA<sup>3</sup>

<sup>1,2</sup>Department of Electronics & Communication, JSS Academy of Technical Education, Bengaluru, India. <sup>3</sup>Department of Electronics & Instrumentation, JSS Academy of Technical Education, Bengaluru, India.

**ABSTRACT:** In this paper, we describe scan-chain-based triple modular redundancy (ScTMR) and a scan-chain-based multiple error recovery technique for triple modular redundancy systems (SMERTMR). ScTMR identifies and corrects only a single fault, whereas SMERTMR reuses the scan-chain flip-flops originally inserted for testability to detect and correct faulty modules in the presence of single or multiple transient faults. In the proposed technique, faults are detected at the outputs of the modules, while latent faults are detected by monitoring the internal states of the TMR modules. Once a fault is identified, the state of a fault-free module is copied into the faulty module. If any module is found to have a permanent fault, the whole system enters the Master/Checker configuration. Finally, an enhanced SMERTMR is implemented with the aim of improving overall system performance, and all three methods are compared in terms of output time.

**Keywords:** scan chain, triple modular redundancy (TMR), scan-chain-based triple modular redundancy (ScTMR), Scan chain based multiple error recovery technique for triple modular redundancy systems (SMERTMR).

# I. INTRODUCTION

Systems that must complete a specified function within a deadline are now used in demanding environments such as nuclear power plants and spacecraft [2], [3]. In such environments these systems are at high risk of being affected by errors. Identifying and correcting those errors takes time, yet the correction must also finish within the specified deadline. Error detection and correction within the system deadline is therefore a major challenge. To achieve reliability, such systems must be equipped with suitable fault-tolerance mechanisms, so meeting reliability requirements and meeting timing constraints are partly conflicting goals.

One example is roll-back recovery, a retry mechanism that corrects the fault or error but has the major disadvantage of possibly exceeding the deadline [11]. Redundancy (extra information) is another attribute that can be used, for example during message transfer: with the extra information the receiver can determine the correct data even if part of it is corrupted.

In safety-critical systems such as life-monitoring systems, roll-forward techniques such as triple modular redundancy (TMR) are among the most popular and frequently used [4], [5]. A TMR system contains three redundant modules together with a voter at their outputs. The main drawback of TMR is that it fails when multiple errors occur in the system or when a fault occurs in the voter. For instance, if errors occur in more than one module and are not corrected, the TMR system fails. Applications that use a retry technique risk taking longer than the specified deadline, so roll-forward techniques are widely used.

Roll-forward recovery is used in a technique called ScTMR (scan-chain-based TMR), but this technique suffers from two major drawbacks. First, ScTMR cannot recover when a latent fault appears in the system. A fault is latent when it does not propagate to the output but still causes the internal states of the modules to mismatch. Second, ScTMR cannot recover if faults appear in more than one module. Scan-chain-based multiple error recovery in TMR (SMERTMR) and its enhanced version overcome errors or faults affecting more than one module in TMR.
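The masking behaviour of a TMR voter, and the way it breaks down when two modules fail, can be illustrated with a minimal sketch. The function name `majority_vote` and the 4-bit example values are illustrative assumptions, not part of the paper's design.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 majority of three module output words."""
    return (a & b) | (a & c) | (b & c)


good = 0b1011

# A single upset in one module is masked by the other two.
single_fault = majority_vote(good, good ^ 0b0100, good)

# The same upset in two modules outvotes the healthy module.
double_fault = majority_vote(good ^ 0b0100, good ^ 0b0100, good)

print(bin(single_fault), bin(double_fault))
```

The second call shows why uncorrected errors in more than one module lead to a TMR failure: the voter has no way to tell which two modules agree for the wrong reason.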


The rest of this paper is organized as follows. Section II describes the ScTMR technique, Section III presents the architecture of SMERTMR, Section IV presents the enhanced SMERTMR, Section V concludes the paper, and Section VI presents the simulated results.

# II. SC TMR TECHNIQUE

In ScTMR the faulty module is identified and corrected with the help of the scan chain. A scan chain is a cost-effective method for testing circuits. In this method the flip-flops are connected as one long shift register, and a multiplexer placed in front of these flip-flops in turn switches between the normal mode of operation and the test mode.

The major disadvantage of this system is that it cannot correct more than one fault. For instance, if one module is identified as faulty and at the same time a latent fault appears in another module, the system cannot overcome both faults and ultimately enters an unrecoverable state.
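The two modes of a scan chain can be sketched in a few lines. This is a behavioural model under stated assumptions: the class name `ScanChain`, the `scan_enable` flag, and the 4-bit state are hypothetical, and the scan-out bit is taken from the last flip-flop before each clock edge.

```python
class ScanChain:
    """Flip-flops whose inputs are muxed between functional data and the
    previous flip-flop in the chain (assumed structure)."""

    def __init__(self, length: int):
        self.ffs = [0] * length

    def clock(self, scan_enable: bool, functional_inputs=None, scan_in: int = 0) -> int:
        """Advance one clock and return the scan-out bit seen before the edge."""
        scan_out = self.ffs[-1]
        if scan_enable:
            # Test mode: the flip-flops behave as a shift register.
            self.ffs = [scan_in] + self.ffs[:-1]
        else:
            # Normal mode: every flip-flop captures its functional input.
            self.ffs = list(functional_inputs)
        return scan_out


chain = ScanChain(4)
chain.clock(scan_enable=False, functional_inputs=[1, 0, 1, 1])
observed = [chain.clock(scan_enable=True) for _ in range(4)]
print(observed)  # internal state, last flip-flop first
```

Shifting for as many clocks as there are flip-flops exposes the whole internal state at the scan output, which is what makes the chain reusable for state comparison and recovery.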



Figure 1 Sc TMR Block Diagram



Figure 2 Sc TMR State Diagram

### A. Sc TMR Voter

The voter is the key part of the ScTMR system: it identifies whether the modules are faulty and also identifies faults in the comparators. The voter, depicted in Figure 3, contains three comparators. Its operation can be explained with an example. If there is a fault in module 1, the outputs of comparators C12 and C13 will be high and that of C23 will be low. The error signals E12 and E13 are the inputs of an AND gate that produces the select signal of the mux; together these form the output selector. Table 1 is used to identify the faulty module and select the correct output.



Figure 3 Sc TMR Voter



| $E_{12}$ | $E_{13}$ | $E_{23}$ | Faulty module | Output    |
|----------|----------|-----|---------------|-----------|
| 0        | 0        | 0   | -             | Output I  |
| 0        | 0        | 1   | Comparator_23 | Output I  |
| 0        | 1        | 0   | Comparator_13 | Output I  |
| 0        | 1        | 1   | Module 3      | Output I  |
| 1        | 0        | 0   | Comparator_12 | Output I  |
| 1        | 0        | 1   | Module 2      | Output I  |
| 1        | 1        | 0   | Module 1      | Output II |
| 1        | 1        | 1   | Unrecoverable | X         |

# Table 1 Identifying Faulty Module and Selecting Correct Voter Output Using Error Signals
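Table 1 can be read as a lookup from the three error signals to a diagnosis and an output selection. The sketch below encodes the table directly; the names `TABLE_1`, `diagnose`, and `mux_select` are hypothetical, and treating a select value of 1 as "Output II" is an assumption inferred from the row for Module 1.

```python
# (E12, E13, E23) -> (faulty unit, selected output), copied from Table 1.
TABLE_1 = {
    (0, 0, 0): ("-", "Output I"),
    (0, 0, 1): ("Comparator_23", "Output I"),
    (0, 1, 0): ("Comparator_13", "Output I"),
    (0, 1, 1): ("Module 3", "Output I"),
    (1, 0, 0): ("Comparator_12", "Output I"),
    (1, 0, 1): ("Module 2", "Output I"),
    (1, 1, 0): ("Module 1", "Output II"),
    (1, 1, 1): ("Unrecoverable", "X"),
}


def diagnose(e12: int, e13: int, e23: int):
    """Return the (faulty unit, selected output) pair for one set of error signals."""
    return TABLE_1[(e12, e13, e23)]


def mux_select(e12: int, e13: int) -> int:
    """AND gate described in the text; 1 is assumed to select Output II."""
    return e12 & e13


print(diagnose(1, 1, 0), mux_select(1, 1))
```

For every recoverable row the AND of E12 and E13 agrees with the output column; the all-ones row is the exception, since no output can be trusted there.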

# **B. TRANSIENT AND PERMANENT ERROR RECOVERY MECHANISMS**

If there is a mismatch, it is detected by the voter, which activates the error signal to alert the controller. Once the error signal is activated, the system switches from the normal mode of operation to the recovery mode, in which the faulty module is recovered with the help of a fault-free module.

In the recovery mode the controller configures the multiplexers and enables the scan chains. The scan-chain input (SCI) of the faulty module is connected to the scan-chain output (SCO) of a fault-free module, and the SCI of the fault-free module is connected to its own SCO. This configuration is depicted in Figure 4.

When the system enters the recovery mode a counter is started; when the counter reaches zero the system exits the recovery mode.

The system uses two internal registers called MRFM (Most-Recent Faulty Module) and NCF (Number of Consecutive Faults).
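The recovery wiring can be sketched as two shift registers fed from the same scan-out bit. This is a behavioural illustration only: the function name `recover` is hypothetical, and starting the counter at the scan-chain length is an assumption, since the text states only that recovery ends when the counter reaches zero.

```python
def recover(good_state, faulty_state):
    """Shift the fault-free state into the faulty module, one bit per clock."""
    good, faulty = list(good_state), list(faulty_state)
    counter = len(good)  # assumption: counter starts at the chain length
    while counter > 0:
        sco = good[-1]                # SCO of the fault-free module
        good = [sco] + good[:-1]      # its SCI is fed from its own SCO
        faulty = [sco] + faulty[:-1]  # the faulty module's SCI gets the same bit
        counter -= 1
    return good, faulty


good_after, faulty_after = recover([1, 0, 1, 1], [1, 1, 1, 0])
print(good_after, faulty_after)
```

Because the fault-free module recirculates its own contents, it ends the recovery with its original state intact, while the faulty module ends with a copy of that state.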



Figure 4 Sc TMR in recovery mode.

MRFM stores the number of the module that was faulty at the most recent faulty output. If the same module is found faulty again consecutively, the NCF register is incremented by one. If NCF exceeds a predefined threshold, the module is flagged as having a permanent fault; otherwise the fault is considered transient.

When a module is flagged as having a permanent fault, the entire system is degraded to the Master/Checker configuration. Upon detection of a permanent error the Pr error signal is activated.
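The use of the MRFM and NCF registers can be sketched as a small classifier. The class name `FaultClassifier`, the threshold value, and the choice to restart NCF at 1 when a different module fails are assumptions made for illustration; the text specifies only that NCF is incremented on consecutive faults in the same module and compared against a predefined number.

```python
class FaultClassifier:
    """Tracks the most-recent faulty module (MRFM) and the number of
    consecutive faults (NCF) to tell transient from permanent faults."""

    def __init__(self, threshold: int):
        self.threshold = threshold
        self.mrfm = None
        self.ncf = 0

    def report(self, module: int) -> str:
        if module == self.mrfm:
            self.ncf += 1
        else:
            self.mrfm = module
            self.ncf = 1  # assumption: a new faulty module restarts the count
        return "permanent" if self.ncf > self.threshold else "transient"


clf = FaultClassifier(threshold=2)
history = [clf.report(m) for m in (1, 1, 1, 2)]
print(history)
```

In this sketch a "permanent" result corresponds to the point at which the Pr error signal would be raised and the system would fall back to the Master/Checker configuration.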

# III. ARCHITECTURE OF SMERTMR

Figure 1 can again be taken as the reference for the SMERTMR system. The voter generates the error signal and passes it to the controller, which performs the steps needed to recover the faulty module. The scan chains, including the SCI, SCO, and SCE signals, are controlled entirely by the controller.



Figure 5 SMERTMR State Diagram

The state diagram is depicted in Figure 5. Initially the system operates in the normal mode. If the voter detects an error, the system enters the comparison mode, in which the internal states of all the modules are compared. If no mismatch is found during this process the system returns to the normal mode; otherwise it enters the recovery mode.

When the recovery mode finishes successfully the system returns to the normal mode; if instead a permanent fault is detected, the system enters the Master/Checker mode. If a fault is then found in the master or the checker, the system enters the unrecoverable state.
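The transitions described in the two paragraphs above can be written as a small table-driven state machine. The event names (`voter_error`, `mismatch`, and so on) are hypothetical labels chosen for this sketch, and only the transitions stated in this part of the text are modelled.

```python
from enum import Enum, auto


class State(Enum):
    NORMAL = auto()
    COMPARISON = auto()
    RECOVERY = auto()
    MASTER_CHECKER = auto()
    UNRECOVERABLE = auto()


# (current state, event) -> next state; event names are illustrative.
TRANSITIONS = {
    (State.NORMAL, "voter_error"): State.COMPARISON,
    (State.COMPARISON, "no_mismatch"): State.NORMAL,
    (State.COMPARISON, "mismatch"): State.RECOVERY,
    (State.RECOVERY, "recovered"): State.NORMAL,
    (State.RECOVERY, "permanent_fault"): State.MASTER_CHECKER,
    (State.MASTER_CHECKER, "fault"): State.UNRECOVERABLE,
}


def step(state: State, event: str) -> State:
    """Events with no listed transition leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


state = State.NORMAL
for event in ("voter_error", "mismatch", "permanent_fault", "fault"):
    state = step(state, event)
print(state)
```

Keeping the transitions in a dictionary makes it easy to compare the sketch against Figure 5 edge by edge.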

# A. Comparison Process

This section describes how the system identifies faults. Three counters, MN, MO, and NO, count the flip-flop mismatches between each pair of the modules M, N, and O.

1) **When there is no fault in any module:** there are no mismatches between any of the modules, so all three counters remain zero.

2) **If there is only one faulty module:** Suppose 'A' flip-flops are faulty in module M and modules N and O are fault free. Counters MN and MO are then each incremented to 'A', whereas counter NO stays at zero. From the values of MN and MO the system identifies M as the faulty module, enters the recovery mode, and corrects the faults.

3) **In case of two faulty modules:** Assume modules M and N are faulty and module O is fault free, with A faulty flip-flops in M and B faulty flip-flops in N. Either the two modules have no faulty flip-flops in common, or one or more flip-flops are faulty in both.

If the modules have no faulty flip-flops in common, the MO counter holds A, the NO counter holds B, and the MN counter, which in general lies between zero and A+B, holds A+B. From these mismatch counts the system detects and corrects the faults; in this case the faulty modules are M and N. If some flip-flops are faulty in both modules, the system cannot identify the faults and therefore cannot correct them.

4) **When no module is fault free:** the system cannot determine which modules are faulty, so it enters the unrecoverable state.

- If counter\_mn = counter\_mo = counter\_no = 0 then
  - Next state = normal
- Else if (counter\_mn = counter\_mo) & (counter\_no = 0) then
  - Next state = recovery
  - Faulty module register = M
- Else if (counter\_mo = A) & (counter\_no = B) & (counter\_mn = A+B) then
  - Next state = recovery
  - Faulty module register = M, N
- Else
  - Next state = unrecoverable condition
- End if


# **Figure 6 Algorithm**
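The comparison step and the decision in Figure 6 can be sketched as follows. This is an interpretation under stated assumptions: the function names are hypothetical, the figure's case for modules M and N is extended by symmetry to any labelling of the three modules, and because A and B are not known to the hardware, the two-fault test is written as "one counter equals the sum of the other two".

```python
from itertools import combinations


def count_mismatches(states):
    """states maps a module name to its list of flip-flop bits."""
    return {
        frozenset((x, y)): sum(a != b for a, b in zip(states[x], states[y]))
        for x, y in combinations(states, 2)
    }


def locate_faults(states):
    """Return (next state, set of modules judged faulty)."""
    c = count_mismatches(states)
    names = list(states)
    if all(v == 0 for v in c.values()):
        return "normal", set()
    # One faulty module x: the other two agree with each other.
    for x in names:
        y, z = [n for n in names if n != x]
        if c[frozenset((y, z))] == 0 and c[frozenset((x, y))] == c[frozenset((x, z))]:
            return "recovery", {x}
    # Two faulty modules x, y with no common faulty flip-flop; z is fault free.
    for z in names:
        x, y = [n for n in names if n != z]
        cxz, cyz = c[frozenset((x, z))], c[frozenset((y, z))]
        if cxz > 0 and cyz > 0 and c[frozenset((x, y))] == cxz + cyz:
            return "recovery", {x, y}
    return "unrecoverable", set()


golden = [0, 1, 1, 0, 1, 0]
two_faults = {"M": [1, 1, 1, 0, 1, 0], "N": [0, 1, 0, 1, 1, 0], "O": golden}
print(locate_faults(two_faults))
```

When M and N share a faulty flip-flop, the counter between them no longer equals the sum of the other two, so the sketch falls through to the unrecoverable branch, in line with the limitation stated in the text.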

### **B.** Transient and Permanent Error Recovery Mechanisms

When the comparison process is over, the FLU (Fault Locator Unit) has determined which modules are faulty and which are fault free. The system then enters the recovery mode, in which it corrects up to two faulty modules using the scan chains.

The SMERTMR controller in recovery mode is depicted in Figure 7. The scan chains within the modules are enabled and the controller configures the muxes as described below.

In a fault-free module the SCO and SCI signals are connected to each other, whereas the SCI of a faulty module is connected to the SCO of a fault-free module.

In the recovery mode, each time a mismatch is identified the corresponding counter is decremented by one, the reverse of the comparison mode, in which the counter is incremented by one for each mismatch. If all counters are zero at the end of the recovery mode, the recovery has succeeded; otherwise the system enters the unrecoverable state.
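The counter check at the end of recovery can be sketched as below. This is one possible reading, stated as an assumption: the old module states are compared again while they are shifted out, each mismatch decrements the counter that the comparison mode had incremented, and any non-zero remainder means the states changed between comparison and recovery. The function names are hypothetical.

```python
def pair_mismatches(a, b):
    """Number of flip-flop positions where two module states differ."""
    return sum(x != y for x, y in zip(a, b))


def recovery_check(counters, m, n, o):
    """Decrement each pair counter once per mismatch seen during recovery."""
    remaining = dict(counters)
    for key, (a, b) in {"mn": (m, n), "mo": (m, o), "no": (n, o)}.items():
        remaining[key] -= pair_mismatches(a, b)
    return "normal" if all(v == 0 for v in remaining.values()) else "unrecoverable"


m, n, o = [1, 1, 1, 0], [0, 1, 1, 0], [0, 1, 1, 0]
counters = {"mn": pair_mismatches(m, n), "mo": pair_mismatches(m, o), "no": pair_mismatches(n, o)}

unchanged = recovery_check(counters, m, n, o)
new_upset = recovery_check(counters, m, n, [0, 1, 1, 1])  # O is hit after comparison
print(unchanged, new_upset)
```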



Figure 7 SMERTMR in recovery mode

# IV. ENHANCED SMERTMR



**Figure 8 – Enhanced Smertmr** 

Figure 8 shows that the MUX and AND gate of the SMERTMR circuit have been removed and replaced by an encoder. With this circuit the time for the input to reach the output is reduced drastically. The output times of the different techniques are compared in Table 2 in the conclusion.

# V. CONCLUSION

A roll-forward error recovery technique named SMERTMR is presented to recover from multiple errors. The system can recover from faults in more than one module, whether transient or latent. The experiments show that SMERTMR can identify and correct 100% and 99.7% of multiple faults affecting one and two modules, respectively. The performance overhead of SMERTMR is also lower than that of ScTMR. We conclude that replacing the multiplexers and AND gates with an encoder yields a faster output.



| Type             | Output time at a frequency of 117.889 MHz |
|------------------|-------------------------------------------|
| ScTMR            | 20.611 ns                                 |
| SMERTMR          | 12.140 ns                                 |
| Enhanced SMERTMR | 3.308 ns                                  |

### Table 2 - Output Performance - comparison table of different methodology
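The relative improvement implied by Table 2 can be worked out directly from the three output times. The short calculation below uses only the values in the table; the helper name `speedup` is illustrative.

```python
times_ns = {"ScTMR": 20.611, "SMERTMR": 12.140, "Enhanced SMERTMR": 3.308}


def speedup(slower: str, faster: str) -> float:
    """Ratio of output times, rounded to two decimals."""
    return round(times_ns[slower] / times_ns[faster], 2)


print(speedup("ScTMR", "SMERTMR"))             # 1.7
print(speedup("ScTMR", "Enhanced SMERTMR"))    # 6.23
print(speedup("SMERTMR", "Enhanced SMERTMR"))  # 3.67
```

On these figures the enhanced design produces its output roughly six times sooner than ScTMR and a little under four times sooner than SMERTMR.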

# VI. SIMULATED RESULTS



Sc TMR Output

(Simulation waveform image. Time axis from 100 ns to 1,100 ns; legible signal names include MR1[1:0] and MR2[1:0].)

### **Enhanced SMERTMR Output**

### REFERENCES

- [1]. Seyed Ghassem Miremadi, Senior Member, IEEE, Hossein Asadi, Member, IEEE, and Mahdi Fazeli, Student Member, IEEE, "Low-Cost Scan-Chain-Based Technique to Recover Multiple Errors in TMR Systems".
- [2]. Bartlett, J. F., Tandem Computers Inc., Cupertino, CA, "A Nonstop Operating System," Proc. of the 11<sup>th</sup> Hawaii Int'l Conf. on System Sciences, pp. 103-117, 1978.
- [3]. Losq, J., "A Highly Efficient Redundancy Scheme: Self-Purging Redundancy," IEEE Trans. Comp., C-25, pp. 569-578, 1976.
- [4]. Mathur, F. P., "Reliability Estimation Procedures and CARE: The Computer Aided Reliability Estimation Program," Jet Propulsion Laboratory Quarterly Tech. Review I, Oct. 1971.
- [5]. Mathur, F. P. and P. DeSousa, "Reliability Modeling and Analysis of General Modular Redundant Systems," IEEE Trans. Rel., R-24, No. 5, pp. 296-299, 1975.
- [6]. Webber, S. and J. Beirne, "The Stratus Architecture," Digest of Papers. Fault-Tolerant Computing: Twenty-First International Symposium, pp. 79-85, 1991.
- [7]. Lala, J. H., L. S. Alger, R. J. Gauthier, M. J. Gauthier, and M. J. Dzwonczyk, "A Fault Tolerant Processor to Meet Rigorous Failure," Proc. of IEEE/AIAA 7<sup>th</sup> Digital Avionics Systems Conf., pp. 555-562, 1986.
- [8]. Adams, S. J., "Hardware Assisted Recovery from Transient Errors in Redundant Processing Systems," FTCS 19th Digest of Papers, pp. 512-519, 1989.
- [9]. S. D'Angelo, C. Metra, and G. Sechi, "Transient and permanent fault diagnosis for FPGA-based TMR systems," in Proc. Int. Symp. Defect Fault Tolerance VLSI Syst., 1999, pp. 330–338.
- [10]. J. Yoon and H. Kim, "Time-redundant recovery policy of TMR failures using rollback and roll-forward methods," IEE Proc. - Comput. Digit. Tech., vol. 147, no. 2, pp. 124–132, Mar. 2000.
- [11]. M. Ebrahimi, S. G. Miremadi, and H. Asadi, "ScTMR: A scan chain-based error recovery technique for TMR systems in safety-critical applications," in Proc. Design Autom. Test Eur. Conf. Exhibit., 2011.
- [12]. The Leon2 Processor User Manual. (2007) [Online].
- [13]. G. Latif-Shabgahi and S. Bennett, "Adaptive majority voter: A novel voting algorithm for real-time fault-tolerant control systems," in Proc. 25th EUROMICRO Conf., vol. 2, 1999, pp. 113–120.

