Dr. Andre Kleyner
Series Editor
The Wiley Series in Quality and Reliability Engineering aims to provide a solid educational foundation for both practitioners and researchers in the Q&R field and to expand the reader's knowledge base to include the latest developments in this field. The series will provide a lasting and positive contribution to the teaching and practice of engineering.
The series coverage will contain, but is not exclusive to,
Wiley Series in Quality and Reliability Engineering
Design for Safety
by Louis J. Gullo, Jack Dixon
February 2018
Next Generation HALT and HASS: Robust Design of Electronics and Systems
by Kirk A. Gray, John J. Paschkewitz
May 2016
Reliability and Risk Models: Setting Reliability Requirements, 2nd Edition
by Michael Todinov
September 2015
Applied Reliability Engineering and Risk Analysis: Probabilistic Models and Statistical Inference
by Ilia B. Frenkel, Alex Karagrigoriou, Anatoly Lisnianski, Andre V. Kleyner
September 2013
Design for Reliability
by Dev G. Raheja (Editor), Louis J. Gullo (Editor)
July 2012
Effective FMEAs: Achieving Safe, Reliable, and Economical Products and Processes using Failure Mode and Effects Analysis
by Carl Carlson
April 2012
Failure Analysis: A Practical Guide for Manufacturers of Electronic Components and Systems
by Marius Bazu, Titu Bajenescu
April 2011
Reliability Technology: Principles and Practice of Failure Prevention in Electronic Systems
by Norman Pascoe
April 2011
Improving Product Reliability: Strategies and Implementation
by Mark A. Levin, Ted T. Kalal
March 2003
Test Engineering: A Concise Guide to Cost‐Effective Design, Development and Manufacture
by Patrick O'Connor
April 2001
Integrated Circuit Failure Analysis: A Guide to Preparation Techniques
by Friedrich Beck
January 1998
Measurement and Calibration Requirements for Quality Assurance to ISO 9000
by Alan S. Morris
October 1997
Electronic Component Reliability: Fundamentals, Modeling, Evaluation, and Assurance
by Finn Jensen
November 1995
This edition first published 2019
© 2019 John Wiley & Sons Ltd
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.
The right of Liudong Xing, Gregory Levitin and Chaonan Wang to be identified as the authors of this work has been asserted in accordance with law.
Registered Offices
John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA
John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK
Editorial Office
The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK
For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.
Wiley also publishes its books in a variety of electronic formats and by print‐on‐demand. Some content that appears in standard print versions of this book may not be available in other formats.
Limit of Liability/Disclaimer of Warranty
While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials or promotional statements for this work. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
Library of Congress Cataloging‐in‐Publication Data applied for
Hardback: 9781119507635
Cover design: Wiley
Cover image: © sakkmesterke/Shutterstock
“Dynamic System Reliability: Modeling and Analysis of Dynamic and Dependent Behaviors”
by Xing, Levitin and Wang
The importance of quality and reliability to a system can hardly be disputed. Product failures in the field inevitably lead to losses in the form of repair cost, warranty claims, customer dissatisfaction, product recalls, loss of sales, and in extreme cases, loss of life.
Engineering systems are becoming more and more complex with added functions and capabilities. Modeling such complex systems, assessing their performance, analyzing risk, and predicting reliability present an increasingly challenging task. Functional dependency, fault detection and coverage, common cause failures, redundancies, standby modes, and other interactions among system components further complicate the modeling process, requiring new methods and approaches to address dynamic system reliability.
This book has been written by the leading experts in the field of dynamic reliability and multi‐state systems. It discusses many technical aspects of modeling the reliability of complex systems when the reliabilities of their components change with time due to various types of interactions and state changes.
This book will be a great addition to the Wiley Series in Quality and Reliability Engineering, which aims to provide a solid educational foundation for researchers and practitioners in the field of quality and reliability engineering and to expand the knowledge base by including the latest developments in these disciplines.
Despite its obvious importance, quality and reliability education is paradoxically lacking in today's engineering curriculum. Few engineering schools offer degree programs or even a sufficient variety of courses in quality or reliability methods. Therefore, the majority of quality and reliability practitioners receive their professional training from colleagues, professional seminars, publications and technical books. The lack of formal education opportunities in this field greatly emphasizes the importance of technical publications for professional development.
We hope that this book, as well as the whole series, will continue Wiley's tradition of excellence in technical publishing and provide a lasting and positive contribution to the teaching and practice of engineering.
Dynamic behavior and dependence are typical characteristics of modern engineering and computing systems and products. Specifically, system load, stress levels, redundancy levels, and other operating environment parameters can change with time, causing dynamics in the failure behavior of system components and in the reliability requirements of the entire system. In addition, system components may have significant dependencies or correlations in time or function during the mission process. Modeling the effects of these dynamic and dependent behaviors is crucial for accurate system reliability modeling and analysis, and for subsequent design optimization and maintenance activities.
Traditional system reliability models can define only the static logical structure of a system, but not the dynamic and dependent behaviors of the system and its components. Thus, reliability analysis results obtained using the traditional reliability models often deviate from the actual system reliability performance significantly, misleading system design, operation, and maintenance efforts. Therefore, the traditional reliability theory must be extended and enhanced for addressing the dependent and dynamic behaviors. This book presents recent developments of such extensions involving dynamic system reliability modeling theory, reliability evaluation methods, and case studies based on real‐world examples.
The topic of this book, dynamic system reliability, has gained increasing attention in the reliability and safety community in the past few decades. Research articles on this subject are continuously being published in peer‐reviewed journals and conference proceedings. However, to the best of the authors' knowledge, the subject has never been adequately or systematically covered in any reliability book. Therefore, there is a great need for a book covering recent developments in dynamic system reliability modeling and analysis techniques. With an increased and sustained interest in this subject, it is the right time to publish this book.
This book particularly focuses on hot issues of dynamic system reliability, systematically introducing the reliability modeling and analysis methods for systems with imperfect fault coverage, systems with functional dependence, systems subject to deterministic or probabilistic common‐cause failures, systems subject to deterministic or probabilistic competing failures, and dynamic standby sparing systems.
In the Introduction, the book describes the evolution from the traditional static reliability theory to the dynamic system reliability theory, and provides an overview description of dynamic and dependent behaviors addressed in the subsequent chapters of the book.
In Chapter 2, the book reviews basic probability and reliability concepts, various reliability measures, different types of fault trees, fundamentals of binary decision diagrams (a combinatorial model for system reliability analysis), and Markov processes. Some reliability analysis software tools are also introduced.
Chapter 3 introduces an inherent behavior of fault‐tolerant systems called imperfect fault coverage. Just like any system component, the recovery mechanism of a system is rarely perfect; it can fail such that the system cannot adequately detect, locate, isolate, or recover from a fault occurring in the system. The uncovered component fault may propagate through the system, causing extensive damage to the system. Reliability models and evaluation methods for addressing imperfect fault coverage in binary‐state systems, multi‐state systems, and phased‐mission systems are discussed in this chapter.
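As a rough illustration of why imperfect fault coverage matters (a sketch under simplifying assumptions, not a model taken from the book), consider n identical, independent components in parallel. Each component works with probability p; a failed component is covered with probability c, and any uncovered failure is assumed to bring down the whole system. The function name and its parameters are hypothetical.

```python
def parallel_reliability_with_coverage(p: float, c: float, n: int = 2) -> float:
    """Reliability of n parallel components with per-fault coverage c.

    Assumptions (illustrative only): components are independent and
    identical, and a single uncovered failure fails the entire system.
    """
    q = 1.0 - p              # probability that a component fails
    covered = q * c          # component fails, and the fault is covered
    # The system survives when no failure is uncovered (every component is
    # either working or covered-failed) and at least one component works.
    return (p + covered) ** n - covered ** n


if __name__ == "__main__":
    for coverage in (1.0, 0.9, 0.0):
        r = parallel_reliability_with_coverage(0.9, coverage)
        print(f"c = {coverage}: R = {r:.4f}")
```

With p = 0.9 and two components, perfect coverage (c = 1) gives 0.99, whereas c = 0 gives 0.81, which is lower than the 0.9 of a single component. Under these assumptions, redundancy without adequate coverage can reduce reliability.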
Chapter 4 discusses an extension of the traditional imperfect fault coverage concept to the modular imperfect fault coverage for systems with hierarchical structures. Due to the layered recovery of hierarchical systems, the extent of the damage from an uncovered component fault may exhibit multiple levels. This chapter introduces the modeling of such a modular imperfect fault coverage behavior as well as methods for considering the behavior in the reliability analysis of nonrepairable and repairable hierarchical systems.
Chapter 5 focuses on the functional dependence (FDEP) behavior of complex systems, where the failure of one component (or in general the occurrence of a certain trigger event) causes other components (referred to as dependent components) within the same system to become unusable or inaccessible. The OR‐gate replacement method is discussed for systems with perfect fault coverage. The combinatorial algorithm is discussed for systems with imperfect fault coverage. Case studies involving combined trigger events, cascading effects, dual‐role events, and shared dependent events are also presented in this chapter.
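The idea behind replacing an FDEP relation with OR gates can be sketched on a small hypothetical example (not one of the book's case studies): a trigger T and two dependent components A and B, where the system fails only when both A and B are unusable. Assuming perfect fault coverage and independent failure events, each dependent component is treated as unusable if it fails itself OR the trigger occurs. The brute-force enumeration below is for illustration only; the function name and the probabilities are assumptions.

```python
from itertools import product


def system_unreliability(q_trigger: float, q_a: float, q_b: float) -> float:
    """Failure probability of a two-component parallel system under FDEP.

    Each dependent component is replaced by an OR of its own failure and
    the trigger event (perfect coverage and independence are assumed).
    """
    total = 0.0
    # Enumerate every combination of (trigger, A, B) failed / not failed.
    for t_fail, a_fail, b_fail in product([False, True], repeat=3):
        prob = 1.0
        for failed, q in ((t_fail, q_trigger), (a_fail, q_a), (b_fail, q_b)):
            prob *= q if failed else 1.0 - q
        a_unusable = a_fail or t_fail   # OR gate replacing the FDEP on A
        b_unusable = b_fail or t_fail   # OR gate replacing the FDEP on B
        if a_unusable and b_unusable:   # system fails when both are unusable
            total += prob
    return total


if __name__ == "__main__":
    print(system_unreliability(0.1, 0.2, 0.3))
```

For this small structure the result agrees with the closed form q_T + (1 - q_T) q_A q_B, since (A or T) and (B or T) simplifies to T or (A and B).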
Chapter 6 focuses on the reliability modeling of traditional deterministic common‐cause failures, where the occurrence of a root cause results in deterministic failures of multiple system components simultaneously or in a short time interval. Methods based on Decomposition and Aggregation, Decision Diagrams, and Universal Generating Functions are discussed.
Chapter 7 discusses the extension of the traditional common‐cause failures to the probabilistic common‐cause failures, where the occurrence of a root cause results in failures of multiple system components with different probabilities. Both explicit and implicit methods are discussed for single‐phase and multi‐phase systems.
Chapter 8 presents the deterministic competing failure behavior in systems with the FDEP. This behavior is concerned with competitions in the time domain between the failure isolation and failure propagation effects, causing distinct system statuses. Reliability modeling of the deterministic competing effects is discussed for different types of systems, including single‐phase systems with a single FDEP group, single‐phase systems with multiple FDEP groups, single‐phase systems with both global and selective effects, multi‐phase systems with a single FDEP group, and multi‐phase systems with multiple FDEP groups.
Chapter 9 focuses on probabilistic competing failures, which extend the deterministic competing failure behavior by considering probabilistic or uncertain failure isolation effects (commonly found in systems involving relayed wireless communications). Systems with a single type of local component failures, multiple different types of local component failures, and random propagation times are modeled and illustrated with real‐world examples from wireless sensor networks, body sensor systems, and smart homes.
Chapter 10 presents diverse methods for the reliability analysis of standby sparing systems, including the traditional Markov‐based method, the decision diagram‐based method, the approximation method based on the central limit theorem, and the recently developed event transition method.
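To give a feel for why the standby mode matters, the sketch below compares a cold spare with a hot spare for the simplest textbook case: one primary unit and one spare, identical exponential lifetimes with rate lam, perfect failure detection and switching, and a cold spare that cannot fail before activation. These assumptions, and the function names, are illustrative rather than taken from the book; the cold standby formula is the standard result for the corresponding Markov chain.

```python
import math


def cold_standby_reliability(lam: float, t: float) -> float:
    """One primary plus one cold spare; the spare starts aging at switchover."""
    # The system lifetime is the sum of two exponential lifetimes (Erlang-2).
    return math.exp(-lam * t) * (1.0 + lam * t)


def hot_standby_reliability(lam: float, t: float) -> float:
    """One primary plus one hot spare; both units age from time zero."""
    # Equivalent to two active units in parallel.
    return 1.0 - (1.0 - math.exp(-lam * t)) ** 2


if __name__ == "__main__":
    lam, t = 1.0, 1.0
    print(f"cold spare: {cold_standby_reliability(lam, t):.4f}")
    print(f"hot spare:  {hot_standby_reliability(lam, t):.4f}")
```

Under these idealized assumptions the cold spare yields the higher reliability for any t > 0, because it does not degrade while waiting. Practical models must also account for imperfect switching and activation delays.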
The book has the following distinct features:
The target audience of the book is undergraduate and graduate students, engineers, and researchers in reliability and related disciplines. Readers should have a background in basic probability theory and stochastic processes. However, the book includes a chapter reviewing the fundamentals that readers need in order to understand the other chapters, which cover advanced topics in reliability theory and case studies. The book can provide readers with knowledge and insights on complex system reliability behaviors, as well as skills for modeling and analyzing these behaviors to guide the reliability design of real‐world systems.
We would like to extend our sincere gratitude and appreciation to the researchers who have developed some of the underlying concepts and models of this book, or have co‐authored with us on some subjects of the book. To name a few: Professor Joanne Bechta Dugan and Professor Barry W. Johnson from the University of Virginia, Professor Kishor S. Trivedi from Duke University, Dr. Suprasad V. Amari from BAE Systems, USA, Dr. Akhilesh Shrestha from Autoliv Inc., USA, Dr. Ola Tannous from Illinois Institute of Technology, USA, Dr. Prashanthi Boddu from Global Prior Art Inc., USA, Dr. Yujie Wang from the University of Electronic Science and Technology of China, Ms. Guilin Zhao from the University of Massachusetts Dartmouth, USA, Professor Yuchang Mo from Huaqiao University, China, and Professor Rui Peng from the University of Science and Technology Beijing, China. There are many other researchers to mention. We have tried to recognize their contributions in the bibliographical references of this book.
Finally, it has been our great pleasure to work with the editorial staff from Wiley, who have assisted in the publication of this book. Their efforts and support are greatly appreciated.
June 8, 2018
Liudong Xing
Gregory Levitin
Chaonan Wang
ACP | Application Communication Phase |
BDD | Binary Decision Diagram |
BEM | BDD Expansion Method |
BSN | Body Sensor Network |
CC | Common Cause |
CCE | Common‐Cause Event |
CCF | Common‐Cause Failure |
CCG | Common‐Cause Group |
cdf | cumulative distribution function |
CLT | Central Limit Theorem |
CM | Computing Module |
CPR | Combinatorial Phase Requirement |
CPUC | CPU Chip |
CSP | Cold SPare |
CTE | Combined Trigger Event |
CTMC | Continuous Time Markov Chain |
DC | Dependent Component |
DD | Decision Diagram |
DFT | Dynamic Fault Tree |
EDA | Efficient Decomposition and Aggregation |
ELC | Element Level Coverage |
EMB | External Memory Block |
FCE | Failure Competition Event |
FDEP | Functional DEPendence |
FDG | Functional Dependence Group |
FLC | Fault Level Coverage |
FT | Fault Tree |
FTS | Fault Tolerant System |
HS | Hierarchical System |
HSP | Hot SPare |
IC | Interface Chip |
ICP | Infrastructure Communication Phase |
IFG | Isolation Factor Group |
i.i.d. | independent and identically distributed |
IoT | Internet of Things |
IPC | ImPerfect Coverage |
IPCM | IPC Model |
ite | if‐then‐else |
ITE | Independent Trigger Event |
LF | Local Failure |
MC | Memory Chip |
MFT | Multi‐state Fault Tree |
MIPCM | Modular IPCM |
MIU | Memory Interface Unit |
MM | Memory Module |
MMDD | Multi‐state Multi‐valued Decision Diagram |
MRL | Mean Residual Life |
MSS | Multi‐State System |
MTBF | Mean Time Between Failures |
MTTF | Mean Time To Failure |
MTTR | Mean Time To Repair |
NDC | NonDependent Component |
OBDD | Ordered BDD |
PAND | Priority AND |
PCCE | Probabilistic Common‐Cause Event |
PCCF | Probabilistic Common‐Cause Failure |
PCCG | Probabilistic Common‐Cause Group |
PDC | Performance Dependent Coverage |
PDEP | Probabilistic‐DEPendent |
pdf | probability density function |
PDO | Phase Dependent Operation |
PF | Propagated Failure |
PFD | Probabilistic Functional Dependence |
PFDC | Probabilistic Functional Dependence Case |
PFGE | Propagated Failure with Global Effect |
PFSE | Propagated Failure with Selective Effect |
pmf | probability mass function |
PMS | Phased‐Mission System |
PTC | PorT Chip |
RAP | Redundancy Allocation Problem |
ROBDD | Reduced OBDD |
r.v. | random variable |
SBDD | Sequential BDD |
SEA | Simple and Efficient Algorithm |
SEQ | SEQuence enforcing |
SESP | Standby Element Sequencing Problem |
SFT | Static Fault Tree |
ttf | time to failure |
UF | Uncovered Failure |
u‐function | universal generating function |
WSN | Wireless Sensor Network |
WSP | Warm SPare |