Second Edition
This edition first published 2019
© 2019 John Wiley & Sons Ltd
Edition History
John Wiley & Sons, Ltd (1e, 2003)
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.
The right of Mark A. Levin, Ted T. Kalal and Jonathan Rodin to be identified as the authors of this work has been asserted in accordance with law.
Registered Offices
John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA
John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK
Editorial Office
The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK
For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.
Wiley also publishes its books in a variety of electronic formats and by print‐on‐demand. Some content that appears in standard print versions of this book may not be available in other formats.
Limit of Liability/Disclaimer of Warranty
While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials or promotional statements for this work. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
Library of Congress Cataloging‐in‐Publication Data
Names: Levin, Mark A., 1959- author. | Kalal, Ted T., author. | Rodin, Jonathan,
1957- author.
Title: Improving product reliability and software quality : strategies,
tools, process and implementation / Mark A. Levin, Teradyne, Inc.,
California, USA, Ted T. Kalal (Retired), Texas, USA, Jonathan Rodin,
Teradyne, Inc., California, USA.
Other titles: Improving product reliability
Description: 2nd edition. | Hoboken, NJ : John Wiley & Sons, Inc., [2019] |
Revised edition of: Improving product reliability : strategies and
implementation / Mark A. Levin and Ted T. Kalal. c2003. | Includes
bibliographical references and index. |
Identifiers: LCCN 2018061430 (print) | LCCN 2019000421 (ebook) | ISBN
9781119179412 (Adobe PDF) | ISBN 9781119179436 (ePub) | ISBN 9781119179399
(hardcover)
Subjects: LCSH: Reliability (Engineering) | Manufacturing processes--Data
processing. | Computer software--Evaluation.
Classification: LCC TS173 (ebook) | LCC TS173 .L47 2019 (print) | DDC
620/.00452--dc23
LC record available at https://lccn.loc.gov/2018061430
Cover Design: Wiley
Cover Images: (top to bottom): © teekid/Getty Images, © ez_thug/Getty Images, © AK2/Getty Images, Courtesy of Universal Robots/Teradyne Inc.
Cary and Darren Kalal
To my beautiful wife, Dana Mischel Levin, for her endless love, support, and patience, and to our sons, Spencer Nathan Levin and Andrew Dylan Levin.
To Brigid, Sam, and Molly Rodin for their support and encouragement.
Mark A. Levin is the reliability manager at Teradyne, Inc. and is based in Agoura Hills, California. He received his bachelor of science degree in Electrical Engineering (1982) from the University of Arizona, a master of science degree in Technology Management (1999) from Pepperdine University, a master of science in Reliability Engineering (2009) from the University of Maryland, and all but dissertation for a PhD in Reliability Engineering from the University of Maryland. He has more than 36 years of electronics experience spanning the aerospace, defense, consumer, and medical electronics industries. He has held several management and research positions at Hughes Aircraft Missiles Systems Group, Hughes Aircraft Microwave Products Division, General Medical Company, and Medical Data Electronics. His experience is diverse, having worked in manufacturing, design, and research and development. He has developed manufacturing and reliability design guidelines, reliability training classes, workmanship standards, quality programs, JIT manufacturing, and ESD safe work environments, and has established a surface mount production facility.
(Mark.levin@Teradyne.com)
Ted T. Kalal is a reliability engineer (now retired) who has gained much of his understanding of reliability from hands‐on experience and from many great mentors. He is a graduate of the University of Wisconsin (1981) in Business Administration after completing much preliminary study in mathematics, physics, and electronics. He has held many positions as a contract engineer and as a consultant, where he was able to focus on design, quality, and reliability tasks. He has authored several papers on electronic circuitry and holds a patent in the field of power electronics. With two partners, he started a small manufacturing company that makes high‐tech power supplies and other scientific apparatus for the bioresearch community.
Jonathan Rodin is a software engineering manager at Teradyne, Inc. A graduate of Columbia University (1981), Jon has 39 years of experience developing software, both working as a programmer and managing software development projects. His experience spans companies of many sizes, ranging from early stage startups to companies of greater than 100 000 employees. Prior to joining Teradyne, Jon held executive engineering management positions at FTP Software, NaviSite, and Percussion Software. He has led software process reengineering projects numerous times, most recently driving the effort to bring Teradyne's Semiconductor Test Division to CMMI Level 3.
Engineering systems are becoming more and more complex, with added functions and capabilities and growing complexity of the system architecture. Systems modeling, performance assessment, risk analysis and reliability prediction present increasingly challenging tasks. Continuously growing computing power relegates more and more functions to the software, placing more pressure on delivering faultless hardware‐software interaction. Rapid development of autonomous vehicles and growing attention to functional safety bring quality and reliability to the forefront of the product development cycle.
The book you are about to read presents a comprehensive and practical approach to reliability engineering as an integral part of the product design process. Various pieces of the puzzle, such as hardware reliability, physics of failure, FMEA, product validation and test planning, reliability growth, software quality, lifecycle engineering approach, supplier management and others fit nicely into a comprehensive picture of a successful reliability program.
Despite its obvious importance, quality and reliability education is paradoxically lacking in today's engineering curriculum. Few engineering schools offer degree programs or even a sufficient variety of courses in quality or reliability methods. Therefore, a majority of the quality and reliability practitioners receive their professional training from colleagues, engineering seminars, publications and technical books. The lack of formal education opportunities in this field greatly emphasizes the importance of technical publications, such as this one, for professional development.
We are confident that this book, as well as the whole series, will continue Wiley's tradition of excellence in technical publishing and provide a lasting and positive contribution to the teaching and practice of engineering.
Dr. Andre Kleyner
Editor of the Wiley Series in Quality & Reliability Engineering
There is a popular saying, “If you fail to plan, you are planning to fail.” I don't know if there is another discipline in complex product development where this is more true than designing for product reliability. When products are simple, it is possible to achieve high reliability by observing good design practices, but as products become more complex, and include thousands of components and hundreds of thousands of lines of software, a systematic approach is required.
This has played itself out inside Teradyne over the last decade through two product lines in our Semiconductor Test Division. One product line, the UltraFLEX Test System, was designed internally. Another, the ETS‐800 Test System, was designed at a company that Teradyne acquired in 2008.
The UltraFLEX platform was designed using Teradyne's internal Design for Reliability standards. The principles embodied in those standards are described by the authors. We religiously used an approved parts list of qualified components and suppliers, we analyzed the electrical stress on every circuit, and we calculated predicted reliability for every instrument and the whole system. Once the system was fielded, we tracked mean time between failures (MTBF) and executed our failure reporting, analysis, and corrective action system (FRACAS) on repeat failure modes. The result is that the UltraFLEX platform, our most complex product, has a field reliability about three times higher than prior‐generation products. What makes this more remarkable is that the UltraFLEX has the capability to test two or even four more semiconductor devices in parallel compared to prior testers.
During the development of the UltraFLEX and over the past decade, we also began to deploy, and came to rely upon, more formal methods to improve software reliability. To be frank, our organizational maturity in software reliability lagged behind our hardware best practices. But through the application of tools like defect models, and especially by tracking the reliability of deployed software through automated quality monitors, we were able to improve both the quality of the deployed product and our development methods. A key tool we use to evaluate software reliability is a metric we call clean sessions. A clean session is a session where an operator starts up the tester, loads a program, executes a task like developing tests, debugging, or just testing devices, finishes the task, and then unloads the program, without encountering any anomalous behavior. When we started tracking this metric at the launch of the UltraFLEX, only about half of the sessions were clean. It took us nearly five years to get to 95% clean sessions, and this has set a benchmark that our competitors struggle to reach. Through the learning achieved in this long struggle, we have been able to achieve 95% clean sessions within three months of the release of our next‐generation product.
The ETS‐800 is the next generation version of the successful tester for mixed signal and power devices. When Teradyne acquired the business in 2008, there was no formal reliability program in place, but their products were well regarded in the marketplace and reasonably reliable. The ETS‐800 was a big step up in terms of capability from the prior generation. The instruments were two to four times as dense, and the system could support almost twice as many instruments. Further, the tester included a promising new feature that would greatly simplify customer test programs by providing the switching needed to share tester resources between different device pins.
From a functional and performance perspective, the ETS‐800 was a fantastic success. A single ETS‐800 could replace up to eight prior‐generation testers. But we found out the hard way that the informal approach to reliability that worked for simple products did not work for more complex ones. When we initially fielded the ETS‐800, it was not a reliable tester. The weak link in the design was the inclusion of thousands of mechanical relays. These relays provided superior electrical performance, but are challenging to use from a reliability perspective. Mechanical relays are highly reliable as long as they are not hot switched, that is, switched while current is flowing through the contacts. A hot‐switching event causes an arc across the contact surface, which rapidly degrades the surface and shortens the life of the relay. If the relay circuits had been designed for reliability, hot switching could have been avoided. The ETS‐800 reliability was an order of magnitude below that of the much more complex UltraFLEX platform, and this put a blemish on the reputation we had worked hard to develop for delivering highly reliable products.
We worked for a long time to improve the robustness of the relays and reduce the occurrence of hot switching, without making much progress. Ultimately we decided to redesign all of the instrumentation using guidelines from the Teradyne reliability system. We are just beginning the deployment of the redesigned instruments, but in side‐by‐side testing, they are demonstrating about 100 times higher reliability than the ones that they replace. It was a hard but effective lesson that a systematic approach to hardware reliability and software quality, as the authors have described, is the best way to achieve both high customer satisfaction and good profits.
Gregory S. Smith
President, Semiconductor Test Division
Teradyne, Inc.
Modern engineering products, from individual components to large systems, must be designed and manufactured to be reliable. The manufacturing processes must be performed correctly and with the minimum of variation. All of these aspects impact upon the costs of design, development, manufacture, and use, or, as they are often called, the product's life cycle costs. The challenge of modern competitive engineering is to ensure that life cycle costs are minimized whilst achieving requirements for performance and time to market. If the market for the product is competitive, improved quality and reliability can generate very strong competitive advantages. We have seen the results of this in the way that many products, particularly Japanese cars, machine tools, earthmoving equipment, electronic components, and consumer electronic products have won dominant positions in world markets in the last 30–40 years. Their success has been largely the result of the teaching of the late W. E. Deming, who taught the fundamental connections between quality, productivity, and competitiveness. Today this message is well understood by nearly all the engineering companies that face the new competition, and those that do not understand lose position or fail.
The customers for major systems, particularly the US military, drove the quality and reliability methods that were developed in the West. They reacted to a perceived low achievement by imposing standards and procedures, whilst their suppliers saw little motivation to improve, since they were paid for spares and repairs. The methods included formal systems for quality and reliability management (MIL‐Q‐9858 and MIL‐STD‐785) and methods for predicting and measuring reliability (MIL‐STD‐721, MIL‐HDBK‐217, MIL‐STD‐781). MIL‐Q‐9858 was the model for the international standard on quality systems (ISO 9000); the methods for quantifying reliability have been similarly developed and applied to other types of products and have been incorporated into other standards such as IEC 60300. These approaches have not proved to be effective and their application has been controversial.
By contrast, the Japanese quality movement was led by an industry that learned how quality provided the key to greatly increased productivity and competitiveness, principally in commercial and consumer markets. The methods that they applied were based on an understanding of the causes of variation and failures, and continuous improvements through the application of process controls and the motivation and management of people at work. It is one of history's ironies that the foremost teachers of these ideas were Americans, notably P. Drucker, W.A. Shewhart, W.E. Deming, and J.M. Juran.
These two streams of development epitomize the difference between the deductive mentality applied by the Japanese to industry in general, and to engineering in particular, in contrast to the more inductive approach that is typically applied in the West. The deductive approach seeks to generate continuous improvements across a broad front and new ideas are subjected to careful evaluation. The inductive approach leads to inventions and “break‐throughs,” and to greater reliance on “systems” for control of people and processes. The deductive approach allows a clearer view, particularly in discriminating between sense and nonsense. However, it is not as conducive to the development of radical new ideas. Obviously these traits are not exclusive, and most engineering work involves elements of both. However, the overall tendency of Japanese thinking shows in their enthusiasm and success in industrial teamwork and in the way that they have adopted the philosophies of western teachers such as Drucker and Deming, whilst their western competitors have found it more difficult to break away from the mold of “scientific” management, with its reliance on systems and more rigid organizations and procedures.
Unfortunately, the development of quality and reliability engineering has been afflicted with more nonsense than any other branch of engineering. This has been the result of the development of methods and systems for analysis and control that contravene the deductive logic that quality and reliability are achieved by knowledge, attention to detail, and continuous improvement on the part of the people involved. Therefore, it can be difficult for students, teachers, engineers, and managers to discriminate effectively, and many have been led down wrong paths.
In this series we will attempt to provide a balanced and practical source covering all aspects of quality and reliability engineering and management, related to present and future conditions, and to the range of new scientific and engineering developments that will shape future products. The goal of this series is to present practical, cost‐efficient and effective quality and reliability engineering methods and systems.
I hope that the series will make a positive contribution to the teaching and the practice of engineering.
Patrick D.T. O'Connor
February 2003