
ROBUST ADAPTIVE DYNAMIC PROGRAMMING






YU JIANG

The MathWorks, Inc.


ZHONG-PING JIANG

New York University



Wiley






To my mother, Misi, and Xiaofeng
—Yu Jiang


To my family
—Zhong-Ping Jiang

About the Authors

Yu Jiang is a Software Engineer with the Control Systems Toolbox Team at The MathWorks, Inc. He received a B.Sc. degree in Applied Mathematics from Sun Yat-sen University, Guangzhou, China, an M.Sc. degree in Automation Science and Engineering from South China University of Technology, Guangzhou, China, and a Ph.D. degree in Electrical Engineering from New York University. His research interests include adaptive dynamic programming and other numerical methods in control and optimization. He was the recipient of the Shimemura Young Author Prize (with Prof. Z.P. Jiang) at the 9th Asian Control Conference in Istanbul, Turkey, in 2013.

Zhong-Ping Jiang is a Professor of Electrical and Computer Engineering at the Tandon School of Engineering, New York University. His main research interests include stability theory, robust/adaptive/distributed nonlinear control, adaptive dynamic programming, and their applications to information, mechanical, and biological systems. In these areas, he has written 3 books and 14 book chapters, and he is the author or co-author of more than 180 journal papers and numerous conference papers. According to Google Scholar, his work has received 15,800 citations, with an h-index of 63. Professor Jiang is a Deputy Co-Editor-in-Chief of the Journal of Control and Decision, a Senior Editor of the IEEE Control Systems Letters, and has served as an editor, a guest editor, and an associate editor for several journals in systems and control. Prof. Jiang is a Fellow of the IEEE and a Fellow of IFAC.

Preface and Acknowledgments

This book covers the topic of adaptive optimal control (AOC) for continuous-time systems. An adaptive optimal controller can gradually modify itself to adapt to the controlled system, with the quality of the adaptation measured by a performance index of the closed-loop system. The study of AOC can be traced back to the 1970s, when researchers at the Los Alamos Scientific Laboratory (LASL) began to investigate the use of adaptive and optimal control techniques in buildings with solar-based temperature control. Compared with conventional adaptive control, AOC has the important ability to improve energy conservation and system performance. However, even though there are various ways in AOC to compute the optimal controller, most previously known approaches are model-based, in the sense that a model with a fixed structure is assumed before the controller is designed. In addition, these approaches do not readily generalize to nonlinear models.

On the other hand, quite a few model-free, data-driven approaches for AOC have emerged in recent years. In particular, adaptive/approximate dynamic programming (ADP) is a powerful methodology that integrates the idea of reinforcement learning (RL), observed in the mammalian brain, with decision theory, so that controllers for man-made systems can learn to achieve optimal performance despite uncertainty about the environment and the lack of detailed system models. Since the 1960s, RL has been brought into the computer science and control science literature as a way to study artificial intelligence, and it has been successfully applied to many discrete-time systems, or Markov decision processes (MDPs). However, it has always been challenging to generalize those results to the controller design of physical systems. This is mainly because the state space of a physical control system is generally continuous and unbounded, and the states evolve continuously in time. Therefore, the convergence and stability properties of ADP-based approaches have to be studied carefully. The main purpose of this book is to introduce the recently developed framework, known as robust adaptive dynamic programming (RADP), for data-driven, non-model-based adaptive optimal control design for both linear and nonlinear continuous-time systems.

In addition, this book is intended to address in a systematic way the presence of dynamic uncertainty. Dynamic uncertainty exists ubiquitously in control engineering. It is primarily caused by dynamics that are part of the physical system but are either difficult to model mathematically or are ignored for the sake of controller design and system analysis. Without addressing the dynamic uncertainty, controller designs based on the simplified model will most likely fail when applied to the physical system. Most previously developed ADP and other RL methods assume that full-state information is always available, and therefore that the system order is known. Although this assumption excludes the existence of any dynamic uncertainty, it is too strong to be realistic. For a physical system on a relatively large scale, knowing the exact number of state variables can be difficult, not to mention that not all state variables can be measured precisely. For example, consider a power grid with a main generator controlled by the utility company and small distributed generators (DGs) installed by customers. The utility company should not neglect the dynamics of the DGs, but should treat them as dynamic uncertainties when controlling the grid, so that stability, performance, and power security can always be maintained as expected.

The book is organized in four parts. First, an overview of RL, ADP, and RADP is given in Chapter 1. Second, a few recently developed continuous-time ADP methods are introduced in Chapters 2, 3, and 4. Chapter 2 covers the topic of ADP for uncertain linear systems. Chapters 3 and 4 provide neural network-based and sum-of-squares (SOS)-based ADP methodologies to achieve semi-global and global stabilization, respectively, for uncertain nonlinear continuous-time systems. Third, Chapters 5 and 6 focus on RADP for linear and nonlinear systems, with dynamic uncertainties rigorously addressed. In Chapter 5, different robustification schemes are introduced to achieve RADP. Chapter 6 further extends the RADP framework to large-scale systems and illustrates its applicability to industrial power systems. Finally, Chapter 7 applies ADP and RADP to study the sensorimotor control of humans, and the results suggest that humans may use very similar approaches to learn to coordinate movements and to handle uncertainties in daily life.

This book makes a major departure from most existing texts covering the same topics by providing many practical examples, such as power systems and human sensorimotor control systems, to illustrate the effectiveness of our results. The book uses MATLAB in each chapter to conduct numerical simulations, where MATLAB serves as a computational, programming, and graphical tool. Simulink, a graphical programming environment for modeling, simulating, and analyzing multidomain dynamic systems, is used in Chapter 2. The third-party MATLAB-based software packages SOSTOOLS and CVX are used in Chapters 4 and 5 to solve SOS programs and semidefinite programs (SDPs). All MATLAB programs and the Simulink model developed in this book, as well as extensions of these programs, are available at http://yu-jiang.github.io/radpbook/
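To give a flavor of these computations, the short sketch below uses MATLAB's lqr function to solve a standard linear quadratic regulator (LQR) problem; the matrices A, B, Q, and R are hypothetical placeholders chosen for illustration, not an example from the book.

% A minimal LQR computation in MATLAB (hypothetical system).
% lqr() returns the optimal feedback gain K and the solution P of the
% algebraic Riccati equation A'*P + P*A - P*B*(R\B')*P + Q = 0,
% with K = R\(B'*P).
A = [0 1; -1 -2];          % illustrative system matrix
B = [0; 1];                % illustrative input matrix
Q = eye(2);                % state weighting matrix
R = 1;                     % control weighting
[K, P] = lqr(A, B, Q, R);  % u = -K*x minimizes the quadratic cost
disp(K); disp(P);

Model-based computations of this kind assume A and B are known; the ADP methods of Chapter 2 instead aim to recover such optimal gains from online data when the system matrices are uncertain.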

The development of this book would not have been possible without the support and help of many people. The authors wish to thank Prof. Frank Lewis and Dr. Paul Werbos, whose seminal work on adaptive/approximate dynamic programming has laid down the foundation of this book. The first-named author (YJ) would like to thank his Master's thesis adviser Prof. Jie Huang for guiding him into the area of nonlinear control, and Dr. Yebin Wang for offering him a summer research internship position at Mitsubishi Electric Research Laboratories, where parts of the ideas in Chapters 4 and 5 were originally inspired. The second-named author (ZPJ) would like to acknowledge his colleagues, especially Drs. Alessandro Astolfi, Lei Guo, Iven Mareels, and Frank Lewis, for many useful comments and constructive criticism of some of the research summarized in this book. He is grateful to his students for their boldness in entering the interesting yet still unpopular field of data-driven adaptive optimal control. The authors wish to thank the editors and editorial staff, in particular Mengchu Zhou, Mary Hatcher, Brady Chin, Suresh Srinivasan, and Divya Narayanan, for their efforts in publishing the book. We thank Tao Bian and Weinan Gao for collaboration on generalizations and applications of ADP based on the RADP framework presented in this book. Finally, we thank our families for their sacrifice in adapting to our hard-to-predict working schedules that often involve dynamic uncertainties. From our family members, we have learned the importance of exploration noise in achieving the desired trade-off between robustness and optimality. The bulk of this research was accomplished while the first-named author was working toward his Ph.D. degree in the Control and Networks Lab at New York University Tandon School of Engineering. The authors wish to acknowledge the research funding support from the National Science Foundation.

YU JIANG
Wellesley, Massachusetts

ZHONG-PING JIANG
Brooklyn, New York

Acronyms

ADP   Adaptive/approximate dynamic programming
AOC   Adaptive optimal control
ARE   Algebraic Riccati equation
DF    Divergent force field
DG    Distributed generator/generation
DP    Dynamic programming
GAS   Global asymptotic stability
HJB   Hamilton-Jacobi-Bellman (equation)
IOS   Input-to-output stability
ISS   Input-to-state stability
LQR   Linear quadratic regulator
MDP   Markov decision process
NF    Null-field
PE    Persistent excitation
PI    Policy iteration
RADP  Robust adaptive dynamic programming
RL    Reinforcement learning
SDP   Semidefinite programming
SOS   Sum-of-squares
SUO   Strong unboundedness observability
VF    Velocity-dependent force field
VI    Value iteration

Glossary

| · |  The Euclidean norm for vectors, or the induced matrix norm for matrices

‖ · ‖  For any piecewise continuous function u, ‖u‖ = sup{|u(t)|, t ⩾ 0}

⊗  Kronecker product

C¹  The set of all continuously differentiable functions

JD  The cost for the coupled large-scale system

JD  The cost for the decoupled large-scale system

𝒫  The set of all functions in C¹ that are also positive definite and radially unbounded

ℒ  Infinitesimal generator

ℝ  The set of all real numbers

ℝ₊  The set of all non-negative real numbers

The set of all polynomials in x with degree no less than d1 > 0 and no greater than d2

vec( · )  vec(A) is defined to be the mn-vector formed by stacking the columns of A on top of one another, that is, vec(A) = [a1ᵀ a2ᵀ ⋯ amᵀ]ᵀ, where ai, with i = 1, 2, …, m, are the columns of A (see the numerical sketch following this glossary)

ℤ₊  The set of all non-negative integers

The vector of all distinct monic monomials in x with degree no less than d1 > 0 and no greater than d2

∇V  The gradient of a differentiable function V
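As a numerical companion to the vec( · ) and Kronecker product entries above, the following MATLAB sketch (using arbitrarily chosen matrices) checks the stacking definition of vec( · ) and the standard identity vec(AXB) = (Bᵀ ⊗ A)vec(X), which is commonly used to rewrite matrix equations as linear equations in vec(X).

% vec(.) and the Kronecker product, illustrated with arbitrary matrices.
A = [1 2; 3 4; 5 6];              % a 3-by-2 matrix with columns a1, a2
vecA = reshape(A, [], 1);         % vec(A): the columns of A, stacked
assert(isequal(vecA, [A(:,1); A(:,2)]));

% Standard identity: vec(A*X*B) = kron(B', A) * vec(X).
X = [1 0; 2 1];                   % arbitrary 2-by-2 matrix
B = [1 1; 0 2];                   % arbitrary 2-by-2 matrix
lhs = reshape(A*X*B, [], 1);
rhs = kron(B', A) * reshape(X, [], 1);
assert(norm(lhs - rhs) < 1e-12);  % the two sides coincide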