Details

Reinforcement Learning and Stochastic Optimization

A Unified Framework for Sequential Decisions
1st edition

By: Warren B. Powell

122,99 €

Publisher: Wiley
Format: PDF
Published: 14.03.2022
ISBN/EAN: 9781119815044
Language: English
Number of pages: 1136

DRM-protected eBook; you need, for example, Adobe Digital Editions and an Adobe ID to read it.

Description

Clearing the jungle of stochastic optimization

Sequential decision problems, which follow the pattern “decision, information, decision, information, …,” are ubiquitous, spanning virtually every human activity, from business applications and health (personal and public health, and medical decision making) to energy, the sciences, all fields of engineering, finance, and e-commerce. The diversity of applications has attracted the attention of at least 15 distinct fields of research, using eight distinct notational systems, which have produced a vast array of analytical tools. A byproduct is that powerful tools developed in one community may be unknown to other communities.

Reinforcement Learning and Stochastic Optimization offers a single canonical framework that can model any sequential decision problem using five core components: state variables, decision variables, exogenous information variables, the transition function, and the objective function. The book highlights twelve types of uncertainty that might enter any model and pulls together the diverse set of methods for making decisions, known as policies, into four fundamental classes that span every method suggested in the academic literature or used in practice.

Reinforcement Learning and Stochastic Optimization is the first book to provide a balanced treatment of the different methods for modeling and solving sequential decision problems, following the style used by most books on machine learning, optimization, and simulation. The presentation is designed for readers with a course in probability and statistics and an interest in modeling and applications. Linear programming is occasionally used for specific problem classes. The book is designed for readers who are new to the field, as well as those with some background in optimization under uncertainty.

Throughout the book, readers will find references to over 100 different applications, spanning pure learning problems, dynamic resource allocation problems, general state-dependent problems, and hybrid learning/resource allocation problems such as those that arose during the COVID pandemic. There are 370 exercises, organized into seven groups: review questions, modeling, computation, problem solving, theory, programming exercises, and a “diary problem” that the reader chooses at the beginning of the book and that serves as the basis for questions throughout the rest of the book.
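The five core components described above map naturally onto a simulation loop. The following Python sketch is purely illustrative and is not taken from the book: the toy inventory problem, the order-up-to policy, and every function and parameter name (exogenous_demand, transition, contribution, theta, and so on) are hypothetical choices made here to show how state variables, decision variables, exogenous information, a transition function, and an objective function fit together.

```python
import random

# Hypothetical toy example of the five core components on an inventory problem.
# Nothing here is taken from the book; names and numbers are invented for this sketch.

def exogenous_demand(rng):
    """Exogenous information W_{t+1}: demand revealed after the decision is made."""
    return rng.randint(0, 10)

def transition(state, decision, demand):
    """Transition function S_{t+1} = S^M(S_t, x_t, W_{t+1}): update the inventory level."""
    return {"inventory": max(0, state["inventory"] + decision - demand)}

def contribution(state, decision, demand, price=5.0, cost=3.0):
    """One-period contribution C(S_t, x_t, W_{t+1}): sales revenue minus ordering cost."""
    sales = min(state["inventory"] + decision, demand)
    return price * sales - cost * decision

def order_up_to_policy(state, theta=12):
    """A simple parameterized policy X^pi(S_t | theta): order up to a target level theta."""
    return max(0, theta - state["inventory"])

def simulate(theta, horizon=50, seed=0):
    """Objective function: the cumulative contribution along one sample path."""
    rng = random.Random(seed)
    state = {"inventory": 0}                            # state variable S_0
    total = 0.0
    for _ in range(horizon):
        decision = order_up_to_policy(state, theta)     # decision variable x_t
        demand = exogenous_demand(rng)                  # exogenous information W_{t+1}
        total += contribution(state, decision, demand)
        state = transition(state, decision, demand)     # transition to S_{t+1}
    return total

if __name__ == "__main__":
    # Crude tuning of theta by averaging a few simulated sample paths.
    best = max(range(5, 20),
               key=lambda th: sum(simulate(th, seed=s) for s in range(20)))
    print("best order-up-to level:", best)
```

Tuning theta by averaging simulated sample paths is a toy instance of policy search over a parameterized policy; the book's actual treatment of policy classes and policy search appears in Chapters 11 and 12 of the table of contents below.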
Table of Contents

Preface xxv
Acknowledgments xxxi

Part I – Introduction 1

1 Sequential Decision Problems 3
1.1 The Audience 7
1.2 The Communities of Sequential Decision Problems 8
1.3 Our Universal Modeling Framework 10
1.4 Designing Policies for Sequential Decision Problems 15
1.5 Learning 20
1.6 Themes 21
1.7 Our Modeling Approach 27
1.8 How to Read this Book 27
1.9 Bibliographic Notes 33
Exercises 34
Bibliography 38

2 Canonical Problems and Applications 39
2.1 Canonical Problems 39
2.2 A Universal Modeling Framework for Sequential Decision Problems 64
2.3 Applications 69
2.4 Bibliographic Notes 85
Exercises 90
Bibliography 93

3 Online Learning 101
3.1 Machine Learning for Sequential Decisions 102
3.2 Adaptive Learning Using Exponential Smoothing 110
3.3 Lookup Tables with Frequentist Updating 111
3.4 Lookup Tables with Bayesian Updating 112
3.5 Computing Bias and Variance* 118
3.6 Lookup Tables and Aggregation* 121
3.7 Linear Parametric Models 131
3.8 Recursive Least Squares for Linear Models 136
3.9 Nonlinear Parametric Models 140
3.10 Nonparametric Models* 149
3.11 Nonstationary Learning* 159
3.12 The Curse of Dimensionality 162
3.13 Designing Approximation Architectures in Adaptive Learning 165
3.14 Why Does It Work?** 166
3.15 Bibliographic Notes 174
Exercises 176
Bibliography 180

4 Introduction to Stochastic Search 183
4.1 Illustrations of the Basic Stochastic Optimization Problem 185
4.2 Deterministic Methods 188
4.3 Sampled Models 193
4.4 Adaptive Learning Algorithms 202
4.5 Closing Remarks 210
4.6 Bibliographic Notes 210
Exercises 212
Bibliography 218

Part II – Stochastic Search 221

5 Derivative-Based Stochastic Search 223
5.1 Some Sample Applications 225
5.2 Modeling Uncertainty 228
5.3 Stochastic Gradient Methods 231
5.4 Styles of Gradients 237
5.5 Parameter Optimization for Neural Networks* 242
5.6 Stochastic Gradient Algorithm as a Sequential Decision Problem 247
5.7 Empirical Issues 248
5.8 Transient Problems* 249
5.9 Theoretical Performance* 250
5.10 Why Does it Work? 250
5.11 Bibliographic Notes 263
Exercises 264
Bibliography 270

6 Stepsize Policies 273
6.1 Deterministic Stepsize Policies 276
6.2 Adaptive Stepsize Policies 282
6.3 Optimal Stepsize Policies* 289
6.4 Optimal Stepsizes for Approximate Value Iteration* 297
6.5 Convergence 300
6.6 Guidelines for Choosing Stepsize Policies 301
6.7 Why Does it Work?* 303
6.8 Bibliographic Notes 306
Exercises 307
Bibliography 314

7 Derivative-Free Stochastic Search 317
7.1 Overview of Derivative-free Stochastic Search 319
7.2 Modeling Derivative-free Stochastic Search 325
7.3 Designing Policies 330
7.4 Policy Function Approximations 333
7.5 Cost Function Approximations 335
7.6 VFA-based Policies 338
7.7 Direct Lookahead Policies 348
7.8 The Knowledge Gradient (Continued)* 362
7.9 Learning in Batches 380
7.10 Simulation Optimization* 382
7.11 Evaluating Policies 385
7.12 Designing Policies 394
7.13 Extensions* 398
7.14 Bibliographic Notes 409
Exercises 412
Bibliography 424

Part III – State-dependent Problems 429

8 State-dependent Problems 431
8.1 Graph Problems 433
8.2 Inventory Problems 439
8.3 Complex Resource Allocation Problems 446
8.4 State-dependent Learning Problems 456
8.5 A Sequence of Problem Classes 460
8.6 Bibliographic Notes 461
Exercises 462
Bibliography 466

9 Modeling Sequential Decision Problems 467
9.1 A Simple Modeling Illustration 471
9.2 Notational Style 476
9.3 Modeling Time 478
9.4 The States of Our System 481
9.5 Modeling Decisions 500
9.6 The Exogenous Information Process 506
9.7 The Transition Function 515
9.8 The Objective Function 518
9.9 Illustration: An Energy Storage Model 523
9.10 Base Models and Lookahead Models 528
9.11 A Classification of Problems* 529
9.12 Policy Evaluation* 532
9.13 Advanced Probabilistic Modeling Concepts** 534
9.14 Looking Forward 540
9.15 Bibliographic Notes 542
Exercises 544
Bibliography 557

10 Uncertainty Modeling 559
10.1 Sources of Uncertainty 560
10.2 A Modeling Case Study: The COVID Pandemic 575
10.3 Stochastic Modeling 575
10.4 Monte Carlo Simulation 581
10.5 Case Study: Modeling Electricity Prices 589
10.6 Sampling vs. Sampled Models 595
10.7 Closing Notes 597
10.8 Bibliographic Notes 597
Exercises 598
Bibliography 601

11 Designing Policies 603
11.1 From Optimization to Machine Learning to Sequential Decision Problems 605
11.2 The Classes of Policies 606
11.3 Policy Function Approximations 610
11.4 Cost Function Approximations 613
11.5 Value Function Approximations 614
11.6 Direct Lookahead Approximations 616
11.7 Hybrid Strategies 620
11.8 Randomized Policies 626
11.9 Illustration: An Energy Storage Model Revisited 627
11.10 Choosing the Policy Class 631
11.11 Policy Evaluation 641
11.12 Parameter Tuning 642
11.13 Bibliographic Notes 646
Exercises 646
Bibliography 651

Part IV – Policy Search 653

12 Policy Function Approximations and Policy Search 655
12.1 Policy Search as a Sequential Decision Problem 657
12.2 Classes of Policy Function Approximations 658
12.3 Problem Characteristics 665
12.4 Flavors of Policy Search 666
12.5 Policy Search with Numerical Derivatives 669
12.6 Derivative-Free Methods for Policy Search 670
12.7 Exact Derivatives for Continuous Sequential Problems* 677
12.8 Exact Derivatives for Discrete Dynamic Programs** 680
12.9 Supervised Learning 686
12.10 Why Does it Work? 687
12.11 Bibliographic Notes 690
Exercises 691
Bibliography 698

13 Cost Function Approximations 701
13.1 General Formulation for Parametric CFA 703
13.2 Objective-Modified CFAs 704
13.3 Constraint-Modified CFAs 714
13.4 Bibliographic Notes 725
Exercises 726
Bibliography 729

Part V – Lookahead Policies 731

14 Exact Dynamic Programming 737
14.1 Discrete Dynamic Programming 738
14.2 The Optimality Equations 740
14.3 Finite Horizon Problems 747
14.4 Continuous Problems with Exact Solutions 750
14.5 Infinite Horizon Problems* 755
14.6 Value Iteration for Infinite Horizon Problems* 757
14.7 Policy Iteration for Infinite Horizon Problems* 762
14.8 Hybrid Value-Policy Iteration* 764
14.9 Average Reward Dynamic Programming* 765
14.10 The Linear Programming Method for Dynamic Programs** 766
14.11 Linear Quadratic Regulation 767
14.12 Why Does it Work?** 770
14.13 Bibliographic Notes 783
Exercises 783
Bibliography 793

15 Backward Approximate Dynamic Programming 795
15.1 Backward Approximate Dynamic Programming for Finite Horizon Problems 797
15.2 Fitted Value Iteration for Infinite Horizon Problems 804
15.3 Value Function Approximation Strategies 805
15.4 Computational Observations 810
15.5 Bibliographic Notes 816
Exercises 816
Bibliography 821

16 Forward ADP I: The Value of a Policy 823
16.1 Sampling the Value of a Policy 824
16.2 Stochastic Approximation Methods 835
16.3 Bellman's Equation Using a Linear Model* 837
16.4 Analysis of TD(0), LSTD, and LSPE Using a Single State* 842
16.5 Gradient-based Methods for Approximate Value Iteration* 845
16.6 Value Function Approximations Based on Bayesian Learning* 852
16.7 Learning Algorithms and Stepsizes 855
16.8 Bibliographic Notes 860
Exercises 862
Bibliography 864

17 Forward ADP II: Policy Optimization 867
17.1 Overview of Algorithmic Strategies 869
17.2 Approximate Value Iteration and Q-Learning Using Lookup Tables 871
17.3 Styles of Learning 881
17.4 Approximate Value Iteration Using Linear Models 886
17.5 On-policy vs. Off-policy Learning and the Exploration–Exploitation Problem 888
17.6 Applications 894
17.7 Approximate Policy Iteration 900
17.8 The Actor–Critic Paradigm 907
17.9 Statistical Bias in the Max Operator* 909
17.10 The Linear Programming Method Using Linear Models* 912
17.11 Finite Horizon Approximations for Steady-State Applications 915
17.12 Bibliographic Notes 917
Exercises 918
Bibliography 924

18 Forward ADP III: Convex Resource Allocation Problems 927
18.1 Resource Allocation Problems 930
18.2 Values Versus Marginal Values 937
18.3 Piecewise Linear Approximations for Scalar Functions 938
18.4 Regression Methods 941
18.5 Separable Piecewise Linear Approximations 944
18.6 Benders Decomposition for Nonseparable Approximations** 946
18.7 Linear Approximations for High-Dimensional Applications 956
18.8 Resource Allocation with Exogenous Information State 958
18.9 Closing Notes 959
18.10 Bibliographic Notes 960
Exercises 962
Bibliography 967

19 Direct Lookahead Policies 971
19.1 Optimal Policies Using Lookahead Models 974
19.2 Creating an Approximate Lookahead Model 978
19.3 Modified Objectives in Lookahead Models 985
19.4 Evaluating DLA Policies 992
19.5 Why Use a DLA? 997
19.6 Deterministic Lookaheads 999
19.7 A Tour of Stochastic Lookahead Policies 1005
19.8 Monte Carlo Tree Search for Discrete Decisions 1009
19.9 Two-Stage Stochastic Programming for Vector Decisions* 1018
19.10 Observations on DLA Policies 1024
19.11 Bibliographic Notes 1025
Exercises 1027
Bibliography 1031

Part VI – Multiagent Systems 1033

20 Multiagent Modeling and Learning 1035
20.1 Overview of Multiagent Systems 1036
20.2 A Learning Problem – Flu Mitigation 1044
20.3 The POMDP Perspective* 1059
20.4 The Two-Agent Newsvendor Problem 1062
20.5 Multiple Independent Agents – An HVAC Controller Model 1067
20.6 Cooperative Agents – A Spatially Distributed Blood Management Problem 1070
20.7 Closing Notes 1074
20.8 Why Does it Work? 1074
20.9 Bibliographic Notes 1076
Exercises 1077
Bibliography 1083

Index 1085
About the Author

Warren B. Powell, PhD, is Professor Emeritus of Operations Research and Financial Engineering at Princeton University, where he taught for 39 years. He was the founder and Director of CASTLE Laboratory, a research unit that works with industrial partners to test new ideas found in operations research. He supervised 70 graduate students and post-docs, with whom he wrote over 250 papers. He is currently the Chief Analytics Officer of Optimal Dynamics, a lab spinoff that is taking his research to industry.

You might also be interested in these products:

DPSM for Modeling Engineering Problems
By: Dominique Placko, Tribikram Kundu
PDF ebook
159,99 €

Mathematical Analysis
By: Bernd S. W. Schröder
PDF ebook
114,99 €