Details

Markov Decision Processes in Artificial Intelligence


1st edition

By: Olivier Sigaud, Olivier Buffet

€154.99

Publisher: Wiley
Format: EPUB
Published: March 4, 2013
ISBN/EAN: 9781118620106
Language: English
Number of pages: 480

DRM-protected eBook. To read it you will need, for example, Adobe Digital Editions and an Adobe ID.

Description

Markov Decision Processes (MDPs) are a mathematical framework for modeling sequential decision problems under uncertainty, as well as reinforcement learning problems.

Written by experts in the field, this book provides a global view of current research using MDPs in artificial intelligence. It starts with an introductory presentation of the fundamental aspects of MDPs (planning in MDPs, reinforcement learning, partially observable MDPs, Markov games and the use of non-classical criteria). It then presents more advanced research trends in the field and gives some concrete examples using illustrative real-life applications.
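The algorithmic core of the book's first part is dynamic programming over the Bellman optimality equation. As a rough illustration of what that involves, here is a minimal value iteration sketch in Python; it is not an excerpt from the book, and the toy two-state, two-action problem (matrices P and R below) is invented purely for demonstration:

    # Minimal value iteration sketch for a toy two-state, two-action MDP.
    # Illustrative only: the numbers below are invented, not taken from the book.
    import numpy as np

    gamma = 0.9  # discount factor

    # P[a, s, s'] = probability of reaching state s' from state s under action a
    P = np.array([[[0.8, 0.2],
                   [0.1, 0.9]],
                  [[0.5, 0.5],
                   [0.3, 0.7]]])
    # R[s, a] = expected immediate reward for taking action a in state s
    R = np.array([[1.0, 0.0],
                  [0.0, 2.0]])

    V = np.zeros(2)
    for _ in range(1000):
        # Bellman optimality backup: Q(s,a) = R(s,a) + gamma * sum_s' P(s'|s,a) V(s')
        Q = R + gamma * np.einsum('ast,t->sa', P, V)
        V_new = Q.max(axis=1)
        if np.abs(V_new - V).max() < 1e-8:  # stop once the value function has converged
            break
        V = V_new

    print("V* =", V_new, "greedy policy =", Q.argmax(axis=1))

Reinforcement learning methods (Chapter 2) target the same fixed point when P and R are unknown, estimating it from sampled transitions instead.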
Table of Contents

Preface xvii
List of Authors xix

PART 1. MDPS: MODELS AND METHODS 1

Chapter 1. Markov Decision Processes 3
Frédérick GARCIA and Emmanuel RACHELSON
1.1. Introduction 3
1.2. Markov decision problems 4
1.3. Value functions 9
1.4. Markov policies 12
1.5. Characterization of optimal policies 14
1.6. Optimization algorithms for MDPs 28
1.7. Conclusion and outlook 37
1.8. Bibliography 37

Chapter 2. Reinforcement Learning 39
Olivier SIGAUD and Frédérick GARCIA
2.1. Introduction 39
2.2. Reinforcement learning: a global view 40
2.3. Monte Carlo methods 45
2.4. From Monte Carlo to temporal difference methods 45
2.5. Temporal difference methods 46
2.6. Model-based methods: learning a model 59
2.7. Conclusion 63
2.8. Bibliography 63

Chapter 3. Approximate Dynamic Programming 67
Rémi MUNOS
3.1. Introduction 68
3.2. Approximate value iteration (AVI) 70
3.3. Approximate policy iteration (API) 77
3.4. Direct minimization of the Bellman residual 87
3.5. Towards an analysis of dynamic programming in Lp-norm 88
3.6. Conclusions 93
3.7. Bibliography 93

Chapter 4. Factored Markov Decision Processes 99
Thomas DEGRIS and Olivier SIGAUD
4.1. Introduction 99
4.2. Modeling a problem with an FMDP 100
4.3. Planning with FMDPs 108
4.4. Perspectives and conclusion 122
4.5. Bibliography 123

Chapter 5. Policy-Gradient Algorithms 127
Olivier BUFFET
5.1. Reminder about the notion of gradient 128
5.2. Optimizing a parameterized policy with a gradient algorithm 130
5.3. Actor-critic methods 143
5.4. Complements 147
5.5. Conclusion 150
5.6. Bibliography 150

Chapter 6. Online Resolution Techniques 153
Laurent PÉRET and Frédérick GARCIA
6.1. Introduction 153
6.2. Online algorithms for solving an MDP 155
6.3. Controlling the search 167
6.4. Conclusion 180
6.5. Bibliography 180

PART 2. BEYOND MDPS 185

Chapter 7. Partially Observable Markov Decision Processes 187
Alain DUTECH and Bruno SCHERRER
7.1. Formal definitions for POMDPs 188
7.2. Non-Markovian problems: incomplete information 196
7.3. Computation of an exact policy on information states 202
7.4. Exact value iteration algorithms 207
7.5. Policy iteration algorithms 222
7.6. Conclusion and perspectives 223
7.7. Bibliography 225

Chapter 8. Stochastic Games 229
Andriy BURKOV, Laëtitia MATIGNON and Brahim CHAIB-DRAA
8.1. Introduction 229
8.2. Background on game theory 230
8.3. Stochastic games 245
8.4. Conclusion and outlook 269
8.5. Bibliography 270

Chapter 9. DEC-MDP/POMDP 277
Aurélie BEYNIER, François CHARPILLET, Daniel SZER and Abdel-Illah MOUADDIB
9.1. Introduction 277
9.2. Preliminaries 278
9.3. Multi-agent Markov decision processes 279
9.4. Decentralized control and local observability 280
9.5. Sub-classes of DEC-POMDPs 285
9.6. Algorithms for solving DEC-POMDPs 295
9.7. Applicative scenario: multirobot exploration 310
9.8. Conclusion and outlook 312
9.9. Bibliography 313

Chapter 10. Non-Standard Criteria 319
Matthieu BOUSSARD, Maroua BOUZID, Abdel-Illah MOUADDIB, Régis SABBADIN and Paul WENG
10.1. Introduction 319
10.2. Multicriteria approaches 320
10.3. Robustness in MDPs 327
10.4. Possibilistic MDPs 329
10.5. Algebraic MDPs 342
10.6. Conclusion 354
10.7. Bibliography 355

PART 3. APPLICATIONS 361

Chapter 11. Online Learning for Micro-Object Manipulation 363
Guillaume LAURENT
11.1. Introduction 363
11.2. Manipulation device 364
11.3. Choice of the reinforcement learning algorithm 367
11.4. Experimental results 370
11.5. Conclusion 373
11.6. Bibliography 373

Chapter 12. Conservation of Biodiversity 375
Iadine CHADÈS
12.1. Introduction 375
12.2. When to protect, survey or surrender cryptic endangered species 376
12.3. Can sea otters and abalone co-exist? 381
12.4. Other applications in conservation biology and discussions 391
12.5. Bibliography 392

Chapter 13. Autonomous Helicopter Searching for a Landing Area in an Uncertain Environment 395
Patrick FABIANI and Florent TEICHTEIL-KÖNIGSBUCH
13.1. Introduction 395
13.2. Exploration scenario 397
13.3. Embedded control and decision architecture 401
13.4. Incremental stochastic dynamic programming 404
13.5. Flight tests and return on experience 407
13.6. Conclusion 410
13.7. Bibliography 410

Chapter 14. Resource Consumption Control for an Autonomous Robot 413
Simon LE GLOANNEC and Abdel-Illah MOUADDIB
14.1. The rover's mission 414
14.2. Progressive processing formalism 415
14.3. MDP/PRU model 416
14.4. Policy calculation 418
14.5. How to model a real mission 419
14.6. Extensions 422
14.7. Conclusion 423
14.8. Bibliography 423

Chapter 15. Operations Planning 425
Sylvie THIÉBAUX and Olivier BUFFET
15.1. Operations planning 425
15.2. MDP value function approaches 433
15.3. Reinforcement learning: FPG 442
15.4. Experiments 446
15.5. Conclusion and outlook 448
15.6. Bibliography 450

Index 453
"As an overall conclusion, this book is an extensive presentation of MDPs and their applications in modeling uncertain decision problems and in reinforcement learning." (Zentralblatt MATH, 2011) <p> "The range of subjects covered is fascinating, however, from game-theoretical applications to reinforcement learning, conservation of biodiversity and operations planning. Oriented towards advanced students and researchers in the fields of both artificial intelligence and the study of algorithms as well as discrete mathematics." (<i>Book News</i>, September 2010)</p>
Olivier Sigaud is a Professor of Computer Science at the University of Paris 6 (UPMC). He is the Head of the "Motion" Group in the Institute of Intelligent Systems and Robotics (ISIR).

Olivier Buffet has been an INRIA researcher in the Autonomous Intelligent Machines (MAIA) team of the LORIA laboratory since November 2007.
