Details

Multi-Processor System-on-Chip 1


Architectures
1st edition

by: Liliana Andrade, Frederic Rousseau

139,99 €

Publisher: Wiley
Format: EPUB
Published: 24.03.2021
ISBN/EAN: 9781119818281
Language: English
Number of pages: 320

DRM-protected eBook: to read it you will need e.g. Adobe Digital Editions and an Adobe ID.

Description

A Multi-Processor System-on-Chip (MPSoC) is the key component for complex applications. These applications put huge pressure on memory, communication devices and computing units. This book, presented in two volumes – Architectures and Applications – celebrates the 20th anniversary of MPSoC, an interdisciplinary forum that focuses on multi-core and multi-processor hardware and software systems. This interdisciplinarity has led MPSoC to bring together experts in these fields from around the world over the last two decades. <p><i>Multi-Processor System-on-Chip 1</i> covers the key components of MPSoC: processors, memory, interconnect and interfaces. It describes advanced features of these components and the technologies used to build efficient MPSoC architectures. All the main components are detailed: memory usage and technologies, communication support and consistency, and specific processor architectures for general-purpose or dedicated applications.</p>
<p>Foreword xiii<br /><i>Ahmed JERRAYA</i></p> <p>Acknowledgments xv<br /><i>Liliana ANDRADE and Frédéric ROUSSEAU</i></p> <p><b>Part 1. Processors 1</b></p> <p><b>Chapter 1. Processors for the Internet of Things 3<br /></b><i>Pieter VAN DER WOLF and Yankin TANURHAN</i></p> <p>1.1. Introduction 3</p> <p>1.2. Versatile processors for low-power IoT edge devices 4</p> <p>1.2.1. Control processing, DSP and machine learning 4</p> <p>1.2.2. Configurability and extensibility 6</p> <p>1.3. Machine learning inference 8</p> <p>1.3.1. Requirements for low/mid-end machine learning inference 10</p> <p>1.3.2. Processor capabilities for low-power machine learning inference 14</p> <p>1.3.3. A software library for machine learning inference 17</p> <p>1.3.4. Example machine learning applications and benchmarks 20</p> <p>1.4. Conclusion 23</p> <p>1.5. References 24</p> <p><b>Chapter 2. A Qualitative Approach to Many-core Architecture 27<br /></b><i>Benoît DUPONT DE DINECHIN</i></p> <p>2.1. Introduction 28</p> <p>2.2. Motivations and context 29</p> <p>2.2.1. Many-core processors 29</p> <p>2.2.2. Machine learning inference 30</p> <p>2.2.3. Application requirements 32</p> <p>2.3. The MPPA3 many-core processor 34</p> <p>2.3.1. Global architecture 34</p> <p>2.3.2. Compute cluster 36</p> <p>2.3.3. VLIW core 38</p> <p>2.3.4. Coprocessor 39</p> <p>2.4. The MPPA3 software environments 42</p> <p>2.4.1. High-performance computing 42</p> <p>2.4.2. KaNN code generator 43</p> <p>2.4.3. High-integrity computing 46</p> <p>2.5. Conclusion 47</p> <p>2.6. References 48</p> <p><b>Chapter 3. The Plural Many-core Architecture – High Performance at Low Power 53<br /></b><i>Ran GINOSAR</i></p> <p>3.1. Introduction 54</p> <p>3.2. Related works 55</p> <p>3.3. Plural many-core architecture 55</p> <p>3.4. Plural programming model 56</p> <p>3.5. Plural hardware scheduler/synchronizer 58</p> <p>3.6. Plural networks-on-chip 61</p> <p>3.6.1. Schedule rNoC 61</p> <p>3.6.2. Shared memory NoC 61</p> <p>3.7. 
Hardware and software accelerators for the Plural architecture 62</p> <p>3.8. Plural system software 63</p> <p>3.9. Plural software development tools 65</p> <p>3.10. Matrix multiplication algorithm on the Plural architecture 65</p> <p>3.11. Conclusion 67</p> <p>3.12. References 67</p> <p><b>Chapter 4. ASIP-Based Multi-Processor Systems for an Efficient Implementation of CNNs 69<br /></b><i>Andreas BYTYN, René AHLSDORF and Gerd ASCHEID</i></p> <p>4.1. Introduction 70</p> <p>4.2. Related works 71</p> <p>4.3. ASIP architecture 74</p> <p>4.4. Single-core scaling 75</p> <p>4.5. MPSoC overview 78</p> <p>4.6. NoC parameter exploration 79</p> <p>4.7. Summary and conclusion 82</p> <p>4.8. References 83</p> <p><b>Part 2. Memory 85</b></p> <p><b>Chapter 5. Tackling the MPSoC Data Locality Challenge 87<br /></b><i>Sven RHEINDT, Akshay SRIVATSA, Oliver LENKE, Lars NOLTE, Thomas WILD and Andreas HERKERSDORF</i></p> <p>5.1. Motivation 88</p> <p>5.2. MPSoC target platform 90</p> <p>5.3. Related work 91</p> <p>5.4. Coherence-on-demand: region-based cache coherence 92</p> <p>5.4.1. RBCC versus global coherence 93</p> <p>5.4.2. OS extensions for coherence-on-demand 94</p> <p>5.4.3. Coherency region manager 94</p> <p>5.4.4. Experimental evaluations 97</p> <p>5.4.5. RBCC and data placement 99</p> <p>5.5. Near-memory acceleration 100</p> <p>5.5.1. Near-memory synchronization accelerator 102</p> <p>5.5.2. Near-memory queue management accelerator 104</p> <p>5.5.3. Near-memory graph copy accelerator 107</p> <p>5.5.4. Near-cache accelerator 110</p> <p>5.6. The big picture 111</p> <p>5.7. Conclusion 113</p> <p>5.8. Acknowledgments 114</p> <p>5.9. References 114</p> <p><b>Chapter 6. mMPU: Building a Memristor-based General-purpose In-memory Computation Architecture 119<br /></b><i>Adi ELIAHU, Rotem BEN HUR, Ameer HAJ ALI and Shahar KVATINSKY</i></p> <p>6.1. Introduction 120</p> <p>6.2. MAGIC NOR gate 121</p> <p>6.3. In-memory algorithms for latency reduction 122</p> <p>6.4. 
Synthesis and in-memory mapping methods 123</p> <p>6.4.1. SIMPLE 124</p> <p>6.4.2. SIMPLER 126</p> <p>6.5. Designing the memory controller 127</p> <p>6.6. Conclusion 129</p> <p>6.7. References 130</p> <p><b>Chapter 7. Removing Load/Store Helpers in Dynamic Binary Translation 133<br /></b><i>Antoine FARAVELON, Olivier GRUBER and Frédéric PÉTROT</i></p> <p>7.1. Introduction 134</p> <p>7.2. Emulating memory accesses 136</p> <p>7.3. Design of our solution 140</p> <p>7.4. Implementation 143</p> <p>7.4.1. Kernel module 143</p> <p>7.4.2. Dynamic binary translation 145</p> <p>7.4.3. Optimizing our slow path 147</p> <p>7.5. Evaluation 149</p> <p>7.5.1. QEMU emulation performance analysis 150</p> <p>7.5.2. Our performance overview 151</p> <p>7.5.3. Optimized slow path 153</p> <p>7.6. Related works 155</p> <p>7.7. Conclusion 157</p> <p>7.8. References 158</p> <p><b>Chapter 8. Study and Comparison of Hardware Methods for Distributing Memory Bank Accesses in Many-core Architectures 161<br /></b><i>Arthur VIANES and Frédéric ROUSSEAU</i></p> <p>8.1. Introduction 162</p> <p>8.1.1. Context 162</p> <p>8.1.2. MPSoC architecture 163</p> <p>8.1.3. Interconnect 164</p> <p>8.2. Basics on banked memory 165</p> <p>8.2.1. Banked memory 165</p> <p>8.2.2. Memory bank conflict and granularity 166</p> <p>8.2.3. Efficient use of memory banks: interleaving 168</p> <p>8.3. Overview of software approaches 170</p> <p>8.3.1. Padding 170</p> <p>8.3.2. Static scheduling of memory accesses 172</p> <p>8.3.3. The need for hardware approaches 172</p> <p>8.4. Hardware approaches 172</p> <p>8.4.1. Prime modulus indexing 172</p> <p>8.4.2. Interleaving schemes using hash functions 174</p> <p>8.5. Modeling and experimenting 181</p> <p>8.5.1. Simulator implementation 182</p> <p>8.5.2. Implementation of the Kalray MPPA cluster interconnect 182</p> <p>8.5.3. Objectives and method 184</p> <p>8.5.4. Results and discussion 185</p> <p>8.6. Conclusion 191</p> <p>8.7. References 192</p> <p><b>Part 3. 
Interconnect and Interfaces 195</b></p> <p><b>Chapter 9. Network-on-Chip (NoC): The Technology that Enabled Multi-processor Systems-on-Chip (MPSoCs) 197<br /></b><i>K. Charles JANAC</i></p> <p>9.1. History: transition from buses and crossbars to NoCs 198</p> <p>9.1.1. NoC architecture 202</p> <p>9.1.2. Extending the bus comparison to crossbars 207</p> <p>9.1.3. Bus, crossbar and NoC comparison summary and conclusion 207</p> <p>9.2. NoC configurability 208</p> <p>9.2.1. Human-guided design flow 208</p> <p>9.2.2. Physical placement awareness and NoC architecture design 209</p> <p>9.3. System-level services 211</p> <p>9.3.1. Quality-of-service (QoS) and arbitration 211</p> <p>9.3.2. Hardware debug and performance analysis 212</p> <p>9.3.3. Functional safety and security 212</p> <p>9.4. Hardware cache coherence 215</p> <p>9.4.1. NoC protocols, semantics and messaging 216</p> <p>9.5. Future NoC technology developments 217</p> <p>9.5.1. Topology synthesis and floorplan awareness 217</p> <p>9.5.2. Advanced resilience and functional safety for autonomous vehicles 218</p> <p>9.5.3. Alternatives to von Neumann architectures for SoCs 219</p> <p>9.5.4. Chiplets and multi-die NoC connectivity 221</p> <p>9.5.5. Runtime software automation 222</p> <p>9.5.6. Instrumentation, diagnostics and analytics for performance, safety and security 223</p> <p>9.6. Summary and conclusion 224</p> <p>9.7. References 224</p> <p><b>Chapter 10. Minimum Energy Computing via Supply and Threshold Voltage Scaling 227<br /></b><i>Jun SHIOMI and Tohru ISHIHARA</i></p> <p>10.1. Introduction 228</p> <p>10.2. Standard-cell-based memory for minimum energy computing 230</p> <p>10.2.1. Overview of low-voltage on-chip memories 230</p> <p>10.2.2. Design strategy for area- and energy-efficient SCMs 234</p> <p>10.2.3. Hybrid memory design towards energy- and area-efficient memory systems 236</p> <p>10.2.4. Body biasing as an alternative to power gating 237</p> <p>10.3. 
Minimum energy point tracking 238</p> <p>10.3.1. Basic theory 238</p> <p>10.3.2. Algorithms and implementation 244</p> <p>10.3.3. OS-based approach to minimum energy point tracking 246</p> <p>10.4. Conclusion 249</p> <p>10.5. Acknowledgments 249</p> <p>10.6. References 250</p> <p><b>Chapter 11. Maintaining Communication Consistency During Task Migrations in Heterogeneous Reconfigurable Devices 255<br /></b><i>Arief WICAKSANA, Olivier MULLER, Frédéric ROUSSEAU and Arif SASONGKO</i></p> <p>11.1. Introduction 256</p> <p>11.1.1. Reconfigurable architectures 256</p> <p>11.1.2. Contribution 257</p> <p>11.2. Background 257</p> <p>11.2.1. Definitions 258</p> <p>11.2.2. Problem scenario and technical challenges 259</p> <p>11.3. Related works 261</p> <p>11.3.1. Hardware context switch 261</p> <p>11.3.2. Communication management 262</p> <p>11.4. Proposed communication methodology in hardware context switching 263</p> <p>11.5. Implementation of the communication management on reconfigurable computing architectures 266</p> <p>11.5.1. Reconfigurable channels in FIFO 267</p> <p>11.5.2. Communication infrastructure 268</p> <p>11.6. Experimental results 269</p> <p>11.6.1. Setup 269</p> <p>11.6.2. Experiment scenario 270</p> <p>11.6.3. Resource overhead 271</p> <p>11.6.4. Impact on the total execution time 273</p> <p>11.6.5. Impact on the context extract and restore time 275</p> <p>11.6.6. System responsiveness to context switch requests 276</p> <p>11.6.7. Hardware task migration between heterogeneous FPGAs 280</p> <p>11.7. Conclusion 282</p> <p>11.8. References 283</p> <p>List of Authors 287</p> <p>Authors Biographies 291</p> <p>Index 299</p>
<p><b>Liliana Andrade</b> is Associate Professor at the TIMA Lab, Université Grenoble Alpes, France. She received her PhD in Computer Science, Telecommunications and Electronics from Université Pierre et Marie Curie in 2016. Her research interests include the system-level modeling/validation of systems-on-chip and the acceleration of heterogeneous systems simulation.</p> <p><b>Frédéric Rousseau</b> is Full Professor at the TIMA Lab, Université Grenoble Alpes, France. His research interests concern Multi-Processor System-on-Chip design and architecture, the prototyping of hardware/software systems, including reconfigurable systems, and high-level synthesis for embedded systems.</p>
