Learning in Energy-Efficient Neuromorphic Computing

Algorithm and Architecture Co-Design

Nan Zheng

The University of Michigan
Department of Electrical Engineering and Computer Science
Michigan, USA

 

Pinaki Mazumder

The University of Michigan
Department of Electrical Engineering and Computer Science
Michigan, USA


Preface

In 1987, when I was wrapping up my doctoral thesis at the University of Illinois, I had a rare opportunity to listen to Prof. John Hopfield of the California Institute of Technology describe his groundbreaking research in neural networks to spellbound students in the Loomis Laboratory of Physics at Urbana‐Champaign. He didactically described how to design and fabricate a recurrent neural network chip to rapidly solve the benchmark Traveling Salesman Problem (TSP), which is NP‐complete, meaning that no polynomial‐time algorithm is known (and none is believed to exist) for solving it exactly as the number of cities grows very large.

This discovery of algorithmic hardware to solve intractable combinatorial problems was a major milestone in the field of neural networks, as the prior art of perceptron‐type feedforward neural networks could merely classify a limited set of simple patterns, even though the founder of neural computing, Prof. Frank Rosenblatt of Cornell University, had built the Mark I Perceptron computer in the late 1950s, when the first wave of digital computers such as the IBM 650 was just being commercialized. Subsequent advancements in neural hardware design were stymied mainly by the inability to integrate large synaptic networks with the technology of the time, which comprised vacuum tubes, relays, and passive components such as resistors, capacitors, and inductors. Therefore, in 1985, when AT&T Bell Labs fabricated the first solid‐state proof‐of‐concept TSP chip in MOS technology to verify Prof. John Hopfield's neural net architecture, it opened the vista for non‐Boolean, brain‐like computing on silicon.

Prof. John Hopfield's seminal work established that if the "objective function" of a combinatorial algorithm can be expressed in quadratic form, the synaptic links in a recurrent artificial neural network can be programmed accordingly to reduce (i.e. locally minimize) the value of the objective function through massive interactions between the constituent neurons. Hopfield's neural network consists of laterally connected neurons; the network can be randomly initialized and then iteratively reduces its intrinsic Lyapunov energy function until it settles into a local minimum. Notably, the Lyapunov function decreases in a monotone fashion under the dynamics of the recurrent neural network, provided the neurons have no self‐feedback.1
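For concreteness, the standard discrete Hopfield formulation (stated here in its textbook form, not as the circuit of any particular chip discussed in this Preface) writes the Lyapunov energy of a network with bipolar states $s_i \in \{-1, +1\}$, symmetric weights $w_{ij} = w_{ji}$, no self‐feedback ($w_{ii} = 0$), and thresholds $\theta_i$ as

$$E = -\frac{1}{2}\sum_{i}\sum_{j} w_{ij}\, s_i s_j - \sum_{i} \theta_i s_i,$$

with one neuron at a time updated asynchronously according to

$$s_i \leftarrow \operatorname{sgn}\!\left(\sum_{j} w_{ij}\, s_j + \theta_i\right).$$

Each such update can only decrease $E$ or leave it unchanged, which is why the network settles into a local minimum of the energy rather than oscillating.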

Prof. Hopfield used a combination of four separate quadratic terms to represent the objective function of the TSP. The first term of the objective function is minimized when the traveling salesman visits each city exactly once, the second term ensures that the salesman includes every city in the itinerary, the third term ensures that no two cities are visited simultaneously (i.e. occupy the same position in the tour), and the fourth term is designed to select the shortest route connecting all the cities. Because of massive simultaneous interactions between neurons through connecting synapses that are precisely adjusted to meet the constraints in the above quadratic terms, a simple recurrent neural network can rapidly generate a solution of very good quality. However, unlike well‐tested software procedures such as simulated annealing, dynamic programming, and the branch‐and‐bound algorithm, neural networks generally fail to find the best solution because of their simplistic connectionist structures.
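In the commonly cited Hopfield–Tank encoding (reproduced here in its standard form for reference), a tour over $n$ cities is represented by neuron outputs $V_{xi} \in [0, 1]$, where $V_{xi} = 1$ indicates that city $x$ occupies position $i$ in the tour, and the four quadratic terms take the form

$$E = \frac{A}{2}\sum_{x}\sum_{i}\sum_{j\neq i} V_{xi}V_{xj} + \frac{B}{2}\sum_{i}\sum_{x}\sum_{y\neq x} V_{xi}V_{yi} + \frac{C}{2}\Bigl(\sum_{x}\sum_{i} V_{xi} - n\Bigr)^{2} + \frac{D}{2}\sum_{x}\sum_{y\neq x}\sum_{i} d_{xy}\, V_{xi}\bigl(V_{y,i+1} + V_{y,i-1}\bigr),$$

where $d_{xy}$ is the distance between cities $x$ and $y$, the position indices are taken modulo $n$, and $A$, $B$, $C$, $D$ are penalty weights. The first three terms enforce a valid tour, and the last measures the tour length.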

Therefore, after listening to Prof. Hopfield's fascinating talk, I harbored mixed feelings about the potential benefit of his innovation. On the one hand, I was thrilled to learn from his lecture how computationally hard algorithmic problems could be solved very quickly by using simple neuromorphic CMOS circuits with very small hardware overheads. On the other hand, I thought that the TSP application that Prof. Hopfield selected to demonstrate the ability of neural networks to solve combinatorial optimization problems was not the right candidate, as well‐crafted software algorithms obtain nearly optimal solutions that neural networks can hardly match. I started contemplating developing self‐healing VLSI chips in which the power of neural‐inspired self‐repair algorithms could be used to automatically restructure faulty VLSI chips. Low overheads and the ability to solve a problem concurrently through parallel interactions between neurons are the two salient features that I thought could be elegantly deployed for automatically repairing VLSI chips with built‐in neural net circuitry.

Soon after I joined the University of Michigan as an assistant professor, working with one of my doctoral students [2], I first developed CMOS analog neural net circuitry with asynchronous state updates, which lacked robustness due to process variation within a die. To improve the reliability of the self‐repair circuitry, an MS student [3] and I then designed digital neural net circuitry with synchronous state updates. These neural circuits were designed to repair VLSI chips by formulating the repair problem in terms of finding a node cover, edge cover, or node‐pair matching in a bipartite graph. In our graph formalism, one set of vertices in the bipartite graph represented the faulty circuit elements, and the other set of vertices represented the spare circuit elements. To restructure a faulty VLSI chip into a fault‐free operational chip, the spare circuit elements were automatically invoked through programmable switching elements after the faulty elements had been identified by embedded built‐in self‐testing circuitry.

Most importantly, like the TSP, the two‐dimensional array repair problem can be shown to be NP‐complete because the repair algorithm seeks an optimal assignment of the spare rows and spare columns that bypass faulty components such as memory cells, word‐line and bit‐line drivers, and sense‐amplifier bands located inside the memory array. Therefore, simple digital circuits comprising counters and other blocks woefully fail to solve such intractable self‐repair problems. Notably, one cannot use external digital computers to determine how to repair embedded arrays, as the input and output pins of the VLSI chip cannot be deployed to access the fault patterns in the deeply embedded arrays.
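To make the spare‐allocation problem concrete, the following minimal sketch (an illustrative brute‐force formulation with a hypothetical function name and interface, not the neural circuitry used on the chips) checks whether a given set of faulty cells can be bypassed with limited budgets of spare rows and spare columns; the exhaustive search over row choices reflects the combinatorial nature of the constrained repair problem.

```python
from itertools import combinations

def repairable(faults, spare_rows, spare_cols):
    """Decide whether every faulty cell (row, col) can be bypassed
    using at most `spare_rows` row replacements and `spare_cols`
    column replacements. Brute force over row choices: exponential
    in the worst case, mirroring the NP-completeness of the
    constrained spare-allocation problem."""
    fault_rows = {r for r, _ in faults}
    # Try every subset of faulty rows (up to the budget) to replace.
    for k in range(min(spare_rows, len(fault_rows)) + 1):
        for rows in combinations(sorted(fault_rows), k):
            chosen = set(rows)
            # Faults not covered by the chosen rows must be covered
            # by spare columns.
            remaining_cols = {c for r, c in faults if r not in chosen}
            if len(remaining_cols) <= spare_cols:
                return True
    return False

# Example: three faults clustered on row 0 plus one stray fault.
faults = [(0, 1), (0, 4), (0, 7), (3, 2)]
print(repairable(faults, spare_rows=1, spare_cols=1))  # True
print(repairable(faults, spare_rows=0, spare_cols=2))  # False
```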

In 1989 and 1992, I received two NSF grants to expand the neuromorphic self‐healing design style to a wider class of embedded VLSI modules such as memory arrays [4], processor arrays [5], programmable logic arrays, and so on [6]. However, this approach to improving VLSI chip yield through built‐in self‐testing and self‐repair was a bit ahead of its time, as the state‐of‐the‐art microprocessors of the early 1990s contained only a few hundred thousand transistors, and the submicron CMOS technology of the day was relatively robust. Therefore, after developing the neural‐net‐based self‐healing VLSI chip design methodology for various types of embedded circuit blocks, I stopped working on CMOS neural networks. I was not particularly interested in pursuing applications of neural networks to other types of engineering problems, as I wanted to remain focused on solving emerging problems in VLSI research.

On the other hand, in the late 1980s there were mounting concerns among CMOS technology prognosticators about the impending red brick wall heralding the end of the shrinking era in CMOS. Therefore, to promote several types of emerging technologies that might push the frontier of VLSI technology, the Defense Advanced Research Projects Agency (DARPA) in the USA initiated (around 1990) the Ultra Electronics: Ultra Dense, Ultra Fast Computing Components Research Program. Concurrently, the Ministry of International Trade and Industry (MITI) in Japan launched the Quantum Functional Devices (QFD) Project. Early successes with a plethora of innovative non‐CMOS technologies in both research programs led to the launching of the National Nanotechnology Initiative (NNI), a U.S. Government research and development (R&D) initiative involving 20 departments and independent agencies, with the aim of bringing about a revolution in nanotechnology that would impact industry and society at large.

During the period 1995–2010, my research group first focused on quantum‐physics‐based device and circuit modeling for quantum tunneling devices, and then we worked extensively on cellular neural network (CNN) circuits for image and video processing using quantum devices with one‐dimensional (double‐barrier resonant tunneling devices), two‐dimensional (self‐assembled nanowires), and three‐dimensional (quantum dot arrays) confinement. Subsequently, we developed learning‐based neural network circuits using resistive synaptic devices (commonly known as memristors) and CMOS neurons. We also developed analog voltage programmable nanocomputing architectures by hybridizing quantum tunneling and memristive devices in the computing nodes of a two‐dimensional processing element (PE) ensemble. Our research on nanoscale neuromorphic circuits will soon be published in our new book, Neuromorphic Circuits for Nanoscale Devices, River Publishers, U.K., 2019.

After spending a little over a decade developing neuromorphic circuits with various types of emerging nanoscale electronic and spintronic devices, I decided to embark on research on learning‐based digital VLSI neuromorphic chips using nanoscale CMOS technology in both subthreshold and superthreshold modes of operation. My student and coauthor of this book, Dr. Nan Zheng, conducted his doctoral dissertation work on architectures and algorithms for digital neural networks. We approached the design from both machine learning and biological learning perspectives to design and fabricate energy‐efficient VLSI chips in TSMC 65 nm CMOS technology.

From the machine learning perspective, we captured on VLSI chips actor‐critic reinforcement learning (RL) [7] and Q‐learning [8], an off‐policy temporal difference (TD) learning method. Further, we captured the spiking‐correlation‐based synaptic plasticity commonly used in biological unsupervised learning applications. We also formulated hardware‐friendly spike‐timing‐dependent plasticity (STDP) learning rules [9], which achieved classification rates of 97.2% and 97.8% for the one‐hidden‐layer and two‐hidden‐layer neural networks, respectively, on the Modified National Institute of Standards and Technology (MNIST) database benchmark. The hardware‐friendly learning rule enabled both energy‐efficient hardware design [10] and implementations that were robust to the process‐voltage‐temperature (PVT) variations associated with chip manufacturing [11]. We demonstrated that the hardware‐accelerator VLSI chip for the actor‐critic network solved several control‐theoretic benchmark problems by emulating adaptive dynamic programming (ADP), which is at the heart of the RL software program. Compared with traditional software RL running on a general‐purpose processor, the VLSI chip accelerator operating at 175 MHz achieved two orders of magnitude improvement in computational time while consuming merely 25 mW [12].
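For reference, the tabular Q‐learning update alluded to above takes the standard textbook form (stated here generically, not as the exact rule implemented on the chips):

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \Bigl[ r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \Bigr],$$

where $\alpha$ is the learning rate and $\gamma$ the discount factor. The rule is off‐policy because the maximization over actions in the TD target is taken regardless of the action that the behavior policy actually selects next.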

The chip layout diagrams included in this Preface show a sample of the numerous digital CMOS neural network chips that my research group has designed over the course of the last 35 years. In the left column, a self‐healing chip designed in 1991 repairs faulty VLSI memory arrays automatically by running node‐covering algorithms on a bipartite graph that represents the set of faulty components and the available spare circuit elements. The STDP chip, designed in 2013, controls the motion of a virtual insect from an initial source to a selected destination, avoiding collisions while navigating through a set of arbitrarily shaped blocked spaces. A deep learning chip described in the previous paragraph was designed in 2016.

The right column shows the RL chip described in the paragraph above, designed in 2016. Also included in the right column are two ultra‐low‐power (ULP) CMOS chips that were biased in the subthreshold mode for wearable health‐care applications. In one application, Kohonen's self‐organizing map (SOM) was implemented to classify electrocardiogram (ECG) waveforms; in the other, a body‐sensing network with a wireless transceiver was designed to sense analog neuronal signals using an implantable multielectrode sensor and to deliver the digitized data through a built‐in wake‐up transceiver to doctors, who could then monitor the efficacy of drugs at the neuronal and synaptic levels in brain‐related diseases such as schizophrenia, chronic depression, Alzheimer's disease, and so on.

Initially, when we decided to publish a monograph highlighting our work on CMOS neuromorphic chips for brain‐like computing, we intended to aggregate the various results of the papers cited in this Preface to compose the contents of the book. However, in the course of preparing the manuscript, we moved away from this narrow goal, as such a collection would be rather limiting for adoption in a regular course teaching undergraduate and graduate students about the latest generation of neural networks with learning capabilities.

Instead, we decided to write a comprehensive book on energy‐efficient hardware design for neural networks with various types of learning capability, discussing the expansive ongoing research in neural hardware. This is evidently a Herculean task, requiring us to mull through hundreds of archival references and to describe co‐design and co‐optimization methodologies for building hardware neural networks that can learn to perform various tasks. We attempted to provide a comprehensive perspective, from high‐level algorithms to low‐level implementation details, by covering the fundamentals and essentials of neural networks (e.g. deep learning) as well as the hardware implementation of neural networks. In a nutshell, the present version of the book has the following salient features:

  • It includes a cross‐layer survey of hardware accelerators for neuromorphic algorithms;
  • It covers the co‐design of architecture and algorithms with emerging devices for much‐improved computing efficiency;
  • It focuses on the co‐design of algorithms and hardware, which is paramount for deploying emerging devices such as traditional memristors or diffusive memristors for neuromorphic computing.

Finally, owing to the stringent time constraint of completing this book while concurrently finishing the complementary book (Neuromorphic Circuits for Nanoscale Devices, River Publishers, U.K., 2019), the present version has been completed without the fully pedagogical treatment expected of a textbook, including exercise problems at the end of each chapter. Hopefully, those goals will be achieved in the next edition of the book after gathering valuable feedback from students, instructors, practicing engineers, and other readers. I shall truly appreciate it if you give me such guiding feedback, both positive and negative, which will enable me to prepare the Second Edition of the book. My contact information is included below for your convenience.

Pinaki Mazumder     February 14, 2019

Address:

4765 BBB Building

Division of Computer Science and Engineering, Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI 48109‐2122; Ph: 734‐763‐2107; E‐mail: mazum@eecs.umich.edu, pinakimazum@gmail.com

Website: http://www.eecs.umich.edu/∼mazum


References

  1. Mazumder, P. and Yih, J. (1993). A new built‐in self‐repair approach to VLSI memory yield enhancement by using neural‐type circuits. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 12 (1): 124–136.
  2. Mazumder, P. and Yih, J. (1989). Fault‐diagnosis and self‐repairing of embedded memories by using electronic neural network. In: Proc. of the IEEE 19th Fault‐Tolerant Computing Symposium, 270–277. Chicago, IL.
  3. Smith, M.D. and Mazumder, P. (1996). Analysis and design of Hopfield‐type network for built‐in self‐repair of memories. IEEE Trans. Comput. 45 (1): 109–115.
  4. Mazumder, P. and Yih, J. (1990). Built‐in self‐repair techniques for yield enhancement of embedded memories. In: Proceedings of the IEEE International Test Conference, 833–841.
  5. Mazumder, P. and Yih, J. (1993). Restructuring of square processor arrays by built‐in self‐repair circuit. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 12 (9): 1255–1265.
  6. Mazumder, P. (1992). An integrated built‐in self‐testing and self‐repair of VLSI/WSI hexagonal arrays. In: Proceedings of the IEEE International Test Conference, 968–977.
  7. Zheng, N. and Mazumder, P. (2017). Hardware‐friendly actor‐critic reinforcement learning through modulation of spike‐timing‐dependent plasticity. IEEE Trans. Comput. 66 (2).
  8. Ebong, I. and Mazumder, P. (2014). Iterative architecture for value iteration using memristors. In: IEEE Conference on Nanotechnology, Toronto, Canada, 967–970.
  9. Zheng, N. and Mazumder, P. (2018). Online supervised learning for hardware‐based multilayer spiking neural networks through the modulation of weight‐dependent spike‐timing‐dependent plasticity. IEEE Trans. Neural Netw. Learn. Syst. 29 (9): 4287–4302.
  10. Zheng, N. and Mazumder, P. (2018). A low‐power hardware architecture for on‐line supervised learning in multi‐layer spiking neural networks. In: 2018 IEEE International Symposium on Circuits and Systems (ISCAS), 1–5. Florence, Italy.
  11. Zheng, N. and Mazumder, P. (2018). Learning in memristor crossbar‐based spiking neural networks through modulation of weight‐dependent spike‐timing‐dependent plasticity. IEEE Trans. Nanotechnol. 17 (3): 520–532.
  12. Zheng, N. and Mazumder, P. (2018). A scalable low‐power reconfigurable accelerator for action‐dependent heuristic dynamic programming. IEEE Trans. Circuits Syst. I Regul. Pap. 65 (6): 1897–1908.

Note

Acknowledgment

First, I would like to thank several of my senior colleagues who encouraged me to carry on my research in neural computing during the past three decades, ever since I published my first paper, in 1989, on self‐healing of VLSI memories by adopting the concept of the Hopfield network. Specifically, I would like to thank Prof. Leon O. Chua and Prof. Ernest S. Kuh of the University of California at Berkeley, Prof. Steve M. Kang, Prof. Kent W. Fuchs, and Prof. Janak H. Patel of the University of Illinois at Urbana‐Champaign, Prof. Jacob A. Abraham of the University of Texas at Austin, Prof. Supriyo Bandyopadhyay of Virginia Commonwealth University, Prof. Sudhakar M. Reddy of the University of Iowa, and Prof. Tamas Roska and Prof. Arpad Csurgay of the Technical University of Budapest, Hungary.

Second, I would like to thank several of my colleagues at the National Science Foundation, where I served in the Directorate for Computer and Information Science and Engineering (CISE) as a Program Director of the Emerging Models and Technologies program from January 2007 to December 2008, and then in the Engineering Directorate (ED) as a Program Director of the Adaptive Intelligent Systems program from January 2008 to December 2009. Specifically, I would like to thank Dr. Robert Grafton and Dr. Sankar Basu of the Computing and Communication Foundations Division of CISE, and Dr. Radhakrisnan Baheti, Dr. Paul Werbos, and Dr. Jenshan Lin of the Electrical, Communications and Cyber Systems (ECCS) Division of ED for providing me with research funds over the years to conduct research on learning‐based systems, which enabled me to delve deep into CMOS chip design for brain‐like computing. I had the distinct pleasure of interacting with Dr. Michael Roco during my stint at NSF, and subsequently when I was invited to present our group's brain‐like computing research at the US‐Korea Forums in Nanotechnology in 2016 in Seoul, Korea, and in 2017 in Falls Church, Virginia, USA.

Third, I would like to thank Dr. Jih Shyr Yih, who was my first doctoral student and started working with me soon after I joined the University of Michigan. After taking a course with me in which I taught the memory repair algorithms, he enthusiastically implemented the first self‐healing VLSI chip using the analog Hopfield network. Next, I would like to acknowledge the contributions of Mr. Michael Smith, who implemented the digital self‐healing chip as indicated above. My other doctoral students, Dr. W. H. Lee, Dr. I. Ebong, Dr. J. Kim, Dr. Y. Yalcin, and Ms. S. R. Li, worked on RTDs, quantum dots, and memristors to build neuromorphic circuits; their research work has been included in a separate book. The bulk of this book is drawn from the doctoral dissertation manuscript of my student and the coauthor of this book, Dr. N. Zheng. It was a joy to work with an assiduous student like him. I would also like to thank Dr. M. Erementchouk for scrutinizing the manuscript and providing helpful suggestions.

Finally, I would like to thank my wife, Sadhana, my son, Bhaskar, and my daughter, Monika, as well as their spouses, Pankti Pathak and Thomas Parker, respectively, for their understanding and support despite the fact that I spent most of my time with my research group.