<p>Preface xviii</p> <p>Acknowledgements xxi</p> <p>Foreword xxii</p> <p>About the Companion Website xxiii</p> <p><b>1 Introduction to DSP 1</b></p> <p>1.1 Introduction 1</p> <p>1.2 Multicore processors 3</p> <p>1.2.1 Can any algorithm benefit from a multicore processor? 3</p> <p>1.2.2 How many cores do I need for my application? 5</p> <p>1.3 Key applications of high-performance multicore devices 6</p> <p>1.4 FPGAs, Multicore DSPs, GPUs and Multicore CPUs 8</p> <p>1.5 Challenges faced for programming a multicore processor 9</p> <p>1.6 Texas Instruments DSP roadmap 10</p> <p>1.7 Conclusion 11</p> <p>References 12</p> <p><b>2 The TMS320C66x architecture overview 14</b></p> <p>2.1 Overview 14</p> <p>2.2 The CPU 15</p> <p>2.2.1 Cross paths 16</p> <p>2.2.1.1 Data cross paths 17</p> <p>2.2.1.2 Address cross paths 18</p> <p>2.2.2 Register file A and file B 20</p> <p>2.2.2.1 Operands 20</p> <p>2.2.3 Functional units 21</p> <p>2.2.3.1 Condition registers 21</p> <p>2.2.3.2 .L units 22</p> <p>2.2.3.3 .M units 22</p> <p>2.2.3.4 .S units 23</p> <p>2.2.3.5 .D units 23</p> <p>2.3 Single instruction, multiple data (SIMD) instructions 24</p> <p>2.3.1 Control registers 24</p> <p>2.4 The KeyStone memory 24</p> <p>2.4.1 Using the internal memory 27</p> <p>2.4.2 Memory protection and extension 29</p> <p>2.4.3 Memory throughput 29</p> <p>2.5 Peripherals 30</p> <p>2.5.1 Navigator 32</p> <p>2.5.2 Enhanced Direct Memory Access (EDMA) Controller 32</p> <p>2.5.3 Universal Asynchronous Receiver/Transmitter (UART) 32</p> <p>2.5.4 General purpose input–output (GPIO) 32</p> <p>2.5.5 Internal timers 32</p> <p>2.6 Conclusion 33</p> <p>References 33</p> <p><b>3 Software development tools and the TMS320C6678 EVM 35</b></p> <p>3.1 Introduction 35</p> <p>3.2 Software development tools 37</p> <p>3.2.1 Compiler 38</p> <p>3.2.2 Assembler 39</p> <p>3.2.3 Linker 40</p> <p>3.2.3.1 Linker command file 40</p> <p>3.2.4 Compile, assemble and link 42</p> <p>3.2.5 Using the Real-Time Software Components (RTSC) 
tools 42</p> <p>3.2.5.1 Platform update using the XDCtools 42</p> <p>3.2.6 KeyStone Multicore Software Development Kit 47</p> <p>3.3 Hardware development tools 47</p> <p>3.3.1 EVM features 47</p> <p>3.4 Laboratory experiments based on the C6678 EVM: introduction to Code Composer Studio (CCS) 51</p> <p>3.4.1 Software and hardware requirements 51</p> <p>3.4.1.1 Key features 52</p> <p>3.4.1.2 Download sites 53</p> <p>3.4.2 Laboratory experiments with the CCS6 53</p> <p>3.4.2.1 Introduction to CCS 55</p> <p>3.4.2.2 Implementation of a DOTP algorithm 63</p> <p>3.4.3 Profiling using the clock 65</p> <p>3.4.4 Considerations when measuring time 67</p> <p>3.5 Loading different applications to different cores 67</p> <p>3.6 Conclusion 72</p> <p>References 72</p> <p><b>4 Numerical issues 74</b></p> <p>4.1 Introduction 74</p> <p>4.2 Fixed- and floating-point representations 75</p> <p>4.2.1 Fixed-point arithmetic 76</p> <p>4.2.1.1 Unsigned integer 76</p> <p>4.2.1.2 Signed integer 77</p> <p>4.2.1.3 Fractional numbers 77</p> <p>4.2.2 Floating-point arithmetic 78</p> <p>4.2.2.1 Special numbers for the 32-bit and 64-bit floating-point formats 81</p> <p>4.3 Dynamic range and accuracy 82</p> <p>4.4 Laboratory exercise 83</p> <p>4.5 Conclusion 85</p> <p>References 85</p> <p><b>5 Software optimisation 86</b></p> <p>5.1 Introduction 86</p> <p>5.2 Hindrance to software scalability for a multicore processor 88</p> <p>5.3 Single-core code optimisation procedure 88</p> <p>5.3.1 The C compiler options 90</p> <p>5.4 Interfacing C with intrinsics, linear assembly and assembly 91</p> <p>5.4.1 Intrinsics 91</p> <p>5.4.2 Interfacing C and assembly 92</p> <p>5.5 Assembly optimisation 97</p> <p>5.5.1 Parallel instructions 98</p> <p>5.5.2 Removing the NOPs 99</p> <p>5.5.3 Loop unrolling 99</p> <p>5.5.4 Double-Word Access 100</p> <p>5.5.5 Optimisation summary 100</p> <p>5.6 Software pipelining 101</p> <p>5.6.1 Software-pipelining procedure 105</p> <p>5.6.1.1 Writing linear assembly code 105</p> 
<p>5.6.1.2 Creating a dependency graph 105</p> <p>5.6.1.3 Resource allocation 108</p> <p>5.6.1.4 Scheduling table 108</p> <p>5.6.1.5 Generating assembly code 109</p> <p>5.7 Linear assembly 111</p> <p>5.7.1 Hand optimisation of the dotp function using linear assembly 112</p> <p>5.8 Avoiding memory banks 118</p> <p>5.9 Optimisation using the tools 118</p> <p>5.10 Laboratory experiments 123</p> <p>5.11 Conclusion 126</p> <p>References 126</p> <p><b>6 The TMS320C66x interrupts 127</b></p> <p>6.1 Introduction 127</p> <p>6.1.1 Chip-level interrupt controller 129</p> <p>6.2 The interrupt controller 135</p> <p>6.3 Laboratory experiment 140</p> <p>6.3.1 Experiment 1: Using the GPIOs to trigger some functions 140</p> <p>6.3.2 Experiment 2: Using the console to trigger an interrupt 140</p> <p>6.4 Conclusion 143</p> <p>References 144</p> <p><b>7 Real-time operating system: TI-RTOS 145</b></p> <p>7.1 Introduction 146</p> <p>7.2 TI-RTOS 146</p> <p>7.3 Real-time scheduling 148</p> <p>7.3.1 Hardware interrupts (Hwis) 148</p> <p>7.3.1.1 Setting an Hwi 149</p> <p>7.3.1.2 Hwi hook functions 149</p> <p>7.3.2 Software interrupts (Swis), including clock, periodic or single-shot functions 155</p> <p>7.3.3 Tasks 155</p> <p>7.3.3.1 Task hook functions 157</p> <p>7.3.4 Idle functions 158</p> <p>7.3.5 Clock functions 158</p> <p>7.3.6 Timer functions 158</p> <p>7.3.7 Synchronisation 158</p> <p>7.3.7.1 Semaphores 159</p> <p>7.3.7.2 Semaphore_pend 159</p> <p>7.3.7.3 Semaphore_post 159</p> <p>7.3.7.4 How to configure the semaphores 159</p> <p>7.3.8 Events 159</p> <p>7.3.9 Summary 163</p> <p>7.4 Dynamic memory management 163</p> <p>7.4.1 Stack allocation 165</p> <p>7.4.2 Heap allocation 165</p> <p>7.4.3 Heap implementation 165</p> <p>7.4.3.1 HeapMin implementation 165</p> <p>7.4.3.2 HeapMem implementation 165</p> <p>7.4.3.3 HeapBuf implementation 167</p> <p>7.4.3.4 HeapMultiBuf implementation 171</p> <p>7.5 Laboratory experiments 172</p> <p>7.5.1 Lab 1: Manual setup of the clock (part 1) 172</p> 
<p>7.5.2 Lab 2: Manual setup of the clock (part 2) 172</p> <p>7.5.3 Lab 3: Using Hwis, Swis, tasks and clocks 174</p> <p>7.5.4 Lab 4: Using events 187</p> <p>7.5.5 Lab 5: Using the heaps 189</p> <p>7.6 Conclusion 190</p> <p>References 191</p> <p>References (further reading) 191</p> <p><b>8 Enhanced Direct Memory Access (EDMA3) controller 192</b></p> <p>8.1 Introduction 192</p> <p>8.2 Type of DMAs available 193</p> <p>8.3 EDMA controllers architecture 194</p> <p>8.3.1 The EDMA3 Channel Controller (EDMA3CC) 194</p> <p>8.3.2 The EDMA3 transfer controller (EDMA3TC) 201</p> <p>8.3.3 EDMA prioritisation 201</p> <p>8.3.3.1 Trigger source priority 202</p> <p>8.3.3.2 Channel priority 203</p> <p>8.3.3.3 Dequeue priority 203</p> <p>8.3.3.4 System (transfer controller) priority 203</p> <p>8.4 Parameter RAM (PaRAM) 203</p> <p>8.4.1 Channel options parameter (OPT) 203</p> <p>8.5 Transfer synchronisation dimensions 203</p> <p>8.5.1 A – Synchronisation 204</p> <p>8.5.2 AB – Synchronisation 204</p> <p>8.6 Simple EDMA transfer 204</p> <p>8.7 Chaining EDMA transfers 208</p> <p>8.8 Linked EDMAs 208</p> <p>8.9 Laboratory experiments 210</p> <p>8.9.1 Laboratory 1: Simple EDMA transfer 211</p> <p>8.9.2 Laboratory 2: EDMA chaining transfer 211</p> <p>8.9.3 Laboratory 3: EDMA link transfer 213</p> <p>8.10 Conclusion 213</p> <p>References 213</p> <p><b>9 Inter-Processor Communication (IPC) 214</b></p> <p>9.1 Introduction 215</p> <p>9.2 Texas Instruments IPC 217</p> <p>9.3 Notify module 219</p> <p>9.3.1 Laboratory experiment 222</p> <p>9.4 MessageQ 222</p> <p>9.4.1 MessageQ protocol 224</p> <p>9.4.2 Message priority 229</p> <p>9.4.3 Thread synchronisation 229</p> <p>9.5 ListMP module 233</p> <p>9.6 GateMP module 234</p> <p>9.6.1 Initialising a GateMP parameter structure 234</p> <p>9.6.1.1 Types of gate protection 235</p> <p>9.6.2 Creating a GateMP instance 236</p> <p>9.6.3 Entering a GateMP 236</p> <p>9.6.4 Leaving a gate 236</p> <p>9.6.5 The list of functions that can be used by GateMP 
237</p> <p>9.7 Multi-processor Memory Allocation: HeapBufMP, HeapMemMP and HeapMultiBufMP 237</p> <p>9.7.1 HeapBuf_Params 238</p> <p>9.7.2 HeapMem_Params 239</p> <p>9.7.3 HeapMultiBuf_Params 239</p> <p>9.7.4 Configuration example for HeapMultiBuf 239</p> <p>9.8 Transport mechanisms for the IPC 241</p> <p>9.9 Laboratory experiments with KeyStone I 241</p> <p>9.9.1 Laboratory 1: Using MessageQ with multiple cores 241</p> <p>9.9.1.1 Overview 242</p> <p>9.9.2 Laboratory 2: Using ListMP, SharedRegion and GateMP 243</p> <p>9.10 Laboratory experiments with KeyStone II 249</p> <p>9.10.1 Laboratory experiment 1: Transferring a block of data 249</p> <p>9.10.1.1 Set the connection between the host (PC) and the KeyStone 249</p> <p>9.10.1.2 Explore the ARM code 250</p> <p>9.10.1.3 Explore the DSP code 259</p> <p>9.10.1.4 Compile and run the program 263</p> <p>9.10.2 Laboratory experiment 2: Transferring a pointer 267</p> <p>9.10.2.1 Explore the ARM code 267</p> <p>9.10.2.2 Explore the DSP code 271</p> <p>9.10.2.3 Compile and run the program 278</p> <p>9.11 Conclusion 278</p> <p>References 278</p> <p><b>10 Single and multicore debugging 280</b></p> <p>10.1 Introduction 281</p> <p>10.2 Software and hardware debugging 282</p> <p>10.3 Debug architecture 282</p> <p>10.3.1 Trace 282</p> <p>10.3.1.1 Standard trace 282</p> <p>10.3.1.2 Event trace 283</p> <p>10.3.1.3 System trace 285</p> <p>10.4 Advanced Event Triggering 286</p> <p>10.4.1 Advanced Event Triggering logic 289</p> <p>10.4.2 Unified Breakpoint Manager 294</p> <p>10.5 Unified Instrumentation Architecture 295</p> <p>10.5.1 Host-side tooling 295</p> <p>10.5.2 Target-side tooling 295</p> <p>10.5.2.1 Software instrumentation APIs 297</p> <p>10.5.2.2 Predefined software events and metadata 297</p> <p>10.5.2.3 Event loggers 297</p> <p>10.5.2.4 Transports 297</p> <p>10.5.2.5 SYS/BIOS event capture and transport 297</p> <p>10.5.2.6 Multicore support 297</p> <p>10.6 Debugging with the System Analyzer tools 298</p> <p>10.6.1 
Target-side coding with UIA APIs and the XDCtools 299</p> <p>10.6.2 Logging events with Log_write() functions 300</p> <p>10.6.3 Advanced debugging using the diagnostic feature 301</p> <p>10.6.4 LogSnapshot APIs for logging state information 302</p> <p>10.7 Instrumentation with TI-RTOS and CCS 302</p> <p>10.7.1 Using RTOS Object Viewer 302</p> <p>10.7.2 Using the RTOS Analyzer and the System Analyzer 303</p> <p>10.7.2.1 RTOS Analyzer 303</p> <p>10.7.2.2 System Analyzer 303</p> <p>10.8 Laboratory sessions 305</p> <p>10.8.1 Laboratory experiment 1: Using the RTOS ROV 305</p> <p>10.8.2 Laboratory experiment 2: Using the RTOS Analyzer 305</p> <p>10.8.3 Laboratory experiment 3: Using the System Analyzer 312</p> <p>10.8.4 Laboratory experiment 4: Using diagnosis features 314</p> <p>10.8.5 Laboratory experiment 5: Using a diagnostic feature with filtering 317</p> <p>10.9 Conclusion 321</p> <p>References 322</p> <p>Further reading 323</p> <p><b>11 Bootloader for KeyStone I and KeyStone II 324</b></p> <p>11.1 Introduction 324</p> <p>11.2 How to start the boot process 325</p> <p>11.3 The boot process 325</p> <p>11.4 ROM Bootloader (RBL) 328</p> <p>11.4.1 The boot configuration format 336</p> <p>11.4.1.1 Creating the boot parameter table 336</p> <p>11.4.1.2 Creating the boot table 338</p> <p>11.4.1.3 The boot configuration table 338</p> <p>11.5 Boot process 340</p> <p>11.5.1 Initialisation stage for the KeyStone I 340</p> <p>11.5.2 Second-level bootloader 341</p> <p>11.5.2.1 Intermediate bootloader 341</p> <p>11.5.2.2 How to use the IBL 342</p> <p>11.6 Laboratory experiment 1 345</p> <p>11.6.1 Initialisation stage for the KeyStone II 350</p> <p>11.6.1.1 Bootloader initialisation after power-on reset 350</p> <p>11.6.1.2 Bootloader initialisation process after hard or soft reset 350</p> <p>11.6.2 Second bootloader for the KeyStone II 350</p> <p>11.6.2.1 U-Boot 351</p> <p>11.7 Laboratory experiment 2 352</p> <p>11.7.1 Printing the U-Boot environment 360</p> <p>11.7.2 Using the 
help for U-Boot 362</p> <p>11.8 TFTP boot with a host-mounted Network File System (NFS) server – NFS booting 363</p> <p>11.8.1 Laboratory experiment 3 364</p> <p>11.9 Conclusion 372</p> <p>References 372</p> <p><b>12 Introduction to OpenMP 374</b></p> <p>12.1 Introduction to OpenMP 375</p> <p>12.2 Directive formats 376</p> <p>12.3 Forking region 377</p> <p>12.3.1 omp parallel – parallel region construct 377</p> <p>12.3.1.1 Clause descriptions 378</p> <p>12.4 Work-sharing constructs 382</p> <p>12.4.1 omp for 382</p> <p>12.4.1.1 OpenMP loop scheduling 383</p> <p>12.4.2 omp sections 385</p> <p>12.4.3 omp single 386</p> <p>12.4.4 omp master 386</p> <p>12.4.5 omp task 387</p> <p>12.5 Environment variables and library functions 390</p> <p>12.6 Synchronisation constructs 392</p> <p>12.6.1 atomic 393</p> <p>12.6.1.1 Clauses 393</p> <p>12.6.2 barrier 395</p> <p>12.6.3 critical 396</p> <p>12.7 OpenMP accelerator model 397</p> <p>12.7.1 Supported OpenMP device constructs 397</p> <p>12.7.1.1 #pragma omp target 397</p> <p>12.7.1.2 #pragma omp target data 399</p> <p>12.7.1.3 #pragma omp target update 400</p> <p>12.7.1.4 #pragma omp declare target 401</p> <p>12.8 Laboratory experiments 402</p> <p>12.8.1 Laboratory experiment 1 402</p> <p>12.8.2 Laboratory experiment 2 402</p> <p>12.8.3 Laboratory experiment 3 404</p> <p>12.8.4 Laboratory experiment 4 405</p> <p>12.8.5 Laboratory experiment 5 405</p> <p>12.9 Conclusion 417</p> <p>References 419</p> <p><b>13 Introduction to OpenCL for the KeyStone II 420</b></p> <p>13.1 Introduction 421</p> <p>13.2 Operation of OpenCL 421</p> <p>13.3 Command queue 424</p> <p>13.3.1 Creating a command queue 427</p> <p>13.3.1.1 Command-queue properties 429</p> <p>13.3.2 Enqueueing a kernel 430</p> <p>13.4 Kernel declaration 431</p> <p>13.5 How do the kernels access data? 
431</p> <p>13.6 OpenCL memory model for the KeyStone 432</p> <p>13.6.1 Creating a buffer 433</p> <p>13.6.1.1 Cl_mem_flags 434</p> <p>13.7 Synchronisation 435</p> <p>13.7.1 Event with a callback function 436</p> <p>13.7.2 User event 439</p> <p>13.7.3 Waiting for one command or all commands to finish 439</p> <p>13.7.4 wait_group_events 440</p> <p>13.7.5 Barrier 440</p> <p>13.8 Basic debugging profiling 440</p> <p>13.9 OpenMP dispatch from OpenCL 443</p> <p>13.9.1 OpenMP for the kernel code 443</p> <p>13.9.2 OpenMP for the ARM code 443</p> <p>13.10 Building the OpenCL project 444</p> <p>13.11 Laboratory experiments 445</p> <p>13.11.1 Laboratory experiment 1: Hello World 446</p> <p>13.11.2 Laboratory experiment 2: dotp functions 454</p> <p>13.11.2.1 Explore the main.cpp function 454</p> <p>13.11.2.2 Explore the kernel dotp.cl 459</p> <p>13.11.2.3 Run the dotp program 460</p> <p>13.11.3 Laboratory experiment 3: USE_HOST_PTR 460</p> <p>13.11.4 Laboratory experiment 4: ALLOC_HOST_PTR 463</p> <p>13.11.5 Laboratory experiment 5: COPY_HOST_PTR 465</p> <p>13.11.6 Laboratory experiment 6: Synchronisation 467</p> <p>13.11.7 Laboratory experiment 7: Local buffer 473</p> <p>13.11.8 Laboratory experiment 8: Barrier 477</p> <p>13.11.9 Laboratory experiment 9: Profiling 479</p> <p>13.11.10 Laboratory experiment 10: OpenMP in kernel 484</p> <p>13.11.11 Laboratory experiment 11: OpenMP in ARM 487</p> <p>13.12 Conclusion 489</p> <p>References 490</p> <p><b>14 Multicore Navigator 491</b></p> <p>14.1 Introduction 491</p> <p>14.2 Navigator architecture 492</p> <p>14.2.1 The PKDMA 494</p> <p>14.2.1.1 PKDMA transmit side 495</p> <p>14.2.1.2 PKDMA receive side 495</p> <p>14.2.1.3 Infrastructure PKDMA 497</p> <p>14.2.2 Descriptors 497</p> <p>14.2.2.1 Host packet descriptors 498</p> <p>14.2.2.2 Monolithic packet descriptor 498</p> <p>14.2.2.3 Setting up the memory regions for the descriptors 498</p> <p>14.2.3 Queue Manager Subsystem 500</p> <p>14.2.4 Queue Manager 503</p> <p>14.2.4.1 Queue 
peek registers 503</p> <p>14.2.4.2 Link RAM 504</p> <p>14.2.5 Accumulator packet data structure processors 504</p> <p>14.2.5.1 Accumulation 506</p> <p>14.2.5.2 Quality of service 506</p> <p>14.2.5.3 Event management (resource sharing and job load balancing) 506</p> <p>14.2.6 Interrupt distributor module 506</p> <p>14.3 Complete functionality of the Navigator 506</p> <p>14.4 Laboratory experiment 511</p> <p>14.5 Conclusion 513</p> <p>References 514</p> <p><b>15 FIR filter implementation 515</b></p> <p>15.1 Introduction 515</p> <p>15.2 Properties of an FIR filter 516</p> <p>15.2.1 Filter coefficients 516</p> <p>15.2.2 Frequency response of an FIR filter 516</p> <p>15.2.3 Phase linearity of an FIR filter 517</p> <p>15.3 Design procedure 518</p> <p>15.3.1 Specifications 518</p> <p>15.3.2 Coefficients calculation 519</p> <p>15.3.2.1 Window method 519</p> <p>15.3.3 Realisation structure 522</p> <p>15.3.3.1 Direct structure 525</p> <p>15.3.3.2 Linear phase structures 525</p> <p>15.3.3.3 Cascade structures 527</p> <p>15.4 Laboratory experiments 528</p> <p>15.4.1 Filter implementation 529</p> <p>15.4.2 Synchronisation 530</p> <p>15.4.3 Building and running the DSP project 532</p> <p>15.4.4 Building and running the PC project 534</p> <p>15.5 Conclusion 540</p> <p>References 540</p> <p><b>16 IIR filter implementation 542</b></p> <p>16.1 Introduction 542</p> <p>16.2 Design procedure 543</p> <p>16.3 Coefficients calculation 543</p> <p>16.3.1 Pole–zero placement approach 543</p> <p>16.3.2 Analogue-to-digital filter design 543</p> <p>16.3.3 Bilinear transform (BZT) method 544</p> <p>16.3.3.1 Practical example of the bilinear transform method 547</p> <p>16.3.3.2 Coefficients calculation 547</p> <p>16.3.3.3 Realisation structures 548</p> <p>16.3.4 Impulse invariant method 552</p> <p>16.3.4.1 Practical example of the impulse invariant method 553</p> <p>16.4 IIR filter implementation 556</p> <p>16.5 Laboratory experiment 561</p> <p>16.6 Conclusion 561</p> <p>Reference 562</p> 
<p><b>17 Adaptive filter implementation 563</b></p> <p>17.1 Introduction 563</p> <p>17.2 Mean square error 564</p> <p>17.3 Least mean square 565</p> <p>17.4 Implementation of an adaptive filter using the LMS algorithm 565</p> <p>17.5 Implementation using linear assembly 567</p> <p>17.6 Implementation in C language with compiler switches 572</p> <p>17.7 Laboratory experiment 572</p> <p>17.8 Conclusion 573</p> <p>References 573</p> <p><b>18 FFT implementation 574</b></p> <p>18.1 Introduction 574</p> <p>18.2 FFT algorithm 574</p> <p>18.2.1 Fourier series 574</p> <p>18.2.2 Fourier transform 575</p> <p>18.2.3 Discrete Fourier transform 575</p> <p>18.2.4 Fast Fourier transform 576</p> <p>18.2.4.1 Splitting the DFT into two DFTs 576</p> <p>18.2.4.2 Exploiting the periodicity and symmetry of the twiddle factors 577</p> <p>18.3 FFT implementation 579</p> <p>18.4 Laboratory experiment 582</p> <p>18.4.1 Part 1: Implementation of DIF FFT 582</p> <p>18.4.2 Part 2: Using ping-pong EDMA 585</p> <p>18.5 Conclusion 590</p> <p>References 590</p> <p><b>19 Hough transform 591</b></p> <p>19.1 Introduction 591</p> <p>19.2 Theory 591</p> <p>19.3 Limits of r and θ 593</p> <p>19.4 Hough transform implementation 595</p> <p>19.5 Laboratory experiment 596</p> <p>19.6 Conclusion 603</p> <p>References 603</p> <p><b>20 Stereo vision implementation 604</b></p> <p>20.1 Introduction 604</p> <p>20.2 Algorithm for performing depth calculation 605</p> <p>20.3 Cost functions 606</p> <p>20.4 Implementation 607</p> <p>20.4.1 Laboratory experiment 610</p> <p>20.4.1.1 SAD implementation 610</p> <p>20.4.1.2 NCC implementation 611</p> <p>20.4.1.3 ZNCC implementation 611</p> <p>20.5 Conclusion 613</p> <p>References 616</p> <p>Index 617</p>