
Table of Contents

Title Page

Copyright

Preface

Acknowledgments

Contributors

Chapter 1: Power Allocation and Task Scheduling on Multiprocessor Computers with Energy and Time Constraints

1.1 Introduction

1.2 Preliminaries

1.3 Problem Analysis

1.4 Pre-Power-Determination Algorithms

1.5 Post-Power-Determination Algorithms

1.6 Summary and Further Research

Acknowledgment

References

Chapter 2: Power-Aware High Performance Computing

2.1 Introduction

2.2 Background

2.3 Related Work

2.4 PowerPack: Fine-Grain Energy Profiling of HPC Applications

2.5 Power-Aware Speedup Model

2.6 Model Usages

2.7 Conclusion

References

Chapter 3: Energy Efficiency in HPC Systems

3.1 Introduction

3.2 Background and Related Work

3.3 Proactive, Component-Based Power Management

3.4 Quantifying Energy Saving Possibilities

3.5 Evaluation of the Proposed Strategies

3.6 Results

3.7 Concluding Remarks

3.8 Summary

Acknowledgments

References

Chapter 4: A Stochastic Framework for Hierarchical System-Level Power Management

4.1 Introduction

4.2 Related Work

4.3 A Hierarchical DPM Architecture

4.4 Modeling

4.5 Policy Optimization

4.6 Experimental Results

4.7 Conclusion

References

Chapter 5: Energy-Efficient Reservation Infrastructure for Grids, Clouds, and Networks

5.1 Introduction

5.2 Related Works

5.3 Eridis: Energy-Efficient Reservation Infrastructure for Large-Scale Distributed Systems

5.4 Eari: Energy-Aware Reservation Infrastructure for Data Centers and Grids

5.5 GOC: Green Open Cloud

5.6 Hermes: High Level Energy-Aware Model for Bandwidth Reservation in End-To-End Networks

5.7 Summary

References

Chapter 6: Energy-Efficient Job Placement on Clusters, Grids, and Clouds

6.1 Problem and Motivation

6.2 Energy-Aware Infrastructures

6.3 Current Resource Management Practices

6.4 Scientific and Technical Challenges

6.5 Energy-Aware Job Placement Algorithms

6.6 Discussion

6.7 Conclusion

References

Chapter 8: Toward Energy-Aware Scheduling Using Machine Learning

8.1 Introduction

8.2 Intelligent Self-Management

8.3 Introducing Power-Aware Approaches

8.4 Experiences of Applying ML on Power-Aware Self-Management

8.5 Conclusions on Intelligent Power-Aware Self-Management

References

Chapter 9: Energy Efficiency Metrics for Data Centers

9.1 Introduction

9.2 Fundamentals of Metrics

9.3 Data Center Energy Efficiency

9.4 Available Metrics

9.5 Harmonizing Global Metrics for Data Center Energy Efficiency

References

Chapter 10: Autonomic Green Computing in Large-Scale Data Centers

10.1 Introduction

10.2 Related Technologies and Techniques

10.3 Autonomic Green Computing: A Case Study

10.4 Conclusion and Future Directions

Acknowledgment

References

Chapter 11: Energy and Thermal Aware Scheduling in Data Centers

11.1 Introduction

11.2 Related Work

11.3 Intermachine Scheduling

11.4 Intramachine Scheduling

11.5 Evaluation

11.6 Conclusion

References

Chapter 12: QoS-Aware Power Management in Data Centers

12.1 Introduction

12.2 Problem Classification

12.3 Energy Efficiency

12.4 Power Capping

12.5 Conclusion

References

Chapter 13: Energy-Efficient Storage Systems for Data Centers

13.1 Introduction

13.2 Disk Drive Operation and Disk Power

13.3 Disk and Storage Power Reduction Techniques

13.4 Using Nonvolatile Memory and Solid-State Disks

13.5 Conclusions

References

Chapter 14: Autonomic Energy/Performance Optimizations for Memory in Servers

14.1 Introduction

14.2 Classifications of Dynamic Power Management Techniques

14.3 Applications of Dynamic Power Management (DPM)

14.4 Autonomic Power and Performance Optimization of Memory Subsystems in Server Platforms

14.5 Conclusion

References

Chapter 15: ROD: A Practical Approach to Improving Reliability of Energy-Efficient Parallel Disk Systems

15.1 Introduction

15.2 Modeling Reliability of Energy-Efficient Parallel Disks

15.3 Improving Reliability of MAID Via Disk Swapping

15.4 Experimental Results and Evaluation

15.5 Related Work

15.6 Conclusions

Acknowledgment

References

Chapter 16: Embracing the Memory and I/O Walls for Energy-Efficient Scientific Computing

16.1 Introduction

16.2 Background and Related Work

16.3 β-Adaptation: A New DVFS Algorithm

16.4 Algorithm Effectiveness

16.5 Conclusions and Future Work

Acknowledgments

References

Chapter 17: Multiple Frequency Selection in DVFS-Enabled Processors to Minimize Energy Consumption

17.1 Introduction

17.2 Energy Efficiency in HPC Systems

17.3 Exploitation of Dynamic Voltage–Frequency Scaling

17.4 Preliminaries

17.5 Energy-Aware Scheduling Via DVFS

17.6 Experimental Results

17.7 Conclusion

Acknowledgment

References

Chapter 18: The Paramountcy of Reconfigurable Computing

18.1 Introduction

18.2 Why Computers are Important

18.3 Performance Progress Stalled

18.4 The Tail is Wagging the Dog (Accelerators)

18.5 Reconfigurable Computing

18.6 Conclusions

List of Abbreviations

References

Chapter 19: Workload Clustering for Increasing Energy Savings on Embedded MPSoCs

19.1 Introduction

19.2 Embedded MPSoC Architecture, Execution Model, and Related Work

19.3 Our Approach

19.4 Experimental Evaluation

19.5 Conclusions

References

Chapter 20: Energy-Efficient Internet Infrastructure

20.1 Introduction

20.2 SRAM-Based Pipelined IP Lookup Architectures: Alternative to TCAMs

20.3 Data Structure Optimization for Power Efficiency

20.4 Architectural Optimization to Reduce Dynamic Power Dissipation

20.5 Related Work

20.6 Summary

Acknowledgment

References

Chapter 21: Demand Response in the Smart Grid: A Distributed Computing Perspective

21.1 Introduction

21.2 Demand Response

21.3 Demand Response as a Distributed System

21.4 Summary

References

Chapter 22: Resource Management for Distributed Mobile Computing

22.1 Introduction

22.2 Single-Hop Energy-Constrained Environment

22.3 Multihop Distributed Mobile Computing Environment

22.4 Future Work

Acknowledgment

References

Chapter 23: An Energy-Aware Framework for Mobile Data Mining

23.1 Introduction

23.2 System Architecture

23.3 Mobile Device Components

23.4 Energy Model

23.5 Clustering Scheme

23.6 Conclusion

References

Chapter 24: Energy Awareness and Efficiency in Wireless Sensor Networks: from Physical Devices to the Communication Link

24.1 Introduction

24.2 WSN and Power Dissipation Models

24.3 Strategies for Energy Optimization

24.4 Final Remarks

Acknowledgments

References

Chapter 25: Network-Wide Strategies for Energy Efficiency in Wireless Sensor Networks

25.1 Introduction

25.2 Data Link Layer

25.3 Network Layer

25.4 Transport Layer

25.5 Application Layer

25.6 Final Remarks

Acknowledgments

References

Chapter 26: Energy Management in Heterogeneous Wireless Health Care Networks

26.1 Introduction

26.2 System Model

26.3 Collaborative Distributed Environmental Sensing

26.4 Task Assignment in a Body Area Network

26.5 Results

26.6 Conclusion

References

Index

Preface

The scope of energy-efficient computing is not limited to main computing components (e.g., processors, storage devices, and visualization facilities); it extends to a much wider range of resources associated with computing facilities, including auxiliary equipment, water used for cooling, and even the physical and floor space that these resources occupy. Energy consumption in computing facilities raises various monetary, environmental, and system performance concerns.

Recent advances in hardware technologies have alleviated the energy consumption problem to a certain degree. However, it remains a serious concern for energy-efficient computing because the amount of energy consumed by computing and auxiliary hardware resources depends substantially on their usage patterns. In other words, underutilized or overloaded resources consume more energy than efficiently utilized resources. This calls for the development of software energy-saving techniques and new algorithms that are more energy efficient.

This book, Energy-Efficient Distributed Computing Systems, seeks to provide an opportunity for researchers to explore different energy consumption issues and their impact on the design of new computing systems. The book is quite timely, since the field of distributed computing as a whole is undergoing many changes, and a vast literature exists today on energy consumption paradigms and frameworks and their implications for a wide range of distributed platforms.

The book is intended to be a virtual roundtable of several outstanding researchers, whom one might invite to attend a conference on energy-efficient computing systems. Of course, the list of topics explored here is by no means exhaustive, but most of the conclusions can be extended to other computing platforms not covered in this volume. We decided to limit the number of chapters while providing more pages for contributing authors to express their ideas, so that the book remains manageable within a single volume.

We also hope that the topics covered in this book will prompt readers to consider the implications of these new ideas for developments in their own fields. The book endeavors to strike a balance between theoretical and practical coverage of innovative problem-solving techniques for a range of distributed platforms. It is intended to be a repository of paradigms, technologies, and applications that target the different facets of energy consumption in computing systems.

The 26 chapters were carefully selected to provide wide scope with minimal overlap between chapters. Each contributor was asked to cover review material as well as current developments in his or her chapter. In addition, the authors were chosen because they are leaders in their respective disciplines.

Albert Y. Zomaya

Young Choon Lee

Acknowledgments

First and foremost, we would like to thank and acknowledge the contributors to this volume for their support and patience, and the reviewers for their useful comments and suggestions that helped improve the earlier outline of the book and the presentation of the material. We also extend our deepest thanks to Simone Taylor and Diana Gialo from Wiley (USA) for their collaboration, guidance, and, most importantly, patience in finalizing this handbook. Finally, we would like to acknowledge the team from Wiley's production department for their extensive efforts during the many phases of this project and the timely manner in which the book was produced.

Albert Y. Zomaya

Young Choon Lee

Contributors

Aghera, Priti, University of California, San Diego, CA, USA.
Al-Nashif, Youssif, NSF Center for Autonomic Computing, The University of Arizona, USA.
Ayoub, Raid, University of California, San Diego, CA, USA.
Berral, Josep Ll., Computer Architecture Dept. and Department of Software, UPC-Barcelona Tech., Catalonia, Spain.
Bilal, Kashif, Department of Computer Science, North Dakota State University, Fargo, ND, USA.
Boloori, Ali Javadzadeh, Centre for Distributed and High Performance Computing, School of Information Technologies, University of Sydney, NSW, Australia.
Borgetto, Damien, University Paul Sabatier, Toulouse, France.
Bouvry, Pascal, Faculty of Sciences, Technology, and Communications, University of Luxembourg, Luxembourg.
Cameron, Kirk W., Virginia Tech, VA, USA.
Casanova, Henri, University of Hawai'i at Manoa, Hawai'i, USA.
Comito, Carmela, DEIS, University of Calabria, Rende (CS), Italy.
Da Costa, Georges, University Paul Sabatier, Toulouse, France.
Delicato, Flavia C., Computer Science Department, Federal University of Rio de Janeiro—RN, Brazil.
Dhiman, Gaurav, University of California, San Diego, CA, USA.
Feng, Wu-chun, Virginia Tech, Blacksburg, Virginia, USA.
Fito, Josep O., Computer Architecture Dept. and Barcelona Supercomputing Center, UPC-Barcelona Tech., Catalonia, Spain.
Gavalda, Ricard, Department of Software, UPC-Barcelona Tech., Catalonia, Spain.
Ge, Rong, The Department of Mathematics, Statistics, and Computer Science, Marquette University, WI, USA.
Ghani, Nasir, Department of Electrical and Computer Engineering, University of New Mexico, Albuquerque, NM, USA.
Goiri, Inigo, Computer Architecture Dept. and Barcelona Supercomputing Center, UPC-Barcelona Tech., Catalonia, Spain.
Gong, Jiayu, Department of Electrical and Computer Engineering, Wayne State University, MI, USA.
De Groot, Martin, CSIRO ICT Center, Epping, NSW, Australia.
Guitart, Jordi, Computer Architecture Dept. and Barcelona Supercomputing Center, UPC-Barcelona Tech., Catalonia, Spain.
Gurumurthi, Sudhanva, Dept. of Computer Science, University of Virginia, Charlottesville, VA, USA.
Hariri, Salim, NSF Center for Autonomic Computing, The University of Arizona, USA.
Hartenstein, Reiner, Department of Computer Science, Kaiserslautern University of Technology, Kaiserslautern, Germany.
Hsu, Chung-Hsing, Oak Ridge National Laboratory, Oak Ridge, TN, USA.
Jiang, Weirong, Juniper Networks, Inc., Sunnyvale, CA, USA.
Julia, Ferran, Computer Architecture Dept., UPC-Barcelona Tech., Catalonia, Spain.
Kandemir, Mahmut, Pennsylvania State University, PA, USA.
Khan, Samee Ullah, Department of Electrical and Computer Engineering, North Dakota State University, Fargo, ND, USA.
Khargharia, Bithika, Cisco Systems, Inc., Durham, NC, USA.
Kim, Jong-Kook, School of Electrical Engineering, Korea University, Korea.
Lee, Young Choon, Centre for Distributed and High Performance Computing, School of Information Technologies, University of Sydney, NSW, Australia.
Lefevre, Laurent, INRIA, Ecole Normale Superieure de Lyon, University of Lyon, France.
Leingang, James, Department of Electrical and Computer Engineering, North Dakota State University, Fargo, ND, USA.
Li, Juan, Department of Computer Science, North Dakota State University, Fargo, ND, USA.
Li, Keqin, State University of New York, New Paltz, NY, USA.
Lindberg, Peder, Department of Electrical and Computer Engineering, North Dakota State University, Fargo, ND, USA.
Luo, Haoting, NSF Center for Autonomic Computing, The University of Arizona, AZ, USA.
Lysaker, Daniel, Department of Electrical and Computer Engineering, North Dakota State University, Fargo, ND, USA.
Manzanares, Adam, Los Alamos National Laboratory, Los Alamos, NM, USA.
Min-Allah, Nasro, Department of Computer Science, COMSATS Institute of Information Technology, Pakistan.
Narayanan, Sri Hari Krishna, Argonne National Laboratory, IL, USA.
Nikzad, Nima, University of California, San Diego, CA, USA.
Nou, Ramon, Computer Architecture Dept. and Barcelona Supercomputing Center, UPC-Barcelona Tech., Catalonia, Spain.
Orgerie, Anne-Cecile, Ecole Normale Superieure de Lyon, Lyon, France.
Ozturk, Ozcan, Bilkent University, Turkey.
Parashar, Manish, NSF Cloud and Autonomic Computing Center and Rutgers Discovery Informatics Institute, Rutgers University, NJ, USA.
Pedram, Massoud, University of Southern California, Los Angeles, CA, USA.
Pierson, Jean-Marc, University Paul Sabatier, Toulouse, France.
Pires, Paulo F., Computer Science Department, Federal University of Rio de Janeiro - RN, Brazil.
Prasanna, Viktor K., University of Southern California, Los Angeles, CA, USA.
Qin, Xiao, Auburn University, Auburn, AL, USA.
Rizvandi, Nikzad Babaii, Centre for Distributed and High Performance Computing, School of Information Technologies, University of Sydney, NSW, Australia.
Rodero, Ivan, NSF Cloud and Autonomic Computing Center and Rutgers Discovery Informatics Institute, Rutgers University, NJ, USA.
Rong, Peng, Brocade Communications Systems, San Jose, CA, USA.
Rosing, Tajana Simunic, University of California, San Diego, CA, USA.
Ruan, Xiaojun, Auburn University, Auburn, AL, USA.
Sivasubramaniam, Anand, Dept. of Computer Science and Engineering, The Pennsylvania State University, PA, USA.
Taheri, Javid, Centre for Distributed and High Performance Computing, School of Information Technologies, University of Sydney, NSW, Australia.
Talia, Domenico, DEIS, University of Calabria, Rende (CS), Italy.
Torres, Jordi, Computer Architecture Dept. and Barcelona Supercomputing Center, UPC-Barcelona Tech., Catalonia, Spain.
Trunfio, Paolo, DEIS, University of Calabria, Rende (CS), Italy.
Wang, Chen, CSIRO ICT Center, Epping, NSW, Australia.
Xu, Cheng-Zhong, Department of Electrical and Computer Engineering, Wayne State University, MI, USA.
Yin, Shu, Auburn University, Auburn, AL, USA.
Yousif, Mazin, T-Systems International, Inc., Portland, OR, USA.
Zappi, Piero, University of California, San Diego, CA, USA.
Zomaya, Albert Y., Centre for Distributed and High Performance Computing, School of Information Technologies, University of Sydney, NSW, Australia.

Chapter 1

Power Allocation and Task Scheduling on Multiprocessor Computers with Energy and Time Constraints

KEQIN LI

1.1 Introduction

1.1.1 Energy Consumption

Performance-driven computer development has lasted for over six decades. Computers have been developed to achieve higher performance. As of June 2010, three supercomputers have achieved petaflops speed: Cray Jaguar (224,162 processors, 1.759 petaflops), Dawning Nebulae (120,640 processors, 1.271 petaflops), and IBM Roadrunner (122,400 processors, 1.042 petaflops) [1]. According to Moore's law of computing hardware, the following quantities increase (decrease) exponentially, doubling (halving) approximately every 2 years: the number of transistors per integrated circuit (cost per transistor), processing speed, memory/storage capacity (cost per unit of information), and network capacity [2].

While performance/cost has increased dramatically, power consumption in computer systems has also increased according to Moore's law. To achieve higher computing performance per processor, microprocessor manufacturers have doubled power density repeatedly over the decades, and it will soon reach that of a nuclear reactor [3]. Such increased energy consumption causes severe economic, ecological, and technical problems.

It is clear that there are compelling economic, environmental, and technical reasons for emphasis on energy efficiency.

1.1.2 Power Reduction

Power conservation is critical in many computation and communication environments and has attracted extensive research activity. For high performance supercomputers, energy-aware design has a significant impact on system performance. Note that performance per rack equals performance per watt times watts per rack, where watts per rack is determined by thermal cooling capabilities and can be considered a constant on the order of 20 kW for an air-cooled rack. Therefore, it is the performance per watt term that determines rack performance. It has been found that, in terms of performance per watt, the low frequency, low power embedded IBM PowerPC consistently outperforms high frequency, high power microprocessors by a factor of 2–10. This is one of the main reasons why IBM chose a low power design for the Blue Gene/L supercomputer, which was developed around a processor with moderate frequency. In mobile computing and communication environments, efficient processor power management increases the lifetime of battery-operated devices such as hand-held mobile computers and portable embedded systems. Energy efficiency is a major design constraint in these portable devices, since battery technology has not developed at the same pace as the semiconductor industry.

Reducing processor energy consumption has been an important and pressing research issue in recent years, and there has been increasing interest in developing high performance, energy-efficient computing systems. A large body of literature exists on power-aware computing and communication; the reader is referred to References [3, 9–11] for comprehensive surveys.

There are two approaches to reducing power consumption in computing systems. The first approach is the method of thermal-aware hardware design, which can be carried out at various levels, including device level power reduction, circuit and logic level techniques, and architecture level power reduction (low power processor architecture adaptations, low power memories and memory hierarchies, and low power interconnects). Low power consumption and high system reliability, availability, and usability are main concerns of modern high performance computing system development. In addition to the traditional performance measure using FLOPS, the Green500 list uses FLOPS per watt to rank the performance of computing systems, so that the awareness of other performance metrics such as energy efficiency and system reliability can be raised [12]. All the current systems that can achieve at least 400 MFLOPS/W are clusters of low power processors, aiming to achieve high performance/power and performance/space. For instance, the Dawning Nebulae, currently the world's second fastest computer, which achieves peak performance of 2.984 PFLOPS, is also the fourth most energy-efficient supercomputer in the world with an operational rate of 492.64 MFLOPS/W [12]. Intel's Tera-scale research project has developed the world's first programmable processor that delivers supercomputer-like performance from a single 80-core chip, which uses less electricity than most of today's home appliances and achieves over 16.29 GFLOPS/W [13].

The second approach to reducing energy consumption in computing systems is the method of power-aware software design at various levels, including operating system level power management, compiler level power management, application level power management, and cross-layer (from transistors to applications) adaptations. The power reduction technique discussed in this chapter belongs to the operating system level, which we elaborate in the next section.

1.1.3 Dynamic Power Management

Software techniques for power reduction are supported by a mechanism called dynamic voltage scaling (equivalently, dynamic frequency scaling, dynamic speed scaling, and dynamic power scaling). Many modern components allow voltage regulation to be controlled through software, for example, the BIOS or applications such as PowerStrip. It is usually possible to control the voltages supplied to the CPUs, main memories, local buses, and expansion cards [14]. Processor power consumption is proportional to frequency and the square of supply voltage. A power-aware algorithm can change supply voltage and frequency at appropriate times to optimize a combined consideration of performance and energy consumption. There are many existing technologies and commercial processors that support dynamic voltage (frequency, speed, power) scaling. SpeedStep is a series of dynamic frequency scaling technologies built into some Intel microprocessors that allow the clock speed of a processor to be dynamically changed by software [15]. LongHaul is a technology developed by VIA Technologies, which supports dynamic frequency scaling and dynamic voltage scaling. By executing specialized operating system instructions, a processor driver can exercise fine control on the bus-to-core frequency ratio and core voltage according to how much load is put on the processor [16]. LongRun and LongRun2 are power management technologies introduced by Transmeta. LongRun2 has been licensed to Fujitsu, NEC, Sony, Toshiba, and NVIDIA [17].

Dynamic power management at the operating system level refers to supply voltage and clock frequency adjustment schemes implemented while tasks are running. These energy conservation techniques exploit opportunities for tuning the energy-delay tradeoff [18]. Power-aware task scheduling on processors with variable voltages and speeds has been extensively studied since the mid-1990s. In a pioneering paper [19], the authors first proposed an approach to energy saving through fine grain control of CPU speed by an operating system scheduler. The main idea is to monitor CPU idle time and to reduce energy consumption by reducing clock speed and idle time to a minimum. In a subsequent work [20], the authors analyzed offline and online algorithms for scheduling tasks with arrival times and deadlines on a uniprocessor computer with minimum energy consumption. This research has been extended in References [21–27] and has inspired substantial further investigation, much of which focuses on real-time applications, namely, adjusting the supply voltage and clock frequency to minimize CPU energy consumption while still meeting the deadlines for task execution. In References [28–42] and many other related works, the authors addressed the problem of scheduling independent or precedence constrained tasks on uniprocessor or multiprocessor computers where the actual execution time of a task may be less than the estimated worst-case execution time. The main issue is energy reduction by slack time reclamation.

1.1.4 Task Scheduling with Energy and Time Constraints

There are two considerations in dealing with the energy-delay tradeoff. On the one hand, in high performance computing systems, power-aware design techniques and algorithms attempt to maximize performance under certain energy consumption constraints. On the other hand, low power and energy-efficient design techniques and algorithms aim to minimize energy consumption while still meeting certain performance requirements. In Reference 43, the author studied the problems of minimizing the expected execution time given a hard energy budget and minimizing the expected energy expenditure given a hard execution deadline for a single task with randomized execution requirement. In Reference 44, the author considered scheduling jobs with equal requirements on multiprocessors. In Reference 45, the authors studied the relationship among parallelization, performance, and energy consumption, and the problem of minimizing energy-delay product. In References 46, 47, the authors attempted joint minimization of energy consumption and task execution time. In Reference 48, the authors investigated the problem of system value maximization subject to both time and energy constraints.

In this chapter, we address energy and time constrained power allocation and task scheduling on multiprocessor computers with dynamically variable voltage, frequency, speed, and power as combinatorial optimization problems. In particular, we define the problem of minimizing schedule length with energy consumption constraint and the problem of minimizing energy consumption with schedule length constraint on multiprocessor computers [49]. The first problem has applications in general multiprocessor and multicore processor computing systems, where energy consumption is an important concern, and in mobile computers, where energy conservation is a main concern. The second problem has applications in real-time multiprocessing systems and environments such as parallel signal processing, automated target recognition, and real-time MPEG encoding, where timing constraint is a major requirement. Our scheduling problems are defined such that the energy-delay product is optimized by fixing one factor and minimizing the other.

1.1.5 Chapter Outline

The rest of the chapter is organized as follows: In Section 1.2, we present the power consumption model; define our power allocation and task scheduling problems on multiprocessor computers with energy and time constraints; describe various task models, processor models, and scheduling models; discuss problem decomposition and subproblems; and mention different types of algorithms. In Section 1.3, we develop optimal solutions to our problems on uniprocessor computers and on multiprocessor computers with given partitions of tasks, prove the strong NP-hardness of our problems, derive lower bounds for optimal solutions, and establish the energy-delay tradeoff theorem. In Section 1.4, we present and analyze the performance of pre-power-determination algorithms, including equal-time algorithms, equal-energy algorithms, and equal-speed algorithms. We show both numerical data and simulation results for our performance bounds. In Section 1.5, we present and analyze the performance of post-power-determination algorithms, again demonstrating both numerical data and simulation results for our performance bounds. In Section 1.6, we summarize the chapter and point out several directions for further research.

1.2 Preliminaries

1.2.1 Power Consumption Model

Power dissipation and circuit delay in digital CMOS circuits can be accurately modeled by simple equations, even for complex microprocessor circuits. CMOS circuits have dynamic, static, and short-circuit power dissipation; however, the dominant component in a well-designed circuit is dynamic power consumption p (i.e., the switching component of power), which is approximately p = aCV^2 f, where a is an activity factor, C is the loading capacitance, V is the supply voltage, and f is the clock frequency [50]. Since s ∝ f, where s is the processor speed, and f ∝ V^φ with 0 < φ ≤ 1 [51], which implies V ∝ f^{1/φ}, we know that power consumption is p ∝ f^α and p ∝ s^α, where α = 1 + 2/φ ≥ 3.

Assume that we are given n independent sequential tasks to be executed on m identical processors. Let r_i represent the execution requirement (i.e., the number of CPU cycles or the number of instructions) of task i, where 1 ≤ i ≤ n. We use p_i (V_i, f_i, respectively) to represent the power (supply voltage, clock frequency, respectively) allocated to execute task i. For ease of discussion, we will assume that p_i is simply s_i^α, where s_i = p_i^{1/α} is the execution speed of task i. The execution time of task i is t_i = r_i/s_i = r_i/p_i^{1/α}. The energy consumed to execute task i is e_i = p_i t_i = r_i p_i^{1−1/α} = r_i s_i^{α−1}.

We would like to mention the following basic and important observations: (i) f ∝ V^φ and s ∝ V^φ: linear change in supply voltage results in up to linear change in clock frequency and processor speed; (ii) p ∝ V^{2+φ}, p ∝ f^α, and p ∝ s^α: linear change in supply voltage results in at least quadratic change in power supply, and linear change in clock frequency and processor speed results in at least cubic change in power supply; (iii) s/p ∝ 1/V^2 and s/p ∝ 1/s^{α−1}: the processor energy performance, measured by speed per watt [12], is at least quadratically proportional to the supply voltage and speed reduction; (iv) r_i/e_i ∝ 1/V^2 and r_i/e_i ∝ 1/s_i^{α−1}, where r_i is the amount of work to be performed for task i: the processor energy performance, measured by work per Joule [19], is at least quadratically proportional to the supply voltage and speed reduction; (v) e_i ∝ V^2: linear change in supply voltage results in quadratic change in energy consumption; (vi) e_i ∝ s_i^{α−1}: linear change in processor speed results in at least quadratic change in energy consumption; (vii) e_i ∝ p_i^{1−1/α}: energy consumption reduces at a sublinear speed as power supply reduces; (viii) e_i t_i^{α−1} = r_i^α and p_i t_i^α = r_i^α: for a given task, there exist energy-delay and power-delay tradeoffs. (Later, we will extend such tradeoffs to a set of tasks, i.e., the energy-delay tradeoff theorem.)
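These relations can be checked numerically. The sketch below is illustrative only; it assumes α = 3 (i.e., φ = 1) and normalized units for requirements and speeds, and implements the model p_i = s_i^α, t_i = r_i/s_i, e_i = r_i s_i^{α−1}:

```python
# Power model: a task with execution requirement r run at speed s draws
# power p = s^alpha, takes time t = r / s, and consumes energy
# e = p * t = r * s^(alpha - 1).
ALPHA = 3  # alpha = 1 + 2/phi >= 3; alpha = 3 corresponds to phi = 1

def power(s, alpha=ALPHA):
    return s ** alpha

def exec_time(r, s):
    return r / s

def energy(r, s, alpha=ALPHA):
    return power(s, alpha) * exec_time(r, s)  # equals r * s^(alpha - 1)

r, s = 10.0, 2.0
t, e = exec_time(r, s), energy(r, s)

# Observation (viii): e * t^(alpha-1) = r^alpha and p * t^alpha = r^alpha,
# independent of the chosen speed s -- the energy-delay tradeoff.
assert abs(e * t ** (ALPHA - 1) - r ** ALPHA) < 1e-9
assert abs(power(s) * t ** ALPHA - r ** ALPHA) < 1e-9

# Observation (vi): halving the speed cuts energy by 2^(alpha-1) = 4x
# while doubling the execution time.
assert abs(energy(r, s / 2) - e / 4) < 1e-9
assert abs(exec_time(r, s / 2) - 2 * t) < 1e-9
```

The assertions confirm that e_i t_i^{α−1} and p_i t_i^α depend only on r_i: slowing a task down trades delay for energy without changing these products.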

1.2.2 Problem Definitions

The power allocation and task scheduling problems on multiprocessor computers with energy and time constraints addressed in this chapter are defined as the following optimization problems.

Problem 1.1 (Minimizing Schedule Length with Energy Consumption Constraint)
Input: A set of n independent sequential tasks, a multiprocessor computer with m identical processors, and energy constraint E.
Output: Power supplies p1, p2, …, pn to the n tasks and a schedule of the n tasks on the m processors such that the schedule length is minimized and the total energy consumption does not exceed E.
Problem 1.2 (Minimizing Energy Consumption with Schedule Length Constraint)
Input: A set of n independent sequential tasks, a multiprocessor computer with m identical processors, and time constraint T.
Output: Power supplies p1, p2, …, pn to the n tasks and a schedule of the n tasks on the m processors such that the total energy consumption is minimized and the schedule length does not exceed T.
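For concreteness, once a power (equivalently, speed) allocation and a partition of tasks onto processors are fixed, both objective/constraint pairs can be evaluated directly. The sketch below is illustrative only, not one of the chapter's algorithms; the task requirements and speeds are made-up values, and it assumes the model of Section 1.2.1 with α = 3:

```python
# Evaluate a candidate solution: each processor executes its assigned
# tasks sequentially; the schedule length is the longest per-processor
# finish time, and the total energy is summed over all tasks, with
# e_i = r_i * s_i^(alpha - 1).
ALPHA = 3  # power p_i = s_i^alpha

def evaluate(partition, alpha=ALPHA):
    """partition: one list per processor of (r_i, s_i) pairs, where r_i
    is the execution requirement and s_i the allocated speed of task i.
    Returns (schedule_length, total_energy)."""
    length = max(sum(r / s for r, s in proc) for proc in partition)
    total_energy = sum(r * s ** (alpha - 1)
                       for proc in partition for r, s in proc)
    return length, total_energy

# Two processors, three tasks with made-up requirements and speeds.
partition = [[(8.0, 2.0), (4.0, 1.0)], [(9.0, 1.5)]]
T, E = evaluate(partition)  # T = 8.0, E = 56.25

# Problem 1.1: minimize T subject to E <= energy budget.
# Problem 1.2: minimize E subject to T <= deadline.
assert abs(T - 8.0) < 1e-9 and abs(E - 56.25) < 1e-9
```

A solver for either problem searches over both the partition and the per-task speeds; this evaluator only checks the feasibility and quality of a given candidate.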

The framework of investigation can be established based on the product of three spaces, namely, the task models, the processor models, and the scheduling models. The above research problems have many variations and extensions, depending on the task, processor, and scheduling models. These power allocation and task scheduling problems can be investigated in a variety of ways that accommodate sophisticated application environments, realistic processor technologies, and practical scheduling algorithms.

1.2.3 Task Models

Our independent sequential tasks can be extended to precedence constrained tasks, parallel tasks, and dynamic tasks, which arise in various application environments.

1.2.4 Processor Models

The following processor technologies can be incorporated into our power allocation and task scheduling problems.

1.2.5 Scheduling Models

As in traditional scheduling theory, different types of scheduling algorithms can be considered for power-aware task scheduling problems.

1.2.6 Problem Decomposition

Our power allocation and task scheduling problems contain four nontrivial subproblems, namely, system partitioning, precedence constraining, task scheduling, and power supplying. Each subproblem should be solved efficiently, so that heuristic algorithms with overall good performance can be developed.

The above decomposition of our optimization problems into several subproblems makes the design and analysis of heuristic algorithms tractable. Our approach differs significantly from most existing studies. A unique feature of our work is that we compare the performance of our algorithms with optimal solutions analytically and validate our results experimentally, rather than comparing heuristic algorithms only among themselves and only experimentally. Such an approach is consistent with traditional scheduling theory.

1.2.7 Types of Algorithms

There are naturally three types of power-aware task scheduling algorithms, depending on the order of power supplying and task scheduling.

1.3 Problem Analysis

Our study in this chapter assumes the following models: task model, independent, sequential, static tasks; processor model, a single system of homogeneous processors with continuous, unbounded, and regular voltage/frequency/speed/power levels and without overheads for voltage/frequency/speed/power adjustment and idle processors; scheduling model, nonpreemptive, offline, clairvoyant scheduling. This combination of task model, processor model, and scheduling model yields the easiest version of our power allocation and task scheduling problems.

1.3.1 Schedule Length Minimization

1.3.1.1 Uniprocessor computers

It is clear that on a uniprocessor computer with energy constraint E, the problem of minimizing schedule length with energy consumption constraint is simply to find the power supplies p1, p2, …, pn, such that the schedule length

T(p1, p2, …, pn) = t1 + t2 + ··· + tn = r1/p1^(1/α) + r2/p2^(1/α) + ··· + rn/pn^(1/α)

is minimized and the total energy consumed e1 + e2 + ··· + en does not exceed E, that is,

F(p1, p2, …, pn) = e1 + e2 + ··· + en = r1·p1^(1−1/α) + r2·p2^(1−1/α) + ··· + rn·pn^(1−1/α) ≤ E.

Notice that both the schedule length T(p1, p2, … , pn) and the energy consumption F(p1, p2, … , pn) are viewed as functions of p1, p2, …, pn.

We can minimize T(p1, p2, … , pn) subject to the constraint F(p1, p2, … , pn) = E by using the Lagrange multiplier system:

∇T(p1, p2, …, pn) = λ∇F(p1, p2, …, pn),

where λ is a Lagrange multiplier. Since

∂T/∂pi = λ · ∂F/∂pi,

that is,

−(1/α)·ri·pi^(−1/α−1) = λ·(1 − 1/α)·ri·pi^(−1/α),

where 1 ≤ i ≤ n, we have pi = 1/(λ(1 − α)), for all 1 ≤ i ≤ n. Substituting the above pi into the constraint F(p1, p2, …, pn) = E, we get 1/(λ(1 − α)) = (E/R)^(α/(α−1)), where R = r1 + r2 + ··· + rn is the total execution requirement of the n tasks. Therefore, we obtain pi = 1/(λ(1 − α)) = (E/R)^(α/(α−1)), for all 1 ≤ i ≤ n.

The above discussion is summarized in the following theorem, which gives the optimal power supplies and the optimal schedule length.


Theorem 1.1
On a uniprocessor computer, the schedule length is minimized when all tasks are supplied with the same power pi = (E/R)^(α/(α−1)), where 1 ≤ i ≤ n. The optimal schedule length is T = R^(α/(α−1))/E^(1/(α−1)).
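Theorem 1.1 can be verified numerically. The sketch below uses hypothetical task sizes and energy budget, and assumes the chapter's model ti = ri/pi^(1/α) and ei = ri·pi^(1−1/α); it also checks that an unequal power assignment with the same total energy yields a longer schedule.

```python
import math

# Numerical illustration of Theorem 1.1 (hypothetical task sizes and
# energy budget), assuming t_i = r_i/p_i**(1/alpha) and
# e_i = r_i*p_i**(1 - 1/alpha) as in the chapter's derivation.
alpha = 3.0
r = [1.0, 2.0, 4.0]
E = 10.0
R = sum(r)

p_opt = (E / R) ** (alpha / (alpha - 1))              # equal optimal power
T_opt = R ** (alpha / (alpha - 1)) / E ** (1 / (alpha - 1))

def schedule_length(powers):
    return sum(ri / pi ** (1 / alpha) for ri, pi in zip(r, powers))

def energy(powers):
    return sum(ri * pi ** (1 - 1 / alpha) for ri, pi in zip(r, powers))

# Equal power consumes exactly E and achieves the optimal schedule length.
assert math.isclose(energy([p_opt] * 3), E)
assert math.isclose(schedule_length([p_opt] * 3), T_opt)

# An unequal allocation with the same total energy E is strictly slower.
p1, p2 = 0.5 * p_opt, 1.2 * p_opt
e3 = E - r[0] * p1 ** (1 - 1 / alpha) - r[1] * p2 ** (1 - 1 / alpha)
p3 = (e3 / r[2]) ** (alpha / (alpha - 1))   # spend the remaining energy on task 3
assert schedule_length([p1, p2, p3]) > T_opt
```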

1.3.1.2 Multiprocessor computers

Let us consider a multiprocessor computer with m processors. Assume that a set of n tasks is partitioned into m groups, such that all the tasks in group k are executed on processor k, where 1 ≤ k ≤ m. Let Rk denote both group k and the total execution requirement of the tasks in group k. For a given partition of the n tasks into m groups R1, R2, …, Rm, we seek power supplies that minimize the schedule length.

Let Ek be the energy consumed by all the tasks in group k. We observe that by fixing Ek and adjusting the power supplies for the tasks in group k to the same power (Ek/Rk)^(α/(α−1)) according to Theorem 1.1, the total execution time of the tasks in group k can be minimized to Tk = Rk^(α/(α−1))/Ek^(1/(α−1)). Therefore, the problem of finding power supplies p1, p2, …, pn that minimize the schedule length is equivalent to finding E1, E2, …, Em that minimize the schedule length. It is clear that the schedule length is minimized when all the m processors complete their execution of the m groups of tasks at the same time T, that is, T1 = T2 = ··· = Tm = T, which implies that Ek = Rk^α/T^(α−1), for all 1 ≤ k ≤ m. Since E1 + E2 + ··· + Em = E, we have

(R1^α + R2^α + ··· + Rm^α)/T^(α−1) = E,

that is,

T = ((R1^α + R2^α + ··· + Rm^α)/E)^(1/(α−1)),

and

Ek = (Rk^α/(R1^α + R2^α + ··· + Rm^α))·E, for all 1 ≤ k ≤ m.

Thus, we have proved the following theorem.


Theorem 1.2
For a given partition R1, R2, …, Rm of n tasks into m groups on a multiprocessor computer, the schedule length is minimized when all the tasks in group k are supplied with the same power (Ek/Rk)^(α/(α−1)), where

Ek = (Rk^α/(R1^α + R2^α + ··· + Rm^α))·E,

for all 1 ≤ k ≤ m. The optimal schedule length is

T = ((R1^α + R2^α + ··· + Rm^α)/E)^(1/(α−1))

for the above power supplies.
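Theorem 1.2 can be illustrated with a small worked example. The sketch below uses a hypothetical partition into m = 3 groups with a hypothetical energy budget, and assumes the chapter's model in which a group with requirement Rk run at power p finishes in Rk/p^(1/α); it checks that every group finishes exactly at the optimal schedule length T.

```python
import math

# Worked example of Theorem 1.2 (hypothetical group requirements and
# energy budget), assuming group k at power p finishes in Rk/p**(1/alpha).
alpha = 3.0
Rk = [2.0, 3.0, 5.0]        # total execution requirement of each group
E = 12.0

S = sum(x ** alpha for x in Rk)
Ek = [E * x ** alpha / S for x in Rk]    # optimal per-group energy
T = (S / E) ** (1 / (alpha - 1))         # optimal schedule length

# Each group, run at power (Ek/Rk)**(alpha/(alpha-1)), finishes exactly at T.
for R_k, E_k in zip(Rk, Ek):
    p = (E_k / R_k) ** (alpha / (alpha - 1))
    t = R_k / p ** (1 / alpha)
    assert math.isclose(t, T)

# The per-group energies spend exactly the total budget E.
assert math.isclose(sum(Ek), E)
```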


1.3.2 Energy Consumption Minimization

1.3.2.1 Uniprocessor computers

It is clear that on a uniprocessor computer with time constraint T, the problem of minimizing energy consumption with schedule length constraint is simply to find the power supplies p1, p2, …, pn, such that the total energy consumption

E(p1, p2, …, pn) = e1 + e2 + ··· + en = r1·p1^(1−1/α) + r2·p2^(1−1/α) + ··· + rn·pn^(1−1/α)

is minimized and the schedule length t1 + t2 + ··· + tn does not exceed T, that is,

F(p1, p2, …, pn) = t1 + t2 + ··· + tn = r1/p1^(1/α) + r2/p2^(1/α) + ··· + rn/pn^(1/α) ≤ T.

The energy consumption E(p1, p2, … , pn) and the schedule length F(p1, p2, … , pn) are viewed as functions of p1, p2, … , pn.

We can minimize E(p1, p2, … , pn) subject to the constraint F(p1, p2, … , pn) = T by using the Lagrange multiplier system:

∇E(p1, p2, …, pn) = λ∇F(p1, p2, …, pn),

where λ is a Lagrange multiplier. Since

∂E/∂pi = λ · ∂F/∂pi,

that is,

(1 − 1/α)·ri·pi^(−1/α) = −λ·(1/α)·ri·pi^(−1/α−1),

where 1 ≤ i ≤ n, we have pi = λ/(1 − α), for all 1 ≤ i ≤ n. Substituting the above pi into the constraint F(p1, p2, …, pn) = T, we get λ/(1 − α) = (R/T)^α and pi = (R/T)^α, for all 1 ≤ i ≤ n.

The above discussion gives rise to the following theorem, which gives the optimal power supplies and the minimum energy consumption.


Theorem 1.3
On a uniprocessor computer, the total energy consumption is minimized when all tasks are supplied with the same power pi = (R/T)^α, where 1 ≤ i ≤ n. The minimum energy consumption is E = R^α/T^(α−1).
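As with Theorem 1.1, Theorem 1.3 admits a quick numerical check. The sketch below uses hypothetical task sizes and deadline, assumes the chapter's model, and also verifies that perturbing the power allocation while still meeting the deadline increases the energy consumption.

```python
import math

# Numerical check of Theorem 1.3 (hypothetical task sizes and deadline),
# assuming t_i = r_i/p_i**(1/alpha) and e_i = r_i*p_i**(1 - 1/alpha).
alpha = 3.0
r = [1.0, 2.0, 3.0]
T = 4.0
R = sum(r)

p_opt = (R / T) ** alpha                  # equal optimal power
E_min = R ** alpha / T ** (alpha - 1)     # minimum energy consumption

# Equal power meets the deadline exactly and consumes exactly E_min.
assert math.isclose(sum(ri / p_opt ** (1 / alpha) for ri in r), T)
assert math.isclose(sum(ri * p_opt ** (1 - 1 / alpha) for ri in r), E_min)

# Perturbation: run task 1 faster and task 3 slower so that the total
# time still equals T; the energy consumed strictly increases.
t1 = 0.9 * r[0] / p_opt ** (1 / alpha)
t2 = r[1] / p_opt ** (1 / alpha)
t3 = T - t1 - t2
p1 = (r[0] / t1) ** alpha
p3 = (r[2] / t3) ** alpha
E_alt = (r[0] * p1 ** (1 - 1 / alpha)
         + r[1] * p_opt ** (1 - 1 / alpha)
         + r[2] * p3 ** (1 - 1 / alpha))
assert E_alt > E_min
```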

1.3.2.2 Multiprocessor computers

By Theorem 1.3, the energy consumed by the tasks in group k is minimized as Rk^α/T^(α−1) by allocating the same power (Rk/T)^α to all the tasks in group k.
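Since the per-group minima are independent, summing them gives the minimum total energy for a given partition under deadline T. The sketch below illustrates this consequence with hypothetical group requirements, under the same model assumptions as above.

```python
import math

# Per-group application of Theorem 1.3 (hypothetical group requirements
# and deadline): group k at power (Rk/T)**alpha finishes exactly at T
# and consumes Rk**alpha / T**(alpha - 1).
alpha = 3.0
Rk = [2.0, 3.0, 5.0]
T = 2.0

E_total = sum(x ** alpha for x in Rk) / T ** (alpha - 1)

for R_k in Rk:
    p = (R_k / T) ** alpha
    assert math.isclose(R_k / p ** (1 / alpha), T)   # every group meets T

# Total minimum energy is the sum of the per-group minima.
assert math.isclose(E_total,
                    sum(x ** alpha / T ** (alpha - 1) for x in Rk))
```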