Procedia Computer Science
Improving the energy efficiency of high performance scientific applications has become a pressing demand. Software-controlled hardware solutions based on Dynamic Voltage and Frequency Scaling (DVFS) have proven widely effective. Although DVFS benefits green computing, it can itself incur non-negligible overhead when a large number of frequency switches are issued. In this paper, we propose a strategy to achieve the optimal energy savings for distributed matrix multiplication by algorithmically trading more computation and communication at a time, adapted to user-specified memory costs, for fewer DVFS switches; this saves 7.5% more energy on average than a classic strategy. Moreover, we leverage a high performance communication scheme that fully exploits network bandwidth via pipeline broadcast. Overall, the integrated approach achieves substantial energy savings (up to 51.4%) and performance gain (28.6% on average) compared to ScaLAPACK pdgemm() on a cluster with an Ethernet switch, and outperforms ScaLAPACK and DPLASMA pdgemm() by 33.3% and 32.7% on average, respectively, on a cluster with an InfiniBand switch.
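The trade-off between memory and DVFS switches can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes a simplified model in which each step of a blocked multiplication has one communication phase (run at low frequency) and one computation phase (run at high frequency), so every batch of steps costs two frequency switches. The function names, the `panel_bytes` parameter, and the two-switches-per-batch model are all hypothetical and chosen for illustration only.

```python
import math


def dvfs_switches(num_steps: int, group: int) -> int:
    """Count frequency switches when `group` steps are batched together.

    Assumed model: each batch performs one communication phase and one
    computation phase, costing two frequency switches in total.
    """
    if num_steps < 0 or group < 1:
        raise ValueError("num_steps must be >= 0 and group must be >= 1")
    batches = math.ceil(num_steps / group)
    return 2 * batches


def largest_group(num_steps: int, panel_bytes: int, memory_budget_bytes: int) -> int:
    """Pick the largest batch size whose buffered panels fit the memory budget.

    Assumed model: batching `g` steps requires buffering `g * panel_bytes`.
    At least one step is always processed per batch.
    """
    if panel_bytes < 1:
        raise ValueError("panel_bytes must be >= 1")
    fit = memory_budget_bytes // panel_bytes
    return max(1, min(num_steps, fit))


if __name__ == "__main__":
    steps = 64
    panel = 8 * 2**20    # hypothetical 8 MiB of buffered data per step
    budget = 64 * 2**20  # hypothetical user-specified 64 MiB budget
    g = largest_group(steps, panel, budget)
    print("group size:", g)
    print("switches without batching:", dvfs_switches(steps, 1))
    print("switches with batching:", dvfs_switches(steps, g))
```

Under these assumptions, a larger memory budget allows larger batches and therefore fewer switches, which is the direction of the trade-off the abstract describes; the actual savings depend on the hardware's switch latency and the real algorithm.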
Tan, Li; Chen, Longxiang; Chen, Zizhong; Zong, Ziliang; Ge, Rong; and Li, Dong, "HP-DAEMON: High Performance Distributed Adaptive Energy-efficient Matrix-multiplicatiON" (2014). Mathematics, Statistics and Computer Science Faculty Research and Publications. 248.
Published version. Procedia Computer Science, Vol. 29 (2014): 599-613. DOI. © 2014 The Authors. Used with permission. Published under Creative Commons License 3.0.