Document Type

Article

Language

eng

Format of Original

15 p.

Publication Date

2014

Publisher

Elsevier

Source Publication

Procedia Computer Science

Source ISSN

1877-0509

Original Item ID

doi: 10.1016/j.procs.2014.05.054

Abstract

Improving the energy efficiency of high performance scientific applications has become a pressing need. Software-controlled hardware solutions based on Dynamic Voltage and Frequency Scaling (DVFS) have been shown to be effective. Although DVFS benefits green computing, it can itself incur non-negligible overhead when a large number of frequency switches are issued. In this paper, we propose a strategy to achieve the optimal energy savings for distributed matrix multiplication by algorithmically trading more computation and communication at a time, adaptively and within user-specified memory costs, for fewer DVFS switches; this saves 7.5% more energy on average than a classic strategy. Moreover, we leverage a high performance communication scheme that fully exploits network bandwidth via pipeline broadcast. Overall, the integrated approach achieves substantial energy savings (up to 51.4%) and performance gain (28.6% on average) compared to ScaLAPACK pdgemm() on a cluster with an Ethernet switch, and outperforms ScaLAPACK and DPLASMA pdgemm() by 33.3% and 32.7% on average, respectively, on a cluster with an InfiniBand switch.
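
As a rough illustration of the trade-off the abstract describes, the following sketch counts frequency switches when several communication and computation steps are grouped between switches. It is not the authors' implementation: the function names, the assumption of exactly two switches per group (one down before communication, one up before computation), and the assumption that buffer memory grows linearly with the group size are all hypothetical simplifications.

```python
import math


def dvfs_switches(num_steps: int, group_size: int) -> int:
    """Count frequency switches when `group_size` steps share one switch pair.

    Assumed model (hypothetical): each group lowers the CPU frequency once
    for its batched communication and raises it once for its batched
    computation, so every group costs two switches.
    """
    if num_steps < 0 or group_size < 1:
        raise ValueError("num_steps must be >= 0 and group_size must be >= 1")
    return 2 * math.ceil(num_steps / group_size)


def buffered_panels(group_size: int) -> int:
    """Panels held in memory at once under the assumed linear memory cost."""
    if group_size < 1:
        raise ValueError("group_size must be >= 1")
    return group_size


if __name__ == "__main__":
    steps = 16  # hypothetical number of multiplication steps
    for k in (1, 2, 4, 8):
        print(f"group_size={k}: switches={dvfs_switches(steps, k)}, "
              f"buffered panels={buffered_panels(k)}")
```

With `group_size=1` the sketch corresponds to switching frequency around every step; larger groups reduce the switch count at the price of buffering more panels, which is the memory cost a user would bound.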

Comments

Published version. Procedia Computer Science, Vol. 29 (2014): 599-613. DOI: 10.1016/j.procs.2014.05.054. © 2014 The Authors. Used with permission. Published under Creative Commons License 3.0.
