Document Type
Conference Proceeding
Language
eng
Format of Original
5 p.
Publication Date
9-2013
Publisher
Institute of Electrical and Electronics Engineers (IEEE)
Source Publication
42nd International Conference on Parallel Processing (ICPP) 2013
Source ISSN
0190-3918
Original Item ID
doi: 10.1109/CLUSTER.2013.6702672
Abstract
Boosting the performance and energy efficiency of scientific applications running on high performance computing systems is crucial nowadays. Software- and hardware-based solutions for improving communication performance have been recognized as significant means of achieving performance gains, and thus energy savings, for such applications. Because distributed matrix multiplication is a fundamental component of most numerical linear algebra algorithms, improving its performance and energy efficiency is of major concern. To this end, we propose a high performance communication scheme that fully exploits network bandwidth via non-blocking pipeline broadcast with tuned chunk size. Empirically, performance gains of up to 8.4% and energy savings of up to 6.9% are achieved compared to blocking pipeline broadcast, and, against binomial tree broadcast, performance gains of up to 6.5% and energy savings of up to 6.1% are observed on a 64-core cluster.
Recommended Citation
Tan, Li; Chen, Longxiang; Chen, Zizhong; Zong, Ziliang; Ge, Rong; and Li, Dong, "Improving Performance and Energy Efficiency of Matrix Multiplication via Pipeline Broadcast" (2013). Mathematics, Statistics and Computer Science Faculty Research and Publications. 179.
https://epublications.marquette.edu/mscs_fac/179
Comments
Accepted version. Published as part of the proceedings of the 2013 IEEE International Conference on Cluster Computing (CLUSTER), 2013: 1-5. DOI: 10.1109/CLUSTER.2013.6702672. © 2013 The Institute of Electrical and Electronics Engineers. Used with permission.