# Communication Efficient Matrix-Multiplication on Hypercubes.

Gupta, H. and Sadayappan, P. (1994) Communication Efficient Matrix-Multiplication on Hypercubes. Technical Report. Stanford InfoLab. (Publication Note: Sixth Annual ACM Symposium on Parallel Algorithms and Architectures, 1994 (SPAA 1994). Extended version in Parallel Computing, 22 (1996) 75-99.)

 BibTeX DublinCore EndNote HTML  Preview
PDF
245Kb

## Abstract

1 Introduction Dense matrix multiplication is used in a variety of applications and is one of the core components in many scientific computations. The standard way of multiplying two matrices of size n n requires O(n 3 ) floating point operations on a sequential machine. Since dense matrix multiplication is computationally expensive, the development of effcient algorithms for large distributed memory machines is of great interest. Matrix multiplication is a very regular computation and lends itself well to parallel implementation. One of the effcient approaches to design other parallel matrix or graph algorithms is to decompose them into a sequence of matrix multiplications [3, 9]. One of the earliest distributed algorithms proposed for matrix multiplication was by Cannon  in 1969 for 2-D meshes. Ho, Johnsson, and Edelman in  presented a variant of Cannon's algorithm which uses the full bandwidth of a 2-D grid embedded in a hypercube. Some other algorithms are by Dekel, Nassimi, and Sahni , Berntsen  and Fox, Otto, and Hey . Gupta and Kumar in  discuss the scalability of these algorithms and their variants. In this paper we propose two new algorithms for hypercubes. The algorithms proposed in this paper are better than all previously proposed algorithms for a wide range of matrix sizes and number of processors. The rest of the paper is organized as follows. In Section 2 we state our assumptions and discuss the communication models used. In Section 3 we discuss the previously proposed algorithms. In Section 4 we present the new algorithms. Section 5 presents some optimality results. In Section 6, we analyze the performance of the algorithms on hypercubes for three different communication cost W e present our conclusions in Section 7. 2 Communication Models In this paper we analyze the performance of the various algorithms presented for hypercube architectures. Throughout this paper, we refer to a 2-ary n-cube as a hypercube and all the logarithms

Item Type: Subjects: Techreport (Technical Report) Miscellaneous WHIPS Project Homepage http://infolab.stanford.edu/warehousing/warehouse.html 59 Import Account 25 Feb 2000 16:00 05 Feb 2009 15:31  