Stanford InfoLab Publication Server

Communication Efficient Matrix-Multiplication on Hypercubes.

Gupta, H. and Sadayappan, P. (1994) Communication Efficient Matrix-Multiplication on Hypercubes. Technical Report. Stanford InfoLab. (Publication Note: Sixth Annual ACM Symposium on Parallel Algorithms and Architectures, 1994 (SPAA 1994). Extended version in Parallel Computing, 22 (1996) 75-99.)




1 Introduction Dense matrix multiplication is used in a variety of applications and is one of the core components in many scientific computations. The standard way of multiplying two matrices of size n n requires O(n 3 ) floating point operations on a sequential machine. Since dense matrix multiplication is computationally expensive, the development of effcient algorithms for large distributed memory machines is of great interest. Matrix multiplication is a very regular computation and lends itself well to parallel implementation. One of the effcient approaches to design other parallel matrix or graph algorithms is to decompose them into a sequence of matrix multiplications [3, 9]. One of the earliest distributed algorithms proposed for matrix multiplication was by Cannon [2] in 1969 for 2-D meshes. Ho, Johnsson, and Edelman in [8] presented a variant of Cannon's algorithm which uses the full bandwidth of a 2-D grid embedded in a hypercube. Some other algorithms are by Dekel, Nassimi, and Sahni [3], Berntsen [1] and Fox, Otto, and Hey [4]. Gupta and Kumar in [5] discuss the scalability of these algorithms and their variants. In this paper we propose two new algorithms for hypercubes. The algorithms proposed in this paper are better than all previously proposed algorithms for a wide range of matrix sizes and number of processors. The rest of the paper is organized as follows. In Section 2 we state our assumptions and discuss the communication models used. In Section 3 we discuss the previously proposed algorithms. In Section 4 we present the new algorithms. Section 5 presents some optimality results. In Section 6, we analyze the performance of the algorithms on hypercubes for three different communication cost W e present our conclusions in Section 7. 2 Communication Models In this paper we analyze the performance of the various algorithms presented for hypercube architectures. Throughout this paper, we refer to a 2-ary n-cube as a hypercube and all the logarithms

Item Type:Techreport (Technical Report)
Related URLs:Project Homepage
ID Code:59
Deposited By:Import Account
Deposited On:25 Feb 2000 16:00
Last Modified:05 Feb 2009 15:31

Download statistics

Repository Staff Only: item control page