Afrati, Foto and Joglekar, Manas and Re, Chris and Salihoglu, Semih and Ullman, Jeffrey D. GYM: A Multiround Join Algorithm In MapReduce. Technical Report. Stanford InfoLab.
We study the problem of computing the join of n relations in mul- tiple rounds of MapReduce. We introduce a distributed and gen- eralized version of Yannakakis’s algorithm, called GYM. GYM takes as input any generalized hypertree decomposition (GHD) of a query of width w and depth d, and computes the query in O(d) rounds and O(n(INw + OUT)) communication and computation cost. Using GYM we achieve two main results: (1) Every width- w query can be computed in O(n) rounds of MapReduce with O(n(INw + OUT)) cost; (2) Every width-w query can be com- puted in O(log(n)) rounds of MapReduce with O(n(IN3w +OUT)) cost. We achieve our second result by showing how to construct a O(log(n))-depth and width-3w GHD of a query of width w. We describe another general technique to construct even shorter depth GHDs with longer widths, effectively showing a spectrum of tradeoffs one can make between communication and computation and the number of rounds of MapReduce. By simulating MapRe- duce in the PRAM model, our second main result also implies the result of Gottlob et al.  that computing acyclic and constant- width queries are in NC. In fact, for certain queries, our approach yields significantly fewer PRAM steps than does the construction of the latter paper. However, we achieve our results using only Yan- nakakis’s algorithm, which has been perceived to have a sequential nature. Instead, we surprisingly show that Yannakakis’s algorithm can be parallelized significantly by giving it as input short-depth GHDs of queries.
|Item Type:||Techreport (Technical Report)|
|Deposited By:||Semih Salihoglu|
|Deposited On:||10 Oct 2014 23:35|
|Last Modified:||31 Jan 2015 23:28|
Repository Staff Only: item control page