Afrati, Foto N. and Ullman, Jeffrey D. (2009) A New Computation Model for Cluster Computing. Technical Report. Stanford InfoLab.
PDF: Submitted for Publication
Implementations of map-reduce are being used to perform many operations on very large data sets. We explore alternative ways that a system could use the environment and capabilities of map-reduce implementations such as Hadoop, yet perform operations that are not identical to map-reduce. The centerpiece of this exploration is a computational model that captures the essentials of the environment in which systems like Hadoop operate: files are unordered sets of tuples that can be read and/or written in parallel; processes are limited in the amount of input/output they can perform; and processors are available in essentially unlimited supply. We develop, in this model, an algorithm for sorting whose worst-case running time is better than that of the obvious implementations of parallel sorting.
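To make the setting concrete, the sketch below illustrates one *obvious* map-reduce-style parallel sort of the kind the abstract alludes to: sample-based range partitioning (the TeraSort idea), where mappers route tuples into key ranges and each reducer sorts its range independently. This is an assumed baseline for illustration only, not the improved algorithm the report develops; all function names here are hypothetical.

```python
import bisect
import random

def map_phase(records, boundaries):
    """Map step: route each record to a bucket by range partitioning.
    Boundary keys are drawn from a sample of the input, so buckets are
    roughly balanced. Buckets model the unordered files reducers read."""
    buckets = [[] for _ in range(len(boundaries) + 1)]
    for r in records:
        buckets[bisect.bisect_right(boundaries, r)].append(r)
    return buckets

def reduce_phase(bucket):
    """Reduce step: each reducer sorts its own key range locally."""
    return sorted(bucket)

def parallel_sort(records, num_reducers=4, sample_size=32):
    """Baseline range-partition sort: sample, partition, sort each part.
    Concatenating the reducer outputs in boundary order yields the
    globally sorted result."""
    sample = sorted(random.sample(records, min(sample_size, len(records))))
    step = max(1, len(sample) // num_reducers)
    boundaries = sample[step::step][: num_reducers - 1]
    out = []
    for bucket in map_phase(records, boundaries):
        out.extend(reduce_phase(bucket))
    return out
```

In the report's cost model the interesting quantity is not wall-clock time on one machine but the input/output each process performs; this baseline makes every record cross the map-to-reduce boundary once, which is the kind of communication cost the model is designed to account for.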
Item Type: Technical Report
Keywords: cluster computing, map-reduce, sorting, merging, communication cost, parallelism
Deposited By: Prof. Jeffrey Ullman
Deposited On: 31 Dec 2009 07:31
Last Modified: 31 Dec 2009 07:31