Afrati, Foto N. and Ullman, Jeffrey D. (2009) A New Computation Model for Cluster Computing. Technical Report. Stanford InfoLab.
PDF (Submitted for Publication, 233 KB)
Abstract
Implementations of map-reduce are being used to perform many operations on very large data sets. We explore alternative ways that a system could use the environment and capabilities of map-reduce implementations such as Hadoop, yet perform operations that are not identical to map-reduce. The centerpiece of this exploration is a computational model that captures the essentials of the environment in which systems like Hadoop operate: files are unordered sets of tuples that can be read and/or written in parallel; processes are limited in the amount of input/output they can perform; and processors are available in essentially unlimited supply. Within this model, we develop an algorithm for sorting whose worst-case running time is better than that of the obvious implementations of parallel sorting.
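The report's own sorting algorithm is not reproduced on this page, but a minimal sketch of sample-based range partitioning, the technique behind TeraSort-style sorts on Hadoop, illustrates the kind of computation the model describes: every simulated process reads and writes only about 1/p of the tuples, matching the model's per-process input/output limit. All names here (`parallel_sort_sketch`, the choice of 100·p samples) are illustrative and not taken from the report.

```python
import bisect
import random

def parallel_sort_sketch(tuples, p):
    """Range-partition sort in the map-reduce style.

    Phase 1 (parallel): sample the unordered input file and pick
    p-1 splitters dividing the key space into p ranges.
    Phase 2 (parallel): each mapper routes each tuple to the
    bucket that owns its key range.
    Phase 3 (parallel): each reducer sorts its bucket locally;
    concatenating the buckets yields the sorted output.
    """
    n = len(tuples)
    # Oversample (~100 samples per bucket) so the p key ranges,
    # and hence the per-process I/O, stay roughly balanced.
    sample = sorted(random.sample(tuples, min(n, 100 * p)))
    splitters = [sample[i * len(sample) // p] for i in range(1, p)]

    buckets = [[] for _ in range(p)]
    for t in tuples:
        # Bucket index = number of splitters <= t, in [0, p-1].
        buckets[bisect.bisect_right(splitters, t)].append(t)

    return [t for b in buckets for t in sorted(b)]

if __name__ == "__main__":
    data = [random.randrange(10**6) for _ in range(10_000)]
    assert parallel_sort_sketch(data, p=8) == sorted(data)
    print("ok")
```

This sketch runs the three phases sequentially; on a real cluster each phase would execute as parallel map or reduce tasks over pieces of the input file.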
| Item Type: | Techreport (Technical Report) |
|---|---|
| Uncontrolled Keywords: | cluster computing, map-reduce, sorting, merging, communication cost, parallelism |
| ID Code: | 953 |
| Deposited By: | Prof Jeffrey Ullman |
| Deposited On: | 31 Dec 2009 07:31 |
| Last Modified: | 31 Dec 2009 07:31 |