Afrati, Foto N. and Ullman, Jeffrey D. (2009) A New Computation Model for Cluster Computing. Technical Report. Stanford InfoLab.
PDF (Submitted for Publication, 233 KB)
Abstract
Implementations of map-reduce are being used to perform many operations on very large data sets. We explore alternative ways that a system could use the environment and capabilities of map-reduce implementations such as Hadoop, yet perform operations that are not identical to map-reduce. The centerpiece of this exploration is a computational model that captures the essentials of the environment in which systems like Hadoop operate: files are unordered sets of tuples that can be read and/or written in parallel; processes are limited in the amount of input/output they can perform; and processors are available in essentially unlimited supply. Within this model, we develop an algorithm for sorting whose worst-case running time is better than that of the obvious implementations of parallel sorting.
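The report's own sorting algorithm is not reproduced on this page, but a minimal sketch of sample-based range partitioning, the technique behind TeraSort-style sorts on Hadoop, illustrates the kind of computation the model describes: every simulated process reads and writes only about 1/p of the tuples, matching the model's per-process input/output limit. All names here (`parallel_sort_sketch`, the choice of 100·p samples) are illustrative and not taken from the report.

```python
import bisect
import random

def parallel_sort_sketch(tuples, p):
    """Range-partition sort in the map-reduce style.

    Phase 1 (parallel): sample the unordered input file and pick
    p-1 splitters dividing the key space into p ranges.
    Phase 2 (parallel): each mapper routes each tuple to the
    bucket that owns its key range.
    Phase 3 (parallel): each reducer sorts its bucket locally;
    concatenating the buckets yields the sorted output.
    """
    n = len(tuples)
    # Oversample (~100 samples per bucket) so the p key ranges,
    # and hence the per-process I/O, stay roughly balanced.
    sample = sorted(random.sample(tuples, min(n, 100 * p)))
    splitters = [sample[i * len(sample) // p] for i in range(1, p)]

    buckets = [[] for _ in range(p)]
    for t in tuples:
        # Bucket index = number of splitters <= t, in [0, p-1].
        buckets[bisect.bisect_right(splitters, t)].append(t)

    return [t for b in buckets for t in sorted(b)]

if __name__ == "__main__":
    data = [random.randrange(10**6) for _ in range(10_000)]
    assert parallel_sort_sketch(data, p=8) == sorted(data)
    print("ok")
```

This sketch runs the three phases sequentially; on a real cluster each phase would execute as parallel map or reduce tasks over pieces of the input file.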
| Item Type: | Techreport (Technical Report) |
|---|---|
| Uncontrolled Keywords: | cluster computing, map-reduce, sorting, merging, communication cost, parallelism |
| ID Code: | 953 |
| Deposited By: | Prof Jeffrey Ullman |
| Deposited On: | 31 Dec 2009 07:31 |
| Last Modified: | 31 Dec 2009 07:31 |