Stanford InfoLab Publication Server

A New Computation Model for Cluster Computing

Afrati, Foto N. and Ullman, Jeffrey D. (2009) A New Computation Model for Cluster Computing. Technical Report. Stanford InfoLab.


Full text: PDF (submitted for publication), 233 KB

Abstract

Implementations of map-reduce are being used to perform many operations on very large datasets. We explore alternative ways that a system could use the environment and capabilities of map-reduce implementations such as Hadoop, yet perform operations that are not identical to map-reduce. The centerpiece of this exploration is a computational model that captures the essentials of the environment in which systems like Hadoop operate: files are unordered sets of tuples that can be read and/or written in parallel; processes are limited in the amount of input/output they can perform; and processors are available in essentially unlimited supply. We develop, in this model, an algorithm for sorting with a better worst-case running time than the obvious implementations of parallel sorting.
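To make the model concrete, here is a minimal sketch in Python of the "obvious" parallel sort the abstract alludes to: a map phase range-partitions the unordered tuples by key, and each reducer sorts its partition locally, so that concatenating the partitions yields a globally sorted result. This is not the report's improved algorithm, only the baseline it is compared against; all function and parameter names (partition_by_range, parallel_sort, p) are invented for illustration.

    import random

    def partition_by_range(tuples, boundaries):
        """Map phase: route each (key, value) tuple to the reducer whose
        key range contains it. Each partition is still an unordered set."""
        partitions = [[] for _ in range(len(boundaries) + 1)]
        for key, value in tuples:
            # Index of the first boundary the key falls below;
            # linear scan kept for clarity.
            idx = next((i for i, b in enumerate(boundaries) if key < b),
                       len(boundaries))
            partitions[idx].append((key, value))
        return partitions

    def parallel_sort(tuples, p):
        """Reduce phase: each of the p partitions is sorted independently
        (on a cluster, by p parallel processes); since the partitions cover
        disjoint, increasing key ranges, concatenation is globally sorted."""
        keys = sorted(k for k, _ in tuples)
        # Pick p-1 boundary keys at quantiles so partitions are roughly equal.
        # A real system would sample the input to choose boundaries rather
        # than sort every key up front.
        boundaries = [keys[len(keys) * i // p] for i in range(1, p)]
        result = []
        for part in partition_by_range(tuples, boundaries):
            result.extend(sorted(part))  # local sort: one reducer's work
        return result

    if __name__ == "__main__":
        data = [(random.randrange(10**6), i) for i in range(1000)]
        out = parallel_sort(data, p=4)
        assert [k for k, _ in out] == sorted(k for k, _ in data)
        print("sorted", len(out), "tuples across 4 simulated reducers")

In the report's cost model the interesting quantity is not wall-clock time but the input/output each process performs, so a practical version of this baseline (as in Hadoop's TeraSort) samples the input to choose boundaries that keep each reducer's share of that cost balanced.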

Item Type: Technical Report
Uncontrolled Keywords: cluster computing, map-reduce, sorting, merging, communication cost, parallelism
ID Code: 953
Deposited By: Prof. Jeffrey Ullman
Deposited On: 31 Dec 2009 07:31
Last Modified: 31 Dec 2009 07:31
