Google's MapReduce programming model serves for processing large data
sets in a massively parallel manner. We deliver the first rigorous
description of the model including its advancement as Google's
domain-specific language Sawzall. To this end, we reverse-engineer the
seminal papers on MapReduce and Sawzall, and we capture our findings
as an executable specification. We also identify and resolve some
obscurities in the informal presentation given in the seminal
papers. We use typed functional programming (specifically Haskell) as
a tool for design recovery and executable specification. Our
development comprises three components: (i) the basic program skeleton
that underlies MapReduce computations; (ii) the opportunities for
parallelism in executing MapReduce computations; (iii) the fundamental
characteristics of Sawzall's aggregators as an advancement of the
MapReduce approach. Our development does not formalize the more
implementational aspects of an actual, distributed execution of
MapReduce computations.
Keywords
Data processing; Parallel programming;
Distributed programming; Software design; Executable specification;
Typed functional programming; MapReduce; Sawzall; Map; Reduce; List
homomorphism; Haskell.
Bibtex entry
@article{MapReduce,
author = "Ralf L{\"a}mmel",
title = "{Google's MapReduce Programming Model -- Revisited}",
journal = "Science of Computer Programming",
year = 2008,
}