
![]()
Google’s distributed storage architecture for data is combined with distributed execution of the software that parses and analyzes it.
.
To keep software developers from spending too much time on the arcana of distributed programming, Google invented MapReduce as a way of simplifying the process.
.
MapReduce takes programming instructions and assigns them to be executed in parallel on many computers. It breaks calculations into two parts—a first stage, which produces a set of intermediate results, and a second, which computes a final answer.
The concept comes from functional programming languages such as Lisp (Google’s version is implemented in C++, with interfaces to Java and Python).
.
One example, from a Google developer presentation, shows how the phrase “to be or not to be” would move through this process.

.
MapReduce includes its own middleware—server software that automatically breaks computing jobs apart and puts them back together. This is similar to the way a Java programmer relies on the Java Virtual Machine to handle memory management, in contrast with languages like C++ that make the programmer responsible for manually allocating and releasing computer memory. In the case of MapReduce, the programmer is freed from defining how a computation will be divided among the servers in a Google cluster. More
.
Related Stuffs
What Other CIOs Can Learn from Google
Why Parallel Processing Makes Sense
Would Google’s File System Work for You?
•

Thanks for the post… I’m currently researching some distributed models and your post adds some “food for thought” and useful references.
@Jonas
Thanks as you liked the post and for your comment too.