Archive for August 3rd, 2007



Google combines its distributed storage architecture with distributed execution of the software that parses and analyzes the stored data.


To keep software developers from spending too much time on the arcana of distributed programming, Google invented MapReduce as a way of simplifying the process.


MapReduce takes programming instructions and assigns them to be executed in parallel on many computers. It breaks calculations into two parts—a first stage, which produces a set of intermediate results, and a second, which computes a final answer.

The concept comes from functional programming languages such as Lisp (Google’s version is implemented in C++, with interfaces to Java and Python).
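The functional-language lineage can be seen in Python's own built-ins, which expose the same two primitives by name. A minimal sketch (this uses Python's standard `map` and `functools.reduce`, not Google's API):

```python
from functools import reduce

words = ["to", "be", "or", "not", "to", "be"]

# Map: apply a function independently to every element.
lengths = list(map(len, words))

# Reduce: fold the intermediate results into a single answer.
total = reduce(lambda a, b: a + b, lengths)

print(lengths)  # [2, 2, 2, 3, 2, 2]
print(total)    # 13
```

Because each `map` call is independent of the others, the map stage can be spread across many machines; only the fold at the end needs to see the combined results.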


One example, from a Google developer presentation, shows how the phrase “to be or not to be” would move through this process.

[Figure omitted: the presentation's diagram of the word-count example for "to be or not to be" could not be displayed.]
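The example can be sketched in code. In this minimal word-count sketch, the map stage emits an intermediate (word, 1) pair for each word, and the reduce stage sums the pairs by key; the function names and single-process setup are illustrative assumptions, not Google's actual C++ implementation:

```python
from collections import defaultdict

def map_phase(text):
    # First stage: emit an intermediate (word, 1) pair for every word.
    return [(word, 1) for word in text.split()]

def reduce_phase(pairs):
    # Second stage: group the intermediate pairs by word and sum the counts.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

intermediate = map_phase("to be or not to be")
result = reduce_phase(intermediate)

print(intermediate)  # [('to', 1), ('be', 1), ('or', 1), ('not', 1), ('to', 1), ('be', 1)]
print(result)        # {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```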


MapReduce includes its own middleware—server software that automatically breaks computing jobs apart and puts them back together. This is similar to the way a Java programmer relies on the Java Virtual Machine to handle memory management, in contrast with languages like C++ that make the programmer responsible for manually allocating and releasing computer memory. In the case of MapReduce, the programmer is freed from defining how a computation will be divided among the servers in a Google cluster.
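What the middleware automates can be illustrated with a toy driver: it splits the input among "workers," runs the map tasks independently, shuffles the intermediate pairs by key, and runs one reduce per key. Everything here (names, sequential execution standing in for a cluster) is an illustrative assumption, not Google's implementation:

```python
from collections import defaultdict

def run_mapreduce(inputs, mapper, reducer, n_workers=3):
    # 1. Partition the input into roughly equal chunks, one per "worker".
    chunks = [inputs[i::n_workers] for i in range(n_workers)]

    # 2. Map phase: each worker processes its chunk independently
    #    (run sequentially here; on a cluster these run in parallel).
    intermediate = []
    for chunk in chunks:
        for item in chunk:
            intermediate.extend(mapper(item))

    # 3. Shuffle: group intermediate values by key.
    groups = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)

    # 4. Reduce phase: one reduce call per distinct key.
    return {key: reducer(key, values) for key, values in groups.items()}

result = run_mapreduce(
    ["to be", "or not", "to be"],
    mapper=lambda line: [(w, 1) for w in line.split()],
    reducer=lambda key, values: sum(values),
)
print(result)  # {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```

The programmer supplies only the `mapper` and `reducer`; the partitioning, shuffling, and scheduling in steps 1–4 are exactly the parts the middleware takes off the programmer's hands.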


Related Stuff


What Other CIOs Can Learn from Google

Google’s Beginnings

Why Parallel Processing Makes Sense

Behind The Google File System

How Google Reduces Complexity

Google’s Secret Arsenal

Would Google’s File System Work for You?

Inside Google’s Enterprise

