Inventors:
Grzegorz Malewicz - Mountain View CA, US
Marian Dvorsky - Sunnyvale CA, US
Christopher B. Colohan - Palo Alto CA, US
Derek P. Thomson - Palo Alto CA, US
Joshua Louis Levenberg - Menlo Park CA, US
Assignee:
Google Inc. - Mountain View CA
International Classification:
G06F 7/38, G06F 9/00, G06F 9/44, G06F 15/00
Abstract:
A large-scale data processing system and method including a plurality of processes, wherein a master process assigns input data blocks to respective map processes, and partitions of intermediate data are assigned to respective reduce processes. In each of the plurality of map processes, an application-independent map program retrieves a sequence of input data blocks assigned thereto by the master process, applies an application-specific map function to each input data block in the sequence to produce the intermediate data, and stores the intermediate data in high-speed memory of the interconnected processors. Each of the plurality of reduce processes receives a respective partition of the intermediate data from the high-speed memory of the interconnected processors while the map processes continue to process input data blocks, and an application-specific reduce function is applied to the respective partition of the intermediate data to produce output values.
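
The data flow described in the abstract can be sketched in a few lines of Python. This is only an illustrative, single-machine approximation and not the patented implementation: threads stand in for map and reduce processes, in-memory queues stand in for the high-speed memory of the interconnected processors, and a shared work queue stands in for the master's assignment of input blocks. The names `run_job`, `word_count_map`, and `word_count_reduce`, and the hash-based partitioning, are assumptions introduced for this example.

```python
import queue
import threading
from collections import defaultdict


def run_job(blocks, map_fn, reduce_fn, num_maps=2, num_reduces=2):
    """Hypothetical master: hands out blocks and wires maps to reduces."""
    work = queue.Queue()
    for block in blocks:
        work.put(block)

    # One in-memory queue per partition of intermediate data (assumption:
    # a stand-in for the "high speed memory" named in the abstract).
    partitions = [queue.Queue() for _ in range(num_reduces)]
    done = object()  # sentinel telling a reducer that all maps have finished
    results = {}
    results_lock = threading.Lock()

    def map_worker():
        # Application-independent loop: pull a sequence of blocks and apply
        # the application-specific map function to each one.
        while True:
            try:
                block = work.get_nowait()
            except queue.Empty:
                return
            for key, value in map_fn(block):
                # Assumed partitioning rule: hash of the key.
                partitions[hash(key) % num_reduces].put((key, value))

    def reduce_worker(partition):
        # Receives intermediate data while map workers are still running.
        grouped = defaultdict(list)
        while True:
            item = partition.get()
            if item is done:
                break
            key, value = item
            grouped[key].append(value)
        output = {k: reduce_fn(k, vs) for k, vs in grouped.items()}
        with results_lock:
            results.update(output)

    reducers = [threading.Thread(target=reduce_worker, args=(p,)) for p in partitions]
    mappers = [threading.Thread(target=map_worker) for _ in range(num_maps)]
    for t in reducers + mappers:
        t.start()
    for t in mappers:
        t.join()
    for p in partitions:
        p.put(done)
    for t in reducers:
        t.join()
    return results


def word_count_map(block):
    # Application-specific map function: emit (word, 1) for each word.
    for word in block.split():
        yield word, 1


def word_count_reduce(key, values):
    # Application-specific reduce function: sum the counts for one word.
    return sum(values)


counts = run_job(["a b a", "b c", "a"], word_count_map, word_count_reduce)
```

In this sketch the reduce workers start before the map workers finish, so intermediate pairs are consumed as they are produced; the final reduce function is applied only after the sentinel signals that no more intermediate data will arrive for that partition.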