Wednesday, April 5, 2017

what is MapReduce Algorithm

MapReduce is a programming model and an associated implementation for processing and generating big data sets with a paralleldistributed algorithm on a cluster

  • "Map" step: Each worker node applies the "map()" function to the local data, and writes the output to a temporary storage. A master node ensures that only one copy of redundant input data is processed.
  • "Shuffle" step: Worker nodes redistribute data based on the output keys (produced by the "map()" function), such that all data belonging to one key is located on the same worker node.
  • "Reduce" step: Worker nodes now process each group of output data, per key, in parallel.

credit goes to wikepedia

No comments:

Post a Comment