Comparative Study on MapReduce and Spark for Big Data Analytics
Fund Project:

National Key Research and Development Program of China (2016YFB1000901); National Natural Science Foundation of China (91746209); Program for Changjiang Scholars and Innovative Research Team in University (PCSIRT) of the Ministry of Education (IRT17R3)

    Abstract:

    This paper reviews two state-of-the-art algorithmic architectures, MapReduce and Spark, and compares their backgrounds, principles, and application scenarios. The advantages and corresponding limitations of the two architectures are summarized. For non-iterative problems, MapReduce, by virtue of its task scheduling strategy and shuffle mechanism, performs better than Spark in terms of intermediate data transfers and the number of files. Spark is suited to iterative problems and low-latency workloads, because it divides a computing task according to the dependencies between the data and the task. Compared with MapReduce, Spark can effectively reduce the number of intermediate data transmissions and the number of synchronizations, thereby improving the running efficiency of the computing system.
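
    The claim about iterative problems can be illustrated with a toy model. The sketch below is not Hadoop or Spark code and is not taken from the paper; the `Storage` class, the `step` function, and both runner functions are hypothetical names introduced here. It assumes that a MapReduce-style iterative job runs one job per iteration, reading its input from shared storage and writing its output back, while a Spark-style job loads the data once, keeps intermediate results in memory, and writes only the final result.

```python
# Toy model (assumption: not real Hadoop or Spark code). It only counts how
# often an iterative job touches shared storage under two execution styles.


class Storage:
    """Hypothetical stand-in for a distributed file system; counts accesses."""

    def __init__(self, data):
        self.data = list(data)
        self.reads = 0
        self.writes = 0

    def read(self):
        self.reads += 1
        return list(self.data)

    def write(self, data):
        self.writes += 1
        self.data = list(data)


def step(values):
    """One iteration of an arbitrary iterative computation (halve each value)."""
    return [v / 2 for v in values]


def run_job_per_iteration(storage, iterations):
    """MapReduce-style assumption: each iteration is a separate job that
    reads its input from storage and writes its output back."""
    result = list(storage.data)
    for _ in range(iterations):
        result = step(storage.read())
        storage.write(result)
    return result


def run_in_memory(storage, iterations):
    """Spark-style assumption: read once, keep intermediate results in
    memory across iterations, write the final result once."""
    values = storage.read()
    for _ in range(iterations):
        values = step(values)
    storage.write(values)
    return values


if __name__ == "__main__":
    a, b = Storage([8.0, 4.0]), Storage([8.0, 4.0])
    run_job_per_iteration(a, 3)
    run_in_memory(b, 3)
    print("job per iteration:", a.reads, "reads,", a.writes, "writes")
    print("in memory:        ", b.reads, "reads,", b.writes, "writes")
```

    Under these assumptions both styles compute the same result, but the number of storage accesses grows with the iteration count only in the job-per-iteration style. The model says nothing about non-iterative workloads, where the abstract reports that MapReduce performs better.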

Citation:

Wu Xindong, Ji Shengwei. Comparative Study on MapReduce and Spark for Big Data Analytics. Journal of Software (软件学报), 2018, 29(6): 1770-1791

History
  • Received: October 19, 2017
  • Online: February 8, 2018
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-4