MapReduce大数据处理平台与算法研究进展
作者:
作者单位:

作者简介:

通讯作者:

中图分类号:

TP311

基金项目:

国家自然科学基金(61672143,61433008,61402090,61502090)


Research Advance on MapReduce Based Big Data Processing Platforms and Algorithms
Author:
Affiliation:

Fund Project:

  • 摘要
  • |
  • 图/表
  • |
  • 访问统计
  • |
  • 参考文献
  • |
  • 相似文献
  • |
  • 引证文献
  • |
  • 资源附件
  • |
  • 文章评论
    摘要:

    综述了近年来基于MapReduce编程模型的大数据处理平台与算法的研究进展.首先介绍了12个典型的基于MapReduce的大数据处理平台,分析对比它们的实现原理和适用场景,抽象其共性;随后介绍基于MapReduce的大数据分析算法,包括搜索算法、数据清洗/变换算法、聚集算法、连接算法、排序算法、偏好查询、最优化算法、图算法、数据挖掘算法,将这些算法按照MapReduce实现方式分类,分析影响算法性能的因素;最后,将大数据处理算法抽象为外存算法,并对外存算法的特征加以梳理,提出了普适的外存算法性能优化方法的研究思路和问题,以供研究人员参考.具体包括优化外存算法的磁盘I/O、优化外存算法的局部性以及设计增量式迭代算法.现有的大数据处理平台和算法研究多集中在基于资源分配和任务调度的平台动态性能优化、特定算法并行化、特定算法性能优化等领域,所提出的外存算法性能优化属于静态优化方法,是现有研究的良好补充,为研究人员提供了广阔的研究空间.

    Abstract:

    This paper introduces the research advance on MapReduce based big data processing platforms. Frist, twelve typical MapReduce based data processing platforms are descripted, their implementation principles and application areas are compared, and their commonalities are concluded. Second, the MapReduce based big data processing algorithms, including search algorithms, data cleansing/transformation algorithms, aggregation algorithms, join algorithms, sorting algorithms, optimization algorithms, preference query algorithms, graph algorithms, and data mining algorithms, are studied. These algorithms are classified by their MapReduce implementations, and the factors that affect their performance are analyzed. Finally, big data processing algorithms are abstracted as the out-of-core algorithms whose performance features are well analyzed. The considerations, ideas and challenges of universal optimizations on the performance of out-of-core algorithms are proposed as references for researchers. These optimizations include optimizing algorithms' I/O cost and locality, and designing incremental iterative algorithms. Comparing the current topics, such as resource allocation and task scheduling based dynamic optimizations on platform, parallelization for specific algorithms, and performance optimizations on iterative algorithms, the proposed static optimizations serve as complements that highlight new areas for the researchers.

    参考文献
    相似文献
    引证文献
引用本文

宋杰,孙宗哲,毛克明,鲍玉斌,于戈. MapReduce大数据处理平台与算法研究进展.软件学报,2017,28(3):514-543

复制
分享
文章指标
  • 点击次数:
  • 下载次数:
  • HTML阅读次数:
  • 引用次数:
历史
  • 收稿日期:2016-08-01
  • 最后修改日期:2016-09-14
  • 录用日期:
  • 在线发布日期: 2018-06-06
  • 出版日期:
您是第位访问者
版权所有:中国科学院软件研究所 京ICP备05046678号-3
地址:北京市海淀区中关村南四街4号,邮政编码:100190
电话:010-62562563 传真:010-62562533 Email:jos@iscas.ac.cn
技术支持:北京勤云科技发展有限公司

京公网安备 11040202500063号