Journal of Software:2008.19(10):2573-2584

(数据工程与知识工程教育部重点实验室(中国人民大学),北京 100872; 中国石油信息技术服务中心,北京 100724)
Main Memory Database TPC-H Workload Characterization on Modern Processor
LIU Da-Wei,LUAN Hua,WANG Shan,QIN Biao
Chart / table
Similar Articles
Article :Browse 3875   Download 4201
Received:July 20, 2007    Revised:January 29, 2008
> 中文摘要: Ailamaki等人1999年研究了数据库管理系统(database management system,简称DBMS)在处理器上的时间开销分解.此后,相关研究集中在分析DBMS在处理器上的瓶颈.但这些研究工作均是在磁盘数据库DRDBs(disk resident databases)上开展的,而且都是分析DBMS上的TPC-C类负载.然而,随着硬件技术的进步,现代计算机的多级缓存结构(memory hierarchy)在逐渐地"上移".例如,容量越来越大的芯片内缓存(on-chip caches)和芯片外缓存(off-chip caches),容量越来越大的RAM,Flash Memory等等.为此,处理器负载分析的研究工作也应随之"上移".研究内存数据MMDBs(main memory resident databases)在计算密集型负载下的处理器行为特性.由于磁盘数据库的主要性能瓶颈是磁盘I/O,因而可以用索引、压缩等技术进行优化;然而,内存数据库的性能瓶颈却在于处理器和内存之间的数据交换.针对这一问题,首先分析了磁盘数据库和内存数据库在TPC-H负载下处理器性能瓶颈的差异,并给出了一些优化建议,提出了通过预取的优化方法.其次,通过实验比较了不同存储体系结构(行存储与列存储)对处理器利用率的差异,并探索了下一代内存数据库体系结构方面的解决方案.此外,还研究了索引结构对处理器多级缓存的影响,并给出了索引的优化建议.最后,提出一个微测试集用于评估内存数据库在DSS(decision support system)负载下处理器的性能及行为特性.研究结果会对运行于下一代处理器上的内存数据库体系结构设计和性能优化提供一定的实验依据.
Abstract:In 1999, the research of database systems' execution time breakdown on modern computer platforms has been analyzed by Ailamaki, et al. The primary motivation of these studies is to improve the performance of Disk Resident Databases (DRDBs), which form the main stream of database systems until now. The typical benchmark used in those studies is TPC-C. However, continuing hardware advancements have "moved-up" on the memory hierarchy, such as the larger and larger on-chip and off-chip caches, the steadily increasing RAM space, and the commercial availability of huge flash memory (solid-state disk) on top of regular disk, etc. To reflect such a trend, the target of workload characterization research along the memory hierarchy is also studied. This paper focuses on Main Memory Databases (MMDBs), and the TPC-H benchmark. Unlike the performance of DRDB which is I/O bound and may be optimized by high-level mechanisms such as indexing, the performance of MMDB is basically CPU and memory bound. In this study, the paper first compares the execution time breakdown of DRDB and MMDB, and the paper proposes an optimize strategy to optimize the memory resident aggregate. Then, the paper explores the difference between column-oriented and row-oriented storage models in CPU and cache utilization. Furthermore, the paper measures performance of MMDBs on different generational CPUs. In addition, the paper analyzes the index influence and gives a strategy for main memory database index optimization. Finally, the paper analyzes each query in the full TPC-H benchmark in detail, and obtains systematic results, which help design micro-benchmarks for further analysis of CPU cache stall. Results of this study are expected to benefit the performance optimization of MMDBs, and the architecture design memory-oriented databases of the next generation.
文章编号:     中图分类号:    文献标志码:
基金项目:Supported by the National Natural Science Foundation of China under Grant Nos.60496325, 60473069, 60503038 (国家自然科学基金); the Grant from HP Labs. China (国际合作(HP Lab.)项目) Supported by the National Natural Science Foundation of China under Grant Nos.60496325, 60473069, 60503038 (国家自然科学基金); the Grant from HP Labs. China (国际合作(HP Lab.)项目)
Foundation items:
Reference text:

刘大为,栾 华,王 珊,覃 飙.内存数据库在TPC-H负载下的处理器性能.软件学报,2008,19(10):2573-2584

LIU Da-Wei,LUAN Hua,WANG Shan,QIN Biao.Main Memory Database TPC-H Workload Characterization on Modern Processor.Journal of Software,2008,19(10):2573-2584