PLTree:一个高性能持久化内存学习索引
DOI:
作者:
作者单位:

浙江大学计算机学院

作者简介:

通讯作者:

中图分类号:

基金项目:


PLTree: A High-Performance Learning Index For Persistent Memory
Author:
Affiliation:

Fund Project:

  • 摘要
  • |
  • 图/表
  • |
  • 访问统计
  • |
  • 参考文献
  • |
  • 相似文献
  • |
  • 引证文献
  • |
  • 资源附件
  • |
  • 文章评论
    摘要:

    持久化内存(Persistent Memory, PM)作为主存的补充和替代,为数据存储提供了相对较低的价格成本,并且保证了数据的持久化.为 PM 设计的传统结构索引(如 B+树等)未能充分利用数据分布特点来发挥索引在 PM 上的读写性能.最近的研究尝试利用学习索引的数据分布感知能力提升索引在 PM 上的读写性能并实现持久化.但在面对真实世界的数据时,现有基于 PM 的持久化学习索引的数据结构设计会导致额外的内存访问,从而影响读写性能.针对PM 学习索引在面对真实数据时读写性能下降的问题,本文提出了一种 DRAM/PM 混合架构的学习索引 PLTree.它通过以下方法提升了在 PM 上的读写性能并减轻数据分布颠簸对性能的影响:(1)使用两阶段方法构建索引消除内部节点的局部搜索,减少 PM 的访问.(2)利用模型搜索来优化 PM上的查找性能并通过在 DRAM 存储元数据加速查找.(3)根据 PM 的特性设计了日志式分层溢出缓存结构,优化写入性能.实验结果表明,在不同数据集上,与现有的持久化内存索引(APEX, FPTree,uTree,NBTree和DPTree)相比,PLTree 在索引构建性能上平均提升了约 1.9 倍~34 倍;单线程查询/插入性能平均提升了约1.26 倍~4.45 倍和 2.63 倍~6.83 倍;在多线程场景,查询/插入性能最高提升了约10.2 倍和 23.7 倍

    Abstract:

    Persistent Memory (PM) is a viable solution to address the limitations of high cost and low capacity in main memory while ensuring data persistence. However, traditional index structures designed for PM like B+-trees have failed to fully exploit the distribution characteristics of data for optimizing read/write performance on PM. Recent research has attempted to leverage the data distribution awareness of learned indexes to enhance PM's read/write performance and support index persistence.However,existing designs of persistent learned index structures suffer from additional PM accesses and poor performance when dealing with real-world data. To address the performance degradation issue of persistent learned indexes when facing real data distributions, this paper introduces PLTree, a DRAM/PM hybrid architecture persistent learned index. PLTree optimizes read/write performance under real data distributions by employing the following techniques: (1) a two-stage approach to construct the index, eliminating last mile search in internal nodes and reducing PM accesses, (2) model-based searching for efficient query performance on PM, accelerated by leveraging metadata stored in DRAM, and (3) a log-structured hierarchical overflow buffer structure tailored to PM characteristics, optimizing write performance. The results show that, compared to state-of-the-art index (APEX,FPTree,uTree,NBTree and DPTree), PLTree achieves 1.9x to 34x better performance in index construction on different datasets. In single-threaded scenarios, PLTree achieves an average query and insertion performance improvement of 1.26x to 4.45x and 2.63x to 6.83x, respectively. In multi-threaded scenarios, PLTree outperforms the baseline by up to 10.2x and 23.7x in query and insertion performance, respectively.

    参考文献
    相似文献
    引证文献
引用本文
分享
文章指标
  • 点击次数:
  • 下载次数:
  • HTML阅读次数:
  • 引用次数:
历史
  • 收稿日期:2023-12-11
  • 最后修改日期:2024-03-27
  • 录用日期:2024-04-02
  • 在线发布日期:
  • 出版日期:
您是第位访问者
版权所有:中国科学院软件研究所 京ICP备05046678号-3
地址:北京市海淀区中关村南四街4号,邮政编码:100190
电话:010-62562563 传真:010-62562533 Email:jos@iscas.ac.cn
技术支持:北京勤云科技发展有限公司

京公网安备 11040202500063号