PLTree:一个高性能持久化内存学习索引

微信服务号

微信订阅号

PLTree:一个高性能持久化内存学习索引
DOI:
                        
                    
作者:
                        
                        
                    
作者单位:浙江大学计算机学院
作者简介:
通讯作者:
中图分类号:
基金项目:

PLTree: A High-Performance Learning Index For Persistent Memory

Author:

Affiliation:

Fund Project:

摘要

图/表

访问统计

参考文献

相似文献

引证文献

资源附件

文章评论

摘要:

持久化内存(Persistent Memory, PM)作为主存的补充和替代,为数据存储提供了相对较低的价格成本,并且保证了数据的持久化.为 PM 设计的传统结构索引(如 B+树等)未能充分利用数据分布特点来发挥索引在 PM 上的读写性能.最近的研究尝试利用学习索引的数据分布感知能力提升索引在 PM 上的读写性能并实现持久化.但在面对真实世界的数据时,现有基于 PM 的持久化学习索引的数据结构设计会导致额外的内存访问,从而影响读写性能.针对PM 学习索引在面对真实数据时读写性能下降的问题,本文提出了一种 DRAM/PM 混合架构的学习索引 PLTree.它通过以下方法提升了在 PM 上的读写性能并减轻数据分布颠簸对性能的影响:(1)使用两阶段方法构建索引消除内部节点的局部搜索,减少 PM 的访问.(2)利用模型搜索来优化 PM上的查找性能并通过在 DRAM 存储元数据加速查找.(3)根据 PM 的特性设计了日志式分层溢出缓存结构,优化写入性能.实验结果表明,在不同数据集上,与现有的持久化内存索引(APEX, FPTree,uTree,NBTree和DPTree)相比,PLTree 在索引构建性能上平均提升了约 1.9 倍~34 倍;单线程查询/插入性能平均提升了约1.26 倍~4.45 倍和 2.63 倍~6.83 倍;在多线程场景,查询/插入性能最高提升了约10.2 倍和 23.7 倍

Abstract:

Persistent Memory (PM) is a viable solution to address the limitations of high cost and low capacity in main memory while ensuring data persistence. However, traditional index structures designed for PM like B+-trees have failed to fully exploit the distribution characteristics of data for optimizing read/write performance on PM. Recent research has attempted to leverage the data distribution awareness of learned indexes to enhance PM's read/write performance and support index persistence.However,existing designs of persistent learned index structures suffer from additional PM accesses and poor performance when dealing with real-world data. To address the performance degradation issue of persistent learned indexes when facing real data distributions, this paper introduces PLTree, a DRAM/PM hybrid architecture persistent learned index. PLTree optimizes read/write performance under real data distributions by employing the following techniques: (1) a two-stage approach to construct the index, eliminating last mile search in internal nodes and reducing PM accesses, (2) model-based searching for efficient query performance on PM, accelerated by leveraging metadata stored in DRAM, and (3) a log-structured hierarchical overflow buffer structure tailored to PM characteristics, optimizing write performance. The results show that, compared to state-of-the-art index (APEX,FPTree,uTree,NBTree and DPTree), PLTree achieves 1.9x to 34x better performance in index construction on different datasets. In single-threaded scenarios, PLTree achieves an average query and insertion performance improvement of 1.26x to 4.45x and 2.63x to 6.83x, respectively. In multi-threaded scenarios, PLTree outperforms the baseline by up to 10.2x and 23.7x in query and insertion performance, respectively.

参考文献

相似文献

引证文献

引用本文

复制

文章指标

点击次数:
下载次数:
HTML阅读次数:
引用次数:

历史

收稿日期:2023-12-11
最后修改日期:2024-03-27
录用日期:2024-04-02
在线发布日期:
出版日期:

微信服务号

微信订阅号

引用本文

分享

文章指标

历史