Abstract: In the era of large models, the widespread adoption of vector databases has driven rapid growth in the scale of vector indexes. Efficiently supporting large-scale vector updates in disk-based vector indexes while maintaining high query performance has therefore become an important research problem in recent years. FreshDiskANN, a leading algorithm in this setting, suffers from a query throughput bottleneck and high tail latency under mixed query-update workloads. Inspired by the successful application of log-structured merge (LSM) trees to secondary indexes, this paper proposes LSMDiskANN, an update-friendly, disk-resident vector index framework based on the LSM paradigm. Building on the FreshDiskANN architecture, LSMDiskANN is designed and implemented as a three-level structure that adds an intermediate on-disk level. In addition, a dynamic parameter selection mechanism for searching disk components and a re-layout strategy for the deletion phase of compaction are introduced to further reduce query latency and the I/O overhead of merges. Experimental results on multiple large-scale, high-dimensional datasets show that query throughput improves by up to 35.5%, update throughput by up to 14.24%, and tail query latency is reduced by up to 73.45%. The proposed framework and strategies effectively improve overall performance and stability under mixed workloads.
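To make the LSM-style layout more concrete, the following is a minimal Python sketch of a three-level vector index with lazy deletes. It is an illustration under stated assumptions, not the LSMDiskANN implementation. The class and method names, the level capacities, and the tombstone handling are hypothetical. Each level uses a brute-force scan in place of the DiskANN graph search, and nothing is actually written to disk. The sketch also omits the dynamic parameter selection and the re-layout strategy. What it shows is the general LSM pattern: writes go to the newest level, full levels are merged downward, a query searches every level and keeps only the newest copy of each id, and deleted ids are purged when compaction reaches the last level.

```python
import heapq
import math


class Component:
    """One level of the index. A brute-force scan stands in for the
    graph-based search a real disk-resident index would run (assumption)."""

    def __init__(self):
        self.vectors = {}  # vector id -> coordinates (tuple of floats)

    def search(self, query, k, skip):
        # Return up to k (distance, id) pairs, ignoring ids in `skip`.
        cands = ((math.dist(query, v), vid)
                 for vid, v in self.vectors.items() if vid not in skip)
        return heapq.nsmallest(k, cands)


class LSMVectorIndex:
    """Hypothetical three-level layout: L0 in memory, L1 an intermediate
    level, L2 the large main index. Nothing here touches real disk."""

    def __init__(self, l0_capacity=4, l1_capacity=16):
        self.levels = [Component(), Component(), Component()]
        self.capacity = [l0_capacity, l1_capacity]
        self.tombstones = set()  # deleted ids not yet purged by compaction

    def insert(self, vid, vec):
        self.tombstones.discard(vid)  # re-inserting revives the id
        self.levels[0].vectors[vid] = tuple(vec)
        if len(self.levels[0].vectors) >= self.capacity[0]:
            self._merge(0, 1)
        if len(self.levels[1].vectors) >= self.capacity[1]:
            self._merge(1, 2)

    def delete(self, vid):
        # Deletes are lazy: record a tombstone and filter at query time.
        self.tombstones.add(vid)

    def _merge(self, src, dst):
        # Newer entries from `src` overwrite older copies in `dst`.
        self.levels[dst].vectors.update(self.levels[src].vectors)
        self.levels[src].vectors.clear()
        if dst == len(self.levels) - 1:
            # L1 only fills up right after L0 is flushed, so L0 and L1 are
            # both empty here and every surviving copy lives in L2; the
            # tombstoned ids can now be physically removed.
            for vid in self.tombstones:
                self.levels[dst].vectors.pop(vid, None)
            self.tombstones.clear()

    def search(self, query, k):
        results, shadowed = [], set(self.tombstones)
        for level in self.levels:  # newest level first
            results.extend(level.search(query, k, shadowed))
            shadowed.update(level.vectors)  # older copies of these ids are stale
        return [vid for _, vid in heapq.nsmallest(k, results)]
```

For example, after inserting ids 0 to 9 at points `(i, 0.0)` with `l0_capacity=2` and `l1_capacity=4`, ids 0 to 7 have been compacted into L2 and ids 8 and 9 sit in L1. Deleting id 3 then hides it from queries immediately, although it is only removed physically at the next compaction into L2. Searching every level on each query is what keeps recent updates visible without rewriting the large main index on every write. The cost is extra per-query work, which is the kind of overhead the abstract's per-component search tuning targets.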