Vector Index Recall Optimization for Batch Updates

doi:10.13328/j.cnki.jos.007519

微信小程序

微信服务号

微信订阅号

Home > Archive>Volume 37, Issue 3, 2026 >1084-1103. DOI:10.13328/j.cnki.jos.007519

PDF HTML XML Export Cite reminder

Vector Index Recall Optimization for Batch Updates
DOI:
                        10.13328/j.cnki.jos.007519
                    
Author:
                        
                        
                    
Affiliation:
Clc Number:TP311
Fund Project:

Article

Figures

Metrics

Reference

Cited by

Materials

Comments

Abstract:

Approximate nearest neighbor search (ANNS) is a foundational technology supporting applications such as vector databases, recommendation systems, and large language models (LLMs). Among these, the hierarchical navigable small world (HNSW) graph indexing technique constructs a hierarchical structure to quickly locate results within the target region, thus achieving high retrieval recall at low computational cost. However, existing HNSW algorithms are primarily designed for static data retrieval scenarios and fail to account for the impact of data updates on retrieval performance. Through research on real-world datasets, it is found that data in vector databases is typically updated in batches, and their similar characteristics weaken the effectiveness of heuristic pruning in HNSW algorithms and lead to sparsification issues in the connections among similar vectors, collectively causing a significant decline in retrieval recall. To address these issues, this study proposes an adaptive fine-grained pruning strategy based on local adjustments to the graph structure and constructs a comprehensive optimization scheme that integrates an identification and repair mechanism. First, in the identification phase, the regional neighbor distance is calculated to quantify local topological density, thereby precisely locating the dense regions requiring intervention. Second, in the repair phase, for hub nodes in dense regions, a dual pruning neighbor selection strategy is adopted: native and modified heuristic pruning rules are applied synergistically, and the results of both rules are merged to enhance neighbor connection diversity while maintaining retrieval accuracy, effectively alleviating over-pruning and connection sparsification issues. Experimental results on multiple public datasets show that the proposed method demonstrates good adaptability in scenarios with frequent data updates, achieving a 1%–4% improvement in recall while maintaining stable query latency and throughput.

Reference

Cited by

Get Citation

王可,胡思劼,胡卉芪,赵明昊,魏星,屠要峰,周烜.面向批量更新的向量索引召回率优化.软件学报,2026,37(3):1084-1103

Copy

Article Metrics

Abstract:
PDF:
HTML:
Cited by:

History

Received:May 07,2025
Revised:June 30,2025
Adopted:
Online: September 02,2025
Published: March 06,2026

You are the firstVisitors
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-4
Address：4# South Fourth Street, Zhong Guan Cun, Beijing 100190,Postal Code：100190
Phone：010-62562563 Fax：010-62562533 Email：jos@iscas.ac.cn
Technical Support：Beijing Qinyun Technology Development Co., Ltd.

Beijing Public Network Security No. 11040202500063