National Natural Science Foundation of China (62232009, 61925205, 62072261)
The virtualization, high availability, and elastic scheduling of cloud infrastructure give cloud databases several advantages: they are usable out of the box, reliable and available, and billed on a pay-as-you-go basis. By architecture, cloud databases fall into two categories: cloud-hosted databases and cloud-native databases. A cloud-hosted database deploys the database system directly in a virtual machine environment on the cloud, which offers low cost, easy operation and maintenance, and high reliability. Building on this, a cloud-native database takes full advantage of the elastic scaling of cloud infrastructure: it adopts a disaggregated compute and storage architecture so that computing and storage resources scale independently, which further improves the cost-performance ratio of the database. However, the disaggregated compute and storage architecture poses new challenges to the design of database systems. This survey analyzes the architectures and technologies of cloud-native database systems in depth. First, the architectures of cloud-native online transaction processing (OLTP) and online analytical processing (OLAP) databases are each classified and analyzed according to their resource disaggregation mode, and the advantages and limitations of each class of architecture are compared. Second, on the basis of the disaggregated compute and storage architecture, the survey examines the key technologies of cloud-native databases by functional module. These include the key technologies of cloud-native OLTP (data organization, replica consistency, primary-standby synchronization, failure recovery, and mixed workload processing) and those of cloud-native OLAP (storage management, query processing, serverless-aware compute, data protection, and machine learning optimization). Finally, the survey summarizes the technical challenges facing existing cloud-native databases and outlines directions for future research.