HTAP Workload Identification Method Based on Hybrid Feature Coding

CLC Number: TP311

Fund Project: National Key Research and Development Program of China (2022YFF0503900); Key Research and Development Program of Shandong Province (2021CXGC010104); Major Project of the Institute of Software, Chinese Academy of Sciences (ISCAS-ZD-202401, ISCAS-ZD-202403)

    Abstract:

    HTAP databases support both OLTP and OLAP workloads within a single system. Workload identification is the key to routing and dispatching queries during execution: only when a query is correctly identified as OLTP or OLAP can it be optimized appropriately and allocated suitable resources. Accurate identification of workload types is therefore one of the key factors in HTAP database performance. However, existing workload identification methods distinguish workloads mainly by rules applied to SQL statements, by cost estimates, or by traditional machine learning. These methods neither consider the inherent characteristics of query statements nor exploit the structural information in execution plans, which limits identification accuracy. To improve accuracy, this study proposes an intelligent method for identifying OLTP and OLAP workloads. The method extracts and encodes features from both SQL statements and execution plans: a SQL statement encoder is built on BERT, an execution plan encoder combines a tree convolutional neural network with an attention mechanism, and the two kinds of features are fused to build a classifier. The resulting model identifies the workload type of queries in mixed HTAP workloads. Experiments show that the model identifies OLTP and OLAP workloads with high accuracy. In addition, the robustness of the model is validated on multiple datasets, and the model is integrated into the TiDB database to verify the performance improvement it brings.
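
The two-branch structure described in the abstract (one encoder for the SQL text, one for the execution plan tree, then fusion into a classifier) can be illustrated with a toy sketch. This is not the authors' model. Everything here is an assumption made for illustration: keyword flags stand in for the BERT encoder, a single hand-weighted tree convolution with softmax attention pooling stands in for the learned plan encoder, and the names `OPERATORS`, `KEYWORDS`, `PlanNode`, `encode_sql`, `tree_conv`, `attention_pool`, `encode_plan`, and `classify`, along with all weights, are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import List

# Hypothetical vocabularies, chosen only for this illustration.
OPERATORS = ["PointGet", "TableScan", "HashJoin", "HashAgg", "Sort"]
KEYWORDS = ["join", "group by", "order by", "sum(", "insert", "update"]


@dataclass
class PlanNode:
    """A node of an execution plan tree (operator name plus children)."""
    op: str
    children: List["PlanNode"] = field(default_factory=list)


def encode_sql(sql: str) -> List[float]:
    """Stand-in for the BERT-based SQL encoder: keyword presence flags."""
    text = sql.lower()
    return [1.0 if kw in text else 0.0 for kw in KEYWORDS]


def node_features(node: PlanNode) -> List[float]:
    """One-hot encoding of the node's operator."""
    return [1.0 if node.op == op else 0.0 for op in OPERATORS]


def tree_conv(node: PlanNode, w_self: float = 1.0, w_child: float = 0.5) -> List[List[float]]:
    """One simplified tree-convolution layer.

    Each node mixes its own features with the sum of its children's
    features, followed by ReLU. Returns one vector per node (pre-order).
    """
    child_sum = [0.0] * len(OPERATORS)
    for child in node.children:
        for i, value in enumerate(node_features(child)):
            child_sum[i] += value
    own = node_features(node)
    out = [max(0.0, w_self * a + w_child * b) for a, b in zip(own, child_sum)]
    result = [out]
    for child in node.children:
        result.extend(tree_conv(child, w_self, w_child))
    return result


def attention_pool(vectors: List[List[float]], query: List[float]) -> List[float]:
    """Softmax attention over node vectors, returning their weighted mean."""
    scores = [sum(q * v for q, v in zip(query, vec)) for vec in vectors]
    peak = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(vectors[0])
    return [sum(w * vec[i] for w, vec in zip(weights, vectors)) / total for i in range(dim)]


def encode_plan(root: PlanNode) -> List[float]:
    """Plan encoder: tree convolution, then attention pooling."""
    query = [0.0, 0.5, 1.0, 1.0, 0.5]  # hand-set; a real model would learn this
    return attention_pool(tree_conv(root), query)


def classify(sql: str, plan: PlanNode) -> float:
    """Fuse both encodings by concatenation and return a toy OLAP score in (0, 1)."""
    fused = encode_sql(sql) + encode_plan(plan)
    # Hand-set logistic weights: six for SQL features, five for plan features.
    weights = [1.0, 1.0, 0.5, 1.0, -1.0, -1.0, -1.5, 0.5, 1.0, 1.0, 0.5]
    z = -0.5 + sum(w * x for w, x in zip(weights, fused))
    return 1.0 / (1.0 + math.exp(-z))


oltp_sql = "UPDATE accounts SET balance = balance - 10 WHERE id = 42"
oltp_plan = PlanNode("PointGet")
olap_sql = ("SELECT region, SUM(amount) FROM orders JOIN customers "
            "ON orders.cid = customers.id GROUP BY region")
olap_plan = PlanNode("HashAgg", [PlanNode("HashJoin", [PlanNode("TableScan"), PlanNode("TableScan")])])

oltp_score = classify(oltp_sql, oltp_plan)
olap_score = classify(olap_sql, olap_plan)
print(f"OLTP example score: {oltp_score:.3f}, OLAP example score: {olap_score:.3f}")
```

With these hand-set weights, the point update scores below 0.5 and the join-plus-aggregation query scores above it. In a trained model the encoders and the classifier weights would be learned from labeled queries rather than fixed by hand.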

Cite this article

杨建文, 丁治明, 严瑾, 张秋鸿, 朱美玲. HTAP Workload Identification Method Based on Hybrid Feature Coding. Journal of Software (软件学报): 1-18

History
  • Received: 2024-06-18
  • Last revised: 2024-12-28
  • Published online: 2025-11-26