BDMasker: Dynamic Data Protection System for Open Big Data Environment
Corresponding author: 牛家浩 (Niu Jiahao), E-mail: niu.jiahao@zte.com.cn

Fund project: National Key Research and Development Program of China (2021YFB3101100)


    Abstract:

    Big data has become a basic national strategic resource, and the open sharing of data is the core of China's big data strategy. Cloud-native technology and the lake-house architecture are reshaping big data infrastructure and promoting data sharing and value dissemination. The development of the big data industry and its technologies requires stronger data security and data sharing capabilities. However, data security in open environments has become a bottleneck that restricts the development and utilization of big data technology. Data security and privacy protection issues have become increasingly prominent in both the open-source big data ecosystem and commercial big data systems. Dynamic data protection systems in open big data environments face challenges in data availability, processing efficiency, and system scalability. This paper proposes BDMasker, a dynamic data protection system for open big data environments. Through precise query analysis and query rewriting based on a query dependency model, BDMasker accurately perceives, but does not change, the original business request, so that the entire dynamic data masking process has zero impact on the business. Its unified multi-engine security policy framework enables vertical extension of dynamic data protection capabilities and horizontal extension across multiple computing engines. BDMasker also uses the distributed computing capability of the big data execution engine to improve its data protection processing performance. Experimental results show that the precise SQL analysis and rewriting technology proposed by BDMasker is effective, that the system has good scalability and performance, and that the overall performance fluctuation stays within 3% in the TPC-DS and YCSB benchmark tests.
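    The abstract says BDMasker masks data by rewriting queries while leaving the result the business request expects unchanged. The query dependency model itself is not described here, so the following is only a minimal Python sketch under stated assumptions: the query is given as a single-table projection instead of parsed SQL, and MASKING_POLICY, rewrite_projection, the table and the masking rules are all hypothetical. It uses the standard-library sqlite3 module to show that a rewritten query returns the same column names as the original while the sensitive values come back masked.

```python
import sqlite3

# Hypothetical masking policy: sensitive column -> SQL expression template.
# "{col}" is replaced by the quoted column name. These rules are
# illustrative assumptions, not BDMasker's actual policy format.
MASKING_POLICY = {
    "phone": "substr({col}, 1, 3) || '****' || substr({col}, -4)",
    "name": "substr({col}, 1, 1) || '**'",
}


def rewrite_projection(table, columns, policy):
    """Build a SELECT in which sensitive columns are wrapped in masking
    expressions but keep their original output names (via AS), so the
    result schema seen by the caller is unchanged."""
    items = []
    for col in columns:
        quoted = f'"{col}"'
        template = policy.get(col)
        if template is None:
            items.append(quoted)  # non-sensitive column: pass through as-is
        else:
            items.append(f"{template.format(col=quoted)} AS {quoted}")
    return f'SELECT {", ".join(items)} FROM "{table}"'


# Hypothetical demo table held in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, name TEXT, phone TEXT)")
conn.execute("INSERT INTO customer VALUES (1, 'Alice', '13812345678')")

cols = ["id", "name", "phone"]
original = 'SELECT "id", "name", "phone" FROM "customer"'
rewritten = rewrite_projection("customer", cols, MASKING_POLICY)

orig_cur = conn.execute(original)
masked_cur = conn.execute(rewritten)
# Same output column names as the original query.
print([d[0] for d in orig_cur.description] == [d[0] for d in masked_cur.description])  # True
print(masked_cur.fetchall())  # [(1, 'A**', '138****5678')]
```

    This toy only handles direct projections. Tracking sensitive columns through joins, subqueries, aliases and expressions would require dependency analysis of the kind the abstract attributes to its query dependency model.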

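Cite this article

Tu Yaofeng, Niu Jiahao, Wang Dezheng, Gao Hong, Xu Jin, Hong Ke, Yang Fang. BDMasker: Dynamic Data Protection System for Open Big Data Environment. Journal of Software, 2023, 34(3): 0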

History
  • Received: 2022-05-14
  • Last revised: 2022-09-07
  • Published online: 2022-10-26
Copyright: Institute of Software, Chinese Academy of Sciences