DGA Domain Name Detection Method Based on Double Branch Feature Extraction and Adaptive Capsule Network

CLC Number: TP393

Fund Project: National Natural Science Foundation of China (62201576, U1833107); Fundamental Research Funds for the Central Universities (3122022050); Open Fund of the Information Security Evaluation Center of Civil Aviation University of China (ISECCA-202202); Discipline Funding of Civil Aviation University of China
    Abstract:

    Existing detection methods for domain names produced by domain generation algorithms (DGAs) generally suffer from weak feature extraction and a high feature compression ratio, which lead to loss of feature information, damage to feature structure, and poor detection performance. To address these problems, a DGA domain name detection method based on double branch feature extraction and an adaptive capsule network is proposed. First, the original samples are reconstructed through sample cleaning and dictionary construction to generate a reconstructed sample set. Second, the reconstructed samples are processed by a double branch feature extraction network, in which a sliced pyramid network extracts local domain name features, a Transformer extracts global domain name features, and lightweight attention fuses the features from different levels. Third, an adaptive capsule network computes importance coefficients for the domain name feature maps, converts the textual domain name features into vector features, and computes a classification probability based on text features through feature transfer; in parallel, a multilayer perceptron processes the statistical features of the domain name to compute a classification probability based on statistical features. Finally, the two classification probabilities, obtained from these two perspectives, are combined to detect the domain name. Extensive experiments show that the proposed method achieves leading results in both DGA domain name detection and DGA domain name family classification: the F1-score increases by 0.76% to 5.57% in DGA domain name detection, and the macro-averaged F1-score increases by 1.79% to 3.68% in DGA domain name family classification.
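
    The family classification result is reported as a macro-averaged F1-score. As a reminder of what that metric measures, the following minimal Python sketch computes it from scratch: F1 is calculated per class and then averaged without weighting by class size, so a small DGA family counts as much as a large one. The function name `macro_f1` and the labels `benign`, `family_a`, and `family_b` are illustrative assumptions, not taken from the paper, whose evaluation code is not shown here.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Return the unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative
    scores = []
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels: one benign class and two made-up DGA families.
y_true = ["benign"] * 4 + ["family_a"] * 2 + ["family_b"] * 2
y_pred = ["benign", "benign", "benign", "family_a",
          "family_a", "family_a", "family_b", "benign"]
print(round(macro_f1(y_true, y_pred), 4))
```

    In this toy example the per-class F1 scores are 0.75, 0.8, and 2/3, and the macro average is their plain mean. A weighted or micro average would instead be dominated by the largest class.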

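Cite this article

Yang Hongyu (杨宏宇), Zhang Tao (章涛), Zhang Liang (张良), Cheng Xiang (成翔), Hu Ze (胡泽). DGA Domain Name Detection Method Based on Double Branch Feature Extraction and Adaptive Capsule Network. Journal of Software (软件学报), 2024, 35(8):0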

History
  • Received: 2023-09-10
  • Last revised: 2023-10-30
  • Published online: 2024-01-05