Multi-modal 3D Object Detection Method for Traffic Scenarios Based on Two-stage Feedback

Author:
Affiliation:

Fund Project: National Natural Science Foundation of China (62473307)

Abstract:

Recent advances in intelligent driving technology are concentrated in the environmental perception layer, where sensor data fusion is critical to system performance. Point cloud data provide accurate 3D spatial descriptions but are unordered and sparse; image data are regularly distributed and dense, so fusing the two modalities can compensate for the limitations of single-modality detection. However, existing fusion algorithms suffer from limited semantic information and insufficient modal interaction, leaving room for improvement in high-precision multi-modal 3D object detection. To address this, this paper proposes a multi-sensor fusion method: pseudo-point clouds are generated from RGB images via depth completion and combined with real point clouds to identify regions of interest (RoIs). The method introduces three key improvements: deformable-attention-based multi-level feature extraction, which adaptively expands the receptive field toward target regions; efficient pseudo-point-cloud feature extraction with 2D sparse convolution, exploiting the regular distribution of pseudo-points in the image domain; and a two-stage feedback mechanism, which applies multi-modal cross-attention at the feature level to resolve data alignment and an efficient fusion strategy at the decision level, enabling multi-stage interactive training. Together these designs resolve the tension between the limited accuracy of pseudo-point clouds and their added computational cost, markedly improving feature extraction efficiency and detection accuracy. Experiments on the KITTI dataset show that the proposed method achieves superior performance in 3D traffic element detection, validating its effectiveness and offering a new approach to multi-modal fusion for environmental perception in intelligent driving.
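The pseudo-point-cloud step described above can be illustrated with a minimal sketch: given a completed (dense) depth map and camera intrinsics, each valid pixel is back-projected into 3D under the pinhole model. This is generic back-projection rather than the paper's exact pipeline, and the depth values and intrinsics below are hypothetical.

```python
import numpy as np

def depth_to_pseudo_points(depth, fx, fy, cx, cy):
    """Back-project a dense depth map (H, W) into an (N, 3) pseudo-point cloud.

    Pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy, Z = depth[v, u].
    Pixels with non-positive depth are discarded.
    """
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]      # pixel grid coordinates
    mask = depth > 0               # keep only valid depths
    z = depth[mask]
    x = (u[mask] - cx) * z / fx
    y = (v[mask] - cy) * z / fy
    return np.stack([x, y, z], axis=1)

# Hypothetical 2x2 depth map and intrinsics, for illustration only.
depth = np.array([[2.0, 0.0],
                  [4.0, 2.0]])
pts = depth_to_pseudo_points(depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
print(pts.shape)  # (3, 3): three valid pixels, (x, y, z) each
```

Note that the result is dense in the image plane wherever depth completion succeeds, which is exactly what makes image-domain (2D) processing of pseudo-points attractive.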
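The 2D sparse convolution idea can likewise be sketched. A submanifold-style sparse convolution computes outputs only at active (non-zero) sites, preserving the sparsity pattern; this is what makes it cheap on pseudo-point maps that are zero outside valid-depth pixels. The naive single-channel loop below is a conceptual illustration, not an optimized or paper-specific implementation.

```python
import numpy as np

def submanifold_conv2d(feat, kernel):
    """Naive submanifold sparse 2D convolution (single channel).

    feat:   (H, W) feature map, zero at inactive sites.
    kernel: (3, 3) weights.
    Outputs are computed only where the input is active, so the
    sparsity pattern of `feat` is preserved in the output.
    """
    h, w = feat.shape
    out = np.zeros_like(feat)
    ys, xs = np.nonzero(feat)              # iterate over active sites only
    for y, x in zip(ys, xs):
        acc = 0.0
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                yy, xx = y + dy, x + dx
                if 0 <= yy < h and 0 <= xx < w:
                    acc += feat[yy, xx] * kernel[dy + 1, dx + 1]
        out[y, x] = acc
    return out

feat = np.zeros((4, 4))
feat[1, 2] = 3.0                           # a single active pixel
identity = np.zeros((3, 3)); identity[1, 1] = 1.0
print(np.allclose(submanifold_conv2d(feat, identity), feat))  # True
```

Production systems would use a library sparse-convolution kernel over multi-channel features; the point here is only the active-site restriction.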
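Finally, the feature-level interaction can be sketched as generic scaled dot-product cross-attention, where point-cloud RoI features act as queries over image features. The dimensions and random inputs below are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention.

    queries: (Nq, d) point-cloud RoI features.
    keys, values: (Nk, d) image features.
    Returns (Nq, d) image-conditioned point features.
    """
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)   # (Nq, Nk) alignment scores
    return softmax(scores, axis=-1) @ values

rng = np.random.default_rng(0)
q = rng.normal(size=(5, 16))    # 5 hypothetical RoI features
kv = rng.normal(size=(20, 16))  # 20 hypothetical image tokens
fused = cross_attention(q, kv, kv)
print(fused.shape)  # (5, 16)
```

Because the attention weights are learned alignments rather than a fixed projection, this kind of module can absorb calibration and depth-completion error, which is the data-alignment role the abstract assigns to it.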

Cite this article:

唐文能, 李垚辰, 高笙景, 高聪, 彭越涵, 刘跃虎. Multi-modal 3D Object Detection Method for Traffic Scenarios Based on Two-stage Feedback. Journal of Software (软件学报), 2026, 37(5).
History:
  • Received: 2025-05-26
  • Revised: 2025-07-11
  • Published online: 2025-09-23