Electronic Science and Technology ›› 2023, Vol. 36 ›› Issue (10): 15-23. doi: 10.16180/j.cnki.issn1007-7820.2023.10.003

A 3D Object Detection Network Based on Attention Mechanism and Context Awareness

ZHANG Wuran, LI Feifei

  1. School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
  • Received: 2022-05-07 Online: 2023-10-15 Published: 2023-10-20
  • About the authors: ZHANG Wuran (b. 1995), male, master's student. Research interests: computer vision and object detection. | LI Feifei (b. 1970), female, Ph.D., professor. Research interests: multimedia information processing, image processing and object recognition, and information retrieval.
  • Supported by:
    The Program for Professor of Special Appointment (Eastern Scholar) at Shanghai Institutions of Higher Learning (ES2015XX)

Abstract:

As autonomous driving develops, driving safety has become a key concern. Point cloud scenes are cluttered and subject to strong background interference, and as the acquisition range expands the point cloud becomes sparser, which weakens the robustness of detection algorithms. To alleviate these problems, this study proposes a 3D object detection network based on an attention mechanism and context awareness. In the point cloud processing stage, a bidirectional point cloud attention mechanism is added to generate a point weight matrix that explicitly marks important points and suppresses background noise. In the pseudo-image feature extraction module, an FPN (Feature Pyramid Network) module is added to reuse multi-scale features, a Context Awareness Module (CAM) is introduced to capture multi-scale contextual semantics, and an Attention Guide Module (AGM) is designed on the source features to generate a guidance weight map with clear spatial positions, alleviating the spatial ambiguity caused by redundant features. The network is evaluated on the KITTI dataset. At the hard difficulty level, the Average Precision (AP) of the proposed method for pedestrians, cars, and cyclists improves by 0.59%, 0.87%, and 1.42%, respectively, over the baseline network. Compared with the new baseline network, the AP for pedestrians improves by 3.04%, 3.53%, and 3.23% at the three difficulty levels, respectively. These results show that the improved network effectively improves 3D object detection performance.

Key words: point cloud, object detection, attention mechanism, FPN, multi-scale feature, context awareness, attention guide, deep learning
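
The abstract describes an attention step that produces a point weight matrix to emphasize important points and suppress background noise. The record gives no implementation details, so the following is only a minimal sketch of the general idea under stated assumptions: the two attention directions are taken to be point-wise and channel-wise, the learned layers of a real network are replaced by fixed max pooling followed by a sigmoid, and the names `point_weight_matrix` and `reweight` are hypothetical. The paper's actual module may differ.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def point_weight_matrix(features):
    """Build an N x C weight matrix from N points with C channels each.

    Assumption: a real network would learn these scores; here they come
    from max pooling plus a sigmoid so the sketch stays self-contained.
    """
    if not features:
        return []
    n_channels = len(features[0])
    # Point direction: one score per point, pooled over its channels.
    point_scores = [sigmoid(max(row)) for row in features]
    # Channel direction: one score per channel, pooled over all points.
    channel_scores = [
        sigmoid(max(row[c] for row in features)) for c in range(n_channels)
    ]
    # Combine both directions into a single weight per (point, channel).
    return [[p * c for c in channel_scores] for p in point_scores]


def reweight(features):
    """Scale every feature value by its attention weight."""
    weights = point_weight_matrix(features)
    return [
        [f * w for f, w in zip(f_row, w_row)]
        for f_row, w_row in zip(features, weights)
    ]


# Toy input: the first point responds strongly, the second weakly.
points = [
    [2.0, 1.5, 3.0],
    [-2.0, -1.0, -3.0],
]
for row in point_weight_matrix(points):
    print([round(w, 3) for w in row])
```

In this toy input the weakly responding point receives smaller weights in every channel than the strongly responding one, which is the suppression behavior the abstract attributes to the weight matrix.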

CLC Number:

  • TP391