

    BEVFormer: Learning Bird’s-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers

    Venue: arXiv


    Zhiqi Li1,2*, Wenhai Wang2*, Hongyang Li2*, Enze Xie3, Chonghao Sima2, Tong Lu1, Yu Qiao2, Jifeng Dai2†

     1Nanjing University  2Shanghai AI Laboratory  3The University of Hong Kong


    Figure 1: We propose BEVFormer, a paradigm for autonomous driving that applies both Transformer and temporal structures to generate bird’s-eye-view (BEV) features from multi-camera inputs. BEVFormer leverages queries to look up the spatial/temporal space and aggregate spatiotemporal information accordingly, thereby yielding stronger representations for perception tasks.

    Abstract

    3D visual perception tasks, including 3D detection and map segmentation based on multi-camera images, are essential for autonomous driving systems. In this work, we present a new framework termed BEVFormer, which learns unified BEV representations with spatiotemporal transformers to support multiple autonomous driving perception tasks. In a nutshell, BEVFormer exploits both spatial and temporal information by interacting with the spatial and temporal space through predefined grid-shaped BEV queries. To aggregate spatial information, we design a spatial cross-attention in which each BEV query extracts spatial features from regions of interest across camera views. For temporal information, we propose a temporal self-attention that recurrently fuses the history BEV information. Our approach achieves a new state-of-the-art 56.9% NDS on the nuScenes test set, which is 9.0 points higher than the previous best art and on par with LiDAR-based baselines. We further show that BEVFormer remarkably improves the accuracy of velocity estimation and the recall of objects under low-visibility conditions. The code will be released at https://github.com/zhiqi-li/BEVFormer.
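
    To make the description above concrete, here is a minimal sketch of one encoder layer following the abstract's recipe: grid-shaped BEV queries, temporal self-attention that recurrently fuses the history BEV, and spatial cross-attention over multi-camera features. This is not the paper's implementation: dense multi-head attention stands in for the deformable attention BEVFormer actually uses, and the module name `BEVFormerLayerSketch` as well as all shapes and hyperparameters are illustrative assumptions.

```python
# Simplified sketch of a BEVFormer-style encoder layer (assumptions noted in comments):
# dense multi-head attention replaces the paper's deformable attention, and all
# sizes/names are illustrative, not taken from the released code.
import torch
import torch.nn as nn


class BEVFormerLayerSketch(nn.Module):
    def __init__(self, embed_dim=256, num_heads=8, bev_h=50, bev_w=50):
        super().__init__()
        self.bev_h, self.bev_w = bev_h, bev_w
        # Learnable grid-shaped BEV queries, one per BEV cell.
        self.bev_queries = nn.Parameter(torch.randn(bev_h * bev_w, embed_dim))
        # Temporal self-attention: current queries attend to the history BEV.
        self.temporal_attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        # Spatial cross-attention: BEV queries attend to flattened camera features.
        self.spatial_attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(embed_dim, embed_dim * 4),
                                 nn.ReLU(),
                                 nn.Linear(embed_dim * 4, embed_dim))
        self.norm1 = nn.LayerNorm(embed_dim)
        self.norm2 = nn.LayerNorm(embed_dim)
        self.norm3 = nn.LayerNorm(embed_dim)

    def forward(self, cam_feats, prev_bev=None):
        # cam_feats: (B, num_cams * H * W, C) flattened multi-camera features.
        B = cam_feats.shape[0]
        q = self.bev_queries.unsqueeze(0).expand(B, -1, -1)

        # Temporal self-attention: recurrently fuse the history BEV; with no
        # history (first frame) the queries simply attend to themselves.
        history = prev_bev if prev_bev is not None else q
        q = self.norm1(q + self.temporal_attn(q, history, history)[0])

        # Spatial cross-attention: each BEV query aggregates image features.
        q = self.norm2(q + self.spatial_attn(q, cam_feats, cam_feats)[0])

        return self.norm3(q + self.ffn(q))  # (B, bev_h * bev_w, C) BEV features


# Example: features from 6 surround cameras, with the BEV carried recurrently
# across two frames.
layer = BEVFormerLayerSketch()
feats_t0 = torch.randn(2, 6 * 15 * 25, 256)
bev_t0 = layer(feats_t0)                                          # first frame, no history
bev_t1 = layer(torch.randn(2, 6 * 15 * 25, 256), prev_bev=bev_t0) # fuses history BEV
```

    Feeding the previous frame's BEV output back in as the key/value of the temporal self-attention is what makes the fusion recurrent, so temporal cues accumulate without stacking multiple past frames as extra inputs.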
