<dd id="om44c"><optgroup id="om44c"></optgroup></dd>
  • <xmp id="om44c"><nav id="om44c"></nav>
    <xmp id="om44c"><nav id="om44c"></nav>
    <menu id="om44c"></menu>


VideoChat: Chat-Centric Video Understanding

Venue: arXiv

KunChang Li¹,⁴, Yinan He¹, Yi Wang¹, Yizhuo Li¹,³, Wenhai Wang¹, Ping Luo³, Yali Wang⁴,¹, Limin Wang²,¹, Yu Qiao

¹OpenGVLab, Shanghai AI Laboratory  ²Nanjing University  ³The University of Hong Kong  ⁴Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences

     https://github.com/OpenGVLab/Ask-Anything 


    Abstract

In this study, we initiate an exploration into video understanding by introducing VideoChat, an end-to-end chat-centric video understanding system. It integrates video foundation models and large language models via a learnable neural interface, excelling in spatiotemporal reasoning, event localization, and causal relationship inference. To instruction-tune this system, we propose a video-centric instruction dataset, composed of thousands of videos matched with detailed descriptions and conversations. This dataset emphasizes spatiotemporal reasoning and causal relationships, providing a valuable asset for training chat-centric video understanding systems. Preliminary qualitative experiments reveal our system's potential across a broad spectrum of video applications and set the standard for future research. Access our code and data at https://github.com/OpenGVLab/Ask-Anything.
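
The "learnable neural interface" in the abstract can be pictured as a small trainable bridge between a frozen video encoder and a frozen large language model. The sketch below is a minimal, hypothetical illustration in PyTorch, assuming a Q-Former-style design in which a fixed set of learnable query tokens cross-attends to video features and is projected into the LLM's token embedding space; every module name, dimension, and the query-token design here are illustrative assumptions, not the paper's actual implementation.

    # Minimal sketch of a learnable interface between a frozen video encoder
    # and a frozen LLM. All dimensions and design choices are assumptions.
    import torch
    import torch.nn as nn

    class VideoLLMInterface(nn.Module):
        """Maps frozen video-encoder features into the LLM's embedding space."""

        def __init__(self, video_dim=1024, llm_dim=4096, num_queries=32, num_heads=8):
            super().__init__()
            # Learnable query tokens that summarize the video into a
            # fixed-length sequence, regardless of video length.
            self.queries = nn.Parameter(torch.randn(num_queries, video_dim) * 0.02)
            self.cross_attn = nn.MultiheadAttention(video_dim, num_heads, batch_first=True)
            # Project the attended summaries into the LLM embedding space.
            self.proj = nn.Linear(video_dim, llm_dim)

        def forward(self, video_feats):
            # video_feats: (batch, num_frames * patches, video_dim),
            # produced by a frozen video encoder.
            b = video_feats.size(0)
            q = self.queries.unsqueeze(0).expand(b, -1, -1)
            summary, _ = self.cross_attn(q, video_feats, video_feats)
            # Output: (batch, num_queries, llm_dim) "soft video tokens",
            # prepended to the text embeddings before the frozen LLM.
            return self.proj(summary)

    if __name__ == "__main__":
        interface = VideoLLMInterface()
        feats = torch.randn(2, 256, 1024)   # placeholder frozen-encoder output
        video_tokens = interface(feats)
        print(video_tokens.shape)            # torch.Size([2, 32, 4096])

Under this reading, only the interface's parameters would be updated during instruction tuning while the video encoder and LLM stay frozen, which keeps the trainable parameter count small relative to the full system.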

