
Final Report for the Project "Research on Open-Set Automatic Video Parsing Methods Based on Multi-Modal Coupling"

Date: 2026-05-10 16:14:17  Source: compiled from the web  Author: site editor

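National Natural Science Foundation of China Project

Research on Open-Set Automatic Video Parsing Methods Based on Multi-Modal Coupling

Basic Information

Project approval number: 61772359

Application code: F0210

Project title: Research on Open-Set Automatic Video Parsing Methods Based on Multi-Modal Coupling

Principal investigator: Liu An'an (刘安安)

Host institution: Tianjin University (天津大学)

Research period: 2018-01-01 to 2021-12-31

Funding: 630,000 CNY (63.0 万元)

Chinese abstract (translated):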

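Given the volume of multi-domain, cross-platform video data, automatically parsing video into natural-language descriptions that match human cognition, and thereby meeting the needs of video management, information retrieval, and automatic question answering, has become a pressing problem. Research on automatic video parsing is still at an early stage and lacks a mature theory that bridges computer vision and natural language processing to guide efforts across the visual-semantic gap. To address this, the project centers on the theory of open-set automatic video parsing based on multi-modal coupling. Having defined the scientific problem, it focuses on the latent correlations between visual and textual data and builds deep learning networks to parse video automatically. By combining these networks with a transferable semantic model and a temporal attention model, the resulting video parsing model is not constrained by the source or content of the video, its learning is not constrained by a closed semantic set, and the descriptions it generates emphasize the most important semantics in the video. On this basis, the project builds a complete automatic video parsing system for diverse data sources and user needs, validates the soundness and feasibility of the proposed theory from multiple angles, provides technical support for large-scale industrialization of related applications, and provides a technical foundation for intelligent services aimed at smart living.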

English abstract:

Given the availability of multi-domain and cross-platform big visual data, a fundamental research problem in visual analysis is how to describe videos in natural language that aligns with human cognition (i.e., video captioning). Such a model would enable several applications, such as video management, information retrieval, and automated question answering. Currently, scientific research on video captioning is still in its infancy. In particular, the field lacks advanced theoretical studies that systematically connect computer vision models with natural language processing models to help computer scientists overcome the semantic gap in visual understanding. To address these problems, this proposal focuses on open-domain video captioning by coupling multiple modalities. Based on this scientific problem, our primary objective is to explore the latent correlation between visual and textual data in order to construct a deep learning model for video captioning. Moreover, the designed video captioning model will be integrated with a transferable semantic model and a sequential attention model. The proposed model has three advantages: 1) the network architecture is independent of video sources and contents; 2) model learning is not constrained by the limited semantic concepts appearing in the training data; 3) the generated video descriptions highlight the key semantic concepts of the video contents. Building on these techniques, we will develop a video captioning prototype based on multiple video sources and diverse user requirements. The prototype will also be used to validate the scientific soundness and feasibility of the proposed methods. The achievements of this proposal will contribute technical knowledge to support large-scale industrialization and will enhance smart services for future smart living.

Final Report Summary

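Automatically parsing big visual data into natural-language descriptions that match human cognition, in order to meet the needs of information retrieval and automatic question answering, is a current research focus in cross-media computing. This project made advances in mining the latent contextual correlations between the visual and language modalities, explored mechanisms for learning novel semantics at large scale, and built attention models for visual saliency analysis, thereby improving the content completeness and semantic relevance of the natural-language descriptions generated by video parsing models.

The innovations include: 1) To address the difficulty of mining correlations in multi-modal data, the project proposed a semantic recognition method based on multi-level context modeling and a cross-modal matching method based on multi-scale fine-grained alignment, achieving cascaded perception and sharing of multi-level semantics; it also proposed a sequence generation model with multi-modal asynchronous state fusion and a context-based multi-step reasoning method for semantic correction, deepening the model's understanding of semantics with complex contextual correlations. 2) To address the difficulty of transfer learning for large-scale novel semantics, the project proposed an adaptive clustering-driven multi-semantic recognition method, achieving clustering-driven semantic recognition; it also proposed sequence generation models based on cross-modal graph-structured semantic alignment and on a multi-level reward-and-penalty mechanism, improving the diversity of expression in the generated descriptions. 3) To address the difficulty of saliency analysis in video sequences, the project proposed a salient-semantics perception method based on graph attention convolutional networks and a mutual attention mechanism, achieving object detection and complex relation recognition with traceable reasoning; it also proposed a description generation method based on regional collaborative association and structured interaction fusion, achieving description generation guided by salient-region perception.

On this basis, the project integrated its results in multi-modal contextual correlation mining, novel semantics modeling, and sequence saliency analysis to build an Internet public-opinion analysis platform for cross-media information on social networks, with demonstration applications at partner organizations.

Research outcomes: 1) Publications: 37 papers, of which 25 are SCI-indexed, 17 appeared in IEEE/ACM Transactions, and 12 are CCF-A international conference papers; 11 Chinese invention patent applications accepted and 1 granted. 2) Awards: as first contributor, one Tianjin Science and Technology Progress Award Special Prize and one Second Prize; one Best Paper award at the China Multimedia Conference (中国多媒体大会). 3) International exchange: served on the editorial boards of Multimedia Systems and Visual Informatics; organized the 2D image to 3D model retrieval competition at the 2019 Eurographics conference; served multiple times as area chair for the CCF-A international conference ACM MULTIMEDIA; attended international conferences 9 times and gave presentations. 4) Talent development: in 2021 the principal investigator was named among the world's top 100,000 scientists, an Elsevier Highly Cited Scholar, and a Tianjin 131 Innovative Talent; 2 team members were promoted to associate professor and 2 lecturers were trained; 3 doctoral and 10 master's students graduated, and 5 doctoral and 8 master's students are currently enrolled.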
