Weakly supervised temporal action localization method based on modal bias compensation

Affiliation:

1. School of Big Data and Software Engineering, Chongqing University; 2. Department of Computer Science and Engineering, Shanghai Jiao Tong University

CLC number:

TP389.1

Fund Project:

National Natural Science Foundation of China General Program (62176031); Chongqing Special Key Project for Technological Innovation and Application Development (CSTB2022TIAD-KPX0100).

    Abstract:

    Weakly supervised temporal action localization has become a research hotspot in video understanding, both because it shows application potential in areas such as intelligent surveillance and video retrieval and because its training data are cheap to annotate. Existing localization methods based on multimodal learning ignore the bias inherent in each modality, which limits their localization performance. To address this, we construct an RGB action-subject information suppression module and design an optical flow dominance suppression strategy, both aimed at eliminating the localization bias that each modality introduces into the trained model. On the two benchmark datasets THUMOS14 and ActivityNet v1.2, the mean average precision (mAP) averaged over multiple temporal intersection-over-union (tIoU) thresholds reaches 45.3% and 26.5%, respectively, and the overall localization performance exceeds that of mainstream methods, demonstrating the effectiveness of the proposed method. The advantage of the method is that it explores and compensates for the localization bias of each modality only at the coarse-grained modality level. This raises the baseline localization performance of multimodal-learning-based temporal action localization models and keeps the method compatible with localization methods that work from a fine-grained perspective.
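    The results above are reported as mAP averaged over several tIoU thresholds. As an illustration of how this metric is typically computed, here is a minimal sketch assuming NumPy, greedy score-ordered matching in which each ground-truth instance can be matched at most once, and all-point interpolated AP in the style of the ActivityNet evaluation toolkit. It is not the authors' evaluation code: the function names, the input layout (prediction tuples of video ID, start, end, score), and the threshold list are hypothetical choices made for illustration, since the abstract does not state which thresholds were averaged.

```python
import numpy as np

# Illustrative sketch only: names and input layout are hypothetical,
# not taken from the paper or from any official evaluation script.

def temporal_iou(seg, gts):
    """tIoU between one segment (start, end) and an (N, 2) array of GT segments."""
    inter_start = np.maximum(seg[0], gts[:, 0])
    inter_end = np.minimum(seg[1], gts[:, 1])
    inter = np.clip(inter_end - inter_start, 0.0, None)
    union = (seg[1] - seg[0]) + (gts[:, 1] - gts[:, 0]) - inter
    return inter / union


def average_precision(preds, gts, tiou_thr):
    """AP for one class at one tIoU threshold.

    preds: list of (video_id, start, end, score) tuples.
    gts:   dict mapping video_id -> (N, 2) array of GT segments.
    """
    num_gt = sum(len(g) for g in gts.values())
    if num_gt == 0:
        return 0.0
    preds = sorted(preds, key=lambda p: p[3], reverse=True)  # highest score first
    used = {vid: np.zeros(len(g), dtype=bool) for vid, g in gts.items()}
    tp = np.zeros(len(preds))
    fp = np.zeros(len(preds))
    for i, (vid, start, end, _) in enumerate(preds):
        g = gts.get(vid)
        if g is None or len(g) == 0:
            fp[i] = 1
            continue
        ious = temporal_iou((start, end), g)
        ious[used[vid]] = -1.0  # a GT instance may be matched only once
        j = int(np.argmax(ious))
        if ious[j] >= tiou_thr:
            tp[i] = 1
            used[vid][j] = True
        else:
            fp[i] = 1
    tp_cum, fp_cum = np.cumsum(tp), np.cumsum(fp)
    recall = tp_cum / num_gt
    precision = tp_cum / np.maximum(tp_cum + fp_cum, np.finfo(float).eps)
    # All-point interpolation: make precision monotonically non-increasing.
    mprec = np.concatenate([[0.0], precision, [0.0]])
    mrec = np.concatenate([[0.0], recall, [1.0]])
    for k in range(len(mprec) - 2, -1, -1):
        mprec[k] = max(mprec[k], mprec[k + 1])
    idx = np.where(mrec[1:] != mrec[:-1])[0] + 1
    return float(np.sum((mrec[idx] - mrec[idx - 1]) * mprec[idx]))


def average_map(preds_by_class, gts_by_class, thresholds):
    """Per-threshold mAP over classes, plus the mean over all thresholds."""
    maps = []
    for thr in thresholds:
        aps = [average_precision(preds_by_class.get(c, []), gts_by_class[c], thr)
               for c in gts_by_class]
        maps.append(float(np.mean(aps)))
    return maps, float(np.mean(maps))


if __name__ == "__main__":
    gts = {"action": {"v1": np.array([[0.0, 10.0]])}}
    preds = {"action": [("v1", 0.0, 5.0, 0.9)]}  # tIoU with the GT is 0.5
    per_thr, avg = average_map(preds, gts, thresholds=[0.3, 0.7])
    print(per_thr, avg)  # prints [1.0, 0.0] 0.5
```

    The single prediction in the usage example counts as a hit at tIoU 0.3 but a miss at 0.7, which shows why averaging over several thresholds rewards boundaries that are both detected and tightly localized. Official benchmark scripts may differ in details such as threshold sets and tie handling.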

History
  • Received: 2024-04-09
  • Last revised: 2024-05-22
  • Accepted: 2024-07-31