[关键词]
[摘要]
弱监督时序行为定位由于在智能监控、视频检索等领域展现出应用潜力,且训练数据标注成本较低,成为了视频理解领域的研究热点之一。针对已有基于多模态学习的定位方法忽略各个模态自身的偏差导致的定位性能欠佳问题,构建了RGB运动主体信息抑制模块,并设计了光流主导影响抑制策略,旨在消除各个模态对训练模型造成的定位偏差。在两个基准数据集THUMOS14和ActivityNet v1.2上的实验结果显示,多尺度时序交并比下的平均精确率均值分别达到了45.3%、26.5%,整体定位性能优于主流方法,实验结果表明了所提出方法的有效性。本方法的优势是仅在粗粒度的模态级别探索各个模态带来的定位偏差并进行补偿,提高了基于多模态学习的时序行为定位模型的基础定位性能,有利于和细粒度视角下的定位方法相兼容。
[Keywords]
[Abstract]
Weakly supervised temporal action localization has become a research hotspot in video understanding because of its application potential in areas such as intelligent surveillance and video retrieval and the low cost of annotating its training data. Existing localization methods based on multimodal learning ignore the bias inherent in each modality, which degrades localization performance. To address this problem, we construct an RGB action subject information compensation module and design a strategy for suppressing the dominant influence of optical flow, aiming to eliminate the localization bias that each modality introduces into the trained model. Experimental results on two benchmark datasets, THUMOS14 and ActivityNet v1.2, show that the mean average precision under multi-scale temporal intersection over union reaches 45.3% and 26.5%, respectively, and that overall localization performance is better than that of mainstream methods, which demonstrates the effectiveness of the proposed method. The advantage of this method is that it explores and compensates for the localization bias introduced by each modality only at the coarse-grained modality level. This improves the basic localization performance of temporal action localization models based on multimodal learning and facilitates compatibility with localization methods that operate from a fine-grained perspective.
[CLC number]
TP389.1
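[Funding]
General Program of the National Natural Science Foundation of China (62176031); Chongqing Technology Innovation and Application Development Special Key Project (CSTB2022TIAD-KPX0100).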