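[Keywords]
[Abstract]
Although few-shot action recognition methods based on the metric learning paradigm have achieved great success, they cannot resolve the following problems at the same time. (1) They model action relations poorly and do not make full use of modality information. (2) They cannot properly match videos of different lengths and speeds, nor videos whose sub-actions are misaligned. To address these problems, this paper proposes a few-shot action recognition method based on two-stream mutual-information joint matching, which consists of two modules: a multi-modal contrastive learning module and a joint matching module. The multi-modal contrastive learning module explores the mutual information between modalities, using contrastive learning to treat the features of different modalities of the same video as a positive pair and pull them closer together. The joint matching module addresses the video matching problems above simultaneously: it combines the dynamic time warping algorithm with a bipartite graph matching algorithm to obtain the final matching result, and thereby good few-shot action recognition accuracy. We evaluate the method on two widely used few-shot action recognition datasets, SSv2 and Kinetics, and conduct extensive ablation experiments to verify its effectiveness.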
[Keywords]
[Abstract]
Although few-shot action recognition based on the metric learning paradigm has achieved significant success, existing methods fail to address the following issues simultaneously: (1) inadequate action relation modeling and underutilization of multi-modal information; (2) difficulty in matching videos of different lengths and speeds, and videos whose sub-actions are misaligned. To address these issues, we propose a Two-Stream Joint Matching method based on mutual information (TSJM), which consists of two modules: a Multi-modal Contrastive Learning Module (MCL) and a Joint Matching Module (JMM). The MCL explores the mutual information between modalities, thereby extracting modal information more thoroughly to enhance the modeling of action relationships. The JMM aims to address the aforementioned video matching problems simultaneously. We evaluate the proposed method on two widely used few-shot action recognition datasets, SSv2 and Kinetics, and conduct comprehensive ablation experiments to substantiate its effectiveness.
[CLC number]
TP181
[Funding]