Two-stream joint matching based on mutual information for few-shot action recognition

Authors:

Deng Long, Feng Bo, Ge Yongxin

Affiliation:

1. School of Big Data and Software Engineering, Chongqing University, Chongqing 400044, P. R. China; 2. Southwest Computer Co., Ltd., Chongqing 400060, P. R. China

First author:

Deng Long (born 1997), male, master's student, whose research focuses on few-shot action recognition, (E-mail) longdeng@cqu.edu.cn.

Corresponding author:

Ge Yongxin, male, professor and doctoral supervisor, (E-mail) yongxinge@cqu.edu.cn.

CLC number:

TP181

Fund project:

Supported by the Specialized Project for Technology Innovation and Application Development of Chongqing (CSTB2022TIAD-KPX0100), the National Natural Science Foundation of China (62176031), and the Fundamental Research Funds for the Central Universities (2023CDJYGRHZD05).

Abstract:

Although few-shot action recognition methods based on the metric learning paradigm have achieved great success, they still cannot address the following issues simultaneously: 1) inadequate action relation modeling and underutilization of multi-modal information; 2) difficulty in matching videos of different lengths and speeds, and videos whose sub-actions are misaligned. To address these limitations, we propose a two-stream joint matching (TSJM) method based on mutual information, which consists of two modules: a multi-modal contrastive learning module (MCL) and a joint matching module (JMM). The MCL explores the mutual information between modalities: the features of different modalities extracted from the same video are treated as positive pairs and pulled together through contrastive learning, which strengthens action relation modeling. The JMM is designed to solve the aforementioned video matching problems simultaneously by combining dynamic time warping (DTW) with bipartite graph matching to produce the final alignment result, thereby achieving high few-shot action recognition accuracy. We evaluate the proposed method on two widely used few-shot action recognition datasets, SSV2 and Kinetics, and conduct extensive ablation experiments to verify its effectiveness.
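To make the contrastive objective described for the MCL concrete, the sketch below shows one plausible symmetric InfoNCE formulation in PyTorch, where the two modality embeddings of the same video form the positive pair and the other videos in the batch serve as negatives; the function name, feature shapes, and temperature are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def multimodal_contrastive_loss(rgb_feat, second_feat, temperature=0.07):
    """Symmetric InfoNCE loss: the two modality embeddings of the same video
    are the positive pair; all other videos in the batch are negatives.

    rgb_feat, second_feat: (batch, dim) pooled per-video features from the two
    streams. All names, shapes, and the temperature are illustrative assumptions.
    """
    rgb = F.normalize(rgb_feat, dim=-1)
    other = F.normalize(second_feat, dim=-1)

    logits = rgb @ other.t() / temperature                    # (batch, batch) similarities
    targets = torch.arange(rgb.size(0), device=rgb.device)    # diagonal entries are positives

    # Pull same-video pairs together in both directions.
    loss_r2o = F.cross_entropy(logits, targets)
    loss_o2r = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_r2o + loss_o2r)
```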

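Similarly, the following is a minimal sketch of how the JMM's two matchers could be fused, assuming per-frame features and a cosine frame-to-frame cost: DTW tolerates different lengths and speeds because it respects temporal order, while bipartite matching (the Hungarian algorithm) tolerates misordered sub-actions. The fusion weight alpha and all helper names are hypothetical, not the paper's exact formulation.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def dtw_distance(cost):
    """Classic dynamic-time-warping accumulated cost over a (T_q, T_s)
    frame-to-frame cost matrix; tolerates different lengths and speeds."""
    tq, ts = cost.shape
    acc = np.full((tq + 1, ts + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, tq + 1):
        for j in range(1, ts + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(acc[i - 1, j],
                                                 acc[i, j - 1],
                                                 acc[i - 1, j - 1])
    return acc[tq, ts]

def joint_matching_score(query_feats, support_feats, alpha=0.5):
    """Fuse an order-aware (DTW) and an order-free (bipartite) alignment
    between two frame-feature sequences of shapes (T_q, dim) and (T_s, dim).
    alpha is an assumed fusion weight, not a value from the paper."""
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    s = support_feats / np.linalg.norm(support_feats, axis=1, keepdims=True)
    cost = 1.0 - q @ s.T                       # cosine distance per frame pair

    dtw = dtw_distance(cost)                   # respects temporal order
    rows, cols = linear_sum_assignment(cost)   # Hungarian matching: handles shuffled sub-actions
    bipartite = cost[rows, cols].sum()

    return alpha * dtw + (1.0 - alpha) * bipartite
```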
Cite this article:

Deng L, Feng B, Ge Y X. Two-stream joint matching based on mutual information for few-shot action recognition[J]. Journal of Chongqing University, 2025, 48(6): 63-73.


History
  • Received: 2024-04-20
  • Published online: 2025-07-11