Mandarin lip recognition based on MSAF with multimodal tasks
Affiliation:

1. China Mobile (Hangzhou) Information Technology Co., Ltd., Hangzhou 310000, P. R. China; 2. School of Computer Science and Engineering (School of Artificial Intelligence), Chongqing University of Science and Technology, Chongqing 401331, P. R. China; 3. School of Automation, Chongqing University of Posts and Telecommunications, Chongqing 400065, P. R. China


Fund Project: Ministry of Education - China Mobile Research Fund (MCM20180404).

Abstract:

Multimodal lip recognition aims to improve the accuracy and robustness of speech recognition by integrating lip-movement and speech information, and can also help specific user groups communicate. However, existing lip-reading models focus predominantly on English datasets, leaving research on Chinese lip recognition in its nascent stage. To address the challenges of extracting features from different modalities, integrating those features, and achieving comprehensive fusion of multimodal features, we propose a multimodal split attention fusion audio-visual recognition (MSAFVR) model. In experiments on the Chinese Mandarin Lip Reading (CMLR) dataset, MSAFVR achieves 92.95% accuracy in Chinese lip reading, surpassing state-of-the-art Mandarin lip-reading models.
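The abstract names split attention fusion as the mechanism for combining audio and visual features. The paper's own architecture is not reproduced here; the following is only a minimal NumPy sketch of the general split-attention idea, assuming two per-modality channel feature vectors of equal length and a channel-wise softmax across modalities (the function name `split_attention_fusion` and the toy feature values are hypothetical, for illustration only).

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def split_attention_fusion(audio_feat, visual_feat):
    """Hypothetical sketch: fuse two modality feature vectors by
    computing softmax attention across modalities per channel,
    then taking the attention-weighted sum."""
    stacked = np.stack([audio_feat, visual_feat])  # shape (2, C)
    attn = softmax(stacked, axis=0)                # weights sum to 1 per channel
    return (attn * stacked).sum(axis=0)            # fused vector, shape (C,)

# Toy example: three channels, one modality stronger per channel
audio = np.array([1.0, 0.2, 0.5])
visual = np.array([0.3, 0.9, 0.5])
fused = split_attention_fusion(audio, visual)
```

In this sketch, a channel where both modalities agree (the third, 0.5 vs. 0.5) is passed through unchanged, while disagreeing channels are pulled toward the modality with the stronger activation; the real MSAFVR model operates on learned feature maps rather than raw vectors.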

Get Citation

Rong Yujun, Wu Xianhai, Cai Fenglin, Yang Tongxin, Li Penghua. Mandarin lip recognition based on MSAF with multimodal tasks[J]. Journal of Chongqing University, 2026, 49(4): 107-116.

History
  • Received: June 12, 2024
  • Online: April 21, 2026