Action recognition based on deep convolutional neural networks and depth sequences
Authors: LIU Zhi, FENG Xin, ZHANG Jie
CLC number: TP181

Funding: Supported by the Young Scientists Fund of the National Natural Science Foundation of China (61502065) and the Science and Technology Research Program of Chongqing Municipal Education Commission (KJ1600937, KJ1500922, KJ1501504).


Abstract:

Traditional approaches to human action recognition rely on hand-crafted features and involve several isolated processing stages, which makes them time-consuming and hard to optimize as a whole. Taking depth video as the research object, this paper constructs a 3D deep convolutional neural network that automatically learns spatio-temporal features of human actions from raw depth sequences, and applies a Softmax classifier to the learned features for action recognition. Experimental results show that the proposed method effectively extracts the latent features of human actions: it matches the best published accuracy on the MSR-Action3D dataset and performs comparably to baseline methods on the UTKinect-Action3D dataset. Because no hand-crafted features are required, feature extraction and classification form a complete end-to-end system, which makes the method simpler. The experiments also verify that the deep convolutional model generalizes well: a model trained on MSR-Action3D and applied directly to action classification on UTKinect-Action3D, without retraining, still achieves good recognition accuracy.
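The abstract describes the pipeline only at a high level. Below is a minimal sketch, in PyTorch, of what a 3D convolutional network with a Softmax classifier over raw depth clips can look like; every hyperparameter (channel counts, kernel sizes, clip shape, number of classes) is an illustrative assumption, not the configuration reported in the paper.

```python
# Minimal sketch of a 3D CNN + Softmax pipeline over depth clips.
# All layer sizes, the clip shape, and the class count are assumptions
# for illustration, not the authors' published architecture.
import torch
import torch.nn as nn

class Depth3DCNN(nn.Module):
    def __init__(self, num_classes: int = 20):
        super().__init__()
        self.features = nn.Sequential(
            # 3D kernels slide over (time, height, width), so spatial and
            # temporal structure are learned jointly from the raw clip
            # instead of being hand-crafted.
            nn.Conv3d(1, 32, kernel_size=(3, 5, 5), padding=(1, 2, 2)),
            nn.ReLU(inplace=True),
            nn.MaxPool3d(2),
            nn.Conv3d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool3d(2),
        )
        # LazyLinear infers its input size from the flattened feature map.
        self.classifier = nn.Sequential(nn.Flatten(), nn.LazyLinear(num_classes))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1, frames, height, width) single-channel depth clip
        return self.classifier(self.features(x))

model = Depth3DCNN(num_classes=20)          # e.g. 20 action categories
clip = torch.randn(4, 1, 16, 64, 64)        # assumed shape: 16 frames of 64x64
probs = torch.softmax(model(clip), dim=1)   # Softmax over action classes
print(probs.shape)                          # torch.Size([4, 20])
```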

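The cross-dataset experiment can likewise be sketched as frozen-weight inference: weights trained on MSR-Action3D are loaded and applied, without any retraining, to clips from UTKinect-Action3D. The checkpoint round-trip and the random tensors standing in for real depth clips are hypothetical scaffolding, not the authors' exact protocol.

```python
# Hedged sketch of the generalization test: an MSR-Action3D-trained model
# is reused, frozen, on UTKinect-Action3D clips. `Depth3DCNN` comes from
# the sketch above; the checkpoint file and clips are stand-ins.
import torch

trained = Depth3DCNN(num_classes=20)
trained(torch.randn(1, 1, 16, 64, 64))               # materialize LazyLinear
torch.save(trained.state_dict(), "msr_action3d.pt")  # stand-in for the trained model

model = Depth3DCNN(num_classes=20)
model(torch.randn(1, 1, 16, 64, 64))                 # materialize before loading
model.load_state_dict(torch.load("msr_action3d.pt"))
model.eval()                                         # frozen: no retraining

with torch.no_grad():
    utkinect_clip = torch.randn(1, 1, 16, 64, 64)    # stand-in for a UTKinect clip
    predicted_class = model(utkinect_clip).argmax(dim=1)
```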

Cite this article

LIU Zhi, FENG Xin, ZHANG Jie. Action recognition based on deep convolutional neural networks and depth sequences[J]. Journal of Chongqing University, 2017, 40(11): 99-106.

Article metrics
  • Views: 1543
  • Downloads: 2981
  • HTML views: 1004
  • Citations: 0
History
  • Received: 2016-02-26
  • Published online: 2017-11-14