Human action recognition is an active topic in computer vision, with a broad range of potential applications based on automatic video analysis, such as visual surveillance, human-machine interaction, sports analysis, entertainment systems and video retrieval.
Video-based human action recognition is extremely challenging due to general visual ambiguities such as illumination variations, clothing texture, background clutter and viewpoint heterogeneity. These issues can be mitigated by motion capture (MoCap) systems (e.g., VICON) and depth sensors (e.g., Microsoft Kinect), which encode human actions as the temporal evolution of skeletal joint positions. Building on this data type, we are developing state-of-the-art action recognition techniques based on covariance descriptors, which correlate the joint coordinates over time. As our results show, accumulating over time the evidence displayed by the human body while performing a given action allows the system to classify actions accurately. A key ingredient is overcoming the limit of the classical covariance representation, which only models linear relationships between variables: to this end, we developed a rigorous kernelization of the covariance operator that captures arbitrary, more complex relationships, improving the overall performance without compromising computational efficiency.
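As a concrete illustration, the sketch below computes a plain covariance descriptor from a skeleton sequence and a kernelized variant. The RBF kernel between coordinate trajectories is only an illustrative assumption for exposition, not necessarily the exact estimator of our ICPR 2016 paper.

```python
import numpy as np

def covariance_descriptor(joints):
    """Classical covariance descriptor.

    joints: (T, D) array with T frames and D = 3 * number of skeletal joints
            (x, y, z coordinates stacked per frame).
    Returns a (D, D) symmetric positive semi-definite matrix that correlates
    the joint coordinates over time (linear relationships only).
    """
    centered = joints - joints.mean(axis=0, keepdims=True)
    return centered.T @ centered / joints.shape[0]

def kernelized_covariance(joints, gamma=1.0):
    """Illustrative kernelized variant: replace the inner product between
    centered coordinate trajectories with an RBF kernel evaluation, so that
    nonlinear relationships between coordinates can be captured.
    (Hedged sketch; the paper's kernelization may differ in detail.)
    """
    centered = joints - joints.mean(axis=0, keepdims=True)   # (T, D)
    traj = centered.T                                         # (D, T) trajectories
    sq_dists = ((traj[:, None, :] - traj[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq_dists)                          # (D, D)

# Example: a synthetic 20-joint skeleton sequence of 100 frames (D = 60).
seq = np.random.randn(100, 60)
C = covariance_descriptor(seq)      # linear correlations
K = kernelized_covariance(seq)      # nonlinear correlations
```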
Another key aspect of the whole action recognition pipeline is the role played by the human agent. Each human actor can perform the very same action or activity in very different ways and, conversely, different actions can be performed quite similarly depending on the style of the performer. Although this is a crucial problem, it is not explicitly considered in classical recognition pipelines: it is only implicitly tackled through cross-subject validation, which attempts to learn a model general enough to be applicable to unseen agents.
Very differently, we are actively studying the role of the subject in action recognition, numerically quantifying its impact on the overall recognition performance by comparing different testing strategies (cross-validation, one-subject-out, held-out, ...). We also measure inter- and intra-subject variability with newly devised, dedicated statistics. Besides being interesting per se, this analysis is extremely beneficial for designing novel action recognition schemes that actually boost classification accuracy compared to classical, subject-unaware frameworks, as sketched below.
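The following sketch contrasts two of the testing strategies mentioned above on hypothetical data: a subject-unaware random k-fold split versus a leave-one-subject-out protocol. The features, labels and subject IDs are placeholders; the classifier choice (an RBF SVM) is an assumption for illustration only.

```python
import numpy as np
from sklearn.model_selection import KFold, LeaveOneGroupOut, cross_val_score
from sklearn.svm import SVC

# Hypothetical data: one (flattened) descriptor per sequence, an action
# label, and the ID of the subject who performed the sequence.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 60))        # placeholder descriptors
y = rng.integers(0, 10, size=200)         # 10 action classes
subjects = rng.integers(0, 8, size=200)   # 8 performing subjects

clf = SVC(kernel="rbf", gamma="scale")

# Subject-unaware protocol: random k-fold, so the same subject can appear
# in both the training and the test folds.
kfold_acc = cross_val_score(
    clf, X, y, cv=KFold(n_splits=5, shuffle=True, random_state=0)).mean()

# Subject-aware protocol: leave-one-subject-out, so every test fold contains
# only sequences from a subject never seen at training time.
loso_acc = cross_val_score(
    clf, X, y, groups=subjects, cv=LeaveOneGroupOut()).mean()

print(f"random k-fold accuracy:         {kfold_acc:.3f}")
print(f"leave-one-subject-out accuracy: {loso_acc:.3f}")
```

Comparing the two scores makes the impact of the subject explicit: the gap between the subject-unaware and subject-aware protocols is one simple way to quantify how much a model relies on having seen a given performer during training.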
References:
- J. Cavazza, A. Zunino, M. San Biagio, V. Murino
"Kernelized Covariance for Action Recogntion"
23rd International Conference on Pattern Recognition (ICPR), 2016
[POSTER] [MATLAB code]
- A. Zunino, J. Cavazza, V. Murino
"Revisiting Human Action Recognition: Personalization vs. Generalization"
arXiv:1605.00392, 2016 [PDF]