Seminar series
Fri, 08 Feb 2019
12:00 - 13:00
Weixin Yang
University of Oxford

Landmark-based human action recognition in videos is a challenging task in computer vision. One crucial step is to design discriminative features for spatial structure and temporal dynamics. To this end, we use and refine the path signature as an expressive, robust, nonlinear, and interpretable representation for landmark-based streamed data. Instead of extracting signature features from raw sequences, we propose path disintegrations and transformations as preprocessing steps to improve the efficiency and effectiveness of signature features. The path disintegrations spatially localize a pose into a collection of m-node paths, whose signatures encode non-local and non-linear geometrical dependencies, and temporally transform the evolution of spatial features into hierarchical spatio-temporal paths, whose signatures encode both long- and short-term dynamical dependencies. The path transformations allow the signatures to further explore correlations among different informative clues. Finally, all features are concatenated to form the input vector of a linear fully-connected network for action recognition. Experimental results on four benchmark datasets demonstrate that the proposed feature set, with only a linear network, achieves results comparable to those of cutting-edge deep learning methods.
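For readers unfamiliar with the path signature, the following is a minimal sketch of how its first two levels can be computed for a piecewise-linear path through a few landmarks, using Chen's identity to accumulate the terms segment by segment. It is an illustration only, not the speaker's implementation: the function names `signature_level2` and `signature_features`, the truncation at level 2, and the three-landmark example path are all assumptions made here.

```python
def signature_level2(path):
    """Level-1 and level-2 signature terms of a piecewise-linear path.

    `path` is a sequence of points (e.g. landmark coordinates), each a
    sequence of floats of the same dimension. Hypothetical helper.
    """
    dim = len(path[0])
    level1 = [0.0] * dim                          # total increment so far
    level2 = [[0.0] * dim for _ in range(dim)]    # iterated integrals so far
    for start, end in zip(path, path[1:]):
        step = [b - a for a, b in zip(start, end)]
        # Chen's identity for appending one straight segment:
        # S2 <- S2 + S1 (outer) step + 0.5 * step (outer) step
        for i in range(dim):
            for j in range(dim):
                level2[i][j] += level1[i] * step[j] + 0.5 * step[i] * step[j]
        # S1 <- S1 + step (updated after level 2, which needs the old S1)
        for i in range(dim):
            level1[i] += step[i]
    return level1, level2


def signature_features(path):
    """Flatten the truncated signature into one feature vector."""
    level1, level2 = signature_level2(path)
    return level1 + [value for row in level2 for value in row]


# Hypothetical 3-node path through three 2D landmarks.
three_node_path = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
features = signature_features(three_node_path)
```

The antisymmetric part of the level-2 term, 0.5 * (S[0][1] - S[1][0]), is the signed (Lévy) area enclosed by the path and its chord, which is one way such features capture geometry that the raw increments alone do not. In a pipeline like the one described, feature vectors of this kind from many short paths would be concatenated before being passed to the linear classifier.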

