Learning from Legacy MoCap Data for Precision Modelling of 3D Human Motion for Behavioural and Performance Analysis

Speaker:  Ajmal Saeed Mian – Crawley, WA, Australia
Topic(s):  Artificial Intelligence, Machine Learning, Computer Vision, Natural Language Processing


Modelling human actions is useful for surveillance, sports and medical applications. State-of-the-art models are based on deep learning from large amounts of annotated data, which is expensive to generate. We bypass this step and capitalize on legacy motion capture (MoCap) data for precision modelling of human actions. Legacy MoCap data are used to learn deep knowledge transfer models that perform action recognition in videos and estimate ground reaction forces/moments, making the force plates widely used in sports labs redundant.

In our next work, the MoCap data are fitted with 3D human models of varying identity, size, gender and clothing, animated against random backgrounds, and rendered from multiple camera viewpoints under different lighting conditions. In essence, actions performed by real humans in the past are re-lived in different bodies and locations, giving us a large corpus of videos in which the exact human pose is known along with other athlete-performance annotations. These data are used to learn Human Pose Models (HPM) that map video frames to the respective human poses. HPM extracts invariant features from real video frames, a Fourier Temporal Pyramid represents the temporal feature dynamics, and action classification follows. Experiments on three cross-view human action datasets showed that our algorithm outperforms existing methods by significant margins for RGB-only and RGB-D action recognition. Interestingly, our RGB-only model outperformed RGB-D methods on the most challenging NTU dataset.

Finally, I will show how our learning-from-synthetic-data paradigm generalizes to full 3D human pose recovery from videos. Here, we take precision a step further: we design loose clothing for the 3D humans and use a physics-based engine to move the clothes realistically with body motion and gravity, generating more realistic videos. We also use pose interpolation to synthesize novel human motions for data augmentation.
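The pose interpolation mentioned above can be illustrated with a minimal sketch: blending two skeleton poses, represented as per-joint unit quaternions, via spherical linear interpolation (slerp). This is a generic technique, not the lecture's exact implementation; the function names and the (J, 4) quaternion layout are illustrative assumptions.

```python
import numpy as np

def slerp(q0, q1, t):
    """Spherical linear interpolation between two unit quaternions (w, x, y, z)."""
    q0, q1 = q0 / np.linalg.norm(q0), q1 / np.linalg.norm(q1)
    dot = np.dot(q0, q1)
    if dot < 0.0:            # take the shorter arc on the 4D sphere
        q1, dot = -q1, -dot
    if dot > 0.9995:         # nearly parallel: linear blend avoids division by ~0
        q = q0 + t * (q1 - q0)
        return q / np.linalg.norm(q)
    theta = np.arccos(np.clip(dot, -1.0, 1.0))
    return (np.sin((1 - t) * theta) * q0 + np.sin(t * theta) * q1) / np.sin(theta)

def interpolate_pose(pose_a, pose_b, t):
    """Blend two poses, each a (J, 4) array of per-joint quaternions, at fraction t."""
    return np.array([slerp(qa, qb, t) for qa, qb in zip(pose_a, pose_b)])
```

Interpolating between poses drawn from two real MoCap frames yields intermediate poses that never occurred in the capture, which is the sense in which interpolation augments the motion data.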
An end-to-end model is then trained to generate temporally coherent full 3D human models from video clips. The above models are published in CVPR, TPAMI, IJCV and TBME. 
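The Fourier Temporal Pyramid used above to represent temporal feature dynamics can be sketched as follows: the per-frame feature sequence is recursively split into segments, and each segment is summarized by the magnitudes of its low-frequency FFT coefficients, giving a fixed-length, temporally structured descriptor. This is a generic sketch of the technique; the level count and number of retained coefficients are illustrative choices, not the lecture's settings.

```python
import numpy as np

def fourier_temporal_pyramid(features, levels=3, coeffs=4):
    """Fourier Temporal Pyramid over a (T, D) sequence of per-frame features.

    At pyramid level l the sequence is split into 2**l segments; each segment
    keeps the magnitudes of its first `coeffs` low-frequency FFT coefficients
    per feature dimension. All segments are concatenated into one vector.
    """
    parts = []
    for level in range(levels):
        for seg in np.array_split(features, 2 ** level, axis=0):
            spec = np.abs(np.fft.fft(seg, axis=0))        # magnitude spectrum
            if spec.shape[0] < coeffs:                    # pad short segments
                spec = np.pad(spec, ((0, coeffs - spec.shape[0]), (0, 0)))
            parts.append(spec[:coeffs].ravel())
    return np.concatenate(parts)
```

Because every video yields the same descriptor length regardless of its frame count, the output can be fed directly to a standard classifier for the action recognition step.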

About this Lecture

Number of Slides:  44
Duration:  50 minutes
Languages Available:  English
Last Updated: 

Request this Lecture

To request this particular lecture, please complete this online form.

Request a Tour

To request a tour with this speaker, please complete this online form.

All requests will be sent to ACM headquarters for review.