Human Activity Recognition Using Multi-modal Data Fusion

Document Type

Contribution to Book

Publication Date

2019

Source Publication

Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications

The automated recognition of human activity is an important computer vision task with a growing number of applications in home, sports, security, and industrial settings. Approaches that rely on a single sensor have generally shown unsatisfactory performance, so an approach that efficiently combines data from a heterogeneous set of sensors is required. In this paper, we propose a new method for human activity recognition that fuses data from inertial sensors (IMUs), surface electromyographic recording electrodes (EMGs), and visual depth sensors such as the Microsoft Kinect®. A network of IMUs and EMGs is distributed over the human body, and a depth sensor keeps the person in its field of view. From each sensor, we track a succession of primitive movements over a time window and combine them to uniquely describe the overall activity being performed. We show that the multi-modal fusion of all three sensors offers higher activity recognition performance than combinations of two sensors or a single sensor. We also show that our approach is highly robust against temporary occlusions, data losses due to communication failures, and other events that naturally occur in unstructured environments.
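
The abstract does not give implementation details, so the following Python sketch is only one possible reading of the idea of combining per-sensor primitive movements over a time window. It assumes that each sensor already emits one primitive label per time step (with `None` for a lost or occluded sample) and that each activity is described by a hypothetical per-sensor template histogram. The histogram matching used here is a stand-in chosen for brevity, not the method of the paper, and it ignores the order of primitives within the window. All names (`primitive_histogram`, `recognize`, the primitive labels, the templates) are illustrative assumptions.

```python
from collections import Counter


def primitive_histogram(primitives):
    """Normalized counts of primitive labels in one window.

    None marks a lost sample (occlusion, communication failure).
    Returns None when the sensor contributed nothing in this window.
    """
    seen = [p for p in primitives if p is not None]
    if not seen:
        return None
    counts = Counter(seen)
    return {label: n / len(seen) for label, n in counts.items()}


def histogram_distance(a, b):
    """L1 distance between two normalized histograms."""
    return sum(abs(a.get(k, 0.0) - b.get(k, 0.0)) for k in set(a) | set(b))


def recognize(window, templates):
    """Return the activity whose templates best match the window.

    window:    {sensor: [primitive or None, ...]}
    templates: {activity: {sensor: histogram}}  (hypothetical, hand-made)
    Sensors with no usable data in the window are skipped, so the
    decision falls back on whichever modalities are still available.
    """
    observed = {s: primitive_histogram(seq) for s, seq in window.items()}
    available = {s: h for s, h in observed.items() if h is not None}
    if not available:
        return None  # nothing to decide on

    def score(activity):
        distances = [
            histogram_distance(h, templates[activity][s])
            for s, h in available.items()
            if s in templates[activity]
        ]
        # Average over the sensors actually compared.
        return sum(distances) / len(distances) if distances else float("inf")

    return min(templates, key=score)


# Hypothetical templates and one window in which the depth sensor is occluded.
templates = {
    "walking": {
        "imu": {"swing": 0.5, "stance": 0.5},
        "emg": {"burst": 0.5, "rest": 0.5},
        "depth": {"step": 1.0},
    },
    "sitting": {
        "imu": {"still": 1.0},
        "emg": {"rest": 1.0},
        "depth": {"seated": 1.0},
    },
}
window = {
    "imu": ["swing", "stance", "swing", "stance"],
    "emg": ["burst", "rest", None, "burst"],
    "depth": [None, None, None, None],
}
result = recognize(window, templates)
```

In this toy window the depth stream is entirely missing and one EMG sample is lost, yet the remaining IMU and EMG primitives still select "walking". This is the kind of graceful degradation the abstract claims, shown here only under the simplifying assumptions stated above.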


"Human Activity Recognition Using Multi-modal Data Fusion," in Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications, by Vera-Rodriguez. [Place of publication not identified] Springer International Publishing, 2019: 946-953. DOI.