This project and research internship deals with multimodal action recognition. We aim to create an assistive robot that supports people in elderly care homes by recognizing their gestures and activities and acting accordingly. It may assist them in their everyday lives and call for help when accidents occur.
Our team consists of students of the University of Koblenz-Landau.
For research and training purposes we took part in two online challenges, for which we created our first training routines. Both challenges concluded on June 30th, and our work paid off: we won both competitions against several other teams.
The first competition was the HEART-MET Gesture Recognition Challenge, where the task was to recognize gestures from videos. The gestures are meant to communicate intentions or commands to the robot and include the stop sign, nodding, pointing, etc. Some examples can be seen in this video. The competition aims to benchmark assistive robots performing healthcare-related tasks in unstructured domestic environments. Gesture recognition is one of the functional benchmarks by which assistive robots are evaluated, as it is necessary for non-verbal communication with a user. The datasets for this challenge are collected from real robots performing gesture recognition in domestic environments, with several different volunteers performing the gestures.
The second one was the HEART-MET Activity Recognition Challenge, where the task was to recognize daily living activities performed by humans from videos. The videos are recorded from robots operating in a domestic environment and include activities such as reading a book, drinking water, falling on the floor, etc. Some examples can be seen in this video. HEART-MET is one of the competitions in the METRICS project, which has received funding from the European Union's Horizon 2020 research and innovation program under grant agreement No 871252. The competition aims to benchmark assistive robots performing healthcare-related tasks in unstructured domestic environments. Activity recognition is an important skill for a robot operating in an assistive capacity for persons who may have care needs. In addition to recognizing daily living activities, it is important for the robot to detect activities or events in which it may need to offer help or call for assistance. Therefore, in HEART-MET, activity recognition is one of the functional benchmarks by which assistive robots are evaluated. The datasets for this challenge are collected from real robots performing activity recognition in domestic environments, with several different volunteers performing the activities.
OpenAcRec is an action recognition library that is currently under development. With OpenAcRec we aim to develop a library that serves as a basis for unifying various datasets of different modalities. We do not necessarily aim for state-of-the-art action recognition performance, but for wide applicability, even across different modalities or their fusion. Currently we support video-based datasets, which serve as a fundamental basis for action recognition tasks and are applicable to many people, since cameras are easily accessible to most, e.g. in smartphones. To tackle the action recognition problem on the video level, we integrate human pose detectors like OpenPose, MediaPipe, and OpenPifPaf to first estimate human poses. We then aggregate the poses over time into a motion image. Since convolutional neural networks currently achieve great results in image classification, we use them to train models that classify the proposed motion images. In addition to the classification of videos, OpenAcRec supports data from motion capture systems and skeletons estimated from RGB-D cameras. Various datasets like UTD-MHAD, MMAct, and datasets from challenges, e.g. the HEART-MET challenge, are supported by our library. We apply various continuous integration methods to support our development; e.g., up-to-date documentation can always be found here.
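The pipeline described above can be illustrated with a small sketch. The snippet below is not OpenAcRec's actual API; it is a minimal illustration of the core idea, assuming MediaPipe-style 2D keypoints normalized to [0, 1]: a sequence of poses is rendered into a single motion image, where later frames are drawn brighter so the image encodes the trajectory of the movement. Such an image could then be fed to any image classifier, e.g. a CNN.

```python
import numpy as np

def poses_to_motion_image(poses, height=64, width=64):
    """Render a sequence of 2D pose keypoints into one 'motion image'.

    poses: array of shape (T, J, 2) with x/y coordinates in [0, 1],
    e.g. J joints per frame as estimated by a pose detector.
    Each frame is drawn with an intensity proportional to its position
    in time, so the single image encodes the motion over the whole clip.
    """
    poses = np.asarray(poses, dtype=float)
    num_frames = poses.shape[0]
    img = np.zeros((height, width), dtype=float)
    for t in range(num_frames):
        intensity = (t + 1) / num_frames  # later frames appear brighter
        for x, y in poses[t]:
            col = min(int(x * (width - 1)), width - 1)
            row = min(int(y * (height - 1)), height - 1)
            img[row, col] = max(img[row, col], intensity)
    return img

# Hypothetical example: one joint moving diagonally across 10 frames.
demo_poses = np.stack([np.full((1, 2), t / 9.0) for t in range(10)])
motion_image = poses_to_motion_image(demo_poses)
print(motion_image.shape)  # (64, 64)
```

The resulting array can be saved as a grayscale image or stacked per joint into multiple channels; the time-encoded intensity is what lets a plain image CNN distinguish, say, waving from pointing.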
Meet our baby, MAREC. It is a prototype machine that aims to apply our research methods on an actual robot able to recognize human gestures and react and interact accordingly. MAREC stands for Multimodal Action RECognition and serves as a demonstration platform for integrating our library on an actual system in an elderly care setting. MAREC consists of an (optional) mobile base and a linear unit that allows the screen and camera positions to be adjusted to the user. It is under continuous development but is already equipped with functionalities like mapping, navigation, speech synthesis, speech recognition, and human pose tracking. An LED in the base signals MAREC's state. Our plan is to develop MAREC into a supporting robot for senior homes, where it assists people in everyday chores and detects emergencies.
Our project currently lacks access to compute clusters for developing and training our models on various datasets. This lack of resources usually results in creative solutions, such as repurposing lab and employee computers and servers for our experiments. To still maintain an overview, we developed a system monitor that allows us to track all computations in a single place.
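The idea behind such a monitor can be sketched in a few lines. The schema below is hypothetical (not our actual monitor's format): each training script periodically assembles a heartbeat payload like this and sends it to a central endpoint, so every experiment running on a borrowed machine shows up in one dashboard.

```python
import json
import os
import socket
import time

def build_status_report(job_name, progress):
    """Assemble a heartbeat payload for a central system monitor.

    job_name: identifier of the running experiment (hypothetical).
    progress: fraction of the work finished, e.g. epochs_done / epochs.
    A training script would serialize this dict to JSON and POST it
    to the monitor at a fixed interval.
    """
    return {
        "host": socket.gethostname(),        # which borrowed machine
        "job": job_name,
        "progress": progress,
        "load_avg_1min": os.getloadavg()[0],  # 1-minute system load
        "timestamp": time.time(),
    }

report = build_status_report("utd-mhad-cnn", 0.42)
print(json.dumps(report, indent=2))
```

Because the payload is plain JSON, the central side only needs a small web service that stores the latest report per host and renders them on one page.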
Machine-learned models tend to improve the better the training data reflects the unseen test cases. A web-based tool that lets us send users a link through which they can contribute samples for our action recognition classes is a great way to crowdsource training samples across various setups and scenes. Find gesture recognition on: Media Pose Website. Skeleton detector coming soon.