Learning to Detect, Associate, and Recognize Human Actions and Surrounding Scenes in Untrimmed Videos

Jungin Park, Sangryul Jeon, Seungryong Kim, Jiyoung Lee, Sunok Kim, Kwanghoon Sohn

July 2018

Abstract

Although recognizing human actions and recognizing surrounding scenes address different aspects of video understanding, the two tasks are strongly correlated, and each can supply information the other lacks. In this paper, we propose an approach to joint action and scene recognition, formulated as an end-to-end learning framework built on temporal attention and feature fusion. Temporal attention modules applied to a generic feature network extract action and scene features efficiently, and the proposed fusion module then combines them into a single feature vector. Experiments on the CoVieW18 dataset show that our model can detect temporal attention with only weak supervision and remarkably improves multi-task action and scene classification accuracy.
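
As a rough illustration of the idea in the abstract, the sketch below pools per-frame features with two separate temporal attention heads (one for actions, one for scenes) and fuses the results into one vector. It is not the paper's architecture: the abstract does not specify the attention or fusion design, so the softmax-weighted pooling, the concatenation-based fusion, and every name here (`temporal_attention_pool`, `fuse`, `w_action`, `w_scene`) are assumptions chosen for simplicity, and random arrays stand in for real frame features.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def temporal_attention_pool(features, w):
    """Pool frame features over time with softmax attention.

    features: (T, D) array of per-frame features from a shared backbone.
    w:        (D,) scoring vector (hypothetical; would be learned in practice).
    Returns the pooled (D,) feature and the (T,) attention weights.
    """
    scores = features @ w          # one relevance score per frame
    weights = softmax(scores)      # normalize scores over the time axis
    pooled = weights @ features    # attention-weighted average of frames
    return pooled, weights


def fuse(action_feat, scene_feat):
    # Assumed fusion: plain concatenation into a single feature vector.
    return np.concatenate([action_feat, scene_feat])


rng = np.random.default_rng(0)
T, D = 8, 16                        # 8 frames, 16-dimensional features
frames = rng.normal(size=(T, D))    # stand-in for backbone features
w_action = rng.normal(size=D)       # stand-in for learned action attention
w_scene = rng.normal(size=D)        # stand-in for learned scene attention

action_feat, action_attn = temporal_attention_pool(frames, w_action)
scene_feat, scene_attn = temporal_attention_pool(frames, w_scene)
fused = fuse(action_feat, scene_feat)
```

In this sketch the attention weights are trained only through the classification loss on the fused vector, which is one way to read "weak supervision": no frame-level labels are needed to learn which frames matter for each task.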

Type

Conference paper

Publication

In ACM Multimedia Workshop

Jungin Park

PhD, Postdoc Researcher