Graph Regularization Network with Semantic Affinity for Weakly-supervised Temporal Action Localization

Abstract

This paper presents a novel deep architecture for weakly-supervised temporal action localization that not only generates segment-level action responses but also propagates these responses to their neighborhood in the form of graph Laplacian regularization. Specifically, our approach consists of two sub-modules: a class activation module that estimates the action score map over time through the action classifiers, and a graph regularization module that refines the estimated action score map by solving a quadratic programming problem with the predicted segment-level semantic affinities. Since both modules are built from fully differentiable layers, the proposed network can be trained jointly in an end-to-end manner. Experimental results on THUMOS14 and ActivityNet1.2 demonstrate that the proposed method achieves outstanding performance in weakly-supervised temporal action localization.
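The abstract describes the two modules only at a high level. The NumPy sketch below illustrates the general idea of graph Laplacian refinement of a score map. It is not the authors' implementation. It rests on several assumptions that the abstract does not state. The class activation module is taken to be a per-segment linear classifier. The semantic affinities come from a Gaussian kernel on segment features, whereas the paper predicts them with a learned module. The quadratic program is taken to be the unconstrained objective min_Y ||Y - S||^2 + λ·tr(Yᵀ L Y), where L = D - W is the unnormalized graph Laplacian. The function names and the values of sigma and lam are hypothetical.

```python
import numpy as np


def class_activation(features, weights):
    """Hypothetical class activation step: apply a linear action classifier
    to every segment, giving a (T, C) action score map over time."""
    return features @ weights  # (T, D) @ (D, C) -> (T, C)


def semantic_affinity(features, sigma=1.0):
    """Stand-in for the predicted segment-level affinities (assumption):
    a Gaussian kernel on pairwise feature distances, zero on the diagonal."""
    diff = features[:, None, :] - features[None, :, :]
    sq_dists = (diff ** 2).sum(axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W


def graph_regularize(scores, W, lam=1.0):
    """Closed-form solution of the assumed quadratic program
        min_Y ||Y - S||_F^2 + lam * tr(Y^T L Y),   L = D - W.
    Setting the gradient 2(Y - S) + 2 * lam * L Y to zero gives
    (I + lam * L) Y = S; the matrix is positive definite for lam >= 0."""
    L = np.diag(W.sum(axis=1)) - W  # unnormalized graph Laplacian
    A = np.eye(W.shape[0]) + lam * L
    return np.linalg.solve(A, scores)


# Toy usage: 8 segments, 16-dim features, 3 action classes (arbitrary sizes).
rng = np.random.default_rng(0)
T, D, C = 8, 16, 3
feats = rng.normal(size=(T, D))
S = class_activation(feats, rng.normal(size=(D, C)))
W = semantic_affinity(feats, sigma=4.0)
Y = graph_regularize(S, W, lam=0.5)
print(S.shape, Y.shape)  # (8, 3) (8, 3)
```

Under these assumptions the refinement needs no iterative solver, because I + λL is positive definite and a single linear solve gives the optimum. Segments with high affinity are pulled toward similar scores, while the per-class sum of scores over time is preserved. A linear solve of this kind can be back-propagated through in common autodiff frameworks, which is consistent with the abstract's claim of end-to-end training. The exact objective, constraints, and affinity predictor in the paper may differ.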

Publication
In IEEE International Conference on Image Processing
Jungin Park
PhD, Postdoctoral Researcher

My research interests include computer vision, video understanding, multimodal learning, and vision-language models.