Watch, Listen, and Describe: Globally and Locally Aligned Cross-Modal Attentions for Video Captioning

doi 10.18653/v1/n18-2125
Abstract

Available in full text

Publisher

Association for Computational Linguistics