Paper Reading - Long-term Recurrent Convolutional Networks for Visual Recognition and Description ( CVPR 2015 )

Anonymous, 2018-08-13, Artificial Intelligence

Link of the Paper: https://arxiv.org/abs/1411.4389

Main Points:

A novel recurrent convolutional architecture (CNN + LSTM) that is deep both spatially and temporally.
The recurrent long-term models are directly connected to modern visual convnet models and can be jointly trained to simultaneously learn temporal dynamics and convolutional perceptual representations.
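
To make the CNN + LSTM wiring concrete, here is a minimal forward-pass sketch in plain Python. It is not the authors' implementation. `frame_features` is a hypothetical stand-in for a real convnet (it just averages each row of a frame), the weights are random, and averaging the per-step class scores to get a clip-level prediction is an assumption of this sketch. No training is shown; in a joint-training setup the gradients would flow through both the recurrent and the convolutional parameters.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def frame_features(frame):
    # Hypothetical stand-in for a CNN: one feature per row of the 2-D frame.
    return [sum(row) / len(row) for row in frame]

class LSTMCell:
    GATES = ("i", "f", "o", "g")  # input, forget, output, candidate

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        # One weight matrix per gate, applied to the concatenation [x; h].
        self.W = {k: rand_matrix(hidden_size, input_size + hidden_size, rng)
                  for k in self.GATES}
        self.b = {k: [0.0] * hidden_size for k in self.GATES}

    def step(self, x, h, c):
        z = x + h  # list concatenation of input and previous hidden state
        pre = {k: [a + bias for a, bias in zip(matvec(self.W[k], z), self.b[k])]
               for k in self.GATES}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        g = [math.tanh(v) for v in pre["g"]]
        # Memory cell: keep part of the old state, write part of the candidate.
        c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
        h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
        return h_new, c_new

def lrcn_forward(frames, cell, W_out):
    h = [0.0] * cell.hidden_size
    c = [0.0] * cell.hidden_size
    per_step = []
    for frame in frames:
        x = frame_features(frame)      # spatial stage, applied per frame
        h, c = cell.step(x, h, c)      # temporal stage, carries state across frames
        per_step.append(softmax(matvec(W_out, h)))
    # Assumption of this sketch: average the per-step class probabilities.
    steps = len(per_step)
    return [sum(p[k] for p in per_step) / steps for k in range(len(W_out))]

rng = random.Random(0)
# Toy clip: 5 frames, each 4 rows by 8 columns of random "pixels".
frames = [[[rng.random() for _ in range(8)] for _ in range(4)] for _ in range(5)]
cell = LSTMCell(input_size=4, hidden_size=6, rng=rng)
W_out = rand_matrix(3, 6, rng)  # 3 hypothetical activity classes
probs = lrcn_forward(frames, cell, W_out)
print(probs)
```

The same feature extractor and the same LSTM weights are reused at every time step, which is what lets the model handle clips of any length.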

Other Key Points:

A significant limitation of simple RNN models, which strictly integrate state information over time, is known as the "vanishing gradient" effect: backpropagating an error signal through a long-range temporal interval becomes increasingly difficult in practice.
The authors show that LSTM-type models improve recognition on conventional video activity challenges and enable a novel end-to-end optimizable mapping from image pixels to sentence-level natural language descriptions.
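
The vanishing gradient effect mentioned above can be illustrated with a toy scalar recurrence. The sketch below is an illustration only, not something taken from the paper: it assumes a one-unit RNN `h_t = tanh(w * h_{t-1} + x)` with a hypothetical weight `w = 0.9`, and multiplies the per-step derivatives to get the size of `d h_T / d h_0`. Because every factor has magnitude at most `|w| < 1`, the product shrinks at least geometrically with the number of steps.

```python
import math

def gradient_through_time(w, steps, h0=0.5, x=0.0):
    """Return |d h_T / d h_0| for the recurrence h_t = tanh(w * h_{t-1} + x)."""
    h = h0
    grad = 1.0
    for _ in range(steps):
        h = math.tanh(w * h + x)
        # Chain rule: d tanh(u)/du = 1 - tanh(u)^2, and du/dh_{t-1} = w.
        grad *= w * (1.0 - h * h)
    return abs(grad)

for steps in (1, 10, 50, 100):
    print(steps, gradient_through_time(w=0.9, steps=steps))
```

The LSTM's additive memory-cell update avoids repeatedly multiplying the gradient by a squashing derivative in this way, which is one common explanation for why it copes better with long intervals.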