Convolutional architectures have recently been shown to be competitive with the de facto standard of recurrent neural networks (RNNs) on many sequence modeling tasks, while offering computational and modeling advantages due to their inherent parallelism. However, a performance gap remains relative to more expressive stochastic RNN variants, especially those with several layers of dependent random variables. In this work, we propose stochastic temporal convolutional networks (STCNs), a novel architecture that combines the computational advantages of temporal convolutional networks (TCNs) with the representational power and robustness of stochastic latent spaces. In particular, we propose a hierarchy of stochastic latent variables that captures temporal dependencies at different time-scales. The architecture is modular and flexible due to the decoupling of deterministic and stochastic layers. We show that the proposed architecture achieves state-of-the-art log-likelihoods across several tasks. Finally, the model is capable of generating high-quality synthetic samples over long temporal horizons when modeling handwritten text.
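To make the deterministic backbone concrete: a TCN is built from dilated causal convolutions, where each deeper layer has an exponentially larger receptive field and hence captures a coarser time-scale. The sketch below is illustrative only (plain numpy, single channel, fixed filters); it is not the paper's implementation, but it shows the causal-dilation mechanism on which the latent hierarchy is placed.

```python
import numpy as np

def dilated_causal_conv(x, w, dilation):
    """Causal 1-D convolution: the output at time t depends only on inputs <= t.

    x: (T,) input signal, w: (k,) filter taps, dilation: spacing between taps.
    """
    T, k = len(x), len(w)
    pad = dilation * (k - 1)
    xp = np.concatenate([np.zeros(pad), x])  # left-pad so no future leakage
    return np.array([
        sum(w[j] * xp[pad + t - j * dilation] for j in range(k))
        for t in range(T)
    ])

# Stacking layers with exponentially growing dilation (1, 2, 4, ...) gives
# each deeper layer an exponentially larger receptive field -- the different
# time-scales at which the latent variables of the hierarchy would operate.
rng = np.random.default_rng(0)
x = rng.standard_normal(32)
w = np.array([0.5, 0.5])
h1 = dilated_causal_conv(x, w, dilation=1)
h2 = dilated_causal_conv(h1, w, dilation=2)
h3 = dilated_causal_conv(h2, w, dilation=4)
```

Causality can be checked directly: perturbing the input at time t leaves all outputs before t unchanged.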
Our contributions can be summarized as follows: 1) We present a modular and scalable approach to augmenting temporal convolutional networks with effective stochastic latent variables. 2) We empirically show that the STCN-dense design prevents the model from ignoring latent variables in the upper layers (Zhao et al., 2017). 3) We achieve state-of-the-art log-likelihood performance, measured by the ELBO, on the IAM-OnDB, Deepwriting, TIMIT, and Blizzard datasets. 4) Finally, we show that the quality of the synthetic samples matches the significant quantitative improvements.
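Since the reported log-likelihoods are measured by the ELBO, the following minimal sketch shows how a per-sequence ELBO with one KL term per latent layer is typically assembled for diagonal-Gaussian latents. The function names and the diagonal-Gaussian assumption are ours for illustration, not taken from the paper.

```python
import numpy as np

def gaussian_kl(mu_q, logvar_q, mu_p, logvar_p):
    """KL( N(mu_q, var_q) || N(mu_p, var_p) ) for diagonal Gaussians, summed over dims."""
    var_q, var_p = np.exp(logvar_q), np.exp(logvar_p)
    return 0.5 * np.sum(
        logvar_p - logvar_q + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0
    )

def elbo(log_px_given_z, kls_per_layer):
    """ELBO = reconstruction log-likelihood minus the sum of layerwise KL terms."""
    return log_px_given_z - sum(kls_per_layer)

# Example: two latent layers, each contributing a KL between its approximate
# posterior q and its (learned) prior p at every level of the hierarchy.
kl1 = gaussian_kl(np.array([0.1]), np.array([0.0]), np.zeros(1), np.zeros(1))
kl2 = gaussian_kl(np.array([-0.2]), np.array([-0.5]), np.zeros(1), np.zeros(1))
bound = elbo(log_px_given_z=-120.0, kls_per_layer=[kl1, kl2])
```

A collapsed latent layer corresponds to its KL term being (near) zero while the layer carries no information; the STCN-dense design described above is reported to avoid this failure mode.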