
Time-Contrastive Learning for Latent Variable Models

"Aapo did it again!" - I exclaimed while reading this paper yesterday on the train back home (or at least I thought I was going home until I realised I was sitting on the wrong train the whole time. This gave me a couple more hours to think while traveling on a variety of long-distance buses...)

Aapo Hyvärinen is one of my heroes - he did tons of cool work, probably most famous for pseudo-likelihood, score matching and ICA. His recent paper, brought to my attention by my brand new colleague Hugo Larochelle, is similarly excellent:

Summary
  • Time-contrastive learning (TCL) trains a classifier that looks at a datapoint and guesses what part of the time series it came from.
  • It exploits the nonstationarity of time series to aid representation learning
  • An elegant connection to generative models (nonlinear ICA) is shown, although the assumptions of the model are pretty limiting
  • TCL is the temporal analogue of representation learning with jigsaw puzzles
  • Similarly to GANs, logistic regression is deployed as a proxy to learn log-likelihood ratios directly from data

Time-contrastive learning

Time-contrastive learning (TCL) is a technique for learning to extract nonlinear representations from time series data. First, the time series is sliced up into a number of non-overlapping chunks, indexed by τ. Then, a multivariate logistic regression classifier is trained in a supervised manner to look at a sample taken from the series at an unknown time and predict τ, the index of the chunk it came from. For this classifier, a neural network is used.
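To make the recipe concrete, here is a minimal sketch of the classification step, assuming PyTorch; the chunk count, network width and training loop are placeholders of my own choosing rather than the paper's exact setup.

```python
# Minimal TCL sketch (assumed setup, not the paper's code): split the series
# into equal chunks, label each sample with its chunk index, and train an MLP
# classifier to predict that index.
import torch
import torch.nn as nn

def make_tcl_dataset(x, n_chunks):
    """x: (T, d) time series -> (samples, chunk-index labels)."""
    chunk_len = x.shape[0] // n_chunks
    x = x[: chunk_len * n_chunks]                      # drop any remainder
    labels = torch.arange(n_chunks).repeat_interleave(chunk_len)
    return x, labels

class TCLClassifier(nn.Module):
    def __init__(self, d_in, d_hidden, n_chunks):
        super().__init__()
        self.feature = nn.Sequential(                  # nonlinear feature extractor
            nn.Linear(d_in, d_hidden), nn.ReLU(),
            nn.Linear(d_hidden, d_hidden), nn.ReLU())
        self.head = nn.Linear(d_hidden, n_chunks)      # final linear + softmax head

    def forward(self, x):
        return self.head(self.feature(x))

# toy usage: a 5-dimensional series cut into 10 chunks
x = torch.randn(10_000, 5)
data, labels = make_tcl_dataset(x, n_chunks=10)
model = TCLClassifier(d_in=5, d_hidden=64, n_chunks=10)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(5):                                     # a few full-batch steps, for illustration
    opt.zero_grad()
    loss = nn.functional.cross_entropy(model(data), labels)
    loss.backward()
    opt.step()
```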

The classifier itself is only a proxy for solving the representation learning problem. It turns out that if you chop off the final linear + softmax layer, the activations in the last hidden layer learn to represent something fundamental: the log-odds-ratios in a probabilistic generative model (see the paper for details). If one runs linear ICA over these hidden-layer activations, the resulting network learns to perform inference in a nonlinear ICA latent variable model.
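Continuing the sketch above, the representation-learning step might look like the following; I am assuming scikit-learn's FastICA for the final linear ICA stage, which is my choice of tool rather than something the paper prescribes.

```python
# Continuation of the sketch above: discard the linear + softmax head and run
# linear ICA on the last hidden layer's activations to recover the sources.
from sklearn.decomposition import FastICA

with torch.no_grad():
    h = model.feature(data).numpy()          # last-hidden-layer activations

ica = FastICA(n_components=5, random_state=0)
sources_hat = ica.fit_transform(h)           # estimated independent components
```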

Moreover, if certain conditions about nonstationarity and the generative model are met, one can prove that the latent variable model is identifiable. This means that if the data were indeed drawn from the nonlinear ICA generative model, the resulting inference network - composed by chopping off the top of the classifier and replacing it with a linear ICA layer - can infer the true hidden variables exactly.

How practical are the assumptions?

TCL relies on the nonstationarity of time series data: the statistics of the data change depending on which chunk or slice of the time series you are in, but the data are assumed to be i.i.d. within each chunk. The proof also assumes that the chunk-conditional data distributions are slightly modulated versions of the same nonlinear ICA generative model; this is how the model ends up identifiable - because we can use the different temporal chunks as different perspectives on the latent variables.
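As a toy illustration of these assumptions (my own construction, not an example from the paper), one can generate data in which independent Laplace sources are variance-modulated per chunk and then passed through a fixed nonlinear mixing:

```python
# Toy data satisfying the assumed generative model: sources are independent and
# i.i.d. within each chunk, their scales are modulated per chunk (nonstationarity),
# and the observations are a fixed nonlinear mixture of the sources.
import numpy as np

rng = np.random.default_rng(0)
n_chunks, chunk_len, d = 10, 1_000, 5

scales = rng.uniform(0.5, 2.0, size=(n_chunks, d))     # chunk-dependent modulation
sources = np.concatenate(
    [rng.laplace(scale=s, size=(chunk_len, d)) for s in scales], axis=0)

# fixed (chunk-independent) nonlinear mixing: x = f(s)
A, B = rng.normal(size=(d, d)), rng.normal(size=(d, d))
x_obs = np.tanh(sources @ A) @ B                       # what TCL would observe
```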

I would say that these assumptions are not very practical, at least not for data such as natural video. Something along the lines of slow-feature analysis, with latent variables that exhibit more interesting behaviour over time, would be desirable. Nevertheless, the model is complex enough to make a point, and I believe TCL itself can be deployed more generally for representation learning.

Temporal jigsaw

It's not hard to see that TCL is analogous to a temporal version of the jigsaw puzzle method I wrote about last month. In the jigsaw puzzle method, one breaks up a single image into non-overlapping chunks, shuffles them, and then trains a network to reassemble the pieces. Here, the chunking happens in the temporal domain instead.

There are other papers that use the same general idea: training classifiers that guess the correct temporal ordering of frames or subsequences in videos. To do well at their job, these classifiers can end up learning about objects, motion, perhaps even a notion of inertia, gravity or causality.

In this context, the key contribution of Hyvärinen and Morioka's paper is to provide extra theoretical justification and to relate the idea to generative models. I'm sure one can use this framework to extend TCL to slightly more plausible generative models.

Key takeaway

Logistic regression learns likelihood ratios

This is yet another example of using logistic regression as a proxy for estimating log-probability-ratios directly from data. The same thing happens in generative adversarial networks, where the discriminator learns to represent log P(x) − log Q(x), where P and Q are the real and synthetic data distributions, respectively.
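A quick numerical sanity check of this claim (my own example, using 1D Gaussians so the true log-ratio is known in closed form): train a logistic classifier to separate balanced samples from P and Q, and its logit approximately recovers log P(x) − log Q(x).

```python
# Logistic regression as a density-ratio estimator: with balanced classes the
# classifier's logit converges to log p(x) - log q(x).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 50_000
p = rng.normal(loc=1.0, scale=1.0, size=n)       # samples from P = N(+1, 1)
q = rng.normal(loc=-1.0, scale=1.0, size=n)      # samples from Q = N(-1, 1)

X = np.concatenate([p, q]).reshape(-1, 1)
y = np.concatenate([np.ones(n), np.zeros(n)])    # 1 = "real" (P), 0 = "synthetic" (Q)

clf = LogisticRegression().fit(X, y)
logit = clf.decision_function([[0.5]])[0]        # learned log-ratio at x = 0.5

# analytic value: log N(x; 1, 1) - log N(x; -1, 1) = 2x, so 1.0 at x = 0.5
print(logit)
```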

This insight provides new ways in which unsupervised or semi-supervised tasks can be reduced to supervised learning problems. As classification is now considered significantly easier than density estimation, direct probability ratio estimation may provide the easiest path forward for representation learning.

https://arxiv.org/abs/1605.06336
