[Paper Recommendations] Six Recent Papers on Topic Models: Dynamic Topic Models, Topical Trends, Massively Parallel Sampling, Stochastic Sampling, and Nonparametric Topic Modeling

[Introduction] The Zhuanzhi content team presents six recent papers on topic models today. Enjoy!

1. Viscovery: Trend Tracking in Opinion Forums based on Dynamic Topic Models



Authors: Ignacio Espinoza, Marcelo Mendoza, Pablo Ortega, Daniel Rivera, Fernanda Weiss

Institution: Universidad Técnica Federico Santa María

Abstract: Opinions in forums and social networks are released by millions of people, as ever more users turn to Web 2.0 platforms to opine about brands and organizations. For enterprises or government agencies it is almost impossible to track what people say, producing a gap between user needs/expectations and organizational actions. To bridge this gap we create Viscovery, a platform for opinion summarization and trend tracking that analyzes a stream of opinions recovered from forums. To do this we use dynamic topic models, which uncover the hidden structure of topics behind opinions and characterize vocabulary dynamics. We extend dynamic topic models to incremental learning, a key requirement in Viscovery for updating the model in near-real time. In addition, Viscovery includes sentiment analysis, separating positive and negative words for a specific topic at different levels of granularity. Viscovery visualizes representative opinions and terms in each topic. At a coarse level of granularity, the dynamics of the topics can be analyzed using a 2D topic embedding, suggesting longitudinal topic merging or segmentation. In this paper we report our experience developing this platform, sharing lessons learned and opportunities that arise from the use of sentiment analysis and topic modeling in real-world applications.
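Since the key engineering point is incremental model updating, here is a minimal sketch of that idea in Python, using gensim's online LDA as a stand-in for the paper's extended dynamic topic model. The documents, variable names, and two-topic setup are illustrative assumptions, not the authors' Viscovery code.

```python
# Minimal sketch: near-real-time model updating on an opinion stream,
# using gensim's online LDA as a stand-in for the paper's incremental
# dynamic topic model (not the authors' Viscovery code).
from gensim.corpora import Dictionary
from gensim.models import LdaModel

seed_docs = [["battery", "drains", "fast"],
             ["great", "screen", "bad", "battery"]]
dictionary = Dictionary(seed_docs)          # vocabulary is fixed here
corpus = [dictionary.doc2bow(d) for d in seed_docs]
lda = LdaModel(corpus, id2word=dictionary, num_topics=2, passes=5)

def fold_in(new_docs):
    """Incrementally update the model as new forum opinions arrive."""
    bows = [dictionary.doc2bow(d) for d in new_docs]
    lda.update(bows)                        # online (incremental) update

fold_in([["screen", "great"], ["battery", "bad"]])
print(lda.show_topics(num_topics=2, num_words=3))
```

Note that the vocabulary is frozen at construction in this sketch; the truly dynamic vocabulary the paper describes would need a more elaborate scheme.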

Venue: arXiv, May 1, 2018

URL:

http://www.zhuanzhi.ai/document/f6dc886b856bc775633d21f0c43be443

2. Deep Temporal-Recurrent-Replicated-Softmax for Topical Trends over Time



Authors: Pankaj Gupta, Subburam Rajaram, Hinrich Schütze, Bernt Andrassy

NAACL-HLT 2018

Institution: University of Munich

Abstract: Dynamic topic modeling facilitates the identification of topical trends over time in temporal collections of unstructured documents. We introduce a novel unsupervised neural dynamic topic model, the Recurrent Neural Network-Replicated Softmax Model (RNN-RSM), in which the topics discovered at each time step influence topic discovery in subsequent time steps. We account for the temporal ordering of documents by explicitly modeling a joint distribution of latent topical dependencies over time, using distributional estimators with temporal recurrent connections. Applying RNN-RSM to 19 years of articles on NLP research, we demonstrate that, compared to state-of-the-art topic models, RNN-RSM shows better generalization, topic interpretation, evolution and trends. We also introduce a metric (named SPAN) to quantify the capability of a dynamic topic model to capture word evolution in topics over time.
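To make the recurrence concrete, the following is a schematic PyTorch sketch of time-indexed topic-word distributions whose parameters at step t+1 are conditioned on step t through a recurrent state. The real RNN-RSM couples a Replicated Softmax (RBM) component to this recurrence; the toy class below, its names, and its sizes are illustrative assumptions only.

```python
# Schematic sketch only: time-indexed topic-word distributions whose
# parameters at step t+1 depend on step t via a recurrent state.
# The real RNN-RSM ties an RBM's biases to this state; omitted here.
import torch
import torch.nn as nn

class ToyDynamicTopics(nn.Module):
    def __init__(self, vocab_size, num_topics, hidden_size=64):
        super().__init__()
        self.hidden_size = hidden_size
        self.rnn = nn.RNNCell(vocab_size, hidden_size)
        self.to_beta = nn.Linear(hidden_size, num_topics * vocab_size)
        self.num_topics, self.vocab_size = num_topics, vocab_size

    def forward(self, counts_per_step):
        # counts_per_step: (T, vocab_size) aggregated word counts per year
        h = torch.zeros(1, self.hidden_size)
        betas = []
        for counts in counts_per_step:
            h = self.rnn(counts.unsqueeze(0), h)  # carry topic state forward
            logits = self.to_beta(h).view(self.num_topics, self.vocab_size)
            betas.append(torch.softmax(logits, dim=-1))
        return torch.stack(betas)                 # (T, K, V)

model = ToyDynamicTopics(vocab_size=100, num_topics=5)
beta_t = model(torch.rand(19, 100))              # e.g. 19 years of articles
```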

Venue: arXiv, May 1, 2018

URL:

http://www.zhuanzhi.ai/document/4a4d7b3da005c3d84950f36928b9d1eb

3. Pólya Urn Latent Dirichlet Allocation: a doubly sparse massively parallel sampler



Authors: Alexander Terenin, Måns Magnusson, Leif Jonsson, David Draper

Abstract: Latent Dirichlet Allocation (LDA) is a topic model widely used in natural language processing and machine learning. Most approaches to training the model rely on iterative algorithms, which makes it difficult to run LDA on big corpora that are best analyzed in parallel and distributed computational environments. Indeed, current approaches to parallel inference either do not converge to the correct posterior or require storage of large dense matrices in memory. We present a novel sampler that overcomes both problems, and we show that this sampler is faster, both empirically and theoretically, than previous Gibbs samplers for LDA. We do so by employing a novel Pólya-urn-based approximation in the sparse partially collapsed sampler for LDA. We prove that the approximation error vanishes with data size, making our algorithm asymptotically exact, a property of importance for large-scale topic models. In addition, we show via an explicit example that, contrary to popular belief in the topic modeling literature, partially collapsed samplers can be more efficient than fully collapsed samplers. We conclude by comparing the performance of our algorithm with that of other approaches on well-known corpora.
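The flavor of the Pólya-urn approximation can be sketched in a few lines: replace an exact Dirichlet draw of a topic-word distribution with normalised multinomial counts, which yields exact zeros and hence sparsity. This is an illustrative reading of the abstract, not the authors' partially collapsed parallel sampler; the function name and values below are assumptions.

```python
# Illustration of the Pólya-urn idea: approximate phi_k ~ Dirichlet(counts
# + beta) by normalised multinomial counts, giving a sparse vector with
# many exact zeros. Not the paper's full partially collapsed sampler.
import numpy as np

rng = np.random.default_rng(0)

def approx_topic_word_dist(topic_word_counts, beta, n_tokens):
    """Sparse urn-based stand-in for an exact Dirichlet draw."""
    weights = topic_word_counts + beta
    counts = rng.multinomial(n_tokens, weights / weights.sum())
    return counts / n_tokens        # sparse when n_tokens is moderate

counts = np.zeros(10_000)           # mostly-unseen vocabulary for one topic
counts[:5] = [200, 120, 80, 40, 10]
phi = approx_topic_word_dist(counts, beta=0.01, n_tokens=1_000)
print((phi > 0).sum(), "nonzero entries out of", phi.size)
```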

Venue: arXiv, April 23, 2018

URL:

http://www.zhuanzhi.ai/document/5597b716e095f10e3fe9d8c3011be251

4. Overlapping Coalition Formation via Probabilistic Topic Modeling



Authors: Michalis Mamakos, Georgios Chalkiadakis

To appear in AAMAS'18

Institutions: Northwestern University, Technical University of Crete

Abstract: Research in cooperative games often assumes that agents know the coalitional values with certainty, and that they can belong to one coalition only. By contrast, this work assumes that the value of a coalition is based on an underlying collaboration structure emerging due to existing but unknown relations among the agents, and that agents can form overlapping coalitions. Specifically, we first propose Relational Rules, a novel representation scheme for cooperative games with overlapping coalitions, which encodes the aforementioned relations and extends the well-known MC-nets representation to this setting. We then present a novel decision-making method for decentralized overlapping coalition formation, which exploits probabilistic topic modeling, and in particular online Latent Dirichlet Allocation. By interpreting formed coalitions as documents, agents can effectively learn topics that correspond to profitable collaboration structures.
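The "coalitions as documents" idea maps directly onto off-the-shelf online LDA tooling. A minimal sketch with gensim follows, where made-up agent identifiers play the role of words; the paper's decision-making layer on top of the learned topics is not shown.

```python
# Minimal sketch of the core trick: each formed coalition is a "document"
# whose "words" are agent identifiers, and online LDA learns "topics" that
# correspond to recurring collaboration structures. Agent names are made up.
from gensim.corpora import Dictionary
from gensim.models import LdaModel

coalitions = [["agent_a", "agent_b"],
              ["agent_b", "agent_c", "agent_d"],
              ["agent_a", "agent_b", "agent_c"]]
dictionary = Dictionary(coalitions)
bows = [dictionary.doc2bow(c) for c in coalitions]
lda = LdaModel(bows, id2word=dictionary, num_topics=2)  # 2 structures

# Online update as new coalitions form during the game:
lda.update([dictionary.doc2bow(["agent_c", "agent_d"])])
print(lda.show_topics(num_topics=2, num_words=3))
```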

Venue: arXiv, April 14, 2018

URL:

http://www.zhuanzhi.ai/document/817e358c94ffb0b4967c34fb1f98987c

5. Large-Scale Stochastic Sampling from the Probability Simplex



Authors: Jack Baker, Paul Fearnhead, Emily B Fox, Christopher Nemeth

Institutions: University of Washington, Lancaster University

Abstract: Stochastic gradient Markov chain Monte Carlo (SGMCMC) has become a popular method for scalable Bayesian inference. These methods are based on sampling a discrete-time approximation to a continuous-time process, such as the Langevin diffusion. When applied to distributions defined on a constrained space, such as the simplex, the time-discretisation error can dominate when we are near the boundary of the space. We demonstrate that while current SGMCMC methods for the simplex perform well in certain cases, they struggle with sparse simplex spaces, i.e. when many of the components are close to zero. However, most popular large-scale applications of Bayesian inference on simplex spaces, such as network or topic models, are sparse. We argue that this poor performance is due to the biases of SGMCMC caused by the discretisation error. To get around this, we propose the stochastic Cox-Ingersoll-Ross (CIR) process, which removes all discretisation error, and we prove that samples from the stochastic CIR process are asymptotically unbiased. Using the stochastic CIR process within an SGMCMC algorithm is shown to give substantially better performance for a topic model and a Dirichlet process mixture model than existing SGMCMC approaches.
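The discretisation-free claim rests on a standard fact: the CIR process has a known exact transition (a scaled noncentral chi-squared) and Gamma marginals, so independent components can be simulated without an Euler step and then normalised onto the simplex. The sketch below uses only those textbook CIR facts; it is not the authors' SGMCMC code, which additionally drives the process with stochastic gradients.

```python
# Sketch of discretisation-free simplex sampling: each coordinate follows a
# CIR process dX = (alpha - X)dt + sqrt(2X)dW, whose exact transition is a
# scaled noncentral chi-squared and whose stationary law is Gamma(alpha, 1).
# Normalising the coordinates gives a point on the simplex. Illustrative
# only; the paper's stochastic CIR also plugs in stochastic gradients.
import numpy as np

rng = np.random.default_rng(1)

def cir_exact_step(x, alpha, h):
    """One exact (discretisation-free) CIR transition of step size h."""
    scale = (1.0 - np.exp(-h)) / 2.0
    df = 2.0 * alpha
    nonc = 2.0 * x * np.exp(-h) / (1.0 - np.exp(-h))
    return scale * rng.noncentral_chisquare(df, nonc)

alpha = np.array([0.6, 0.6, 3.0])          # a sparse-ish Dirichlet target
x = rng.gamma(alpha)                       # start at the stationary law
for _ in range(100):
    x = cir_exact_step(x, alpha, h=0.1)
theta = x / x.sum()                        # simplex sample, no Euler bias
print(theta)
```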

Venue: arXiv, June 19, 2018

URL:

http://www.zhuanzhi.ai/document/a7256ae91be0736bd81473b0d133c6fc

6. Nonparametric Topic Modeling with Neural Inference



Authors: Xuefei Ning, Yin Zheng, Zhuxi Jiang, Yu Wang, Huazhong Yang, Junzhou Huang

Institution: Tsinghua University

Abstract: This work focuses on combining nonparametric topic models with Auto-Encoding Variational Bayes (AEVB). Specifically, we first propose iTM-VAE, where the topics are treated as trainable parameters and the document-specific topic proportions are obtained by a stick-breaking construction. The inference of iTM-VAE is modeled by neural networks such that it can be computed in a simple feed-forward manner. We also describe how to introduce a hyper-prior into iTM-VAE so as to model the uncertainty of the prior parameter. The hyper-prior technique is quite general, and we show that it can be applied to other AEVB-based models to alleviate the collapse-to-prior problem elegantly. Moreover, we propose HiTM-VAE, where the document-specific topic distributions are generated in a hierarchical manner. HiTM-VAE is even more flexible and can generate topic distributions with better variability. Experimental results on the 20News and Reuters RCV1-V2 datasets show that the proposed models outperform state-of-the-art baselines significantly. The advantages of the hyper-prior technique and the hierarchical model construction are also confirmed by experiments.
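The stick-breaking construction behind the document-specific proportions is compact enough to show directly. In the sketch below, Beta draws stand in for the inference network's output, and the truncation level K is an illustrative assumption, not the paper's architecture.

```python
# Minimal sketch of the stick-breaking construction for document-specific
# topic proportions, truncated at K sticks. In iTM-VAE the stick fractions
# come from the inference network; Beta draws stand in for them here.
import torch

def stick_breaking(v):
    """pi_k = v_k * prod_{j<k} (1 - v_j); v in (0,1)^K with v[-1] = 1."""
    cum = torch.cumprod(1.0 - v, dim=-1)
    shifted = torch.cat([torch.ones_like(cum[..., :1]), cum[..., :-1]], dim=-1)
    return v * shifted

K = 10
v = torch.distributions.Beta(1.0, 5.0).sample((K,))  # stand-in for encoder output
v[-1] = 1.0                     # absorb the leftover stick so pi sums to 1
pi = stick_breaking(v)
print(pi, pi.sum())             # topic proportions on the simplex
```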

Venue: arXiv, June 18, 2018

URL:

http://www.zhuanzhi.ai/document/6cff57dd7505ae7ddbae6925cb8fd1cf

-END-

Originally published on the WeChat public account Zhuanzhi (Quan_Zhuanzhi)

Original publication date: 2018-06-24
