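[Paper Recommendations] Six Recent Papers on Topic Modeling: Regularized Variational Inference Topic Models, Nonparametric Priors, Online Chat, Word Sense Disambiguation, Neural Language Models

[Overview] The Zhuanzhi content team has compiled six recent papers on topic modeling and introduces them below.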

1. Topic Modeling on Health Journals with Regularized Variational Inference

Authors: Robert Giaquinto, Arindam Banerjee

Abstract: Topic modeling enables exploration and compact representation of a corpus. The CaringBridge (CB) dataset is a massive collection of journals written by patients and caregivers during a health crisis. Topic modeling on the CB dataset, however, is challenging due to the asynchronous nature of multiple authors writing about their health journeys. To overcome this challenge we introduce the Dynamic Author-Persona topic model (DAP), a probabilistic graphical model designed for temporal corpora with multiple authors. The novelty of the DAP model lies in its representation of authors by a persona, where personas capture the propensity to write about certain topics over time. Further, we present a regularized variational inference algorithm, which we use to encourage the DAP model's personas to be distinct. Our results show significant improvements over competing topic models, particularly after regularization, and highlight the DAP model's unique ability to capture common journeys shared by different authors.

Source: arXiv, January 16, 2018

URL

http://www.zhuanzhi.ai/document/54eaa7fc454fdd76f151d73e09800876

2. Latent nested nonparametric priors

Authors: Federico Camerlenghi, David B. Dunson, Antonio Lijoi, Igor Prünster, Abel Rodríguez

Abstract: Discrete random structures are important tools in Bayesian nonparametrics and the resulting models have proven effective in density estimation, clustering, topic modeling and prediction, among others. In this paper, we consider nested processes and study the dependence structures they induce. Dependence ranges between homogeneity, corresponding to full exchangeability, and maximum heterogeneity, corresponding to (unconditional) independence across samples. The popular nested Dirichlet process is shown to degenerate to the fully exchangeable case when there are ties across samples at the observed or latent level. To overcome this drawback, inherent to nesting general discrete random measures, we introduce a novel class of latent nested processes. These are obtained by adding common and group-specific completely random measures and then normalising to yield dependent random probability measures. We provide results on the partition distributions induced by latent nested processes, and develop a Markov chain Monte Carlo sampler for Bayesian inferences. A test for distributional homogeneity across groups is obtained as a by-product. The results and their inferential implications are showcased on synthetic and real data.

Source: arXiv, January 16, 2018

URL

http://www.zhuanzhi.ai/document/1d93522cec3fa8b21451dd3738528d05
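
The construction described in the abstract of paper 2 (add a common and a group-specific completely random measure, then normalise) can be written out as a single formula. The notation here is an assumption made for illustration, since the abstract fixes no symbols: $\mu_S$ is the shared measure, $\mu_\ell$ is the measure specific to group $\ell$, $\mathbb{X}$ is the sample space, and $d$ is the number of groups.

```latex
\tilde{p}_\ell \;=\; \frac{\mu_\ell + \mu_S}{\mu_\ell(\mathbb{X}) + \mu_S(\mathbb{X})},
\qquad \ell = 1, \dots, d
```

Under this reading, every $\tilde{p}_\ell$ contains the same $\mu_S$, so groups can share atoms, while the group-specific terms $\mu_\ell$ allow the resulting distributions to differ.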

3. Between an Arena and a Sports Bar: Online Chats of eSports Spectators

Authors: Ilya Musabirov, Denis Bulygin, Paul Okopny, Ksenia Konstantinova

Abstract: ESports tournaments, such as Dota 2's The International (TI), attract millions of spectators to watch broadcasts on online streaming platforms, to communicate, and to share their experience and emotions. Unlike traditional streams, tournament broadcasts lack a streamer figure to which spectators can appeal directly. Using topic modelling and cross-correlation analysis of more than three million messages from 86 games of TI7, we uncover the main topical and temporal patterns of communication. First, we disentangle contextual meanings of emotes and memes, which play a salient role in communication, and show a meta-topics semantic map of streaming slang. Second, our analysis shows a prevalence of event-driven game communication during tournament broadcasts and particular topics associated with the event peaks. Third, we show that "copypasta" cascades and other related practices, while occupying a significant share of messages, are strongly associated with periods of lower in-game activity. Based on the analysis, we propose design ideas to support different modes of spectators' communication.

Source: arXiv, January 9, 2018

URL

http://www.zhuanzhi.ai/document/0595c45b2eed2dc044064cc66c1f85e0
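
The abstract of paper 3 mentions cross-correlation analysis between chat activity and in-game events. The following is a minimal sketch of lagged cross-correlation on two binned time series, not the authors' pipeline. The function names, the bin layout, and the synthetic `events` and `messages` series are all assumptions made for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def cross_correlation(events, messages, max_lag):
    """Correlate events[t] with messages[t + lag] for every lag in [-max_lag, max_lag]."""
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            x, y = events[:len(events) - lag], messages[lag:]
        else:
            x, y = events[-lag:], messages[:len(messages) + lag]
        result[lag] = pearson(x, y)
    return result

# Synthetic data (an assumption): in-game event intensity per time bin.
events = [0, 0, 5, 0, 0, 0, 4, 0, 0, 0, 6, 0, 0, 0, 0, 3, 0, 0, 0, 0]
# Chat volume for one topic: the same signal delayed by 2 bins, plus a baseline of 1.
messages = [1, 1] + [e + 1 for e in events[:-2]]

ccf = cross_correlation(events, messages, max_lag=4)
best_lag = max(ccf, key=ccf.get)
print(best_lag)  # by construction, the peak is at lag 2
```

A positive peak lag means chat activity follows the in-game events; a strongly negative correlation around lag 0 would instead match the abstract's observation that some practices coincide with periods of lower in-game activity.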

4. Knowledge-based Word Sense Disambiguation using Topic Models

Authors: Devendra Singh Chaplot, Ruslan Salakhutdinov

Abstract: Word Sense Disambiguation (WSD) is an open problem in Natural Language Processing which is particularly challenging and useful in the unsupervised setting, where all the words in any given text need to be disambiguated without using any labeled data. Typically, WSD systems use the sentence or a small window of words around the target word as the context for disambiguation because their computational complexity scales exponentially with the size of the context. In this paper, we leverage the formalism of topic models to design a WSD system that scales linearly with the number of words in the context. As a result, our system is able to utilize the whole document as the context for a word to be disambiguated. The proposed method is a variant of Latent Dirichlet Allocation in which the topic proportions for a document are replaced by synset proportions. We further utilize the information in WordNet by assigning a non-uniform prior to the synset distribution over words and a logistic-normal prior to the document distribution over synsets. We evaluate the proposed method on the Senseval-2, Senseval-3, SemEval-2007, SemEval-2013 and SemEval-2015 English All-Word WSD datasets and show that it outperforms the state-of-the-art unsupervised knowledge-based WSD system by a significant margin.

Source: arXiv, January 6, 2018

URL

http://www.zhuanzhi.ai/document/7c88481a97379dde4cc5761cde0037b0

5. Topic Compositional Neural Language Model

Authors: Wenlin Wang, Zhe Gan, Wenqi Wang, Dinghan Shen, Jiaji Huang, Wei Ping, Sanjeev Satheesh, Lawrence Carin

Abstract: We propose a Topic Compositional Neural Language Model (TCNLM), a novel method designed to simultaneously capture both the global semantic meaning and the local word ordering structure in a document. The TCNLM learns the global semantic coherence of a document via a neural topic model, and the probability of each learned latent topic is further used to build a Mixture-of-Experts (MoE) language model, where each expert (corresponding to one topic) is a recurrent neural network (RNN) that accounts for learning the local structure of a word sequence. In order to train the MoE model efficiently, a matrix factorization method is applied, by extending each weight matrix of the RNN to be an ensemble of topic-dependent weight matrices. The degree to which each member of the ensemble is used is tied to the document-dependent probability of the corresponding topics. Experimental results on several corpora show that the proposed approach outperforms both a pure RNN-based model and other topic-guided language models. Further, our model yields sensible topics, and also has the capacity to generate meaningful sentences conditioned on given topics.

Source: arXiv, December 29, 2017

URL

http://www.zhuanzhi.ai/document/dd777f72fa4cf6222e3a8cfa76c02c73
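
The abstract of paper 5 describes an ensemble of topic-dependent weight matrices whose members are used in proportion to the document's topic probabilities. The sketch below illustrates only that weighting idea with plain Python lists. It is not the TCNLM implementation and omits the matrix factorization the abstract mentions for efficiency. The helper name `mix_weights`, the toy 2x2 matrices, and the topic probabilities are assumptions made for illustration.

```python
def mix_weights(experts, topic_probs):
    """Return sum_k topic_probs[k] * experts[k] for equally shaped matrices."""
    rows, cols = len(experts[0]), len(experts[0][0])
    mixed = [[0.0] * cols for _ in range(rows)]
    for p, W in zip(topic_probs, experts):
        for i in range(rows):
            for j in range(cols):
                mixed[i][j] += p * W[i][j]
    return mixed

# One toy weight matrix per topic (hypothetical values).
experts = [
    [[1.0, 0.0], [0.0, 1.0]],  # topic 0
    [[0.0, 2.0], [2.0, 0.0]],  # topic 1
    [[3.0, 3.0], [3.0, 3.0]],  # topic 2
]

# Document-dependent topic probabilities, e.g. as produced by a topic model.
topic_probs = [0.5, 0.5, 0.0]

W_doc = mix_weights(experts, topic_probs)
print(W_doc)  # [[0.5, 1.0], [1.0, 0.5]]
```

With a one-hot topic vector the mixed matrix reduces to that topic's own matrix, which is one way to picture generating text conditioned on a single topic.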

6. Multilingual Topic Models

Authors: Kriste Krstovski, Michael J. Kurtz, David A. Smith, Alberto Accomazzi

Abstract: Scientific publications have evolved several features for mitigating vocabulary mismatch when indexing, retrieving, and computing similarity between articles. These mitigation strategies range from simply focusing on high-value article sections, such as titles and abstracts, to assigning keywords, often from controlled vocabularies, either manually or through automatic annotation. Various document representation schemes possess different cost-benefit tradeoffs. In this paper, we propose to model different representations of the same article as translations of each other, all generated from a common latent representation in a multilingual topic model. We start with a methodological overview of latent variable models for parallel document representations that could be used across many information science tasks. We then show how solving the inference problem of mapping diverse representations into a shared topic space allows us to evaluate representations based on how topically similar they are to the original article. In addition, our proposed approach provides a means to discover where different concept vocabularies require improvement.

Source: arXiv, December 19, 2017

URL

http://www.zhuanzhi.ai/document/fd0f8c1e4f305e0d3784574b22ccb4d5
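
The abstract of paper 6 describes evaluating representations by how topically similar they are to the original article once everything is mapped into a shared topic space. The sketch below ranks a few representations by their distance to the full article's topic distribution. The choice of Hellinger distance is an assumption (the abstract does not name a measure), and the topic vectors and representation names are made-up values for illustration.

```python
from math import sqrt

def hellinger(p, q):
    """Hellinger distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    return sqrt(sum((sqrt(a) - sqrt(b)) ** 2 for a, b in zip(p, q)) / 2)

# Hypothetical topic distributions inferred in a shared 3-topic space.
article = [0.70, 0.20, 0.10]
representations = {
    "title": [0.50, 0.30, 0.20],
    "abstract": [0.65, 0.25, 0.10],
    "keywords": [0.20, 0.20, 0.60],
}

# Rank representations from most to least topically similar to the article.
ranked = sorted(representations, key=lambda name: hellinger(article, representations[name]))
for name in ranked:
    print(name, round(hellinger(article, representations[name]), 3))
```

Other divergences between distributions, such as Jensen-Shannon divergence, could be swapped in for `hellinger` without changing the rest of the sketch.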

This article is shared from the WeChat official account Zhuanzhi (Quan_Zhuanzhi). Author: Zhuanzhi content team (ed.)

Originally published: 2018-01-27
