Introduction to topic models. Summary: a brief look at what a topic model is and its most basic concepts. https://en.wikipedia.org/wiki/Topic_model
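Introduction to Probabilistic Topic Models. Summary: learn what LDA is in one step. This is a Chinese translation of David M. Blei's "Introduction to Probabilistic Topic Models", in which the originator of topic models describes probabilistic topic models himself. The Chinese text is a gentler starting point for Chinese-speaking readers. http://www.cnblogs.com/siegfang/archive/2013/01/30/2882391.html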
Latent Dirichlet Allocation: the founding LDA paper. Once you have the basics of topic models, you can start on the original paper. Don't worry if you can't follow all of it; a rough first read is fine. Authors: David M. Blei, Andrew Y. Ng, and Michael I. Jordan. A brief note on Blei: David M. Blei is a Professor in the Statistics and Computer Science departments at Columbia University. Prior to fall 2014 he was an Associate Professor in the Department of Computer Science at Princeton University. His work is primarily in machine learning. http://www.jmlr.org/papers/volume3/blei03a/blei03a.pdf
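A correlated topic model of science. A major paper by Blei that models correlations between topics, on the premise that subsets of the latent topics will be highly correlated. http://www.cs.columbia.edu/~blei/papers/BleiLafferty2007.pdf Slides (ppt): http://www-users.cs.umn.edu/~banerjee/Teaching/Fall07/talks/Muhammed_slides.pdf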
Topic Models over Text Streams: A Study of Batch and Online Unsupervised Learning. Inference over text streams. Authors: A Banerjee, S Basu. http://www-users.cs.umn.edu/~banerjee/papers/07/sdm-topics-long.pdf
Topical n-grams: Phrase and topic discovery, with an application to information retrieval. Extends LDA to take word order into account. Authors: X Wang, A McCallum, X Wei. http://www.cs.cmu.edu/~xuerui/papers/ngram_tr.pdf
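Hierarchical Dirichlet processes. A variant built on the Dirichlet process, the HDP model, which can learn the number of topics automatically. This approach (1) goes some way toward solving the problem of determining the number of topics automatically, (2) at the cost of parameters that must be set and tuned carefully, and (3) has higher running complexity in practice, with code that is complicated and hard to maintain. In practice, therefore, people usually compromise: check how strictly the application really needs the number of topics to be determined automatically. If an empirically chosen number is good enough, there is no need for a Bayesian nonparametric method. If, however, you want to bring in prior knowledge or structured information, nonparametric methods are often the first choice, for example tree-structured hierarchical topic models and directed-acyclic-graph topic models. Authors: Yee Whye Teh, Michael I. Jordan, Matthew J. Beal, David M. Blei. https://people.eecs.berkeley.edu/~jordan/papers/hdp.pdf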
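Modeling online reviews with multi-grain topic models. Unsupervised topic extraction from user review data. It considers a multi-level context (word, sentence, paragraph, document) and addresses the problem that topics extracted by standard LDA tend to correspond to brands rather than to ratable aspects. Authors: I Titov, R McDonald. http://delivery.acm.org/10.1145/1370000/1367513/p111-titov.pdf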
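A joint model of text and aspect ratings for sentiment summarization. This paper folds features carrying structured information into a topic model. Specifically, it couples two generative processes: one generates the words of a document, the other generates the structured features. Authors: Ivan Titov, Ryan McDonald. http://www.aclweb.org/anthology/P08-1036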
Comparing twitter and traditional media using topic models. A method for social media research. It proposes Twitter-LDA to address the fact that standard LDA is poorly suited to short texts. Authors: WX Zhao, J Jiang, J Weng, J He, EP Lim. https://link.springer.com/chapter/10.1007%2F978-3-642-20161-5_34
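For orientation, the generative story that LDA and most of the variants listed above build on can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the toy vocabulary, the hyperparameter values `alpha` and `eta`, and the fixed document length are all made up for illustration (the original paper draws document length from a Poisson distribution, and placing a Dirichlet prior on the topics corresponds to the smoothed variant of the model).

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_categorical(probs, rng):
    """Return an index drawn according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_corpus(num_docs, doc_length, num_topics, vocab,
                    alpha=0.5, eta=0.5, seed=0):
    """Generate toy documents following an LDA-style generative process.

    alpha, eta and the fixed doc_length are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Each topic is a probability distribution over the vocabulary.
    topics = [sample_dirichlet([eta] * len(vocab), rng)
              for _ in range(num_topics)]
    corpus = []
    for _ in range(num_docs):
        # Each document gets its own mixture over topics.
        theta = sample_dirichlet([alpha] * num_topics, rng)
        doc = []
        for _ in range(doc_length):
            z = sample_categorical(theta, rng)      # pick a topic for this word
            w = sample_categorical(topics[z], rng)  # pick a word from that topic
            doc.append(vocab[w])
        corpus.append(doc)
    return topics, corpus


if __name__ == "__main__":
    toy_vocab = ["gene", "dna", "cell", "ball", "team", "score"]
    _, docs = generate_corpus(num_docs=3, doc_length=8,
                              num_topics=2, vocab=toy_vocab)
    for doc in docs:
        print(" ".join(doc))
```

Inference in the papers above runs this story in reverse: given only the words, it estimates the per-document topic mixtures and the per-topic word distributions.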
More recommended papers
Multi-modal Multi-view Topic-opinion Mining for Social Event Analysis. Applies topic models to multimedia analysis while accounting for factors such as opinion, view, and collection. Authors: Shengsheng Qian, Tianzhu Zhang, Changsheng Xu. http://delivery.acm.org/10.1145/2970000/2964294/p2-qian.pdf
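TopicRNN: A Recurrent Neural Network with Long-Range Semantic Dependency. Combines an RNN with a topic model, pairing the global information captured by the topic model with the local features captured by the RNN. Authors: AB Dieng, C Wang, J Gao, J Paisley. https://arxiv.org/pdf/1611.01702.pdf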
Cross-media Topic Detection with Refined CNN based Image-Dominant Topic Model. Combines a CNN with a topic model. Authors: Z Wang, L Li, Q Huang. http://delivery.acm.org/10.1145/2810000/2806309/p1171-wang.pdf
Gaussian LDA for Topic Models with Word Embeddings. An LDA variant that uses word embeddings. Authors: R Das, M Zaheer, C Dyer. http://rajarshd.github.io/papers/acl2015.pdf
Some application scenarios for topic models
Papers for NLP
Topic modeling: beyond bag-of-words. Offers an alternative approach to modeling text corpora. Author: Hanna M. Wallach. http://delivery.acm.org/10.1145/1150000/1143967/p977-wallach.pdf Slides (ppt): https://people.cs.umass.edu/~wallach/talks/beyond_bag-of-words.pdf
Topical n-grams: Phrase and topic discovery, with an application to information retrieval. Introduces topical n-grams, a topic model that discovers both topics and topical phrases. Authors: Andrew McCallum, Xing Wei (University of Massachusetts). http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=4470313
A topic model for word sense disambiguation. Extends LDA with WordNet (LDAWN). Authors: JL Boyd-Graber, DM Blei, X Zhu. http://www.aclweb.org/anthology/D07-1109
Papers for opinion mining
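Topic sentiment mixture: modeling facets and opinions in weblogs. Defines the problem of topic-sentiment analysis on weblogs and proposes a probabilistic model that captures the mixture of topics and sentiments simultaneously. Authors: Qiaozhu Mei, Xu Ling, Matthew Wondra, Hang Su, ChengXiang Zhai. http://delivery.acm.org/10.1145/1250000/1242596/p171-mei.pdf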
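A joint model of text and aspect ratings for sentiment summarization. Proposes a statistical model that discovers the corresponding topics in text and extracts, from reviews, the textual evidence supporting each aspect rating. Authors: Ivan Titov, Ryan McDonald. http://www.aclweb.org/anthology/P08-1036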
Current State of Text Sentiment Analysis from Opinion to Emotion Mining. A fairly recent article that gives a comprehensive overview of the current state of opinion mining. Author: OR Zaiane. http://delivery.acm.org/10.1145/3060000/3057270/a25-yadollahi.pdf
Thread-based probabilistic models for expert finding in enterprise Microblogs. Proposes a probabilistic document-candidate model that can find more experts in enterprise microblogs. Authors: Zhe Xu, Jay Ramanathan (Computer Science and Engineering, The Ohio State University, Columbus, OH 43210, United States). https://ac.els-cdn.com/S0957417415004406/1-s2.0-S0957417415004406-main.pdf
Papers for information extraction
Employing Topic Models for Pattern-based Semantic Class Discovery. Approaches information extraction from the perspective of semantic classes. See the slides for details. Authors: Huibin Zhang (Nankai University), Mingjie Zhu (University of Science and Technology of China), Shuming Shi, Ji-Rong Wen (Microsoft Research Asia). http://www.aclweb.org/anthology/P09-1052 Slides (ppt): https://pdfs.semanticscholar.org/604b/c2fb02b48d6d106215955a6a30629314df14.pdf
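Combining Concept Hierarchies and Statistical Topic Models. Provides a general data-driven framework for automatically discovering high-level knowledge from large collections of text documents. Authors: C Chemudugunta, P Smyth, M Steyvers. http://delivery.acm.org/10.1145/1460000/1458337/p1469-chemudugunta.pdf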
An Unsupervised Framework for Extracting and Normalizing Product Attributes from Multiple Web Sites. Develops an unsupervised framework that simultaneously extracts and normalizes product attributes from multiple web pages originating from different sites. Authors: Tak-Lam Wong, Wai Lam, Tik-Shun Wong (The Chinese University of Hong Kong, Hong Kong). http://delivery.acm.org/10.1145/1400000/1390343/p35-wong.pdf
Tutorials
Courses. David M. Blei's courses at Columbia University. http://www.cs.columbia.edu/~blei/courses.html
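Probabilistic Topic Models: Origins and Challenges. An authoritative overview that introduces many of the basic topic models and shows how they build on one another step by step. Author: David M. Blei. http://www.cs.columbia.edu/~blei/talks/Blei_Topic_Modeling_Workshop_2013.pdf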
Probabilistic Topic Models. Author: David M. Blei. http://www.cs.columbia.edu/~blei/talks/Blei_MLSS_2012.pdf
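Ivan Titov (Иван Титов). An expert in graphical models with many high-quality papers. His site has plenty of good resources for following the development of topic models. http://www.ivan-titov.org/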
Eric Xing
"My principal research interests lie in the development of machine learning and statistical methodology, and large-scale computational system and architecture, for solving problems involving automated learning, reasoning, and decision-making in high-dimensional, multimodal, and dynamic possible worlds in artificial, biological, and social systems." http://www.cs.cmu.edu/~epxing/
Jun Zhu (朱军)
"My research focuses on developing statistical machine learning methods to understand complex scientific and engineering data. My current interests are in latent variable models, large-margin learning, Bayesian nonparametrics, and deep learning. Before joining Tsinghua in 2011, I was a post-doc researcher and project scientist at the Machine Learning Department in Carnegie Mellon University." http://ml.cs.tsinghua.edu.cn/~jun/index.shtml