How to print the LDA topic model from gensim? Python

Stack Overflow user
Asked 2013-02-22 10:47:42
10 answers · 40.8K views · 27 votes

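Using gensim, I was able to extract topics from a set of documents with LSA, but how do I access the topics generated by an LDA model?

When I try to print lda.print_topics(10), the code fails with the following error, because print_topics() returns None: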

```text
Traceback (most recent call last):
  File "/home/alvas/workspace/XLINGTOP/xlingtop.py", line 93, in <module>
    for top in lda.print_topics(2):
TypeError: 'NoneType' object is not iterable
```

Code:

```python
# Python 2 code (print statements, itertools.izip), as written in 2013
from gensim import corpora, models, similarities
from gensim.models import hdpmodel, ldamodel
from itertools import izip

documents = ["Human machine interface for lab abc computer applications",
              "A survey of user opinion of computer system response time",
              "The EPS user interface management system",
              "System and human system engineering testing of EPS",
              "Relation of user perceived response time to error measurement",
              "The generation of random binary unordered trees",
              "The intersection graph of paths in trees",
              "Graph minors IV Widths of trees and well quasi ordering",
              "Graph minors A survey"]

# remove common words and tokenize
stoplist = set('for a of the and to in'.split())
texts = [[word for word in document.lower().split() if word not in stoplist]
         for document in documents]

# remove words that appear only once
all_tokens = sum(texts, [])
tokens_once = set(word for word in set(all_tokens) if all_tokens.count(word) == 1)
texts = [[word for word in text if word not in tokens_once]
         for text in texts]

dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(text) for text in texts]

# corpus_tfidf is used below but was never defined; build the tf-idf corpus first
tfidf = models.TfidfModel(corpus)
corpus_tfidf = tfidf[corpus]

# I can print out the topics for LSA
lsi = models.LsiModel(corpus_tfidf, id2word=dictionary, num_topics=2)
corpus_lsi = lsi[corpus_tfidf]  # transform the same tf-idf vectors the model was trained on

for l, t in izip(corpus_lsi, corpus):
    print l, "#", t
print
for top in lsi.print_topics(2):
    print top

# I can print out the documents and the most probable topics for each doc.
lda = ldamodel.LdaModel(corpus, id2word=dictionary, num_topics=50)
corpus_lda = lda[corpus]

for l, t in izip(corpus_lda, corpus):
    print l, "#", t
print

# But I am unable to print out the topics; how should I do it?
# (This raises the TypeError above: in this gensim version print_topics() returns None.)
for top in lda.print_topics(10):
    print top
```
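The comment in the code notes that lda[corpus] already yields, for each document, its most probable topics. Assuming each document's result is a list of (topic_id, probability) pairs, picking the single dominant topic is a one-line max(). This is a minimal Python 3 sketch; the doc_topics values below are made-up sample numbers, not real model output.

```python
# Hypothetical sample data shaped like gensim's lda[corpus] output:
# one list of (topic_id, probability) pairs per document.
doc_topics = [
    [(0, 0.12), (3, 0.71), (7, 0.17)],
    [(1, 0.55), (3, 0.45)],
    [(2, 0.90)],
]


def dominant_topic(topic_dist):
    """Return the (topic_id, probability) pair with the highest probability."""
    return max(topic_dist, key=lambda pair: pair[1])


for doc_id, dist in enumerate(doc_topics):
    topic_id, prob = dominant_topic(dist)
    print(f"doc {doc_id}: topic {topic_id} (p={prob:.2f})")
# doc 0: topic 3 (p=0.71)
# doc 1: topic 1 (p=0.55)
# doc 2: topic 2 (p=0.90)
```

Ties are resolved in favour of the first pair listed, since max() keeps the first maximum it encounters.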
10 Answers

Stack Overflow user

Answered 2013-02-22 11:00:40

After some fiddling, it seems that print_topics(numoftopics) in ldamodel has a bug, so my workaround is to use print_topic(topicid):

```python
>>> print lda.print_topics()
None
>>> for i in range(lda.num_topics):  # every topic id, 0 .. num_topics-1
...     print lda.print_topic(i)
0.083*response + 0.083*interface + 0.083*time + 0.083*human + 0.083*user + 0.083*survey + 0.083*computer + 0.083*eps + 0.083*trees + 0.083*system
...
```
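print_topic() returns each topic as a single "weight*word + weight*word" string, as in the session above. If you need the words and weights separately, a small parser is enough. This is a minimal Python 3 sketch, assuming words contain no "+" or "*" characters; it also strips double quotes around words, which newer gensim versions print (for example 0.083*"response").

```python
def parse_topic(topic_str):
    """Split a 'w1*word1 + w2*word2 + ...' topic string into (word, weight) tuples."""
    pairs = []
    for term in topic_str.split("+"):
        term = term.strip()
        if not term:
            continue  # tolerate empty input or stray separators
        weight, _, word = term.partition("*")
        # Strip optional double quotes around the word (newer gensim output).
        pairs.append((word.strip().strip('"'), float(weight)))
    return pairs


old_style = "0.083*response + 0.083*interface + 0.083*time"
new_style = '0.083*"response" + 0.083*"interface" + 0.083*"time"'
print(parse_topic(old_style))
# [('response', 0.083), ('interface', 0.083), ('time', 0.083)]
print(parse_topic(new_style) == parse_topic(old_style))
# True
```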
21 votes

Stack Overflow user

Answered 2015-04-29 17:04:57

I think the signature of show_topics has changed over time:

```python
show_topics(num_topics=10, num_words=10, log=False, formatted=True)
```

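For num_topics number of topics, return the num_words most significant words (10 words per topic, by default).

The topics are returned as a list: a list of strings if formatted is True, or a list of (probability, word) 2-tuples if False. (In later gensim releases the unformatted result is a list of (topic_id, [(word, probability), ...]) pairs.)

If log is True, also output this result to the log.

Unlike LSA, there is no natural ordering between the topics in LDA. The returned subset of num_topics <= self.num_topics topics is therefore arbitrary and may change between two LDA training runs.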
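To see how the two return shapes relate, here is a small sketch that renders a list of (word, probability) pairs as a string resembling the formatted=True output. It is not gensim's own formatting code: the three-decimal precision, the unquoted words and the sample data are assumptions, and it takes pairs in (word, probability) order, as recent gensim releases return them.

```python
def format_topic(word_probs, num_words=10):
    """Render (word, probability) pairs as a 'p1*word1 + p2*word2 + ...' string."""
    # Keep the num_words most probable words, highest probability first.
    top = sorted(word_probs, key=lambda wp: wp[1], reverse=True)[:num_words]
    return " + ".join(f"{prob:.3f}*{word}" for word, prob in top)


# Hypothetical topic, not real model output.
topic = [("trees", 0.21), ("graph", 0.17), ("minors", 0.12), ("survey", 0.05)]
print(format_topic(topic, num_words=3))
# 0.210*trees + 0.170*graph + 0.120*minors
```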

11 votes

Stack Overflow user

Answered 2018-10-23 06:29:43

I think it is always more helpful to look at topics as lists of words. The following snippet does that; I assume you already have an LDA model named lda_model.

```python
# formatted=False yields (topic_id, [(word, probability), ...]) pairs
for idx, topic in lda_model.show_topics(formatted=False, num_words=30):
    print('Topic: {} \nWords: {}'.format(idx, [w[0] for w in topic]))
```

In the code above, I chose to show the top 30 words of each topic. For brevity, only the first two topics I got are shown:

```text
Topic: 0 
Words: ['associate', 'incident', 'time', 'task', 'pain', 'amcare', 'work', 'ppe', 'train', 'proper', 'report', 'standard', 'pmv', 'level', 'perform', 'wear', 'date', 'factor', 'overtime', 'location', 'area', 'yes', 'new', 'treatment', 'start', 'stretch', 'assign', 'condition', 'participate', 'environmental']
Topic: 1 
Words: ['work', 'associate', 'cage', 'aid', 'shift', 'leave', 'area', 'eye', 'incident', 'aider', 'hit', 'pit', 'manager', 'return', 'start', 'continue', 'pick', 'call', 'come', 'right', 'take', 'report', 'lead', 'break', 'paramedic', 'receive', 'get', 'inform', 'room', 'head']
```
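If you want to keep the word lists around rather than just print them, the same unpacking fits in a dict comprehension. This is a minimal sketch assuming the recent gensim shape for show_topics(formatted=False), a list of (topic_id, [(word, probability), ...]) pairs; the words below are taken from the output above, but the probabilities are made up.

```python
# Hypothetical data shaped like show_topics(formatted=False) in recent gensim.
topics = [
    (0, [("associate", 0.031), ("incident", 0.027), ("time", 0.020)]),
    (1, [("work", 0.040), ("associate", 0.025), ("cage", 0.018)]),
]


def topic_words(topics, top_n=None):
    """Map each topic id to its words (probabilities dropped), keeping the given order."""
    # pairs[:None] keeps the whole list when top_n is not given.
    return {idx: [word for word, _ in pairs[:top_n]] for idx, pairs in topics}


print(topic_words(topics, top_n=2))
# {0: ['associate', 'incident'], 1: ['work', 'associate']}
```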

I don't really like how the topics look above, so I usually modify the code as follows:

```python
for idx, topic in lda_model.show_topics(formatted=False, num_words=30):
    print('Topic: {} \nWords: {}'.format(idx, '|'.join([w[0] for w in topic])))
```

...and the output (first two topics shown) looks like this:

```text
Topic: 0 
Words: associate|incident|time|task|pain|amcare|work|ppe|train|proper|report|standard|pmv|level|perform|wear|date|factor|overtime|location|area|yes|new|treatment|start|stretch|assign|condition|participate|environmental
Topic: 1 
Words: work|associate|cage|aid|shift|leave|area|eye|incident|aider|hit|pit|manager|return|start|continue|pick|call|come|right|take|report|lead|break|paramedic|receive|get|inform|room|head
```
7 votes
Page content originally from Stack Overflow; translation provided by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/15016025
