Q: How to print the LDA topics models from gensim? Python

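Stack Overflow user

Asked 2013-02-22 10:47:42

Answers: 10 | Views: 40.8K | Followers: 0 | Votes: 27

Using gensim I was able to extract topics from a set of documents with LSA, but how do I access the topics generated by the LDA model?

When I print lda.print_topics(10), the code raises the following error, because print_topics() returns NoneType: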

Traceback (most recent call last):
  File "/home/alvas/workspace/XLINGTOP/xlingtop.py", line 93, in <module>
    for top in lda.print_topics(2):
TypeError: 'NoneType' object is not iterable

Code:

from gensim import corpora, models, similarities
from gensim.models import hdpmodel, ldamodel
from itertools import izip

documents = ["Human machine interface for lab abc computer applications",
              "A survey of user opinion of computer system response time",
              "The EPS user interface management system",
              "System and human system engineering testing of EPS",
              "Relation of user perceived response time to error measurement",
              "The generation of random binary unordered trees",
              "The intersection graph of paths in trees",
              "Graph minors IV Widths of trees and well quasi ordering",
              "Graph minors A survey"]

# remove common words and tokenize
stoplist = set('for a of the and to in'.split())
texts = [[word for word in document.lower().split() if word not in stoplist]
         for document in documents]

# remove words that appear only once
all_tokens = sum(texts, [])
tokens_once = set(word for word in set(all_tokens) if all_tokens.count(word) == 1)
texts = [[word for word in text if word not in tokens_once]
         for text in texts]

dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(text) for text in texts]

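# I can print out the topics for LSA
tfidf = models.TfidfModel(corpus)  # corpus_tfidf was never defined in the snippet as posted
corpus_tfidf = tfidf[corpus]
lsi = models.LsiModel(corpus_tfidf, id2word=dictionary, num_topics=2)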
corpus_lsi = lsi[corpus]

for l,t in izip(corpus_lsi,corpus):
  print l,"#",t
print
for top in lsi.print_topics(2):
  print top

# I can print out the documents and which is the most probable topics for each doc.
lda = ldamodel.LdaModel(corpus, id2word=dictionary, num_topics=50)
corpus_lda = lda[corpus]

for l,t in izip(corpus_lda,corpus):
  print l,"#",t
print

# But I am unable to print out the topics, how should i do it?
for top in lda.print_topics(10):
  print top

python

nlp

lda

topic-modeling

gensim

10 Answers

Stack Overflow user

Answered 2013-02-22 11:00:40

After some messing around, it seems that print_topics(numoftopics) on the ldamodel has a bug. My workaround is to call print_topic(topicid) for each topic instead:

>>> print lda.print_topics()
None
>>> for i in range(lda.num_topics):  # range(0, lda.num_topics-1) would skip the last topic
...     print lda.print_topic(i)
0.083*response + 0.083*interface + 0.083*time + 0.083*human + 0.083*user + 0.083*survey + 0.083*computer + 0.083*eps + 0.083*trees + 0.083*system
...

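Votes: 21

Stack Overflow user

Answered 2015-04-29 17:04:57

I think the syntax of show_topics has changed over time:

show_topics(num_topics=10, num_words=10, log=False, formatted=True)

For num_topics topics, it returns the num_words most significant words (10 words per topic by default).

The topics are returned as a list: a list of strings if formatted is True, or a list of (probability, word) 2-tuples if it is False.

If log is True, the result is also written to the log.

Unlike LSA, there is no natural ordering between the topics in LDA. The returned subset of num_topics <= self.num_topics topics is therefore arbitrary and may change between two LDA training runs.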
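
To make the difference between the two return shapes concrete, here is a minimal sketch that turns one formatted topic string into (probability, word) tuples. The helper name parse_topic is hypothetical and not part of gensim, and the sketch assumes terms are joined by " + " as in the output shown in the earlier answer. Some gensim versions wrap each word in double quotes, so the quotes are stripped if present.

```python
def parse_topic(formatted):
    """Split a formatted topic string into (probability, word) tuples."""
    pairs = []
    for term in formatted.split(" + "):
        # Each term looks like 0.083*response or 0.083*"response".
        weight, word = term.split("*", 1)
        pairs.append((float(weight), word.strip().strip('"')))
    return pairs


# Sample string shortened from the print_topic output quoted earlier.
topic = "0.083*response + 0.083*interface + 0.083*time"
print(parse_topic(topic))
# prints [(0.083, 'response'), (0.083, 'interface'), (0.083, 'time')]
```

When the model itself is at hand, asking for formatted=False avoids the string parsing altogether.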

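Votes: 11

Stack Overflow user

Answered 2018-10-23 06:29:43

I find it is always more helpful to see topics as a list of words. The following snippet does that. I assume you already have an LDA model called lda_model.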

for idx, topic in lda_model.show_topics(formatted=False, num_words= 30):
    print('Topic: {} \nWords: {}'.format(idx, [w[0] for w in topic]))

In the code above, I chose to display the top 30 words belonging to each topic. For simplicity, only the first two topics I got are shown.

Topic: 0 
Words: ['associate', 'incident', 'time', 'task', 'pain', 'amcare', 'work', 'ppe', 'train', 'proper', 'report', 'standard', 'pmv', 'level', 'perform', 'wear', 'date', 'factor', 'overtime', 'location', 'area', 'yes', 'new', 'treatment', 'start', 'stretch', 'assign', 'condition', 'participate', 'environmental']
Topic: 1 
Words: ['work', 'associate', 'cage', 'aid', 'shift', 'leave', 'area', 'eye', 'incident', 'aider', 'hit', 'pit', 'manager', 'return', 'start', 'continue', 'pick', 'call', 'come', 'right', 'take', 'report', 'lead', 'break', 'paramedic', 'receive', 'get', 'inform', 'room', 'head']

I don't really like how the topics above look, so I usually modify the code as follows:

for idx, topic in lda_model.show_topics(formatted=False, num_words= 30):
    print('Topic: {} \nWords: {}'.format(idx, '|'.join([w[0] for w in topic])))

... and the output (first two topics shown) looks like this:

Topic: 0 
Words: associate|incident|time|task|pain|amcare|work|ppe|train|proper|report|standard|pmv|level|perform|wear|date|factor|overtime|location|area|yes|new|treatment|start|stretch|assign|condition|participate|environmental
Topic: 1 
Words: work|associate|cage|aid|shift|leave|area|eye|incident|aider|hit|pit|manager|return|start|continue|pick|call|come|right|take|report|lead|break|paramedic|receive|get|inform|room|head
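
If the joined words are needed later rather than just printed, the same loop can be wrapped in a small helper. This is only a sketch: topic_words is a hypothetical name, and it assumes show_topics(formatted=False) yields (topic_id, [(word, weight), ...]) pairs, which is how the loops above unpack them. The sample data stands in for a trained model; its words come from the output above and its weights are invented.

```python
def topic_words(topics, sep="|"):
    """Map each topic id to its words joined by sep."""
    return {topic_id: sep.join(word for word, _weight in terms)
            for topic_id, terms in topics}


# Hypothetical stand-in for lda_model.show_topics(formatted=False, num_words=3);
# the weights are made up for illustration.
sample = [
    (0, [("associate", 0.03), ("incident", 0.02), ("time", 0.02)]),
    (1, [("work", 0.04), ("associate", 0.03), ("cage", 0.02)]),
]

for topic_id, words in topic_words(sample).items():
    print("Topic: {}\nWords: {}".format(topic_id, words))
```

With a real model, the call would be topic_words(lda_model.show_topics(formatted=False, num_words=30)).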

Votes: 7

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/15016025
