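I want to get bigrams of symbols (the letters of words). For example, for the words "done" and "dog", I want to be able to find the bigram "do".
I tried to do this with gensim.Phrases, but it doesn't work for me.
Here is my code: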
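For reference, here is a minimal sketch of the intended result in plain Python, without gensim. It assumes that "finding a bigram" means "a pair of adjacent characters that occurs in at least two of the words". The helper names `char_bigrams` and `shared_char_bigrams` and the `min_words` parameter are hypothetical, introduced only for illustration:

```python
from collections import Counter


def char_bigrams(word):
    """All adjacent character pairs in a word, left to right."""
    return [word[i:i + 2] for i in range(len(word) - 1)]


def shared_char_bigrams(words, min_words=2):
    """Character bigrams found in at least `min_words` distinct words,
    returned in order of first appearance (hypothetical helper)."""
    doc_freq = Counter()  # number of distinct words containing each bigram
    first_seen = {}       # dicts keep insertion order (Python 3.7+)
    for word in words:
        grams = char_bigrams(word)
        doc_freq.update(set(grams))  # count each bigram at most once per word
        for g in grams:
            first_seen.setdefault(g, None)
    return [g for g in first_seen if doc_freq[g] >= min_words]


print(shared_char_bigrams(["done", "dog"]))  # ['do']
print(shared_char_bigrams(["God", "Good", "happy", "hangry", "pypi"]))  # ['Go', 'od', 'ha', 'py']
```

Counting each bigram once per word means that a pair repeated inside a single word (such as "pp" in "happy") does not count as shared. Matching is case-sensitive, so "Go" and "go" are treated as different bigrams.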
from gensim.models import Phrases

documents = ["God", "Good", "happy", "hangry", "pypi"]
# Treat each word as a "sentence" whose tokens are its characters
documents_proc = [list(doc) for doc in documents]

bigram = Phrases(documents_proc, min_count=1)
trigram = Phrases(bigram[documents_proc], min_count=1)

for sent in documents_proc:
    print(sent, bigram[sent])
    # Phrases joins merged tokens with "_" by default: one "_" = bigram, two = trigram
    bigrams_ = [b for b in bigram[sent] if b.count('_') == 1]
    trigrams_ = [t for t in trigram[bigram[sent]] if t.count('_') == 2]
    print(bigrams_)
    print(trigrams_)
    print()
I expected the output ['Go', 'od', 'ha', 'py'], but the output is empty. What am I doing wrong?
Thanks.
https://stackoverflow.com/questions/55892073