Q: Looking for a "word-web" library, preferably in Python

Stack Overflow user

Asked 2021-11-18 20:30:00

Answers 1 · Views 26 · Followers 0 · Votes 0

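I'm looking for a utility (library) that takes a collection of keywords (perhaps 20, e.g. the result of LDA run on a text corpus) and returns a short (2-5 word) description of what best ties the original collection of words together. Such a utility might work by looking up synonyms of each keyword (e.g. using WordNet), adding the synonyms of those synonyms, and then finding the short phrase that represents the greatest overlap (perhaps in a K-means sense). Does anyone know of such a utility?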

nlp

lda

1 Answer

Stack Overflow user

Posted 2021-11-19 13:47:44

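If we are working with WordNet and single words, what you are looking for is probably the lowest common hypernym (LCH): the most specific concept of which all your words are special cases.

Based on the answer to Find lowest common hypernym given multiple words in WordsNet (Python), we can write a function that finds the LCH as follows: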

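import nltk
nltk.download('wordnet')
from nltk.corpus import wordnet as wn

def find_common(words):
    """Return the deepest synset that is a hypernym of every word, or None."""
    all_hypernyms = {}
    for word in words:
        synsets = wn.synsets(word)
        if not synsets:
            print(f'word "{word}" has no synsets, skipping it')
            continue
        # Collect every synset of the word together with all of its ancestors.
        # Note: _iter_hypernym_lists() is a private NLTK method; it yields the
        # synset itself first, then each successive level of hypernyms.
        all_hypernyms[word] = set(
            self_synset
            for synset in synsets
            for self_synsets in synset._iter_hypernym_lists()
            for self_synset in self_synsets
        )
    if not all_hypernyms:
        print("No valid words to calculate hypernyms")
        return None
    # Keep only the synsets that are ancestors of every remaining word.
    common_hypernyms = set.intersection(*all_hypernyms.values())
    if not common_hypernyms:
        print("The words have no common hypernyms")
        return None
    # The deepest common hypernym is the most specific shared concept.
    return max(common_hypernyms, key=lambda x: x.max_depth())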

You can then use this function to find the lowest common hypernym of a set of words, if one exists:

result = find_common(['cat', 'dog', 'mouse', 'wtf'])
# word "wtf" has no synsets, skipping it
print(result.lemma_names()[0])
# placental
print(result.definition())
# mammals having a placenta; all mammals except monotremes and marsupials

result = find_common(['house', 'cathedral', 'castle'])
print(result.lemma_names()[0])
# building
print(result.definition())
# a structure that has a roof and walls and stands more or less permanently in one place

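Of course, this breaks down if we add a word that is not closely related to the others in the set. However, if you run agglomerative clustering on the words (using the shortest WordNet path between two words as the distance between them), you can handle such outliers and find subsets of words that fit together.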
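
The clustering idea could be sketched roughly as follows. This is a minimal illustration, not part of the original answer: the names wordnet_distance and cluster_words, the choice of average linkage, and the max_distance cutoff are all assumptions made for the example. The distance helper relies on NLTK's Synset.shortest_path_distance and takes the minimum over all synset pairs of the two words.

```python
from itertools import combinations

def wordnet_distance(a, b):
    """Shortest WordNet path between any synsets of two words (None if unconnected)."""
    # Imported lazily; requires nltk and nltk.download('wordnet').
    from nltk.corpus import wordnet as wn
    best = None
    for s1 in wn.synsets(a):
        for s2 in wn.synsets(b):
            d = s1.shortest_path_distance(s2)
            if d is not None and (best is None or d < best):
                best = d
    return best

def cluster_words(words, distance, max_distance):
    """Average-linkage agglomerative clustering (hypothetical helper).

    Repeatedly merges the two closest clusters and stops once the closest
    pair is further apart than max_distance.
    """
    clusters = [[w] for w in words]
    cache = {}

    def dist(a, b):
        # Cache pairwise distances; treat "no path" (None) as infinitely far.
        key = (a, b) if a <= b else (b, a)
        if key not in cache:
            d = distance(a, b)
            cache[key] = float('inf') if d is None else d
        return cache[key]

    def linkage(c1, c2):
        # Average distance over all cross-cluster word pairs.
        return sum(dist(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        if linkage(clusters[i], clusters[j]) > max_distance:
            break
        # i < j, so deleting index j leaves index i valid.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters
```

With the WordNet data installed, something like cluster_words(['cat', 'dog', 'mouse', 'house', 'castle'], wordnet_distance, max_distance=4) would return groups of words, and find_common could then be called on each group separately. The cutoff of 4 is an arbitrary starting point and would need tuning for a real keyword set.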

Votes 0

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/70026324