Is there a function in NLTK or SpaCy that gives all the possible terms that can be derived from a given lemma word? For example: if the lemma is "breathe", I need all the terms derived from "breathe", such as "breathing", "breathes", etc. If the root word is "eat", I need terms like "eats", "eating", "ate", and so on.
The .lemma_ attribute in SpaCy and the WordNetLemmatizer() function in NLTK can be used to determine the lemma of a word, but how do I do the reverse task, i.e. determine all the terms derived from a given lemma?
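For reference, the forward task (word → lemma) looks roughly like this in both libraries; this is a minimal sketch that assumes the WordNet corpus and the en_core_web_sm model are already installed:

import spacy
from nltk.stem import WordNetLemmatizer

# NLTK: needs the WordNet corpus (nltk.download("wordnet"))
lemmatizer = WordNetLemmatizer()
print(lemmatizer.lemmatize("breathing", pos="v"))  # breathe

# spaCy: every Token carries a .lemma_ attribute after processing
nlp = spacy.load("en_core_web_sm")
print([token.lemma_ for token in nlp("She eats and was breathing")])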
Posted on 2021-05-01 17:03:51
You can use pyinflect, which works as a spaCy extension. You need to install it first with pip install pyinflect. Example usage:
import spacy
import pyinflect  # registers the ._.inflect extension on Token

nlp = spacy.load("en_core_web_sm")

verbs = "eating goes touch felt hit sleeping"
doc = nlp(verbs)

for token in doc:
    base = token._.inflect("VB")               # base form
    gerund = token._.inflect("VBG")            # gerund / present participle
    past_tense = token._.inflect("VBD")        # simple past
    past_participle = token._.inflect("VBN")   # past participle
    print(token.text, "-", base, "-", gerund, "-", past_tense, "-", past_participle)
# Output:
# eating - eat - eating - ate - eaten
# goes - go - going - went - gone
# touch - touch - touching - touched - touched
# felt - feel - feeling - felt - felt
# hit - hit - hitting - hit - hit
# sleeping - sleep - sleeping - slept - slept
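If you want all inflections of a lemma at once, pyinflect also ships module-level helpers that work directly on strings, without running a spaCy pipeline; the dictionary in the comment below is illustrative of the shape of the result:

import pyinflect

# All known inflections of a lemma, keyed by Penn Treebank tag
print(pyinflect.getAllInflections("eat"))
# e.g. {'VB': ('eat',), 'VBD': ('ate',), 'VBG': ('eating',),
#       'VBN': ('eaten',), 'VBP': ('eat',), 'VBZ': ('eats',)}

# A single inflection for a specific tag
print(pyinflect.getInflection("eat", tag="VBD"))  # ('ate',)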
Edit: To get the full list of tags together with a short description of each, run the following code:
nlp = spacy.load('en_core_web_sm')
for label in nlp.get_pipe("tagger").labels:
    print(label, " -- ", spacy.explain(label))
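The four verb tags used in the pyinflect example above come back with these descriptions:
# VB  --  verb, base form
# VBG  --  verb, gerund or present participle
# VBD  --  verb, past tense
# VBN  --  verb, past participle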
https://stackoverflow.com/questions/67342461