
Q: How to distinguish ordinary people's names after filtering proper nouns in natural language processing

Stack Overflow user

Asked 2020-10-12 14:44:12

Answers: 1 · Views: 202 · Followers: 0 · Votes: 1

Background

I would like to know how to pick out the names of ordinary people after filtering proper nouns in NLP.

Preferred output
['Hanna', 'Mike', 'Cathy', 'Tom']

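Problem

I can extract proper nouns with an NLP library such as spaCy, but the result also contains place names and food names.

Output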

['Hawaii', 'Hanna', 'Mike', 'Barbacoa', 'Mexico', 'Cathy', 'Tom']

Code

import spacy
nlp = spacy.load("en_core_web_sm")

ppnouns = []

texts = [
"Mike, Tom, Cathy agreed; it was a magnificent evening.",

"Mike hopes that, when he's built up my savings, he'll be able to travel to Mexico to eat Barbacoa.",

"Of all the places to travel, Hawaii is at the top of Tom's list.",

"Would you like to travel with Hanna?"
]

# Extract proper nouns (tokens whose part-of-speech tag is PROPN)
for text in texts:
    for word in nlp(text):
        if word.pos_ == 'PROPN':
            ppnouns.append(word.text)

# De-duplicate; note that set() does not preserve order
print(list(set(ppnouns)))

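The original sentences come from the following page: https://examples.yourdictionary.com/reference/examples/examples-of-complete-sentences.html

I edited the example sentences above for my code.

What I tried

I tried to look up categories in WordNet, a large lexical database of English, but person names either return no results or return results in unrelated categories.

My current input and output are small, but I plan to process much larger inputs, so I have not built a dictionary by hand like the one below.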

dic = {'given_names': ['Jack', 'Mike', 'Mary', 'Cathy', 'Tom', 'Jessica', 'Megan', 'Hanna'],
       'family_names': ['Smith', 'Miller', 'Lopez', 'Williams', 'Johnson']}

How can I solve this problem? Is there a solution or tool that does what I am trying to do?

From WordNet Search - 3.1

#input
Hanna
#output
Your search did not return any results.

#input
Tom
#output
S: (n) tom, tomcat (male cat)
S: (n) turkey cock, gobbler, tom, tom turkey (male turkey)

Development environment

Python 3.8

Tags: python-3.x, nlp, nltk, spacy, wordnet

Stack Overflow user

Accepted answer

Posted 2020-10-12 15:46:32

What you want is to extract the named entities labeled "PERSON". With the current spaCy, you can do the following:

import spacy
nlp = spacy.load("en_core_web_sm")

texts = [
"Mike, Tom, Cathy agreed; it was a magnificent evening.",
"Mike hopes that, when he's built up my savings, he'll be able to travel to Mexico to eat Barbacoa.",
"Of all the places to travel, Hawaii is at the top of Tom's list.",
"Would you like to travel with Hanna?"
]

docs = nlp.pipe(texts)

names = []
for doc in docs:
    names.extend([ent for ent in doc.ents if ent.label_=="PERSON"])
print(names)

Output

[Mike, Tom, Cathy, Mike, Tom]
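
The list above holds spaCy Span objects and repeats Mike and Tom, whereas the question asks for a list of unique strings. A minimal sketch of that clean-up step follows. It assumes the entities have already been reduced to (text, label) pairs; the helper name unique_person_names is hypothetical and not part of spaCy, and the sample pairs are written by hand rather than produced by a model.

```python
def unique_person_names(entities):
    """Return PERSON entity texts in first-seen order, without duplicates."""
    # dict.fromkeys keeps insertion order (Python 3.7+) and drops repeats.
    return list(dict.fromkeys(text for text, label in entities if label == "PERSON"))


# Hand-written pairs standing in for model output (an assumption for illustration).
# With spaCy they could be built as [(ent.text, ent.label_) for ent in doc.ents].
entities = [
    ("Mike", "PERSON"), ("Tom", "PERSON"), ("Cathy", "PERSON"),
    ("Mike", "PERSON"), ("Mexico", "GPE"), ("Tom", "PERSON"),
]

print(unique_person_names(entities))  # ['Mike', 'Tom', 'Cathy']
```

Unlike list(set(...)), this keeps the names in the order they first appear.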

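Note that Hanna is missing from the list, which means spaCy's statistical language model does not recognize it as a name. If you want a deterministic model, your best option is to define a dictionary of the things you want to extract.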
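
The dictionary approach mentioned above can be sketched as a plain lookup over the proper nouns extracted in the question. This is only an illustration: the name set is a small hand-made list reusing the given names from the question (a real one would need a much larger name dataset), and filter_by_name_list is a made-up helper, not a spaCy function.

```python
# Hypothetical name list, for illustration only.
KNOWN_GIVEN_NAMES = {"Jack", "Mike", "Mary", "Cathy", "Tom", "Jessica", "Megan", "Hanna"}


def filter_by_name_list(candidates, known_names=KNOWN_GIVEN_NAMES):
    """Keep candidates found in known_names (case-insensitive), in order, without duplicates."""
    known_lower = {name.lower() for name in known_names}
    kept = []
    for word in candidates:
        if word.lower() in known_lower and word not in kept:
            kept.append(word)
    return kept


# Proper nouns as printed by the code in the question.
proper_nouns = ['Hawaii', 'Hanna', 'Mike', 'Barbacoa', 'Mexico', 'Cathy', 'Tom']

print(filter_by_name_list(proper_nouns))  # ['Hanna', 'Mike', 'Cathy', 'Tom']
```

Being deterministic, the lookup finds Hanna, but it misses any name that is not in the list and cannot tell a person called Tom from other uses of the same word.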

Votes: 2


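The original page content is provided by Stack Overflow. Translation support was provided by Tencent Cloud's Xiaowei IT-domain engine.

Original link: https://stackoverflow.com/questions/64312886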
