
Anaphora resolution in stanford-nlp using Python

Stack Overflow user
Asked 2018-04-24 22:54:54
Answers: 3, views: 3.3K, followers: 0, score: 5

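I am trying to do anaphora resolution; below is my code.

First, I navigate to the folder where I downloaded the Stanford module. Then I run this command at the command prompt to start the Stanford NLP server:
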
java -mx4g -cp "*;stanford-corenlp-full-2017-06-09/*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000

After that, I run the following code in Python:

from pycorenlp import StanfordCoreNLP
nlp = StanfordCoreNLP('http://localhost:9000')

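I want to change the sentence "Tom is a smart boy. He know a lot of thing." into "Tom is a smart boy. Tom know a lot of thing.", but I could not find any tutorial or help for doing this in Python.

All I have managed to do so far is annotate the text with the code below.

Coreference resolution:
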
output = nlp.annotate(sentence, properties={'annotators':'dcoref','outputFormat':'json','ner.useSUTime':'false'})

and then read out the coreference chains:

coreferences = output['corefs']

I get the JSON below:

coreferences

{u'1': [{u'animacy': u'ANIMATE',
   u'endIndex': 2,
   u'gender': u'MALE',
   u'headIndex': 1,
   u'id': 1,
   u'isRepresentativeMention': True,
   u'number': u'SINGULAR',
   u'position': [1, 1],
   u'sentNum': 1,
   u'startIndex': 1,
   u'text': u'Tom',
   u'type': u'PROPER'},
  {u'animacy': u'ANIMATE',
   u'endIndex': 6,
   u'gender': u'MALE',
   u'headIndex': 5,
   u'id': 2,
   u'isRepresentativeMention': False,
   u'number': u'SINGULAR',
   u'position': [1, 2],
   u'sentNum': 1,
   u'startIndex': 3,
   u'text': u'a smart boy',
   u'type': u'NOMINAL'},
  {u'animacy': u'ANIMATE',
   u'endIndex': 2,
   u'gender': u'MALE',
   u'headIndex': 1,
   u'id': 3,
   u'isRepresentativeMention': False,
   u'number': u'SINGULAR',
   u'position': [2, 1],
   u'sentNum': 2,
   u'startIndex': 1,
   u'text': u'He',
   u'type': u'PRONOMINAL'}],
 u'4': [{u'animacy': u'INANIMATE',
   u'endIndex': 7,
   u'gender': u'NEUTRAL',
   u'headIndex': 4,
   u'id': 4,
   u'isRepresentativeMention': True,
   u'number': u'SINGULAR',
   u'position': [2, 2],
   u'sentNum': 2,
   u'startIndex': 3,
   u'text': u'a lot of thing',
   u'type': u'NOMINAL'}]}
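
For reference, the structure above can be read as follows: each key of 'corefs' is a chain id, and each chain is a list of mentions. Judging from this output alone ('Tom' has startIndex 1 and endIndex 2), token indices appear to be 1-based with an exclusive end. The snippet below is only a minimal sketch under that assumption; the `corefs` dict is a hand-trimmed copy of the JSON above, and `pronoun_links` is a hypothetical helper name, not part of pycorenlp.

```python
# Hand-trimmed copy of the 'corefs' value shown above (only the keys used here).
corefs = {
    "1": [
        {"id": 1, "text": "Tom", "type": "PROPER", "sentNum": 1,
         "startIndex": 1, "endIndex": 2, "isRepresentativeMention": True},
        {"id": 2, "text": "a smart boy", "type": "NOMINAL", "sentNum": 1,
         "startIndex": 3, "endIndex": 6, "isRepresentativeMention": False},
        {"id": 3, "text": "He", "type": "PRONOMINAL", "sentNum": 2,
         "startIndex": 1, "endIndex": 2, "isRepresentativeMention": False},
    ],
    "4": [
        {"id": 4, "text": "a lot of thing", "type": "NOMINAL", "sentNum": 2,
         "startIndex": 3, "endIndex": 7, "isRepresentativeMention": True},
    ],
}


def pronoun_links(corefs):
    """Return (pronoun, sentNum, startIndex, representative text) tuples."""
    links = []
    for chain in corefs.values():
        # Prefer the mention flagged as representative; fall back to the first.
        representative = next(
            (m for m in chain if m["isRepresentativeMention"]), chain[0])
        for mention in chain:
            if mention["type"] == "PRONOMINAL":
                links.append((mention["text"], mention["sentNum"],
                              mention["startIndex"], representative["text"]))
    return links


links = pronoun_links(corefs)
print(links)  # [('He', 2, 1, 'Tom')]
```

Each tuple says which token (sentence number and start index) holds a pronoun and which text it refers to, which is the information needed to rewrite the sentence.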

Any help with this?


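3 Answers

Stack Overflow user

Accepted answer

Posted 2018-08-10 04:37:00

Here is one possible solution that uses the data structure output by CoreNLP, which provides all the information needed. It is not a complete solution and will probably need extending to handle every case, but it is a good starting point.
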
from pycorenlp import StanfordCoreNLP

nlp = StanfordCoreNLP('http://localhost:9000')


def resolve(corenlp_output):
    """ Transfer the word form of the antecedent to its associated pronominal anaphor(s) """
    for coref in corenlp_output['corefs']:
        mentions = corenlp_output['corefs'][coref]
        antecedent = mentions[0]  # the antecedent is the first mention in the coreference chain
        for j in range(1, len(mentions)):
            mention = mentions[j]
            if mention['type'] == 'PRONOMINAL':
                # get the attributes of the target mention in the corresponding sentence
                target_sentence = mention['sentNum']
                target_token = mention['startIndex'] - 1
                # transfer the antecedent's word form to the appropriate token in the sentence
                corenlp_output['sentences'][target_sentence - 1]['tokens'][target_token]['word'] = antecedent['text']


def print_resolved(corenlp_output):
    """ Print the "resolved" output """
    possessives = ['hers', 'his', 'their', 'theirs']
    for sentence in corenlp_output['sentences']:
        for token in sentence['tokens']:
            output_word = token['word']
            # check lemmas as well as tags for possessive pronouns in case of tagging errors
            if token['lemma'] in possessives or token['pos'] == 'PRP$':
                output_word += "'s"  # add the possessive morpheme
            output_word += token['after']
            print(output_word, end='')


text = "Tom and Jane are good friends. They are cool. He knows a lot of things and so does she. His car is red, but " \
       "hers is blue. It is older than hers. The big cat ate its dinner."

output = nlp.annotate(text, properties={'annotators': 'dcoref', 'outputFormat': 'json', 'ner.useSUTime': 'false'})

resolve(output)

print('Original:', text)
print('Resolved: ', end='')
print_resolved(output)

This produces the following output:

Original: Tom and Jane are good friends. They are cool. He knows a lot of things and so does she. His car is red, but hers is blue. It is older than hers. The big cat ate its dinner.
Resolved: Tom and Jane are good friends. Tom and Jane are cool. Tom knows a lot of things and so does Jane. Tom's car is red, but Jane's is blue. His car is older than Jane's. The big cat ate The big cat's dinner.

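As you can see, this solution does not correct capitalization when a pronoun has a sentence-initial (title-case) antecedent ("The big cat" rather than "the big cat" in the last sentence). The fix depends on the category of the antecedent: common-noun antecedents need lowercasing, while proper-noun antecedents do not. Some other ad hoc handling may be needed (as for the possessives in my test sentence). The code also assumes you do not need to reuse the original output tokens, because it modifies them in place. One way around this is to copy the original data structure, or to add a new attribute and change the print_resolved function accordingly. Correcting any parsing errors is another challenge!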
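
The two caveats above (capitalization and in-place modification) could be handled roughly as sketched below. This is only an illustration, not part of the original answer: `sample_output` is a hand-built, trimmed stand-in for the CoreNLP JSON of one sentence, `resolve_copy` and `render` are hypothetical helper names, and the possessive handling is simplified to the PRP$ tag check only.

```python
import copy

# Hand-built, trimmed stand-in for CoreNLP's JSON output for
# "The big cat ate its dinner." (only the keys used below are included).
sample_output = {
    "sentences": [{
        "tokens": [
            {"index": 1, "word": "The", "pos": "DT", "after": " "},
            {"index": 2, "word": "big", "pos": "JJ", "after": " "},
            {"index": 3, "word": "cat", "pos": "NN", "after": " "},
            {"index": 4, "word": "ate", "pos": "VBD", "after": " "},
            {"index": 5, "word": "its", "pos": "PRP$", "after": " "},
            {"index": 6, "word": "dinner", "pos": "NN", "after": ""},
            {"index": 7, "word": ".", "pos": ".", "after": ""},
        ],
    }],
    "corefs": {
        "1": [
            {"text": "The big cat", "type": "NOMINAL",
             "sentNum": 1, "startIndex": 1},
            {"text": "its", "type": "PRONOMINAL",
             "sentNum": 1, "startIndex": 5},
        ],
    },
}


def resolve_copy(corenlp_output):
    """Return a resolved deep copy; the caller's structure is left untouched."""
    resolved = copy.deepcopy(corenlp_output)
    for mentions in resolved["corefs"].values():
        antecedent = mentions[0]
        for mention in mentions[1:]:
            if mention["type"] != "PRONOMINAL":
                continue
            replacement = antecedent["text"]
            # Lowercase the first letter of a common-noun antecedent when the
            # pronoun is not sentence-initial; proper nouns keep their case.
            if antecedent["type"] != "PROPER" and mention["startIndex"] > 1:
                replacement = replacement[0].lower() + replacement[1:]
            tokens = resolved["sentences"][mention["sentNum"] - 1]["tokens"]
            token = tokens[mention["startIndex"] - 1]
            if token["pos"] == "PRP$":
                replacement += "'s"  # add the possessive morpheme
            token["word"] = replacement
    return resolved


def render(corenlp_output):
    """Join each token's word with the whitespace that followed it."""
    return "".join(t["word"] + t["after"]
                   for s in corenlp_output["sentences"] for t in s["tokens"])


resolved = resolve_copy(sample_output)
print(render(resolved))       # The big cat ate the big cat's dinner.
print(render(sample_output))  # The big cat ate its dinner.
```

Because the work happens on a deep copy, the original tokens stay available for later processing, at the cost of duplicating the whole annotation in memory.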

Score 6

Stack Overflow user

Posted 2018-07-13 12:29:07

I had a similar problem. After trying CoreNLP, I solved it with neuralcoref. With the code below, neuralcoref makes this easy:

import spacy

nlp = spacy.load('en_coref_md')

doc = nlp(u'Phone area code will be valid only when all the below conditions are met. It cannot be left blank. It should be numeric. It cannot be less than 200. Minimum number of digits should be 3. ')

print(doc._.coref_clusters)

print(doc._.coref_resolved)

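The output of the above code is:

[Phone area code: [Phone area code, It, It, It]]

Phone area code will be valid only when all the below conditions are met. Phone area code cannot be left blank. Phone area code should be numeric. Phone area code cannot be less than 200. Minimum number of digits should be 3.

For this you need spaCy together with one of the English models en_coref_md, en_coref_lg, or en_coref_sm. See the following link for a fuller explanation: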

https://github.com/huggingface/neuralcoref

Score 3

Stack Overflow user

Posted 2019-08-13 00:34:12

from stanfordnlp.server import CoreNLPClient
from nltk import tokenize

client = CoreNLPClient(annotators=['tokenize','ssplit', 'pos', 'lemma', 'ner', 'parse', 'coref'], memory='4G', endpoint='http://localhost:9001')

def pronoun_resolution(text):

    ann = client.annotate(text)
    modified_text = tokenize.sent_tokenize(text)

    for coref in ann.corefChain:

        antecedent = []
        for mention in coref.mention:
            phrase = []
            for i in range(mention.beginIndex, mention.endIndex):
                phrase.append(ann.sentence[mention.sentenceIndex].token[i].word)
            if antecedent == []:
                antecedent = ' '.join(word for word in phrase)
            else:
                anaphor = ' '.join(word for word in phrase)
                modified_text[mention.sentenceIndex] = modified_text[mention.sentenceIndex].replace(anaphor, antecedent)

    modified_text = ' '.join(modified_text)

    return modified_text

text = 'Tom is a smart boy. He knows a lot of things.'
pronoun_resolution(text)

Output: 'Tom is a smart boy. Tom knows a lot of things.'
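
One thing to watch for in the code above is that str.replace substitutes every occurrence of the anaphor, including occurrences inside other words. The sketch below contrasts that with replacing by token span. It is an illustration only: `replace_span` and `detokenize` are hypothetical helpers, and the (begin, end) pair is assumed to be zero-based with an exclusive end, as implied by the range(mention.beginIndex, mention.endIndex) loop above.

```python
def replace_span(tokens, begin, end, replacement):
    """Return a new token list with tokens[begin:end] swapped for replacement."""
    return tokens[:begin] + [replacement] + tokens[end:]


def detokenize(tokens):
    """Naive join: a space before every token except common punctuation."""
    text = ""
    for tok in tokens:
        if text and tok not in {".", ",", "!", "?", ";", ":"}:
            text += " "
        text += tok
    return text


sentence = "Then he met the chef."
tokens = ["Then", "he", "met", "the", "chef", "."]

# Substring replacement also hits "Then", "the" and "chef".
naive = sentence.replace("he", "Tom")
# Span replacement touches only the pronoun token at index 1.
by_span = detokenize(replace_span(tokens, 1, 2, "Tom"))

print(naive)    # TTomn Tom met tTom cTomf.
print(by_span)  # Then Tom met the chef.
```

When a sentence contains several mentions, applying the span replacements from right to left keeps the earlier indices valid.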

Score 1

Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/50004797
