Anaphora resolution with stanford-nlp in Python

Content sourced from Stack Overflow, translated and used under the CC BY-SA 3.0 license.

I am trying to do anaphora resolution. My code is below.

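First, I navigate to the folder where I downloaded the Stanford module. Then I run the following command at the command prompt to start the Stanford NLP server: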

java -mx4g -cp "*;stanford-corenlp-full-2017-06-09/*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000

After that, I run the code below in Python:

from pycorenlp import StanfordCoreNLP
nlp = StanfordCoreNLP('http://localhost:9000')

I want to change the sentence "Tom is a smart boy. He know a lot of thing." into "Tom is a smart boy. Tom know a lot of thing.", but I cannot find a tutorial or any other help for doing this in Python.

All I have managed to do is annotate the text with the Python code below.

Coreference resolution:

output = nlp.annotate(sentence, properties={'annotators':'dcoref','outputFormat':'json','ner.useSUTime':'false'})

and then read out the coreference chains:

coreferences = output['corefs']

I get the following JSON:

coreferences

{u'1': [{u'animacy': u'ANIMATE',
   u'endIndex': 2,
   u'gender': u'MALE',
   u'headIndex': 1,
   u'id': 1,
   u'isRepresentativeMention': True,
   u'number': u'SINGULAR',
   u'position': [1, 1],
   u'sentNum': 1,
   u'startIndex': 1,
   u'text': u'Tom',
   u'type': u'PROPER'},
  {u'animacy': u'ANIMATE',
   u'endIndex': 6,
   u'gender': u'MALE',
   u'headIndex': 5,
   u'id': 2,
   u'isRepresentativeMention': False,
   u'number': u'SINGULAR',
   u'position': [1, 2],
   u'sentNum': 1,
   u'startIndex': 3,
   u'text': u'a smart boy',
   u'type': u'NOMINAL'},
  {u'animacy': u'ANIMATE',
   u'endIndex': 2,
   u'gender': u'MALE',
   u'headIndex': 1,
   u'id': 3,
   u'isRepresentativeMention': False,
   u'number': u'SINGULAR',
   u'position': [2, 1],
   u'sentNum': 2,
   u'startIndex': 1,
   u'text': u'He',
   u'type': u'PRONOMINAL'}],
 u'4': [{u'animacy': u'INANIMATE',
   u'endIndex': 7,
   u'gender': u'NEUTRAL',
   u'headIndex': 4,
   u'id': 4,
   u'isRepresentativeMention': True,
   u'number': u'SINGULAR',
   u'position': [2, 2],
   u'sentNum': 2,
   u'startIndex': 3,
   u'text': u'a lot of thing',
   u'type': u'NOMINAL'}]}

Any help with this?
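The `corefs` output above can be hard to read at a glance. The following is a minimal sketch, not part of the original question, that summarizes each chain as its representative mention plus the other mentions that refer to it. The dictionary is a hand-copied subset of the output shown above, and `summarize_chains` is a hypothetical helper name. It assumes, as the values above suggest, that `startIndex` is 1-based and `endIndex` is exclusive.

```python
# Hand-copied subset of the `corefs` output shown above (only the fields used here).
corefs = {
    '1': [
        {'text': 'Tom', 'type': 'PROPER', 'sentNum': 1,
         'startIndex': 1, 'endIndex': 2, 'isRepresentativeMention': True},
        {'text': 'a smart boy', 'type': 'NOMINAL', 'sentNum': 1,
         'startIndex': 3, 'endIndex': 6, 'isRepresentativeMention': False},
        {'text': 'He', 'type': 'PRONOMINAL', 'sentNum': 2,
         'startIndex': 1, 'endIndex': 2, 'isRepresentativeMention': False},
    ],
    '4': [
        {'text': 'a lot of thing', 'type': 'NOMINAL', 'sentNum': 2,
         'startIndex': 3, 'endIndex': 7, 'isRepresentativeMention': True},
    ],
}


def summarize_chains(corefs):
    """Return one (representative text, other mentions) pair per coreference chain.

    Each entry in "other mentions" is (text, sentence number, (first token, last token)),
    with 1-based, inclusive token positions.
    """
    summary = []
    for chain_id in sorted(corefs, key=int):
        mentions = corefs[chain_id]
        representative = next(m for m in mentions if m['isRepresentativeMention'])
        others = [
            # Assumption: startIndex is 1-based and endIndex is exclusive.
            (m['text'], m['sentNum'], (m['startIndex'], m['endIndex'] - 1))
            for m in mentions if m is not representative
        ]
        summary.append((representative['text'], others))
    return summary


for representative, others in summarize_chains(corefs):
    print(representative, '<-', others)
```

Read this way, the chain for "Tom" already contains the pronoun "He" in sentence 2, token 1, which is the position that needs to be overwritten to get the desired sentence.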

Answer:

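Here is a possible solution that works directly on the data structure CoreNLP returns, which already contains all the information needed. It is not a complete solution and will probably need extending to handle every case, but it is a good starting point.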

from pycorenlp import StanfordCoreNLP

nlp = StanfordCoreNLP('http://localhost:9000')


def resolve(corenlp_output):
    """ Transfer the word form of the antecedent to its associated pronominal anaphor(s) """
    for coref in corenlp_output['corefs']:
        mentions = corenlp_output['corefs'][coref]
        antecedent = mentions[0]  # the antecedent is the first mention in the coreference chain
        for j in range(1, len(mentions)):
            mention = mentions[j]
            if mention['type'] == 'PRONOMINAL':
                # get the attributes of the target mention in the corresponding sentence
                target_sentence = mention['sentNum']
                target_token = mention['startIndex'] - 1
                # transfer the antecedent's word form to the appropriate token in the sentence
                corenlp_output['sentences'][target_sentence - 1]['tokens'][target_token]['word'] = antecedent['text']


def print_resolved(corenlp_output):
    """ Print the "resolved" output """
    possessives = ['hers', 'his', 'their', 'theirs']
    for sentence in corenlp_output['sentences']:
        for token in sentence['tokens']:
            output_word = token['word']
            # check lemmas as well as tags for possessive pronouns in case of tagging errors
            if token['lemma'] in possessives or token['pos'] == 'PRP$':
                output_word += "'s"  # add the possessive morpheme
            output_word += token['after']
            print(output_word, end='')


text = "Tom and Jane are good friends. They are cool. He knows a lot of things and so does she. His car is red, but " \
       "hers is blue. It is older than hers. The big cat ate its dinner."

output = nlp.annotate(text, properties= {'annotators':'dcoref','outputFormat':'json','ner.useSUTime':'false'})

resolve(output)

print('Original:', text)
print('Resolved: ', end='')
print_resolved(output)

This gives the following output:

Original: Tom and Jane are good friends. They are cool. He knows a lot of things and so does she. His car is red, but hers is blue. It is older than hers. The big cat ate its dinner.
Resolved: Tom and Jane are good friends. Tom and Jane are cool. Tom knows a lot of things and so does Jane. Tom's car is red, but Jane's is blue. His car is older than Jane's. The big cat ate The big cat's dinner.

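As you can see, this solution does not correct the capitalization when a pronoun has a sentence-initial (title-case) antecedent: the last sentence gets "The big cat" instead of "the big cat". Whether to lowercase depends on the category of the antecedent. Common-noun antecedents need lowercasing, while proper-noun antecedents do not. Some other ad hoc processing may be necessary, as with the possessives in my test sentences. The code also assumes you do not want to reuse the original output tokens, because it modifies them in place. One way around this is to work on a copy of the original data structure. Correcting any resolution errors is yet another challenge!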
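Two of the caveats above, in-place modification and the capitalization of sentence-initial antecedents, can be sketched as follows. This is an illustrative variant rather than the answer's own code. `resolve_copy`, `to_text`, and the hand-built `sample` dictionary are hypothetical, and `sample` only mimics the handful of CoreNLP JSON fields the sketch reads. The lowercasing rule (skip proper nouns, lowercase a sentence-initial antecedent when it is substituted mid-sentence) and the restriction of the possessive check to the `PRP$` tag are simplifying assumptions.

```python
import copy


def resolve_copy(corenlp_output):
    """Return a resolved deep copy; the original CoreNLP output is left untouched."""
    resolved = copy.deepcopy(corenlp_output)
    for mentions in resolved['corefs'].values():
        # Prefer the mention flagged as representative; fall back to the first one.
        antecedent = next((m for m in mentions if m['isRepresentativeMention']), mentions[0])
        # Look up the antecedent's first token in the unmodified original.
        first_token = corenlp_output['sentences'][antecedent['sentNum'] - 1]['tokens'][
            antecedent['startIndex'] - 1]
        for mention in mentions:
            if mention is antecedent or mention['type'] != 'PRONOMINAL':
                continue
            replacement = antecedent['text']
            # Assumption: an antecedent that is capitalized only because it opens a
            # sentence should be lowercased when substituted mid-sentence.
            capitalized_by_position = (
                antecedent['startIndex'] == 1
                and antecedent['type'] != 'PROPER'
                and not first_token['pos'].startswith('NNP')
            )
            if capitalized_by_position and mention['startIndex'] != 1:
                replacement = replacement[0].lower() + replacement[1:]
            token = resolved['sentences'][mention['sentNum'] - 1]['tokens'][
                mention['startIndex'] - 1]
            if token['pos'] == 'PRP$':
                replacement += "'s"  # keep the possessive reading of "its", "his", ...
            token['word'] = replacement
    return resolved


def to_text(corenlp_output):
    """Rebuild the text from token words and the whitespace that follows each token."""
    return ''.join(t['word'] + t['after']
                   for s in corenlp_output['sentences'] for t in s['tokens'])


# Hypothetical, hand-built stand-in for the CoreNLP JSON of "The big cat ate its dinner."
sample = {
    'sentences': [{'tokens': [
        {'word': 'The', 'pos': 'DT', 'after': ' '},
        {'word': 'big', 'pos': 'JJ', 'after': ' '},
        {'word': 'cat', 'pos': 'NN', 'after': ' '},
        {'word': 'ate', 'pos': 'VBD', 'after': ' '},
        {'word': 'its', 'pos': 'PRP$', 'after': ' '},
        {'word': 'dinner', 'pos': 'NN', 'after': ''},
        {'word': '.', 'pos': '.', 'after': ''},
    ]}],
    'corefs': {'1': [
        {'text': 'The big cat', 'type': 'NOMINAL', 'sentNum': 1, 'startIndex': 1,
         'endIndex': 4, 'isRepresentativeMention': True},
        {'text': 'its', 'type': 'PRONOMINAL', 'sentNum': 1, 'startIndex': 5,
         'endIndex': 6, 'isRepresentativeMention': False},
    ]},
}

print(to_text(resolve_copy(sample)))  # The big cat ate the big cat's dinner.
print(to_text(sample))                # The big cat ate its dinner.
```

Because the substitution happens on a deep copy, the original tokens stay available for any later processing, at the cost of holding two copies of the annotation in memory.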

Answer:

I had a similar problem. After trying CoreNLP, I solved it with neuralcoref. You can get the job done easily with neuralcoref using the following code:

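import spacy

nlp = spacy.load('en_coref_md')

doc = nlp(u'Phone area code is valid only when all of the following conditions are met. It cannot be left blank. It should be numeric. It cannot be less than 200. The minimum number of digits should be 3.')

print(doc._.coref_clusters)

print(doc._.coref_resolved)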

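The output of the above code is: [Phone area code: [Phone area code, It, It, It]]

Phone area code is valid only when all of the following conditions are met. Phone area code cannot be left blank. Phone area code should be numeric. Phone area code cannot be less than 200. The minimum number of digits should be 3.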

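For this you need spaCy together with one of the English coreference models: en_coref_sm, en_coref_md, or en_coref_lg. You can refer to the following link for a fuller explanation: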

https://github.com/huggingface/neuralcoref
