Using Hunspell with Python for spell checking fails on Portuguese words with special characters

Stack Overflow user

Asked 2018-07-04 22:10:25

Answers: 2 | Views: 989 | Followers: 0 | Votes: 1

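I am trying to correct spelling mistakes, and for that I am using spaCy together with Hunspell in Python. I wrote the following code to get suggestions for "cardaço", which is a misspelling of the Portuguese word "cadarço".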

import hunspell
from spacy.tokens import Token
import spacy

class spaCyHunSpell(object):
    name = 'spacy_hunspell'

    def __init__(self, dic_path, aff_path):
        self.hobj = hunspell.HunSpell(dic_path, aff_path)
        Token.set_extension('hunspell_spell', default=None)
        Token.set_extension('hunspell_suggest', getter=self.get_suggestion)

    def __call__(self, doc):
        for token in doc:
            token._.hunspell_spell = self.hobj.spell(token.text)
        return doc

    def get_suggestion(self, token):
        return self.hobj.suggest(token.text)

nlp = spacy.load('pt')
# Named hunspell_component so the imported hunspell module is not shadowed
hunspell_component = spaCyHunSpell('/usr/share/hunspell/pt_BR.dic', '/usr/share/hunspell/pt_BR.aff')
nlp.add_pipe(hunspell_component)
doc = nlp(u'cardaço')
print(doc[0]._.hunspell_suggest)

All the libraries are installed correctly, and the code above works fine for the word "feninine". My problem is with "ç".

The error I get is:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe7 in position 5: invalid continuation byte

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/netshoes/PycharmProjects/migracao-sas/modelo_NICHO2/main.py", line 29, in <module>
    print(doc[0]._.hunspell_suggest)
  File "/usr/local/lib/python3.6/dist-packages/spacy/tokens/underscore.py", line 31, in __getattr__
    return getter(self._obj)
  File "/home/netshoes/PycharmProjects/migracao-sas/modelo_NICHO2/main.py", line 23, in get_suggestion
    return self.hobj.suggest(token.text)
SystemError: <built-in method suggest of HunSpell object at 0x7f6b3560fe50> returned a result with an error set

I tried using unidecode, but without success.

My Python version is 3.6.

Tags: spacy, hunspell, python-3.x, nlp

Stack Overflow user

Answered 2019-07-24 23:20:58

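Just a note in case anyone else runs into this. There is a package called spacy_hunspell, which is a wrapper around Hunspell for Python and spaCy. It depends on version 0.5.0 of the hunspell Python package, which has the encoding issue discussed in the thread linked in the original answer.

To fix it, change the requirement in spacy_hunspell's setup.py to hunspell==0.5.5 and reinstall; that resolved the problem for me.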
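For anyone wondering what the encoding issue looks like, the decode failure in the traceback can be reproduced without Hunspell at all. This is only a sketch under one assumption that the question does not confirm: that the pt_BR dictionary is stored as ISO-8859-1 (Latin-1) and the binding tries to decode its output as UTF-8. The variable names are illustrative.

```python
# Assumption (not confirmed in the question): the dictionary bytes are Latin-1.
word = "cadarço"

# In Latin-1, "ç" is the single byte 0xe7, at index 5 of this word.
latin1_bytes = word.encode("latin-1")

error = None
try:
    # UTF-8 treats 0xe7 as the first byte of a three-byte sequence,
    # so the plain ASCII "o" that follows is not a valid continuation byte.
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    error = exc
    print(exc)

# Decoding with the encoding the bytes were written in round-trips cleanly.
recovered = latin1_bytes.decode("latin-1")
print(recovered)
```

If that assumption holds, it also suggests why unidecode did not help: the input word is fine, and it is the bytes coming back from the dictionary that fail to decode.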
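Whichever version ends up installed, it may be worth confirming which encoding the dictionary actually declares. Hunspell affix files normally state it on a `SET` line near the top (for example `SET UTF-8` or `SET ISO8859-1`). The helper below is a minimal sketch that reads that line; the name `declared_encoding` is hypothetical and not part of hunspell or spacy_hunspell.

```python
import re
from typing import Optional

# Matches a "SET <encoding>" line, tolerating a leading UTF-8 byte order mark.
_SET_LINE = re.compile(rb"(?:\xef\xbb\xbf)?\s*SET\s+(\S+)")


def declared_encoding(aff_path: str) -> Optional[str]:
    """Return the encoding named on the SET line of a .aff file, or None."""
    # Read raw bytes so the file's own encoding cannot break the scan;
    # the SET line itself is plain ASCII.
    with open(aff_path, "rb") as aff_file:
        for raw_line in aff_file:
            match = _SET_LINE.match(raw_line)
            if match:
                return match.group(1).decode("ascii", errors="replace")
    return None


# Usage with the path from the question (uncomment on a machine that has it):
# print(declared_encoding('/usr/share/hunspell/pt_BR.aff'))
```

If this reports something other than UTF-8, that would be consistent with the UnicodeDecodeError in the question.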

Votes: 0


Original content provided by Stack Overflow.

Original link: https://stackoverflow.com/questions/51175686
