Q: Unigram tagging in NLTK

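Stack Overflow user

Asked on 2020-03-03 03:46:55

Answers: 1  Views: 460  Followers: 0  Votes: 1

I am using the NLTK Unigram Tagger and training it on sentences from the Brown Corpus.

I tried different categories and got roughly the same value each time. The value is around 0.9328 for every category, such as fiction, romance, or humor.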

import nltk
from nltk.corpus import brown

# Fiction
brown_tagged_sents = brown.tagged_sents(categories='fiction')
brown_sents = brown.sents(categories='fiction')
unigram_tagger = nltk.UnigramTagger(brown_tagged_sents)
unigram_tagger.evaluate(brown_tagged_sents)
>>> 0.9415956079897209

# Romance
brown_tagged_sents = brown.tagged_sents(categories='romance')
brown_sents = brown.sents(categories='romance')
unigram_tagger = nltk.UnigramTagger(brown_tagged_sents)
unigram_tagger.evaluate(brown_tagged_sents)
>>> 0.9348490474422324

Why is this the case? Is it because the categories come from the same corpus? Or because their part-of-speech tags are the same?

nlp

nltk

stanford-nlp

allennlp

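1 Answer

Stack Overflow user

Accepted answer

Posted on 2020-03-03 15:45:19

It looks like you are training the UnigramTagger and then evaluating it on the same training data. Take a look at the documentation for nltk.tag, specifically the part about evaluation.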
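To make the effect concrete, here is a minimal pure-Python sketch of the idea behind a unigram tagger: remember the most frequent tag for each word seen in training. It is a simplified stand-in, not NLTK's implementation, and the function names `train_unigram` and `accuracy` as well as the toy sentences are made up for illustration.

```python
from collections import Counter, defaultdict


def train_unigram(tagged_sents):
    """Map each word to the tag it carries most often in the training data."""
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        for word, tag in sent:
            counts[word][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def accuracy(model, tagged_sents):
    """Fraction of tokens whose predicted tag equals the gold tag."""
    total = 0
    correct = 0
    for sent in tagged_sents:
        for word, gold in sent:
            total += 1
            # A word never seen in training has no entry, so it counts as wrong.
            if model.get(word) == gold:
                correct += 1
    return correct / total


# Hypothetical toy data, not taken from the Brown Corpus.
train = [
    [("the", "AT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "AT"), ("cat", "NN"), ("sleeps", "VBZ")],
]
held_out = [
    [("the", "AT"), ("bird", "NN"), ("sleeps", "VBZ")],
]

model = train_unigram(train)
train_score = accuracy(model, train)        # every word was seen in training
held_out_score = accuracy(model, held_out)  # "bird" was never seen
```

On the training sentences every word has already been memorized, so the score is as high as it can get. On the held-out sentence the unseen word cannot be tagged, which pulls the score down. NLTK's UnigramTagger behaves similarly when no backoff tagger is given: it assigns None to words it did not see during training.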

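With your code you will get a high score, which is to be expected because your training data and your evaluation/test data are the same. If you change that so the test data differs from the training data, you will get different results. My examples are below:

Category: Fiction

Here I used brown.tagged_sents(categories='fiction')[:500] as the training set and brown.tagged_sents(categories='fiction')[501:600] as the test/evaluation set.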

from nltk.corpus import brown
import nltk

# Fiction    
brown_tagged_sents = brown.tagged_sents(categories='fiction')[:500]
brown_sents = brown.sents(categories='fiction') # not sure what this line is doing here
unigram_tagger = nltk.UnigramTagger(brown_tagged_sents)
unigram_tagger.evaluate(brown.tagged_sents(categories='fiction')[501:600])

This gives you a score of 0.7474610697359513.

Category: Romance

Here I used brown.tagged_sents(categories='romance')[:500] as the training set and brown.tagged_sents(categories='romance')[501:600] as the test/evaluation set.

from nltk.corpus import brown
import nltk

# Romance
brown_tagged_sents = brown.tagged_sents(categories='romance')[:500]
brown_sents = brown.sents(categories='romance') # not sure what this line is doing here
unigram_tagger = nltk.UnigramTagger(brown_tagged_sents)
unigram_tagger.evaluate(brown.tagged_sents(categories='romance')[501:600])

This gives you a score of 0.7046799354491662.
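The slices in both examples are written by hand: [:500] covers indices 0 to 499 and [501:600] starts at index 501, so the sentence at index 500 ends up in neither set. One possible way to avoid hand-written indices is a small helper that cuts the data at a single point. This is only a sketch; the name `split_tagged_sents` and the 90/10 default are assumptions, not part of NLTK.

```python
def split_tagged_sents(tagged_sents, train_fraction=0.9):
    """Split a sequence of tagged sentences into contiguous train and test parts."""
    if not 0 < train_fraction < 1:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    cut = int(len(tagged_sents) * train_fraction)
    # Both slices share the same cut point, so no sentence is dropped or reused.
    return tagged_sents[:cut], tagged_sents[cut:]


# Hypothetical stand-in for a list of tagged sentences.
sents = [[("w%d" % i, "NN")] for i in range(10)]
train_sents, test_sents = split_tagged_sents(sents)
```

With the Brown Corpus, the same helper could be called as split_tagged_sents(brown.tagged_sents(categories='fiction')), after which the first part goes to nltk.UnigramTagger(...) and the second part to evaluate(...). The scores will differ from the ones above because the split sizes are different.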

I hope this helps and answers your question.

Votes: 1

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/60499791
