Q: What is the difference between NLTK's BLEU score and SacreBLEU?

Stack Overflow user

Asked 2020-12-26 16:05:00

1 answer, 1.4K views, 0 followers, 1 vote

I'm curious whether anyone is familiar with the difference between NLTK's BLEU score calculation and the SacreBLEU library.

In particular, I'm using the sentence-level BLEU score from both libraries, averaged over the entire dataset. The two give different results:

>>> import sacrebleu
>>> from nltk.translate import bleu_score
>>> from sacrebleu import sentence_bleu
>>> print(len(predictions))
256
>>> print(len(targets))
256
>>> prediction = "this is the first: the world's the world's the world's the \
... world's the world's the world's the world's the world's the world's the world \
... of the world of the world'"
...
>>> target = "al gore: so the alliance for climate change has launched two campaigns."
>>> print(bleu_score.sentence_bleu([target], prediction))
0.05422283394039736
>>> print(sentence_bleu(prediction, [target]).score)
0.0
>>> print(sacrebleu.corpus_bleu(predictions, [targets]).score)
0.678758518214081
>>> print(bleu_score.corpus_bleu([targets], [predictions]))
0

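As you can see, there are a lot of confusing inconsistencies. There's no way my BLEU score is 67.8%, but it also shouldn't be 0% (there are plenty of overlapping n-grams, such as "the").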

I'd appreciate it if anyone could shed some light on this. Thanks.

nltk

machine-translation

bleu

1 Answer

Stack Overflow user

Accepted answer

Answered 2020-12-28 18:36:19

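NLTK and SacreBLEU use different tokenization rules, mostly in how they handle punctuation. NLTK uses its own tokenization, whereas SacreBLEU replicates the original Perl implementation from 2002. The tokenization rules are probably more elaborate in NLTK, but they make the numbers incomparable with the original implementation.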
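To illustrate how tokenization alone can move an n-gram precision, here is a minimal pure-Python sketch. The two tokenizers and the `unigram_precision` helper are hypothetical stand-ins written for this example; they are not the rules that NLTK or SacreBLEU actually apply, and the hypothesis sentence is made up.

```python
import re
from collections import Counter


def whitespace_tokenize(text):
    # Split on whitespace only; punctuation stays glued to the words.
    return text.split()


def punct_tokenize(text):
    # Hypothetical simplified rule: every punctuation mark becomes its own token.
    return re.findall(r"\w+|[^\w\s]", text)


def unigram_precision(hyp_tokens, ref_tokens):
    # Clipped unigram precision: each hypothesis token counts at most
    # as often as it occurs in the reference.
    hyp_counts = Counter(hyp_tokens)
    ref_counts = Counter(ref_tokens)
    matched = sum(min(count, ref_counts[tok]) for tok, count in hyp_counts.items())
    return matched / max(len(hyp_tokens), 1)


# The reference is the target from the question; the hypothesis is invented.
reference = "al gore: so the alliance for climate change has launched two campaigns."
hypothesis = "al gore, the alliance launched campaigns"

p_whitespace = unigram_precision(
    whitespace_tokenize(hypothesis), whitespace_tokenize(reference)
)
p_punct = unigram_precision(punct_tokenize(hypothesis), punct_tokenize(reference))

print(p_whitespace, p_punct)
```

With whitespace splitting, "gore," and "campaigns" fail to match "gore:" and "campaigns.", so the precision is lower than with punctuation split off. Relatedly, as far as I know NLTK's `sentence_bleu` expects input that is already tokenized (lists of tokens), so a raw string passed to it is iterated character by character, which may explain the nonzero NLTK sentence score in the question.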

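The corpus BLEU you got from SacreBLEU is not 67.8% but roughly 0.68%: SacreBLEU's numbers are already multiplied by 100, unlike NLTK's. So I would not say there is a huge difference between the scores.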
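A small sketch of the scale conversion, assuming the SacreBLEU value is on a 0 to 100 scale and NLTK's is on a 0 to 1 scale as described above. The helper name `to_unit_interval` is hypothetical; the number is the one reported in the question.

```python
def to_unit_interval(percent_score):
    # Convert a score reported on a 0-100 scale to the 0-1 scale.
    return percent_score / 100.0


sacrebleu_corpus_score = 0.678758518214081  # value printed in the question
nltk_scale_score = to_unit_interval(sacrebleu_corpus_score)

print(nltk_scale_score)
```

On the 0 to 1 scale this is a little under 0.007, which is much closer to the 0 that NLTK reported than the raw number suggests.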

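Sentence-level BLEU can use different smoothing techniques, which should ensure that the score takes a reasonable value even when the 3-gram or 4-gram precision is zero. Note, however, that BLEU is very unreliable as a sentence-level metric.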
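The effect of smoothing can be sketched in plain Python. This is a simplified single-reference BLEU written for illustration, with add-one smoothing on the higher-order precisions as one possible technique; it is an assumption of this example, not necessarily what NLTK or SacreBLEU do by default. The function name `sentence_bleu_sketch` and the hypothesis sentence are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    # All contiguous n-grams of a token list, as tuples.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def sentence_bleu_sketch(hyp, ref, max_n=4, add_one=False):
    # Simplified single-reference BLEU on pre-tokenized input, 0-1 scale.
    if not hyp:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = Counter(ngrams(hyp, n))
        ref_counts = Counter(ngrams(ref, n))
        matched = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        if add_one and n > 1:
            # Add-one smoothing for n > 1 keeps the precision above zero.
            matched, total = matched + 1, total + 1
        if matched == 0 or total == 0:
            # Without smoothing, a single zero precision zeroes the score.
            return 0.0
        log_precisions.append(math.log(matched / total))
    # Brevity penalty for hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    return bp * math.exp(sum(log_precisions) / max_n)


ref = "so the alliance for climate change has launched two campaigns".split()
hyp = "the alliance has launched campaigns".split()

unsmoothed = sentence_bleu_sketch(hyp, ref)
smoothed = sentence_bleu_sketch(hyp, ref, add_one=True)

print(unsmoothed, smoothed)
```

Here every unigram and half of the bigrams match, but no trigram does, so the unsmoothed score collapses to zero while the smoothed one stays positive.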

2 votes

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/65454578
