我正在尝试运行python代码,主要是基于NLTK书籍,从我的GujaratiTextCorpus中为ngram POS标记古吉拉特语文本。我遇到了一个ValueError。
我在Windows10中使用Python 3.7.3。我通过anaconda使用jupyter笔记本。我是一个使用python的初学者。我研究了stackoverflow上可用的答案。com来修复我的ValueError,但解决不了它。
import nltk
f = open('C:\\Users\\BHOGAYATA\\Documents\\GujaratiPosTagging\\cts260.txt', encoding = 'utf8')
raw = f.read()
train2_sents = nltk.sent_tokenize(raw)
text2 = nltk.Text(train2_sents)
train2_sents
import nltk
f = open('C:\\Users\\BHOGAYATA\\Documents\\GujaratiPosTagging\\txt42_sents.txt', encoding = 'utf8')
raw = f.read()
bs_sents = nltk.sent_tokenize(raw)
text3 = nltk.Text(bs_sents)
bs_sents
unigram_tagger = nltk.UnigramTagger(train2_sents)
unigram_tagger.tag(bs_sents)我预计这两个古吉拉特语句子的单词将被标记为POS。我发现以下错误消息:
ValueError
Traceback (most recent call last)
<ipython-input-3-5fae0b92393e> in <module>
11 text3 = nltk.Text(bs_sents)
12 bs_sents
---> 13 unigram_tagger = nltk.UnigramTagger(train2_sents)
14 unigram_tagger.tag(bs_sents)
15
~\Anaconda3\lib\site-packages\nltk\tag\sequential.py in __init__(self, train, model, backoff, cutoff, verbose)
344
345 def __init__(self, train=None, model=None, backoff=None, cutoff=0, verbose=False):
--> 346 NgramTagger.__init__(self, 1, train, model, backoff, cutoff, verbose)
347
348 def encode_json_obj(self):
~\Anaconda3\lib\site-packages\nltk\tag\sequential.py in __init__(self, n, train, model, backoff, cutoff, verbose)
293
294 if train:
--> 295 self._train(train, cutoff, verbose)
296
297 def encode_json_obj(self):
~\Anaconda3\lib\site-packages\nltk\tag\sequential.py in _train(self, tagged_corpus, cutoff, verbose)
181 fd = ConditionalFreqDist()
182 for sentence in tagged_corpus:
--> 183 tokens, tags = zip(*sentence)
184 for index, (token, tag) in enumerate(sentence):
185 # Record the event.
ValueError: not enough values to unpack (expected 2, got 1)发布于 2019-07-27 15:09:10
它说明您传递的变量有一个输出,但您期望有两个输出。
例如:
for a, b in [("a", "b")]:
print("a:", a, "b:", b)
This will work
for a, b in [("a")]:
print("a:", a, "b:", b)
This will not work编辑:
查看第一个参数的UnigramTagger,它采用以下类型的标记句子列表
list(list(tuple(str, str)))您给出的train2_sents类型为
list(tuple(str,str)您的list(tuple(str,str)与train2_sents相同的位置
https://stackoverflow.com/questions/57229847
复制相似问题