This is the code I am trying, but it is producing an error.
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize, sent_tokenize
stop_words = set(stopwords.words('english'))
file_content = open("Dictionary.txt").read()
tokens = nltk.word_tokenize(file_content)
# sent_tokenize uses an instance of PunktSentenceTokenizer
# from the nltk.tokenize.punkt module
tokenized = sent_tokenize(tokens)
for i in tokenized:
    # word_tokenize finds the words and punctuation in a string
    wordsList = nltk.word_tokenize(i)
    # removing stop words from wordsList
    wordsList = [w for w in wordsList if not w in stop_words]
    # Using a tagger (a part-of-speech tagger, or POS-tagger)
    tagged = nltk.pos_tag(wordsList)
    print(tagged)
Error:
Traceback (most recent call last):
  File "tag.py", line 12, in <module>
    tokenized = sent_tokenize(tokens)
  File "/home/mahadev/anaconda3/lib/python3.7/site-packages/nltk/tokenize/__init__.py", line 105, in sent_tokenize
    return tokenizer.tokenize(text)
  File "/home/mahadev/anaconda3/lib/python3.7/site-packages/nltk/tokenize/punkt.py", line 1269, in tokenize
    return list(self.sentences_from_text(text, realign_boundaries))
  File "/home/mahadev/anaconda3/lib/python3.7/site-packages/nltk/tokenize/punkt.py", line 1323, in sentences_from_text
    return [text[s:e] for s, e in self.span_tokenize(text, realign_boundaries)]
  File "/home/mahadev/anaconda3/lib/python3.7/site-packages/nltk/tokenize/punkt.py", line 1313, in span_tokenize
    for sl in slices:
  File "/home/mahadev/anaconda3/lib/python3.7/site-packages/nltk/tokenize/punkt.py", line 1354, in _realign_boundaries
    for sl1, sl2 in _pair_iter(slices):
  File "/home/mahadev/anaconda3/lib/python3.7/site-packages/nltk/tokenize/punkt.py", line 317, in _pair_iter
    prev = next(it)
  File "/home/mahadev/anaconda3/lib/python3.7/site-packages/nltk/tokenize/punkt.py", line 1327, in _slices_from_text
    for match in self._lang_vars.period_context_re().finditer(text):
TypeError: expected string or bytes-like object
Posted on 2019-02-28 09:19:48
Not sure what your code is supposed to do, but the error you are getting comes from the data type of the tokens variable: sent_tokenize expects a string, and it received a list instead.
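For example (a minimal sketch, not from the original answer), passing any list to sent_tokenize reproduces the same error:

from nltk.tokenize import sent_tokenize

# A list instead of a string triggers the TypeError seen in the traceback,
# because the Punkt tokenizer runs a regular expression over its input and
# Python's re module only accepts str or bytes.
sent_tokenize(["some", "tokens"])  # TypeError: expected string or bytes-like object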
You should change that line to:
tokens = str(nltk.word_tokenize(file_content))
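Note that str(...) only makes the call run because it turns the token list into one long string (the list's repr). A commonly used alternative, shown here as a sketch and not part of the original answer, is to sentence-tokenize the raw file contents and then word-tokenize each sentence:

import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize, sent_tokenize

stop_words = set(stopwords.words('english'))
file_content = open("Dictionary.txt").read()

# Split the raw text into sentences first, then each sentence into words,
# drop stop words, and POS-tag what remains.
for sentence in sent_tokenize(file_content):
    words = [w for w in word_tokenize(sentence) if w not in stop_words]
    print(nltk.pos_tag(words))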
https://stackoverflow.com/questions/54920358