Q: How do I read the nltk.book texts (nltk.text.Text objects) in Python?

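Stack Overflow user

Asked 2018-03-14 17:07:07

3 answers, 4.3K views, 0 followers, 2 votes

I have been learning a lot about nltk for natural language processing, and it can do many things, but I cannot find a way to read the texts that ship with the package. I have tried things like this: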

from nltk.book import *
text6 #Brings the title of the text
open(text6).read()
#or
nltk.book.text6.read()

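But that does not seem to work, because the object has no fileid. Nobody seems to have asked this before, so I assume the answer is simple. Do you know how to read these texts, or how to convert them to a string? Thanks in advance.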

python

nltk

3 Answers

Stack Overflow user

Accepted answer

Posted 2018-03-15 09:48:59

Let's dig into the code =)

First, the nltk.book code lives at https://github.com/nltk/nltk/blob/develop/nltk/book.py.

If we look closely, each text is loaded as an nltk.Text object. For example, text6 is defined at https://github.com/nltk/nltk/blob/develop/nltk/book.py#L36:

text6 = Text(webtext.words('grail.txt'), name="Monty Python and the Holy Grail")

The Text object comes from https://github.com/nltk/nltk/blob/develop/nltk/text.py#L286, and you can read more about how to use it at http://www.nltk.org/book/ch02.html.

webtext is a corpus from nltk.corpus, so to get the raw text behind nltk.book.text6 you can load webtext directly:

>>> from nltk.corpus import webtext
>>> webtext.raw('grail.txt')

The fileids exist only when you load a PlaintextCorpusReader object, not on the Text object (which is a processed object):

>>> type(webtext)
<class 'nltk.corpus.reader.plaintext.PlaintextCorpusReader'>
>>> for filename in webtext.fileids():
...     print(filename)
... 
firefox.txt
grail.txt
overheard.txt
pirates.txt
singles.txt
wine.txt
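The two steps above (check the fileid, then read the raw text) can be combined into one small helper. This is only a sketch: `load_raw_webtext` and `first_lines` are hypothetical names, not part of NLTK, and the helper assumes NLTK is installed and the webtext corpus has been downloaded (for example with `nltk.download('webtext')`).

```python
def load_raw_webtext(fileid="grail.txt"):
    """Return the raw contents of one webtext file as a single str.

    Hypothetical helper; relies only on webtext.fileids() and webtext.raw().
    """
    # Imported lazily so that defining the helper does not need the corpus yet.
    from nltk.corpus import webtext

    if fileid not in webtext.fileids():
        raise ValueError("unknown webtext fileid: %r" % fileid)
    return webtext.raw(fileid)


def first_lines(raw, n=3):
    """Return the first n lines of a raw string, for a quick preview."""
    return raw.splitlines()[:n]
```

With the corpus available, `first_lines(load_raw_webtext())` gives a short preview of the text behind text6 as plain strings.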

7 votes

Stack Overflow user

Posted 2018-03-14 18:19:49

It looks like they have already tokenized it for you.

from nltk.book import text6

text6.tokens
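Since the question asks for a string, one possible follow-up is to join the tokens back together. A minimal sketch: `tokens_to_string` is a hypothetical helper, and the result only approximates the source text, because tokenization discards the original whitespace and line breaks.

```python
def tokens_to_string(tokens):
    """Join an iterable of token strings with single spaces."""
    return " ".join(tokens)


# Stand-in token list so the sketch runs without the NLTK corpora installed.
sample_tokens = ["Hello", ",", "world", "!"]
print(tokens_to_string(sample_tokens))

# With NLTK and its book data available, the same helper applies to text6:
# from nltk.book import text6
# print(tokens_to_string(text6.tokens)[:100])
```

If the exact original text matters, reading it from the corpus with raw() is the safer route.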

2 votes

Stack Overflow user

Posted 2021-02-21 13:17:13

# Print the sorted list of distinct tokens
from nltk.book import text6

print(sorted(set(text6)))

0 votes

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/49283774
