How does NLTK's word_tokenize differ from str.split()? - Tencent Cloud Developer Community

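I am using NLTK for natural language processing on some fairly large datasets and would like to use all of my processor cores. The multiprocessing module seems to be what I am after: when I run the test code below I can see every core in use, but the code never finishes. The same task without multiprocessing completes in about a minute. Python 2.7.11 on Debian. from nltk.tokenize import word_tokenize import io import time import multiprocessing as mp def open_file(filepath): #open and parse file file = i

Views: 4 | Asked: 2016-02-19 | Votes: 1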

1 answer

What is the difference between set() and word_tokenize()?

from nltk.tokenize import sent_tokenize ,word_tokenize sentence = 'jainmiah I love you but you are not bothering about my request, please yaar consider me for the sake' word_tok = word_tokenize(sentence) print(word_tok) set_all = set(word_tokenize(sentence)) print(set_all) Actual

Views: 0 | Asked: 2019-02-15 | Votes: 1

Answer accepted
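The contrast this question asks about can be seen without NLTK. A minimal sketch, assuming `str.split()` as a stand-in for `word_tokenize` (the real tokenizer would also separate punctuation) and a made-up sample sentence: the token list keeps order and repeats, while `set()` keeps only distinct tokens in no guaranteed order.

```python
# str.split() stands in for word_tokenize (an assumption, so that the
# sketch runs without NLTK or its data files).
sentence = "you said you would call but you did not"

tokens = sentence.split()      # list: keeps order and duplicates
unique_tokens = set(tokens)    # set: duplicates removed, order not guaranteed

print(len(tokens))             # number of tokens, repeats included
print(len(unique_tokens))      # number of distinct tokens
print(sorted(unique_tokens))   # sorted only to get a stable display order
```

If the original order matters but duplicates do not, `list(dict.fromkeys(tokens))` keeps the first occurrence of each token in order.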

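1 answer

How do I remove meaningless or incomplete words from a corpus?

I am running some NLP analysis on a body of text. I have cleaned the text by removing non-alphanumeric characters, whitespace, duplicate words and stop words, and by applying stemming and lemmatization: from nltk.tokenize import word_tokenize import nltk.corpus import re from nltk.stem.snowball import SnowballStemmer from nltk.stem.wordnet import WordNetLemmatizer import pandas as pd data_df = pd.read_csv('path/to/file/data.csv

Views: 0 | Asked: 2018-07-09 | Votes: 1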

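1 answer

How can I split on new lines with nltk using line_tokenize or word_tokenize?

I tried tokenizing a paragraph that contains new lines with word_tokenize and sent_tokenize, but the new lines are not recognized. I also tried splitting the text into paragraphs on the new lines, but that still does not work. from nltk import sent_tokenize, word_tokenize, pos_tag para="the new line \n new char" sent=sent_tokenize(para) print(sent) Output: ['the new line \n new char'] It works when the data is given as a string literal in Python, but fails when the data is extracted from a docx file

Views: 12 | Asked: 2018-02-05 | Votes: 1

Answer accepted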
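One way to make line breaks count as boundaries is to split on them first and tokenize each piece afterwards. A minimal sketch, assuming `str.splitlines()` and `str.split()` in place of the NLTK tokenizers; the helper name `split_paragraphs` is made up for illustration.

```python
def split_paragraphs(text):
    """Split text on line breaks and drop empty or whitespace-only lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]

para = "the new line \n new char"
lines = split_paragraphs(para)
print(lines)

# Each line can then be tokenized on its own; str.split() is a stand-in
# for word_tokenize here.
print([line.split() for line in lines])
```

`str.splitlines()` also recognizes `\r\n` and several other line-boundary characters, which may matter for text extracted from other file formats.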

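2 answers

POS tagging - NLTK - Python

I want to use word_tokenize, pos_tag and FreqDist. I do not want to download all of nltk by default; I want to use nltk.download(info_or_id=''). Which options should I pass as info_or_id to get part-of-speech tags and their frequencies? POS tags: Penn Treebank POS.

Views: 1 | Asked: 2015-09-07 | Votes: 0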

1 answer

NLTK word tokenization returns nothing

I am trying to tag a sentence. I believe the code is correct, but there is no output. What is wrong? Here is the code. import nltk from nltk.tokenize import word_tokenize text = word_tokenize("And now for something completely different") nltk.pos_tag(text) text = word_tokenize("They refuse to permit us to obtain the refuse permit") nltk.pos_tag(text)

Views: 2 | Asked: 2019-11-19 | Votes: 0

Answer accepted

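1 answer

How to avoid tokenizing double-quoted strings, site URLs and email addresses

How do I stop word_tokenize from splitting strings such as "pass_word", "https://www.gmail.com" and "tempemail@mail.com"? The quotes should prevent this, but they do not. I have tried different regex options. from nltk import word_tokenize s = 'open "https://www.gmail.com" url. Enter "tempemail@mail.com" in email. Enter "pass_word"

Views: 0 | Asked: 2019-08-22 | Votes: 1

Answer accepted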

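1 answer

R: parsing a Python tree via reticulate

I am trying to use Python's NLTK package from R through the reticulate package. For the most part I have succeeded. Now I want to perform named entity recognition (that is, determining which tokens denote named entities and what type of entity they represent) using NLTK's ne_chunk() function. My problem is that the function returns an object of class nltk.tree.Tree, and I do not know how to parse that object in R. If ne_chunk() is fed up to ten token-tag pairs, it returns a result that can be converted to character with as.character(), which can then be parsed with regular-expression functions (this is just a hack, and I am not happy with it). With more than 10 pairs, however, it returns a shorthand representation of the tree from which no R method extracts meaningful data. Here is a minimal reproducible

Views: 1 | Asked: 2018-01-31 | Votes: 2

Answer accepted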

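2 answers

Keeping trailing punctuation in Python nltk.word_tokenize

There is plenty of material on removing punctuation, but I cannot seem to find anything on keeping it. If I do: from nltk import word_tokenize test_str = "Some Co Inc. Other Co L.P." word_tokenize(test_str) Out[1]: ['Some', 'Co', 'Inc.', 'Other', 'Co', 'L.P', '.'] the final "." is pushed into its own token. However, if there is another word at the end, the last "." is preserved, as follows

Views: 0 | Asked: 2017-02-07 | Votes: 3

Answer accepted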

1 answer

Macro to generate bigrams in Writer

How can I generate bigrams using the Basic language? I can do it in Python like this... import nltk, sys from nltk.tokenize import word_tokenize sys.stdout = open("mygram1.txt", "w") with open("mytext.txt") as f: for text in f: tokens = nltk.word_tokenize(text) bigrm = (nltk.bigrams(tokens)) print(*map(' '.join,

Views: 2 | Asked: 2022-05-14 | Votes: 0

Answer accepted

4 answers

Python (nltk) - UnicodeDecodeError: 'ascii' codec can't decode byte

I am new to NLTK. I am getting this error. I have searched for encoding/decoding and for UnicodeDecodeError in particular, but this error seems specific to the NLTK source code. Here is the error: Traceback (most recent call last): File "A:\Python\Projects\Test\main.py", line 2, in <module> print(pos_tag(word_tokenize("John's big idea isn't all that bad."))) File "A:\Pytho

Views: 0 | Asked: 2014-08-26 | Votes: 9

1 answer

How do I get the tag set from nltk pos_tag?

I am trying to get the full tag names from nltk pos_tag, but I cannot find a simple way to do it with nltk, for example by using tagsets='universal'. from nltk.tokenize import word_tokenize def nltk_pos(text): token = word_tokenize(text) return (nltk.pos_tag(token)[0])[1] nltk_pos('home') output: 'NN' expected output: 'NOUN'

Views: 8 | Asked: 2020-09-22 | Votes: 0

Answer accepted

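1 answer

Group rows in a Pandas dataframe, apply a custom function and store the results as rows of a new dataframe

I have a Pandas dataframe, df_org, with three columns: index (integer), title (string) and date (date). I have a method process_title(text) that takes a string as input, tokenizes it, removes stop words, normalizes the input string and returns the words as a list. from nltk.tokenize import word_tokenize from nltk.stem import WordNetLemmatizer from nltk.corpus import stopwords stop_words = set(stopwords.words('english

Views: 1 | Asked: 2021-08-29 | Votes: 0

Answer accepted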

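1 answer

How to use word sense disambiguation in Spanish?

I am taking a Udemy course (all the examples are in English), but whenever I start working with Spanish, libraries or compatibility are lacking. I downloaded the data in CSV format from https://www.datos.gov.co/Ciencia-Tecnolog-a-e-Innovaci-n/LAS-WordNet-una-WordNet-para-el-espa-ol-obtenida-c/8z8d-85m7 and I am trying to run the following code, but it crashes because of the context's description. Does anyone know how to handle this? Thanks import nltk #nltk.download("omw") from nltk.corpus import w

Views: 6 | Asked: 2020-06-16 | Votes: 0

Answer accepted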

0 answers

Why does NLTK's Text.similar() return None?

I am using the similar() method in nltk, but it does not work as expected. Please see the code below: from nltk import word_tokenize; import nltk; text = """ The girl is very pretty. """; text = nltk.Text(word_tokenize(text)); text.similar('beautiful'); #it returns "no matches" but pretty is synonym of beautiful.

Views: 2 | Asked: 2017-01-05 | Votes: 1

1 answer

How to save an NLTK "dispersion_plot" to an image?

I am starting to learn NLTK. Is there a way to save a dispersion_plot to an image? Here is my code: import nltk from nltk import word_tokenize raw="""This is the text where includes the word lawyers""" text1 = nltk.Text(word_tokenize(raw)) print(text1.dispersion_plot(["lawyers"])) It displays the plot, but I want to save it to an image file. Thanks!

Views: 24 | Asked: 2020-08-12 | Votes: 0

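2 answers

Bad zip file error in POS tagging with NLTK in Python

I am new to Python and NLTK, and I want to do word tokenization and part-of-speech tagging. I installed NLTK 3.0 on my Ubuntu 14.04 with the default Python 2.7.6. First I tried to tokenize a simple sentence, but I got an error saying "BadZipfile: File is not a zip file". How do I solve this? One more doubt: when I installed the NLTK data (from the command line), I gave the path "/usr/share/nltk_data". Some of the packages could not be installed because of some errors. But when I check with the command "nltk.data.path", it shows other paths, which are actually

Views: 1 | Asked: 2015-01-24 | Votes: 1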

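1 answer

How to load multiple text files and apply the same algorithm to them using Python

I am new to Python programming. I am currently doing natural language processing on text files. The problem is that I have about 200 text files, so it is hard to load each file individually and apply the same method. Here is my program: import nltk from nltk.collocations import * from nltk.tokenize import word_tokenize from nltk.corpus import stopwords from nltk import FreqDist with open("c:/users/user/desktop/datascience/sotu/stopwords.txt",

Views: 2 | Asked: 2014-11-26 | Votes: 0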
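Applying one routine to many files usually comes down to looping over a directory listing. A minimal sketch using only the standard library; `count_words` is a placeholder for whatever analysis the real program performs, and both function names are made up for illustration.

```python
from collections import Counter
from pathlib import Path

def count_words(text):
    """Placeholder analysis: lowercase, split on whitespace, count tokens."""
    return Counter(text.lower().split())

def process_directory(directory, pattern="*.txt"):
    """Apply count_words to every matching file; return {filename: Counter}."""
    results = {}
    for path in sorted(Path(directory).glob(pattern)):
        results[path.name] = count_words(path.read_text(encoding="utf-8"))
    return results
```

Calling `process_directory("path/to/texts")` (a hypothetical folder) would then return one `Counter` per `.txt` file, keyed by file name.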

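1 answer

Is there a minimum data size required for BigramTagger to work?

I am learning about the BigramTagger class in the nltk library. I trained a part-of-speech tagger on the Brown corpus that ships with nltk. I noticed that if I train on this corpus and then tag a few words from the first sentence of the corpus, it works very well. from nltk.corpus import brown from nltk.tag import BigramTagger from nltk import word_tokenize # Works completely fine: brown_train = brown.tagged_sents(categories='news') bigram_tagger = Bi

Views: 4 | Asked: 2017-08-22 | Votes: 3

Answer accepted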

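3 answers

NLTK collocations for specific words

I know how to get bigram and trigram collocations using NLTK and apply them to my own corpus. The code is below. However, I am not sure (1) how to get the collocations for a specific word, and (2) whether NLTK has a collocation measure based on the log-likelihood ratio. import nltk from nltk.collocations import * from nltk.tokenize import word_tokenize text = "this is a foo bar bar black sheep foo bar bar black sheep foo bar bar black sheep shep bar bar black sentenc

Views: 0 | Asked: 2014-01-16 | Votes: 14

Answer accepted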
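For part (1), the idea of restricting collocations to one word can be sketched without NLTK: count adjacent pairs and keep only those that contain the target. This is a plain frequency count rather than an association score, and `bigrams_with` is a made-up helper name. For part (2), NLTK's `BigramAssocMeasures` does include a `likelihood_ratio` measure.

```python
from collections import Counter

def bigrams_with(tokens, word):
    """Count adjacent token pairs that contain `word`."""
    pairs = zip(tokens, tokens[1:])
    return Counter(pair for pair in pairs if word in pair)

# Shortened sample text; str.split() stands in for word_tokenize.
text = "this is a foo bar bar black sheep foo bar bar black sheep"
counts = bigrams_with(text.split(), "foo")

for pair, n in counts.most_common():
    print(pair, n)
```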

1 answer

nltk NaiveBayes classifier for text classification

In the code below, I know my NaiveBayes classifier works correctly because it works on trainset1, but why does it not work on trainset2? I even tried it with two classifiers, one from TextBlob and the other directly from nltk. from textblob.classifiers import NaiveBayesClassifier from textblob import TextBlob from nltk.tokenize import word_tokenize import nltk trainset1 = [('I love this sandwich.',

Views: 12 | Asked: 2016-09-06 | Votes: 3

Answer accepted

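1 answer

I compare two identical sentences with NLTK BLEU and do not get 1.0. Why?

I am trying to use the BLEU score from NLTK to evaluate machine translation quality. I wanted to check the code with two identical sentences. I use method1 as the smoothing function because I am comparing two sentences rather than a corpus. I set 4-grams with weights of 0.25 (1/4). But the result I get is 0.0088308. What am I doing wrong? Two identical sentences should score 1.0. I am on Python 3, Windows 7, coding in PyCharm. My code: import nltk from nltk import word_tokenize from nltk.translate.bleu_score import SmoothingFunction ref =

Views: 6 | Asked: 2021-08-25 | Votes: 1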

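2 answers

PyCharm does not recognize nltk (installed with Anaconda)

I am using PyCharm to write a program that uses the nltk package. My first line is: from nltk import word_tokenize, sent_tokenize I imported the nltk package into PyCharm's Python 2.7 environment (the one I am working in). However, PyCharm does not recognize the from nltk.. line. It is greyed out, and it also shows this error: This inspection detects names that should resolve but don't. Due to dynamic dispatch and duck typing, this is possible i

Views: 0 | Asked: 2016-12-18 | Votes: 0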

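1 answer

nltk wordpunct_tokenize vs word_tokenize

Does anyone know the difference between nltk's wordpunct_tokenize and word_tokenize? I am using nltk=3.2.4, and nothing in the docstring of wordpunct_tokenize explains the difference. I could not find this information in nltk's documentation either (perhaps I did not search in the right place!). I would have expected the first one to strip punctuation or the like, but it does not.

Views: 2 | Asked: 2018-05-09 | Votes: 18

Answer accepted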
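As far as I know, `wordpunct_tokenize` is a thin wrapper around a regular-expression tokenizer with the pattern `\w+|[^\w\s]+`, so it splits text into runs of word characters and runs of punctuation, whereas `word_tokenize` applies Treebank-style rules (for example keeping `n't` as one token). A sketch that reproduces the regex behavior with the standard library only; `wordpunct_like` is a made-up name, not an NLTK function.

```python
import re

# Pattern believed to match the one behind NLTK's WordPunctTokenizer:
# a run of word characters, or a run of non-word, non-space characters.
WORDPUNCT = re.compile(r"\w+|[^\w\s]+")

def wordpunct_like(text):
    """Split text into alphanumeric runs and punctuation runs."""
    return WORDPUNCT.findall(text)

print(wordpunct_like("Don't pay $3.88, okay?"))
```

Nothing is discarded except whitespace, which fits the observation in the question that punctuation is not stripped.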

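1 answer

How to get the frequency of specific words for each row of a dataframe

I am trying to create a function that gets the frequency of specific words from a dataframe. I use Pandas to convert a CSV file into a dataframe and NLTK to tokenize the text. I am able to get the count for the whole column, but I am struggling to get the frequency for each row. Here is what I have done so far. import nltk import pandas as pd from nltk.tokenize import word_tokenize from collections import defaultdict words = [ "robot", "automation", "collaborative

Views: 2 | Asked: 2020-03-18 | Votes: 0

Answer accepted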
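Counting per row rather than per column means building one counter for each text. A minimal sketch on a plain list of strings, assuming `str.split()` in place of `word_tokenize`; the sample rows are invented, the word list uses only the three words visible in the truncated excerpt, and `row_frequencies` is a hypothetical helper.

```python
from collections import Counter

def row_frequencies(rows, words):
    """For each text row, count how often each target word occurs."""
    targets = [w.lower() for w in words]
    result = []
    for text in rows:
        counts = Counter(text.lower().split())
        result.append({w: counts[w] for w in targets})
    return result

rows = ["The robot helps the other robot", "Automation is collaborative"]
words = ["robot", "automation", "collaborative"]

for row in row_frequencies(rows, words):
    print(row)
```

With pandas, the same per-row logic could be passed to `Series.apply` on the text column.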

1 answer

word_tokenize TypeError: expected string or buffer

When calling word_tokenize, I get the following error: File "C:\Python34\lib\site-packages\nltk\tokenize\punkt.py", line 1322, in _slices_from_text for match in self._lang_vars.period_context_re().finditer(text): TypeError: expected string or buffer I have a large text file (1500.txt) from which I want to remove stop words. My code is as follows: from nltk.corpus impor

Views: 1 | Asked: 2015-11-18 | Votes: 2

Answer accepted

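1 answer

POS tagging for languages other than English

I am new to this. The following lets me tag a sentence by part of speech, but what steps are involved in doing this for other languages? import nltk sentence = "I'm not sure!" tokens = nltk.word_tokenize(sentence) tagged = nltk.pos_tag(tokens) Update: I am interested in starting with Spanish. Update 2: import nltk from nltk.tokenize import word_tokenize training_set = [[(w.lower(),t) for w,t in s] for s in nltk

Views: 0 | Asked: 2016-12-09 | Votes: 1

Answer accepted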

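1 answer

Python fails to `import nltk` in my script, but it works in the interpreter

I found the cause: I had named my original script file nltk.py, so Python tried to import word_tokenize from the script file itself. Apologies for the silly mistake. I am trying to use nltk in Python on Windows. I installed nltk and the nltk data. However, when I try to run python -u 'filename.py' from the command line, I get the following error. Traceback (most recent call last): File "filename.py", line 1, in (module) from nltk import word_tokenize File

Views: 2 | Asked: 2014-04-18 | Votes: 3

Answer accepted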

3 answers

All text is saved on one line

I am trying to POS-tag a text file using NLTK in Python. This is the code I use import nltk from nltk import word_tokenize, pos_tag f = open('all.txt') raw = f.read() text = word_tokenize(raw) paosted = nltk.pos_tag(text) saveFile = open('ol.txt', 'w') saveFile.write(str(paosted)) saveFile.close() The code does work, but the problem is that it puts all

Views: 3 | Asked: 2018-08-27 | Votes: 0
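Writing `str(paosted)` serializes the whole list as a single Python literal, which is why everything ends up on one line. A sketch of one alternative: format each (word, tag) pair on its own line before writing. The sample pairs are hand-written stand-ins for `nltk.pos_tag` output, and `format_tagged` is a made-up helper.

```python
def format_tagged(tagged):
    """Render (word, tag) pairs one per line, tab-separated."""
    return "\n".join(f"{word}\t{tag}" for word, tag in tagged)

# Hand-written sample standing in for the output of nltk.pos_tag.
tagged = [("The", "DT"), ("cat", "NN"), ("sleeps", "VBZ")]

print(format_tagged(tagged))
```

The resulting string can be passed to `saveFile.write(...)` in place of `str(paosted)`.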

1 answer

Using the evaluate feature on a document tagged with nltk

I am new to nltk and Python. I am trying to use the evaluate feature to test the tagging accuracy on a text document that I read in. This is what I have so far. from nltk.tag import UnigramTagger from nltk.corpus import treebank from nltk.tokenize import word_tokenize train_sents = treebank.tagged_sents() tagger = UnigramTagger(train_sents) text1 = "This is the fir

Views: 0 | Asked: 2016-04-06 | Votes: 0

1 answer

Attribute error message

import nltk import random from nltk.corpus import movie_reviews from nltk.classify.scikitlearn import SklearnClassifier import pickle import sys sys.getdefaultencoding() import os from sklearn.naive_bayes import MultinomialNB, BernoulliNB from sklearn.linear_model import SGDClassifier from nltk.cl

Views: 2 | Asked: 2017-02-14 | Votes: 0

Answer accepted

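1 answer

Converting word_tokenize output back into a sentence

I am new to Python nltk. Currently I have a program that runs word_tokenize on a sentence. The tokens are then processed to correct the capitalization of some nouns. This works fine, and now I want to convert the processed tokens back into a sentence. I can do this easily with a loop, adding a space after each token. But in some cases this does not work, for words such as "it's, I'm, don't" and so on, because word_tokenize stores the two parts of such words separately. As a result, my processed tokens are converted to "it 's, I 'm, don

Views: 4 | Asked: 2019-10-24 | Votes: 1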
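The problem described here is that Treebank-style tokens such as `'s`, `'m` and `n't` need to be glued back onto the preceding token instead of being joined with a space. A rough sketch of that rule, assuming only these simple cases matter; `join_tokens` is a hypothetical helper and it would also mis-handle an opening single quote. NLTK itself ships a `TreebankWordDetokenizer` in `nltk.tokenize.treebank` that covers many more cases.

```python
CLOSING_PUNCT = {".", ",", "!", "?", ";", ":"}

def join_tokens(tokens):
    """Join tokens with spaces, gluing clitics and punctuation to the left."""
    out = []
    for tok in tokens:
        attach = tok.startswith("'") or tok == "n't" or tok in CLOSING_PUNCT
        if out and attach:
            out[-1] += tok      # no space before 's, 'm, n't, or punctuation
        else:
            out.append(tok)
    return " ".join(out)

print(join_tokens(["It", "'s", "fine", ",", "I", "do", "n't", "mind", "."]))
```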

2 answers

Sklearn: adding a lemmatizer to CountVectorizer

I added lemmatization to my CountVectorizer, as explained on this page. from nltk import word_tokenize from nltk.stem import WordNetLemmatizer class LemmaTokenizer(object): def __init__(self): self.wnl = WordNetLemmatizer() def __call__(self, articles): return [self.wnl.lemmatize(t) for t in word

Views: 4 | Asked: 2017-11-21 | Votes: 17

Answer accepted

2 answers

Spanish word tokenizer

I want to tokenize Spanish sentences into words. Is the following the right approach, or is there a better way? import nltk from nltk.tokenize import word_tokenize def spanish_word_tokenize(s): for w in word_tokenize(s): if w[0] in ("¿","¡"): yield w[0] yield w[1:] else: yield w sentenc

Views: 9 | Asked: 2016-12-26 | Votes: 3

Answer accepted
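The generator in the excerpt can be made runnable on its own by letting the base tokenizer be a parameter. In this sketch the default is `str.split`, which is an assumption made so that the code runs without NLTK; passing `word_tokenize` instead would match the question. A length check is added so that a token consisting only of `¿` or `¡` does not yield an empty string.

```python
def spanish_word_tokenize(text, tokenizer=str.split):
    """Yield tokens, splitting a leading ¿ or ¡ off into its own token."""
    for token in tokenizer(text):
        if len(token) > 1 and token[0] in ("¿", "¡"):
            yield token[0]
            yield token[1:]
        else:
            yield token

print(list(spanish_word_tokenize("¿Qué hora es? ¡Vamos!")))
```

With the whitespace stand-in, trailing punctuation stays attached to the word; a real tokenizer would separate it.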

2 answers

Tagging Dutch sentences with NLTK

I am starting out with NLTK and want to tag a Dutch sentence, but I am having trouble specifying the corpus. from nltk.tag import pos_tag from nltk.tokenize import word_tokenize from nltk.corpus import alpino pos_tag(word_tokenize("Python is een goede data science taal."), tagset = 'alpino') gives [('Python', 'UNK'), ('is', 'UNK

Views: 2 | Asked: 2016-10-24 | Votes: 3

Answer accepted

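2 answers

How do I efficiently merge all the lists in a dictionary?

I have a dictionary df2 in which each value is a list of words. I then want to merge all of these lists into df3. To do this I use a loop, which takes more than 1 minute to complete. import nltk from nltk.corpus import twitter_samples from nltk.tokenize import word_tokenize df = twitter_samples.strings('tweets.20150430-223406.json') df2 = {} for i in range(len(df)): df2[i] = word_tokenize(df[i]) d

Views: 3 | Asked: 2021-02-26 | Votes: 1

Answer accepted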
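The merging loop is cut off in the excerpt, so its exact form is unknown. If it builds the result with repeated `list + list` concatenation, each step copies everything accumulated so far. A sketch of a linear-time alternative with `itertools.chain`, using a tiny hand-made dictionary in place of the tokenized tweets:

```python
from itertools import chain

# Small stand-in for the dictionary of token lists described above.
df2 = {0: ["a", "b"], 1: ["c"], 2: ["d", "e"]}

# Flatten all value lists into one list, in dictionary insertion order.
df3 = list(chain.from_iterable(df2.values()))
print(df3)
```

An equivalent without imports is the nested comprehension `[tok for toks in df2.values() for tok in toks]`.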

2 answers

Why is Python not tagging Spanish correctly?

I have the following code: import nltk sent='El gato está bajo la mesa de cristal.' nltk.pos_tag(word_tokenize(sent), lang='spa') But the output is not accurate at all: [('El', 'NNP'), ('gato', 'NN'), ('está', 'NN'), ('bajo', 'NN'), ('la', 'FW'),

Views: 0 | Asked: 2018-10-21 | Votes: 1

Answer accepted

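1 answer

How to lemmatize lists within a list

I am trying this Kaggle competition. The "ingredients" column in this dataset consists of lists of ingredients. I tried to lemmatize this column, but after running the code below it looks as though the ingredients column has not changed at all. Can someone tell me what is wrong with my code? import nltk from nltk.stem import WordNetLemmatizer from nltk.tokenize import word_tokenize get_ipython().run_line_magic('matplotlib', 'inline') nltk.download('punkt') nltk.downlo

Views: 0 | Asked: 2019-07-29 | Votes: 0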

1 answer

if form in exceptions: TypeError: unhashable type: 'list' in Python

I get the following traceback error. if form in exceptions: TypeError: unhashable type: 'list' Here is my code. from nltk.tokenize import word_tokenize from nltk.stem.wordnet import WordNetLemmatizer sentence = 'missed you' w_tokenize = (word_tokenize(sentence)) for word in w_tokenize: print WordNetLemmatizer().lem

Views: 10 | Asked: 2017-06-06 | Votes: 1

Answer accepted

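1 answer

from nltk import word_tokenize vs. from nltk.tokenize import word_tokenize?

What is the difference between word_tokenize imported directly from nltk and word_tokenize imported from nltk's tokenize package?

Views: 0 | Asked: 2019-07-29 | Votes: 0

Answer accepted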

3 answers

Tokenizing Arabic words with NLTK

I am using the NLTK word_tokenizer to split a sentence into words. I want to tokenize this sentence: في_بيتنا كل شي لما تحتاجه يضيع ...ادور على شاحن فجأة يختفي ..لدرجة اني اسوي نفسي ادور شيء The code I wrote is: import re import nltk lex = u" في_بيتنا كل شي لما تحتاجه يضيع ...ادور على شاحن فجأة يختفي ..لدرجة اني اسوي نفسي ادور شيء" wordsArray = nltk.word_t

Views: 13 | Asked: 2012-10-23 | Votes: 23

Answer accepted

1 answer

Finding subject, object and verb combinations with NLTK RegexpParser

I am trying to extract subject, object and verb combinations using the NLTK toolkit. This is my code so far. How can I achieve this? import nltk from nltk.tokenize import sent_tokenize, word_tokenize grammar = r""" NP: {<.*>+} # Chunk everything }<VBD|VBZ|VBP|IN>+{ # Chink sequences of VBD and IN """ cp = nltk.Regexp

Views: 4 | Asked: 2014-08-20 | Votes: 2

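1 answer

POS tagging in Python

I am trying to find the part of speech of each word in a given sentence. I tried to do this with the code given below from nltk import word_tokenize import nltk.data a=raw_input() text = word_tokenize(a) pairs=nltk.pos_tag(text) print pairs But it always shows 'Delete' as JJ (adjective), when it should be a verb. How can I improve the code? Thanks in advance

Views: 0 | Asked: 2015-03-25 | Votes: 0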

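2 answers

How to use word_tokenize on a single column of a dataframe (Python)

I am trying to use NLTK word_tokenize on an Excel file that I have opened as a dataframe. The column I want to use word_tokenize on contains sentences. How do I extract a specific column from the dataframe in order to tokenize it? The column I am trying to access is named "Complaint/Query Details". import pandas as pd from nltk import word_tokenize file = "List of Complaints.xlsx" df = pd.read_excel(file, sheet_name = "All Complaints" ) token = df["Complaint

Views: 0 | Asked: 2018-10-18 | Votes: 1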

1 answer

TypeError: expected string or bytes-like object when using NLTK word_tokenize

I am trying to import a CSV file and then analyze the text with NLTK. The CSV file contains several columns, but for now I only want to analyze one of them. The code that reads the CSV file and uses word_tokenize is as follows: import pandas as pd import nltk #nltk.download('all') data=pd.read_csv("Output-analysis.csv") print (data.SAT_COMMENTS) from nltk.tokenize import word_tokenize

Views: 30 | Asked: 2020-08-19 | Votes: 0

1 answer

Generating new sentences with an n-gram model using nltk

I built 2-gram and 3-gram models from a text file. from nltk import * text = open('Alice in Wonderland.txt', 'r').read() table = string.maketrans('', '') text = text.translate(table, string.punctuation) tokens = word_tokenize(text.lower()) bigram = nltk.bigrams(tokens) trigram = nltk.trigrams(tokens)

Views: 1 | Asked: 2015-11-08 | Votes: 2
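Generating text from bigrams amounts to a random walk: store, for every token, the tokens seen right after it, then repeatedly sample a follower. A minimal standard-library sketch of that idea, not NLTK's own language-model API; both function names are made up, and the one-sentence sample is only there to keep the example small.

```python
import random
from collections import defaultdict

def build_bigram_model(tokens):
    """Map each token to the list of tokens observed right after it."""
    model = defaultdict(list)
    for first, second in zip(tokens, tokens[1:]):
        model[first].append(second)
    return model

def generate(model, start, length, rng=None):
    """Walk the model from `start`, sampling one follower per step."""
    if rng is None:
        rng = random.Random()
    words = [start]
    while len(words) < length:
        followers = model.get(words[-1])
        if not followers:       # dead end: no observed continuation
            break
        words.append(rng.choice(followers))
    return " ".join(words)

tokens = "alice was beginning to get very tired of sitting by her sister".split()
model = build_bigram_model(tokens)
print(generate(model, "alice", 6, random.Random(0)))
```

Because followers are stored with repeats, frequent continuations are sampled proportionally more often. A trigram version would key the dictionary on pairs of tokens instead of single tokens.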

1 answer

Pandas text processing with NLTK

When using nltk, removing punctuation and numbers and lowercasing do not work. My code stopwords=nltk.corpus.stopwords.words('english')+ list(string.punctuation) user_defined_stop_words=['st','rd','hong','kong'] new_stop_words=stopwords+user_defined_stop_words def preprocess(text): return [wo

Views: 5 | Asked: 2018-01-01 | Votes: 8

Answer accepted

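2 answers

Encoding error in POS tagging with nltk 3.0 on Python 3.4

I am using NLTK 3.0 with Python 3.4 and cannot perform POS tagging because of an error. I have read all the related posts but could not find a solution. Most posts mention that upgrading to NLTK 3.0 fixes the problem, but I already have NLTK 3.0. According to those posts, changing nltk's data.py solves the problem, but the NLTK developers are reluctant to do that. Here is my code: from nltk.tag import pos_tag from nltk.tokenize import word_tokenize pos_tag(word_tokenize("John's big idea isn't all that bad."

Views: 4 | Asked: 2014-10-27 | Votes: 2

Answer accepted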

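1 answer

The similar function in NLTK always returns a 'No Matches' result

I tried using the similar function from NLTK in Python, but it always returns 'No Matches', even when I enter a word similar to one in the sentence. My code is here from nltk.tokenize import word_tokenize import nltk raw = "Analyzing text to find common terms using Python and NLTK" text = nltk.Text(raw) text.similar('mutual') Any ideas?

Views: 0 | Asked: 2018-06-13 | Votes: 0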
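One thing that stands out in the excerpt is that `nltk.Text` is given the raw string rather than the output of `word_tokenize`. `Text` expects a sequence of tokens, and iterating over a Python string yields single characters, so this may be part of the problem. A small illustration of the difference, with `str.split()` standing in for `word_tokenize`:

```python
raw = "Analyzing text to find common terms"

as_characters = list(raw)[:5]   # what iterating over the raw string yields
as_words = raw.split()          # str.split() standing in for word_tokenize

print(as_characters)
print(as_words)
```

Even with proper tokens, `similar()` looks for words that share surrounding contexts within the text, so a single short sentence is unlikely to produce matches.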

1 answer

How do I use word_tokenize in nltk and keep the whitespace?

As far as I understand, the word_tokenize function in nltk takes a string representing a sentence and returns a list of all its words: >>> from nltk import word_tokenize, wordpunct_tokenize >>> s = ("Good muffins cost $3.88\nin New York. Please buy me\n" ... "two of them.\n\nThanks.") >>> word_tokenize(s) ['Good', 'muffin

Views: 1 | Asked: 2014-04-29 | Votes: 2

Answer accepted
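If the goal is to be able to rebuild the original string exactly, one library-free option is to emit whitespace runs as tokens of their own. A sketch under that assumption; it splits only on whitespace, so punctuation stays attached to words (unlike `word_tokenize`), and `tokenize_keep_whitespace` is a made-up name.

```python
import re

def tokenize_keep_whitespace(text):
    """Split into alternating runs of non-whitespace and whitespace."""
    return re.findall(r"\S+|\s+", text)

s = "Good muffins cost $3.88\nin New York."
pieces = tokenize_keep_whitespace(s)
print(pieces)

# Joining the pieces gives back the input unchanged.
print("".join(pieces) == s)
```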