Beginners | A Hands-On Introduction to TextBlob

yuquanle
Published 2019-10-08 16:39:22

Follow along with the blogger and improve a little every day.

This article walks through how to use TextBlob, an open-source text-processing library written in Python.

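Introduction

TextBlob is an open-source text-processing library written in Python. It can perform many natural language processing tasks, such as part-of-speech tagging, noun phrase extraction, sentiment analysis, and text translation.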

GitHub: https://github.com/sloria/TextBlob

Official documentation: https://textblob.readthedocs.io/en/dev/

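Hands-on

1. Installation

# Install: pip install textblob
# Install through a mirror in China: pip install textblob -i https://pypi.tuna.tsinghua.edu.cn/simple
# Download the NLTK corpora that TextBlob relies on: python -m textblob.download_corpora
# Reference: https://textblob.readthedocs.io/en/dev/quickstart.html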
from textblob import TextBlob
text = 'I love natural language processing! I am not like fish!'
blob = TextBlob(text)

2. Part-of-speech tagging

blob.tags

[('I', 'PRP'),
 ('love', 'VBP'),
 ('natural', 'JJ'),
 ('language', 'NN'),
 ('processing', 'NN'),
 ('I', 'PRP'),
 ('am', 'VBP'),
 ('not', 'RB'),
 ('like', 'IN'),
 ('fish', 'NN')]
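
The value of `blob.tags` is an ordinary list of `(word, tag)` tuples with Penn Treebank style tags, so it can be filtered with plain Python. The following is a minimal sketch, assuming the tag list printed above is copied in as a literal; the helper name `words_with_prefix` is hypothetical and not part of TextBlob.

```python
# Tag list copied from the output above; no TextBlob import is needed here.
tags = [('I', 'PRP'), ('love', 'VBP'), ('natural', 'JJ'), ('language', 'NN'),
        ('processing', 'NN'), ('I', 'PRP'), ('am', 'VBP'), ('not', 'RB'),
        ('like', 'IN'), ('fish', 'NN')]


def words_with_prefix(tagged, prefix):
    """Return the words whose POS tag starts with the given prefix (hypothetical helper)."""
    return [word for word, tag in tagged if tag.startswith(prefix)]


nouns = words_with_prefix(tags, 'NN')  # matches NN, NNS, NNP, ...
verbs = words_with_prefix(tags, 'VB')  # matches VB, VBP, VBD, ...
print(nouns)
print(verbs)
```

Matching on a prefix rather than an exact tag keeps plural and proper nouns (or inflected verbs) in the same bucket.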

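3. Noun phrase extraction

# noun_phrases returns a list of the noun phrases found in the text
noun_phrases = blob.noun_phrases
for phrase in noun_phrases:
    print(phrase)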

natural language processing

4. Sentence sentiment polarity

# Print each sentence together with its polarity score
for sentence in blob.sentences:
    print(str(sentence) + '------>' + str(sentence.sentiment.polarity))

I love natural language processing!------>0.3125
I am not like fish!------>0.0
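
The polarity score is a float in the range [-1.0, 1.0], where negative values indicate negative sentiment. To turn the score into a coarse label, something like the sketch below may help. It assumes the two polarity values reported above for the sentences of `text`; the function name `polarity_label` and the threshold of 0.1 are illustrative assumptions, not TextBlob defaults.

```python
def polarity_label(polarity, threshold=0.1):
    """Map a polarity score in [-1.0, 1.0] to a coarse label.

    The threshold is an illustrative assumption, not a TextBlob default.
    """
    if polarity > threshold:
        return 'positive'
    if polarity < -threshold:
        return 'negative'
    return 'neutral'


# Polarity scores reported above for the two sentences of `text`.
scores = {
    'I love natural language processing!': 0.3125,
    'I am not like fish!': 0.0,
}
labels = {sentence: polarity_label(p) for sentence, p in scores.items()}
for sentence, label in labels.items():
    print(sentence, '->', label)
```

A dead zone around zero avoids labelling near-neutral sentences as positive or negative because of a tiny score.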

5. Tokenization (splitting text into sentences or words)

token = blob.words
for w in token:
    print(w)

I
love
natural
language
processing
I
am
not
like
fish

sentences = blob.sentences
for s in sentences:
    print(s)

I love natural language processing!
I am not like fish!
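
To get a feel for what tokenization does, here is a rough approximation using only the standard library `re` module. It is a sketch under simplifying assumptions (English text, sentences ending in `.`, `!` or `?`), not TextBlob's tokenizer, which handles abbreviations and contractions far more carefully.

```python
import re

text = 'I love natural language processing! I am not like fish!'

# Words: runs of letters and apostrophes; punctuation is dropped.
words = re.findall(r"[A-Za-z']+", text)

# Sentences: split on whitespace that follows '.', '!' or '?'.
sentences = re.split(r'(?<=[.!?])\s+', text.strip())

print(words)
print(sentences)
```

For this simple input the result matches the word and sentence lists shown above; on text with abbreviations such as "e.g." the naive sentence split would break too early.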

6. Word inflection

token = blob.words
for w in token:
    # Plural form
    print(w.pluralize())
    # Singular form
    print(w.singularize())

we
I
love
love
naturals
natural
languages
language
processings
processing
we
I
ams
am
nots
not
likes
like
fish
fish

7. Lemmatization

from textblob import Word
w = Word('went')
# 'v' tells the lemmatizer to treat the word as a verb
print(w.lemmatize('v'))
w = Word('octopi')
# Without an argument the word is treated as a noun
print(w.lemmatize())

go
octopus

8. WordNet integration

from textblob import Word
from textblob.wordnet import VERB
word = Word('octopus')
syn_word = word.synsets
for syn in syn_word:
    print(syn)

Synset('octopus.n.01')
Synset('octopus.n.02')

# Restrict the returned synsets to verbs
syn_word1 = Word("hack").get_synsets(pos=VERB)
for syn in syn_word1:
    print(syn)

Synset('chop.v.05')
Synset('hack.v.02')
Synset('hack.v.03')
Synset('hack.v.04')
Synset('hack.v.05')
Synset('hack.v.06')
Synset('hack.v.07')
Synset('hack.v.08')

# Look up the definitions of a word's synsets
Word("beautiful").definitions

['delighting the senses or exciting intellectual or emotional admiration',
 '(of weather) highly enjoyable']

9. Spelling correction

sen = 'I lvoe naturl language processing!'
sen = TextBlob(sen)
print(sen.correct())

I love nature language processing!

# Word.spellcheck() returns spelling suggestions together with their confidence scores
w1 = Word('good')
w2 = Word('god')
w3 = Word('gd')
print(w1.spellcheck())
print(w2.spellcheck())
print(w3.spellcheck())

[('good', 1.0)]
[('god', 1.0)]
[('go', 0.586139896373057), ('god', 0.23510362694300518), ('d', 0.11658031088082901), ('g', 0.03626943005181347), ('ed', 0.009067357512953367), ('rd', 0.006476683937823834), ('nd', 0.0038860103626943004), ('gr', 0.0025906735751295338), ('sd', 0.0006476683937823834), ('md', 0.0006476683937823834), ('id', 0.0006476683937823834), ('gdp', 0.0006476683937823834), ('ga', 0.0006476683937823834), ('ad', 0.0006476683937823834)]
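
The TextBlob documentation describes its spelling correction as based on Peter Norvig's "How to Write a Spelling Corrector". The sketch below illustrates that general idea with the standard library only: generate every string one edit away, keep those found in a word-frequency table, and normalise their counts into confidences. The table `WORD_COUNTS` and the function names are hypothetical, and the numbers are made up for illustration, so the output will not match TextBlob's.

```python
import string

# Hypothetical word counts; a real corrector uses a large corpus-derived table.
WORD_COUNTS = {'love': 50, 'live': 30, 'nature': 20, 'natural': 10, 'language': 15}


def edits1(word):
    """All strings one edit (delete, transpose, replace, insert) away from word."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def suggest(word):
    """Return (candidate, confidence) pairs, most likely first."""
    if word in WORD_COUNTS:
        return [(word, 1.0)]  # known words are returned unchanged
    candidates = [w for w in edits1(word) if w in WORD_COUNTS]
    if not candidates:
        return [(word, 0.0)]  # nothing within one edit: give up
    total = sum(WORD_COUNTS[w] for w in candidates)
    ranked = sorted(candidates, key=lambda w: WORD_COUNTS[w], reverse=True)
    return [(w, WORD_COUNTS[w] / total) for w in ranked]


print(suggest('love'))
print(suggest('lvoe'))
print(suggest('naturl'))
```

With these made-up counts, `naturl` resolves to `nature` ahead of `natural` simply because `nature` is the more frequent candidate. That is one plausible explanation for the `nature` correction shown earlier, though the real frequency table is of course different.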

10. Parsing

text = TextBlob('I lvoe naturl language processing!')
print(text.parse())

I/PRP/B-NP/O lvoe/NN/I-NP/O naturl/NN/I-NP/O language/NN/I-NP/O processing/NN/I-NP/O !/./O/O
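
Each token in the parse string carries several slash-separated fields. Reading the output above, they appear to be the word, its part-of-speech tag, a chunk tag (`B-NP` begins a noun phrase, `I-NP` continues it, `O` is outside any chunk) and a fourth tag that is `O` throughout this example. The sketch below splits the string on that assumption; `split_tokens` is a hypothetical helper, and the field names are my interpretation rather than something stated in the article.

```python
# Parse string copied from the output above.
parsed = ('I/PRP/B-NP/O lvoe/NN/I-NP/O naturl/NN/I-NP/O '
          'language/NN/I-NP/O processing/NN/I-NP/O !/./O/O')


def split_tokens(parsed_text):
    """Split a parse string into (word, pos, chunk, pnp) tuples (assumed field order)."""
    return [tuple(token.split('/')) for token in parsed_text.split()]


tokens = split_tokens(parsed)
for word, pos, chunk, pnp in tokens:
    print(f'{word:12} {pos:4} {chunk:5} {pnp}')

# Words that belong to a noun-phrase chunk (B-NP starts one, I-NP continues it).
np_words = [word for word, _, chunk, _ in tokens if chunk.endswith('NP')]
print(np_words)
```

Note that the misspelled words `lvoe` and `naturl` are both tagged `NN`, which is why the whole sentence ends up in a single noun-phrase chunk.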

11. N-grams

text = TextBlob('I lvoe naturl language processing!')
print(text.ngrams(n=2))

[WordList(['I', 'lvoe']), WordList(['lvoe', 'naturl']), WordList(['naturl', 'language']), WordList(['language', 'processing'])]
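
An n-gram is simply a run of n consecutive words, so the same result can be reproduced with a sliding window over a word list. A minimal sketch in plain Python, assuming the punctuation has already been stripped as in the output above (the function here is a stand-in, not TextBlob's implementation):

```python
def ngrams(words, n):
    """Return every run of n consecutive words as a list of lists."""
    return [words[i:i + n] for i in range(len(words) - n + 1)]


words = ['I', 'lvoe', 'naturl', 'language', 'processing']
bigrams = ngrams(words, 2)
trigrams = ngrams(words, 3)
print(bigrams)
print(trigrams)
```

A list of k words yields k - n + 1 n-grams, and an empty list when n is larger than k.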

12. TextBlob in practice: Naive Bayes text classification

# Text classification with TextBlob's Naive Bayes classifier
# Reference: https://textblob.readthedocs.io/en/dev/classifiers.html#classifiers
# 1. Prepare the data: a training set and a test set
train = [
    ('I love this sandwich.', 'pos'),
    ('this is an amazing place!', 'pos'),
    ('I feel very good about these beers.', 'pos'),
    ('this is my best work.', 'pos'),
    ("what an awesome view", 'pos'),
    ('I do not like this restaurant', 'neg'),
    ('I am tired of this stuff.', 'neg'),
    ("I can't deal with this", 'neg'),
    ('he is my sworn enemy!', 'neg'),
    ('my boss is horrible.', 'neg')
]
test = [
    ('the beer was good.', 'pos'),
    ('I do not enjoy my job', 'neg'),
    ("I ain't feeling dandy today.", 'neg'),
    ("I feel amazing!", 'pos'),
    ('Gary is a friend of mine.', 'pos'),
    ("I can't believe I'm doing this.", 'neg')
]

# 2. Import the Naive Bayes classifier
from textblob.classifiers import NaiveBayesClassifier

# 3. Train it on the training set
nb_model = NaiveBayesClassifier(train)

# 4. Classify a new sample
dev_sen = "This is an amazing library!"
print(nb_model.classify(dev_sen))

pos

# The probability of belonging to a given class can also be computed
dev_sen_prob = nb_model.prob_classify(dev_sen)
print(dev_sen_prob.prob("pos"))

0.980117820324005
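
To see what a Naive Bayes text classifier computes, here is a from-scratch sketch using only the standard library. It is a multinomial variant with add-one smoothing over raw word counts, which is a swapped-in simplification: TextBlob's classifier wraps NLTK's implementation with its own feature extractor, so the probability printed here will not match the 0.98 reported above. The class `TinyNaiveBayes` and the helper `tokenize` are hypothetical names.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


class TinyNaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def __init__(self, labeled_texts):
        self.doc_counts = Counter()               # documents per label
        self.word_counts = defaultdict(Counter)   # label -> word -> count
        self.vocab = set()
        for text, label in labeled_texts:
            self.doc_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocab.add(word)

    def log_scores(self, text):
        """Unnormalised log posterior for each label."""
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, n_docs in self.doc_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / total_docs)  # log prior
            for word in tokenize(text):
                count = self.word_counts[label][word] + 1  # add-one smoothing
                score += math.log(count / (total_words + len(self.vocab)))
            scores[label] = score
        return scores

    def classify(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)

    def prob(self, text, label):
        """Posterior probability of label, normalised over all labels."""
        scores = self.log_scores(text)
        top = max(scores.values())
        exp = {k: math.exp(v - top) for k, v in scores.items()}
        return exp[label] / sum(exp.values())


# Same training sentences as in step 1 above.
train = [
    ('I love this sandwich.', 'pos'),
    ('this is an amazing place!', 'pos'),
    ('I feel very good about these beers.', 'pos'),
    ('this is my best work.', 'pos'),
    ('what an awesome view', 'pos'),
    ('I do not like this restaurant', 'neg'),
    ('I am tired of this stuff.', 'neg'),
    ("I can't deal with this", 'neg'),
    ('he is my sworn enemy!', 'neg'),
    ('my boss is horrible.', 'neg'),
]

model = TinyNaiveBayes(train)
sample = 'This is an amazing library!'
print(model.classify(sample))
print(model.prob(sample, 'pos'))
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied, and subtracting the largest score before exponentiating keeps the normalisation stable.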

# 5. Compute the model's accuracy on the test set
print(nb_model.accuracy(test))

0.8333333333333334
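
Accuracy is the fraction of test samples whose predicted label equals the gold label, so 0.8333 on six sentences corresponds to five correct predictions. A small sketch of that arithmetic follows; the `predicted` list is hypothetical, since the article does not say which sentence the classifier gets wrong.

```python
def accuracy(predicted, expected):
    """Fraction of predictions that equal the expected label."""
    if len(predicted) != len(expected):
        raise ValueError('predicted and expected must have the same length')
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


# Gold labels of the six test sentences above.
expected = ['pos', 'neg', 'neg', 'pos', 'pos', 'neg']
# Hypothetical predictions with exactly one mistake (which sentence the real
# classifier misclassifies is not stated in the article).
predicted = ['pos', 'neg', 'neg', 'pos', 'neg', 'neg']

score = accuracy(predicted, expected)
print(score)
```

With only six test sentences each mistake moves the score by about 0.17, so the figure is best read as a rough sanity check rather than a reliable estimate.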

The code is available at:

1. https://github.com/yuquanle/StudyForNLP/blob/master/NLPtools/TextBlobDemo.ipynb

2. https://github.com/yuquanle/StudyForNLP/blob/master/NLPtools/TextBlob2TextClassifier.ipynb

Originally published on 2019-10-03 on the WeChat official account AI小白入门.
