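Part-of-speech (POS) tagging demos for several commonly used Python toolkits, meant for beginners to practice with and get a feel for the task.
jieba part-of-speech (POS) tagging
Install: pip install jieba
Installing from a mirror in China is faster: pip install jieba -i https://pypi.tuna.tsinghua.edu.cn/simple
Import the package first. jieba.posseg.dt is the default POS-tagging tokenizer.
It tags each word of the segmented sentence with its part of speech, using a tag set compatible with ictclas.
jieba does not appear to handle English; tools that do are covered later.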
import jieba.posseg as pseg

words = pseg.cut("我爱自然语言处理技术!")
for word, pos in words:
    print(word, pos)
Output:
我 r
爱 v
自然语言 l
处理 v
技术 n
! x
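A common follow-up to tagging is keeping only the words of certain classes. The snippet below is a minimal sketch of that idea and is not part of jieba: `filter_by_tag` is a hypothetical helper, and the pairs are copied from the output above so that it runs without jieba installed. The pairs yielded by `pseg.cut` should be usable in the same way, since they unpack into a word and a tag.

```python
# (word, tag) pairs copied from the jieba output shown above.
tagged = [("我", "r"), ("爱", "v"), ("自然语言", "l"),
          ("处理", "v"), ("技术", "n"), ("!", "x")]


def filter_by_tag(pairs, prefixes):
    """Keep the words whose tag starts with one of the given prefixes.

    Hypothetical helper for illustration; not a jieba function.
    """
    prefixes = tuple(prefixes)
    return [word for word, tag in pairs if tag.startswith(prefixes)]


verbs = filter_by_tag(tagged, ["v"])
nouns = filter_by_tag(tagged, ["n"])
print(verbs)  # ['爱', '处理']
print(nouns)  # ['技术']
```

Matching on a prefix rather than the exact tag is a design choice: sub-tags such as vn or nz, which appear in the output of other tools in this post, are then picked up by v and n.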
SnowNLP POS tagging
Install: pip install snownlp
Install from a mirror in China: pip install snownlp -i https://pypi.tuna.tsinghua.edu.cn/simple
POS tagging with snownlp
from snownlp import SnowNLP

model = SnowNLP('我爱自然语言处理技术!')
for word, pos in model.tags:
    print(word, pos)
Output:
我 r
爱 v
自然 n
语言 n
处理 vn
技术 n
! w
THULAC POS tagging
Install: pip install thulac
Install from a mirror in China: pip install thulac -i https://pypi.tuna.tsinghua.edu.cn/simple
POS tagging with thulac
import thulac

thulac_model = thulac.thulac()
wordseg = thulac_model.cut("我爱自然语言处理技术!")
print(wordseg)
Output:
Model loaded succeed
[['我', 'r'], ['爱', 'v'], ['自然', 'n'], ['语言', 'n'], ['处理', 'v'], ['技术', 'n'], ['!', 'w']]
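Unlike the two tools above, THULAC returns a list of [word, tag] lists. As a small sketch, the result can be turned into tuples and the tags counted with the standard library. The data is copied from the output above so the snippet runs without thulac installed; the variable names are only illustrative.

```python
from collections import Counter

# Result copied from the THULAC output shown above.
wordseg = [['我', 'r'], ['爱', 'v'], ['自然', 'n'], ['语言', 'n'],
           ['处理', 'v'], ['技术', 'n'], ['!', 'w']]

# Convert each [word, tag] list into an immutable (word, tag) tuple.
pairs = [tuple(item) for item in wordseg]

# Count how often each tag occurs in the sentence.
tag_counts = Counter(tag for _, tag in pairs)
print(tag_counts["n"], tag_counts["v"])  # 3 2
```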
Stanford CoreNLP POS tagging
Install: pip install stanfordcorenlp
Install from a mirror in China: pip install stanfordcorenlp -i https://pypi.tuna.tsinghua.edu.cn/simple
POS tagging with stanfordcorenlp
It supports POS tagging for both English and Chinese.
from stanfordcorenlp import StanfordCoreNLP

# The first argument is the path to the unpacked CoreNLP directory.
# lang='zh' selects the Chinese pipeline; the default is English.
zh_model = StanfordCoreNLP(r'stanford-corenlp-full-2018-02-27', lang='zh')
s_zh = '我爱自然语言处理技术!'
word_pos_zh = zh_model.pos_tag(s_zh)
print(word_pos_zh)
Output:
[('我爱', 'NN'), ('自然', 'AD'), ('语言', 'NN'), ('处理', 'VV'), ('技术', 'NN'), ('!', 'PU')]

eng_model = StanfordCoreNLP(r'stanford-corenlp-full-2018-02-27')
s_eng = 'I love natural language processing technology!'
word_pos_eng = eng_model.pos_tag(s_eng)
print(word_pos_eng)
Output:
[('I', 'PRP'), ('love', 'VBP'), ('natural', 'JJ'), ('language', 'NN'), ('processing', 'NN'), ('technology', 'NN'), ('!', '.')]
HanLP POS tagging
Install: pip install pyhanlp
Install from a mirror in China: pip install pyhanlp -i https://pypi.tuna.tsinghua.edu.cn/simple
POS tagging with pyhanlp
from pyhanlp import HanLP

s = '我爱自然语言处理技术!'
word_seg = HanLP.segment(s)
for term in word_seg:
    print(term.word, term.nature)
Output:
我 rr
爱 v
自然语言处理 nz
技术 n
! w
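The five Chinese taggers above do not segment the sentence the same way, and the tags only line up where the word boundaries do. The sketch below makes that concrete by comparing the positions at which each tool ends a word. The results are copied from the outputs shown in this post, so it runs without any of the libraries installed; `boundaries` is a hypothetical helper, not an API of any of these tools.

```python
sentence = "我爱自然语言处理技术!"

# (word, tag) results copied from the outputs shown in the sections above.
results = {
    "jieba":   [("我", "r"), ("爱", "v"), ("自然语言", "l"), ("处理", "v"), ("技术", "n"), ("!", "x")],
    "SnowNLP": [("我", "r"), ("爱", "v"), ("自然", "n"), ("语言", "n"), ("处理", "vn"), ("技术", "n"), ("!", "w")],
    "THULAC":  [("我", "r"), ("爱", "v"), ("自然", "n"), ("语言", "n"), ("处理", "v"), ("技术", "n"), ("!", "w")],
    "CoreNLP": [("我爱", "NN"), ("自然", "AD"), ("语言", "NN"), ("处理", "VV"), ("技术", "NN"), ("!", "PU")],
    "HanLP":   [("我", "rr"), ("爱", "v"), ("自然语言处理", "nz"), ("技术", "n"), ("!", "w")],
}


def boundaries(pairs):
    """Return the set of character offsets at which a word ends."""
    ends, offset = set(), 0
    for word, _ in pairs:
        offset += len(word)
        ends.add(offset)
    return ends


cuts = {name: boundaries(pairs) for name, pairs in results.items()}
agreed = set.intersection(*cuts.values())

for name, pairs in results.items():
    # Sanity check: the words of every tool must rebuild the sentence.
    assert "".join(word for word, _ in pairs) == sentence
    print(f"{name:8s}", " / ".join(word for word, _ in pairs))

print("Boundaries all tools agree on:", sorted(agreed))  # [2, 8, 10, 11]
```

Only the boundaries after 爱, 处理, 技术 and the final punctuation mark are shared by all five tools, so a tag-by-tag comparison is meaningful only for the words 技术 and !.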
NLTK POS tagging
Install: pip install nltk
Install from a mirror in China: pip install nltk -i https://pypi.tuna.tsinghua.edu.cn/simple
The NLTK tokenizer and tagger used here only handle English.
import nltk

# If NLTK reports a missing resource, fetch it with nltk.download() as the error message instructs.
s = 'I love natural language processing technology!'
tokens = nltk.word_tokenize(s)
s_pos = nltk.pos_tag(tokens)
print(s_pos)
Output:
[('I', 'PRP'), ('love', 'VBP'), ('natural', 'JJ'), ('language', 'NN'), ('processing', 'NN'), ('technology', 'NN'), ('!', '.')]
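spaCy POS tagging
Install: pip install spacy
Install from a mirror in China: pip install spacy -i https://pypi.tuna.tsinghua.edu.cn/simple
If the model is missing, download it with python -m spacy download en. The shortcut name en belongs to spaCy 2.x; spaCy 3 and later require the full package name, for example en_core_web_sm, both here and in spacy.load.
If the download stops with the message "The easiest solution is to re-run the command as admin", open CMD with administrator privileges and run the download again.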
import spacy

eng_model = spacy.load('en')
s = 'I love natural language processing technology!'
# POS tagging: token.pos_ is the tag name, token.pos is its integer ID
s_token = eng_model(s)
for token in s_token:
    print(token, token.pos_, token.pos)
Output:
I PRON 94
love VERB 99
natural ADJ 83
language NOUN 91
processing NOUN 91
technology NOUN 91
! PUNCT 96
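The two English results use different tag sets: NLTK's pos_tag returns Penn Treebank tags such as PRP and VBP, while spaCy's token.pos_ holds coarse Universal POS tags such as PRON and VERB. The sketch below lines the two outputs up word by word and collects the correspondence seen in this one sentence. The data is copied from the outputs above, so neither library is needed to run it, and the resulting table is only what this example shows, not a complete mapping between the tag sets.

```python
# Tags copied from the NLTK output (Penn Treebank tags).
nltk_tags = [("I", "PRP"), ("love", "VBP"), ("natural", "JJ"), ("language", "NN"),
             ("processing", "NN"), ("technology", "NN"), ("!", ".")]

# Words and token.pos_ values copied from the spaCy output (Universal POS tags).
spacy_tags = [("I", "PRON"), ("love", "VERB"), ("natural", "ADJ"), ("language", "NOUN"),
              ("processing", "NOUN"), ("technology", "NOUN"), ("!", "PUNCT")]

# Collect which coarse tag each fine-grained tag was paired with.
observed = {}
for (nltk_word, fine), (spacy_word, coarse) in zip(nltk_tags, spacy_tags):
    # Both tools tokenized this sentence identically, so the words line up.
    assert nltk_word == spacy_word
    observed.setdefault(fine, set()).add(coarse)

for fine, coarse in observed.items():
    print(fine, "->", ", ".join(sorted(coarse)))
```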
The code is also available on GitHub: https://github.com/yuquanle/StudyForNLP/blob/master/NLPbasic/POS.ipynb
For more of my notes, follow:
WeChat official account: StudyForAI (an introduction to AI for beginners)
Zhihu column: https://www.zhihu.com/people/yuquanle/columns