Author: 明天依旧可好 | 柯尊柏  Email: ke.zb@qq.com
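spaCy is an industrial-strength Python NLP toolkit. It covers many common NLP tasks, such as part-of-speech tagging, named entity recognition, dependency parsing, lemmatization (normalization) and stop-word handling. It runs on Unix/Linux, macOS/OS X and Windows, and can be installed with pip or conda.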
Install spaCy with pip:
pip install spacy
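Some of the tasks listed above already work right after `pip install`, before any trained model is downloaded. The sketch below assumes only that spaCy itself is installed: `spacy.blank("en")` creates an English pipeline containing just the tokenizer and the language data, which is enough for tokenization and stop-word flags. Part-of-speech tags, dependencies and entities still need a trained model such as the ones in the table below.

```python
import spacy

# A blank English pipeline: tokenizer + language data only, no trained components.
nlp = spacy.blank("en")
doc = nlp("This is a sentence.")

tokens = [token.text for token in doc]
stop_flags = [token.is_stop for token in doc]

for token in doc:
    # is_stop comes from spaCy's built-in English stop-word list;
    # is_punct is a lexical attribute, so neither needs a trained model.
    print(token.text, token.is_stop, token.is_punct)
```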
1. Supported languages:
NAME | LANGUAGE | TYPE |
---|---|---|
en_core_web_sm | English | Vocabulary, syntax, entities |
en_core_web_md | English | Vocabulary, syntax, entities, vectors |
en_core_web_lg | English | Vocabulary, syntax, entities, vectors |
en_vectors_web_lg | English | Word vectors |
de_core_news_sm | German | Vocabulary, syntax, entities |
es_core_news_sm | Spanish | Vocabulary, syntax, entities |
es_core_news_md | Spanish | Vocabulary, syntax, entities, vectors |
pt_core_news_sm | Portuguese | Vocabulary, syntax, entities |
fr_core_news_sm | French | Vocabulary, syntax, entities |
fr_core_news_md | French | Vocabulary, syntax, entities, vectors |
it_core_news_sm | Italian | Vocabulary, syntax, entities |
nl_core_news_sm | Dutch | Vocabulary, syntax, entities |
xx_ent_wiki_sm | Multi-language | Named entities |
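The model names in the table follow spaCy's naming convention `lang_type_genre_size`: a language code (`xx` meaning multi-language), the model type (`core`, `vectors`, `ent`), the genre of the training text (`web`, `news`, `wiki`) and the size (`sm`, `md`, `lg`). The helper below is a hypothetical sketch (`ModelName` and `parse_model_name` are not part of spaCy) that splits a name into those four parts, which can help when choosing a model programmatically.

```python
from typing import NamedTuple


class ModelName(NamedTuple):
    lang: str   # language code, e.g. "en"; "xx" means multi-language
    kind: str   # the "type" component: "core", "vectors", "ent", ...
    genre: str  # kind of training text: "web", "news", "wiki"
    size: str   # "sm", "md" or "lg"


def parse_model_name(name: str) -> ModelName:
    """Split a model name of the form lang_type_genre_size into its parts."""
    parts = name.split("_")
    if len(parts) != 4:
        raise ValueError(f"expected lang_type_genre_size, got {name!r}")
    return ModelName(*parts)


if __name__ == "__main__":
    for n in ["en_core_web_sm", "en_vectors_web_lg", "xx_ent_wiki_sm"]:
        print(parse_model_name(n))
```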
2. Installing a language model:
Installing a model takes some effort, and the download can be quite slow.
python -m spacy download en_core_web_lg
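Because the download is slow, it can be worth checking whether a model is already present before fetching it again. A downloaded spaCy model is installed as an ordinary Python package, so the standard library can detect it. The sketch below assumes that; `is_model_installed` is a hypothetical helper name, not a spaCy API.

```python
import importlib.util
import sys


def is_model_installed(package: str) -> bool:
    """Return True if `package` is importable in the current environment."""
    # find_spec returns None for a top-level name that cannot be found,
    # without actually importing the package.
    return importlib.util.find_spec(package) is not None


if __name__ == "__main__":
    model = "en_core_web_lg"
    if is_model_installed(model):
        print(f"{model} is installed")
    else:
        print(f"{model} not found; run: {sys.executable} -m spacy download {model}")
```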
3. Using a language model:
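import spacy
nlp = spacy.load('en_core_web_lg')  # load the model (it must be installed first)
doc = nlp('This is a sentence.')  # run the full pipeline on the text
for token in doc:
    # part-of-speech tag, dependency label and lemma for each token
    print(token.text, token.pos_, token.dep_, token.lemma_)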