
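Last time we took a quick look at spaCy: how to install it and how to use basic features such as named entity recognition. Today I'll continue with some of its other features, mainly lemmatization, part-of-speech tagging, noun chunk detection, and dependency parsing. Without further ado, let's go straight to the code.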
import en_core_web_sm

# Load the small English pipeline
parser = en_core_web_sm.load()

# The trailing spaces keep neighbouring sentences from fusing together
# when the string literals are concatenated.
sentences = ("There is an art, it says, or rather, a knack to flying. "
             "The knack lies in learning how to throw yourself at the ground and miss. "
             "In the beginning the Universe was created. This has made a lot of people "
             "very angry and been widely regarded as a bad move.")

print("Sentences found in the text:")
sents = [sent for sent in parser(sentences).sents]
for x in sents:
    print(x)
"""
There is an art, it says, or rather, a knack to flying.
The knack lies in learning how to throw yourself at the ground and miss.
In the beginning the Universe was created.
This has made a lot of people very angry and been widely regarded as a bad move.
"""
print("- * -" * 20)

# Tokenization (len(token) > 1 drops one-character tokens such as punctuation and "a")
print()
tokens = [token for token in sents[0] if len(token) > 1]
print(tokens)
print("- * -" * 20)

# Lemmatization
lemma_tokens = [token.lemma_ for token in sents[0] if len(token) > 1]
print(lemma_tokens)
print("- * -" * 20)

# Coarse-grained part-of-speech tags
pos_tokens = [token.pos_ for token in sents[0] if len(token) > 1]
print(pos_tokens)
print("- * -" * 20)

# Fine-grained part-of-speech tags
tag_tokens = [token.tag_ for token in sents[0] if len(token) > 1]
print(tag_tokens)
print("- * -" * 20)

# Dependency labels
dep_tokens = [token.dep_ for token in sents[0] if len(token) > 1]
print(dep_tokens)
print("- * -" * 20)

print("Noun chunk analysis")
doc = parser("Autonomous cars shift insurance liability toward manufacturers")

# Text of each noun chunk
chunk_text = [chunk.text for chunk in doc.noun_chunks]
print(chunk_text)
print("- * -" * 20)

# Text of each noun chunk's root token
chunk_root_text = [chunk.root.text for chunk in doc.noun_chunks]
print(chunk_root_text)
print("- * -" * 20)

# Dependency label of each noun chunk's root token
chunk_root_dep_ = [chunk.root.dep_ for chunk in doc.noun_chunks]
print(chunk_root_dep_)
print("- * -" * 20)

# Text of the head that each noun chunk's root token attaches to
chunk_root_head_text = [chunk.root.head.text for chunk in doc.noun_chunks]
print(chunk_root_head_text)
print("- * -" * 20)

Finally, here is a reference that explains how to read dependency parsing results: the dependencies manual from the Stanford NLP group.
Link: https://nlp.stanford.edu/software/dependencies_manual.pdf
If you can't download it, message me on WeChat and I'll send it to you.
A Chinese version is available on Baidu Wenku: https://wenku.baidu.com/view/1e92891dbceb19e8b8f6bae5.html
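
If the last few lists (chunk.root.text, chunk.root.dep_, chunk.root.head.text) feel abstract, it may help to see what they look up. The sketch below does not use spaCy at all. It models a dependency parse as a plain list of tokens, each holding a label and the index of its head, and then walks the head links. The parse is hand-annotated by me for illustration and is an assumption, not actual model output, and the names Token, PARSE, describe and path_to_root are hypothetical helpers, not part of spaCy.

```python
from typing import NamedTuple


class Token(NamedTuple):
    text: str
    dep: str   # dependency label relating this token to its head
    head: int  # index of the head token; the root points at itself


# Hand-annotated parse (an assumption for illustration, not model output).
PARSE = [
    Token("Autonomous", "amod", 1),
    Token("cars", "nsubj", 2),
    Token("shift", "ROOT", 2),
    Token("insurance", "compound", 4),
    Token("liability", "dobj", 2),
    Token("toward", "prep", 2),
    Token("manufacturers", "pobj", 5),
]


def describe(parse):
    """Return (text, dep, head text) triples, similar to text / dep_ / head.text."""
    return [(t.text, t.dep, parse[t.head].text) for t in parse]


def path_to_root(parse, index):
    """Follow head links from one token up to the sentence root."""
    path = [parse[index].text]
    while parse[index].head != index:
        index = parse[index].head
        path.append(parse[index].text)
    return path


if __name__ == "__main__":
    for text, dep, head in describe(PARSE):
        print(f"{text:<14}{dep:<10}{head}")
    print(path_to_root(PARSE, 6))
```

In spaCy itself the same information lives on token.dep_ and token.head, and the root token is its own head, which is why the loop above stops there. If a label is unfamiliar, spacy.explain("nsubj") returns a short description of it, which is often quicker than opening the manual.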