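Consider the following example: "10% off on all Artificial Intelligence courses." In this example, I need to extract two predefined classes, Artificial Intelligence and courses. The program must also assign words such as ANN, CNN, RNN, and AI to the Artificial Intelligence category. I trained a model with spaCy, but I am not impressed with the results because it labels entities incorrectly. Is there an alternative way to extract entities from a sentence in Python?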
Posted on 2020-07-18 20:21:52
Here are a few options I would try.
1. Use Rasa for custom entity extraction.
https://rasa.com/docs/rasa/nlu/entity-extraction/#custom-entities
2. Use one of the following BERT-based repositories for custom entities.
https://github.com/allenai/scibert
https://github.com/dmis-lab/biobert
Posted on 2020-07-21 10:25:17
You can use flashtext for this.
from flashtext import KeywordProcessor

kp = KeywordProcessor()
# Map each entity label (key) to every keyword that should resolve to it.
# Whenever ANN, CNN, RNN, or AI appears in the text, flashtext returns the key 'Artificial Intelligence'.
entity_keywords = {'Artificial Intelligence': ['ANN', 'CNN', 'RNN', 'AI', 'Artificial Intelligence'], 'courses': ['courses']}
kp.add_keywords_from_dict(entity_keywords)
# 'Artificial Intelligence', 'ANN', and 'CNN' all sit under the 'Artificial Intelligence' key,
# which is why each of them is extracted with that tag.
kp.extract_keywords('10% off on all Artificial Intelligence, ANN, and CNN courses.')
# Output:
['Artificial Intelligence',
 'Artificial Intelligence',
 'Artificial Intelligence',
 'courses']
For more information, see the flashtext documentation: https://readthedocs.org/projects/flashtext/downloads/pdf/latest/
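If you would rather not add a dependency, or you also want character offsets for each hit, the same dictionary lookup idea can be sketched with only the standard library `re` module. This is a rough illustration, not how flashtext works internally: the names `ENTITY_TERMS`, `build_matcher`, and `extract_entities` are made up for this example, and it assumes the keyword list is small enough that a single alternation regex is acceptable.

```python
import re

# Hypothetical label -> surface-form mapping, mirroring the dictionary in the answer above.
ENTITY_TERMS = {
    "Artificial Intelligence": ["ANN", "CNN", "RNN", "AI", "Artificial Intelligence"],
    "courses": ["courses"],
}


def build_matcher(entity_terms):
    """Compile one case-insensitive pattern and a lowercase term -> label lookup."""
    lookup = {
        term.lower(): label
        for label, terms in entity_terms.items()
        for term in terms
    }
    # Try longer terms first so multi-word phrases win over shorter overlaps.
    ordered = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        # Lookarounds keep "AI" from matching inside a longer word.
        r"(?<!\w)(?:" + "|".join(re.escape(t) for t in ordered) + r")(?!\w)",
        re.IGNORECASE,
    )
    return pattern, lookup


def extract_entities(text, pattern, lookup):
    """Return (label, matched_text, start, end) for every dictionary hit."""
    return [
        (lookup[m.group(0).lower()], m.group(0), m.start(), m.end())
        for m in pattern.finditer(text)
    ]


pattern, lookup = build_matcher(ENTITY_TERMS)
sentence = "10% off on all Artificial Intelligence, ANN, and CNN courses."
entities = extract_entities(sentence, pattern, lookup)
for label, surface, start, end in entities:
    print(f"{label!r} <- {surface!r} [{start}:{end}]")
```

For a keyword list with thousands of entries, flashtext is likely the better fit, since a large alternation regex tends to slow down as the list grows.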
https://stackoverflow.com/questions/62968281