My input is:
text = "Apple est une entreprise, James Alfred travaille ici"
spans = [
{
"start":0,
"end":5,
"label":"ORG"
},
{
"start":26,
"end":38,
"label":"PER"
}
]
correspondance_dict = {"PER": 2, "ORG": 4, "O": 0}
I want to tokenize the text and build one label per token from the list of spans. The desired output is:
tokenized_text = ["Apple", "est", "une", "entreprise", ",", "James", "Alfred", "travaille", "ici"]
labels = [4, 0, 0, 0, 0, 2, 2, 0, 0]  # built from correspondance_dict and spans: 4 because "Apple" is ORG, and 2, 2 because "James Alfred" is PER
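To make the target concrete, the mapping described above could be sketched roughly as follows. This is only an illustration under assumptions: it uses a simple regex tokenizer (runs of word characters, plus single punctuation marks) rather than a model tokenizer, it labels a token only when it lies entirely inside a span, and `label_tokens` is a hypothetical helper name, not part of any library.

```python
import re

text = "Apple est une entreprise, James Alfred travaille ici"
spans = [
    {"start": 0, "end": 5, "label": "ORG"},
    {"start": 26, "end": 38, "label": "PER"},
]
correspondance_dict = {"PER": 2, "ORG": 4, "O": 0}


def label_tokens(text, spans, label_to_id):
    """Hypothetical helper: split text into tokens and give each token a label id."""
    tokens, labels = [], []
    # Assumed tokenizer: a word, or one non-space punctuation character.
    for match in re.finditer(r"\w+|[^\w\s]", text):
        tag = "O"  # default label for tokens outside every span
        for span in spans:
            # The token gets the span's label only if it lies fully inside the span.
            if span["start"] <= match.start() and match.end() <= span["end"]:
                tag = span["label"]
                break
        tokens.append(match.group())
        labels.append(label_to_id[tag])
    return tokens, labels


tokenized_text, labels = label_tokens(text, spans, correspondance_dict)
print(tokenized_text)
print(labels)
```

With a subword tokenizer the character offsets of each piece would have to come from the tokenizer itself, so the regex step here would need to be swapped out.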
Posted on 2022-11-26 06:23:18
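If you are using Hugging Face's pipeline elsewhere in your program, then grouping the output tokens back into whole entities is easy: pass an appropriate aggregation strategy.
The documentation explains the available strategies in detail.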
from transformers import pipeline
# Initialize the NER pipeline
ner = pipeline("ner", aggregation_strategy="simple")
# Phrase
phrase = "David helped Peter enter the building, where his house is located."
# NER task
ner_result = ner(phrase)
# Print result
print(ner_result)
Output:
[{'entity_group': 'PER', 'score': 0.99642086, 'word': 'David', 'start': 0, 'end': 5}, {'entity_group': 'PER', 'score': 0.99559766, 'word': 'Peter', 'start': 13, 'end': 18}]
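If the goal is to feed this result into code that expects the question's span format, the aggregated entities could be reshaped along these lines. This is a sketch, not something the pipeline provides: `to_spans` is a hypothetical helper, and the `ner_result` literal is copied from the output above so the snippet runs without downloading a model.

```python
# Aggregated pipeline output, copied from the result shown above.
ner_result = [
    {"entity_group": "PER", "score": 0.99642086, "word": "David", "start": 0, "end": 5},
    {"entity_group": "PER", "score": 0.99559766, "word": "Peter", "start": 13, "end": 18},
]


def to_spans(entities):
    """Hypothetical helper: keep only the character offsets and the entity label."""
    return [
        {"start": e["start"], "end": e["end"], "label": e["entity_group"]}
        for e in entities
    ]


spans = to_spans(ner_result)
print(spans)
```

The `start` and `end` values are character offsets into the input phrase, which is the same convention the question's `spans` list uses.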
https://stackoverflow.com/questions/74577951