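I have texts of varying length that I want to split into separate clauses, but I also want to keep the subject in each clause.
For example: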
# single subject
Original: "Coffee is very good, but wasn't hot enough"
split: ["Coffee is very good", "Coffee wasn't hot enough"]
Original: "Joe was the top performer of last year's dance competition, he is also a good singer"
split: ["Joe was the top performer of last year's dance competition", "Joe is a good singer"]
# multiple subjects
Original: "Delicious food service, but we struggled with the app."
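split: ["Delicious food service", "We struggled with the app"]
I don't know how to do this. We could split the sentences on punctuation and conjunctions (which may not be accurate), but how do we keep the subject?
Please let me know if you need more information.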
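As a rough baseline for the punctuation-and-conjunction split described in the question, here is a minimal plain-Python sketch, assuming regex splitting is acceptable. The conjunction list, the `AUXILIARIES` set, and the helper names `split_clauses` and `carry_subject` are hypothetical. The subject guess (everything before the first auxiliary verb of the first clause) is a crude heuristic, not a parser.

```python
import re

# Hypothetical, deliberately tiny word lists; real text needs a parser.
CONJUNCTIONS = r"(?:but|and|so|yet)"
AUXILIARIES = {"is", "isn't", "was", "wasn't", "are", "were", "has", "had", "will"}


def split_clauses(text):
    """Split on commas/semicolons and on coordinating conjunctions."""
    text = text.strip().rstrip(".!?")
    pattern = rf"\s*[,;]\s*(?:{CONJUNCTIONS}\s+)?|\s+{CONJUNCTIONS}\s+"
    return [part.strip() for part in re.split(pattern, text) if part.strip()]


def carry_subject(clauses):
    """Prefix the first clause's guessed subject onto clauses that start with a verb."""
    words = clauses[0].split()
    # Guess: the subject is everything before the first auxiliary verb.
    aux = next((i for i, w in enumerate(words) if w.lower() in AUXILIARIES), 0)
    subject = " ".join(words[:aux])  # empty string if no auxiliary was found
    result = [clauses[0]]
    for clause in clauses[1:]:
        if subject and clause.split()[0].lower() in AUXILIARIES:
            clause = f"{subject} {clause}"
        # Capitalize the first letter of each split-off clause.
        result.append(clause[0].upper() + clause[1:])
    return result


for sentence in [
    "Coffee is very good, but wasn't hot enough",
    "Joe was the top performer of last year's dance competition, he is also a good singer",
    "Delicious food service, but we struggled with the app.",
]:
    print(carry_subject(split_clauses(sentence)))
```

This reproduces the expected splits for the coffee and food-service examples, but the Joe example comes out as "He is also a good singer": the pronoun itself still has to be resolved to "Joe", which is where coreference resolution comes in.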
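Posted on 2022-03-10 05:15:21
After a lot of research, I figured out how to replace pronouns with the subjects they refer to. The approach uses neuralcoref, a pipeline extension for spaCy 2.1+ that annotates and resolves coreference clusters using a neural network.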
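To make the replacement step concrete, here is a minimal pure-Python sketch of the substitution, assuming the coreference clusters are already known. The `(main_span, mention_spans)` format with character offsets and the name `resolve_coreferences` are hypothetical. In neuralcoref the clusters are predicted by the neural model and exposed as spaCy spans, so this only illustrates what a result like `doc._.coref_resolved` amounts to, not how the library implements it.

```python
def resolve_coreferences(text, clusters):
    """Replace every mention in each cluster with that cluster's main mention.

    `clusters` is a hypothetical format: a list of (main_span, mention_spans),
    where every span is a (start, end) character-offset pair into `text`.
    Overlapping mentions are not handled.
    """
    replacements = []
    for (main_start, main_end), mentions in clusters:
        main_text = text[main_start:main_end]
        for start, end in mentions:
            if (start, end) != (main_start, main_end):
                replacements.append((start, end, main_text))
    # Splice from right to left so earlier offsets stay valid.
    for start, end, main_text in sorted(replacements, reverse=True):
        text = text[:start] + main_text + text[end:]
    return text


text = "Coffee is good but it wasn't hot enough!"
coffee = (0, len("Coffee"))
it_start = text.index(" it ") + 1  # offset of the pronoun "it"
clusters = [(coffee, [coffee, (it_start, it_start + 2)])]
print(resolve_coreferences(text, clusters))
# Coffee is good but Coffee wasn't hot enough!
```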
However, neuralcoref only works with spaCy v2 and Python 3.7. I tested it in a conda environment with the following versions:
python==3.7
spacy==2.1.0
neuralcoref
The solution looks like this:
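import spacy
import neuralcoref
# Load the small English model and register neuralcoref as a pipeline component
nlp = spacy.load('en_core_web_sm')
neuralcoref.add_to_pipe(nlp)
doc = nlp("Coffee is good but it wasn't hot enough!")
# doc._.coref_resolved is the text with each mention replaced by its cluster's main mention
print(f'\n[REPLACED]:\n{doc._.coref_resolved}')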
# output
Coffee is good but Coffee wasn't hot enough!
More sample output:
[Enter your text]:
Joe was the top performer of last year's dance competition, he is also a good singer
[REPLACED]:
Joe was the top performer of last year's dance competition, Joe is also a good singer
[CONTINUE(Y/N)?]: y
[Enter your text]:
Paul was amazing and so was our waiter I loved the squash pizza and the dessert he recommended will definitely come back soon.
[REPLACED]:
Paul was amazing and so was our waiter I loved the squash pizza and the dessert Paul recommended will definitely come back soon.
I'm still working on a better way to split the sentences and will update this answer once I figure it out.
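One possible follow-up step is to split the resolved text only at a comma or coordinating conjunction that is immediately followed by a known subject. This is a minimal sketch, assuming the main mentions of the coreference clusters are available as a set of strings (with neuralcoref, something like `{c.main.text for c in doc._.coref_clusters}` should give them, per its README). The helper name `split_on_subjects` and the conjunction list are hypothetical; this is a heuristic, not a clause parser.

```python
import re

# Hypothetical minimal list of coordinating conjunctions.
CONJ = r"(?:but|and|so)"


def split_on_subjects(resolved_text, subjects):
    """Split coreference-resolved text at a comma/semicolon or coordinating
    conjunction, but only where one of `subjects` immediately follows."""
    text = resolved_text.strip().rstrip(".!?")
    if not subjects:
        return [text]
    # Longest names first so multi-word subjects win over their prefixes.
    names = "|".join(re.escape(s) for s in sorted(subjects, key=len, reverse=True))
    boundary = rf"(?:\s*[,;]\s*(?:{CONJ}\s+)?|\s+{CONJ}\s+)(?=(?:{names})\b)"
    return [part.strip() for part in re.split(boundary, text) if part.strip()]


print(split_on_subjects("Coffee is good but Coffee wasn't hot enough!", {"Coffee"}))
# ['Coffee is good', "Coffee wasn't hot enough"]
```

Requiring a subject after the boundary avoids splitting noun phrases such as "the squash pizza and the dessert". The trade-off is that the Paul example comes back as a single unsplit chunk, because no boundary is directly followed by "Paul"; run-on text like that would need a dependency parse.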
https://datascience.stackexchange.com/questions/108832