I'm hoping someone can help me with the following:
I want to use spaCy to identify a pattern in a dataframe.
Here is the input dataframe:
import pandas as pd
testNet=pd.DataFrame([[12,"Excellent but I want to buy it"],
                      [18,"Super I wish to buy it"],
                      [23,"We hope to buy now"],
                      [24,"She hope to buy now and I want to buy now"],
                     ],columns=["ID","CONTENT"])
The pattern is as follows:
import spacy
nlp = spacy.load("en_core_web_sm")
from spacy.matcher import Matcher
doc1=nlp("Excellent but I want to buy it")
matcher = Matcher(nlp.vocab)
pattern = [{"POS": "PRON"},{"POS": "VERB"},{"TEXT": "to", "OP": "?"}, {"LEMMA": "buy"}]
# Add the pattern to the matcher and apply the matcher to the doc
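# spaCy v3 expects a list of patterns; older v2 releases used matcher.add("BUY_PATTERN", None, pattern)
matcher.add("BUY_PATTERN", [pattern])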
matches = matcher(doc1)
print("Total matches found:", len(matches))
# Iterate over the matches and print the span text
for match_id, start, end in matches:
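    print("Match found:", doc1[start:end].text)
The problem is that I can't run the pattern against the dataframe. I can only feed in the text one row at a time, and I want to pass in the whole dataframe, because the original dataframe has 300,000 rows.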
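The output I'd like to get would look like this:
Posted on 2020-04-13 22:34:58
You can simply define a function, e.g. get_matches(), that takes a text as input and returns the "matches", and then apply a lambda function to the dataframe like this: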
testNet['MATCH'] = testNet.CONTENT.apply(lambda x : get_matches(x))
https://stackoverflow.com/questions/61179660