Q: Extracting multiple data points from a long sentence/paragraph

Stack Overflow user

Asked on 2022-12-03 21:55:00

Answers: 1 | Views: 16 | Followers: 0 | Votes: 0

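I am looking for an approach, or any useful library, to extract multiple data points corresponding to different years from a single paragraph.

For example: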

The total volume of the sales in the year 2019 is 400 whereas in the year 2020 is 600. 
That's about 50% increase in size

In the example above, I need to extract:

1. sales year 2019 --> 400 
2. sales year 2020 --> 600

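Assumption: you can assume the entity is already known (sales in the example above).

Can anyone suggest an approach, an existing library, etc.? Thanks in advance.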

python

nlp

Stack Overflow user

Answered on 2022-12-03 21:58:21

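One approach is to use a regular expression to search the text for patterns that match the information you want. For example, in the sentence "The total volume of the sales in the year 2019 is 400 whereas in the year 2020 is 600", you can use the following regular expression to match each year's sales figure: \d{4} is \d+. This pattern matches any four-digit number, followed by " is ", followed by one or more digits.

Once you have a pattern for the relevant data points, you can use a library such as Python's re module to extract them. For example, in Python: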

import re

text = "The total volume of the sales in the year 2019 is 400 whereas in the year 2020 is 600."

# Use the regular expression to find all matches in the text
matches = re.findall(r"\d{4} is \d+", text)

# Loop through the matches and extract the year and sales data
for match in matches:
    year, sales = match.split(" is ")
    print(f"Year: {year}, Sales: {sales}")

This code prints:

Year: 2019, Sales: 400
Year: 2020, Sales: 600
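
The same idea can be sketched with named capture groups, which avoids the extra split step and returns integers rather than strings. This is only a minimal variant of the pattern above; the group names `year` and `sales` and the variable `sales_by_year` are illustrative choices, not part of the original answer.

```python
import re

text = "The total volume of the sales in the year 2019 is 400 whereas in the year 2020 is 600."

# Named groups label each part of a match; \b keeps the year from matching
# the tail of a longer number.
pattern = re.compile(r"\b(?P<year>\d{4}) is (?P<sales>\d+)")

# Map each year to its sales figure, converting both to integers.
sales_by_year = {
    int(m.group("year")): int(m.group("sales"))
    for m in pattern.finditer(text)
}

print(sales_by_year)  # {2019: 400, 2020: 600}
```

A dictionary keyed by year makes it easy to look up or compare values later, for example `sales_by_year[2020] - sales_by_year[2019]`.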

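Another option is to use a natural language processing (NLP) library such as spaCy or NLTK. These libraries can help you identify and extract specific entities from text, such as dates and numbers.

For example, with spaCy: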

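import spacy

# Load the small English model
# (install it first with: python -m spacy download en_core_web_sm)
nlp = spacy.load("en_core_web_sm")

# Parse the text
text = "The total volume of the sales in the year 2019 is 400 whereas in the year 2020 is 600."
doc = nlp(text)

# An entity carries exactly one label, so it can never be both DATE and
# CARDINAL. Walk the entities in order and pair each DATE with the next CARDINAL.
last_date = None
for ent in doc.ents:
    if ent.label_ == "DATE":
        last_date = ent.text
    elif ent.label_ == "CARDINAL" and last_date is not None:
        print(f"Year: {last_date}, Sales: {ent.text}")
        last_date = None

If the model tags the years as DATE and the amounts as CARDINAL, this prints one year/sales pair per line, much like the regular-expression example. The exact entity spans (for example "2019" versus "the year 2019") depend on the model, so check doc.ents on your own data.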

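In short, there are many ways to extract multiple data points from a single paragraph. Which one you choose depends on the specific requirements of your task and the data you are working with.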
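
Because the question states that the entity (sales) is already known, one way to adapt the regular-expression approach is to restrict it to sentences that mention that entity. The following is a rough sketch under stated assumptions: `extract_yearly_values` is a hypothetical helper, the sentence splitting is deliberately naive, and accepting "was" alongside "is" is an illustrative extension rather than something the original answer covers.

```python
import re

def extract_yearly_values(text, entity):
    """Return {year: value} from sentences that mention `entity` (hypothetical helper)."""
    # Naive sentence split on ., ! or ? followed by whitespace; an assumption
    # that is good enough for short, clean paragraphs.
    sentences = re.split(r"(?<=[.!?])\s+", text)
    # A 4-digit year, then "is" or "was", then an integer value.
    pattern = re.compile(r"\b(\d{4}) (?:is|was) (\d+)")
    results = {}
    for sentence in sentences:
        if entity.lower() not in sentence.lower():
            continue  # skip sentences that do not mention the entity
        for year, value in pattern.findall(sentence):
            results[int(year)] = int(value)
    return results

text = (
    "The total volume of the sales in the year 2019 is 400 whereas in the year 2020 is 600. "
    "That's about 50% increase in size."
)
print(extract_yearly_values(text, "sales"))  # {2019: 400, 2020: 600}
```

Text with more varied phrasing will likely need a broader pattern or an NLP library, as discussed above.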

Votes: 1

Original page content provided by Stack Overflow; translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/74671055
