Q: Python: find multiple strings between markers in a large paragraph

Stack Overflow user

Asked 2021-05-15 10:16:32

Answers: 2 | Views: 50 | Followers: 0 | Votes: 0

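I'm trying to extract a list of companies from a long string.

The company names tend to be scattered randomly through the string, but each one is always preceded by a comma and a space (", "), and each one always ends in Inc, LLC, Corporation, or Corp.

In addition, a company is always listed at the very beginning of the string. It looks something like this: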

Companies = 'Apples Inc, xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx, Bananas LLC, Carrots Corp, xxxx.'

I've been trying to solve this with regex, but I'm too inexperienced with Python.

My latest attempt looks like this:

r = re.compile(r' .*? Inc | .*? LLC | .*? Corporation | .*? Corp',
flags = re.I | re.X)

r.findall(Companies)

But my output is always

['Apples Inc', ', xxxxxxxxxxxxxxxxxxx, Bananas LLC', ', Carrots Corp']

when what I need is

['Apples Inc', 'Bananas LLC', 'Carrots Corp']

I'm stumped, and I humbly ask for help.

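*Edit

I've figured out a way to handle a company whose name contains a comma, such as Apple, Inc.

Before running any analysis on the long string, I'll have the program check for commas that come right before Inc. and remove them.

Then I'll run the program to list the company names.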
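
The two-step idea in this edit could be sketched roughly as follows. This is only an illustration under assumptions: the sample `text`, the `SUFFIXES` tuple, and the choice of `re.sub` with a lookahead are not from the original post, and the match is case-sensitive.

```python
import re

# Hypothetical sample text; the suffixes are the ones named in the question.
text = 'Apples, Inc., xxxx, Bananas, LLC, Carrots Corp, xxxx.'
SUFFIXES = ('Inc', 'LLC', 'Corporation', 'Corp')

# Step 1: drop a comma that sits directly before a suffix
# ("Apples, Inc." becomes "Apples Inc.").
suffix_alt = '|'.join(SUFFIXES)
cleaned = re.sub(rf',\s+(?=(?:{suffix_alt})\b)', ' ', text)

# Step 2: split on commas and keep the pieces that end with a suffix,
# ignoring a trailing period when testing.
companies = [
    part.strip()
    for part in cleaned.split(',')
    if part.strip().rstrip('.').endswith(SUFFIXES)
]

print(companies)  # ['Apples Inc.', 'Bananas LLC', 'Carrots Corp']
```

One limitation of this sketch: a non-company segment that happens to be followed by ", Inc" or ", LLC" would be merged with the suffix as well, so it relies on the text being as regular as described.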

python

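2 Answers

Stack Overflow user

Answered 2021-05-15 10:25:53

I think this is a good example of when not to use regex. You can get the result by simply splitting the string on commas and checking whether each resulting segment contains one of the specified suffixes.

For example: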

paragraph = 'Apples Inc, xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx, Bananas LLC, Carrots Corp, xxxx.'

suffixes = ["Inc", "Corp", "Corporation", "LLC"]

companies = []
# Split the paragraph on commas
for term in paragraph.split(", "):
    # Go through the suffixes and see if any of them occurs in the split field
    for suffix in suffixes:
        if suffix in term:
            companies.append(term)
            # Stop at the first hit so that a term such as "X Corporation",
            # which contains both "Corp" and "Corporation", is added only once
            break

print(companies)

This code is more readable than a regex and easier to understand.

Votes: 0

Stack Overflow user

Answered 2021-05-15 10:39:01

In this particular case, you can do the following:

>>> targets = ('Inc', 'LLC', 'Corp', 'Corporation')
>>> [x for x in Companies.split(', ') if any(x.endswith(y) for y in targets)]
['Apples Inc', 'Bananas LLC', 'Carrots Corp']

However, this does not work if there is a `,` inside the name or between the name and the entity type.
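
To make that limitation concrete, here is a small sketch of what the split-based approach returns when the comma sits between the name and the entity type. The shortened sample `text` is an assumption made for illustration.

```python
targets = ('Inc', 'LLC', 'Corp', 'Corporation')

# Hypothetical input where names are written as "Name, Inc." and "Name, LLC".
text = 'Apples, Inc., xxxx, Bananas, LLC, Carrots Corp., xxxx.'

# Splitting on ", " separates each name from its entity type.
parts = text.split(', ')

# str.endswith accepts a tuple, so this is equivalent to the any(...) test.
matches = [p for p in parts if p.endswith(targets)]

print(parts)    # ['Apples', 'Inc.', 'xxxx', 'Bananas', 'LLC', 'Carrots Corp.', 'xxxx.']
print(matches)  # ['LLC']
```

Only the bare `'LLC'` fragment survives: the names are cut off from their suffixes, and the trailing periods in `'Inc.'` and `'Carrots Corp.'` defeat the `endswith` test.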

If you might have Apple, Inc. (which is typical), you can do this instead:

>>> import re
>>> Companies = 'Apples, Inc., xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx, Bananas, LLC, Carrots Corp., xxxx.'
>>> # 'Corporation' is listed before 'Corp' so the alternation tries the longer suffix first
>>> targets = ('Inc', 'LLC', 'Corporation', 'Corp')
>>> re.findall(rf'([^,]+?(?:, )?(?:{"|".join(targets)})\.?)', Companies)
['Apples, Inc.', ' Bananas, LLC', ' Carrots Corp.']

Demo and explanation of regex
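
The results above keep the space that follows each separating comma. A possible variant, offered only as a sketch, avoids that by requiring the match to start on a non-space character, and sorts the suffixes longest first so that `Corporation` is tried before `Corp`. The sample `text` (including the made-up "Dates Corporation" entry), the `\b` word boundary, and the use of `re.escape` are assumptions added for illustration.

```python
import re

# Hypothetical sample; "Dates Corporation" is added to exercise the longer suffix.
text = 'Apples, Inc., xxxx, Bananas, LLC, Carrots Corp., Dates Corporation, xxxx.'

# Longest suffix first: regex alternation tries alternatives left to right.
targets = sorted(('Inc', 'LLC', 'Corp', 'Corporation'), key=len, reverse=True)
alt = '|'.join(map(re.escape, targets))

# [^,\s]   the match must start on a character that is not a comma or whitespace
# [^,]*?   then as few non-comma characters as possible
# (?:, )?  an optional ", " between the name and the entity type
# \b\.?    the suffix must end at a word boundary, with an optional period
pattern = re.compile(rf'[^,\s][^,]*?(?:, )?(?:{alt})\b\.?')

companies = pattern.findall(text)
print(companies)
# ['Apples, Inc.', 'Bananas, LLC', 'Carrots Corp.', 'Dates Corporation']
```

Like the original pattern, this still treats any segment followed by ", Inc" or ", LLC" as part of a company name, so it depends on the input following the format described in the question.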

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/67542736
