Question: UserWarning: This pattern has match groups. To actually get the groups, use str.extract

Stack Overflow user

Asked on 2016-10-06 16:47:36

5 answers · 38.4K views · 0 followers · 40 votes

I have a DataFrame, and I'm trying to get the rows where one of the columns contains some string. The df looks like this:

member_id,event_path,event_time,event_duration
30595,"2016-03-30 12:27:33",yandex.ru/,1
30595,"2016-03-30 12:31:42",yandex.ru/,0
30595,"2016-03-30 12:31:43",yandex.ru/search/?lr=10738&msid=22901.25826.1459330364.89548&text=%D1%84%D0%B8%D0%BB%D1%8C%D0%BC%D1%8B+%D0%BE%D0%BD%D0%BB%D0%B0%D0%B9%D0%BD&suggest_reqid=168542624144922467267026838391360&csg=3381%2C3938%2C2%2C3%2C1%2C0%2C0,0
30595,"2016-03-30 12:31:44",yandex.ru/search/?lr=10738&msid=22901.25826.1459330364.89548&text=%D1%84%D0%B8%D0%BB%D1%8C%D0%BC%D1%8B+%D0%BE%D0%BD%D0%BB%D0%B0%D0%B9%D0%BD&suggest_reqid=168542624144922467267026838391360&csg=3381%2C3938%2C2%2C3%2C1%2C0%2C0,0
30595,"2016-03-30 12:31:45",yandex.ru/search/?lr=10738&msid=22901.25826.1459330364.89548&text=%D1%84%D0%B8%D0%BB%D1%8C%D0%BC%D1%8B+%D0%BE%D0%BD%D0%BB%D0%B0%D0%B9%D0%BD&suggest_reqid=168542624144922467267026838391360&csg=3381%2C3938%2C2%2C3%2C1%2C0%2C0,0
30595,"2016-03-30 12:31:46",yandex.ru/search/?lr=10738&msid=22901.25826.1459330364.89548&text=%D1%84%D0%B8%D0%BB%D1%8C%D0%BC%D1%8B+%D0%BE%D0%BD%D0%BB%D0%B0%D0%B9%D0%BD&suggest_reqid=168542624144922467267026838391360&csg=3381%2C3938%2C2%2C3%2C1%2C0%2C0,0
30595,"2016-03-30 12:31:49",kinogo.co/,1
30595,"2016-03-30 12:32:11",kinogo.co/melodramy/,0

And another df with URL patterns:

url
003\.ru\/[a-zA-Z0-9-_%$#?.:+=|()]+\/mobilnyj_telefon_bq_phoenix
003\.ru\/[a-zA-Z0-9-_%$#?.:+=|()]+\/mobilnyj_telefon_fly_
003\.ru\/sonyxperia
003\.ru\/[a-zA-Z0-9-_%$#?.:+=|()]+\/mobilnye_telefony_smartfony
003\.ru\/[a-zA-Z0-9-_%$#?.:+=|()]+\/mobilnye_telefony_smartfony\/brands5D5Bbr_23
1click\.ru\/sonyxperia
1click\.ru\/[a-zA-Z0-9-_%$#?.:+=|()]+\/chasy-motorola

I use:

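import pandas as pd

# on_bad_lines='skip' replaces error_bad_lines=False (pandas >= 1.3)
urls = pd.read_csv('relevant_url1.csv', on_bad_lines='skip')
substr = urls.url.values.tolist()
pattern = '|'.join(substr)
data = pd.read_csv('data_nts2.csv', on_bad_lines='skip', chunksize=50000)
parts = []
for df in data:
    # keep the rows whose event_time column matches any of the URL patterns
    parts.append(df[df['event_time'].str.contains(pattern, regex=True)])
result = pd.concat(parts)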

But it gives me:

UserWarning: This pattern has match groups. To actually get the groups, use str.extract.

How can I fix this?

Tags: python, regex, pandas

5 answers

Stack Overflow user

Accepted answer

Answered on 2016-10-06 17:30:23

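At least one of the regex patterns in urls must use a capturing group. str.contains only returns True or False for each row in df['event_time']; it does not make use of the capturing group. So the UserWarning is alerting you that the regex uses a capturing group but the match is never used.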

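If you wish to get rid of the UserWarning, you could find and remove the capturing group(s) from the regex pattern(s). They are not shown in the patterns you posted, but they must be in your actual file. Look for parentheses outside of character classes (parentheses inside [...], as in your posted patterns, are literal characters and do not form a group).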
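To locate the offending patterns, one option is to compile each one with Python's re module and check its groups attribute, which counts capturing groups only: parentheses inside a character class such as [()] and non-capturing (?:...) groups are not counted. This is a minimal sketch that assumes the patterns are already in a list called substr, as in the question; the sample patterns below are hypothetical stand-ins for the contents of relevant_url1.csv.

```python
import re

# Hypothetical sample patterns; in the question they come from relevant_url1.csv
substr = [
    r"003\.ru\/sonyxperia",
    r"1click\.ru\/[a-zA-Z0-9-_%$#?.:+=|()]+\/chasy-motorola",  # parens inside [...] are literal
    r"kinogo\.co\/(melodramy|komedii)",                        # capturing group -> warning
    r"kinogo\.co\/(?:melodramy|komedii)",                      # non-capturing group -> no warning
]

# re.compile(p).groups is the number of capturing groups in pattern p
with_groups = [p for p in substr if re.compile(p).groups > 0]

for p in with_groups:
    print(p)
```

Only the third pattern is reported, which shows that the character-class parentheses in the posted patterns are not the cause.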

Alternatively, you can suppress the warning by putting

import warnings
warnings.filterwarnings("ignore", 'This pattern has match groups')

before the call to str.contains.
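If you would rather not silence the warning for the whole program, a narrower option is the standard-library warnings.catch_warnings() context manager, which restores the previous filters when the with-block exits. This sketch rests on two assumptions. First, the toy DataFrame and pattern are hypothetical. Second, the message regex is deliberately loose (".*has match groups"): filterwarnings matches its message argument against the start of the warning text, and, as far as I know, recent pandas versions word the warning slightly differently ("This pattern is interpreted as a regular expression, and has match groups..."), so a filter that starts with "This pattern has match groups" may not catch it there.

```python
import warnings

import pandas as pd

# Hypothetical toy data: the pattern g(.*) contains a capturing group
df = pd.DataFrame({"event_time": ["gouda", "stilton", "gruyere"]})
pattern = "g(.*)"

with warnings.catch_warnings():
    # Ignore only this UserWarning, and only inside the with-block;
    # ".*has match groups" tolerates wording differences between pandas versions
    warnings.filterwarnings("ignore", message=".*has match groups", category=UserWarning)
    mask = df["event_time"].str.contains(pattern, regex=True)

res = df[mask]
print(res)
```

Outside the with-block the global filters are unchanged, so the warning still shows up for other calls where it may be useful.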

Here is a simple example that demonstrates the problem (and the solution):

# import warnings
# warnings.filterwarnings("ignore", 'This pattern has match groups') # uncomment to suppress the UserWarning

import pandas as pd

df = pd.DataFrame({ 'event_time': ['gouda', 'stilton', 'gruyere']})

urls = pd.DataFrame({'url': ['g(.*)']})   # With a capturing group, there is a UserWarning
# urls = pd.DataFrame({'url': ['g.*']})   # Without a capturing group, there is no UserWarning. Uncommenting this line avoids the UserWarning.

substr = urls.url.values.tolist()
df[df['event_time'].str.contains('|'.join(substr), regex=True)]

prints

  script.py:10: UserWarning: This pattern has match groups. To actually get the groups, use str.extract.
  df[df['event_time'].str.contains('|'.join(substr), regex=True)]

Removing the capturing group from the regex pattern:

urls = pd.DataFrame({'url': ['g.*']})

avoids the UserWarning.

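Votes: 40

Stack Overflow user

Answered on 2019-12-17 05:15:12

Another way to get rid of the warning is to change the regex so that the group is a non-capturing group instead of a capturing group. That is the (?:...) notation.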

So if the group is (url1|url2), replace it with (?:url1|url2).
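A small sketch of that substitution, reusing the toy cheese data from the accepted answer; the alternation g(ouda|ruyere) is a hypothetical stand-in for (url1|url2). It checks that the non-capturing version selects the same rows as the capturing one while no longer triggering the warning.

```python
import re
import warnings

import pandas as pd

df = pd.DataFrame({"event_time": ["gouda", "stilton", "gruyere"]})

capturing = r"g(ouda|ruyere)"        # (...) is a capturing group
non_capturing = r"g(?:ouda|ruyere)"  # (?:...) groups the alternatives without capturing

# The non-capturing version has no capturing groups for pandas to warn about
print(re.compile(capturing).groups, re.compile(non_capturing).groups)  # 1 0

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    mask_cap = df["event_time"].str.contains(capturing, regex=True)
    mask_non = df["event_time"].str.contains(non_capturing, regex=True)

# Both masks select the same rows; only the capturing pattern produced the warning
print(mask_cap.equals(mask_non))
print([str(w.message) for w in caught])
```

The grouping still scopes the | alternation, which is usually why the parentheses were there in the first place, so removing them outright can change what the pattern matches while (?:...) does not.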

Votes: 51

Stack Overflow user

Answered on 2020-05-20 09:21:09

You can use str.match instead. In your code:

res = df[df['event_time'].str.match('|'.join(substr))]  # str.match takes no regex argument

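Explanation

The warning is triggered by str.contains when the regular expression contains (capturing) groups. For example, in the regex r'foo(bar)', the (bar) part is considered a group because it is in parentheses, so in theory it could be extracted from the match.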

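However, the warning is pointless here: contains is only supposed to "Test if pattern or regex is contained within a string of a Series or Index" (pandas documentation), so extracting groups does not matter at all.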

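In any case, str.match does not emit the warning, and it currently behaves almost the same as str.contains, except that (1) the pattern must match at the beginning of the string (str.match follows re.match, so it is anchored at the start but not at the end; str.fullmatch is the variant that requires the whole string to match), and (2) regex matching cannot be turned off in str.match (str.contains has a regex parameter for that).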
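To make those differences concrete, here is a sketch on hypothetical URL-like strings: str.contains searches anywhere in the string (like re.search), str.match is anchored at the start only (like re.match), and str.fullmatch (available since pandas 1.1) requires the entire string to match. Only str.contains complains about the capturing group.

```python
import warnings

import pandas as pd

s = pd.Series(["yandex.ru/", "kinogo.co/", "kinogo.co/melodramy/", "www.kinogo.co/"])
pattern = r"kinogo\.co/(melodramy)?"  # contains a capturing group

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    anywhere = s.str.contains(pattern)  # like re.search; warns about the group
    at_start = s.str.match(pattern)     # like re.match; anchored at the start only
    whole = s.str.fullmatch(pattern)    # the entire string must match

print(anywhere.tolist())  # [False, True, True, True]
print(at_start.tolist())  # [False, True, True, False]
print(whole.tolist())     # [False, True, False, False]
print(sum("match groups" in str(w.message) for w in caught))  # 1
```

For the question's data, where each event_time value starts with the domain, the start anchoring of str.match is harmless; it would miss URLs with a prefix such as "www.".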

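Votes: 8

Page content originally provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain translation engine.

Original link: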

https://stackoverflow.com/questions/39901550
