Q: Applying the re.search function to a column in Python

Stack Overflow user

Asked on 2017-12-21 10:49:57

Answers: 1, Views: 2.8K, Followers: 0, Votes: 3

I have the following Python code (I need the first match of specific numbers in a text field):

import numpy as np
import pandas

data = {'A': [1, 2, 3], 'B': ['bla 4044 bla', 'bla 5022 bla', 'bla 6045 bla']}
df = pandas.DataFrame(data)

def fun_subjectnr(column):
    column = str(column)
    subjectnr = re.search(r"(\b[4][0-1][0-9][0-9]\b)",column)
    subjectnr1 = re.search(r"(\b[2-3|6-8][0-9][0-9][0-5]\b)",column)
    subjectnr = np.where(subjectnr == "" and subjectnr1 != "", subjectnr1, 
    subjectnr)
    return subjectnr1

df['C'] = df['B'].apply(fun_subjectnr)

Desired output:

 A    B                C
 1    bla 4044 bla    4044
 2    bla 5022 bla    None
 3    bla 6045 bla    6045

It does not seem to work. When I add a … to the regex code, it raises an error: (subjectnr = re.search(r"(\b40-9\b)", column))

Does anyone know how to do this? Thanks in advance!

regex

pandas

python

1 Answer

Stack Overflow user

Accepted answer

Answered on 2017-12-21 11:15:53

You can use str.extract for this. You can also condense your two patterns into one, as shown below.

p = r'\b(4[0-1]\d{2}|(?:[2-3]|[6-8])\d{2}[0-5])\b'
df['C'] = df.B.str.extract(p, expand=False)

df

   A             B     C
0  1  bla 4044 bla  4044
1  2  bla 5022 bla   NaN
2  3  bla 6045 bla  6045

This should be much faster than calling apply.
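
For comparison, here is a minimal sketch of how the apply-based approach from the question could be repaired. It is one possible fix, not necessarily what the asker intended. The helper name first_subject_number is hypothetical. The sketch rests on three observations: re.search returns a match object or None rather than a string, so .group() may only be called when a match exists; inside a character class such as [2-3|6-8] the | is a literal pipe, not an alternation; and the original snippet never imports re.

```python
import re

import pandas as pd

# Same pattern as in the answer above, compiled once for reuse.
PATTERN = re.compile(r'\b(4[0-1]\d{2}|(?:[2-3]|[6-8])\d{2}[0-5])\b')


def first_subject_number(text):
    """Return the first matching number as a string, or None if there is none."""
    match = PATTERN.search(str(text))
    # re.search yields None when nothing matches, so guard before .group().
    return match.group(1) if match else None


df = pd.DataFrame({'A': [1, 2, 3],
                   'B': ['bla 4044 bla', 'bla 5022 bla', 'bla 6045 bla']})

# Row-by-row version, closest to the code in the question.
via_apply = df['B'].apply(first_subject_number)

# Vectorized version from the answer, shown side by side.
via_extract = df['B'].str.extract(PATTERN.pattern, expand=False)
```

Both series hold the matched digits for rows 0 and 2 and a missing value for row 1, so either can be assigned to df['C'].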

Details

\b                 # word boundary
(                  # first capture group
   4               # match digit 4
   [0-1]           # match 0 or 1
   \d{2}           # match any two digits
|
   (?:             # non-capture group (prevent ambiguity during matching)
       [2-3]       # 2 or 3
       |           # regex OR metacharacter
       [6-8]       # 6, 7, or 8
   )
   \d{2}           # any two digits
   [0-5]           # any digit b/w 0 and 5
)
\b
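
The annotated breakdown above can also be fed to the regex engine directly. The following is a small sketch, assuming the standard library re module with the re.VERBOSE flag, which ignores whitespace and # comments inside the pattern. The helper first_match and the extra sample 'bla 4200 bla' are added here for illustration only.

```python
import re

# The annotated pattern, compiled with re.VERBOSE so that whitespace and
# "#" comments inside the pattern text are ignored by the regex engine.
PATTERN = re.compile(r"""
    \b                 # word boundary
    (                  # first capture group
       4               # literal digit 4
       [0-1]           # 0 or 1
       \d{2}           # any two digits
    |
       (?: [2-3] | [6-8] )   # 2, 3, 6, 7, or 8
       \d{2}           # any two digits
       [0-5]           # a digit from 0 to 5
    )
    \b
""", re.VERBOSE)


def first_match(text):
    """Return the first captured number in text, or None when nothing matches."""
    m = PATTERN.search(text)
    return m.group(1) if m else None


samples = ['bla 4044 bla', 'bla 5022 bla', 'bla 6045 bla', 'bla 4200 bla']
results = [first_match(s) for s in samples]
print(results)  # ['4044', None, '6045', None]
```

The last sample fails both alternatives: after the leading 4 the second digit must be 0 or 1, and the other branch requires a first digit of 2, 3, 6, 7, or 8.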

Votes: 5

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud's Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/47923327
