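I'm doing some data cleaning with Pandas and have a very long regular expression that I'd like to split across multiple lines. The following works fine in Pandas because it is all on one line: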
df['REMARKS'] = df['REMARKS'].replace(to_replace =r'(?=[^\])}]*([\[({]|$))\b(?:GR|MDT|CMR|HLDS|NEXT|NGI|MDTS|RES|PPC|IND|FDC|CNL)\b(?:\s*(?:,\s*)?(?:(?:or|and)\s+)?(?:GR|MDT|CMR|HLDS|NEXT|NGI|MDTS|RES|PPC|IND|FDC|CNL))*\b', value = r'<\g<0>>', regex = True)
However, this is hard to maintain. I tried the following verbose-style layout, which works in regular Python:
df['REMARKS'] = df['REMARKS'].replace(to_replace =r"""(?=[^\])}]*([\[({]|$))
\b(?:GR|MDT|CMR|HLDS|NEXT|NGI|MDTS|RES|PPC|IND|FDC|CNL)
\b(?:\s*(?:,\s*)?(?:(?:or|and)\s+)?
(?:GR|MDT|CMR|HLDS|NEXT|NGI|MDTS|RES|PPC|IND|FDC|CNL))*\b""", value = r'<\g<0>>', regex = True)
But this doesn't work in Pandas. Any idea what I'm missing?
Here is some sample text for testing:
GR,MDT,CMR,HLDS,NEXT,NGI @ 25273,COMPTG
13.72 installed on 9-7/8 LNR, LWDGR, RES, APWD, SONVIS, MDTS (PRESS & SAMP) ROT SWC, TSTG BOP
LWDGR,RES,APWD,SONVIS,GR,RES,NGI,PPC @ 31937,MDTS (PRESS & SAMP) TKG ROT SWC
LWDGR,RES @ 12586,IND,FDC,CNL,GR @ 12586,SWC,RAN CSG,PF 12240-12252,RR (NEW INFO)
Thanks!
Posted on 2021-01-12 17:09:38
One option is to build the pattern as a list of strings and join them when calling replace:
RegEx = [r'(?=[^\])}]*([\[({]|$))\b(?:GR|MDT|CMR|HLDS|NEXT|NGI|MDTS|RES|PPC|IND|FDC|CNL)',
r'\b(?:\s*(?:,\s*)?(?:(?:or|and)\s+)?',
r'(?:GR|MDT|CMR|HLDS|NEXT|NGI|MDTS|RES|PPC|IND|FDC|CNL))*\b']
df['REMARKS'] = df['REMARKS'].replace(to_replace=''.join(RegEx), value=r'<\g<0>>', regex=True)
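A variation on the same idea, sketched with only the standard re module so it can be checked without a DataFrame: Python concatenates adjacent string literals inside parentheses, so no list or join is needed, and the repeated tool alternation can be kept in one variable. The names TOOLS, PATTERN, sample and tagged are illustrative and do not come from the original post.

```python
import re

# Tool codes copied from the question, kept in one place to avoid repeating them.
TOOLS = "GR|MDT|CMR|HLDS|NEXT|NGI|MDTS|RES|PPC|IND|FDC|CNL"

# Adjacent string literals inside parentheses are joined into a single string.
# Only the pieces without literal braces are f-strings.
PATTERN = (
    r"(?=[^\])}]*([\[({]|$))"   # lookahead: the next bracket is an opener, or there is none
    rf"\b(?:{TOOLS})\b"         # first tool code
    rf"(?:\s*(?:,\s*)?(?:(?:or|and)\s+)?(?:{TOOLS}))*\b"  # further codes, optional separators
)

sample = "GR,MDT,CMR,HLDS,NEXT,NGI @ 25273,COMPTG"
tagged = re.sub(PATTERN, r"<\g<0>>", sample)
print(tagged)
```

Since PATTERN is an ordinary string with no line breaks in it, it should be usable as to_replace=PATTERN together with regex=True in the replace call above.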
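Alternatively, use re: compile the pattern with re.VERBOSE, which ignores unescaped whitespace (including newlines) outside character classes, and pass the compiled object to replace: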
import re
s = r"""(?=[^\])}]*([\[({]|$))\b(?:GR|MDT|CMR|HLDS|NEXT|NGI|MDTS|RES|PPC|IND|FDC|CNL)
\b(?:\s*(?:,\s*)?(?:(?:or|and)\s+)?
(?:GR|MDT|CMR|HLDS|NEXT|NGI|MDTS|RES|PPC|IND|FDC|CNL))*\b"""
df['REMARKS'] = df['REMARKS'].replace(to_replace=re.compile(s, re.VERBOSE), value=r'<\g<0>>')
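The likely reason the triple-quoted version from the question fails in Pandas is that, without re.VERBOSE, the line breaks are part of the pattern and must match literal newlines in the data. The sketch below illustrates this with plain re on the first sample line. The variable names are illustrative, and the (?x) inline flag is shown only as an assumed alternative for cases where a plain string has to be passed with regex=True.

```python
import re

# Multi-line pattern copied from the question.
s = r"""(?=[^\])}]*([\[({]|$))
\b(?:GR|MDT|CMR|HLDS|NEXT|NGI|MDTS|RES|PPC|IND|FDC|CNL)
\b(?:\s*(?:,\s*)?(?:(?:or|and)\s+)?
(?:GR|MDT|CMR|HLDS|NEXT|NGI|MDTS|RES|PPC|IND|FDC|CNL))*\b"""

sample = "GR,MDT,CMR,HLDS,NEXT,NGI @ 25273,COMPTG"
repl = r"<\g<0>>"

# Without VERBOSE the newlines in the pattern are literal, so nothing matches
# and the text comes back unchanged.
plain = re.sub(s, repl, sample)

# With VERBOSE the line breaks are ignored and the run of tool codes is wrapped.
verbose = re.sub(s, repl, sample, flags=re.VERBOSE)

# The inline flag (?x) turns on verbose mode from inside the pattern string itself.
inline = re.sub("(?x)" + s, repl, sample)

print(plain)
print(verbose)
print(inline)
```

One caveat with verbose mode: a literal space or # in the pattern has to be escaped or placed in a character class, otherwise it is dropped or treated as the start of a comment. The pattern here contains neither.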
https://stackoverflow.com/questions/65688240