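I need to parse a column in a DataFrame, discard everything that is not inside parentheses, and put what remains in a new column. Ideally the parentheses themselves would be dropped from the new column as well, but either result would work for me: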
current column                                  new column
/reports/industry(5315)/2018                    (5315)
/reports/limit/sector(139)/2017                 (139)
/reports/sector/region(147,189 and 132)/2018    (147,189 and 132)
Thank you, any direction you can give would be great!
Answered 2018-08-10 08:02:09
IIUC, use str.extract:
df.current.str.extract(r'.*\((.*)\).*', expand=True)
Out[785]:
                 0
0             5315
1              139
2  147,189 and 132
Answered 2018-08-10 08:00:07
You can do this with a regex like the one below:
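import pandas as pd

old_col = ['/reports/industry(5315)/2018', '/reports/limit/sector(139)/2017', '/reports/sector/region(147,189 and 132)/2018']
df = pd.DataFrame(old_col, columns=['current_column'])
# The capture group holds the text between the parentheses; expand=False returns a Series
df['new_column'] = df['current_column'].str.extract(r'\((.*)\)', expand=False)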
The output looks like this:
                                 current_column       new_column
0                  /reports/industry(5315)/2018             5315
1               /reports/limit/sector(139)/2017              139
2  /reports/sector/region(147,189 and 132)/2018  147,189 and 132
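One thing to watch for: the patterns above use a greedy `.*`, which is fine for the sample data because each path has exactly one pair of parentheses. The sketch below uses only the standard `re` module and a hypothetical path with two parenthesised groups (not from the question) to show how a negated character class such as `[^()]*` behaves differently. The same pattern string should work with `str.extract`, though that is not shown here.

```python
import re

# Hypothetical path with two parenthesised groups (an assumption, not from the question)
path = "/reports/sector(12)/region(147,189 and 132)/2018"

# Greedy: runs from the first "(" to the last ")"
greedy = re.search(r"\((.*)\)", path).group(1)

# Negated class: stops at the first closing parenthesis
first_only = re.search(r"\(([^()]*)\)", path).group(1)

# findall returns the contents of every parenthesised group
all_groups = re.findall(r"\(([^()]*)\)", path)

print(greedy)
print(first_only)
print(all_groups)
```

If every path is guaranteed to contain a single pair of parentheses, the greedy and negated-class patterns give the same result.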
Answered 2018-08-10 07:58:56
>>> import re
>>> re.sub(r'.*(\(.*\)).*', r'\1', '/reports/industry(5315)/2018')
'(5315)'
Full example:
import pandas as pd
import re

old_col = ['/reports/industry(5315)/2018', '/reports/limit/sector(139)/2017', '/reports/sector/region(147,189 and 132)/2018']
df = pd.DataFrame(old_col, columns=['current_column'])

def grab_dat(x):
    # Replace the whole string with just the parenthesised part (parentheses kept)
    dat = re.sub(r'.*(\(.*\)).*', r'\1', x)
    return dat

df['new_col'] = df['current_column'].apply(grab_dat)
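A caveat with the `re.sub` approach: when a string contains no parentheses, nothing is substituted and the whole input comes back unchanged, so the full path would end up in the new column. A minimal sketch of an alternative, assuming you would rather get `None` for such rows, uses `re.search` instead. The helper name `grab_inner` and the parenthesis-free sample path are hypothetical.

```python
import re

# Captures the text between one pair of parentheses, without the parentheses
PAREN = re.compile(r"\(([^()]*)\)")

def grab_inner(path):
    """Return the text inside the first pair of parentheses, or None if absent."""
    match = PAREN.search(path)
    return match.group(1) if match else None

print(grab_inner('/reports/industry(5315)/2018'))
print(grab_inner('/reports/plain/2018'))  # hypothetical path without parentheses
```

It can be applied the same way as `grab_dat`, for example through `.apply(grab_inner)` on the column.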
https://stackoverflow.com/questions/51777179