I need to build two new columns from a list of booleans. This is for classifying mobile phone data.
Sample data:
mobile_phone
85295649956
85398745632
8612345678945
34512654
Here is my code:
import csv
import re
import pandas as pd
import numpy as np
df = pd.read_csv('test.csv',delimiter='|',dtype = str)
a = r'852[4-9]|853[4-9]|86'
print(list(map(lambda x: bool(re.match(a, x)), df['mobile_phone'])))
My current output is:
[True,True,True,False]
I can produce the list of booleans, but I don't know how to use it.
I tried something like this:
import csv
import re
import pandas as pd
import numpy as np
df = pd.read_csv('test.csv',delimiter='|',dtype = str)
a = r'852[4-9]|853[4-9]|86'
df['mobile'] = np.where(
(lambda x: bool(re.match(a, x)), df['mobile_phone']) = True
,df['mobile_phone']
,nan
)
df['phone'] = np.where(
(lambda x: bool(re.match(a, x)), df['mobile_phone']) = True,
nan,
df['mobile_phone']
)
I tried using np.where, but it does not work: it raises the error "keyword can't be an expression".
How can I get the result shown below?
Expected result:
mobile_phone   mobile         phone
85295649956    85295649956    nan
85398745632    85398745632    nan
8612345678945  8612345678945  nan
34512654       nan            34512654
Posted on 2022-01-05 07:34:42
You can simply use Series.apply to map the values into the new columns. For example:
import pandas as pd
import re
import math
df = pd.DataFrame({'mobile_phone': ['85295649956', '85398745632', '8612345678945', '34512654', '54861245'] })
a = r'852[4-9]|853[4-9]|86'
df['mobile'] = df['mobile_phone'].apply(lambda p: p if re.match(a, p) else math.nan)
df['phone'] = df['mobile_phone'].apply(lambda p: math.nan if re.match(a, p) else p)
df
Output:
    mobile_phone         mobile     phone
0    85295649956    85295649956       NaN
1    85398745632    85398745632       NaN
2  8612345678945  8612345678945       NaN
3       34512654            NaN  34512654
4       54861245            NaN  54861245
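Since the question asked about np.where specifically, here is a minimal sketch of how that attempt could be made to work. The original call most likely failed because `= True` inside a function call is parsed as a keyword argument (hence "keyword can't be an expression"), and because the lambda was only paired with the column in a tuple, never applied to it. The sketch builds the boolean mask with Series.str.match, which anchors at the start of each string like re.match. The names `pattern` and `is_mobile` are illustrative, and the DataFrame is built inline from the question's sample data rather than read from test.csv.

```python
import numpy as np
import pandas as pd

# Sample data from the question, kept as strings (as with read_csv(..., dtype=str)).
df = pd.DataFrame(
    {'mobile_phone': ['85295649956', '85398745632', '8612345678945', '34512654']}
)
pattern = r'852[4-9]|853[4-9]|86'

# Boolean mask: True where the number starts with one of the mobile prefixes.
# na=False treats missing values as non-matching (assumption: that is the desired behavior).
is_mobile = df['mobile_phone'].str.match(pattern, na=False)

# np.where(condition, value_if_true, value_if_false), evaluated element-wise.
df['mobile'] = np.where(is_mobile, df['mobile_phone'], np.nan)
df['phone'] = np.where(is_mobile, np.nan, df['mobile_phone'])

print(df)
```

If NumPy is not otherwise needed, `df['mobile_phone'].where(is_mobile)` and `df['mobile_phone'].mask(is_mobile)` should produce the same two columns.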
https://stackoverflow.com/questions/70588862