I need to rewrite some fields of a column in a DataFrame based on a condition.
I have this data:
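  Vitamins                Sign
0        B                   -
1        C                   +
2        A                 NaN
3        Z                   2
4        E                   +
5        I             Expired
6        D      + Severe Cases
7        K  Expired+ Last Year
8        J                +New

I need to rewrite both columns according to the following condition:
If a field in 'Sign' contains the symbol '+', the symbol should be appended to the value in the same row of the 'Vitamins' column, with no space between the last word and the symbol. The symbol should then be removed from that field in the 'Sign' column.
The result should be the following data: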
  Vitamins               Sign
0        B                  -
1       C+                NaN
2        A                NaN
3        Z                  2
4       E+                NaN
5        I            Expired
6        D       Severe Cases
7       K+  Expired Last Year
8       J+                New

I wrote this code:
import pandas as pd
import numpy as np
data = {'Vitamins': ['B', 'C', 'A', 'Z', 'E', 'I', 'D', 'K', 'J'],
'Sign': ['-', '+', np.nan, 2, '+', 'Expired', '+ Severe Cases', 'Expired+ Last Year', '+New']}
df = pd.DataFrame (data, columns = ['Vitamins', 'Sign'])
mask = (df.loc[:, 'Sign'].str.contains('+', na=False, regex = False))
df['Vitamins'] = str(df.loc[mask, 'Vitamins']) + '+'
df['Sign'] = df.loc[mask, 'Sign'].str.replace('+', '')

Unfortunately, it does not do what I need.
How can I fix this?
Thanks a lot in advance!
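
For anyone reading along, here is a minimal sketch of why the attempt above probably misbehaves, assuming the same sample data. It only illustrates two effects: `str()` applied to a Series yields one string for the whole Series, and assigning a masked subset back to a full column aligns on the index. The names `as_text`, `partial`, and `broken` are made up for this illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Vitamins': ['B', 'C', 'A', 'Z', 'E', 'I', 'D', 'K', 'J'],
    'Sign': ['-', '+', np.nan, 2, '+', 'Expired',
             '+ Severe Cases', 'Expired+ Last Year', '+New'],
})
mask = df['Sign'].str.contains('+', na=False, regex=False)

# str() on a Series returns the printed representation of the whole
# Series as one Python string, not one string per row.
as_text = str(df.loc[mask, 'Vitamins'])
print(type(as_text).__name__)  # str
print('\n' in as_text)         # True

# Assigning only the masked rows to the full column aligns on the index,
# so every row outside the mask ends up as NaN.
partial = df.loc[mask, 'Sign'].str.replace('+', '', regex=False)
broken = df.copy()
broken['Sign'] = partial
print(broken['Sign'].isna().sum())  # 4
```

So the first assignment writes the same long string into every row, and the second one wipes out the 'Sign' values that did not contain '+'.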

Posted on 2021-04-07 11:09:36
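You can first select the rows with the mask and append +, then escape the special regex character + as \+ in replace, and replace the resulting empty strings with missing values: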
mask = (df.loc[:, 'Sign'].str.contains('+', na=False, regex = False))
df.loc[mask, 'Vitamins'] += '+'
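# if the values also need converting to strings first:
# df.loc[mask, 'Vitamins'] = df.loc[mask, 'Vitamins'].astype(str) + '+'
# Series.replace (rather than .str.replace) leaves non-string values such as 2 untouched
df['Sign'] = df['Sign'].replace(r'\+', '', regex=True).replace('', np.nan)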
print (df)
  Vitamins               Sign
0        B                  -
1       C+                NaN
2        A                NaN
3        Z                  2
4       E+                NaN
5        I            Expired
6       D+       Severe Cases
7       K+  Expired Last Year
8       J+                New

Posted on 2021-04-07 11:13:41
In [1552]: import numpy as np
In [1553]: df['Vitamins'] = np.where(df['Sign'].str.contains('+', na=False, regex = False), df['Vitamins'] + '+', df['Vitamins'])
In [1557]: df['Sign'] = df['Sign'].replace('+', np.nan).replace(r'\+', '', regex=True)
In [1558]: df
Out[1558]:
  Vitamins               Sign
0        B                  -
1       C+                NaN
2        A                NaN
3        Z                  2
4       E+                NaN
5        I            Expired
6       D+       Severe Cases
7       K+  Expired Last Year
8       J+                New

https://stackoverflow.com/questions/66984804
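
As a self-contained recap of the approach in the answers, the sketch below wraps the steps in one function. The name `move_plus_to_vitamins` is hypothetical, and trimming the whitespace left behind after removing '+' is an extra assumption that is not part of either answer.

```python
import numpy as np
import pandas as pd


def move_plus_to_vitamins(frame):
    """Hypothetical helper: return a copy with '+' moved from 'Sign' to 'Vitamins'."""
    out = frame.copy()
    # Literal match; non-string values (NaN, 2) count as "no match".
    mask = out['Sign'].str.contains('+', na=False, regex=False)
    out.loc[mask, 'Vitamins'] = out.loc[mask, 'Vitamins'].astype(str) + '+'
    out['Sign'] = (
        out['Sign']
        .replace(r'\+', '', regex=True)         # drop the symbol, strings only
        .replace(r'^\s+|\s+$', '', regex=True)  # assumption: trim leftover spaces
        .replace('', np.nan)                    # empty strings become missing
    )
    return out


df = pd.DataFrame({
    'Vitamins': ['B', 'C', 'A', 'Z', 'E', 'I', 'D', 'K', 'J'],
    'Sign': ['-', '+', np.nan, 2, '+', 'Expired',
             '+ Severe Cases', 'Expired+ Last Year', '+New'],
})
result = move_plus_to_vitamins(df)
print(result)
```

Working on a copy keeps the original frame intact, which makes it easier to compare before and after while experimenting.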