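I would like to add a new column to this df, subject to the conditions below. The column education
is a categorical value from 1 to 5 (1 is the lowest education level, 5 the highest). I want to write a function with the following logic (in order to create the new column in the df).
First, for any id, if at least one education level has stage Graduated, the new column must hold the highest graduated education level for that id.
Second, if a given id has no graduated education level (all of its education levels are "In course"), the new column must hold the highest education level minus one.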
df
id education stage
1 2 Graduated
1 3 Graduated
1 4 In course
2 3 In course
3 2 Graduated
3 3 In course
4 2 In course
Expected output:
id education stage new_column
1 2 Graduated 3
1 3 Graduated 3
1 4 In course 3
2 3 In course 2
3 2 Graduated 2
3 3 In course 2
4 2 In course 1
Posted on 2018-04-01 00:50:42
You can do it like this:
import pandas as pd
df = pd.DataFrame({'id': [1, 1, 1, 2, 3, 3, 4], 'education': [2, 3, 4, 3, 2, 3, 2],
'stage': ['Graduated', 'Graduated', 'In course', 'In course', 'Graduated', 'In course', 'In course']})
max_gr = df[df.stage == 'Graduated'].groupby('id').education.max()
max_ic = df[df.stage == 'In course'].groupby('id').education.max()
# set all cells to the value from max_gr (NaN for ids without a graduated level)
df['new_col'] = df.id.map(max_gr)
# fill cells that are still empty with the value from max_ic - 1
df['new_col'] = df.new_col.fillna(df.id.map(max_ic - 1)).astype(int)
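series.map(other_series) returns a new series in which each value from series has been replaced by the value that other_series holds at that index label. Values with no matching label become NaN.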
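As a minimal sketch of that lookup behaviour, the snippet below maps a series of ids through a small lookup series. The names `lookup`, `ids`, and `mapped` are made up for illustration and are not part of the answer.

```python
import pandas as pd

# Hypothetical lookup table: index = id, value = highest graduated level.
lookup = pd.Series({1: 3, 3: 2})

# The ids as they appear row by row in the question's df.
ids = pd.Series([1, 1, 1, 2, 3, 3, 4])

# Each id is looked up in lookup's index; ids 2 and 4 have no entry
# there, so their rows come back as NaN.
mapped = ids.map(lookup)
print(mapped)
```

The NaN rows are exactly the ones the second step of the answer has to fill from max_ic - 1.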
Posted on 2018-04-01 00:59:18
Here is one way.
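# Note: this answer's data spells the stage 'InCourse' (see the result below).
# Highest graduated level per id, broadcast to every row of that id (NaN if none).
grad_max = df['education'].where(df['stage'] == 'Graduated')\
                          .groupby(df['id']).transform('max')
# Highest in-course level per id, broadcast the same way.
course_max = df['education'].where(df['stage'] == 'InCourse')\
                            .groupby(df['id']).transform('max')
# Prefer the graduated value; otherwise fall back to the in-course maximum minus one.
df['new'] = grad_max.fillna(course_max.sub(1)).astype(int)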
Result
id education stage new
0 1 2 Graduated 3
1 1 3 Graduated 3
2 1 4 InCourse 3
3 2 3 InCourse 2
4 3 2 Graduated 2
5 3 3 InCourse 2
6 4 2 InCourse 1
Explanation
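A rough illustration of why transform is useful here: unlike a plain aggregation, which returns one value per id, transform returns one value per input row and keeps the original row index, so the result can be assigned back as a column without a separate map step. The sketch below assumes the 'InCourse' spelling shown in the result above; `masked`, `per_id`, and `per_row` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 2, 3, 3, 4],
                   'education': [2, 3, 4, 3, 2, 3, 2],
                   'stage': ['Graduated', 'Graduated', 'InCourse', 'InCourse',
                             'Graduated', 'InCourse', 'InCourse']})

# Keep the education value only on Graduated rows; other rows become NaN.
masked = df['education'].where(df['stage'] == 'Graduated')

# Aggregation: one value per id, indexed by id.
per_id = masked.groupby(df['id']).max()

# transform: one value per row, indexed like df, so it lines up with the frame.
per_row = masked.groupby(df['id']).transform('max')

print(per_id)
print(per_row)
```

Ids without any Graduated row stay NaN in `per_row`, which is what lets fillna supply the in-course fallback afterwards.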
Posted on 2018-04-01 01:27:22
An alternative solution based on Markus ffler's answer.
max_ic = df[df.stage.eq('In course')].groupby('id').education.max() - 1
max_gr = df[df.stage.eq('Graduated')].groupby('id').education.max()
# Update with max_gr
max_ic.update(max_gr)
df['new_col'] = df.id.map(max_ic)
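One caveat worth sketching: Series.update aligns on the index of the series being updated, so an id that has only Graduated rows never appears in max_ic and would end up as NaN after the map. The question's data contains no such id, so the code above works there. The variant below is a sketch under that assumption, wrapped in a function as the question asked. The function name `add_new_col` and the extra id 5 are hypothetical, and combine_first is swapped in for update.

```python
import pandas as pd


def add_new_col(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df with a 'new_col' column."""
    max_ic = df[df.stage.eq('In course')].groupby('id').education.max() - 1
    max_gr = df[df.stage.eq('Graduated')].groupby('id').education.max()
    # Prefer the graduated maximum; fall back to (in-course maximum - 1).
    # combine_first works on the union of both indexes, so no id is dropped.
    per_id = max_gr.combine_first(max_ic)
    return df.assign(new_col=df.id.map(per_id).astype(int))


# The question's data plus a hypothetical id 5 that has only a Graduated row.
df = pd.DataFrame({'id': [1, 1, 1, 2, 3, 3, 4, 5],
                   'education': [2, 3, 4, 3, 2, 3, 2, 4],
                   'stage': ['Graduated', 'Graduated', 'In course', 'In course',
                             'Graduated', 'In course', 'In course', 'Graduated']})
result = add_new_col(df)
print(result)
```

The astype(int) call assumes every id has at least one Graduated or In course row; otherwise a NaN would remain and the cast would fail.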
https://stackoverflow.com/questions/49593656