I have the following DataFrame in pandas:
employee_name age location salary
Harish 31 Mumbai 450000
Marina 30 Mumbai 600000
Meena 31 Pune 750000
Sachin 32 Mumbai 1200000
Tarun 27 Mumbai 1400000
Mahesh 41 Pune 1500000
Satish 42 Delhi 650000
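Heena 34 Delhi 800000
What I want from this DataFrame is, for each location, the employee with the highest salary among those whose age is > 30 and < 35.
My desired DataFrame is: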
employee_name age location salary
Sachin 32 Mumbai 1200000
Meena 31 Pune 750000
Heena 34 Delhi 800000
I tried the following in pandas, but it raises an error:
df.groupby('location').filter(lambda x : (x['age'] > 30) & (x['age'] < 35))['salary'].max()
How can I do this in pandas?
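For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the frame above and runs the attempted expression. The helper name `try_group_filter` is hypothetical and only there for illustration. The likely cause of the error is that `groupby(...).filter` expects the function to return a single boolean per group, while the lambda here returns a row-wise boolean Series; the exact exception type and message may vary by pandas version.

```python
import pandas as pd

# Rebuild the DataFrame from the question
df = pd.DataFrame({
    "employee_name": ["Harish", "Marina", "Meena", "Sachin",
                      "Tarun", "Mahesh", "Satish", "Heena"],
    "age": [31, 30, 31, 32, 27, 41, 42, 34],
    "location": ["Mumbai", "Mumbai", "Pune", "Mumbai",
                 "Mumbai", "Pune", "Delhi", "Delhi"],
    "salary": [450000, 600000, 750000, 1200000,
               1400000, 1500000, 650000, 800000],
})


def try_group_filter(frame):
    """Hypothetical helper: run the attempted expression and
    return the exception it raises, or None if it succeeds."""
    try:
        frame.groupby("location").filter(
            lambda x: (x["age"] > 30) & (x["age"] < 35)
        )["salary"].max()
    except Exception as exc:
        return exc
    return None


err = try_group_filter(df)
# Show which exception came back and its message
print(type(err).__name__, err)
```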
Posted on 2020-08-27 21:41:00
You can filter first, then find the rows with the maximum value:
(df.loc[df['age'].between(31,34)]
.sort_values('salary')
.drop_duplicates('location', keep='last')
)
Output:
employee_name age location salary
2 Meena 31 Pune 750000
7 Heena 34 Delhi 800000
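3 Sachin 32 Mumbai 1200000
Posted on 2020-08-27 22:01:55
Try using idxmax. Note that groupby's filter will not work here, because it keeps or drops whole groups rather than individual rows.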
df.loc[df[df['age'].between(31,34)].groupby('location')['salary'].idxmax()]
Out[110]:
employee_name age location salary
7 Heena 34 Delhi 800000
3 Sachin 32 Mumbai 1200000
2 Meena 31 Pune 750000
Posted on 2020-08-27 22:13:21
You can try this option:
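df = df.query('age > 30 & age < 35')
# Sort by salary and deduplicate on location (not age), so the row kept per location is its top earner
df = df.sort_values('salary').drop_duplicates(subset="location", keep="last")
print(df)
https://stackoverflow.com/questions/63617223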
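As a quick check on the approaches in this thread, the sketch below rebuilds the question's frame and applies the filter-then-`idxmax` idea. It uses explicit strict comparisons instead of `between(31, 34)`; the two agree for integer ages, but `between` is inclusive by default, so the strict mask is the safer reading of "> 30 and < 35" if ages could be fractional. The variable names are illustrative, not taken from any answer.

```python
import pandas as pd

# Rebuild the DataFrame from the question
df = pd.DataFrame({
    "employee_name": ["Harish", "Marina", "Meena", "Sachin",
                      "Tarun", "Mahesh", "Satish", "Heena"],
    "age": [31, 30, 31, 32, 27, 41, 42, 34],
    "location": ["Mumbai", "Mumbai", "Pune", "Mumbai",
                 "Mumbai", "Pune", "Delhi", "Delhi"],
    "salary": [450000, 600000, 750000, 1200000,
               1400000, 1500000, 650000, 800000],
})

# Row-wise filter with strict bounds: age > 30 and age < 35
in_range = df[(df["age"] > 30) & (df["age"] < 35)]

# For each location, idxmax gives the index label of the highest-salary row
top_idx = in_range.groupby("location")["salary"].idxmax()

# Look those rows up in the original frame
result = df.loc[top_idx]
print(result)
```

Filtering rows before grouping is the key step: the age condition applies to individual employees, and only the salary maximum is a per-location operation.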