I have a CSV file in which I need to perform some operations on the columns without specifying the individual column values.
The input CSV (df) looks like this:
weather speed type cal_A cal_B
good 0-3 cold 12 10
good 0-3 cold 21 7
good 0-3 cold 31 5
good 0-3 cold 17 1
good 3-5 cold 19 17
bad 0-3 hot 15 4
bad 6-9 hot 21 13
bad 6-9 hot 15 7
bad 6-9 cold 21 4
rainy 0-3 cold 14 7
rainy 5-8 cold 21 10
rainy 5-8 cold 2 3
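rainy 5-8 cold 18 16
In this CSV, I need to take the columns cal_A and cal_B, group them by the weather, speed, and type columns, then find the minimum, maximum, and mean and put each one in its own column.
The minimum, maximum, and mean are computed on the result of dividing cal_A by cal_B.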
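To make the examples in this thread reproducible, here is a minimal sketch of loading the sample table. It assumes the data is whitespace-separated as shown above (for a genuinely comma-separated file, pd.read_csv(path) with the default separator would be used instead); the helper column name ratio is an assumption, not something from the question.

```python
import io

import pandas as pd

# The sample table from the question, as whitespace-separated text.
# Assumption: the real file has the same layout.
raw = """weather speed type cal_A cal_B
good 0-3 cold 12 10
good 0-3 cold 21 7
good 0-3 cold 31 5
good 0-3 cold 17 1
good 3-5 cold 19 17
bad 0-3 hot 15 4
bad 6-9 hot 21 13
bad 6-9 hot 15 7
bad 6-9 cold 21 4
rainy 0-3 cold 14 7
rainy 5-8 cold 21 10
rainy 5-8 cold 2 3
rainy 5-8 cold 18 16
"""

# sep=r"\s+" splits on any run of whitespace; "0-3" etc. stay strings.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# Hypothetical helper column: the per-row ratio all statistics are based on.
df["ratio"] = df["cal_A"] / df["cal_B"]
print(df.head())
```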
The output file should look like this:
weather speed type cal_A/cal_B(min) cal_A/cal_B(max) cal_A/cal_B(mean)
good 0-3 cold 1.2 17
good 3-5 cold 1.11 1.11
bad 0-3 hot 3.75 3.75
bad 6-9 hot 1.61 2.14
bad 6-9 cold 5.25 5.25
rainy 0-3 cold 2 2
rainy 5-8 cold 0.6 2.1
The code I tried is:
df=df.groupby(['weather','speed','type'],as_index=False).min().eval('cal_A/cal_B(min)=cal_A/cal_B')
df=df.groupby(['weather','speed','type'],as_index=False).max().eval('cal_A/cal_B(max)=cal_A/cal_B')
The code above groups by the columns weather, speed, and type and then takes the minimum and maximum, but it does not give me the expected output.
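One reason the attempt cannot work, even if the eval target name were valid: groupby(...).min() takes the minimum of cal_A and of cal_B separately, so dividing afterwards gives a ratio of minimums, not the minimum of the ratios. A minimal sketch on the "good / 0-3 / cold" group from the question illustrates the difference:

```python
import pandas as pd

# Only the "good / 0-3 / cold" rows from the question's sample data.
df = pd.DataFrame({
    "weather": ["good"] * 4,
    "speed": ["0-3"] * 4,
    "type": ["cold"] * 4,
    "cal_A": [12, 21, 31, 17],
    "cal_B": [10, 7, 5, 1],
})
keys = ["weather", "speed", "type"]

# What the attempt effectively computes: per-column minimums, then divide.
mins = df.groupby(keys).min()
wrong = mins["cal_A"] / mins["cal_B"]  # 12 / 1

# What is wanted: divide row by row first, then take the per-group minimum.
right = (df["cal_A"] / df["cal_B"]).groupby([df[k] for k in keys]).min()  # 12 / 10

print(wrong.iloc[0], right.iloc[0])
```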
Posted 2019-12-19 18:37:54
Perform the division first, then group and aggregate the resulting Series.
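(df.cal_A/df.cal_B).groupby([df.weather, df.speed, df.type], sort=False).agg(['min', 'max', 'mean'])
If you want to reproduce your output exactly, you can use the add_prefix/add_suffix methods (although renaming the columns object directly may be more efficient):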
((df.cal_A/df.cal_B).groupby([df.weather, df.speed, df.type], sort=False)
.agg(['min', 'max', 'mean'])
.add_prefix('cal_A/cal_B(')
.add_suffix(')')
.reset_index())
weather speed type cal_A/cal_B(min) cal_A/cal_B(max) cal_A/cal_B(mean)
0 good 0-3 cold 1.200000 17.000000 6.850000
1 good 3-5 cold 1.117647 1.117647 1.117647
2 bad 0-3 hot 3.750000 3.750000 3.750000
3 bad 6-9 hot 1.615385 2.142857 1.879121
4 bad 6-9 cold 5.250000 5.250000 5.250000
5 rainy 0-3 cold 2.000000 2.000000 2.000000
6 rainy 5-8 cold 0.666667 2.100000 1.297222
Posted 2019-12-19 18:42:40
If you are working with pandas 0.25+, you can solve this with pd.NamedAgg:
import pandas as pd
import numpy as np
data = {'weather':['good','good','good','good','good','bad','bad','bad','bad','rainy','rainy','rainy','rainy'],'speed':['0-3','0-3','0-3','0-3','3-5','0-3','6-9','6-9','6-9','0-3','5-8','5-8','5-8'],'type':['cold','cold','cold','cold','cold','hot','hot','hot','cold','cold','cold','cold','cold'],'cal_A':[12,21,31,17,19,15,21,15,21,14,21,2,18],'cal_B':[10,7,5,1,17,4,13,7,4,7,10,3,16]}
df = pd.DataFrame(data)
df['divided'] = df['cal_A']/df['cal_B']
output = df.groupby(['weather','speed','type']).agg(
minimum=pd.NamedAgg(column='divided',aggfunc='min'),
maximum=pd.NamedAgg(column='divided',aggfunc='max'),
mean=pd.NamedAgg(column='divided',aggfunc='mean'))
print(output)
Output:
minimum maximum mean
weather speed type
bad 0-3 hot 3.750000 3.750000 3.750000
6-9 cold 5.250000 5.250000 5.250000
hot 1.615385 2.142857 1.879121
good 0-3 cold 1.200000 17.000000 6.850000
3-5 cold 1.117647 1.117647 1.117647
rainy 0-3 cold 2.000000 2.000000 2.000000
5-8 cold 0.666667 2.100000 1.297222
https://stackoverflow.com/questions/59415175
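If the named-aggregation approach from the NamedAgg answer should produce exactly the flat layout and column names asked for in the question, one option (a sketch, not part of either posted answer) is to unpack a dictionary into agg, because names such as cal_A/cal_B(min) are not valid Python keywords. Named aggregation also accepts plain (column, aggfunc) tuples in place of pd.NamedAgg. The helper column name ratio is an assumption; sort=False keeps the groups in order of first appearance, as in the first answer.

```python
import pandas as pd

data = {
    "weather": ["good"] * 5 + ["bad"] * 4 + ["rainy"] * 4,
    "speed": ["0-3", "0-3", "0-3", "0-3", "3-5", "0-3", "6-9", "6-9", "6-9",
              "0-3", "5-8", "5-8", "5-8"],
    "type": ["cold"] * 5 + ["hot", "hot", "hot", "cold"] + ["cold"] * 4,
    "cal_A": [12, 21, 31, 17, 19, 15, 21, 15, 21, 14, 21, 2, 18],
    "cal_B": [10, 7, 5, 1, 17, 4, 13, 7, 4, 7, 10, 3, 16],
}
df = pd.DataFrame(data)

# Output column name -> (source column, aggregation function).
named = {
    "cal_A/cal_B(min)": ("ratio", "min"),
    "cal_A/cal_B(max)": ("ratio", "max"),
    "cal_A/cal_B(mean)": ("ratio", "mean"),
}

output = (
    df.assign(ratio=df["cal_A"] / df["cal_B"])  # hypothetical helper column
    .groupby(["weather", "speed", "type"], sort=False)
    .agg(**named)
    .reset_index()  # turn the group keys back into ordinary columns
)
print(output)
```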