我目前正在做一个大的DF,需要参考列中的数据,以便滚动窗口计算。所有行都有一个单独的滚动窗口值,因此我需要引用该列,但我正在获取输出
ValueError: window must be an integer 0 or greater
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(0,20,size=(20, 4)), columns=list('abcd'))
df['op'] = (np.random.randint(0,20, size=20))
a b c d op
0 6 17 3 5 9
1 8 3 13 7 2
2 19 12 18 3 8
3 8 8 5 4 17
4 0 5 9 3 19
5 0 5 19 9 11
6 7 7 13 8 10
7 7 5 12 0 4
8 13 17 4 4 17
9 7 0 16 9 7
10 7 8 13 10 13
11 18 3 1 11 16
12 4 4 5 13 4
13 9 8 14 19 9
14 13 10 10 7 10
15 9 16 11 16 3
16 5 7 3 0 11
17 13 14 10 1 16
18 6 14 13 4 18
19 1 9 8 0 19
试图引用df‘’op‘中滚动平均值的值。
df['SMA'] = df.a.rolling(window=df.op).mean()
产生错误ValueError:窗口必须是整数0或更高
如前所述,我正在处理一个大型数据框架,所以上面是示例代码。
发布于 2022-02-01 08:18:58
解决方案
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(0,20,size=(20, 4)),
columns=list('abcd'))
df['op'] = (np.random.randint(0,20, size=20))
def lookback_window(row, values, lookback, method='mean', *args, **kwargs):
loc = values.index.get_loc(row.name)
lb = lookback.loc[row.name]
return getattr(values.iloc[loc - lb: loc + 1], method)(*args, **kwargs)
df['SMA'] = df.apply(lookback_window, values=df['a'], lookback=df['op'], axis=1)
df
a b c d op SMA
0 17 19 11 9 0 17.000000
1 0 10 9 11 19 NaN
2 13 8 11 2 16 NaN
3 9 2 4 4 8 NaN
4 11 10 0 17 18 NaN
5 14 19 17 10 17 NaN
6 6 12 17 1 4 10.600000
7 10 1 3 18 2 10.000000
8 7 6 12 3 19 NaN
9 1 9 7 5 9 8.800000
10 17 1 3 13 1 9.000000
11 19 17 0 2 7 10.625000
12 18 5 2 4 12 10.923077
13 18 5 4 2 1 18.000000
14 5 11 17 11 11 11.250000
15 16 9 2 11 16 NaN
16 15 17 1 8 14 11.933333
17 15 2 0 3 6 15.142857
18 18 3 18 3 10 13.545455
19 7 0 12 15 3 13.750000
https://stackoverflow.com/questions/70932728
复制相似问题