I have a pandas DataFrame df of the form:
timeCol dataCol
2 5
9.135 8
11 4
12 6
I would like to compute a rolling mean over dataCol with 3-second intervals, so that it returns data of the form new_df:
startTime endTime meanCol
0 3 5.0
1 4 5.0
2 5 5.0
3 6 0.0
4 7 0.0
5 8 0.0
6 9 0.0
7 10 8.0
8 11 6.0
9 12 6.0
10 13 5.0
11 14 5.0
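12 15 6.0
Note that in new_df, for example, the value returned for the time ranges (8-11) and (9-12) is 6.0 (since mean(8,4)=6.0 and mean(8,4,6)=6.0, respectively). All columns are of float type. timeCol will always be sorted. What is an efficient, Pythonic way to achieve this?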
Posted on 2019-08-08 02:00:26
I am using numpy broadcasting.
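As a minimal sketch of the broadcasting idea on its own: comparing a column vector of timestamps against a row of window bounds yields a 2D boolean mask with one row per data point and one column per window. The names `times`, `starts`, `ends` and `mask` are illustrative assumptions, not part of the answer's code; the timestamps are the ones from the question.

```python
import numpy as np

times = np.array([2.0, 9.135, 11.0, 12.0])  # timestamps from the question
starts = np.arange(13)                       # window start times 0..12
ends = starts + 3                            # each window spans 3 seconds

# times[:, None] has shape (4, 1) and starts has shape (13,), so each
# comparison broadcasts to a (4, 13) boolean array.
mask = (times[:, None] >= starts) & (times[:, None] <= ends)

print(mask.shape)        # (points, windows)
print(mask.sum(axis=0))  # how many points fall in each window
```

Windows whose column sums to zero contain no points, which is why a mean computed from such a mask comes out as NaN for them.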
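import numpy as np
import pandas as pd
# ori is the question's original DataFrame (columns timeCol and dataCol)
df = pd.DataFrame({'startTime': np.arange(13), 'endTime': np.arange(13) + 3})
s = ori.timeCol.values[:, None]
# s is a column vector, so s1[i, j] is True when point i lies in window j (both ends inclusive)
s1 = (df.startTime.values - s <= 0) & (df.endTime.values - s >= 0)
# Per-window sum divided by per-window count; empty windows yield NaN
df['New'] = ori.dataCol.dot(s1) / s1.sum(axis=0)
df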
startTime endTime New
0 0 3 5.0
1 1 4 5.0
2 2 5 5.0
3 3 6 NaN
4 4 7 NaN
5 5 8 NaN
6 6 9 NaN
7 7 10 8.0
8 8 11 6.0
9 9 12 6.0
10 10 13 5.0
11 11 14 5.0
12 12 15 6.0
Posted on 2019-08-08 02:04:12
Here is one way to do it:
import pandas as pd
# Source data
data = {
    'timeCol': [2, 9.135, 11, 12],
    'dataCol': [5, 8, 4, 6]
}
df = pd.DataFrame(data=data)
# Build list of rows based on time series
rows = []
for startTime in range(13):  # start times 0..12, matching the expected output
    endTime = startTime + 3
    print(startTime, ' to ', endTime)
    # Get only rows from source data that match current time interval
    filtered = df.loc[(df['timeCol'] >= startTime) &
                      (df['timeCol'] <= endTime)]
    # Append current row, including mean of matching source rows
    rows.append([startTime, endTime, filtered['dataCol'].mean()])
# Create final dataframe, replacing any missing values with 0
res = pd.DataFrame(data=rows, columns=['startTime', 'endTime', 'meanCol']).fillna(0)
print(res)
You could also build the result set first, then loop over it and compute the mean for each of its rows.
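A minimal sketch of that alternative, assuming the same sample data and 3-second windows starting at 0 through 12. The helper name `window_mean` is hypothetical, and `DataFrame.apply` with `axis=1` stands in for the explicit loop:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'timeCol': [2, 9.135, 11, 12],
                   'dataCol': [5.0, 8.0, 4.0, 6.0]})

# Build the result frame first: one row per window.
res = pd.DataFrame({'startTime': np.arange(13.0)})
res['endTime'] = res['startTime'] + 3

def window_mean(row):
    # Mean of dataCol over source rows with startTime <= timeCol <= endTime.
    mask = df['timeCol'].between(row['startTime'], row['endTime'])
    return df.loc[mask, 'dataCol'].mean()

# Empty windows produce NaN, which is replaced with 0 as in the question.
res['meanCol'] = res.apply(window_mean, axis=1).fillna(0)
print(res)
```

This still evaluates one filter per window, so it is likely no faster than the loop above; it mainly separates building the windows from computing the means.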
https://stackoverflow.com/questions/57404110