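A moving average (also called a sliding or rolling average) is a simple method used in time-series forecasting.
How it is computed: for a given sequence, first fix a window size k, then take the mean of items 1 through k, items 2 through k+1, items 3 through k+2, and so on.
The following code is taken from the TensorFlow source: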
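The definition above can be sketched directly in a few lines of Python. This is only an illustration of the definition, not the TensorFlow code; the name `simple_moving_average` is made up for this example, and it recomputes each window from scratch, which is slower than keeping a running sum.

```python
def simple_moving_average(values, k):
    """Return the mean of every k consecutive items in values."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Window i covers values[i], ..., values[i + k - 1].
    return [sum(values[i:i + k]) / k for i in range(len(values) - k + 1)]


print(simple_moving_average([1, 2, 3, 4, 5], 3))  # [2.0, 3.0, 4.0]
```

If the sequence is shorter than k, there is no full window and the result is an empty list.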
class MovingAverage {
 public:
  explicit MovingAverage(int window);
  ~MovingAverage();
  void Clear();
  double GetAverage() const;
  void AddValue(double v);

 private:
  const int window_;  // Max size of interval
  double sum_;        // Sum over interval
  double* data_;      // Actual data values
  int head_;          // Offset of the newest statistic in data_
  int count_;         // # of valid data elements in window
};
// Constructor
MovingAverage::MovingAverage(int window)
    : window_(window),
      sum_(0.0),
      data_(new double[window_]),
      head_(0),
      count_(0) {
  CHECK_GE(window, 1);
}

// Destructor
MovingAverage::~MovingAverage() { delete[] data_; }

void MovingAverage::Clear() {
  count_ = 0;
  head_ = 0;
  sum_ = 0;
}

double MovingAverage::GetAverage() const {
  if (count_ == 0) {
    return 0;
  } else {
    return static_cast<double>(sum_) / count_;
  }
}

void MovingAverage::AddValue(double v) {
  if (count_ < window_) {
    // This is the warmup phase. We don't have a full window's worth of data.
    head_ = count_;
    data_[count_++] = v;
  } else {
    if (window_ == ++head_) {
      head_ = 0;
    }
    // Toss the oldest element
    sum_ -= data_[head_];
    // Add the newest element
    data_[head_] = v;
  }
  sum_ += v;
}
Take the five numbers 1, 2, 3, 4, 5 with a window of 3. The calculation is: (1+2+3)/3=2, (2+3+4)/3=3, (3+4+5)/3=4.
...
MovingAverage ma(3);
ma.AddValue(1);
ma.AddValue(2);
ma.AddValue(3);
printf("%lf\n", ma.GetAverage()); // 2.0
ma.AddValue(4);
printf("%lf\n", ma.GetAverage()); // 3.0
ma.AddValue(5);
printf("%lf\n", ma.GetAverage()); // 4.0
...
The data comes from the post "tf19: 预测铁路客运量" (predicting railway passenger volume).
import matplotlib.pyplot as plt
import pandas as pd
import requests
import io
import numpy as np
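def moving_average(l, N):
    # Running sum over the last N items; the first N-1 outputs average a partial window.
    total = 0.0
    result = [0.0] * len(l)
    for i in range(min(N, len(l))):
        total = total + l[i]
        result[i] = total / (i + 1)
    for i in range(N, len(l)):
        total = total - l[i - N] + l[i]
        result[i] = total / N
    return result

# A faster version using numpy
# http://stackoverflow.com/questions/13728392/moving-average-or-running-mean
def fast_moving_average(x, N):
    # mode='valid' keeps only full windows, so the result has len(x) - N + 1 items
    return np.convolve(x, np.ones(N) / N, mode='valid')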
url = 'http://blog.topspeedsnail.com/wp-content/uploads/2016/12/铁路客运量.csv'
ass_data = requests.get(url).content
df = pd.read_csv(io.StringIO(ass_data.decode('utf-8')))  # on Python 2, use StringIO.StringIO
data = np.array(df['铁路客运量_当期值(万人)'])
ma_data = moving_average(data.tolist(), 3)  # an ndarray has tolist(), not to_list()
plt.figure()
plt.plot(data, color='g')
plt.plot(ma_data, color='r')
plt.show()
With window/N = 3:
The red line is the moving average. Look closely: it lags one step behind the raw data.
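The one-step lag can be reproduced on a simple ramp. The sketch below is an illustration on made-up data, not the railway series, and `trailing_average` is a hypothetical helper written for this example. On a straight line, a trailing average over n points equals the input from (n-1)/2 steps earlier, which is exactly one step for n = 3.

```python
def trailing_average(values, n):
    """Mean of the last n values at each position (fewer during warm-up)."""
    out = []
    total = 0.0
    for i, v in enumerate(values):
        total += v
        if i >= n:
            total -= values[i - n]  # drop the value that left the window
        out.append(total / min(i + 1, n))
    return out


ramp = list(range(10))  # 0, 1, 2, ..., 9
smoothed = trailing_average(ramp, 3)

# Once the window is full, each output equals the input one step earlier.
for t in range(2, 10):
    print(t, ramp[t], smoothed[t])
```

Real data is not a straight line, so the lag there is only approximately one step, but the effect has the same cause: every output is an average over the current and the previous points.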
With window/N = 10:
A wider window can be used to smooth out some outliers.
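There is a similar technique called the moving median, which uses the median of each window instead of the mean.
Definition of the median: sort the sequence, and the middle element is the median. If the sequence has an even number of elements, take the mean of the two middle elements.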
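A moving median can be sketched the same way as the moving average, using `statistics.median` from the Python standard library. The function names and the sample series are made up for this example, and the windows are recomputed from scratch for clarity rather than speed.

```python
import statistics


def moving_median(values, n):
    """Median of each full window of n consecutive values."""
    return [statistics.median(values[i:i + n])
            for i in range(len(values) - n + 1)]


def moving_mean(values, n):
    """Mean of each full window of n consecutive values."""
    return [sum(values[i:i + n]) / n for i in range(len(values) - n + 1)]


data = [1, 2, 100, 4, 5]  # made-up series with a single outlier (100)
print(moving_mean(data, 3))    # roughly 34.3, 35.3, 36.3
print(moving_median(data, 3))  # [2, 4, 5]
```

In this small example the single outlier pulls every windowed mean above 30, while the windowed medians stay within the range of the ordinary values.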