
Generating synthetic time series data from existing sample data

Stack Overflow user
Asked 2020-01-03 02:18:26
1 answer, 843 views, 0 followers, 1 vote

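Is there a good library or tool in Python for generating synthetic time series data from existing sample data? For example, I have sales data for January through June and want to generate synthetic time series samples for July through December while preserving the time series characteristics such as trend and seasonality.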


1 Answer

Stack Overflow user

Answered 2020-06-12 19:27:35

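Setting aside the question of how good such data is, here is a simple approach: use a Gaussian distribution to generate synthetic data based on the sample. The key part is below.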

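import numpy as np

# x: the original sample, a NumPy array that must already be defined.
# With axis=1 the statistics are taken along each row, so x is assumed
# to have one row per feature: shape (n_features, n_observations).
feature_means = np.mean(x, axis=1)  # mean of each row
feature_std = np.std(x, axis=1)     # standard deviation of each row
# Draw one synthetic value per row from N(mean, std)
random_normal_feature_values = np.random.normal(feature_means, feature_std)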
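
To make the snippet above concrete, here is a minimal self-contained sketch. The sample values, the seed, and the name `synthetic` are made up for illustration and are not from the answer; the array layout (one row per feature) is an assumption that matches `axis=1`.

```python
import numpy as np

# Seeded generator, only so that the sketch is reproducible.
rng = np.random.default_rng(seed=0)

# Hypothetical sample: 2 features (rows) observed over 6 periods (columns).
x = np.array([
    [100.0, 110.0, 120.0, 130.0, 140.0, 150.0],
    [20.0, 22.0, 19.0, 21.0, 20.0, 18.0],
])

feature_means = np.mean(x, axis=1)  # one mean per feature, shape (2,)
feature_std = np.std(x, axis=1)     # one std per feature, shape (2,)

# Draw 6 synthetic periods per feature. The statistics are reshaped to
# column vectors so they broadcast across the 6 columns.
synthetic = rng.normal(feature_means[:, None], feature_std[:, None], size=x.shape)
print(synthetic.shape)
```

Note that every value here is drawn independently from a single per-feature Gaussian, so any ordering in time (trend, seasonality) in the sample is not carried over. The windowed statistics in the full code are what tie the draws to a position in the series.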

Here is the full working code I used:

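import os

import numpy as np
import pandas as pd

# Note: `configuration` is a dict defined elsewhere; only
# configuration['feature_length'] is read inside the function.
# `forced_reverse` is accepted but never used in the body.
def generate_synthetic_data(sample_dataset, window_mean, window_std, fixed_window=None, variance_range=1, sythesize_ratio=2, forced_reverse=False):
    # Empty frame with the sample's columns plus a "synthesis_seq" column
    # that records which synthesis pass produced each row.
    synthetic_data = pd.DataFrame(columns=sample_dataset.columns)
    synthetic_data.insert(len(sample_dataset.columns), "synthesis_seq", [], True)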


    for k in range(sythesize_ratio):
      if len(synthetic_data) >= len(sample_dataset) * sythesize_ratio:
        break
      #this loop generates a set that resembles the entire dataset
      country_synthetic = pd.DataFrame(columns=synthetic_data.columns)

      if fixed_window is not None:
        input_sequence_len =  fixed_window
      else:
        input_sequence_len = int(np.random.normal(window_mean, window_std)) 

      #population data change
      country_data_i = sample_dataset
      if len(country_data_i) < input_sequence_len :
        continue
      feature_length = configuration['feature_length'] #number of features to be randomized
      country_data_array = country_data_i.to_numpy()
      country_data_array = country_data_array.T[:feature_length]
      country_data_array = country_data_array.reshape(feature_length,len(country_data_i))
      x = country_data_array[:feature_length].T

      reversed = np.random.normal(0,1)>0
      if reversed:
        x = x[::-1]

      sets =0
      x_list = []
      dict_x = dict()
      for i in range(input_sequence_len):
        array_len = ((len(x) -i) - ((len(x)-i)%input_sequence_len))+i
        if array_len <= 0:
          continue
        sets = int( array_len/ input_sequence_len)
        if sets <= 0:
          continue

        x_temp = x[i:array_len].T.reshape(sets,feature_length,input_sequence_len)
        uniq_keys = np.array([i+(input_sequence_len*k) for k in range(sets)])
        x_temp = x_temp.reshape(feature_length,sets,input_sequence_len)
        arrays_split = np.hsplit(x_temp,sets)
        dict_x.update(dict(zip(uniq_keys, arrays_split)))

      temp_x_list  = [dict_x[i].T for i in sorted(dict_x.keys())]        
      temp_x_list = np.array(temp_x_list).squeeze()
      feature_means = np.mean(temp_x_list, axis=1)
      feature_std = np.std(temp_x_list, axis=1) /variance_range
      random_normal_feature_values = np.random.normal(feature_means, feature_std).T
      random_normal_feature_values = np.round(random_normal_feature_values,0)
      random_normal_feature_values[random_normal_feature_values < 0] = 0

      if reversed:
        random_normal_feature_values = random_normal_feature_values.T[::-1]
        random_normal_feature_values = random_normal_feature_values.T

      for i in range(len(random_normal_feature_values)):
        country_synthetic[country_synthetic.columns[i]] = random_normal_feature_values[i]

      country_synthetic['synthesis_seq'] = k
      # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
      synthetic_data = pd.concat([synthetic_data, country_synthetic], ignore_index=True)
    return synthetic_data

# `source_path` (base directory) and `original_data` (the sample DataFrame)
# must be defined before this point.
synthetic_data = []  # assumed to be a list collecting one DataFrame per run
for i in range(1):
  directory_name = '/synthetic_' + str(i)
  mypath = source_path + '/cleaned' + directory_name
  if not os.path.exists(mypath):
    os.mkdir(mypath)

  data = generate_synthetic_data(original_data, window_mean=0, window_std=0, fixed_window=2, variance_range=10**i, sythesize_ratio=1)
  synthetic_data.append(data)
  #data.to_csv(mypath+'/synthetic_'+str(i)+'_dt31_05_.csv',  index=False )
  print('synth step : ', i, ' len : ', len(synthetic_data))
Good luck!

0 votes
Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud's Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/59568114
