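I would like to process this CSV with the following operations:
1- Split the data into blocks of 60 elements, where each new block is shifted by 10 elements.
Example: 0 to 60, 10 to 70, 20 to 80, etc.
2- Then split each block into 5 parts of 12 elements (12 x 5 = 60).
3- Compute the mean and standard deviation of each part.
4- For each 60-element block, take the next 30 elements.
Example: 60 to 90, 70 to 100, 80 to 110, etc.
5- Count the readings between 0 and 100 in groups of 20.
Example: 0 to 20, 20 to 40, 40 to 60, 60 to 80 and 80 to 100
(0 to 20): 12, 18, 11, 14 = 4; (20 to 40): 20, 25, 23 = 3; ...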
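Step 5 is a fixed-width histogram. A minimal sketch with NumPy, assuming bins [0, 20), [20, 40), ... as in the example (20 goes into "20 to 40"); note that `np.histogram` makes its last bin closed, so a reading of exactly 100 lands in "80 to 100". The function name `count_bins` is hypothetical:

```python
import numpy as np

# Bin edges for 0-20, 20-40, 40-60, 60-80, 80-100
EDGES = np.arange(0, 101, 20)

def count_bins(readings):
    """Count how many readings fall into each 20-wide bin between 0 and 100."""
    # Values outside [0, 100] are ignored by np.histogram
    counts, _ = np.histogram(readings, bins=EDGES)
    return counts

# The readings from the example above: four in 0-20, three in 20-40
print(count_bins([12, 18, 11, 14, 20, 25, 23]))  # [4 3 0 0 0]
```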
The final result should be data in this format:
mean 1 | standard deviation 1 | ... | mean 5 | standard deviation 5 | 0 to 20 | 20 to 40 | ... | 80 to 100
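The 15 output columns follow a regular pattern, so they can be generated rather than typed out. A small sketch; the `mean_1`/`std_1`/`0_20` naming is an assumption chosen for illustration:

```python
# Hypothetical column names for the layout above:
# mean/std for each of the 5 parts, then one count per 20-wide bin
stat_columns = [f"{stat}_{part}" for part in range(1, 6) for stat in ("mean", "std")]
bin_columns = [f"{low}_{low + 20}" for low in range(0, 100, 20)]
columns = stat_columns + bin_columns

print(len(columns))   # 15
print(columns[:4])    # ['mean_1', 'std_1', 'mean_2', 'std_2']
print(bin_columns)    # ['0_20', '20_40', '40_60', '60_80', '80_100']
```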
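My code does this, but something fails along the way: the final output has 336 rows, while according to my data it should be around 700. I would also like to make this code cleaner and easier to follow. Any suggestions?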
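The expected row count can be worked out from the window arithmetic. With N readings, a window of W readings sliding by 10 fits floor((N - W) / 10) + 1 times; an input block that also needs its 30 following readings uses W = 60 + 30 = 90. The loops in the code below run `int(size / 30)` times instead. A small sketch comparing the two; `n_windows` is a hypothetical helper, and N = 10080 (seven days of one-minute readings) is only an illustrative assumption, not the actual size of the CSV:

```python
def n_windows(n_readings, window, step=10):
    """Number of complete windows of `window` readings, sliding by `step`."""
    if n_readings < window:
        return 0
    return (n_readings - window) // step + 1

# Hypothetical example: 7 days of one-minute readings
N = 7 * 24 * 60  # 10080

print(int(N / 30))             # 336  (loop count used by neuronsInput/neuronsOutput)
print(n_windows(N, 60))        # 1003 (complete 60-reading input blocks)
print(n_windows(N, 60 + 30))   # 1000 (blocks that also have 30 following readings)
```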
```python
import collections
import statistics


def standardDeviation(data):
    """ Calculates standard deviation """
    return statistics.stdev(data)


def average(data):
    """ Calculates average """
    return statistics.mean(data)


def captureOcurrences(elements, n):
    """ Capture an X number of elements within a list """
    return [elements[i: i+n] for i in range(0, len(elements), n)]


def neuronsInput(elements):
    """ Generates input neuron modeling (5 averages, 5 standard deviations - Between 12 occurrences in a window of 60 readings) """
    result = []
    temp = []
    start = 0
    limit = 60
    size = int(len(elements))
    TargetDivision = int(size / 30)
    repetitions = 0
    five = 0
    while repetitions < TargetDivision:
        temp = []
        five += 1
        ocurrences = captureOcurrences(elements[start: limit], 12)
        for i in ocurrences:
            m = average(i)
            sd = standardDeviation(i)
            temp.append([m, sd])
        result.append(temp)
        repetitions += 1
        limit += 10
        start += 10
    return result


def neuronsOutput(elements):
    """ Generates output neuron modeling (Histogram of the next 30 data readings) """
    result = []
    start = 61
    limit = 90
    size = int(len(elements))
    TargetDivision = int(size / 30)
    repetitions = 0
    while repetitions < TargetDivision:
        counter = collections.Counter(elements[start: limit])
        consumption0_20 = 0
        consumption20_40 = 0
        consumption40_60 = 0
        consumption60_80 = 0
        consumption80_100 = 0
        for key in counter:
            if key <= 20:
                consumption0_20 += int(counter[key])
            elif key > 20 and key < 40:
                consumption20_40 += int(counter[key])
            elif key > 40 and key < 60:
                consumption40_60 += int(counter[key])
            elif key > 60 and key < 80:
                consumption60_80 += int(counter[key])
            elif key > 80 and key < 100:
                consumption80_100 += int(counter[key])
        result.append([consumption0_20, consumption20_40, consumption40_60, consumption60_80, consumption80_100])
        repetitions += 1
        limit += 10
        start += 10
    return result
```
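The `if`/`elif` chain in `neuronsOutput` uses strict comparisons (`key > 20 and key < 40`, ...), so readings of exactly 40, 60, 80 or 100 match no branch and are not counted. A sketch of a shorter pure-Python alternative using the standard `bisect` module, assuming half-open bins [0, 20), [20, 40), ..., [80, 100) (this puts 20 into "20 to 40", as in the question's example, unlike the original `key <= 20`); `bin_counts` is a hypothetical name:

```python
from bisect import bisect_right

# Upper edges of the first four bins; values in [80, 100) go to the last bin
INNER_EDGES = [20, 40, 60, 80]

def bin_counts(readings, low=0, high=100):
    """Count readings per 20-wide bin using half-open bins [low, 20), ..., [80, high)."""
    counts = [0] * (len(INNER_EDGES) + 1)
    for value in readings:
        if low <= value < high:
            # bisect_right returns how many edges are <= value, i.e. the bin index
            counts[bisect_right(INNER_EDGES, value)] += 1
    return counts

print(bin_counts([12, 18, 11, 14, 20, 25, 23, 40, 99.5]))  # [4, 3, 1, 0, 1]
```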
Sample data:
```python
import pandas as pd

data = {0: {'data': '7/11/2020 0:00', '"cpu"': 27.6},
        1: {'data': '7/11/2020 0:01', '"cpu"': 0.7},
        2: {'data': '7/11/2020 0:02', '"cpu"': 1.0},
        3: {'data': '7/11/2020 0:03', '"cpu"': 2.7},
        4: {'data': '7/11/2020 0:04', '"cpu"': 0.9},
        5: {'data': '7/11/2020 0:05', '"cpu"': 4.2},
        6: {'data': '7/11/2020 0:06', '"cpu"': 1.1},
        7: {'data': '7/11/2020 0:07', '"cpu"': 0.6},
        8: {'data': '7/11/2020 0:08', '"cpu"': 3.0},
        9: {'data': '7/11/2020 0:09', '"cpu"': 0.8},
        10: {'data': '7/11/2020 0:10', '"cpu"': 3.7},
        11: {'data': '7/11/2020 0:11', '"cpu"': 13.2},
        12: {'data': '7/11/2020 0:12', '"cpu"': 1.3},
        13: {'data': '7/11/2020 0:13', '"cpu"': 2.9},
        14: {'data': '7/11/2020 0:14', '"cpu"': 11.7},
        15: {'data': '7/11/2020 0:15', '"cpu"': 9.2},
        16: {'data': '7/11/2020 0:16', '"cpu"': 1.1},
        17: {'data': '7/11/2020 0:17', '"cpu"': 0.7},
        18: {'data': '7/11/2020 0:18', '"cpu"': 4.1},
        19: {'data': '7/11/2020 0:19', '"cpu"': 0.7}}
df = pd.DataFrame.from_dict(data, orient='index')
```
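In the sample data the column is literally named `"cpu"`, double quotes included, so it has to be accessed as `df['"cpu"']`. A small sketch, assuming the quotes are only an artifact of the CSV header, that strips them so the column can be read as `df['cpu']`:

```python
import pandas as pd

# A few rows in the same shape as the sample data above
data = {
    0: {'data': '7/11/2020 0:00', '"cpu"': 27.6},
    1: {'data': '7/11/2020 0:01', '"cpu"': 0.7},
    2: {'data': '7/11/2020 0:02', '"cpu"': 1.0},
}
df = pd.DataFrame.from_dict(data, orient='index')

# Strip the literal double quotes from the column names: '"cpu"' -> 'cpu'
df.columns = df.columns.str.strip('"')

# The readings as a plain NumPy array, ready for windowing
cpu = df['cpu'].to_numpy()
print(list(df.columns))  # ['data', 'cpu']
```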
Posted on 2020-08-05 07:41:28
For operations like this I prefer NumPy, which is much faster than using `for` loops in your code. You can simply use NumPy like this:
```python
import numpy as np
import pandas as pd

# read the data
df = pd.read_csv('cpu-7day.csv')
data = df['"cpu"'].values

# task 1: 60-element blocks, each shifted by 10 (only complete blocks)
blocks_data = []
for i in np.arange(0, data.shape[0] - 60 + 1, 10):
    blocks_data.append(data[i:i+60])
blocks_data = np.array(blocks_data)

# task 2: split every block into 5 parts of 12
parts_data = blocks_data.reshape(-1, 5, 12)

# task 3: mean and sample standard deviation (ddof=1) of each part
mean_parts_data = np.mean(parts_data, axis=-1)
std_parts_data = np.std(parts_data, axis=-1, ddof=1)

# task 4: the 30 elements that follow each block (only complete windows)
next_data = []
for i in np.arange(60, data.shape[0] - 30 + 1, 10):
    next_data.append(data[i:i+30])
next_data = np.array(next_data)

# task 5: count the following readings in [0, 20), [20, 40), ..., [80, 100)
count_groups = np.array([np.sum((0 <= next_data) & (next_data < 20), axis=-1),
                         np.sum((20 <= next_data) & (next_data < 40), axis=-1),
                         np.sum((40 <= next_data) & (next_data < 60), axis=-1),
                         np.sum((60 <= next_data) & (next_data < 80), axis=-1),
                         np.sum((80 <= next_data) & (next_data < 100), axis=-1)]).T

# collect all and merge in a new dataframe
# interleave as mean_1, std_1, ..., mean_5, std_5
mean_std = np.append(mean_parts_data.reshape(-1, 1), std_parts_data.reshape(-1, 1), axis=-1).reshape(-1, 10)
# the last blocks have no complete 30-element window after them, so their counts are zero-padded
pad_count_groups = np.pad(count_groups, (0, mean_std.shape[0] - count_groups.shape[0]))[:, :5]
res_data = np.append(mean_std, pad_count_groups, axis=1)
columns = ['mean_1', 'std_1', 'mean_2', 'std_2', 'mean_3', 'std_3', 'mean_4', 'std_4', 'mean_5', 'std_5',
           '0_20', '20_40', '40_60', '60_80', '80_100']
myDF = pd.DataFrame(res_data, columns=columns)

# save this dataframe
myDF.to_csv('myDF.csv', index=False)
```
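On NumPy 1.20 or later, the two window-building loops could also be replaced with `numpy.lib.stride_tricks.sliding_window_view`, which only produces complete windows, so short trailing blocks cannot appear. A sketch assuming that NumPy version and the same 60/10/30 layout; the function name `build_features` is hypothetical and, unlike the zero-padding approach, it simply drops input blocks that have no complete 30-reading window after them:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def build_features(data, block=60, step=10, target=30, parts=5):
    """Return (features, counts) for every block that has `target` readings after it."""
    data = np.asarray(data, dtype=float)
    if len(data) < block + target:
        return np.empty((0, 2 * parts)), np.empty((0, parts), dtype=int)
    # All complete windows of block + target readings, keeping every `step`-th start
    windows = sliding_window_view(data, block + target)[::step]
    inputs, following = windows[:, :block], windows[:, block:]

    # Split each block into `parts` parts and summarise each part
    split = inputs.reshape(len(inputs), parts, block // parts)
    means = split.mean(axis=-1)
    stds = split.std(axis=-1, ddof=1)
    # Interleave as mean_1, std_1, ..., mean_5, std_5
    features = np.stack([means, stds], axis=-1).reshape(len(inputs), 2 * parts)

    # Count the following readings in the bins [0, 20), ..., [80, 100)
    edges = np.arange(0, 101, 20)
    counts = np.stack([((following >= lo) & (following < hi)).sum(axis=-1)
                       for lo, hi in zip(edges[:-1], edges[1:])], axis=-1)
    return features, counts

features, counts = build_features(np.arange(100) % 50)
print(features.shape, counts.shape)  # (2, 10) (2, 5)
```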
Posted on 2020-08-01 22:56:33
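"Split the data into blocks of 60 elements, where each new block is shifted by 10 elements"
Have you taken into account that the last 6 blocks of size 60 will be shorter?
For example: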
```python
# extract elements from testInputData & write to blocks [0,30),[10,40),[20,50)......
testInputData = list(range(0, 100))
data = [[]]
activeBlockIndexes = []
i = 0
for row in testInputData:
    if i % 10 == 0:
        # a new block starts every 10 elements
        activeBlockIndexes.append(int(i / 10))
        data.append([])
    # each element belongs to the (up to) 3 most recently started blocks
    for b in activeBlockIndexes[-3:]:
        data[b].append(row)
    i += 1
for row in data:
    print('[' + ('{:>4d}' * len(row)).format(*row) + ']')
```
```text
[   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29]
[  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39]
[  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49]
[  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59]
[  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69]
[  50  51  52  53  54  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72  73  74  75  76  77  78  79]
[  60  61  62  63  64  65  66  67  68  69  70  71  72  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89]
[  70  71  72  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90  91  92  93  94  95  96  97  98  99]
[  80  81  82  83  84  85  86  87  88  89  90  91  92  93  94  95  96  97  98  99]
[  90  91  92  93  94  95  96  97  98  99]
```
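If the shorter trailing blocks are not wanted, one option is to generate only complete windows by stopping at the last start index that still leaves a full block. A minimal pure-Python sketch, assuming the same 30-element blocks stepped by 10 as in the example above; `complete_blocks` is a hypothetical name:

```python
def complete_blocks(values, size=30, step=10):
    """Return only the full-length blocks [0, size), [step, step + size), ..."""
    # The last valid start is len(values) - size, so no block is ever truncated
    return [values[start:start + size]
            for start in range(0, len(values) - size + 1, step)]

blocks = complete_blocks(list(range(100)))
print(len(blocks))                     # 8
print(blocks[-1][0], blocks[-1][-1])   # 70 99
```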
https://stackoverflow.com/questions/63210045