Q: Generating means and standard deviations from a CSV

Stack Overflow user

Asked on 2020-08-01 21:03:53

Answers: 2 | Views: 169 | Followers: 0 | Votes: 0

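I want to read this CSV and do the following:

1- Split the data into blocks of 60 elements, where each new block is shifted by 10 elements.

Example: 0 to 60, 10 to 70, 20 to 80, and so on.

2- Then split each block into 5 parts (12 x 5 = 60).

3- Calculate the mean and standard deviation of each part.

4- For each block of 60 elements, take the next 30 elements.

Example: 60 to 90, 70 to 100, 80 to 110, and so on.

5- Count the readings between 0 and 100, grouped in ranges of 20.

Example: 0 to 20, 20 to 40, 40 to 60, 60 to 80, and 80 to 100.

(0 to 20) 12, 18, 11, 14 = 4; (20 to 40) 20, 25, 23 = 3; ...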
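Step 5 can be illustrated with a small worked example. This sketch is not part of the original post: it reuses the numbers from the example above as hypothetical readings and assumes half-open ranges such as [0, 20) and [20, 40), which is how `np.histogram` treats every bin except the last one.

```python
import numpy as np

# Hypothetical readings, taken from the numbers in the example above
readings = np.array([12, 18, 11, 14, 20, 25, 23])

# Bin edges for the ranges 0-20, 20-40, 40-60, 60-80 and 80-100
edges = np.arange(0, 101, 20)

# np.histogram counts values in half-open bins [lo, hi);
# only the last bin also includes its right edge (100)
counts, _ = np.histogram(readings, bins=edges)
print(counts.tolist())  # [4, 3, 0, 0, 0]
```

A reading of exactly 20 is counted in the 20 to 40 range here. Whether boundary values belong to the lower or the upper range is a choice the task description leaves open.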

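The result should be data in the following format:

mean 1 | standard deviation 1 | ... | mean 5 | standard deviation 5 | 0 to 20 | 20 to 40 | ... | 80 to 100

My code performs this process, but something goes wrong along the way: the final output has 336 rows, whereas based on my data it should have around 700. I would also like to make this code cleaner and easier to follow. Any suggestions?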

import collections
import statistics

def standardDeviation(data):
    """ Calculates standard deviation """
    
    return statistics.stdev(data)
       
def average(data):
    """ Calculates average """
    
    return statistics.mean(data)

def captureOcurrences(elements, n):
    """ Capture an X number of elements within a list """
    
    return [elements[i: i+n] for i in range(0, len(elements), n)]

def neuronsInput(elements):
    """ Generates input neuron modeling (5 averages, 5 standard deviations - Between 12 occurrences in a window of 60 readings) """
    
    result = []
    temp = []
    start = 0
    limit = 60
    size = int(len(elements))
    TargetDivision = int(size / 30)
    repetitions = 0
    five = 0

    while repetitions < TargetDivision:
        temp = []

        five += 1
        ocurrences = captureOcurrences(elements[start: limit],12)
        for i in ocurrences:
            m = average(i)
            sd = standardDeviation(i)
            temp.append([m,sd])

        result.append(temp)

        repetitions += 1
        limit += 10
        start += 10

    return result

def neuronsOutput(elements):
    """ Generates output neuron modeling (Histogram of the next 30 data readings) """
    
    result = []
    start = 61
    limit = 90
    size = int(len(elements))
    TargetDivision = int(size / 30)
    repetitions = 0

    while repetitions < TargetDivision:

        counter=collections.Counter(elements[start: limit])
        
        consumption0_20 = 0
        consumption20_40 = 0
        consumption40_60 = 0
        consumption60_80 = 0
        consumption80_100 = 0
        for key in counter:
            if key <= 20:
                consumption0_20 += int(counter[key])
            elif key > 20 and key < 40:
                consumption20_40 += int(counter[key])
            elif key > 40 and key < 60:
                consumption40_60 += int(counter[key])
            elif key > 60 and key < 80:
                consumption60_80 += int(counter[key])
            elif key > 80 and key < 100:
                consumption80_100 += int(counter[key])

        result.append([consumption0_20,consumption20_40,consumption40_60,consumption60_80,consumption80_100])

        repetitions += 1
        limit += 10
        start += 10

    return result

Sample data

import pandas as pd

data = {0: {'data': '7/11/2020 0:00', '"cpu"': 27.6},
        1: {'data': '7/11/2020 0:01', '"cpu"': 0.7},
        2: {'data': '7/11/2020 0:02', '"cpu"': 1.0},
        3: {'data': '7/11/2020 0:03', '"cpu"': 2.7},
        4: {'data': '7/11/2020 0:04', '"cpu"': 0.9},
        5: {'data': '7/11/2020 0:05', '"cpu"': 4.2},
        6: {'data': '7/11/2020 0:06', '"cpu"': 1.1},
        7: {'data': '7/11/2020 0:07', '"cpu"': 0.6},
        8: {'data': '7/11/2020 0:08', '"cpu"': 3.0},
        9: {'data': '7/11/2020 0:09', '"cpu"': 0.8},
        10: {'data': '7/11/2020 0:10', '"cpu"': 3.7},
        11: {'data': '7/11/2020 0:11', '"cpu"': 13.2},
        12: {'data': '7/11/2020 0:12', '"cpu"': 1.3},
        13: {'data': '7/11/2020 0:13', '"cpu"': 2.9},
        14: {'data': '7/11/2020 0:14', '"cpu"': 11.7},
        15: {'data': '7/11/2020 0:15', '"cpu"': 9.2},
        16: {'data': '7/11/2020 0:16', '"cpu"': 1.1},
        17: {'data': '7/11/2020 0:17', '"cpu"': 0.7},
        18: {'data': '7/11/2020 0:18', '"cpu"': 4.1},
        19: {'data': '7/11/2020 0:19', '"cpu"': 0.7}}

df = pd.DataFrame.from_dict(data, orient='index')

Tags: python, pandas, numpy, average

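2 Answers

Stack Overflow user

Accepted answer

Posted on 2020-08-05 07:41:28

For operations like this, I prefer to use NumPy (it is much faster than using for loops in your code). You can use NumPy as follows: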

import numpy as np
import pandas as pd

#read the data
df = pd.read_csv('cpu-7day.csv')
data = df['"cpu"'].values

#task 1
blocks_data = []
#stop at the last start index that still leaves a full 60-element block
for i in range(0, data.shape[0] - 59, 10):
    blocks_data.append(data[i:i+60])
blocks_data = np.array(blocks_data)

#task 2
parts_data = blocks_data.reshape(-1, 5, 12)

#task 3
mean_parts_data = np.mean(parts_data, axis = -1)
std_parts_data = np.std(parts_data, axis = -1, ddof = 1)

#task 4
next_data = []
#stop at the last start index that still leaves a full 30-element window
for i in range(60, data.shape[0] - 29, 10):
    next_data.append(data[i:i+30])
next_data = np.array(next_data)

#task 5
count_groups = np.array([np.sum(((0<=next_data) & (next_data<20))*1, axis = -1),
                         np.sum(((20<=next_data) & (next_data<40))*1, axis = -1),
                         np.sum(((40<=next_data) & (next_data<60))*1, axis = -1),
                         np.sum(((60<=next_data) & (next_data<80))*1, axis = -1),
                         np.sum(((80<=next_data) & (next_data<100))*1, axis = -1)]).T

#collect all and merge in new dataframe
mean_std = np.append(mean_parts_data.reshape(-1, 1), std_parts_data.reshape(-1, 1), axis = -1).reshape(-1, 10)
#the last blocks have no following 30 readings, so pad the counts with zero rows only
pad_count_groups = np.pad(count_groups, ((0, mean_std.shape[0]-count_groups.shape[0]), (0, 0)))
res_data = np.append(mean_std, pad_count_groups, axis = 1)

columns = ['mean_1', 'std_1', 'mean_2', 'std_2', 'mean_3', 'std_3', 'mean_4', 'std_4', 'mean_5', 'std_5',
           '0_20', '20_40', '40_60', '60_80', '80_100']
myDF = pd.DataFrame(res_data, columns = columns)

#save this dataframe
myDF.to_csv('myDF.csv', index = False)
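
As an alternative sketch (not the answerer's code), the Python loops in tasks 1 and 4 can be replaced with `numpy.lib.stride_tricks.sliding_window_view`, which is available in NumPy 1.20 and later. Several things here are assumptions made for illustration: synthetic data stands in for the CSV column, the helper name `build_features` is hypothetical, the input is assumed to hold at least 60 readings, and rows whose next 30 readings do not exist are filled with NaN instead of zeros.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def build_features(data, block=60, step=10, parts=5, ahead=30):
    """Return one row per block: 5 means, 5 sample stds (interleaved), then 5 range counts."""
    data = np.asarray(data, dtype=float)
    # Task 1: every full 60-element window, keeping one window every 10 elements
    blocks = sliding_window_view(data, block)[::step]
    # Task 2: split each block into 5 parts of 12 elements
    parts_data = blocks.reshape(len(blocks), parts, block // parts)
    # Task 3: mean and sample standard deviation of each part
    means = parts_data.mean(axis=-1)
    stds = parts_data.std(axis=-1, ddof=1)
    # Interleave the columns as mean_1, std_1, ..., mean_5, std_5
    mean_std = np.stack([means, stds], axis=-1).reshape(len(blocks), 2 * parts)
    # Task 4: the 30 readings that follow each block (only where all 30 exist)
    tail = data[block:]
    if len(tail) >= ahead:
        nxt = sliding_window_view(tail, ahead)[::step]
    else:
        nxt = np.empty((0, ahead))
    # Task 5: count readings per half-open range [0, 20), [20, 40), ..., [80, 100)
    edges = np.arange(0, 101, 20)
    in_bin = (nxt[:, :, None] >= edges[:-1]) & (nxt[:, :, None] < edges[1:])
    counts = np.full((len(blocks), len(edges) - 1), np.nan)
    counts[:len(nxt)] = in_bin.sum(axis=1)
    return np.hstack([mean_std, counts])

rng = np.random.default_rng(0)
demo = rng.uniform(0, 100, size=200)  # synthetic stand-in for the "cpu" column
features = build_features(demo)
print(features.shape)  # (15, 15): 15 blocks, 10 statistics + 5 counts each
```

With 200 readings there are 15 full blocks but only 12 of them have 30 following readings, so the last three rows carry NaN in the count columns. Using NaN keeps "no data" distinguishable from "zero readings in this range".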

Votes: 1

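Stack Overflow user

Posted on 2020-08-01 22:56:33

Split the data into blocks of 60 elements, where each new block is shifted by 10 elements

Have you considered the reduced length of the last 6 size-60 blocks?

For example: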

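# Extract elements from testInputData and write them to blocks [0,30), [10,40), [20,50), ...
testInputData = list(range(0, 100))
data = []  # one list per block; starting empty avoids a stray empty block at the end
activeBlockIndexes = []
for i, row in enumerate(testInputData):
    if i % 10 == 0:
        # a new block starts every 10 elements
        activeBlockIndexes.append(i // 10)
        data.append([])
    # each element belongs to the 3 most recently started blocks
    for b in activeBlockIndexes[-3:]:
        data[b].append(row)
for row in data:
    print('[' + ('{:>4d}' * len(row)).format(*row) + ']')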

[   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29]
[  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39]
[  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49]
[  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59]
[  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69]
[  50  51  52  53  54  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72  73  74  75  76  77  78  79]
[  60  61  62  63  64  65  66  67  68  69  70  71  72  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89]
[  70  71  72  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90  91  92  93  94  95  96  97  98  99]
[  80  81  82  83  84  85  86  87  88  89  90  91  92  93  94  95  96  97  98  99]
[  90  91  92  93  94  95  96  97  98  99]
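
To put numbers on this, the sketch below counts how many full-length blocks fit in a series of n readings and compares that with the `size / 30` loop bound (`TargetDivision`) used in the question's code. The value n = 10080 is an assumption: it corresponds to 7 days of one reading per minute, as suggested by the sample timestamps and the file name `cpu-7day.csv` in the accepted answer. The helper name `count_full_windows` is hypothetical.

```python
def count_full_windows(n, size=60, step=10):
    """Number of windows of `size` elements, shifted by `step`, that fit entirely within n elements."""
    if n < size:
        return 0
    return (n - size) // step + 1

# The 100-element test above: blocks of 30 shifted by 10
print(count_full_windows(100, size=30, step=10))  # 8 full blocks; the last 2 are shorter

n = 10080  # assumed: 7 days * 24 hours * 60 readings per hour

print(count_full_windows(n))                 # 1003 full 60-element blocks
print(count_full_windows(n - 60, size=30))   # 1000 blocks that also have 30 following readings
print(n // 30)                               # 336, the loop bound used in the question
```

Under that assumption the loop bound gives 336, which matches the row count reported in the question. This suggests the loop condition, rather than the data, may be what limits the output.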

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/63210045
