
From a DataFrame to a time series when minutes repeat

Stack Overflow user
Asked on 2018-02-20 17:32:24
Answers 1, Views 69, Followers 0, Votes 0

I'm working with clinical data and want to predict each patient's waiting time at every minute. The data (simplified) look like this:

Time(minutes)    PatientSerial       RemainingTime(minutes)
420              1                      5
420              2                      10
420              3                      8
421              1                      4
421              2                      9
421              3                      7

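Here 420 is the number of minutes since midnight (420 = 7:00 am), and my output is RemainingTime (historical data). In general, the machine-learning algorithm should produce a waiting time for every patient at every minute, because the input is clinical data generated every minute. But I'm confused: how do I convert this DataFrame into a time series when the same minute appears repeatedly?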
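One common way to get a regular time series out of such long-format data is to pivot it, so that each minute appears exactly once and each patient becomes its own column (its own series). Below is a minimal standard-library sketch, assuming rows of (minute, patient, remaining) as in the table above; the names `pivot_by_minute` and `minute_to_clock` are hypothetical helpers, not part of the question. In pandas, `df.pivot(index='Time(minutes)', columns='PatientSerial', values='RemainingTime(minutes)')` produces the same wide layout (and raises an error if a patient has two rows for the same minute).

```python
from collections import defaultdict


def minute_to_clock(minute):
    """Convert minutes since midnight to an 'HH:MM' string, e.g. 420 -> '07:00'."""
    hours, mins = divmod(minute % (24 * 60), 60)
    return '%02d:%02d' % (hours, mins)


def pivot_by_minute(rows):
    """Turn long-format (minute, patient, remaining) rows into one row per minute.

    Returns (minutes, patients, table) where table[minute][patient] is the
    remaining time, or None if that patient has no value at that minute.
    """
    cells = defaultdict(dict)  # minute -> {patient: remaining}
    patients = set()
    for minute, patient, remaining in rows:
        cells[minute][patient] = remaining  # a repeated (minute, patient) pair keeps the last value
        patients.add(patient)

    minutes = sorted(cells)
    patients = sorted(patients)
    table = {m: {p: cells[m].get(p) for p in patients} for m in minutes}
    return minutes, patients, table


# the simplified data from the question: (Time, PatientSerial, RemainingTime)
rows = [
    (420, 1, 5), (420, 2, 10), (420, 3, 8),
    (421, 1, 4), (421, 2, 9), (421, 3, 7),
]

minutes, patients, table = pivot_by_minute(rows)
print('time ', ' '.join('p%d' % p for p in patients))
for m in minutes:
    values = ('-' if table[m][p] is None else table[m][p] for p in patients)
    print(minute_to_clock(m), ' '.join('%2s' % v for v in values))
# time  p1 p2 p3
# 07:00  5 10  8
# 07:01  4  9  7
```

Each patient column is then an ordinary time series indexed by minute, which is the shape most forecasting code expects.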


1 answer

Stack Overflow user

Answered on 2018-02-20 18:02:28

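For clarity: this is not an answer but a question about what the result should look like (I can't show a view like this in a comment under the question). It may help to understand better how the problem should be solved. Edit 1: the coded answer is below.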

@Ted:

I'm wondering whether the result you are trying to get looks like the following table:

Time (min)   MeanWait (min, single default patient)
...          ...
420          5.2
421          4.9
422          4.3
423          4.2
...          ...
820          11.39
821          11.41
822          11.41
823          11.09
824          10.7
825          10.69
...          ...
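A table like this collapses all patients seen at the same minute into one mean value. That aggregation can be sketched with the standard library as follows, assuming the input is (minute, remaining) pairs; `mean_wait_per_minute` is a hypothetical name, and the sample reuses the question's simplified data rather than the illustrative values in the table. If the data span several days, the minutes are assumed to already include a day offset (e.g. 1440 per day) so that different days do not collide.

```python
from collections import defaultdict


def mean_wait_per_minute(rows):
    """Average the remaining wait times of all patients seen at the same minute.

    rows: iterable of (minute, remaining_minutes) pairs, in any order.
    Returns a list of (minute, mean_wait) tuples sorted by minute.
    """
    totals = defaultdict(float)  # minute -> sum of remaining times
    counts = defaultdict(int)    # minute -> number of patients at that minute
    for minute, remaining in rows:
        totals[minute] += remaining
        counts[minute] += 1
    return [(m, totals[m] / counts[m]) for m in sorted(totals)]


# (Time, RemainingTime) pairs from the question's simplified data
rows = [(420, 5), (420, 10), (420, 8), (421, 4), (421, 9), (421, 7)]
result = mean_wait_per_minute(rows)
for minute, mean_wait in result:
    print('%d  %.2f' % (minute, mean_wait))
# 420  7.67
# 421  6.67
```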

Should the final result be a Matplotlib plot, or should it be viewed on screen in a program's GUI? Either way, please edit your question to include that.

EDIT 1:

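Based on my comment above, the following script does the core work: it "calculates" the mean patient waiting time per unit of daytime (minute). Inline comments explain what happens. Because I assume you can implement loading the file data and producing the output yourself, I have not added those parts, nor the matplotlib plotting. There are plenty of examples here and elsewhere on the web.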

import datetime

# The day timescale runs from 0 to 1440 minutes and then resets for day 2.
# The input text file can use either a 24 h x-axis (0-1440) or a continuous one (e.g. 0-4320 == 3 days).

# Test set for data processing (day 1 and day 2 data)
datas = ['Time(minutes)       RemainingTime(minutes)', 
        '420              :                      5',
        '420              :                      10',
        '420              :                      8',
        '421              :                      4',
        '421              :                      9',
        '421              :                      7',
        '830              :                      8',
        '830              :                      4',
        '340              :                      3',
        '340              :                      5',
        '340              :                      4',
        '351              :                      10',
        '351              :                      7',
        '420              :                      9',
        '420              :                      7',]

def sort_data(scr):
    """Group the wait times (y-values) by minute (x-value).

    Returns (raw_data, processed_days, data_from_exception), where raw_data maps
    an absolute minute (day offset + minute of day) to the list of wait times
    recorded at that minute.
    """

    raw_data            = {}
    day_minute_counter  = None        # minute currently being collected (None = not started yet)
    current_list        = []
    day_in_minutes      = (24 * 60)
    elapsed_days_min    = 0           # offset in minutes of the day currently being processed
    processed_days      = 1
    data_from_exception = {}
    count_exceptions    = 0

    for row in scr:
        print(row)

        try:
            # Data rows contain ":" separating the x-value (minute) from the y-value (wait time).
            # Header or other non-numeric rows raise ValueError here.
            x_value, y_value = row.split(":")

            # strip surrounding whitespace, then convert string > integer
            x_val = int(x_value.strip())
            y_val = int(y_value.strip())

        except ValueError:
            # collect axis information and other non-integer rows
            count_exceptions += 1
            data_from_exception[count_exceptions] = row
            continue

        if day_minute_counter is None:
            # first data row: start collecting this minute
            print('Start', x_val)
            day_minute_counter = x_val

        elif day_minute_counter < x_val:
            # a later minute on the same day: store the finished minute
            print('Done', day_minute_counter, x_val, current_list)
            raw_data[day_minute_counter + elapsed_days_min] = current_list
            day_minute_counter = x_val
            # new list for the next point in "day_minute_counter"
            current_list = []

        elif day_minute_counter > x_val:
            # time went backwards: the data has rolled over to the next day
            print('Next Day Marker', day_minute_counter, x_val, current_list)
            raw_data[day_minute_counter + elapsed_days_min] = current_list
            processed_days += 1
            elapsed_days_min += day_in_minutes
            print('elapsed_day in minutes : ', elapsed_days_min)
            day_minute_counter = x_val
            current_list = []

        # in every case this row's wait time belongs to the minute now being collected
        current_list.append(y_val)

    # End of data block: store the last minute, including its day offset.
    if day_minute_counter is not None:
        raw_data[day_minute_counter + elapsed_days_min] = current_list

    print('\nRaw Data   : %s\nOther info : %s\n ' % (raw_data, data_from_exception))

    return (raw_data, processed_days, data_from_exception)

def calc_mean(scr):

    data = scr[0]                 # absolute minute -> list of wait times
    days = scr[1]
    # cover every processed day, and any later minute when the input uses a continuous time scale
    minutes = max(days * 24 * 60, max(data) + 1 if data else 0)
    missing_datapoints = []
    result = []
    print('Dataset spans a total of "%s" minutes.\n' % minutes)

    # walk every minute, starting at minute 0
    for x_datapoint in range(minutes):

        dataset = data.get(x_datapoint)

        if not dataset:
            # no wait times were recorded at this minute
            missing_datapoints.append(x_datapoint)
            continue

        # mean of all patients' remaining wait times at this minute
        meanwait = float(sum(dataset)) / len(dataset)

        result.append((x_datapoint, meanwait))

        print('Patient mean waiting time per timepoint %s : %.03f' % (x_datapoint, meanwait))

    return result

def main():

    # Open the input file here and use readlines() to load its rows into "datas":
    #
    # datas = ...

    ct = str(datetime.datetime.now())[0:23]

    print('%s --> Collecting patient wait time data from time series.\n' % ct)

    sorted_data = sort_data(datas)   # uses the template data defined in this script

    print('Processing data to obtain mean values')

    the_result = calc_mean(sorted_data)

    print('\nProcessing finished. Here is the result:\n\n%s' % the_result)

    # create a new file and store the result, or keep processing (e.g. plot to PDF with matplotlib)

if __name__ == '__main__':

    main()
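The script leaves loading the input file and storing the result to the reader (see the comments in `main`). A minimal standard-library sketch of those two steps follows; the helper names `load_rows` and `save_result`, the file names, and the CSV output format are assumptions, not part of the original answer. `load_rows` returns lines in the same shape as the `datas` test list, so its result could be passed to `sort_data`, and `save_result` writes `(minute, mean)` tuples like those returned by `calc_mean`.

```python
import csv
import os
import tempfile


def load_rows(path):
    """Read the raw input lines (same shape as the 'datas' list), skipping blank lines."""
    with open(path) as f:
        return [line.rstrip('\n') for line in f if line.strip()]


def save_result(path, result):
    """Write (minute, mean_wait) tuples to a CSV file with a header row."""
    with open(path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['minute', 'mean_wait'])
        for minute, mean_wait in result:
            writer.writerow([minute, '%.3f' % mean_wait])


# round trip through a temporary directory (hypothetical file names)
tmpdir = tempfile.mkdtemp()
in_path = os.path.join(tmpdir, 'waittimes.txt')
with open(in_path, 'w') as f:
    f.write('Time(minutes)       RemainingTime(minutes)\n420  :  5\n420  :  10\n')

rows = load_rows(in_path)
print(rows)

out_path = os.path.join(tmpdir, 'mean_wait.csv')
save_result(out_path, [(420, 7.5)])
with open(out_path, newline='') as f:
    written = list(csv.reader(f))
print(written)   # [['minute', 'mean_wait'], ['420', '7.500']]
```

Writing a plain CSV keeps the result easy to load later, for example into matplotlib for plotting.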
Votes: 0
Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/48881901
