Q: Converting a table from wide to long format, splitting date ranges by year

Stack Overflow user

Asked on 2019-06-02 04:52:14

I have a table that looks like this:

import pandas as pd

temp = [['K98R', 'AB',34,'2010-07-27', '2013-08-17', '2008-03-01', '2011-05-02', 44],['S33T','ES',55, '2009-07-23', '2012-03-12', '2010-09-17', '', 76]]
Data = pd.DataFrame(temp,columns=['ID','Initials','Age', 'Entry','Exit','Event1','Event2','Weight'])

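As you can see in the table above, each patient has an entry date and an exit date, plus dates for event 1 and event 2. The second patient has no date for event 2 because that event did not occur. Note also that the first patient's Event1 happened before the entry date.

What I am trying to achieve is the following:
1. Split the time between entry and exit into years.
2. Convert the wide format to long format, with one row per year.
3. Check whether events 1 and 2 occurred during the period covered by each row.

To explain further, this is the output I am after.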

ID    Initials  Age   Entry       Exit     Event1   Event2 Weight
K98R    AB       34 27/07/2010  31/12/2010  1       0       44
K98R    AB       35 1/01/2011   31/12/2011  1       1       44
K98R    AB       36 1/01/2012   31/12/2012  1       1       44
K98R    AB       37 1/01/2013   17/08/2013  1       1       44
S33T    ES       55 23/07/2009  31/12/2009  0       0       76
S33T    ES       56 1/01/2010   31/12/2010  1       0       76
S33T    ES       57 1/01/2011   31/12/2011  1       0       76
S33T    ES       58 1/01/2012   12/03/2012  1       0       76

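What you will notice here is that the period between entry and exit is split into one row per patient per year. The event columns are now coded as 0 (the event has not yet happened) or 1 (the event has happened), and once an event has happened the 1 is carried forward to all later years.

The patient's age increases by one in each successive row.

The patient ID and initials stay the same, and so does the weight.

Can anyone help? Thanks.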

python

pandas

Stack Overflow user

Accepted answer

Posted on 2019-06-02 08:17:30

Start with the number of years between entry and exit:

# df is the question's DataFrame (called Data there)
df = Data.copy()
# Convert to datetime (the empty Event2 string becomes NaT)
df.Entry = pd.to_datetime(df.Entry)
df.Exit = pd.to_datetime(df.Exit)
df.Event1 = pd.to_datetime(df.Event1)
df.Event2 = pd.to_datetime(df.Event2)
# Count the calendar years touched, entry and exit years included
df['Years_Between'] = df.Exit.dt.year - df.Entry.dt.year + 1

# printing the df will provide the following:

    ID  Initials    Age Entry   Exit    Event1  Event2  Weight  Years_Between
0   K98R    AB  34  2010-07-27  2013-08-17  2008-03-01  2011-05-02  44  4
1   S33T    ES  55  2009-07-23  2012-03-12  2010-09-17  NaT 76  4
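One detail worth checking at this step is that elapsed time rounded up to whole years is not always the same as the number of calendar years a stay touches, and it is the calendar-year count that determines how many rows the long format needs. A minimal sketch using the second patient's dates from the question (the variable names are chosen here purely for illustration):

```python
import math
import pandas as pd

entry = pd.Timestamp('2009-07-23')
exit_ = pd.Timestamp('2012-03-12')

# Elapsed time, rounded up to whole years
elapsed_years = math.ceil((exit_ - entry).days / 365)

# Calendar years touched by the stay: 2009, 2010, 2011 and 2012
calendar_years = exit_.year - entry.year + 1

print(elapsed_years, calendar_years)
```

For this patient the two counts differ by one, so a loop driven by the rounded elapsed time would miss a calendar year.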

Loop over the data and create a new row for each year:

new_data = []

for idx, row in df.iterrows():

  first_year = row['Entry'].year

  # One iteration per calendar year, from the entry year through the exit year
  for year in range(first_year, row['Exit'].year + 1):

    new_entry = pd.Timestamp(year=year, month=1, day=1)
    new_exit = pd.Timestamp(year=year, month=12, day=31)

    # Clip the first and last periods to the actual entry and exit dates
    record = {'Entry': max(row['Entry'], new_entry), 'Exit': min(row['Exit'], new_exit)}

    for col in ['ID', 'Initials', 'Event1', 'Event2', 'Weight']:
      record[col] = row[col]

    # Age goes up by one for each calendar year after the entry year
    record['Age'] = row['Age'] + (year - first_year)

    new_data.append(record)

Create a new DataFrame and compare the dates:

df_new = pd.DataFrame(new_data, columns = ['ID','Initials','Age', 'Entry','Exit','Event1','Event2','Weight'])
# 1 if the event happened on or before the end of the period, else 0 (NaT compares as False)
df_new['Event1'] = (df_new.Event1 <= df_new.Exit).astype(int)
df_new['Event2'] = (df_new.Event2 <= df_new.Exit).astype(int)

# printing df_new will provide:
    ID  Initials    Age Entry   Exit    Event1  Event2  Weight
0   K98R    AB  34  2010-07-27  2010-12-31  1   0   44
1   K98R    AB  35  2011-01-01  2011-12-31  1   1   44
2   K98R    AB  36  2012-01-01  2012-12-31  1   1   44
3   K98R    AB  37  2013-01-01  2013-08-17  1   1   44
4   S33T    ES  55  2009-07-23  2009-12-31  0   0   76
5   S33T    ES  56  2010-01-01  2010-12-31  1   0   76
6   S33T    ES  57  2011-01-01  2011-12-31  1   0   76
7   S33T    ES  58  2012-01-01  2012-03-12  1   0   76
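As an alternative to an explicit row loop, the same reshaping could probably be done with `DataFrame.explode`. The following is only a sketch under the assumptions of the question's sample data; the helper name `split_by_year` and the temporary `Year` column are hypothetical and not part of the original answer.

```python
import pandas as pd

def split_by_year(df):
    """Return one row per patient per calendar year between Entry and Exit."""
    out = df.copy()
    for col in ['Entry', 'Exit', 'Event1', 'Event2']:
        out[col] = pd.to_datetime(out[col])  # empty strings become NaT

    # Build the list of calendar years per patient, then expand to one row per year
    out['Year'] = [list(range(a.year, b.year + 1))
                   for a, b in zip(out['Entry'], out['Exit'])]
    out = out.explode('Year').reset_index(drop=True)
    out['Year'] = out['Year'].astype(int)

    # Age grows by one for every calendar year after the entry year
    out['Age'] = out['Age'] + (out['Year'] - out['Entry'].dt.year)

    # Clip each calendar year to the patient's actual entry and exit dates
    year_start = pd.to_datetime(out['Year'].astype(str) + '-01-01')
    year_end = pd.to_datetime(out['Year'].astype(str) + '-12-31')
    out['Entry'] = out['Entry'].where(out['Entry'] > year_start, year_start)
    out['Exit'] = out['Exit'].where(out['Exit'] < year_end, year_end)

    # 1 if the event happened on or before the period's end, else 0 (NaT gives 0)
    for col in ['Event1', 'Event2']:
        out[col] = (out[col] <= out['Exit']).astype(int)

    return out.drop(columns='Year')

# Sample data copied from the question
temp = [['K98R', 'AB', 34, '2010-07-27', '2013-08-17', '2008-03-01', '2011-05-02', 44],
        ['S33T', 'ES', 55, '2009-07-23', '2012-03-12', '2010-09-17', '', 76]]
data = pd.DataFrame(temp, columns=['ID', 'Initials', 'Age', 'Entry', 'Exit',
                                   'Event1', 'Event2', 'Weight'])
long_df = split_by_year(data)
print(long_df)
```

Comparing each event date with the period's end date, rather than checking whether it falls inside the period, is what carries a 1 forward into every later year.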


Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/56412274
