My data file (daily rainfall data) has the following format:
df = Year Month Day01 Day02 Day03 ..............Day31
1970 1 0 0 20 3.5
1970 2 0 0 20 3.5
1970 3 0 0 20 3.5
... . . . .. ...
... . . . .. ...
I want to read the data above into a date format:
df = date (year-month-day)
Please help.
You can find the data here: https://docs.google.com/spreadsheets/d/1sPRiRDYmWyTuuhks3CDWXj0eNcddsJopUNfjEAlSI-w/edit?usp=sharing
Posted on 2020-05-17 14:59:04
I assume you already have a DataFrame in the following format:
YEAR MN DRF01 DRF02 DRF03 DRF04 DRF05 DRF06 DRF07 DRF08 DRF09 DRF10 DRF11 DRF12 DRF13 DRF14 DRF15 DRF16 DRF17 DRF18 DRF19 DRF20 DRF21 DRF22 DRF23 DRF24 DRF25 DRF26 DRF27 DRF28 DRF29 DRF30 DRF31
1971 1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 22.0 0.0 0.0 4.6
1971 2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 NaN NaN NaN
1971 3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1971 4 0.0 0.0 0.0 0.0 0.0 0.0 25.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 8.0 0.0 0.0 0.0 0.0 2.0 0.0 0.0 8.6 0.0 0.0 0.0 7.4 24.0 0.0 NaN
1971 5 3.6 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 40.0 0.0 0.0 106.0 0.0 2.0 0.0 0.0 0.0 0.0 3.0 6.6 0.0 0.0 22.6 26.8 12.8
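You can get what you want by stacking the columns. Ideally, the frame being stacked should contain only the columns to stack, with the remaining columns moved into the index: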
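import pandas as pd

# Move YEAR and MONTH into the index so that only the day columns are stacked.
result = (
    df.rename(columns={'MN': 'MONTH'})
      .set_index(['YEAR', 'MONTH'])
      .rename_axis('DAY', axis=1)
      .stack()
      .to_frame('RAINFALL')
      .reset_index()
)
# 'DRF01' -> 1: keep the last two characters of the column name as the day number.
result['DAY'] = result['DAY'].str[-2:].astype('int')
# Assemble the date from the YEAR, MONTH and DAY columns.
result['DATE'] = pd.to_datetime(result[['YEAR', 'MONTH', 'DAY']])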
Result:
YEAR MONTH DAY RAINFALL DATE
1971 1 1 0.0 1971-01-01
1971 1 2 0.0 1971-01-02
1971 1 3 0.0 1971-01-03
1971 1 4 0.0 1971-01-04
1971 1 5 0.0 1971-01-05
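For anyone who wants to try the stacking approach without the original spreadsheet, here is a minimal self-contained sketch. The sample frame (two months, three day columns) is made up for illustration; only the column naming follows the format assumed above. The explicit `.dropna()` and the final sort are additions of mine, not part of the answer: depending on the pandas version, `stack()` may or may not drop the NaN cells of non-existent days on its own, so dropping them explicitly should give the same result either way.

```python
import numpy as np
import pandas as pd

# Hypothetical sample in the wide layout: one row per month, one column per day.
# NaN marks a day with no value (here simply to show how missing cells are handled).
df = pd.DataFrame({
    "YEAR": [1971, 1971],
    "MN": [1, 2],
    "DRF01": [0.0, 1.5],
    "DRF02": [22.0, 0.0],
    "DRF03": [4.6, np.nan],
})

stacked = (
    df.rename(columns={"MN": "MONTH"})
      .set_index(["YEAR", "MONTH"])
      .rename_axis("DAY", axis=1)
      .stack()
      .dropna()                # make the NaN handling explicit
      .to_frame("RAINFALL")
      .reset_index()
)

# "DRF01" -> 1
stacked["DAY"] = stacked["DAY"].str[-2:].astype(int)
stacked["DATE"] = pd.to_datetime(stacked[["YEAR", "MONTH", "DAY"]])

# A date-indexed series is usually the most convenient form for time-series work.
rain = stacked.set_index("DATE")["RAINFALL"].sort_index()
print(rain)
```

With `rain` indexed by date, things like `rain.resample("MS").sum()` for monthly totals become one-liners.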
Posted on 2020-05-17 15:25:13
Using df.melt may be more straightforward:
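import pandas as pd

# Small sample frame in the wide layout (one column per day).
df = pd.DataFrame({'Year': {0: 1910, 1: 1910, 2: 1911},
                   'Month': {0: 1, 1: 1, 2: 2},
                   'Day 1': {0: 1, 1: 3, 2: 5},
                   'Day 2': {0: 2, 1: 4, 2: 6}})
print(df)

# Melt every "Day ..." column into (Day, Rain) pairs, keeping Year and Month.
day_columns = [i for i in df.columns if 'Day' in i]
df = pd.melt(df, id_vars=['Year', 'Month'], value_vars=day_columns,
             var_name='Day', value_name='Rain')
# Strip the "Day " prefix so only the day number remains.
df['Day'] = df['Day'].str.replace('Day ', '').astype(int)
df['Date'] = pd.to_datetime(df[['Year', 'Month', 'Day']])
print(df)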
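One thing to watch with real data: unlike the sample above, a full 31-column file contains cells for days that do not exist (30 February, 31 April, and so on), and `melt` keeps those rows. Below is a rough sketch of one way to deal with that, assuming those cells are NaN in the file. The sample frame and the names `long_df`, `Day 28` and `Day 30` are invented for the example. `errors="coerce"` makes `pd.to_datetime` return `NaT` for impossible dates instead of raising, and those rows are then dropped.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: February has no day 30, so that cell is NaN.
df = pd.DataFrame({
    "Year": [1971, 1971],
    "Month": [1, 2],
    "Day 28": [0.0, 3.0],
    "Day 30": [1.2, np.nan],
})

day_columns = [c for c in df.columns if c.startswith("Day")]
long_df = df.melt(id_vars=["Year", "Month"], value_vars=day_columns,
                  var_name="Day", value_name="Rain")
long_df["Day"] = long_df["Day"].str.replace("Day ", "", regex=False).astype(int)

# Impossible dates such as 1971-02-30 become NaT rather than raising an error.
long_df["Date"] = pd.to_datetime(long_df[["Year", "Month", "Day"]], errors="coerce")

# Drop the non-existent days and put the rows in chronological order.
long_df = (long_df.dropna(subset=["Date"])
                  .sort_values("Date")
                  .reset_index(drop=True))
print(long_df)
```

Dropping on `Date` rather than on `Rain` keeps genuine missing measurements on valid dates in the table, which may or may not be what you want.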
https://stackoverflow.com/questions/61853631