Merging Pandas DataFrames on DateTime

Stack Overflow user

Asked 2020-03-28 02:52:18


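I have several CSV files containing data for different points over the same time period. When I try to merge the datasets, I get a DataFrame in which the rows for the first point are followed by the rows for the second point. I would like to see the values for every point in the same row (given that they occur at the same time).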

    for file in envData:
        tmp_df = pd.read_csv(f'{enviormentDataPath}/{eventFolder}/{file}')
        tmp_df.set_index("time [UTC]", inplace=True)
        station = tmp_df.values[0][1]

        for header in list(tmp_df):
            if 'time' not in header:
                tmp_df = tmp_df.rename(columns={header: f"{station}_{header}"})

        if env_df is None:
            env_df = tmp_df
        else:
            env_df = pd.merge(env_df, tmp_df, how='outer', on='time [UTC]')

Sample CSV 1:

time [utc], u [kt], v [kt]
2015-10-17 10:00:00, 12, -14
2015-10-17 11:00:00, 13, -13

Sample CSV 2:

time [utc], u [kt], v [kt]
2015-10-17 10:00:00, 11, -12
2015-10-17 11:00:00, 10, -13

However, the command env_df=pd.merge(env_df,tmp_df, how='outer', on='time [UTC]') just produces a table like the following:

time[utc]            sample1_u sample1_v sample2_u sample2_v
2015-10-17 10:00:00  12        -14       NaN       NaN
2015-10-17 11:00:00  13        -13       NaN       NaN
2015-10-17 10:00:00  NaN       NaN       11        -12
2015-10-17 11:00:00  NaN       NaN       10        -13

Any help or suggestions would be greatly appreciated.

python

pandas

1 Answer

Stack Overflow user

Accepted answer

Answered 2020-03-28 03:31:04

I could not reproduce your problem when merging. Could it be that your "time [utc]" column is not in datetime format?
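One way to check this is to inspect the dtype of the key column in each frame and convert it explicitly before merging. The following is a minimal sketch under stated assumptions, not a diagnosis of the original files: the CSV text is an in-memory copy of the two samples from the question, and the names `csv_1`, `csv_2`, `df_1`, `df_2`, and `merged` are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-ins for the two sample CSV files.
csv_1 = io.StringIO(
    "time [utc], u [kt], v [kt]\n"
    "2015-10-17 10:00:00, 12, -14\n"
    "2015-10-17 11:00:00, 13, -13\n"
)
csv_2 = io.StringIO(
    "time [utc], u [kt], v [kt]\n"
    "2015-10-17 10:00:00, 11, -12\n"
    "2015-10-17 11:00:00, 10, -13\n"
)

# skipinitialspace=True drops the blank after each comma, so the
# column names are read as "u [kt]" rather than " u [kt]".
df_1 = pd.read_csv(csv_1, skipinitialspace=True)
df_2 = pd.read_csv(csv_2, skipinitialspace=True)

# Without parse_dates the key column is read as plain text.
print(df_1["time [utc]"].dtype)

# Convert the key column in both frames so the keys have the same dtype.
df_1["time [utc]"] = pd.to_datetime(df_1["time [utc]"])
df_2["time [utc]"] = pd.to_datetime(df_2["time [utc]"])
print(df_1["time [utc]"].dtype)

# Matching timestamps now end up in the same row.
merged = pd.merge(
    df_1, df_2, on="time [utc]", how="outer", suffixes=("_1", "_2")
)
print(merged)
```

If the dtypes printed for the two real frames differ, or the timestamp strings carry stray whitespace, the keys will not line up in the way the question expects.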

Using Python 3.8 and pandas 1.0.3:

    # import pandas
    import pandas as pd
    # read sample 1
    sample_1_df = pd.read_csv("sample_1.csv", parse_dates=['time [utc]'], infer_datetime_format=True)
    # read sample 2
    sample_2_df = pd.read_csv("sample_2.csv", parse_dates=['time [utc]'], infer_datetime_format=True)
    # Show sample 1
    sample_1_df.info()
    """
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 2 entries, 0 to 1
    Data columns (total 3 columns):
     #   Column      Non-Null Count  Dtype
    ---  ------      --------------  -----
     0   time [utc]  2 non-null      datetime64[ns]
     1    u [kt]     2 non-null      int64
     2    v [kt]     2 non-null      int64
    dtypes: datetime64[ns](1), int64(2)
    memory usage: 176.0 bytes
    """
    # Show sample 2 df
    sample_2_df.info()
    """
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 2 entries, 0 to 1
    Data columns (total 3 columns):
     #   Column      Non-Null Count  Dtype
    ---  ------      --------------  -----
     0   time [utc]  2 non-null      datetime64[ns]
     1    u [kt]     2 non-null      int64
     2    v [kt]     2 non-null      int64
    dtypes: datetime64[ns](1), int64(2)
    memory usage: 176.0 bytes
    """
    # Merge sample_1 and sample_2 on the time [utc] column
    pd.merge(sample_1_df, sample_2_df, on='time [utc]')
    """
               time [utc]   u [kt]_x   v [kt]_x   u [kt]_y   v [kt]_y
    0 2015-10-17 10:00:00         12        -14         11        -12
    1 2015-10-17 11:00:00         13        -13         10        -13
    """
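For more than two files, the same idea can be extended to a loop that prefixes each file's columns with a station name, as the question attempts. The sketch below is one possible approach and not the asker's or the answerer's code: it swaps the repeated `pd.merge` for `pd.concat(..., axis=1)` on a datetime index, and the station names `sample1` and `sample2` and the in-memory CSV text are assumptions for illustration.

```python
import io

import pandas as pd

# Hypothetical station names mapped to in-memory copies of the sample CSVs.
sources = {
    "sample1": (
        "time [utc], u [kt], v [kt]\n"
        "2015-10-17 10:00:00, 12, -14\n"
        "2015-10-17 11:00:00, 13, -13\n"
    ),
    "sample2": (
        "time [utc], u [kt], v [kt]\n"
        "2015-10-17 10:00:00, 11, -12\n"
        "2015-10-17 11:00:00, 10, -13\n"
    ),
}

frames = []
for station, text in sources.items():
    # Parse the time column as datetime and use it as the index.
    df = pd.read_csv(
        io.StringIO(text),
        skipinitialspace=True,
        parse_dates=["time [utc]"],
        index_col="time [utc]",
    )
    # Prefix every data column with the station name, e.g. "sample1_u [kt]".
    frames.append(df.add_prefix(f"{station}_"))

# Align all frames on the shared datetime index; join="outer" keeps
# timestamps that appear in only some of the files (filled with NaN).
env_df = pd.concat(frames, axis=1, join="outer")
print(env_df)
```

Because the time column is the index here, the prefix is applied only to the data columns, and `pd.concat` requires each file's timestamps to be unique.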