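I have several CSV files that contain data for different points over the same time period. When I try to merge the datasets, I get a DataFrame in which the rows for the first point are followed by the rows for the second point. I would like to see the values for every point in the same row (given that they occur at the same time).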
# envData, enviormentDataPath, eventFolder and env_df are defined earlier
for file in envData:
    tmp_df = pd.read_csv(f'{enviormentDataPath}/{eventFolder}/{file}')
    tmp_df.set_index("time [UTC]", inplace=True)
    station = tmp_df.values[0][1]
    # prefix every data column with the station name
    for header in list(tmp_df):
        if 'time' not in header:
            tmp_df = tmp_df.rename(columns={header: f"{station}_{header}"})
    if env_df is None:
        env_df = tmp_df
    else:
        env_df = pd.merge(env_df, tmp_df, how='outer', on='time [UTC]')
Sample CSV 1:
time [utc], u [kt], v [kt]
2015-10-17 10:00:00, 12, -14
2015-10-17 11:00:00, 13, -13
Sample CSV 2:
time [utc], u [kt], v [kt]
2015-10-17 10:00:00, 11, -12
2015-10-17 11:00:00, 10, -13
However, the command env_df=pd.merge(env_df,tmp_df, how='outer', on='time [UTC]')
just creates a table that looks like this:
time[utc]            sample1_u  sample1_v  sample2_u  sample2_v
2015-10-17 10:00:00  12         -14        NaN        NaN
2015-10-17 11:00:00  13         -13        NaN        NaN
2015-10-17 10:00:00  NaN        NaN        11         -12
2015-10-17 11:00:00  NaN        NaN        10         -13
Any help or suggestions would be greatly appreciated.
Posted on 2020-03-28 03:31:04
I cannot reproduce your problem when merging. Could it be that your "time [utc]" column is not in datetime format?
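One way to check this is sketched below. It is only an illustration under stated assumptions: the two CSV strings are hypothetical stand-ins for the real files, and the second one is deliberately written with timestamps that omit the seconds, which is a guess at one possible cause rather than something the question shows. When the key column is read as text, keys that differ in any character do not match, and an outer merge then stacks the rows instead of aligning them.

```python
import io

import pandas as pd

# Hypothetical stand-ins for the two CSV files. The second one deliberately
# writes its timestamps without seconds; this is an assumption made purely
# for illustration, not something shown in the question.
csv_1 = (
    "time [utc],u [kt],v [kt]\n"
    "2015-10-17 10:00:00,12,-14\n"
    "2015-10-17 11:00:00,13,-13\n"
)
csv_2 = (
    "time [utc],u [kt],v [kt]\n"
    "2015-10-17 10:00,11,-12\n"
    "2015-10-17 11:00,10,-13\n"
)

df_1 = pd.read_csv(io.StringIO(csv_1))
df_2 = pd.read_csv(io.StringIO(csv_2))

# Without parse_dates the key column is read as text, not as datetimes.
is_datetime = pd.api.types.is_datetime64_any_dtype(df_1["time [utc]"])
print(is_datetime)

# Merging on text keys: the strings differ, so no row finds a partner.
merged_raw = pd.merge(df_1, df_2, how="outer", on="time [utc]")
print(len(merged_raw))

# Convert both key columns to real datetimes, then merge again.
df_1["time [utc]"] = pd.to_datetime(df_1["time [utc]"])
df_2["time [utc]"] = pd.to_datetime(df_2["time [utc]"])
merged = pd.merge(df_1, df_2, how="outer", on="time [utc]")
print(len(merged))
```

In this sketch the first merge yields four rows padded with NaN, much like the table in the question, while the merge after pd.to_datetime yields two fully populated rows. If your own key column already reports a datetime dtype, the cause is probably something else.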
My attempt to reproduce it, using Python 3.8 and pandas 1.0.3:
# import pandas
import pandas as pd
# read sample 1
sample_1_df = pd.read_csv("sample_1.csv", parse_dates=['time [utc]'], infer_datetime_format=True)
# read sample 2
sample_2_df = pd.read_csv("sample_2.csv", parse_dates=['time [utc]'], infer_datetime_format=True)
# Show sample 1
sample_1_df.info()
"""
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 3 columns):
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   time [utc]  2 non-null      datetime64[ns]
 1   u [kt]      2 non-null      int64
 2   v [kt]      2 non-null      int64
dtypes: datetime64[ns](1), int64(2)
memory usage: 176.0 bytes
"""
# Show sample 2 df
sample_2_df.info()
"""
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 3 columns):
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   time [utc]  2 non-null      datetime64[ns]
 1   u [kt]      2 non-null      int64
 2   v [kt]      2 non-null      int64
dtypes: datetime64[ns](1), int64(2)
memory usage: 176.0 bytes
"""
# Merge sample_1 and sample_2 on the time [utc] column
pd.merge(sample_1_df, sample_2_df, on='time [utc]')
Out[17]:
            time [utc]  u [kt]_x  v [kt]_x  u [kt]_y  v [kt]_y
0  2015-10-17 10:00:00        12       -14        11       -12
1  2015-10-17 11:00:00        13       -13        10       -13
Note that the columns u [kt] and v [kt] now carry the suffixes _x and _y. These suffixes can be changed with the suffixes keyword argument of pd.merge.
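As a minimal sketch of the suffixes argument: the frames below are built in memory from the sample values in the question, and the suffix strings "_sample1" and "_sample2" are arbitrary names chosen for illustration.

```python
import pandas as pd

# In-memory copies of the two sample files from the question.
times = pd.to_datetime(["2015-10-17 10:00:00", "2015-10-17 11:00:00"])
sample_1_df = pd.DataFrame(
    {"time [utc]": times, "u [kt]": [12, 13], "v [kt]": [-14, -13]}
)
sample_2_df = pd.DataFrame(
    {"time [utc]": times, "u [kt]": [11, 10], "v [kt]": [-12, -13]}
)

# suffixes takes a pair: the first is appended to overlapping column names
# from the left frame, the second to those from the right frame.
# The suffix strings here are illustrative choices.
merged = pd.merge(
    sample_1_df,
    sample_2_df,
    on="time [utc]",
    suffixes=("_sample1", "_sample2"),
)

print(merged.columns.tolist())
```

Suffixes are only applied to column names that clash within a single merge, so when more than two files are combined in a loop, renaming the columns with a station prefix before merging (as the question already does) tends to be easier to keep track of.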
https://stackoverflow.com/questions/60896720