Pandas: how to combine suffixed columns after a merge?

Stack Overflow user

Asked 2018-01-09 08:01:45


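Background

There is a CSV containing data in the following format:
- timestamp, col 1, col 2, etc.

A Python script periodically queries an API and appends the results to this CSV.
To do this, the CSV is loaded into one dataframe and the data to be added is loaded into another. The timestamp column is the index of each.
Sometimes when the CSV is updated, for reasons I don't know, it already contains a row with the same timestamp as one of the rows in the new data.
When such rows exist, they share a timestamp (one in the CSV and one in the new data), and their values in the other columns always differ.

Task

How can I use Pandas to combine these rows that share a timestamp into a single row?

When the two dataframes are combined with the merge function, the overlapping columns are kept separately, distinguished by suffixes (_x and _y).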
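The suffix behaviour described above can be reproduced with a small sketch. The frames below are hypothetical stand-ins for the CSV contents and the new API data, and the second timestamp ("9/01/2018 12:30") is invented for illustration; only the column names come from the question.

```python
import pandas as pd

nan = float("nan")

# Hypothetical stand-in for the data already stored in the CSV.
historical_df = pd.DataFrame(
    {"col 1": [3610.0, 1234.0], "col 2": [2420.29, nan]},
    index=pd.Index(["9/01/2018 12:15", "9/01/2018 12:30"], name="timestamp"),
)

# Hypothetical stand-in for new API data that reuses an existing timestamp.
new_data = pd.DataFrame(
    {"col 1": [nan], "col 2": [5678.0]},
    index=pd.Index(["9/01/2018 12:30"], name="timestamp"),
)

# Merging on the index keeps both versions of every overlapping column,
# distinguished by the default suffixes _x (left) and _y (right).
merged = historical_df.merge(
    new_data, how="outer", left_index=True, right_index=True
)
print(merged.columns.tolist())
```

The shared timestamp does end up on a single row, but each value sits in a separate suffixed column rather than in "col 1" and "col 2".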

Example

This is the code I have:

# this is an example of the code I have but it isn't working properly
from datetime import datetime
import pandas as pd

csv_data = 'example.csv'

timenow = datetime.now()

# reads dataframe from example.csv
historical_df = pd.read_csv(csv_data)
historical_df.set_index('timestamp', inplace=True)

# gets new data from API ### 1st API call
new_data = pd.DataFrame([timenow, 1234]).T
new_data.columns = ['timestamp', 'col 1']

new_data.set_index('timestamp', inplace=True)

# Concat current data to historical DF and dump to excel
updated_df = pd.concat([historical_df, new_data])

# Save to CSV
updated_df.to_csv(csv_data)


### 2nd API call with the same time
# reads dataframe from example.csv
historical_df = pd.read_csv(csv_data)
historical_df.set_index('timestamp', inplace=True)

# gets new data from API
new_data = pd.DataFrame([timenow, 5678]).T
new_data.columns = ['timestamp', 'col 2']

new_data.set_index('timestamp', inplace=True)

# Concat current data to historical DF and dump to excel
updated_df = pd.concat([historical_df, new_data])

# Save to CSV
updated_df.to_csv(csv_data)

Contents of "example.csv":

timestamp          col 1    col 2
9/01/2018 12:15    3610     2420.29

Example of the desired output:

timestamp            col 1      col 2
9/01/2018 12:15      3610       2420.29
<the new timestamp>  1234       5678

Result of using concat:

timestamp            col 1      col 2
9/01/2018 12:15      3610       2420.29
<the new timestamp>  1234       
<the new timestamp>             5678

Result of using merge:

timestamp            col 1_x   col 1_y   col 2_x    col 2_y
9/01/2018 12:15      3610                2420.29
<the new timestamp>  1234                           5678

Note that this example shows only two non-timestamp columns, but the actual data has 15 columns.

pandas

pandas-groupby

1 Answer

Stack Overflow user

Accepted answer

Answered 2018-01-09 08:46:24

As COLDSPEED mentioned in the comments, the answer is to use combine_first instead of merge:

df_with_merged_rows = df_with_new_data.combine_first(df_created_from_csv)
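A minimal sketch of how this might look with data shaped like the question's example. The frame contents and the second timestamp ("9/01/2018 12:30") are assumptions made for illustration; the variable names follow the line above.

```python
import pandas as pd

nan = float("nan")

# Hypothetical CSV contents: the second row already holds "col 1"
# for a timestamp that the new data will reuse.
df_created_from_csv = pd.DataFrame(
    {"col 1": [3610.0, 1234.0], "col 2": [2420.29, nan]},
    index=pd.Index(["9/01/2018 12:15", "9/01/2018 12:30"], name="timestamp"),
)

# Hypothetical new data: same timestamp, but a different column is filled.
df_with_new_data = pd.DataFrame(
    {"col 2": [5678.0]},
    index=pd.Index(["9/01/2018 12:30"], name="timestamp"),
)

# combine_first aligns both frames on index and columns, then fills the
# missing values of the calling frame with values from the other frame.
df_with_merged_rows = df_with_new_data.combine_first(df_created_from_csv)
print(df_with_merged_rows)
```

Rows with the same timestamp collapse into one, and no suffixed columns are created. Where both frames hold a non-null value in the same cell, the calling frame (here the new data) takes precedence, so the order of the two frames matters if values can conflict.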


Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/48172146
