我有两个CSV文件:
dump1.csv
Primary attribute1 attribute2 attribute3
rowA 3992372839 3778338494 9180339101
rowB 8291392010 3739203044 2840493019
dump2.csv
Primary attribute1 attribute2 attribute3
rowA 8911849302 9018383910 8103293202
rowB 7310393021 8301940301 7209301030
现在我想创建第三个CSV文件dump1_dump2.csv
在第三个文件中逐个复制具有相同标题的CSV和dump2.csv
dump1_
和dump2_
的行名,以区分它们。第三个CSV dump1_dump2.csv
应该如下所示:
Primary attribute1 attribute2 attribute3
dump1_rowA 3992372839 3778338494 9180339101
dump2_rowA 8911849302 9018383910 8103293202
change % 123.22 138.68 -11.73
dump1_rowB 8291392010 3739203044 2840493019
dump2_rowB 7310393021 8301940301 7209301030
change % -11.83 122.02 153.80
我写了一小段代码:
import csv
f1 = open('dump2.csv', 'r')
f2 = open('dump2.csv', 'r')
f3 = open('results.csv', 'w')
c1 = csv.reader(f1)
c2 = csv.reader(f2)
c3 = csv.writer(f3)
finallist = list(c2)
for hosts_row in c1:
row = 1
for final_row in finallist:
results_row = hosts_row
if hosts_row[0] == final_row[0]:
# copy line from dump1.csv, paste it to dump1_dump2.csv and modify hosts_row[0] by appending dump1 & dump2 respectively
results_row_1 = #new row containing content of dump1
results_row_2 = #new row containing content of dump2
break
row = row + 2
c3.writerow(results_row_1)
c3.writerow(results_row_2)
f1.close()
f2.close()
f3.close()
我使用的是Python 3.6。
任何帮助都将不胜感激!
发布于 2018-09-03 20:18:59
欢迎来到Stackoverflow。
回答您的问题,您可以使用pandas进行更容易和更快的评估。
import pandas as pd
df1 = pd.read_csv("dump1.csv")
df2 = pd.read_csv("dump2.csv")
df1["Primary"] = "dump1_"+df1["Primary"].astype(str)
df2["Primary"] = "dump2_"+df2["Primary"].astype(str)
df3 = pd.concat([df1,df2]).sort_index()
df4 = pd.DataFrame(columns=["Primary","attribute1","attribute2","attribute3"])
indx = df3.index.value_counts()
for i, j in indx.iteritems():
if j == 2:
tempdf = df3.loc[1].append({"attribute1":((df3.loc[i]["attribute1"].astype(int).diff()/df3.loc[i]["attribute1"].astype(int).values[0])*100).values[1],"attribute2":((df3.loc[i]["attribute2"].astype(int).diff()/df3.loc[i]["attribute2"].astype(int).values[0])*100).values[1], "attribute3":((df3.loc[i]["attribute3"].astype(int).diff()/df3.loc[i]["attribute3"].astype(int).values[0])*100).values[1], "Primary":"Change %"}, ignore_index=True)
df4 = tempdf.append(df4)
df4.to_csv("dump1_dump2.csv", index=False)
输出:
Primary attribute1 attribute2 attribute3
dump1_rowA 3992372839 3778338494 9180339101
dump2_rowA 8911849302 9018383910 8103293202
Change % 123.222 138.687 -11.7321
dump1_rowB 8291392010 3739203044 2840493019
dump2_rowB 7310393021 8301940301 7209301030
Change % -11.8315 122.024 153.805
希望这能解决你的问题,如果不能,请让我知道。谢谢
https://stackoverflow.com/questions/52145057
复制相似问题