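I have a function that operates on the rows of a CSV file, adding values from different cells to dictionaries depending on whether certain conditions are met: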

    import pandas as pd

    # args (defined elsewhere, not shown) supplies the CSV paths and the stock item IDs of interest
    df = pd.concat([pd.read_csv(filename) for filename in args.csv], ignore_index=True)
    ID_Use_Totals = {}
    ID_Order_Dates = {}
    ID_Received_Dates = {}
    ID_Refs = {}
    IDs = args.ID

    def TSQs(row):
        global ID_Use_Totals, ID_Order_Dates, ID_Received_Dates
        # Skip rows whose stock item was not requested
        if row['Stock Item'] not in IDs:
            pass
        else:
            # Orders: store one {Ref: date} dict per row, grouped by stock item
            if row['Action'] in ['Order/Resupply', 'Cons. Purchase']:
                if row['Stock Item'] not in ID_Order_Dates:
                    ID_Order_Dates[row['Stock Item']] = [{row['Ref']: pd.to_datetime(row['TransDate'])}]
                else:
                    ID_Order_Dates[row['Stock Item']].append({row['Ref']: pd.to_datetime(row['TransDate'])})
            # Receipts: same structure as orders
            elif row['Action'] == 'Received':
                if row['Stock Item'] not in ID_Received_Dates:
                    ID_Received_Dates[row['Stock Item']] = [{row['Ref']: pd.to_datetime(row['TransDate'])}]
                else:
                    ID_Received_Dates[row['Stock Item']].append({row['Ref']: pd.to_datetime(row['TransDate'])})
            # Usage: collect the quantities per stock item
            elif row['Action'] == 'Use':
                if row['Stock Item'] in ID_Use_Totals:
                    ID_Use_Totals[row['Stock Item']].append(row['Qty'])
                else:
                    ID_Use_Totals[row['Stock Item']] = [row['Qty']]
            else:
                pass

Currently, I'm doing:
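
    for index, row in df.iterrows():
        TSQs(row)

However, for a 40,000-row CSV file, timer() reports 70 to 90 seconds.
I'd like to know the fastest way to do this over the whole dataframe (which may have hundreds of thousands of rows).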
Posted on 2020-07-30 12:44:30
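You can use the apply function. The code would look like this:

    df.apply(TSQs, axis=1)

Here, with axis=1, each row is passed to TSQs as a pd.Series, and inside the function you can index it like row["Ref"] to get that row's values. Note that apply with axis=1 still calls TSQs once per row in Python, so it is not a truly vectorized operation. It is often somewhat faster than the explicit iterrows loop, but the gain is usually modest.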
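For dataframes with hundreds of thousands of rows, a large share of the time likely goes to per-row work such as calling pd.to_datetime on every single row. Below is a minimal sketch of an alternative, assuming the column names from the question ('Stock Item', 'Action', 'Ref', 'TransDate', 'Qty'): it filters once with isin, parses all dates in a single call, and builds dictionaries of the same shape as the original ones. The names collect_transactions and dates_by_item are hypothetical, and the sketch has not been timed against the data described above.

```python
import pandas as pd

def collect_transactions(df, ids):
    """Build the use/order/received dictionaries without a per-row function call.

    Hypothetical helper: `ids` plays the role of `IDs` in the question.
    """
    # Keep only the requested stock items and parse every date in one call.
    sub = df[df["Stock Item"].isin(ids)].copy()
    sub["TransDate"] = pd.to_datetime(sub["TransDate"])

    def dates_by_item(mask):
        # One {Ref: Timestamp} dict per matching row, grouped by stock item.
        rows = sub[mask]
        out = {}
        for item, ref, date in zip(rows["Stock Item"], rows["Ref"], rows["TransDate"]):
            out.setdefault(item, []).append({ref: date})
        return out

    order_dates = dates_by_item(sub["Action"].isin(["Order/Resupply", "Cons. Purchase"]))
    received_dates = dates_by_item(sub["Action"] == "Received")

    # Quantities used per stock item, kept as lists in row order.
    use = sub[sub["Action"] == "Use"]
    use_totals = use.groupby("Stock Item", sort=False)["Qty"].apply(list).to_dict()
    return use_totals, order_dates, received_dates
```

It would be called as `ID_Use_Totals, ID_Order_Dates, ID_Received_Dates = collect_transactions(df, IDs)`. Iterating over plain columns with zip avoids building a pd.Series for each row, which is the main overhead shared by iterrows and apply with axis=1.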
https://stackoverflow.com/questions/63173294