I have a dataframe named clients with 5000+ rows like the following:
ID Ordered Days
1 101565 131
2 202546 122
3 459863 78
4 328453 327
5 458975 -27
I am trying to create a loop that looks at Days and, when a value meets one of the criteria below, fills a new column with the matching label:
Days            NEW_COLUMNS
0-119           0-3 Months
120-209         4-6 Months
210-299         7-9 Months
300+            10+ Months
-280 to -196    Reach out clients
-195 to -104    Send promotion
-103 to -1      Close case
< -280          Plan
I have the following code, but so far it has not worked:
if(days <-280)(NEW_COLUMNS ="plan")
else if (days>-280 && days <-196)(NEW_COLUMNS=" Reach out clients";)
else if (days>-195 && days<-104)(NEW_COLUMNS =" Send promotion";)
else if (days>-103 && days <-1)(NEW_COLUMNS="Close case";)
else if(days> 0 && days <119)(NEW_COLUMNS="0-3 Months";)
else if(days > 120 && days <209)(NEW_COLUMNS="4-6 Mos";)
else if(days > 210 && day s<299)(NEW_COLUMNS="7-9 Mos";)
else if(days > 300)(NEW_COLUMNS="10+ Mos";)
In the end, I would like a table like this:
ID Ordered Days New_Columns
1 101565 131 4-6 Months
2 202546 122 4-6 Months
3 459863 78 0-3 Months
4 328453 327 10+ Months
5 458975 -27 Close case
Answered 2019-08-27 06:20:37
Please see the following code:
from pandas import DataFrame

# Sample data mirroring the first five rows of the question
Cars = {'ID': [1, 2, 3, 4, 5],
        'Ordered': [101565, 202546, 459863, 328453, 458975],
        'Days': [131, 122, 78, 327, -27]
        }
df = DataFrame(Cars, columns=['ID', 'Ordered', 'Days'])
if "New_Columns" not in df:
    df["New_Columns"] = ""
for index, row in df.iterrows():
    days = row['Days']
    val = ''
    # Bounds are inclusive so that edge values such as -280, 0, 119 or 300
    # also receive a label, matching the ranges listed in the question.
    if days < -280:
        val = 'Plan'
    elif -280 <= days <= -196:
        val = 'Reach out clients'
    elif -195 <= days <= -104:
        val = 'Send promotion'
    elif -103 <= days <= -1:
        val = 'Close case'
    elif 0 <= days <= 119:
        val = '0-3 Months'
    elif 120 <= days <= 209:
        val = '4-6 Months'
    elif 210 <= days <= 299:
        val = '7-9 Months'
    elif days >= 300:
        val = '10+ Months'
    df.at[index, 'New_Columns'] = val
print(df)
Output:
ID Ordered Days New_Columns
0 1 101565 131 4-6 Months
1 2 202546 122 4-6 Months
2 3 459863 78 0-3 Months
3 4 328453 327 10+ Months
4 5 458975 -27 Close case
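The same rules can also be factored into a small function and applied to the column, which keeps the thresholds in one place and avoids writing to the DataFrame row by row. This is only a minimal sketch: the names label_days and RULES are illustrative and not part of the answer above, and it assumes Days holds whole numbers.

```python
import pandas as pd

# Hypothetical helper table: (inclusive upper bound, label), in ascending order.
RULES = [
    (-281, "Plan"),
    (-196, "Reach out clients"),
    (-104, "Send promotion"),
    (-1, "Close case"),
    (119, "0-3 Months"),
    (209, "4-6 Months"),
    (299, "7-9 Months"),
]


def label_days(days):
    """Return the label of the first range whose upper bound is >= days."""
    for upper, label in RULES:
        if days <= upper:
            return label
    return "10+ Months"  # everything from 300 upwards


df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "Ordered": [101565, 202546, 459863, 328453, 458975],
    "Days": [131, 122, 78, 327, -27],
})
df["New_Columns"] = df["Days"].apply(label_days)
print(df)
```

Because the bounds are sorted, each value only has to be compared against one side of every range.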
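Answered 2019-08-27 06:22:56
A good way to solve this is to use numpy.select. This function takes a list of conditions and a list of choices and, for each element, picks the choice that corresponds to the first condition that is true.
One advantage is that, because the conditions are checked in order, only one side of each range needs to be tested against the Days value.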
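To see the first-match behaviour in isolation, here is a minimal sketch on a toy array. The labels "negative", "low", "mid" and "high" are made up for illustration and are not the labels from the question.

```python
import numpy as np

days = np.array([-300, -27, 78, 131, 327])

# Each condition states only an upper bound; smaller values have
# already been claimed by an earlier condition in the list.
conditions = [days < 0, days < 120, days < 210]
choices = ["negative", "low", "mid"]

# Elements that match no condition receive the default value.
labels = np.select(conditions, choices, default="high")
print(labels)
```

For example, 78 satisfies both days < 120 and days < 210, but it is labelled by the earlier of the two.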
Assuming the input data is called df:
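import numpy as np
import pandas as pd

# Conditions are evaluated in order and the first true one wins,
# so each range only needs its upper bound.
conditions = [
    df['Days'] < -280,
    df['Days'] < -195,
    df['Days'] < -103,
    df['Days'] < 0,
    df['Days'] < 120,
    df['Days'] < 210,
    df['Days'] < 300,
]
outputs = [
    "Plan", "Reach out clients", "Send promotion", "Close case",
    "0-3 Months", "4-6 Months", "7-9 Months",
]
# Anything not matched above (Days >= 300) falls through to the default.
# A string default also avoids mixing the integer default 0 with string
# choices, which some NumPy versions reject.
df['New_Columns'] = np.select(conditions, outputs, default="10+ Months")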
Result:
ID Ordered Days New_Columns
0 1 101565 131 4-6 Months
1 2 202546 122 4-6 Months
2 3 459863 78 0-3 Months
3 4 328453 327 10+ Months
4 5 458975 -27 Close case
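Since the ranges are contiguous, the same labelling could also be expressed as binning with pandas.cut. This is a sketch of an alternative technique, not part of the answer above; the names edges and labels are illustrative, and the edges assume the boundaries listed in the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "Ordered": [101565, 202546, 459863, 328453, 458975],
    "Days": [131, 122, 78, 327, -27],
})

# With right=False every bin is closed on the left and open on the right:
# [-inf, -280), [-280, -195), [-195, -103), [-103, 0), [0, 120), ...
edges = [-np.inf, -280, -195, -103, 0, 120, 210, 300, np.inf]
labels = [
    "Plan", "Reach out clients", "Send promotion", "Close case",
    "0-3 Months", "4-6 Months", "7-9 Months", "10+ Months",
]
df["New_Columns"] = pd.cut(df["Days"], bins=edges, labels=labels, right=False)
print(df)
```

The resulting column has a categorical dtype; calling .astype(str) on it gives plain strings if that is preferred.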
https://datascience.stackexchange.com/questions/58232