I have a large df with the following structure:
Grade Price Group
A 2 apple
A 10 apple
A 9 apple
A 10 apple
B 8 apple
B 7 apple
B 6 apple
B 10 apple
A 12 berry
A 11 berry
A 11 berry
A 12 berry
A 10 berry
B 9 berry
B 9 berry
B 10 berry
B 11 berry
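I need to filter the df for each Group according to the conditions below. For example, if the Group is apple, its values must fall within the limits below, and rows that violate the limits for that Group should be excluded. The filter has to check every unique Group value and test whether the Price is within the range for the corresponding Grade.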
apple
A 9-10
B 7-8
berry
A 11-12
B 9-10
The output df should then contain only the rows that meet the criteria.
Grade Price Group
A 10 apple
A 9 apple
A 10 apple
B 8 apple
B 7 apple
A 12 berry
A 11 berry
A 11 berry
A 12 berry
B 9 berry
B 9 berry
B 10 berry
Currently, I filter the dataframe for each condition separately and concatenate the results.
a = df[(df['Group'] == 'apple') & (df['Grade'] == 'A') & (df['Price'].between(9, 10))]
b = df[(df['Group'] == 'apple') & (df['Grade'] == 'B') & (df['Price'].between(7, 8))]
res = pd.concat([a,b])
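However, writing a separate dataframe for each condition is not optimal.
Any efficient solution that excludes the rows not meeting the criteria would be helpful.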
Posted on 2022-09-22 07:47:20
Assuming the data OP shared in the question is df1 and the goal is to create a new dataframe, df2, the following filter will do the job
df2 = df1[(df1['Group'] == 'apple') & ((df1['Grade'] == 'A') & (df1['Price'].between(9, 10)) | (df1['Grade'] == 'B') & (df1['Price'].between(7, 8))) | (df1['Group'] == 'berry') & ((df1['Grade'] == 'A') & (df1['Price'].between(11, 12)) | (df1['Grade'] == 'B') & (df1['Price'].between(9, 10)))]
[Out]:
Grade Price Group
1 A 10 apple
2 A 9 apple
3 A 10 apple
4 B 8 apple
5 B 7 apple
8 A 12 berry
9 A 11 berry
10 A 11 berry
11 A 12 berry
13 B 9 berry
14 B 9 berry
15 B 10 berry
As per OP's request, the following function can be used
import numpy as np

def filter_df(df, groups, prices):
    # Create a list of conditions
    conditions = []
    # For each group
    for group in groups:
        # For each grade
        for grade in prices[group].keys():
            # Create a condition (between is inclusive on both ends)
            condition = (df['Group'] == group) & (df['Grade'] == grade) & (df['Price'].between(prices[group][grade][0], prices[group][grade][1]))
            # Append the condition to the list of conditions
            conditions.append(condition)
    # Keep the rows that satisfy at least one condition
    df = df[np.any(conditions, axis=0)]
    # Return the filtered dataframe
    return df
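and apply it to OP's use case as follows (basically, one has to pass a list with all the groups and a dictionary containing the groups, grades, and price ranges)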
df3 = filter_df(df1, ['apple', 'berry'], {'apple':{'A':[9, 10], 'B':[7, 8]}, 'berry':{'A':[11, 12], 'B':[9, 10]}})
[Out]:
Grade Price Group
1 A 10 apple
2 A 9 apple
3 A 10 apple
4 B 8 apple
5 B 7 apple
8 A 12 berry
9 A 11 berry
10 A 11 berry
11 A 12 berry
13 B 9 berry
14 B 9 berry
15 B 10 berry
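As a possible alternative to looping over conditions (this is not part of the answer above, only a sketch): the ranges can be stored in a small lookup dataframe and joined onto the data, so that a single vectorized comparison covers every Group and Grade. The names limits, low, and high are made up for this illustration, and the sketch assumes each (Group, Grade) pair appears exactly once in the lookup table.

```python
import pandas as pd

# Sample data from the question
df1 = pd.DataFrame({
    "Grade": list("AAAABBBB") + list("AAAAABBBB"),
    "Price": [2, 10, 9, 10, 8, 7, 6, 10, 12, 11, 11, 12, 10, 9, 9, 10, 11],
    "Group": ["apple"] * 8 + ["berry"] * 9,
})

# Hypothetical lookup table: one row per (Group, Grade) with inclusive bounds
limits = pd.DataFrame({
    "Group": ["apple", "apple", "berry", "berry"],
    "Grade": ["A", "B", "A", "B"],
    "low": [9, 7, 11, 9],
    "high": [10, 8, 12, 10],
})

# A left merge keeps one output row per row of df1, in the same order,
# as long as the keys in limits are unique (checked by validate)
merged = df1.merge(limits, on=["Group", "Grade"], how="left", validate="many_to_one")

# Rows with no matching limits get NaN bounds and therefore compare as False
mask = merged["Price"].between(merged["low"], merged["high"]).to_numpy()

# Index the original dataframe so its index labels are preserved
df_filtered = df1[mask]
print(df_filtered)
```

Adding a new Group or Grade then only means adding a row to limits, without touching the filtering code.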
Posted on 2022-09-22 08:16:59
You can create a dictionary for your conditions and use apply.
dct = {'apple': {'A': range(9, 11), 'B': range(7, 9)}, \
'berry': {'A': range(11, 13), 'B': range(9, 11)}}
df[df.apply(lambda x: x.Price in dct[x.Group][x.Grade] , axis=1)]
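Two caveats may be worth keeping in mind with the range-based lookup: a membership test such as 9.5 in range(9, 11) is False, so it only suits integer prices, and a Group or Grade missing from the dictionary raises a KeyError. A minimal variant, assuming inclusive (low, high) tuples instead of range objects, could look like this; bounds and in_bounds are names chosen for this sketch only.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Grade": list("AAAABBBB") + list("AAAAABBBB"),
    "Price": [2, 10, 9, 10, 8, 7, 6, 10, 12, 11, 11, 12, 10, 9, 9, 10, 11],
    "Group": ["apple"] * 8 + ["berry"] * 9,
})

# Inclusive (low, high) bounds per Group and Grade (hypothetical structure)
bounds = {
    "apple": {"A": (9, 10), "B": (7, 8)},
    "berry": {"A": (11, 12), "B": (9, 10)},
}

def in_bounds(row):
    # Look up the limits; unknown Group/Grade combinations yield None
    limits = bounds.get(row.Group, {}).get(row.Grade)
    # Rows without limits are excluded instead of raising KeyError
    return limits is not None and limits[0] <= row.Price <= limits[1]

filtered = df[df.apply(in_bounds, axis=1)]
print(filtered)
```

Like the original, this calls a Python function once per row, so it is likely to be slower than a vectorized filter on a large df.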
Posted on 2022-09-22 07:47:08
apples = [apple for apple in myList if apple.Group == 'apple']
Something along these lines could work; it is untested, but a list comprehension solves this kind of problem nicely.
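To make the list-comprehension idea concrete for a dataframe, one option is to build a boolean mask row by row and index the dataframe with it. This is only a sketch under the assumption that the limits are kept in a dictionary keyed by (Group, Grade); the name bounds is hypothetical.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Grade": list("AAAABBBB") + list("AAAAABBBB"),
    "Price": [2, 10, 9, 10, 8, 7, 6, 10, 12, 11, 11, 12, 10, 9, 9, 10, 11],
    "Group": ["apple"] * 8 + ["berry"] * 9,
})

# Inclusive (low, high) limits keyed by (Group, Grade) (hypothetical structure)
bounds = {
    ("apple", "A"): (9, 10),
    ("apple", "B"): (7, 8),
    ("berry", "A"): (11, 12),
    ("berry", "B"): (9, 10),
}

# One boolean per row: True when the pair has limits and the price lies within them
mask = [
    (group, grade) in bounds
    and bounds[(group, grade)][0] <= price <= bounds[(group, grade)][1]
    for group, grade, price in zip(df["Group"], df["Grade"], df["Price"])
]

result = df[mask]
print(result)
```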
https://stackoverflow.com/questions/73810978