我有一个像这样的DataFrame。
df1 = pd.DataFrame(columns=['ID', 'Divide', 'Object', 'List'], data=[ ['A, B', 2, 20, [0, 5]], ['C, D', 2, 40, [10, 15, 35]], ['E, F', 2, 20, [11, 15]], ['G', 1, 10, [1, 5]], ['H', 1, 10, ''], ['I, J', 2, 20, ''] ])
| | ID | Divide | Object | List |
|---:|:-----|---------:|---------:|:-------------|
| 0 | A, B | 2 | 20 | [0, 5] |
| 1 | C, D | 2 | 40 | [10, 15, 35] |
| 2 | E, F | 2 | 20 | [11, 15] |
| 3 | G | 1 | 10 | [1, 5] |
| 4 | H | 1 | 10 | |
| 5 | I, J | 2 | 20 | |每个ID都需要有自己的行。但是,List列的数据属于每个ID。逻辑如下:
E 1250: Object/Divided 1E 232的E 232对象/拆分: Object - 1的条目。
最后一张表如下所示:
| | ID | Divide | Object | List |
|---:|:-----|---------:|---------:|:-------|
| 0 | A | 2 | 20 | 0, 5 |
| 1 | B | 2 | 20 | |
| 2 | C | 2 | 40 | 10, 15 |
| 3 | D | 2 | 40 | 35 |
| 4 | E | 2 | 20 | |
| 5 | F | 2 | 20 | 11, 15 |
| 6 | G | 1 | 10 | 1, 5 |
| 7 | H | 1 | 10 | |
| 8 | I | 2 | 20 | |
| 9 | J | 2 | 20 | |如果是列表,那么 lists 就可以用来将列表夷为平地。但是,我不知道该由谁来应用DataFrame中的计算逻辑来解析细节。谢谢
发布于 2021-04-30 19:29:16
你可以试试这个:
import pandas as pd
df = pd.DataFrame(columns=['ID', 'Divide', 'Object', 'List'], data=[ ['A, B', 2, 20, [0, 5]], ['C, D', 2, 40, [10, 15, 35]], ['E, F', 2, 20, [11, 15]], ['G', 1, 10, [1, 5]], ['H', 1, 10, ''], ['I, J', 2, 20, ''] ])
def split_list(lst, limit):
l1 = list()
l2 = list()
for e in lst:
if e <= limit:
l1.append(e)
else:
l2.append(e)
return l1, l2
df['ID'] = df['ID'].str.split(', ')
df['Limit'] = df['Object'] / df['Divide']
df['List'] = df.apply(lambda row: dict(zip(row['ID'], split_list(row['List'], row['Limit']))), axis=1)
df = df.explode('ID')
df['List'] = df.apply(lambda row: row['List'].get(row['ID']), axis=1)
print(df)
# Out[192]:
# ID Divide Object List Limit
# 0 A 2 20 [0, 5] 10.0
# 0 B 2 20 [] 10.0
# 1 C 2 40 [10, 15] 20.0
# 1 D 2 40 [35] 20.0
# 2 E 2 20 [] 10.0
# 2 F 2 20 [11, 15] 10.0
# 3 G 1 10 [1, 5] 10.0
# 4 H 1 10 [] 10.0
# 5 I 2 20 [] 10.0
# 5 J 2 20 [] 10.0发布于 2021-04-30 20:50:39
尝试同时爆炸ID和列表,然后根据ID的顺序进行有条件筛选。
import pandas as pd
df1 = pd.DataFrame(columns=['ID', 'Divide', 'Object', 'List'],
data=[['A, B', 2, 20, [0, 5]],
['C, D', 2, 40, [10, 15, 35]],
['E, F', 2, 20, [11, 15]],
['G', 1, 10, [1, 5]],
['H', 1, 10, ''],
['I, J', 2, 20, '']])
# Split and Explode ID
df1['ID'] = df1['ID'].str.split(', ')
# Group By Each ID and set index so that First and Second IDs are tracked
df1 = df1.explode('ID') \
.groupby(level=0) \
.apply(lambda x: x.reset_index()) \
.droplevel(0)
# Calculate Cap For Later
df1['cap'] = df1['Object'] // df1['Divide'] - 1
def split_lists(g):
# If more than 1 row and non-empty list
if len(g) > 1 and not g['List'].empty:
# Check if is the First ID
if g['level_0'].iloc[0] == 0:
# Filter Less Than Equal To Cap
g['List'] = g['List'][g['List'] <= g['cap']]
else:
# Filter Greater Than Cap
g['List'] = g['List'][g['List'] > g['cap']]
return g
# Explode Lists Group By ID filter using function
# Regroup and convert back to lists
df2 = df1 \
.explode('List') \
.reset_index() \
.groupby('ID') \
.apply(split_lists) \
.groupby('ID')['List'] \
.apply(lambda x: x.dropna().tolist())
# Drop Extra Columns from df1 and merge back
out = df1.drop(columns=['List', 'index', 'cap']) \
.merge(df2, left_on='ID', right_index=True, how='left') \
.reset_index(drop=True)
print(out)退出:
ID Divide Object List
0 A 2 20 [0, 5]
1 B 2 20 []
2 C 2 40 [10, 15]
3 D 2 40 [35]
4 E 2 20 []
5 F 2 20 [11, 15]
6 G 1 10 [1, 5]
7 H 1 10 []
8 I 2 20 []
9 J 2 20 []带有附加列的DF1
index ID Divide Object List cap
0 0 A 2 20 [0, 5] 9
1 0 B 2 20 [0, 5] 9
0 1 C 2 40 [10, 15, 35] 19
1 1 D 2 40 [10, 15, 35] 19
0 2 E 2 20 [11, 15] 9
1 2 F 2 20 [11, 15] 9
0 3 G 1 10 [1, 5] 9
0 4 H 1 10 9
0 5 I 2 20 9
1 5 J 2 20 9滤波重组后的DF2
ID
A [0, 5]
B []
C [10, 15]
D [35]
E []
F [11, 15]
G [1, 5]
H []
I []
J []
Name: List, dtype: objecthttps://stackoverflow.com/questions/67339269
复制相似问题