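I have two columns related to a supervisory hierarchy: one is the parent column and the other is the child column. So imagine the parent starts at the CEO, and then there is a child ID, such as the CMO or anyone else who reports directly to the CEO. I am trying to build the entire sup org hierarchy from these two columns.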
Sup Org Name | superior org ID | sup org ID
CEO | | 111
CPO | 111 | 222
CTO | 111 | 221
Engineering | 221 | 223
PM | 222 | 224
Backend Dev. | 223 | 228
My goal is to spread the hierarchy across multiple columns so that the whole hierarchy is laid out, like this:
Level 1 | Level 2 | Level 3 | Level 4
111 | 222 | 224
111 | 221 | 223
111 | 221 | 223 | 228
and so on.
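I would also like to see the same table with the IDs replaced by the sup org names. Both versions would be useful for my data.
I have tried lots of self-joins, but there must be a cleaner way.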
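One alternative that avoids repeated self-joins (not from the original thread): build a child to parent dictionary once, then walk each ID upward until there is no superior. This is a minimal sketch assuming the data is a proper tree (one top org, no cycles) and that the top org's superior ID is missing rather than an empty string; the names parent, path_to_root and levels are hypothetical. It yields one row per org unit, including intermediate rows such as 111 | 221 | 223.

```python
import pandas as pd

# Toy data from the question. Assumption: the top org's superior ID is
# missing (None, stored as NaN) rather than an empty string.
df = pd.DataFrame(
    {
        "name": ["CEO", "CPO", "CTO", "Engineering", "PM", "Backend Dev"],
        "ID": [111, 222, 221, 223, 224, 228],
        "superior_ID": [None, 111, 111, 221, 222, 223],
    }
)

# Hypothetical child -> parent lookup; None marks the top of the hierarchy.
parent = {
    int(org_id): (None if pd.isna(sup) else int(sup))
    for org_id, sup in zip(df["ID"], df["superior_ID"])
}


def path_to_root(org_id):
    """Return [top, ..., org_id] by following superior links upward.

    Assumes the data is a tree; a cycle would make this loop forever.
    """
    path = [org_id]
    while parent.get(path[-1]) is not None:
        path.append(parent[path[-1]])
    return path[::-1]


# One row per org unit; shorter paths are padded with <NA>.
levels = pd.DataFrame([path_to_root(org_id) for org_id in parent]).astype("Int64")
levels.columns = [f"level_{i + 1}" for i in range(levels.shape[1])]

# Sort so each branch is grouped, with a parent row before its children.
levels = levels.sort_values(list(levels.columns), na_position="first").reset_index(drop=True)
print(levels)
```

If only complete paths are wanted (one row per leaf, as in the answer's output), keep the rows whose deepest ID is in set(parent) - set(parent.values()), i.e. IDs that never appear as anyone's superior.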
Posted on 2022-02-12 18:40:00
Using the following toy DataFrame:
import pandas as pd
df = pd.DataFrame(
    {
        "name": {
            0: "CEO",
            1: "CPO",
            2: "CTO",
            3: "Engineering",
            4: "PM",
            5: "Backend_Dev",
            6: "COO",
        },
        "ID": {0: 111, 1: 222, 2: 221, 3: 223, 4: 224, 5: 228, 6: 220},
        "superior_ID": {0: "", 1: 111, 2: 111, 3: 221, 4: 222, 5: 223, 6: 111},
    }
)
You can try this:
# Setup
df = df.sort_values(by=["superior_ID", "ID"]).reset_index(drop=True)
highest_ID = df.loc[df["superior_ID"] == "", "ID"].values[0]
number_of_paths = df["superior_ID"].value_counts()[highest_ID]
paths = {i + 1: [] for i in range(number_of_paths)}

# Iterate to find all paths (111/222/224 is one path)
# Assumes rows reporting to highest_ID sort first, so i + 1 matches a key in paths
for i, row in df.iterrows():
    if row["superior_ID"] == highest_ID:
        paths[i + 1].append(highest_ID)
        paths[i + 1].append(row["ID"])
        continue
    # Note: siblings below level 2 would be appended to the same path
    for path in paths.values():
        if row["superior_ID"] in path:
            path.append(row["ID"])
# Create new df with each found path as a column (transposed to rows below)
new_df = pd.concat([pd.Series(path) for path in paths.values()], axis=1)
# Transpose and cleanup
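new_df = (
    new_df
    .T
    # Put the rows with the most NaN padding (shortest paths) first
    .pipe(lambda x: x.iloc[x.isna().sum(axis=1).mul(-1).argsort()])
    # 999 is a temporary placeholder so the columns can be cast to int
    # (assumes no real ID is 999)
    .fillna(999)
    .astype(int)
    .pipe(lambda x: x.set_axis([f"level_{i+1}" for i in x.columns], axis="columns"))
    .replace(999, "")
    .reset_index(drop=True)
)
Which gives: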
print(new_df)
# Output
  level_1 level_2 level_3 level_4
0     111     220
1     111     222     224
2     111     221     223     228
https://stackoverflow.com/questions/71081165
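The answer stops at IDs, but the question also asks for a version with sup org names. A minimal sketch, assuming the level table has the shape of the printed output (empty levels stored as "") and that an ID to name dictionary can be built from the toy data's ID and name columns: DataFrame.replace with a dict swaps every matching ID and leaves the blank cells alone. The names id_to_name and names_df are hypothetical.

```python
import pandas as pd

# Toy data from the answer: only the ID and name columns are needed here.
df = pd.DataFrame(
    {
        "name": ["CEO", "CPO", "CTO", "Engineering", "PM", "Backend_Dev", "COO"],
        "ID": [111, 222, 221, 223, 224, 228, 220],
    }
)

# Level table in the shape the answer prints: IDs, with "" for empty levels.
new_df = pd.DataFrame(
    {
        "level_1": [111, 111, 111],
        "level_2": [220, 222, 221],
        "level_3": ["", 224, 223],
        "level_4": ["", "", 228],
    }
)

# Hypothetical ID -> name lookup; .tolist() gives plain Python int keys.
id_to_name = dict(zip(df["ID"].tolist(), df["name"]))

# Replace every ID with its name; "" is not a key, so blank cells stay blank.
names_df = new_df.replace(id_to_name)
print(names_df)
```

Keeping the ID table and deriving the name table from it means both versions the question asks for come from a single traversal.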