任务如下。它需要找出最常被延误的前五条路线,并计算它们因天气状况被延误了多少次。
以下是航班清单:
FlightNum CancellationCode
1 "B"
1 NA
1 NA
2 NA
2 "A"
2 "A"
3 NA
3 NA
3 NA
4 "B"
4 "B"
4 "B"
5 NA
5 "A"
5 "B"
6 "A"
6 "A"
6 "A"
6 "B"
7 "A"
7 "B"
7 "B"
CancellationCode是延迟的原因。"A“-承运人,"B”-天气,NA -及时离开。我编写了代码,其中找到了最常被延迟的前5条路线。
data[(data.CancellationCode.notnull())]['FlightNum'].value_counts()[:5]
Result:
6: 4
7: 3
4: 3
5: 2
2: 2
现在需要显示这些航班因天气("B")而延误的航班数量。结果必须如下:
6: 1
7: 2
4: 3
5: 1
2: 0
如何改进我的代码?
发布于 2022-10-05 07:06:46
这里有个办法。首先,得到的value_counts
,当它是由于天气和reindex
与当前解决方案的索引,你必须得到的只有前5条路线。
res = (
data.loc[data['CancellationCode'].eq('"B"'), 'FlightNum'].value_counts()
.reindex(data.loc[data['CancellationCode'].notnull(), 'FlightNum']
.value_counts()[:5].index,
fill_value=0)
)
print(res)
# 6 1
# 4 3
# 7 2
# 2 0
# 5 1
# Name: FlightNum, dtype: int64
发布于 2022-10-05 11:30:41
使用枢轴表可以提供非常丰富的信息:
table = df.dropna().assign(n=1).pivot_table(index='FlightNum',
columns='CancellationCode',
aggfunc='sum',
margins=True,
fill_value=0).droplevel(0,1)
>>> table
'''
CancellationCode "A" "B" All
FlightNum
1 0 1 1
2 2 0 2
4 0 3 3
5 1 1 2
6 3 1 4
7 1 2 3
All 7 8 15
'''
# the top 5 routes that were most often delayed
table.drop('All').nlargest(5,'All')
>>> out
'''
CancellationCode "A" "B" All
FlightNum
6 3 1 4
4 0 3 3
7 1 2 3
2 2 0 2
5 1 1 2
'''
# or to show only the number of delayed flights due to the weather ("B")
table.drop('All').nlargest(5,'All')['"B"']
>>> out
'''
FlightNum
6 1
4 3
7 2
2 0
5 1
Name: "B", dtype: int64
https://stackoverflow.com/questions/73962235
复制相似问题