I am trying to estimate how many minutes, on average, a football match will spend in each game state, given both teams' implied goals.
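For our purposes, there are three possible game states:
- the home team is ahead
- the teams are drawing
- the away team is ahead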
Implied goals are the number of goals a team is expected to score on average, e.g. 1.80 for the home team and 1.45 for the away team.
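For simplicity, we can assume:
A typical football match consists of 90 minutes of regular time plus injury time, usually 1 minute in the first half and 4 minutes in the second half, for a total of 95 minutes.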
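The code below treats each minute as an independent trial: a team scores at most one goal per minute, with the same probability in every minute and independently of the other team. Under that model the per-minute probability is the implied goals divided by the match length, and a team's goal count is binomial, so its mean recovers the implied goals. A minimal sketch of that check (the helper names are hypothetical):

```python
# Sketch of the per-minute scoring model used by the question's code, assuming
# at most one goal per team per minute and a constant per-minute probability.
from math import comb

MATCH_LENGTH = 95  # 90 regular minutes + 1 + 4 minutes of injury time


def per_minute_probability(implied_goals: float, match_length: int = MATCH_LENGTH) -> float:
    """Probability that a team scores in any single minute."""
    return implied_goals / match_length


def goal_count_distribution(implied_goals: float, match_length: int = MATCH_LENGTH) -> list[float]:
    """P(team scores exactly k goals) for k = 0..match_length (binomial model)."""
    p = per_minute_probability(implied_goals, match_length)
    return [comb(match_length, k) * p**k * (1 - p) ** (match_length - k)
            for k in range(match_length + 1)]


if __name__ == "__main__":
    dist = goal_count_distribution(1.80)
    expected = sum(k * pk for k, pk in enumerate(dist))
    print(f"p per minute:   {per_minute_probability(1.80):.5f}")
    print(f"P(no goals):    {dist[0]:.4f}")
    print(f"expected goals: {expected:.4f}")  # n * p, i.e. the implied 1.80
```

Because the mean of a binomial is `match_length * p`, dividing the implied goals by the match length keeps the expected score unchanged whatever the match length is.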
I use the following algorithm for this task:
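I am interested in what alternatives there are to my solution, both conceptually and in its implementation. I know my program could be sped up by orders of magnitude by making fuller use of numpy, but that is not so important in this particular case: it takes only a few seconds to run, and it should not be called many times in quick succession.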
```python
from dataclasses import dataclass
import random
from statistics import mean
from numpy import cumsum

MATCH_LENGTH = 95
TRIALS = 100_000

@dataclass
class MatchGameState:
    """
    Represents for how many minutes of a particular game:
    - home team was ahead
    - teams were drawing
    - away team was ahead
    """
    home_ahead: int
    draw: int
    away_ahead: int

MatchGameStates = list[MatchGameState]

def mean_game_state(home_implied_goals: float,
                    away_implied_goals: float,
                    match_length: int=MATCH_LENGTH,
                    trials: int=TRIALS) -> tuple[float, float, float]:
    """
    Given match length in minutes and implied goals for home and away teams,
    calculates for how many minutes per match in average home team will be ahead,
    there will be a draw and away team will be ahead.
    """
    random.seed()
    sims = [single_match_game_state(home_implied_goals, away_implied_goals, match_length) for _ in range(trials)]
    home_ahead_mean = round(mean(s.home_ahead for s in sims), 2)
    draw_mean = round(mean(s.draw for s in sims), 2)
    away_ahead_mean = round(mean(s.away_ahead for s in sims), 2)
    return home_ahead_mean, draw_mean, away_ahead_mean

def single_match_game_state(home_implied_goals: float,
                            away_implied_goals: float,
                            match_length: int=95) -> MatchGameState:
    """
    Given match length in minutes and implied goals for home and away teams,
    simulates teams scoring minute by minute for a particular game.
    Returns MatchGameState for the game.
    """
    # Probability to score in a given minute
    home_goals_per_min = home_implied_goals / match_length
    away_goals_per_min = away_implied_goals / match_length

    # For every minute in a game, 1 if a team scored in that minute, 0 otherwise
    home_outcomes = [random.random() for _ in range(match_length)]
    away_outcomes = [random.random() for _ in range(match_length)]
    home_goals_by_minute = [int(home_outcome < home_goals_per_min) for home_outcome in home_outcomes]
    away_goals_by_minute = [int(away_outcome < away_goals_per_min) for away_outcome in away_outcomes]

    # How many goals a team scored by a particular minute of the game
    home_cumulative_goals = cumsum(home_goals_by_minute)
    away_cumulative_goals = cumsum(away_goals_by_minute)

    home_ahead, draw, away_ahead = 0, 0, 0
    for home_cumulative_score, away_cumulative_score in zip(home_cumulative_goals, away_cumulative_goals):
        if home_cumulative_score > away_cumulative_score:
            home_ahead += 1
        elif home_cumulative_score == away_cumulative_score:
            draw += 1
        else:
            away_ahead += 1

    assert home_ahead + draw + away_ahead == match_length
    return MatchGameState(home_ahead, draw, away_ahead)

if __name__ == '__main__':
    print(mean_game_state(2.15, 1.20))
```
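On the conceptual side, one alternative to Monte Carlo (a swapped-in technique, not something from the original post) is to compute the expectation exactly. Under the same per-minute model, the goal difference evolves as a Markov chain: each minute it goes up by one (only the home team scores), down by one (only the away team scores), or stays put (nobody scores, or both do). Propagating its distribution minute by minute and summing the probability of each state gives the expected minutes directly. A minimal sketch, with the hypothetical name `exact_game_state`:

```python
from collections import defaultdict

MATCH_LENGTH = 95


def exact_game_state(home_implied_goals: float,
                     away_implied_goals: float,
                     match_length: int = MATCH_LENGTH) -> tuple[float, float, float]:
    """
    Expected minutes with the home team ahead, a draw, and the away team ahead,
    computed exactly for the simulation's per-minute model (no random numbers).
    """
    p_home = home_implied_goals / match_length
    p_away = away_implied_goals / match_length
    # Per-minute change in goal difference (home minus away).
    up = p_home * (1 - p_away)    # only the home team scores
    down = p_away * (1 - p_home)  # only the away team scores
    same = 1 - up - down          # nobody scores, or both score

    diff_dist = {0: 1.0}          # P(goal difference == d) at kick-off
    home_ahead = draw = away_ahead = 0.0
    for _ in range(match_length):
        new_dist = defaultdict(float)
        for d, prob in diff_dist.items():
            new_dist[d + 1] += prob * up
            new_dist[d - 1] += prob * down
            new_dist[d] += prob * same
        diff_dist = new_dist
        # Like the simulation, classify the state after this minute's goals.
        home_ahead += sum(p for d, p in diff_dist.items() if d > 0)
        draw += diff_dist[0]
        away_ahead += sum(p for d, p in diff_dist.items() if d < 0)
    return home_ahead, draw, away_ahead


if __name__ == "__main__":
    print(exact_game_state(2.15, 1.20))
```

Under these assumptions, `mean_game_state` should agree with this to within Monte Carlo noise, so it doubles as a cross-check. The cost is on the order of `match_length²` dictionary updates rather than `trials × match_length` random draws.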
Posted on 2022-09-20 00:01:26
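You have clearly been following good general-purpose Python practice, and from that perspective this code is fairly good. Where it falls down is in its use of Numpy.

In a sense, good Numpy code does not look like what we would think of as Pythonic code. Your dataclass has to go, and your `single_` function has to go. The operations in `single_` need to be pulled up into `mean_` and vectorised over the number of trials. The references to the built-in `statistics` and `random` modules disappear. That way you can run the code in far less time, run more trials, or both.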
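To illustrate what vectorising over the number of trials means, here is a small sketch (helper names hypothetical) that takes one batch of simulated goals with shape `(trials, minutes, 2)` and computes the per-trial counts twice: once with a Python loop in the style of the question, and once with whole-array operations over every trial at once. Given the same draws, both produce identical counts.

```python
import numpy as np


def loop_counts(goals: np.ndarray) -> np.ndarray:
    """Per-trial (home_ahead, draw, away_ahead) using Python loops, like the question."""
    results = []
    for trial in goals:                    # trial has shape (minutes, 2)
        cumulative = trial.cumsum(axis=0)  # running score per minute
        home_ahead = draw = away_ahead = 0
        for home, away in cumulative:
            if home > away:
                home_ahead += 1
            elif home == away:
                draw += 1
            else:
                away_ahead += 1
        results.append((home_ahead, draw, away_ahead))
    return np.array(results)


def vectorised_counts(goals: np.ndarray) -> np.ndarray:
    """The same counts computed with whole-array operations."""
    cumulative = goals.cumsum(axis=1)               # (trials, minutes, 2)
    diff = cumulative[..., 0] - cumulative[..., 1]  # (trials, minutes)
    return np.stack([(diff > 0).sum(axis=1),
                     (diff == 0).sum(axis=1),
                     (diff < 0).sum(axis=1)], axis=1)  # (trials, 3)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Broadcasting: (trials, minutes, 2) compared with a (2,) array of probabilities.
    goals = rng.random((1_000, 95, 2)) < np.array([2.15, 1.20]) / 95
    print((loop_counts(goals) == vectorised_counts(goals)).all())
```

The loop version does `trials × minutes` Python-level iterations; the vectorised version does a handful of array operations whose inner loops run in compiled code.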
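Even if you were to skip vectorisation (which you should not), there are a few other small things you could change:

- PEP8 spacing between functions is insufficient; you need two blank lines.
- `MatchGameStates` is unused, so delete it.
- `int=MATCH_LENGTH` is also not PEP8-compliant; an annotated default needs spaces around the `=`, as in `int = MATCH_LENGTH`.
- `random.seed()` is not the responsibility of `mean_game_state`; whether results should be reproducible is a concern of the top-level program. Similarly, `round` is a display concern and should not be put in your business-logic function.
- `MatchGameState` would be simpler as a `NamedTuple` rather than a `dataclass`.
- Even without Numpy, the `home_ahead`, `draw` and `away_ahead` variables can be computed with `sum()` over generators.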
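A sketch of the question's pure-Python version with the points above applied: two blank lines between definitions, PEP8 spacing for annotated defaults, a `NamedTuple`, `sum()` over generators, no `round`, and seeding left to the caller. Passing a `random.Random` instance in as `rng` is one way of leaving seeding to the caller and is an assumption of this sketch, not something the answer prescribes. It also swaps `statistics.mean` for `statistics.fmean`, which always returns a float and is faster.

```python
import random
from statistics import fmean
from typing import NamedTuple

MATCH_LENGTH = 95
TRIALS = 100_000


class MatchGameState(NamedTuple):
    """Minutes of one match with the home team ahead, a draw, the away team ahead."""
    home_ahead: int
    draw: int
    away_ahead: int


def single_match_game_state(
    home_implied_goals: float,
    away_implied_goals: float,
    rng: random.Random,
    match_length: int = MATCH_LENGTH,
) -> MatchGameState:
    """Simulate one match minute by minute and classify every minute."""
    home_p = home_implied_goals / match_length  # probability to score in a minute
    away_p = away_implied_goals / match_length
    home = away = 0
    diffs = []  # home goals minus away goals after each minute
    for _ in range(match_length):
        home += rng.random() < home_p  # True adds 1, False adds 0
        away += rng.random() < away_p
        diffs.append(home - away)
    return MatchGameState(
        home_ahead=sum(d > 0 for d in diffs),
        draw=sum(d == 0 for d in diffs),
        away_ahead=sum(d < 0 for d in diffs),
    )


def mean_game_state(
    home_implied_goals: float,
    away_implied_goals: float,
    rng: random.Random,
    match_length: int = MATCH_LENGTH,
    trials: int = TRIALS,
) -> tuple[float, float, float]:
    """Average minutes per match in each game state over `trials` simulations."""
    sims = [
        single_match_game_state(home_implied_goals, away_implied_goals, rng, match_length)
        for _ in range(trials)
    ]
    return (
        fmean(s.home_ahead for s in sims),
        fmean(s.draw for s in sims),
        fmean(s.away_ahead for s in sims),
    )


if __name__ == "__main__":
    rng = random.Random(42)  # reproducibility is decided here, at the top level
    home, draw, away = mean_game_state(2.15, 1.20, rng, trials=10_000)
    print(f"{home:.2f} {draw:.2f} {away:.2f}")  # rounding only for display
```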
```python
import numpy as np
from numpy.random import default_rng

MATCH_LENGTH = 95
TRIALS = 100_000
rand = default_rng()


def mean_game_state(
    home_implied_goals: float,
    away_implied_goals: float,
    match_length_minutes: int = MATCH_LENGTH,
    trials: int = TRIALS,
) -> tuple[float, float, float]:
    """
    Given match length in minutes and implied goals for home and away teams,
    calculates for how many minutes per match in average home team will be ahead,
    there will be a draw and away team will be ahead.
    """
    implied_goals = np.array((home_implied_goals, away_implied_goals))

    # Probability to score in a given minute
    goals_per_min = implied_goals / match_length_minutes

    # For every minute in a game, 1 if a team scored in that minute, 0 otherwise
    outcomes = rand.uniform(size=(trials, match_length_minutes, 2))
    goals_by_minute = outcomes < goals_per_min

    # How many goals a team scored by a particular minute of the game
    home_cumulative_goals, away_cumulative_goals = goals_by_minute.cumsum(axis=1).T
    diff = home_cumulative_goals - away_cumulative_goals

    home_ahead = np.count_nonzero(diff > 0, axis=0)
    away_ahead = np.count_nonzero(diff < 0, axis=0)
    draw = np.count_nonzero(diff == 0, axis=0)

    return home_ahead.mean(), draw.mean(), away_ahead.mean()


if __name__ == '__main__':
    print(mean_game_state(home_implied_goals=2.15, away_implied_goals=1.20))
```
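One practical cost of full vectorisation: with the default 100_000 trials, `outcomes` alone holds 100_000 × 95 × 2 float64 values (about 152 MB), and the cumulative sum is another array of the same shape. A hedged sketch of a variant that processes trials in chunks to bound peak memory, and that takes the `Generator` as a parameter so seeding stays with the caller (`mean_game_state_chunked` and `chunk_size` are hypothetical names):

```python
import numpy as np

MATCH_LENGTH = 95
TRIALS = 100_000


def mean_game_state_chunked(
    home_implied_goals: float,
    away_implied_goals: float,
    rng: np.random.Generator,
    match_length_minutes: int = MATCH_LENGTH,
    trials: int = TRIALS,
    chunk_size: int = 10_000,
) -> tuple[float, float, float]:
    """Same estimate as the vectorised version, with at most `chunk_size` trials in memory."""
    goals_per_min = np.array((home_implied_goals, away_implied_goals)) / match_length_minutes
    totals = np.zeros(3, dtype=np.int64)  # total minutes: home ahead, draw, away ahead
    done = 0
    while done < trials:
        n = min(chunk_size, trials - done)
        goals = rng.random((n, match_length_minutes, 2)) < goals_per_min  # (n, minutes, 2)
        cumulative = goals.cumsum(axis=1)
        diff = cumulative[..., 0] - cumulative[..., 1]  # (n, minutes)
        totals += (
            np.count_nonzero(diff > 0),
            np.count_nonzero(diff == 0),
            np.count_nonzero(diff < 0),
        )
        done += n
    home_ahead, draw, away_ahead = totals / trials
    return float(home_ahead), float(draw), float(away_ahead)


if __name__ == "__main__":
    rng = np.random.default_rng(seed=42)
    print(mean_game_state_chunked(2.15, 1.20, rng))
```

With `chunk_size=10_000`, each chunk's `goals` and `cumulative` arrays are a tenth of the full-size ones, and the per-chunk counts are simply added up before dividing by `trials`.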
https://codereview.stackexchange.com/questions/279823