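Prisoner's Dilemma, but with three choices and random payoffs!
Each round, your bot receives a 3x3 grid and chooses a row to play. The grid might look like this: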
4  5  7
3  1  9
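9  9  0
Each number in the grid is between 0 and 10 (inclusive). Your score for the round is grid[your_play][their_play], and your opponent's score is grid[their_play][your_play]. You play 100 (+/-10) rounds in sequence and may keep any information you like between them. The winner is the bot with the higher final score (in a tie, both bots get 0.5 wins).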
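Using the grid above:
Player 1: row 2
Player 2: row 2
Both players get grid[2][2] = 0 points.
Player 1: row 1
Player 2: row 0
Player 1 gets grid[1][0] = 3 points and player 2 gets grid[0][1] = 5 points.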
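You can check the scoring rule directly against the two plays above. The helper name `score_round` is hypothetical and not part of the controller. It only indexes the grid the way the rules describe.

```python
# The example grid from the challenge text.
GRID = [
    [4, 5, 7],
    [3, 1, 9],
    [9, 9, 0],
]


def score_round(grid, p1_row, p2_row):
    """Return (player 1 score, player 2 score) for one round.

    Hypothetical helper: each player scores grid[own_row][other_row].
    """
    return grid[p1_row][p2_row], grid[p2_row][p1_row]


print(score_round(GRID, 2, 2))  # (0, 0)
print(score_round(GRID, 1, 0))  # (3, 5)
```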
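Each bot will play 10 matches of ~100 rounds against every bot (including itself!). Your bot can win in either of two categories: total score and number of wins.
The overall winner will be determined by combining the two tables. A winner will be accepted about one week after the most recent entry, but I may keep updating the leaderboard as new entries are added.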
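Every row of the combined leaderboard further down equals the sum of that bot's place in the score table and its place in the wins table. For example, Blendo is 1st by score and 2nd by wins, which gives 3 points. Here is a minimal sketch of that combination. It assumes that bots with equal totals share the better place, as they do in that table. The function name `combine_rankings` is hypothetical.

```python
def combine_rankings(by_score, by_wins):
    """Combine two best-first lists of bot names into (place, name, points) rows.

    Points are the sum of a bot's 1-based positions in both lists (fewer is better).
    Bots with equal points share the place of the first bot in the tie.
    """
    points = {
        name: by_score.index(name) + 1 + by_wins.index(name) + 1
        for name in by_score
    }
    # sorted() is stable, so tied bots keep their order from the score table.
    ordered = sorted(points.items(), key=lambda item: item[1])
    rows = []
    for position, (name, pts) in enumerate(ordered, start=1):
        place = rows[-1][0] if rows and rows[-1][2] == pts else position
        rows.append((place, name, pts))
    return rows
```

Fed the two best-first name lists from the score and wins tables, this reproduces the combined table's places and point totals.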
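Write two functions in Python 3 with the following signatures:
def strategize(grid: list[list[int]], store: dict) -> int
def interpret(grid: list[list[int]], moves: tuple[int, int], store: dict) -> None
strategize is called every round and should return 0, 1, or 2. grid is the 3x3 grid of possible payoffs. store is an initially empty dict in which you can keep any information you like. interpret is called after every round; moves is a tuple containing (your_move, opponents_move). Put your code in the first code block of your answer so the controller can easily pull in your bot.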
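Here is a rough sketch of how one match could drive these two functions. It is a hypothetical stand-in, not the actual controller, and it assumes the following:

- a fresh random grid every round;
- a private `store` dict per bot;
- `interpret` called after each round with `(own_move, opponent_move)`;
- a round count drawn from 90 to 110.

The names `random_grid` and `play_match` are made up for illustration.

```python
import random


def random_grid(rng):
    """Return a 3x3 grid of integers from 0 to 10 inclusive."""
    return [[rng.randint(0, 10) for _ in range(3)] for _ in range(3)]


def play_match(bot_a, bot_b, rounds=None, seed=None):
    """Play one match; each bot is a (strategize, interpret) pair of functions.

    Hypothetical sketch: a new grid every round, a private store dict per bot,
    and interpret called after each round. Returns the two final scores.
    """
    rng = random.Random(seed)
    if rounds is None:
        rounds = rng.randint(90, 110)  # "100 (+/-10)" rounds
    stores = ({}, {})
    scores = [0, 0]
    for _ in range(rounds):
        grid = random_grid(rng)
        a = bot_a[0](grid, stores[0])
        b = bot_b[0](grid, stores[1])
        # Each player scores grid[own_row][other_row].
        scores[0] += grid[a][b]
        scores[1] += grid[b][a]
        # Each bot sees its own move first.
        bot_a[1](grid, (a, b), stores[0])
        bot_b[1](grid, (b, a), stores[1])
    return tuple(scores)
```

For example, `play_match((strategize, interpret), (strategize, interpret))` plays a bot against itself. Because the game is symmetric, both bots receive the same grid.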
“Naiive” picks the row with the highest average payoff.
def strategize(grid, store):
    sums = [(sum(x), i) for (i, x) in enumerate(grid)]
    return max(sums)[1]
def interpret(grid, moves, store):
    pass
“Random” picks a random row.
import random
def strategize(grid, store):
    return random.randint(0, 2)
def interpret(grid, moves, store):
    pass
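Neither example bot uses `store`. The sketch below is one that does, shown only to illustrate the `store`/`interpret` mechanism; it is not claimed to be a strong strategy. It counts how often the opponent has played each row and best-responds to those frequencies. The `"counts"` key and the add-one starting counts are arbitrary choices made for this example.

```python
def strategize(grid, store):
    """Best response to the opponent's observed move frequencies (a sketch).

    store["counts"][k] counts how often the opponent has played row k so far;
    each count starts at 1 so the first round treats all rows as equally likely.
    """
    counts = store.setdefault("counts", [1, 1, 1])
    total = sum(counts)
    # Expected payoff of each of my rows, weighting the opponent's rows by frequency.
    expected = [
        sum(grid[mine][theirs] * counts[theirs] for theirs in range(3)) / total
        for mine in range(3)
    ]
    return expected.index(max(expected))


def interpret(grid, moves, store):
    """Record the opponent's move after each round."""
    _, opponent_move = moves
    store.setdefault("counts", [1, 1, 1])[opponent_move] += 1
```

Because the grid changes every round, the bot remembers only which rows the opponent picked, not the payoffs attached to them.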
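The controller is largely adapted from https://github.com/jthistle/KOTH-counting.
It also includes several example bots that demonstrate how to use it.
arena.py is what I use to compute the final scores. It pits every bot against every other bot.
update.py fetches all submitted bots from the challenge page.
Using the flag --c or -constant makes games keep the same grid instead of randomizing it between rounds. This is purely for interest.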
By score:
1: Blendo with 12237.8 points
2: The Student with 11791.8 points
3: Naiive with 11728.2 points
4: Analyst with 11591.9 points
5: Min-Maxer with 11155.3 points
6: Rafaam with 11041.1 points
7: Naive Variation with 10963.1 points
8: Villain with 10853.6 points
9: Gradient-Mehscent with 10809.8 points
10: Gentleman with 10715.1 points
11: WhatDoYouExpect with 10676.6 points
12: Thief with 9687.6 points
13: Minimum Maximizer with 9656.0 points
14: HermitCrab with 9654.5 points
15: crab with 9001.9 points
16: Investigator with 8698.0 points
17: Random with 8653.3 points
By wins:
1: The Student with 15.2/16 wins
2: Blendo with 14.8/16 wins
3: Rafaam with 14.1/16 wins
4: WhatDoYouExpect with 13.9/16 wins
5: Naiive with 11.3/16 wins
6: Analyst with 10.8/16 wins
7: Min-Maxer with 10.4/16 wins
8: Naive Variation with 8.1/16 wins
9: Villain with 8.1/16 wins
10: HermitCrab with 7.3/16 wins
11: crab with 6.4/16 wins
12: Thief with 5.1/16 wins
13: Minimum Maximizer with 3.9/16 wins
14: Gradient-Mehscent with 1.9/16 wins
15: Investigator with 1.6/16 wins
16: Random with 1.5/16 wins
17: Gentleman with 1.5/16 wins
Combined leaderboard (fewer pts = better):
1: Blendo  (3 pts)
1: The Student  (3 pts)
3: Naiive  (8 pts)
4: Rafaam  (9 pts)
5: Analyst  (10 pts)
6: Min-Maxer  (12 pts)
7: Naive Variation  (15 pts)
7: WhatDoYouExpect  (15 pts)
9: Villain  (17 pts)
10: Gradient-Mehscent  (23 pts)
11: Thief  (24 pts)
11: HermitCrab  (24 pts)
13: Minimum Maximizer  (26 pts)
13: crab  (26 pts)
15: Gentleman  (27 pts)
16: Investigator  (31 pts)
17: Random  (33 pts)
Posted 2021-06-07 21:27:58
Crab wants everyone dead. Crab is happy when everyone else is unhappy. Yes, crab is back.
def strategize(grid, store):
    aa = grid[0][0] #aa means '0-0'.
    ab = grid[1][0] #and similarly for the others.
    ac = grid[2][0]
    ba = grid[0][1]
    bb = grid[1][1] #I wish I had a macro to do this.
    bc = grid[2][1]
    ca = grid[0][2]
    cb = grid[1][2]
    cc = grid[2][2]
    # a, b, and c are the opponent's average payoffs if I play 0, 1, or 2.
    a = (aa + ab + ac)/3
    b = (ba + bb + bc)/3
    c = (ca + cb + cc)/3
    if a <= min(b, c):
        return 0
    if b <= min(a, c):
        return 1
    return 2
def interpret(grid, moves, store):
    pass
Specifically, it picks the move that hurts the opponent the most.
Posted 2021-06-07 07:15:28
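import random

def strategize(grid, store=None):
    rows = range(3)
    # Only proceed if at least one row contains no zero payoff.
    if any(0 not in row for row in grid):
        # Among zero-free rows, take the one with the highest sum (rows with a zero count as 0).
        best = max(rows, key=lambda i: sum(grid[i]) if 0 not in grid[i] else 0)
        if max(grid[best]) > 5:
            return best
        # Its best payoff is too low: take the row with the largest single payoff instead.
        return max(rows, key=lambda i: max(grid[i]))
    # Every row contains a zero: play at random.
    return random.randint(0, 2)

def interpret(grid, moves, store):
    pass
Analyst looks for the row with the highest average among the rows that contain no zeros. If that row's maximum is not high enough (5 or less), it picks the row with the largest maximum instead. If every row contains a zero, it chooses at random.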
Posted 2021-06-07 12:21:12
def strategize(grid, store):
    expected = [0] * 3
    for my_play in range(3):
        for opponent_play in range(3):
            expected[my_play] += grid[my_play][opponent_play] - grid[opponent_play][my_play]
    return expected.index(max(expected))
def interpret(grid, moves, store):
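    pass
Always plays the move with the highest expected value. For each row, it sums its own payoff minus the opponent's payoff over all three opponent moves, which amounts to assuming the opponent plays uniformly at random.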
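Written out, the value the code computes for row i of grid g is the sum below. This is three times the expected margin when the opponent picks each row with probability 1/3, and the factor does not change which row wins. Applied to the example grid from the challenge, as an illustration:

```latex
\begin{aligned}
E_i &= \sum_{j=0}^{2} \left( g_{ij} - g_{ji} \right) \\
E_0 &= (4 + 5 + 7) - (4 + 3 + 9) = 0 \\
E_1 &= (3 + 1 + 9) - (5 + 1 + 9) = -2 \\
E_2 &= (9 + 9 + 0) - (7 + 9 + 0) = 2
\end{aligned}
```

So on that grid the bot would play row 2.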
https://codegolf.stackexchange.com/questions/229155