Q: Random Prisoner's Trilemma - Python 3 KOTH

Code Golf user

Asked 2021-06-07 01:19:53

15 answers · 2K views · 0 followers · 12 votes

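The Prisoner's Dilemma, but with three choices, and the payoffs are random!

Each round, your bot receives a 3x3 grid and chooses a row to play. The grid might look like this: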

4  5  7
3  1  9
9  9  0

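Every number in the grid is between 0 and 10 (inclusive). Your score for the round is grid[your_play][their_play], and your opponent's is grid[their_play][your_play]. You play 100 (+/-10) rounds in sequence, keeping any information you wish. The winner is the bot with the higher score at the end (a draw counts as 0.5 wins for both bots).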

Example

Using the grid above:

Player 1: row 2

Player 2: row 2

Both players get 0 points.

Player 1: row 1

Player 2: row 0

Player 1 gets 3 points, Player 2 gets 5 points.
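The two examples can be checked mechanically. This is a minimal sketch; the helper name `score_round` is hypothetical and is not part of the controller:

```python
def score_round(grid, play_1, play_2):
    """Return (player 1's score, player 2's score) for a single round."""
    return grid[play_1][play_2], grid[play_2][play_1]


grid = [
    [4, 5, 7],
    [3, 1, 9],
    [9, 9, 0],
]

print(score_round(grid, 2, 2))  # (0, 0)
print(score_round(grid, 1, 0))  # (3, 5)
```

Note that both players index the same grid; only the order of the two indices differs.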

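Winning

Each bot will play 10 games of ~100 rounds against every bot (including itself!). Your bot can win in two categories:

Score: the total scores are summed, and the bot with the most points at the end wins.
Wins: a "win" is counted for the bot with the higher score after the ~100 rounds have been played.

The overall winner will be determined by combining the two tables. A winner will be accepted about 1 week after the most recent entry, but I will probably continue to update the high-score table if new entries are added.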

Technical details

Write two functions in Python 3 with these signatures:

def strategize(grid: list[list[int]], store: dict) -> int

def interpret(grid: list[list[int]], moves: tuple[int, int], store: dict) -> None

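strategize is called each round and should return 0, 1, or 2.
- grid is the 3x3 grid of possible payouts.
- store is a dict (initially empty) for storing any information you like.
interpret is called after every round.
- moves is a tuple containing (your_move, opponents_move)

Put your code in the first code block in your answer so the controller can easily pull in your bot.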
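To illustrate how the two functions can cooperate through store, here is a minimal sketch of a hypothetical bot (not one of the submitted entries). It assumes store is the same dict on every call within a match; the key name "opponent_counts" is made up for this example.

```python
def strategize(grid, store):
    counts = store.get("opponent_counts")
    if not counts:
        # No history yet: fall back to the row with the highest total payout.
        return max(range(3), key=lambda row: sum(grid[row]))
    # Guess that the opponent repeats their most frequent move so far,
    # then pick the row that pays best against that move.
    likely = max(range(3), key=lambda move: counts[move])
    return max(range(3), key=lambda row: grid[row][likely])


def interpret(grid, moves, store):
    my_move, opponent_move = moves
    counts = store.setdefault("opponent_counts", [0, 0, 0])
    counts[opponent_move] += 1


# Simulate two rounds by hand on the example grid.
store = {}
grid = [[4, 5, 7], [3, 1, 9], [9, 9, 0]]
first = strategize(grid, store)      # row sums are 16, 13, 18
interpret(grid, (first, 2), store)   # pretend the opponent played 2
second = strategize(grid, store)     # best row against column 2 (7, 9, 0)
print(first, second)  # 2 1
```

strategize only reads from store and interpret only writes to it, which keeps the decision logic separate from the bookkeeping.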

Example bots

"Naiive" chooses the row with the highest average payout.

def strategize(grid, store):
    sums = [(sum(x), i) for (i, x) in enumerate(grid)]
    return max(sums)[1]

def interpret(grid, moves, store):
    pass

"Random" picks a row at random.

import random

def strategize(grid, store):
    return random.randint(0, 2)

def interpret(grid, moves, store):
    pass

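Rules

Don't cheat by interfering directly with your opponent (through global variables etc.).
Your function should be relatively quick to execute - the quicker it is, the better.
You may submit multiple entries.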

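Controller, arena

The controller is available at https://github.com/Nucaranlaeg/KOTH-random-prisoner.

This controller is largely adapted from https://github.com/jthistle/KOTH-counting.

A couple of example bots are provided along with it to demonstrate how to use it.

arena.py is what I'll be using to calculate final scores. It pits each bot against each other bot.

update.py will fetch all submitted bots from the contest page.

Using the flag --c or -constant will cause games to be played without randomizing the grid between rounds, purely for interest's sake.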
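For readers who want to test a bot without fetching the controller, the following is a rough sketch of a single match. It is not the code in arena.py. The names `play_match` and `win_credit` are hypothetical, and the sketch assumes a fresh grid of uniformly random integers each round and that both bots are shown the same grid.

```python
import random


def naive_strategize(grid, store):
    # The "Naiive" example bot from the question.
    sums = [(sum(x), i) for (i, x) in enumerate(grid)]
    return max(sums)[1]


def no_op_interpret(grid, moves, store):
    pass


def play_match(bot_a, bot_b, rounds=None, rng=random):
    """Play one match between two (strategize, interpret) pairs; return both totals."""
    if rounds is None:
        rounds = rng.randint(90, 110)  # "100 (+/-10)" rounds
    store_a, store_b = {}, {}  # one private store per bot
    total_a = total_b = 0
    for _round in range(rounds):
        # Assumption: a fresh grid of uniform integers in 0..10 every round.
        grid = [[rng.randint(0, 10) for _col in range(3)] for _row in range(3)]
        move_a = bot_a[0](grid, store_a)
        move_b = bot_b[0](grid, store_b)
        total_a += grid[move_a][move_b]
        total_b += grid[move_b][move_a]
        # Each bot is told the moves from its own point of view.
        bot_a[1](grid, (move_a, move_b), store_a)
        bot_b[1](grid, (move_b, move_a), store_b)
    return total_a, total_b


def win_credit(total_a, total_b):
    """One win to the higher total; 0.5 each on a draw."""
    if total_a == total_b:
        return 0.5, 0.5
    return (1.0, 0.0) if total_a > total_b else (0.0, 1.0)


naive = (naive_strategize, no_op_interpret)
totals = play_match(naive, naive, rounds=100, rng=random.Random(0))
print(totals, win_credit(*totals))
```

When a deterministic bot plays itself, both sides pick the same row every round, so the totals are equal and each side is credited 0.5 wins.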

Current results

By score:
1: Blendo with 12237.8 points
2: The Student with 11791.8 points
3: Naiive with 11728.2 points
4: Analyst with 11591.9 points
5: Min-Maxer with 11155.3 points
6: Rafaam with 11041.1 points
7: Naive Variation with 10963.1 points
8: Villain with 10853.6 points
9: Gradient-Mehscent with 10809.8 points
10: Gentleman with 10715.1 points
11: WhatDoYouExpect with 10676.6 points
12: Thief with 9687.6 points
13: Minimum Maximizer with 9656.0 points
14: HermitCrab with 9654.5 points
15: crab with 9001.9 points
16: Investigator with 8698.0 points
17: Random with 8653.3 points

By wins:
1: The Student with 15.2/16 wins
2: Blendo with 14.8/16 wins
3: Rafaam with 14.1/16 wins
4: WhatDoYouExpect with 13.9/16 wins
5: Naiive with 11.3/16 wins
6: Analyst with 10.8/16 wins
7: Min-Maxer with 10.4/16 wins
8: Naive Variation with 8.1/16 wins
9: Villain with 8.1/16 wins
10: HermitCrab with 7.3/16 wins
11: crab with 6.4/16 wins
12: Thief with 5.1/16 wins
13: Minimum Maximizer with 3.9/16 wins
14: Gradient-Mehscent with 1.9/16 wins
15: Investigator with 1.6/16 wins
16: Random with 1.5/16 wins
17: Gentleman with 1.5/16 wins

Combined leaderboard (fewer pts = better):
1: Blendo  (3 pts)
1: The Student  (3 pts)
3: Naiive  (8 pts)
4: Rafaam  (9 pts)
5: Analyst  (10 pts)
6: Min-Maxer  (12 pts)
7: Naive Variation  (15 pts)
7: WhatDoYouExpect  (15 pts)
9: Villain  (17 pts)
10: Gradient-Mehscent  (23 pts)
11: Thief  (24 pts)
11: HermitCrab  (24 pts)
13: Minimum Maximizer  (26 pts)
13: crab  (26 pts)
15: Gentleman  (27 pts)
16: Investigator  (31 pts)
17: Random  (33 pts)
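The combined points above are consistent with adding each bot's position in the two tables. That reading is inferred from the numbers, not taken from the controller's source. A short sketch that reproduces the combined table under that assumption (the function name `combine` is hypothetical):

```python
score_order = [
    "Blendo", "The Student", "Naiive", "Analyst", "Min-Maxer", "Rafaam",
    "Naive Variation", "Villain", "Gradient-Mehscent", "Gentleman",
    "WhatDoYouExpect", "Thief", "Minimum Maximizer", "HermitCrab", "crab",
    "Investigator", "Random",
]
wins_order = [
    "The Student", "Blendo", "Rafaam", "WhatDoYouExpect", "Naiive", "Analyst",
    "Min-Maxer", "Naive Variation", "Villain", "HermitCrab", "crab", "Thief",
    "Minimum Maximizer", "Gradient-Mehscent", "Investigator", "Random",
    "Gentleman",
]


def combine(score_order, wins_order):
    """Sum each bot's 1-based position in both tables; fewer points is better."""
    points = {
        name: (score_order.index(name) + 1) + (wins_order.index(name) + 1)
        for name in score_order
    }
    # sorted() is stable, so bots with equal points keep their score-table order.
    return sorted(points.items(), key=lambda item: item[1])


combined = combine(score_order, wins_order)
for name, pts in combined:
    print(f"{name}  ({pts} pts)")
```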

king-of-the-hill

python

15 Answers

Code Golf user

Answered 2021-06-07 21:27:58

crab

crab wants everyone dead. crab is happy when everyone else is unhappy. Yes, crab is back.

def strategize(grid, store):
    aa = grid[0][0] #aa means '0-0'.
    ab = grid[1][0] #and similarly for the others.
    ac = grid[2][0]
    ba = grid[0][1]
    bb = grid[1][1] #I wish I had a macro to do this.
    bc = grid[2][1]
    ca = grid[0][2]
    cb = grid[1][2]
    cc = grid[2][2]
    a = (aa + ab + ac)/3 # a, b, and c are the respective averages.
    b = (ba + bb + bc)/3
    c = (ca + cb + cc)/3
    if a <= min(b, c):
        return 0
    if b <= min(a, c):
        return 1
    return 2

def interpret(grid, moves, store):
    pass

Specifically, it picks the move that hurts the opponent the most.
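Because the opponent scores grid[their_play][your_play], the payoffs available to the opponent when you play move m are the entries of column m. One way to quantify "hurts the opponent the most" is the opponent's average payoff over that column. A small sketch on the example grid from the question, using a hypothetical helper `opponent_average`:

```python
def opponent_average(grid, my_move):
    """Average payoff the opponent receives when we play my_move, over their three plays."""
    return sum(grid[their_move][my_move] for their_move in range(3)) / 3


grid = [[4, 5, 7], [3, 1, 9], [9, 9, 0]]
averages = [opponent_average(grid, move) for move in range(3)]
harshest = min(range(3), key=lambda move: averages[move])
print(harshest)  # 1
```

On this grid the opponent's column totals are 16, 15 and 16, so move 1 leaves them the lowest average.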

5 votes

Code Golf user

Answered 2021-06-07 07:15:28

Analyst

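import random

def strategize(grid, store=None):
    rows = list(range(3))
    # Use the averaging heuristic only if at least one row contains no zero.
    if any(0 not in row for row in grid):
        # Row with the highest sum among zero-free rows (rows with a zero count as 0).
        maximum = max(rows, key=lambda index: sum(grid[index]) if 0 not in grid[index] else 0)
        if max(grid[maximum]) > 5:
            return maximum
        # Otherwise fall back to the row holding the largest single payout.
        return max(rows, key=lambda index: max(grid[index]))
    # Every row contains a zero: choose at random.
    return random.randint(0, 2)

def interpret(grid, moves, store):
    pass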

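Try it online with sample test cases!

Analyst is a bot that looks for the highest average among the rows that contain no zero. If that row's maximum is not high enough (not above 5), it instead picks the row with the largest single value. If every row contains a zero, it picks at random.

4 votes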

Code Golf user

Answered 2021-06-07 12:21:12

WhatDoYouExpect

def strategize(grid, store):
    expected = [0] * 3
    for my_play in range(3):
        for opponent_play in range(3):
            expected[my_play] += grid[my_play][opponent_play] - grid[opponent_play][my_play]
    return expected.index(max(expected))

def interpret(grid, moves, store):
    pass

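Try it online!

Always plays the move with the highest expected value, measured as its own payoff minus the opponent's payoff, summed over the opponent's three possible plays.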
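As a worked check, the same quantity can be computed for the example grid from the question. The helper name `margins` is hypothetical; the expression inside it mirrors the loop in the answer's strategize.

```python
def margins(grid):
    """For each move, sum over the opponent's plays of (my payoff - their payoff)."""
    return [
        sum(grid[me][them] - grid[them][me] for them in range(3))
        for me in range(3)
    ]


grid = [[4, 5, 7], [3, 1, 9], [9, 9, 0]]
result = margins(grid)
print(result)                     # [0, -2, 2]
print(result.index(max(result)))  # 2
```

Each entry equals the sum of a row minus the sum of the matching column, so on this grid the bot would play row 2 (18 - 16 = 2).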

4 votes

Original page content provided by Code Golf.

Original link: https://codegolf.stackexchange.com/questions/229155
