Challenge status: Open
Comment, open a PR, or otherwise yell at me if I missed your bot.
Prisoner's Dilemma... with three choices. Crazy, huh?
Here is our payoff matrix. Player A on the left, B on top:
A,B | C | N | D
----|---|---|---
 C  |3,3|4,1|0,5
 N  |1,4|2,2|3,2
 D  |5,0|2,3|1,1

The payoff matrix is engineered so that it's best for both players to always cooperate, but you can (usually) gain by choosing Neutral or Defect.
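To make that design claim concrete, here is a small check (my own sketch, not part of the challenge) that mutual cooperation maximizes the combined score while a unilateral defector still comes out ahead:

PAYOFF = {("C", "C"): (3, 3), ("C", "N"): (4, 1), ("C", "D"): (0, 5),
          ("N", "C"): (1, 4), ("N", "N"): (2, 2), ("N", "D"): (3, 2),
          ("D", "C"): (5, 0), ("D", "N"): (2, 3), ("D", "D"): (1, 1)}
totals = {pair: sum(scores) for pair, scores in PAYOFF.items()}
assert max(totals, key=totals.get) == ("C", "C")      # 3+3 = 6 beats every other pair
assert PAYOFF[("D", "C")][0] > PAYOFF[("C", "C")][0]  # but defecting on a cooperator pays 5 > 3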
Here are some (competing) example bots.
import random

# Turns out you don't actually have to implement __init__(). TIL!
class AllC:
    def round(self, _): return "C"

class AllN:
    def round(self, _): return "N"

class AllD:
    def round(self, _): return "D"

class RandomBot:
    def round(self, _): return random.choice(["C", "N", "D"])
# Actually using an identically-behaving "FastGrudger".
class Grudger:
    def __init__(self):
        self.history = []
    def round(self, last):
        if last:
            self.history.append(last)
        if self.history.count("D") > 0:
            return "D"
        return "C"
class TitForTat:
    def round(self, last):
        if last == "D":
            return "D"
        return "C"

Your bot is a Python 3 class. A new instance is created for every game, and round() is called each round with your opponent's choice from the last round (or None if it's the first round).
The winner gets a 50-rep bounty in a month.
Any response other than C, N, or D will be silently treated as N. I'll throw together an API if anyone needs one.
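For concreteness, here is a minimal sketch of how a controller could score one game under these rules. The function name and round count below are my own illustration (the challenge's actual controller may differ); only the round() protocol and the payoff values come from the question.

PAYOFF = {("C", "C"): (3, 3), ("C", "N"): (4, 1), ("C", "D"): (0, 5),
          ("N", "C"): (1, 4), ("N", "N"): (2, 2), ("N", "D"): (3, 2),
          ("D", "C"): (5, 0), ("D", "N"): (2, 3), ("D", "D"): (1, 1)}

def play_game(bot_a_cls, bot_b_cls, rounds=100):
    # Fresh instance per game; round() receives the opponent's previous
    # move, or None on the first round.
    a, b = bot_a_cls(), bot_b_cls()
    last_a = last_b = None
    score_a = score_b = 0
    for _ in range(rounds):
        move_a, move_b = a.round(last_b), b.round(last_a)
        # Anything other than C, N, or D is silently treated as N.
        move_a = move_a if move_a in ("C", "N", "D") else "N"
        move_b = move_b if move_b in ("C", "N", "D") else "N"
        pa, pb = PAYOFF[(move_a, move_b)]
        score_a, score_b = score_a + pa, score_b + pb
        last_a, last_b = move_a, move_b
    return score_a / rounds, score_b / rounds  # avg. score/round, as in the table below

print(play_game(TitForTat, AllD))  # e.g. pit two of the sample bots against each other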
27 bots, 729 games.
name | avg. score/round
----------------|-------------------
PatternFinder | 3.152
DirichletDice2 | 3.019
EvaluaterBot | 2.971
Ensemble | 2.800
DirichletDice | 2.763
Shifting | 2.737
FastGrudger | 2.632
Nash2 | 2.574
HistoricAverage | 2.552
LastOptimalBot | 2.532
Number6 | 2.531
HandshakeBot | 2.458
OldTitForTat | 2.411
WeightedAverage | 2.403
TitForTat | 2.328
AllD | 2.272
Tetragram | 2.256
Nash | 2.193
Jade | 2.186
Useless | 2.140
RandomBot | 2.018
CopyCat | 1.902
TatForTit | 1.891
NeverCOOP | 1.710
AllC | 1.565
AllN | 1.446
Kevin           | 1.322

Posted 2018-11-12 19:24:44
class EvaluaterBot:
    def __init__(self):
        self.c2i = {"C": 0, "N": 1, "D": 2}
        self.i2c = {0: "C", 1: "N", 2: "D"}
        # history[i][j]: how often the enemy answered my move i with move j
        self.history = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
        self.last = [None, None]
    def round(self, last):
        if self.last[0] is None:
            ret = 2
        else:
            # Input the latest enemy action (the reaction to my action
            # 2 rounds ago) into the history
            self.history[self.last[0]][self.c2i[last]] += 1
            # The enemy will react to the last action I did
            prediction, _ = max(enumerate(self.history[self.last[1]]), key=lambda l: l[1])
            # (i - 1) % 3 is the best response to move i (see note below)
            ret = (prediction - 1) % 3
        self.last = [self.last[1], ret]
        return self.i2c[ret]

Wins against all previously submitted bots except (maybe) RandomBot (though it might have an advantage, since it picks D in a draw and D should be optimal), and plays a constant draw against itself.
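A side note (mine, not the answerer's): the (prediction - 1) % 3 step works because under this payoff matrix the best response to C is D, to N is C, and to D is N, i.e. exactly a shift of -1 in the C=0, N=1, D=2 encoding:

# Sketch: confirm that (i - 1) % 3 maps each predicted enemy move to its
# best response under the payoff matrix (row = my move, column = enemy's).
ROW_PAYOFF = {"C": {"C": 3, "N": 4, "D": 0},
              "N": {"C": 1, "N": 2, "D": 3},
              "D": {"C": 5, "N": 2, "D": 1}}
moves = ["C", "N", "D"]
for i, predicted in enumerate(moves):
    best = max(moves, key=lambda mine: ROW_PAYOFF[mine][predicted])
    assert best == moves[(i - 1) % 3]  # D beats C, C beats N, N beats D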
Posted 2018-11-12 18:04:52
class TatForTit:
    def round(self, last):
        if last == "C":
            return "N"
        return "D"

This bot alternates picking D, N, D, N while TitForTat alternates C, D, C, D, which I suspect may be optimal against TitForTat. Obviously it could be improved to detect a non-TFT opponent and adopt other strategies, but I was just aiming at the original bounty.
Posted 2018-11-12 21:28:37
This bot took a game theory class in university, but was lazy and skipped the lecture on iterated games. So he only plays the single-shot mixed Nash equilibrium. It turns out that 1/5, 2/5, 2/5 is the mixed NE for this payoff matrix.
import random

class NashEquilibrium:
    def round(self, _):
        a = random.random()
        if a <= 0.2:
            return "C"
        elif a <= 0.6:
            return "N"
        else:
            return "D"
This bot learned a lesson or two from his lazy brother. His lazy brother's problem was that he didn't exploit fixed strategies. This version checks whether the opponent is a constant player or TitForTat and plays accordingly; otherwise it plays the regular Nash equilibrium. Its only downside is that it averages 2.2 points per round playing against itself.
import random

class NashEquilibrium2:
    def __init__(self):
        self.opphistory = [None, None, None]
        self.titfortatcounter = 0
        self.titfortatflag = 0
        self.mylast = "C"
        self.constantflag = 0
        self.myret = "C"

    def round(self, last):
        self.opphistory.pop(0)
        self.opphistory.append(last)

        # Check if it's a constant bot; if so, exploit.
        # (Skip the opening rounds, where the history still holds None.)
        if self.opphistory[0] is not None and self.opphistory.count(self.opphistory[0]) == 3:
            self.constantflag = 1
            if last == "C":
                self.myret = "D"
            elif last == "N":
                self.myret = "C"
            elif last == "D":
                self.myret = "N"

        # Check if it's a titfortat bot; if so, exploit.
        # Give it a few chances, since mutual D might happen randomly.
        if self.mylast == "D" and last == "D":
            self.titfortatcounter = self.titfortatcounter + 1
        if self.mylast == "D" and last != "D":
            self.titfortatcounter = 0
        if self.titfortatcounter >= 3:
            self.titfortatflag = 1

        if self.titfortatflag == 1:
            if last == "C":
                self.myret = "D"
            elif last == "D":
                self.myret = "N"
            elif last == "N":
                # tit for tat doesn't return N, we made a mistake somewhere
                self.titfortatflag = 0
                self.titfortatcounter = 0

        # Otherwise play the single-game Nash equilibrium.
        if self.constantflag == 0 and self.titfortatflag == 0:
            a = random.random()
            if a <= 0.2:
                self.myret = "C"
            elif a <= 0.6:
                self.myret = "N"
            else:
                self.myret = "D"

        self.mylast = self.myret
        return self.myret

https://codegolf.stackexchange.com/questions/175794