日拱一卒，伯克利Python入门项目，手把手教你写个简单ai

TechFlow-承志

发布于 2022-08-26 14:42:56

7070

发布于 2022-08-26 14:42:56

文章被收录于专栏：TechFlow

作者 | 梁唐

大家好，日拱一卒，我是梁唐。

咱们继续来聊聊伯克利的CS61A，这次分享的是这门课的第一个project。虽然说是project，但其实代码量不大。难度也不高，需要了解一点Python的基础语法和基本的函数式编程的思想。如果对于函数式编程还不太熟悉的，可以去历史记录里回看一下上一篇文章。

整个项目的代码量并不大，但代码质量很高，非常严谨符合规范。非常适合新手学习，既可以加深理解，也可以提高代码质量和设计规范。

由于原文档为英文，对于一些小伙伴来说可能比较吃力。所以我把它做了一个翻译，有些地方加上了一些说明。想要查看完整的代码+实现的同学可以点击【阅读原文】访问我的GitHub仓库。

项目原始文档：https://inst.eecs.berkeley.edu/~cs61a/sp18/proj/hog/

简介

我们需要实现一个小游戏Hog的核心代码，Hog是两名玩家轮流掷骰子拼点数的比拼游戏。玩家每次可以选择掷0至10颗骰子，不同的策略可以获取不同的积分，积分先到100的玩家获胜。

计算骰子点数有以下规则：

Pig Out：如果玩家掷出的骰子当中有骰子是1点，那么该回合只能获取1点，否则获取所有骰子点数之和
- 案例1：玩家掷出7枚骰子，其中第三枚为1点，玩家此回合获取1分
- 案例2：玩家掷出4枚骰子，均为3点，玩家此回合获取12分
Free Bacon：当玩家选择掷出0枚骰子时，触发Free Bacon规则。获取积分为对手玩家积分个位和十位分差绝对值再加2。
- 案例1：对手有42分，free bacon可以获得|4-2|+2=4分
- 案例2：对手有48分，free bacon可以获得|4-8|+2=6分
- 案例3：对手有7分，free bacon可以获得|0-7|+2=9分
Swine Swap：当两名玩家积分均大于1，并且其中一名为另外一名倍数时，两人积分互换
- 案例1：玩家0当前有37分，对手有92分，此回合获得9分之后变成了46分，92是46的两倍。触发Swine Swap，于是两人分数互换
- 案例2：玩家0当前有91分，对手有37分，此回合获得20分，积分变成111。但由于111是37的三倍，于是触发Swine Swap，两人积分互换，玩家1拥有超过100分，玩家1获胜

整个项目的框架已经搭好，只需要我们按照要求实现其中一些具体的函数。

最后当我们实现完成所有功能以及策略，可以和指定的策略进行比拼，统计获胜率。全项目一共有三个阶段11题，可以理解成一共有11个函数需要我们实现。

每一个阶段完成之后，都会有一个阶段性成果。难度不大，但很有意思，非常推荐刚入门的同学练习。

阶段1

阶段1主要实现游戏功能的主体，完成之后我们可以通过项目里提供的gui进行双人比赛。

Problem 1：

实现roll_dice函数，roll_dice函数拥有两个参数，第一个参数是一个整数，表示投掷的次数。第二个参数是一个函数，表示骰子的类型。返回投掷num_rolls次之后获得的积分，当出现Pig Out获得1分，否则为骰子点数之和。

调用dice()函数获取骰子掷出的点数，必须要投掷num_rolls次骰子，即使中途出现了Pig Out。

def roll_dice(num_rolls, dice=six_sided):
    """Simulate rolling the DICE exactly NUM_ROLLS > 0 times. Return the sum of
    the outcomes unless any of the outcomes is 1. In that case, return 1.

    num_rolls:  The number of dice rolls that will be made.
    dice:       A function that simulates a single dice roll outcome.
    """
    # These assert statements ensure that num_rolls is a positive integer.
    assert type(num_rolls) == int, 'num_rolls must be an integer.'
    assert num_rolls > 0, 'Must roll at least once.'
    # BEGIN PROBLEM 1
    "*** YOUR CODE HERE ***"
    # END PROBLEM 1

我们只需要使用循环重复执行num_rolls次，然后判断一下当中有没有出现1即可。

def roll_dice(num_rolls, dice=six_sided):
    """Simulate rolling the DICE exactly NUM_ROLLS > 0 times. Return the sum of
    the outcomes unless any of the outcomes is 1. In that case, return 1.

    num_rolls:  The number of dice rolls that will be made.
    dice:       A function that simulates a single dice roll outcome.
    """
    # These assert statements ensure that num_rolls is a positive integer.
    assert type(num_rolls) == int, 'num_rolls must be an integer.'
    assert num_rolls > 0, 'Must roll at least once.'
    # BEGIN PROBLEM 1
    "*** YOUR CODE HERE ***"
    ret = 0
    pigout = False
    for _ in range(num_rolls):
        score = dice()
        if score == 1:
            pigout = True
            continue
        ret += score
    return 1 if pigout else ret
    # END PROBLEM 1

Problem 2：

实现free_bacon函数，返回当掷出0颗骰子时获取的点数。输入为对手玩家的分数，可以假设分数一定小于100，如果分数小于10，那么十位视作0。

def free_bacon(score):
    """Return the points scored from rolling 0 dice (Free Bacon).

    score:  The opponent's current score.
    """
    assert score < 100, 'The game should be over.'
    # BEGIN PROBLEM 2
    "*** YOUR CODE HERE ***"
    # END PROBLEM 2

计算出分数的十位和个位，作差之后求绝对值，再加上2即可。

def free_bacon(score):
    """Return the points scored from rolling 0 dice (Free Bacon).

    score:  The opponent's current score.
    """
    assert score < 100, 'The game should be over.'
    # BEGIN PROBLEM 2
    "*** YOUR CODE HERE ***"
    return abs(score // 10 - score % 10) + 2
    # END PROBLEM 2

Problem 3:

实现take_turn函数，给定num_rolls和dice，返回玩家此回合获得的分数。

你需要先实现free_bacon函数，你实现的take_turn函数当中必须要调用roll_dice和free_bacon函数。

def take_turn(num_rolls, opponent_score, dice=six_sided):
    """Simulate a turn rolling NUM_ROLLS dice, which may be 0 (Free Bacon).
    Return the points scored for the turn by the current player.

    num_rolls:       The number of dice rolls that will be made.
    opponent_score:  The total score of the opponent.
    dice:            A function that simulates a single dice roll outcome.
    """
    # Leave these assert statements here; they help check for errors.
    assert type(num_rolls) == int, 'num_rolls must be an integer.'
    assert num_rolls >= 0, 'Cannot roll a negative number of dice in take_turn.'
    assert num_rolls <= 10, 'Cannot roll more than 10 dice.'
    assert opponent_score < 100, 'The game should be over.'
    # BEGIN PROBLEM 3
    "*** YOUR CODE HERE ***"
    # END PROBLEM 3

根据num_rolls判断是投掷骰子，还是执行free bacon。只要看懂了题意，同样没有难度。

def take_turn(num_rolls, opponent_score, dice=six_sided):
    """Simulate a turn rolling NUM_ROLLS dice, which may be 0 (Free Bacon).
    """
    # Leave these assert statements here; they help check for errors.
    assert type(num_rolls) == int, 'num_rolls must be an integer.'
    assert num_rolls >= 0, 'Cannot roll a negative number of dice in take_turn.'
    assert num_rolls <= 10, 'Cannot roll more than 10 dice.'
    assert opponent_score < 100, 'The game should be over.'
    # BEGIN PROBLEM 3
    "*** YOUR CODE HERE ***"
    if num_rolls == 0:
        return free_bacon(opponent_score)
    else:
        return roll_dice(num_rolls, dice)
    # END PROBLEM 3

Problem 4：

实现is_swap函数，返回当前玩家的分数是否会触发交换

def is_swap(score0, score1):
    """Return whether one of the scores is an integer multiple of the other."""
    # BEGIN PROBLEM 4
    "*** YOUR CODE HERE ***"
    # END PROBLEM 4

考虑分数为0的情况，分数为0时不应该触发is_swap。

判断两个人的分数是否成倍数关系即可：

def is_swap(score0, score1):
    """Return whether one of the scores is an integer multiple of the other."""
    # BEGIN PROBLEM 4
    "*** YOUR CODE HERE ***"
    if score0 <= 1 or score1 <= 1:
        return False
    return score0 % score1 == 0 or score1 % score0 == 0
    # END PROBLEM 4

Problem 5：

实现play函数，它模拟了一整局游戏。玩家轮流行动，每次行动时都会使用它们指定的策略。如玩家0，使用的策略就是strategy0，直到有玩家分数达到目标goal。当游戏结束时，按顺序返回两名玩家的分数，玩家0在先，玩家1在后。

提示：

必须要调用刚刚开发的函数，需要调用take_turn并传入对应的三个参数
每一回合只能调用一次take_turn
考虑所有边界情况
当回合结束时，你可以调用函数other，获取下一名玩家
你可以无视say参数
strategy0和strategy1本身都是函数，传入当前玩家和对手玩家的分数之后，它会返回玩家需要投掷的骰子数量。每回合只能调用strategy一次。

def play(strategy0, strategy1, score0=0, score1=0, dice=six_sided,
         goal=GOAL_SCORE, say=silence):
    """Simulate a game and return the final scores of both players, with Player
    0's score first, and Player 1's score second.

    A strategy is a function that takes two total scores as arguments (the
    current player's score, and the opponent's score), and returns a number of
    dice that the current player will roll this turn.

    strategy0:  The strategy function for Player 0, who plays first.
    strategy1:  The strategy function for Player 1, who plays second.
    score0:     Starting score for Player 0
    score1:     Starting score for Player 1
    dice:       A function of zero arguments that simulates a dice roll.
    goal:       The game ends and someone wins when this score is reached.
    say:        The commentary function to call at the end of the first turn.
    """
    player = 0  # Which player is about to take a turn, 0 (first) or 1 (second)
    # BEGIN PROBLEM 5
    "*** YOUR CODE HERE ***"
    # END PROBLEM 5

照着题目意思实现逻辑即可，基本上没有难度。

def play(strategy0, strategy1, score0=0, score1=0, dice=six_sided,
         goal=GOAL_SCORE, say=silence):
    """Simulate a game and return the final scores of both players, with Player
        0's score first, and Player 1's score second.
    """
    player = 0  # Which player is about to take a turn, 0 (first) or 1 (second)
    # BEGIN PROBLEM 5
    "*** YOUR CODE HERE ***"
    while True:
        if score0 >= goal or score1 >= goal:
            break
        if player == 0:
            num = strategy0(score0, score1)
            score0 += take_turn(num, score1, dice)
        else:
            num = strategy1(score1, score0)
            score1 += take_turn(num, score0, dice)
        if is_swap(score0, score1):
            score0, score1 = score1, score0
        player = other(player)
    # END PROBLEM 5
    return score0, score1

阶段2

阶段2主要实现一些评论函数，可以在一些关键节点打印出日志给玩家以提示。

类似这种"22 points! That's the biggest gain yet for Player 1."，玩家1打破了记录，获得了22分。

老师给了评论函数的一个例子：

def say_scores(score0, score1):
    """A commentary function that announces the score for each player."""
    print("Player 0 now has", score0, "and Player 1 now has", score1)
    return say_scores

注意这个say_scores函数最后返回了它自己，我们每一轮调用的都是同样的函数。

announce_lead_changes是使用函数式编程返回领先玩家的例子：

def announce_lead_changes(previous_leader=None):
    """Return a commentary function that announces lead changes.

    >>> f0 = announce_lead_changes()
    >>> f1 = f0(5, 0)
    Player 0 takes the lead by 5
    >>> f2 = f1(5, 12)
    Player 1 takes the lead by 7
    >>> f3 = f2(8, 12)
    >>> f4 = f3(8, 13)
    >>> f5 = f4(15, 13)
    Player 0 takes the lead by 2
    """
    def say(score0, score1):
        if score0 > score1:
            leader = 0
        elif score1 > score0:
            leader = 1
        else:
            leader = None
        if leader != None and leader != previous_leader:
            print('Player', leader, 'takes the lead by', abs(score0 - score1))
        return announce_lead_changes(leader)
    return say

在开发之前，你需要先看懂下面这个both函数：

def both(f, g):
    """Return a commentary function that says what f says, then what g says.

    >>> h0 = both(say_scores, announce_lead_changes())
    >>> h1 = h0(10, 0)
    Player 0 now has 10 and Player 1 now has 0
    Player 0 takes the lead by 10
    >>> h2 = h1(10, 6)
    Player 0 now has 10 and Player 1 now has 6
    >>> h3 = h2(6, 18) # Player 0 gets 8 points, then Swine Swap applies
    Player 0 now has 6 and Player 1 now has 18
    Player 1 takes the lead by 12
    """
    def say(score0, score1):
        return both(f(score0, score1), g(score0, score1))
    return say

它接收两个评论函数f和g，并且返回一个全新的评论函数。这个返回的函数中的f和g这两个参数，又是调用f和g之后获得的结果。

这一段涉及函数式编程的嵌套，会显得有些绕，最好根据这个例子理解：

>>> h0 = both(say_scores, announce_lead_changes())
>>> h1 = h0(10, 0)
Player 0 now has 10 and Player 1 now has 0
Player 0 takes the lead by 10
>>> h2 = h1(10, 6)
Player 0 now has 10 and Player 1 now has 6
>>> h3 = h2(6, 18) # Player 0 gets 8 points, then Swine Swap applies
Player 0 now has 6 and Player 1 now has 18
Player 1 takes the lead by 12

Problem 6：

更新的你play函数，使得在每一个回合的结束都调用一下评论函数。评论函数调用之后会返回一个新的评论函数用在下一个回合。

比如说say(score0, score1)将在第一个回合结束时调用，它会返回另外一个评论函数，返回的这个函数需要在第二个回合调用。也就是说每个回合调用的评论函数都是上一个回合返回的。

看懂了say函数之后，我们只需要在play函数当中加上say方法的调用即可。

由于say函数是一个高阶函数，它运行之后会返回一个函数，注意这点，不难实现。

def play(strategy0, strategy1, score0=0, score1=0, dice=six_sided,
         goal=GOAL_SCORE, say=silence):
    """Simulate a game and return the final scores of both players, with Player
    0's score first, and Player 1's score second.
    """
    player = 0  # Which player is about to take a turn, 0 (first) or 1 (second)
    # BEGIN PROBLEM 5
    "*** YOUR CODE HERE ***"
    while True:
        if score0 >= goal or score1 >= goal:
            break
        if player == 0:
            num = strategy0(score0, score1)
            score0 += take_turn(num, score1, dice)
        else:
            num = strategy1(score1, score0)
            score1 += take_turn(num, score0, dice)
        if is_swap(score0, score1):
            score0, score1 = score1, score0
        player = other(player)
        say = say(score0, score1)
    # END PROBLEM 5
    return score0, score1

Problem 7：

实现announce_highest函数，它是一个高阶函数，返回一个评论函数。这个评论函数会在指定玩家单个回合获得新的最大得分时打印日志。要实现这点，它必须要计算当前得分，并且和历史最高得分进行比较。函数的第一个参数who指定了需要跟踪记录的玩家，其他玩家的得分可以忽略。

需要注意，在打印得分时，需要注意单复数。得分大于1时，使用points，只有1分时，使用point。

def announce_highest(who, previous_high=0, previous_score=0):
    """Return a commentary function that announces when WHO's score
    increases by more than ever before in the game.

    >>> f0 = announce_highest(1) # Only announce Player 1 score gains
    >>> f1 = f0(11, 0)
    >>> f2 = f1(11, 1)
    1 point! That's the biggest gain yet for Player 1
    >>> f3 = f2(20, 1)
    >>> f4 = f3(5, 20) # Player 1 gets 4 points, then Swine Swap applies
    19 points! That's the biggest gain yet for Player 1
    >>> f5 = f4(20, 40) # Player 0 gets 35 points, then Swine Swap applies
    20 points! That's the biggest gain yet for Player 1
    >>> f6 = f5(20, 55) # Player 1 gets 15 points; not enough for a new high
    """
    assert who == 0 or who == 1, 'The who argument should indicate a player.'
    # BEGIN PROBLEM 7
    "*** YOUR CODE HERE ***"
    # END PROBLEM 7

这题稍稍有些难，多关注一下注释中的测试样例。

主要思路是在announce_highest中实现一个函数，我这里函数名叫say。然后通过这两个函数的互相调用来实现功能。

如果想不明白可以看一下公开课的视频，老师专门开了一个小节讲这题。

def announce_highest(who, previous_high=0, previous_score=0):
    """Return a commentary function that announces when WHO's score
    increases by more than ever before in the game.
    """
    assert who == 0 or who == 1, 'The who argument should indicate a player.'
    # BEGIN PROBLEM 7
    "*** YOUR CODE HERE ***"
    def say(score0, score1):
        if who == 0:
            gain = score0 - previous_score
            new_score = score0
        else:
            gain = score1 - previous_score
            new_score = score1

        new_high = previous_high
        if gain > previous_high:
            new_high = gain
            print("{} {}! That's the biggest gain yet for Player {}".format(gain, 'points' if gain > 1 else 'point', who))
        return announce_highest(who, new_high, new_score)
    return say
    # END PROBLEM 7

阶段3

在第三个阶段，需要实现一些游戏策略，设计一些简单的ai。

Problem 8：

实现make_averaged函数，它是一个高阶函数。接收一个参数fn，fn是一个函数。make_averaged函数返回一个函数，它接收和fn同样的参数。这个返回的参数和fn不同，它返回调用fn函数num_samples之后结果的均值。

说起来有些拗口，稍微解释一下。即我们要实现一个高阶函数make_averaged，它返回一个函数，我们把这个返回的函数称为r。r函数接收的参数和fn一样，调用r会得到将fn函数调用num_samples次之后的均值。

这里最大的问题在于我们不知道fn会接收什么样的参数，那又怎么来确定r的接收参数格式呢？

这里需要用到一个小技巧，在Python传参的过程当中，当我们不知道具体传入的参数数量时，我们可以把若干个连续的必填参数写成*arg。这里的arg表示一个列表，*arg表示将列表展开。其实*符号后面跟什么名字都行，都表示一个列表名。

作业当中提供了一个例子：

>>> def printed(fn):
...     def print_and_return(*args):
...         result = fn(*args)
...         print('Result:', result)
...         return result
...     return print_and_return
>>> printed_pow = printed(pow)
>>> printed_pow(2, 8)
Result: 256
256
>>> printed_abs = printed(abs)
>>> printed_abs(-10)
Result: 10
10

在这个例子当中，print_and_return函数接收的参数就是*args，它可以接收若干个参数。在print_and_return函数当中，我们又调用了fn函数，将*args又传给了fn。通过这种方式，其实就把fn函数的接收参数通过print_and_return传递了进来。

在本题当中，我们也需要使用类似的方法实现：

def make_averaged(fn, num_samples=1000):
    """Return a function that returns the average value of FN when called.
    """
    # BEGIN PROBLEM 8
    "*** YOUR CODE HERE ***"
    def process(*args):
        tot = 0
        for _ in range(num_samples):
            tot += fn(*args)
        return tot / num_samples
    return process
    # END PROBLEM 8

Problem 9：

实现max_scoring_num_rolls函数，它接收两个参数dice和num_samples，其中dice表示骰子的类型，num_samples表示实验的次数。它返回num_samples实验之后，收益均值最大的骰子的数量。

你需要调用你刚刚实现的make_averaged和roll_dice函数来完成。返回的结果在1到10之间（包括1和10）。当两种情况得分相同时，返回骰子数量少的。比如3个骰子和6个骰子的得分相同，返回3。

只要开发出了make_averaged，基本上没有难度，枚举一下所有骰子的可能，选出平均收益（收益期望）最大的即可。

def max_scoring_num_rolls(dice=six_sided, num_samples=1000):
    """Return the number of dice (1 to 10) that gives the highest average turn
    score by calling roll_dice with the provided DICE over NUM_SAMPLES times.
    Assume that the dice always return positive outcomes.

    >>> dice = make_test_dice(1, 6)
    >>> max_scoring_num_rolls(dice)
    1
    """
    # BEGIN PROBLEM 9
    "*** YOUR CODE HERE ***"
    ret, max_score = 1, 0
    for i in range(1, 11):
        avearge_dice = make_averaged(roll_dice, num_samples)
        score = avearge_dice(i, dice)
        if score > max_score:
            ret, max_score = i, score
    return ret
    # END PROBLEM 9

Problem 10：

实现bacon_strategy函数，这个策略会在投掷0收益最优时尽可能投掷0颗骰子。

具体逻辑为：当投掷0颗骰子的收益大于等参数margin时，投掷0颗骰子，否则投掷num_rolls颗。

def bacon_strategy(score, opponent_score, margin=8, num_rolls=4):
    """This strategy rolls 0 dice if that gives at least MARGIN points, and
    rolls NUM_ROLLS otherwise.
    """
    # BEGIN PROBLEM 10
    gain = free_bacon(opponent_score)
    if gain >= margin:
        return 0
    return num_rolls
    # END PROBLEM 10

Problem 11:

实现swap_strategy策略，这个策略会尽可能利用Swine Swap规则。当投掷0会触发Swine Swap，并且获得很大收益时投掷0。如果投掷0颗收益大于等于参数margin，并且不会触发Swine Swap时也投掷0。否则投掷num_rolls。

def swap_strategy(score, opponent_score, margin=8, num_rolls=4):
    """This strategy rolls 0 dice when it triggers a beneficial swap. It also
    rolls 0 dice if it gives at least MARGIN points. Otherwise, it rolls
    NUM_ROLLS.
    """
    # BEGIN PROBLEM 11
    gain = free_bacon(opponent_score)
    score += gain
    if gain >= margin and not is_swap(score, opponent_score):
        return 0
    if is_swap(score, opponent_score) and opponent_score >= score + margin:
        return 0
    return num_rolls
    # END PROBLEM 11

Problem 12:

附加题，发挥你个人的才智将上述实现的策略进行结合实现终极策略，尽可能和在始终投掷4的ai比拼当中获得更高的胜率。一些建议：

swap_strategy是一个很好的默认策略
获得超过100分没有意义，查看能否通过掷出0、1、2枚骰子获胜，如果领先，可以投掷少一些的骰子降低风险
尝试获得有收益的交换
谨慎选择num_rolls和margin

这题没有标准答案，大家可以自由发挥。提供一个我根据提示实现的版本：

def final_strategy(score, opponent_score):
    """Write a brief description of your final strategy.

    *** YOUR DESCRIPTION HERE ***
    """
    # BEGIN PROBLEM 12
    # 落后时，如果获得1分可以与对手交换，那么就投10个骰子
    if opponent_score > score and is_swap(score+1, opponent_score):
        return 10
    # 领先时，如果获得1分会被交换，返回0
    if score > opponent_score and is_swap(score+1, opponent_score):
        return 0
    # 判断free bacon是否可以获胜
    max_roll = max_scoring_num_rolls(num_samples=100)
    num = swap_strategy(score, opponent_score, 100-score, max_roll)
    if num == 0:
        return 0
    # 不投0颗，是否有获胜可能
    avg_dice = make_averaged(roll_dice, 100)
    for i in range(1, 4):
        gain = avg_dice(i)
        if score + gain >= 100:
            return i
    # 判断投掷0颗是否可以反超，如果领先，则以获得6分以上为期望
    margin = opponent_score - score + 1 if opponent_score > score else 6
    return swap_strategy(score, opponent_score, margin, max_roll)
    # END PROBLEM 12

到这里整个项目就算是开发完了，我们可以调用一下老师提供的工具，计算一下我们开发的策略对战一直投掷4个骰子策略的胜率。