# Python Algorithms - C7 Greedy

Python算法设计篇(7) Chapter 7: Greed is good? Prove it!

It’s not a question of enough, pal. ——Gordon Gekko, Wall Street

[果然贪心我领悟的不够，很多问题我貌似都讲不到点子上，大家将就着看下]

1.匹配问题 matching problem (maximum-weight matching problem)

To be on the safe side, just let me emphasize that this greedy solution would not work in general, with an arbitrary set of weights. The distinct powers of two are key here.

In this case (or the bipartite case, for that matter), greed won’t work in general. However, by some freak coincidence, all the compatibility numbers happen to be distinct powers of two. Now, what happens?

Let’s first consider what a greedy algorithm would look like here and then see why it yields an optimal result. We’ll be building a solution piece by piece—let the pieces be pairs and a partial solution be a set of pairs. Such a partial solution is valid only if no person in it participates in two (or more) of its pairs. The algorithm will then be roughly as follows:

1. List potential pairs, sorted by decreasing compatibility.
2. Pick the first unused pair from the list.
3. Is anyone in the pair already occupied? If so, discard it; otherwise, use it.
4. Are there any more pairs on the list? If so, go to 2.

As you’ll see later, this is rather similar to Kruskal’s algorithm for minimum spanning trees (although that works regardless of the edge weights). It also is a rather prototypical greedy algorithm. Its correctness is another matter. Using distinct powers of two is sort of cheating, because it would make virtually any greedy algorithm work; that is, you’d get an optimal result as long as you could get a valid solution at all. Even though it’s cheating (see Exercise 7-3), it illustrates the central idea here: making the greedy choice is safe. Using the most compatible of the remaining couples will always be at least as good as any other choice.

[原书关于稳定婚姻的扩展知识 EAGER SUITORS AND STABLE MARRIAGES]

There is, in fact, one classical matching problem that can be solved (sort of) greedily: the stable marriage problem. The idea is that each person in a group has preferences about whom he or she would like to marry. We’d like to see everyone married, and we’d like the marriages to be stable, meaning that there is no man who prefers a woman outside his marriage who also prefers him. (To keep things simple, we disregard same-sex marriages and polygamy here.)

There’s a simple algorithm for solving this problem, designed by David Gale and Lloyd Shapley. The formulation is quite gender-conservative but will certainly also work if the gender roles are reversed. The algorithm runs for a number of rounds, until there are no unengaged men left. Each round consists of two steps:

1. Each unengaged man proposes to his favorite of the women he has not yet asked.
2. Each woman is (provisionally) engaged to her favorite suitor and rejects the rest.

This can be viewed as greedy in that we consider only the available favorites (both of the men and women) right now. You might object that it’s only sort of greedy in that we don’t lock in and go straight for marriage; the women are allowed to break their engagement if a more interesting suitor comes along. Even so, once a man has been rejected, he has been rejected for good, which means that we’re guaranteed progress.

To show that this is an optimal and correct algorithm, we need to know that everyone gets married and that the marriages are stable. Once a woman is engaged, she stays engaged (although she may replace her fiancé). There is no way we can get stuck with an unmarried pair, because at some point the man would have proposed to the woman, and she would have (provisionally) accepted his proposal.

How do we know the marriages are stable? Let’s say Scarlett and Stuart are both married but not to each other. Is it possible they secretly prefer each other to their current spouses? No: if so, Stuart would already have proposed to her. If she accepted that proposal, she must later have found someone she liked better; if she rejected it, she would already have a preferable mate.

Although this problem may seem silly and trivial, it is not. For example, it is used for admission to some colleges and to allocate medical students to hospital jobs. There have, in fact, been written entire books (such as those by Donald Knuth and by Dan Gusfield and Robert W. Irwing) devoted to the problem and its variations.

2.背包问题

There are two important cases of the integer knapsack problem—the bounded and unbounded cases. The bounded case assumes we have a fixed number of objects in each category,4 and the unbounded case lets us use as many as we want. Sadly, greed won’t work in either case. In fact, these are both unsolved problems, in the sense that no polynomial algorithms are known to solve them. There is hope, however. As you’ll see in the next chapter, we can use dynamic programming to solve the problems in pseudopolynomial time, which may be good enough in many important cases. Also, for the unbounded case, it turns out that the greedy approach ain’t half bad! Or, rather, it’s at least half good, meaning that we’ll never get less than half the optimum value. And with a slight modification, you can get as good results for the bounded version, too. This concept of greedy approximation is discussed in more detail in Chapter 11.

3.哈夫曼编码

```from heapq import heapify, heappush, heappop
from itertools import count

def huffman(seq, frq):
num = count()
trees = list(zip(frq, num, seq))            # num ensures valid ordering
heapify(trees)                              # A min-heap based on freq
while len(trees) > 1:                       # Until all are combined
fa, _, a = heappop(trees)               # Get the two smallest trees
fb, _, b = heappop(trees)
n = next(num)
heappush(trees, (fa+fb, n, [a, b]))     # Combine and re-add them
# print trees
return trees[0][-1]

seq = "abcdefghi"
frq = [4, 5, 6, 9, 11, 12, 15, 16, 20]
print huffman(seq, frq)
# [['i', [['a', 'b'], 'e']], [['f', 'g'], [['c', 'd'], 'h']]]```

consider how each leaf contributes to the sum over all nodes: the leaf weight occurs as a summand once in each of its ancestor nodes—which means that the sum is exactly the same! That is, sum(weight(node) for node in nodes) is exactly the same as sum(depth(leaf)*weight(leaf) for leaf in leaves).

[哈夫曼树问题的一个扩展就是最优二叉搜索树问题，后者可以用动态规划算法来求解，感兴趣的话可以阅读算法导论中动态规划部分内容]

4.最小生成树

[如果对最小生成树问题的历史感兴趣的话作者推荐看这篇论文`“On the History of the Minimum Spanning Tree Problem,” by Graham and Hell`]

Kruskal算法

Prim算法

```#A Naïve Implementation of Kruskal’s Algorithm
def naive_find(C, u):                           # Find component rep.
while C[u] != u:                            # Rep. would point to itself
u = C[u]
return u

def naive_union(C, u, v):
u = naive_find(C, u)                        # Find both reps
v = naive_find(C, v)
C[u] = v                                    # Make one refer to the other

def naive_kruskal(G):
E = [(G[u][v],u,v) for u in G for v in G[u]]
T = set()                                   # Empty partial solution
C = {u:u for u in G}                        # Component reps
for _, u, v in sorted(E):                   # Edges, sorted by weight
if naive_find(C, u) != naive_find(C, v):
T.add((u, v))                       # Different reps? Use it!
naive_union(C, u, v)                # Combine components
return T

G = {
0: {1:1, 2:3, 3:4},
1: {2:5},
2: {3:2},
3: set()
}
print list(naive_kruskal(G)) #[(0, 1), (2, 3), (0, 2)]```

```#Kruskal’s Algorithm
def find(C, u):
if C[u] != u:
C[u] = find(C, C[u])                    # Path compression
return C[u]

def union(C, R, u, v):
u, v = find(C, u), find(C, v)
if R[u] > R[v]:                             # Union by rank
C[v] = u
else:
C[u] = v
if R[u] == R[v]:                            # A tie: Move v up a level
R[v] += 1

def kruskal(G):
E = [(G[u][v],u,v) for u in G for v in G[u]]
T = set()
C, R = {u:u for u in G}, {u:0 for u in G}   # Comp. reps and ranks
for _, u, v in sorted(E):
if find(C, u) != find(C, v):
union(C, R, u, v)
return T

G = {
0: {1:1, 2:3, 3:4},
1: {2:5},
2: {3:2},
3: set()
}
print list(kruskal(G)) #[(0, 1), (2, 3), (0, 2)]```

Prim算法不断地添加新的边(也可以说是一个新的顶点)，一旦我们加入了一条新的边，可能会导致某些原来的边缘节点到生成树的距离更加近了，所以我们要更新一下它们的距离值，然后重新调整下排序，那怎么修改距离值呢？我们可以先找到原来的那个节点，然后再修改它的距离值接着重新调整堆，但是这么做实在是太麻烦了！这里有一个巧妙的技巧就是直接向堆中插入新的距离值的节点！为什么可以呢？因为插入的新节点B的距离值比原来的节点A的距离值小，那么Prim算法添加顶点的时候肯定是先弹出堆中的节点B，后面如果弹出节点A的话，因为这个节点已经添加进入了，直接忽略就行了，也就是说我们这么做不仅很简单，而且并没有把原来的问题搞砸了。下面是作者给出的详细解释，总共三点，第三点是重复的添加不会影响算法的渐近时间复杂度

• We’re using a priority queue, so if a node has been added multiple times, by the time we remove one of its entries, it will be the one with the lowest weight (at that time), which is the one we want.

• We make sure we don’t add the same node to our traversal tree more than once. This can be ensured by a constant-time membership check. Therefore, all but one of the queue entries for any given node will be discarded.

• The multiple additions won’t affect asymptotic running time

[重新添加一次权值减小了的节点就相当于是松弛(或者说是隐含了松弛操作在里面)，Re-adding a node with a lower weight is equivalent to a relaxation，这两种方式是可以相互交换的，后面图算法中作者在实现Dijkstra算法时使用的是relax，那其实我们还可以实现带relex的Prim和不带relax的Dijkstra]

```from heapq import heappop, heappush

def prim(G, s):
P, Q = {}, [(0, None, s)]
while Q:
_, p, u = heappop(Q)
if u in P: continue
P[u] = p
for v, w in G[u].items():
heappush(Q, (w, u, v)) #weight, predecessor node, node
return P

G = {
0: {1:1, 2:3, 3:4},
1: {0:1, 2:5},
2: {0:3, 1:5, 3:2},
3: {2:2, 0:4}
}
print prim(G, 0) # {0: None, 1: 0, 2: 0, 3: 2}```

[扩展知识，另一个角度来看最小生成树 A SLIGHTLY DIFFERENT PERSPECTIVE]

In their historical overview of minimum spanning tree algorithms, Ronald L. Graham and Pavol Hell outline three algorithms that they consider especially important and that have played a central role in the history of the problem. The first two are the algorithms that are commonly attributed to Kruskal and Prim (although the second one was originally formulated by Vojtěch Jarník in 1930), while the third is the one initially described by Borůvka. Graham and Hell succinctly explain the algorithms as follows. A partial solution is a spanning forest, consisting of a set of fragments (components, trees). Initially, each node is a fragment. In each iteration, edges are added, joining fragments, until we have a spanning tree.

Algorithm 1: Add a shortest edge that joins two different fragments.

Algorithm 2: Add a shortest edge that joins the fragment containing the root to another fragment.

Algorithm 3: For every fragment, add the shortest edge that joins it to another fragment.

For algorithm 2, the root is chosen arbitrarily at the beginning. For algorithm 3, it is assumed that all edge weights are different to ensure that no cycles can occur. As you can see, all three algorithms are based on the same fundamental fact—that the shortest edge over a cut is safe. Also, in order to implement them efficiently, you need to be able to find shortest edges, detect whether two nodes belong to the same fragment, and so forth (as explained for algorithms 1 and 2 in the main text). Still, these brief explanations can be useful as a memory aid or to get the bird’s-eye perspective on what’s going on.

5.Greed Works. But When?

(1)Keeping Up with the Best

This is what Kleinberg and Tardos (in Algorithm Design) call staying ahead. The idea is to show that as you build your solution, one step at a time, the greedy algorithm will always have gotten at least as far as a hypothetical optimal algorithm would have. Once you reach the finish line, you’ve shown that greed is optimal.

(2)No Worse Than Perfect

This is a technique I used in showing the greedy choice property for Huffman’s algorithm. It involves showing that you can transform a hypothetical optimal solution to the greedy one, without reducing the quality. Kleinberg and Tardos call this an exchange argument.

(3)Staying Safe

This is where we started: to make sure a greedy algorithm is correct, we must make sure each greedy step along the way is safe. One way of doing this is the two-part approach of showing (1) the greedy choice property, that is, that a greedy choice is compatible with optimality, and (2) optimal substructure, that is, that the remaining subproblem is a smaller instance that must also be solved optimally.

[扩展知识：算法导论中还介绍了贪心算法的内在原理，也就是拟阵，贪心算法一般都是求这个拟阵的最大独立子集，方法就是从一个空的独立子集开始，从一个已经经过排序的序列中依次取出一个元素，尝试添加到独立子集中，如果新元素加入之后的集合仍然是一个独立子集的话那就加入进去，这样就形成了一个更大的独立子集，待遍历完整个序列时我们就得到最大的独立子集。拟阵的内容比较难，感兴趣不妨阅读下算法导论然后证明一两道练习题挑战下，嘻嘻]

```#贪心算法的框架 [拟阵的思想]
def greedy(E, S, w):
T = []                                      # Emtpy, partial solution
for e in sorted(E, key=w):                  # Greedily consider elements
TT = T + [e]                            # Tentative solution
if TT in S: T = TT                      # Is it valid? Use it!
return T```

0 条评论

## 相关文章

3675

### 第一章 | 使用python机器学习

python经常作为机器学习的首选，有一个统计，50%以上的机器学习开发者使用python。在学习机器学习之前需要熟悉以下几个python模块： numpy P...

3935

2614

1513

### python数据分析师面试题选

python数据分析部分 1. 如何利用SciKit包训练一个简单的线性回归模型 利用linear_model.LinearRegression()函数 #...

5576

1752

3344

3395

### hihoCoder #1142 : 三分求极值

#1142 : 三分·三分求极值 时间限制:10000ms 单点时限:1000ms 内存限制:256MB 描述 这一次我们就简单一点了，题目在此： ? 在直...

3499

1211