# Python Algorithms - C6 Divide, Combine, and Conquer

Python Algorithm Design Series (6), Chapter 6: Divide, Combine, and Conquer

Divide and rule, a sound motto; Unite and lead, a better one. ——Johann Wolfgang von Goethe, Gedichte

1. Balance is the key to tree-shaped problems

The recurrences tell you something about the performance involved, the induction gives you a tool for understanding how the algorithms work, and the recursive traversal (DFS in trees) is a raw skeleton for the algorithms.

(1) T(n) = T(n-1) + T(1) + n, the unbalanced case, which solves to Θ(n²)

(2) T(n) = 2T(n/2) + n, the balanced case, which solves to Θ(n lg n)
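The gap between the two recurrences can be checked numerically (a quick sketch of my own, not from the book): the unbalanced one expands to roughly n²/2, while the balanced one solves to n·lg n + n.

```python
def t_skewed(n):
    """T(n) = T(n-1) + T(1) + n with T(1) = 1: the unbalanced case."""
    total = 1                       # T(1)
    for m in range(2, n + 1):
        total += 1 + m              # T(m) = T(m-1) + T(1) + m
    return total

def t_balanced(n):
    """T(n) = 2*T(n/2) + n with T(1) = 1, n a power of two: the balanced case."""
    if n == 1:
        return 1
    return 2 * t_balanced(n // 2) + n

print(t_skewed(1024))    # 525823, roughly n**2/2
print(t_balanced(1024))  # 11264 = 1024*10 + 1024, i.e. n*lg(n) + n
```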

2. The canonical divide-and-conquer scheme

```python
# Pseudocode(ish)
def divide_and_conquer(S, divide, combine):
    if len(S) == 1: return S
    L, R = divide(S)
    A = divide_and_conquer(L, divide, combine)
    B = divide_and_conquer(R, divide, combine)
    return combine(A, B)
```

Python's `bisect` module uses exactly this binary search strategy. The function `bisect` returns the insertion point for an element in a sorted sequence, placing it to the right of any existing equal entries; `bisect_left` returns the leftmost such position, while `bisect_right` behaves the same as `bisect` (it is an alias). The `insort` function, which inserts an element while keeping the list sorted, is designed the same way, with left and right variants.

```python
from bisect import bisect, bisect_left, bisect_right

a = [0, 2, 3, 5, 6, 7, 8, 8, 9]
print(bisect(a, 5))        # 4
print(bisect_left(a, 5))   # 3
print(bisect_right(a, 5))  # 4
```
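The `insort` functions mentioned above work analogously; a brief illustration of my own (not from the original post):

```python
from bisect import insort, insort_left

a = [0, 2, 3, 5, 6, 7, 8, 8, 9]
insort(a, 8)        # Inserts after the existing 8s, like bisect_right
print(a)            # [0, 2, 3, 5, 6, 7, 8, 8, 8, 9]
insort_left(a, 4)   # Inserts at the leftmost valid position
print(a)            # [0, 2, 3, 4, 5, 6, 7, 8, 8, 8, 9]
```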

```python
class Node:
    lft = None
    rgt = None
    def __init__(self, key, val):
        self.key = key
        self.val = val

def insert(node, key, val):
    if node is None: return Node(key, val)      # Empty leaf: Add node here
    if node.key == key: node.val = val          # Found key: Replace val
    elif key < node.key:                        # Less than the key?
        node.lft = insert(node.lft, key, val)   # Go left
    else:                                       # Otherwise...
        node.rgt = insert(node.rgt, key, val)   # Go right
    return node

def search(node, key):
    if node is None: raise KeyError             # Empty leaf: It's not here
    if node.key == key: return node.val         # Found key: Return val
    elif key < node.key:                        # Less than the key?
        return search(node.lft, key)            # Go left
    else:                                       # Otherwise...
        return search(node.rgt, key)            # Go right

class Tree:                                     # Simple wrapper
    root = None
    def __setitem__(self, key, val):
        self.root = insert(self.root, key, val)
    def __getitem__(self, key):
        return search(self.root, key)
    def __contains__(self, key):
        try: search(self.root, key)
        except KeyError: return False
        return True
```
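A quick sanity check of the wrapper (a usage sketch of my own; the definitions are repeated in condensed form so the snippet runs standalone):

```python
class Node:
    lft = None
    rgt = None
    def __init__(self, key, val):
        self.key, self.val = key, val

def insert(node, key, val):
    if node is None: return Node(key, val)
    if node.key == key: node.val = val
    elif key < node.key: node.lft = insert(node.lft, key, val)
    else: node.rgt = insert(node.rgt, key, val)
    return node

def search(node, key):
    if node is None: raise KeyError
    if node.key == key: return node.val
    if key < node.key: return search(node.lft, key)
    return search(node.rgt, key)

class Tree:
    root = None
    def __setitem__(self, key, val):
        self.root = insert(self.root, key, val)
    def __getitem__(self, key):
        return search(self.root, key)
    def __contains__(self, key):
        try: search(self.root, key)
        except KeyError: return False
        return True

tree = Tree()
for k, v in [("b", 2), ("a", 1), ("c", 3)]:
    tree[k] = v
print(tree["a"])     # 1
print("c" in tree)   # True
print("x" in tree)   # False
tree["a"] = 10       # Setting an existing key replaces its value
print(tree["a"])     # 10
```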

3. Order statistics

[Extra: in Python, if you need the k smallest or k largest elements, you can use the `nsmallest` or `nlargest` functions from the `heapq` module. This approach works well when k is small; when k is large, it is usually better to just call `sort`.]
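A brief illustration of the `heapq` functions just mentioned (my own sketch):

```python
import heapq

seq = [3, 4, 1, 6, 3, 7, 9, 13, 93, 0, 100, 1, 2]
print(heapq.nsmallest(3, seq))  # [0, 1, 1]
print(heapq.nlargest(3, seq))   # [100, 93, 13]
# For large k, sorting the whole sequence tends to be faster:
print(sorted(seq)[:3])          # [0, 1, 1]
```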

```python
# A straightforward implementation of partition and select
def partition(seq):
    pi, seq = seq[0], seq[1:]                   # Pick and remove the pivot
    lo = [x for x in seq if x <= pi]            # All the small elements
    hi = [x for x in seq if x > pi]             # All the large ones
    return lo, pi, hi                           # pi is "in the right place"

def select(seq, k):
    lo, pi, hi = partition(seq)                 # [<= pi], pi, [> pi]
    m = len(lo)
    if m == k: return pi                        # We found the kth smallest
    elif m < k:                                 # Too far to the left
        return select(hi, k-m-1)                # Remember to adjust k
    else:                                       # Too far to the right
        return select(lo, k)                    # Just use original k here

seq = [3, 4, 1, 6, 3, 7, 9, 13, 93, 0, 100, 1, 2, 2, 3, 3, 2]
print(partition(seq))  # ([1, 3, 0, 1, 2, 2, 3, 3, 2], 3, [4, 6, 7, 9, 13, 93, 100])
print(select([5, 3, 2, 7, 1], 3))  # 5
print(select([5, 3, 2, 7, 1], 4))  # 7
ans = [select(seq, k) for k in range(len(seq))]
seq.sort()
print(ans == seq)  # True

[I have not fully understood this part yet; Introduction to Algorithms also covers it, so read that if you are interested.]

It turns out guaranteeing that the pivot is even a small percentage into the sequence (that is, not at either end, or a constant number of steps from it) is enough for the running time to be linear. In 1973, a group of algorists (Blum, Floyd, Pratt, Rivest, and Tarjan) came up with a version of the algorithm that gives exactly this kind of guarantee.

The algorithm is a bit involved, but the core idea is simple enough: first divide the sequence into groups of five (or some other small constant). Find the median in each, using (for example) a simple sorting algorithm. So far, we’ve used only linear time. Now, find the median among these medians, using the linear selection algorithm recursively. (This will work, because the number of medians is smaller than the size of the original sequence—still a bit mind-bending.) The resulting value is a pivot that is guaranteed to be good enough to avoid the degenerate recursion—use it as a pivot in your selection.

In other words, the algorithm is used recursively in two ways: first, on the sequence of medians, to find a good pivot, and second, on the original sequence, using this pivot.

While the algorithm is important to know about for theoretical reasons (because it means selection can be done in guaranteed linear time), you’ll probably never actually use it in practice.
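The two-way recursion described above can be sketched roughly as follows (my own illustration, not the book's code; it uses a three-way split on the pivot so duplicates are handled, and k is 0-indexed as in `select` above):

```python
def select_bfprt(seq, k):
    """Return the kth smallest element of seq (0-indexed), in guaranteed O(n)."""
    if len(seq) <= 5:
        return sorted(seq)[k]
    # 1. Split into groups of five and take the median of each (linear time).
    groups = [seq[i:i+5] for i in range(0, len(seq), 5)]
    medians = [sorted(g)[len(g)//2] for g in groups]
    # 2. Recursively find the median of the medians: the guaranteed-good pivot.
    pi = select_bfprt(medians, len(medians)//2)
    # 3. Partition around that pivot and recurse on one side only.
    lo = [x for x in seq if x < pi]
    eq = [x for x in seq if x == pi]
    hi = [x for x in seq if x > pi]
    if k < len(lo):
        return select_bfprt(lo, k)
    if k < len(lo) + len(eq):
        return pi
    return select_bfprt(hi, k - len(lo) - len(eq))

seq = [3, 4, 1, 6, 3, 7, 9, 13, 93, 0, 100, 1, 2, 2, 3, 3, 2]
print([select_bfprt(seq, k) for k in range(len(seq))] == sorted(seq))  # True
```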

4. Sorting by halves

```python
def quicksort(seq):
    if len(seq) <= 1: return seq                # Base case
    lo, pi, hi = partition(seq)                 # pi is in its place
    return quicksort(lo) + [pi] + quicksort(hi) # Sort lo and hi separately

seq = [7, 5, 0, 6, 3, 4, 1, 9, 8, 2]
print(quicksort(seq))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

```python
# Mergesort, repeated from Chapter 3 (with some modifications)
def mergesort(seq):
    mid = len(seq)//2                           # Midpoint for division
    lft, rgt = seq[:mid], seq[mid:]
    if len(lft) > 1: lft = mergesort(lft)       # Sort by halves
    if len(rgt) > 1: rgt = mergesort(rgt)
    res = []
    while lft and rgt:                          # Neither half is empty
        if lft[-1] >= rgt[-1]:                  # lft has greatest last value
            res.append(lft.pop())               # Append it
        else:                                   # rgt has greatest last value
            res.append(rgt.pop())               # Append it
    res.reverse()                               # Result is backward
    return (lft or rgt) + res                   # Also add the remainder
```

[Extra: Python's built-in sorting algorithm is Timsort. It looks rather complex, so I only skimmed it.]

[At the end of the chapter the author covers some material on balancing trees, including 2-3 trees. I am not especially interested in tree balancing and do not understand it well, so I will skip it in this summary; read the book if you are interested.]

Binary search divides the sequence into two approximately equal parts in each recursive step. Consider ternary search, which divides the sequence into three parts. What would its asymptotic complexity be? What can you say about the number of comparisons in binary and ternary search?

The asymptotic running time would be the same. The number of comparisons goes up, however. To see this, consider the recurrences B(n) = B(n/2) + 1 and T(n) = T(n/3) + 2 for binary and ternary search, respectively (with base cases B(1) = T(1) = 0 and B(2) = T(2) = 1). You can show (by induction) that B(n) < lg n + 1 < T(n).
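For concreteness, here is one way ternary search might look (my own sketch; the book only poses it as an exercise). Each round costs up to two comparisons, against `seq[m1]` and `seq[m2]`, matching the T(n) = T(n/3) + 2 recurrence.

```python
def ternary_search(seq, key, lo=0, hi=None):
    """Return an index of key in sorted seq, or -1 if absent.

    Each iteration splits [lo, hi) into three roughly equal parts.
    """
    if hi is None: hi = len(seq)
    while hi - lo > 0:
        third = (hi - lo) // 3
        m1, m2 = lo + third, hi - third - 1     # The two split points
        if key == seq[m1]: return m1            # First comparison
        if key == seq[m2]: return m2            # Second comparison
        if key < seq[m1]:                       # Leftmost third
            hi = m1
        elif key > seq[m2]:                     # Rightmost third
            lo = m2 + 1
        else:                                   # Middle third
            lo, hi = m1 + 1, m2
    return -1

a = [0, 2, 3, 5, 6, 7, 8, 9]
print(ternary_search(a, 5))   # 3
print(ternary_search(a, 4))   # -1
```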
