Efficient Python Programming: The itertools Module Explained

0 Preface

```
lis = ['I', 'love', 'python']
for i in lis:
    print(i)
# Output:
# I
# love
# python
```

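OK, let's go. Hope you enjoy the journey!

1 Concatenating elements

The chain function in itertools concatenates the elements of several iterables into one stream. Its prototype is shown below; the * marks a variable number of arguments.

`chain`(*iterables)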

```
In [33]: list(chain(['I', 'love'], ['python'], ['very', 'much']))
Out[33]: ['I', 'love', 'python', 'very', 'much']
```

```
def chain(*iterables):
    # Walk the iterables in order and yield their elements one by one.
    for it in iterables:
        for element in it:
            yield element
```
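As a small illustration of the pattern above, the following sketch flattens a list of lists in two ways: by unpacking it into `chain`, and with `chain.from_iterable`, the standard-library classmethod that accepts a single iterable of iterables. The variable names are made up for this example.

```python
from itertools import chain

words = [['I', 'love'], ['python'], ['very', 'much']]

# Unpack the outer list so that each inner list becomes one argument.
flat_a = list(chain(*words))

# chain.from_iterable takes the outer iterable directly, without unpacking.
flat_b = list(chain.from_iterable(words))

print(flat_a)            # ['I', 'love', 'python', 'very', 'much']
print(flat_a == flat_b)  # True
```

`chain.from_iterable` is usually the better fit when the outer iterable is itself lazy, since it does not need to be unpacked up front.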

2 Accumulating step by step

`accumulate`(iterable[, func, *, initial=None])

```
In [36]: list(accumulate([1,2,3,4,5,6], lambda x,y: x*y))
Out[36]: [1, 2, 6, 24, 120, 720]
```

A rough implementation of accumulate looks like this:

```
import operator

def accumulate(iterable, func=operator.add, *, initial=None):
    it = iter(iterable)
    total = initial
    if initial is None:
        # No initial value: start from the first element, if there is one.
        try:
            total = next(it)
        except StopIteration:
            return
    yield total
    for element in it:
        total = func(total, element)
        yield total
```
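To make the roles of `func` and `initial` more concrete, here is a short sketch with made-up sample data. It assumes Python 3.8 or later, where the `initial` keyword is available.

```python
from itertools import accumulate

data = [3, 1, 4, 1, 5]

running_sum = list(accumulate(data))               # default func is addition
running_max = list(accumulate(data, max))          # any two-argument function works
with_initial = list(accumulate(data, initial=10))  # output gains one leading item

print(running_sum)   # [3, 4, 8, 9, 14]
print(running_max)   # [3, 3, 4, 4, 5]
print(with_initial)  # [10, 13, 14, 18, 19, 24]
```

Note that with `initial` the result is one element longer than the input, because the initial value itself is yielded first.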

3 Funnel filtering

`compress`(data, selectors)

```
In [38]: list(compress('abcdefg', [1,1,0,1]))
Out[38]: ['a', 'b', 'd']
```

```
def compress(data, selectors):
    # Keep each data item whose matching selector is truthy.
    return (d for d, s in zip(data, selectors) if s)
```
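The selectors do not have to be a literal list. In the sketch below they are computed lazily from a second sequence; the names and scores are invented sample data.

```python
from itertools import compress

names = ['ann', 'bob', 'cid', 'dee']
scores = [82, 47, 91, 60]

# Build the selectors on the fly from another sequence.
passed = list(compress(names, (s >= 60 for s in scores)))
print(passed)  # ['ann', 'cid', 'dee']

# compress stops as soon as the shorter input is exhausted.
short_mask = list(compress('abcdefg', [1, 1, 0, 1]))
print(short_mask)  # ['a', 'b', 'd']
```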

4 Segment filtering

`dropwhile`(predicate, iterable)

```
In [39]: list(dropwhile(lambda x: x<3, [1,0,2,4,1,1,3,5,-5]))
Out[39]: [4, 1, 1, 3, 5, -5]
```

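```
def dropwhile(predicate, iterable):
    iterable = iter(iterable)
    # Skip the leading run of elements for which predicate is true.
    for x in iterable:
        if not predicate(x):
            yield x
            break
    # After the first failure, yield everything else unchanged.
    for x in iterable:
        yield x
```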

5 Segment filtering 2

`takewhile`(predicate, iterable)

```
In [43]: list(takewhile(lambda x: x<5, [1,4,6,4,1]))
Out[43]: [1, 4]
```

```
def takewhile(predicate, iterable):
    for x in iterable:
        if predicate(x):
            yield x
        else:
            break  # stop at the first element that fails the predicate
```
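Seen side by side, takewhile and dropwhile split a sequence at the first element that fails the predicate: one keeps the leading run, the other keeps the rest. A minimal sketch, where the names `readings` and `small` are chosen only for illustration:

```python
from itertools import dropwhile, takewhile

readings = [1, 0, 2, 4, 1, 1, 3, 5, -5]
small = lambda x: x < 3

head = list(takewhile(small, readings))  # leading run where the predicate holds
tail = list(dropwhile(small, readings))  # everything after that run

print(head)                      # [1, 0, 2]
print(tail)                      # [4, 1, 1, 3, 5, -5]
print(head + tail == readings)   # True
```

Neither function looks at the predicate again after the first failure, which is why the later 1, 1 and -5 stay in `tail`.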

6 Reject filtering

`filterfalse`(predicate, iterable)

```
In [40]: list(filterfalse(lambda x: x%2==0, [1,2,3,4,5,6]))
Out[40]: [1, 3, 5]
```

```
def filterfalse(predicate, iterable):
    # Keep only the elements for which predicate is false.
    if predicate is None:
        predicate = bool
    for x in iterable:
        if not predicate(x):
            yield x
```
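To contrast `filterfalse` with the built-in `filter`, the sketch below splits one list into two complementary halves. The helper name `is_even` is hypothetical.

```python
from itertools import filterfalse

numbers = [1, 2, 3, 4, 5, 6]
is_even = lambda x: x % 2 == 0

evens = list(filter(is_even, numbers))       # predicate is true
odds = list(filterfalse(is_even, numbers))   # predicate is false

# With predicate None, filterfalse keeps the falsy items themselves.
falsy = list(filterfalse(None, [0, 1, '', 'a', None]))

print(evens)  # [2, 4, 6]
print(odds)   # [1, 3, 5]
print(falsy)  # [0, '', None]
```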

7 Slice filtering

Python's ordinary slice operation works on sequences such as lists, for example:

```
lis = [1,3,2,1]
lis[:1]
```

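islice offers the same kind of slicing for any iterable, including iterators that do not support indexing:

`islice`(iterable, start, stop[, step])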

```
In [41]: list(islice('abcdefg', 1, 4, 2))
Out[41]: ['b', 'd']
```

```
import sys

def islice(iterable, *args):
    s = slice(*args)
    start, stop, step = s.start or 0, s.stop or sys.maxsize, s.step or 1
    it = iter(range(start, stop, step))  # the indices to keep
    try:
        nexti = next(it)
    except StopIteration:
        # Nothing to yield: just consume the iterable up to start.
        for i, element in zip(range(start), iterable):
            pass
        return
    try:
        for i, element in enumerate(iterable):
            if i == nexti:
                yield element
                nexti = next(it)
    except StopIteration:
        # No more indices to keep: consume the iterable up to stop.
        for i, element in zip(range(i + 1, stop), iterable):
            pass
```
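One place where islice is useful is an endless generator, which cannot be sliced with square brackets at all. The `squares` generator below is a hypothetical helper written only for this sketch.

```python
from itertools import count, islice

def squares():
    """Hypothetical endless generator: 1, 4, 9, 16, ..."""
    for n in count(1):
        yield n * n

first_five = list(islice(squares(), 5))         # stop only
every_other = list(islice(squares(), 1, 8, 2))  # start, stop, step

print(first_five)   # [1, 4, 9, 16, 25]
print(every_other)  # [4, 16, 36, 64]
```

Unlike list slicing, islice does not accept negative values for start, stop, or step, because it cannot know the length of the iterable in advance.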

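8 Cell division

The tee function works like the familiar process of cell division: it splits one iterator into n independent iterators. Its prototype: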

`tee`(iterable, n=2)

```
a = tee([1,4,6,4,1], 2)
In [51]: next(a[0])
Out[51]: 1

In [52]: next(a[1])
Out[52]: 1
```

```
import collections

def tee(iterable, n=2):
    it = iter(iterable)
    deques = [collections.deque() for i in range(n)]
    def gen(mydeque):
        while True:
            if not mydeque:            # this copy's queue is empty
                try:
                    newval = next(it)  # pull one new value from the source
                except StopIteration:
                    return
                for d in deques:       # hand it to every copy's queue
                    d.append(newval)
            yield mydeque.popleft()
    return tuple(gen(d) for d in deques)
```

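Internally, tee keeps a list of queues, deques, one per returned iterator, all of them empty at first. Whenever the queue of the iterator being advanced is empty, a new value newval is pulled from the source and appended to every queue; the generator then yields the leftmost element of its own mydeque.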
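A common use of tee is to walk the same data at two different offsets. The sketch below builds overlapping pairs this way; the function is written out here for illustration, although Python 3.10 and later also ship a ready-made `itertools.pairwise`.

```python
from itertools import tee

def pairwise(iterable):
    """Yield overlapping pairs (s0, s1), (s1, s2), ... using two tee copies."""
    first, second = tee(iterable)
    next(second, None)  # advance the second copy by one step
    return zip(first, second)

pairs = list(pairwise([1, 4, 6, 4, 1]))
print(pairs)  # [(1, 4), (4, 6), (6, 4), (4, 1)]
```

After calling tee, the original iterator should not be used directly any more, since advancing it would make the copies miss values.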

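9 A map variant

starmap can be seen as a variant of map. Like map, it is lazy and produces results one at a time instead of building the whole result in memory. The difference is that every element of iterable must itself be iterable, because it is unpacked into the arguments of function. Its prototype: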

`starmap`(function, iterable)

```
In [63]: list(starmap(lambda x,y: str(x)+'-'+str(y), [('a',1),('b',2),('c',3)]))
Out[63]: ['a-1', 'b-2', 'c-3']
```

A rough implementation of starmap:

```
def starmap(function, iterable):
    for args in iterable:
        # Unpack each element into positional arguments.
        yield function(*args)
```
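The difference from map is easiest to see when the arguments are already grouped into tuples. In this sketch (sample data invented for the example) starmap consumes the pairs directly, while map needs them split into separate columns first.

```python
from itertools import starmap

pairs = [(2, 5), (3, 2), (10, 3)]

# starmap unpacks each tuple: pow(2, 5), pow(3, 2), pow(10, 3)
powers = list(starmap(pow, pairs))

# map expects one iterable per argument, so the pairs are transposed first.
bases, exponents = zip(*pairs)
same = list(map(pow, bases, exponents))

print(powers)          # [32, 9, 1000]
print(powers == same)  # True
```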

10 Repeating elements

repeat yields the same object n times, or endlessly if times is omitted. Its prototype:

`repeat`(object[, times])

```
In [66]: list(repeat(6,3))
Out[66]: [6, 6, 6]

In [67]: list(repeat([1,2,3],2))
Out[67]: [[1, 2, 3], [1, 2, 3]]
```

```
def repeat(object, times=None):
    if times is None:  # without times, keep repeating forever
        while True:
            yield object
    else:
        for i in range(times):
            yield object
```
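Two details of repeat are worth a quick sketch: it is handy as a stream of constant arguments for map, and it yields the very same object each time rather than copies. The variable names below are made up for the example.

```python
from itertools import repeat

# An endless repeat(2) supplies the constant exponent; map stops
# when the shorter input, range(5), is exhausted.
squares = list(map(pow, range(5), repeat(2)))
print(squares)  # [0, 1, 4, 9, 16]

# repeat does not copy: both entries refer to the same list object.
row = [0, 0]
grid = list(repeat(row, 2))
grid[0][0] = 9
print(grid)                # [[9, 0], [9, 0]]
print(grid[0] is grid[1])  # True
```

If independent copies are needed, a list comprehension such as `[list(row) for _ in range(2)]` is the safer choice.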

11 Cartesian product

`product(A, B)` returns the same tuples as the generator expression `((x,y) for x in A for y in B)`.

```
In [68]: list(product('ABCD', 'xy'))
Out[68]:
[('A', 'x'),
 ('A', 'y'),
 ('B', 'x'),
 ('B', 'y'),
 ('C', 'x'),
 ('C', 'y'),
 ('D', 'x'),
 ('D', 'y')]
```

```
def product(*args, repeat=1):
    pools = [tuple(pool) for pool in args] * repeat
    result = [[]]
    # Extend every partial result with every element of the next pool.
    for pool in pools:
        result = [x+[y] for x in result for y in pool]
    for prod in result:
        yield tuple(prod)
```
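The `repeat` keyword in the signature above multiplies the pools, which is convenient for taking the product of an iterable with itself. A minimal sketch with invented inputs:

```python
from itertools import product

# product('01', repeat=2) is the same as product('01', '01').
bits = list(product('01', repeat=2))
print(bits)  # [('0', '0'), ('0', '1'), ('1', '0'), ('1', '1')]

# The two-argument form matches a nested loop, rightmost pool varying fastest.
nested = [(x, y) for x in 'AB' for y in 'xy']
matches = list(product('AB', 'xy')) == nested
print(matches)  # True
```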

12 An enhanced zip

```
In [69]: list(zip_longest('ABCD', 'xy', fillvalue='-'))
Out[69]: [('A', 'x'), ('B', 'y'), ('C', '-'), ('D', '-')]
```

```
from itertools import repeat

def zip_longest(*args, fillvalue=None):
    iterators = [iter(it) for it in args]
    num_active = len(iterators)
    if not num_active:
        return
    while True:
        values = []
        for i, it in enumerate(iterators):
            try:
                value = next(it)
            except StopIteration:
                num_active -= 1
                if not num_active:
                    return
                # Replace the exhausted iterator with an endless fill stream.
                iterators[i] = repeat(fillvalue)
                value = fillvalue
            values.append(value)
        yield tuple(values)
```

```
In [74]: for i, it in enumerate([iter([1,2,3]), iter(['x','y'])]):
    ...:     print(next(it))
# Output:
1
x
```
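To show what "enhanced" means here, the sketch below runs the built-in zip and zip_longest on the same two inputs. The variable names are chosen only for this example.

```python
from itertools import zip_longest

letters, marks = 'ABCD', 'xy'

short = list(zip(letters, marks))                           # stops at the shorter input
padded = list(zip_longest(letters, marks, fillvalue='-'))   # pads the shorter input

print(short)   # [('A', 'x'), ('B', 'y')]
print(padded)  # [('A', 'x'), ('B', 'y'), ('C', '-'), ('D', '-')]
```

If `fillvalue` is omitted, the missing positions are filled with `None`.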

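Summary

Python's itertools module provides memory-efficient iterators, and the equivalent implementations shown above almost all rely on generators. Learning what these 12 functions do therefore also deepens your understanding of generators, and lays a solid foundation for writing more efficient, concise, and elegant code.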

