导读：昨天写了一篇《为什么range不是迭代器？range到底是什么类型？》，它跟之前两篇关于迭代器的文章是一脉相承的，所以我就没再介绍迭代器是什么，以及它跟可迭代对象有啥差别。到了展示 range 不是迭代器的时候，也是简单带过。这引起某个论坛的小伙伴说我没抓住重点。
他是误会了。我从头到尾所关心的重点就是两个问题：为什么 range 不是迭代器，range 是一种怎样的序列类型？也就是说，我关心的是原因，想要探寻 Python 的设计思想，而不仅仅是区分已经很显然的“Iterable 和 Iterator 的区别”。我基于这样的考虑：range 对象完全可以被设计成迭代器，如此仅仅会减少一些便利而已，并非是不能，所以怎么设计 range，这是一道选择题。
然后，就要说到今天分享的这篇文章了。它的作者是一名有多年经验的 Python 培训师/咨询师/演说者，文章主要回答的问题是 “is range an iterator？” 它花费了不少篇幅，来来去去就是在论证 range 是一个迭代器。我不满足于此，所以上篇文章是在更深层的方向去做思考，是要追问为什么，以及为什么的为什么。
原标题：Python: range is not an iterator!
After my Loop Better talk at PyGotham 2017 someone asked me a great question: iterators are lazy iterables and
range is a lazy iterable in Python 3, so is
range an iterator?
Unfortunately, I don’t remember the name of the person who asked me this question. I do remember saying something along the lines of “oh I love that question!”
I love this question because
range objects in Python 3 (xrange in Python 2) are lazy, but range objects are not iterators and this is something I see folks mix up frequently.
In the last year I’ve heard Python beginners, long-time Python programmers, and even other Python trainers mistakenly refer to Python 3’s
range objects as iterators. This distinction is something a lot of people get confused about.
When people talk about iterators and iterables in Python, you’re likely to hear the someone repeat the misconception that
range is an iterator. This mistake might seem unimportant at first, but I think it’s actually a pretty critical one. If you believe that
range objects are iterators, your mental model of how iterators work in Python isn’t clear enough yet. Both
range and iterators are “lazy” in a sense, but they’re lazy in fairly different ways.
With this article I’m going to explain how iterators work, how
range works, and how the laziness of these two types of “lazy iterables” differs.
But first, I’d like to ask that you do not use the information below as an excuse to be unkind to anyone, whether new learners or experienced Python programmers. Many people have used Python very happily for years without fully understanding the distinction I’m about to explain. You can write many thousands of lines of Python code without having a strong mental model of how iterators work.
In Python an iterable is anything that you can iterate over and an iterator is the thing that does the actual iterating.
Iter-ables are able to be iterated over. Iter-ators are the agents that perform the iteration.
You can get an iterator from any iterable in Python by using the
>>> iter([1, 2]) <list_iterator object at 0x7f043a081da0> >>> iter('hello') <str_iterator object at 0x7f043a081dd8>
Once you have an iterator, the only thing you can do with it is get its next item:
>>> my_iterator = iter([1, 2]) >>> next(my_iterator) 1 >>> next(my_iterator) 2
And you’ll get a stop iteration exception if you ask for the next item but there aren’t anymore items:
>>> next(my_iterator) Traceback (most recent call last): File "<stdin>", line 1, in <module> StopIteration
Both conveniently and somewhat confusingly, all iterators are also iterables. Meaning you can get an iterator from an iterator (it’ll give you itself back). Therefore you can iterate over an iterator as well:
>>> my_iterator = iter([1, 2]) >>> [x**2 for x in my_iterator] [1, 4]
Importantly, it should be noted that iterators are stateful. Meaning once you’ve consumed an item from an iterator, it’s gone. So after you’ve looped over an iterator once, it’ll be empty if you try to loop over it again:
>>> my_iterator = iter([1, 2]) >>> [x**2 for x in my_iterator] [1, 4] >>> [x**2 for x in my_iterator] 
In Python 3,
reversed, and a number of other built-in functions return iterators:
>>> enumerate(numbers) <enumerate object at 0x7f04384ff678> >>> zip(numbers, numbers) <zip object at 0x7f043a085cc8> >>> reversed(numbers) <list_reverseiterator object at 0x7f043a081f28>
Generators (whether from generator functions or generator expressions) are one of the simpler ways to create your own iterators:
>>> numbers = [1, 2, 3, 4, 5] >>> squares = (n**2 for n in numbers) >>> squares <generator object <genexpr> at 0x7f043a0832b0>
I often say that iterators are lazy single-use iterables. They’re “lazy” because they have the ability to only compute items as you loop over them. And they’re “single-use” because once you’ve “consumed” an item from an iterator, it’s gone forever. The term “exhausted” is often used for an iterator that has been fully consumed.
That was the quick summary of what iterators are. If you haven’t encountered iterators before, I’d recommend reviewing them a bit further before continuing on. I’ve written an article which explains iterators and I’ve given a talk, Loop Better which I mentioned earlier, during which I dive a bit deeper into iterators.
Okay we’ve reviewed iterators. Let’s talk about
range object in Python 3 (
xrange in Python 2) can be looped over like any other iterable:
>>> for n in range(3): >>> print(n) 0 1 2
range is an iterable, we can get an iterator from it:
>>> iter(range(3)) <range_iterator object at 0x7f043a0a7f90>
range objects themselves are not iterators. We cannot call
next on a
>>> next(range(3)) Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: 'range' object is not an iterator
And unlike an iterator, we can loop over a
range object without consuming it:
>>> numbers = range(3) >>> tuple(numbers) (0, 1, 2) >>> tuple(numbers) (0, 1, 2)
If we did this with an iterator, we’d get no elements the second time we looped:
>>> numbers = iter(range(3)) >>> tuple(numbers) (0, 1, 2) >>> tuple(numbers) ()
range objects are not iterators.
range object is “lazy” in a sense because it doesn’t generate every number that it “contains” when we create it. Instead it gives those numbers to us as we need them when looping over it.
Here is a
range object and a generator (which is a type of iterator):
>>> numbers = range(1_000_000) >>> squares = (n**2 for n in numbers)
range objects have a length:
>>> len(numbers) 1000000 >>> len(squares) Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: object of type 'generator' has no len()
And they can be indexed:
>>> numbers[-2] 999998 >>> squares[-2] Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: 'generator' object is not subscriptable
And unlike iterators, you can ask them whether they contain things without changing their state:
>>> 0 in numbers True >>> 0 in numbers True >>> 0 in squares True >>> 0 in squares False
If you’re looking for a description for
range objects, you could call them “lazy sequences”. They’re sequences (like lists, tuples, and strings) but they don’t really contain any memory under the hood and instead answer questions computationally.
>>> from collections.abc import Sequence >>> isinstance([1, 2], Sequence) True >>> isinstance('hello', Sequence) True >>> isinstance(range(3), Sequence) True
It might seem like I’m nitpicking in saying that range isn’t an iterator, but I really don’t think I am.
If I tell you something is an iterator, you’ll know that when you call
iter on it you’ll always get the same object back (by definition):
>>> iter(my_iterator) is my_iterator True
And you’ll be certain that you can call
next on it because you can call
next on all iterators:
>>> next(my_iterator) 4 >>> next(my_iterator) Traceback (most recent call last): File "<stdin>", line 1, in <module> StopIteration
And you’ll know that items will be consumed from the iterator as you loop over it. Sometimes this feature can come in handy for processing iterators in particular ways:
>>> my_iterator = iter([1, 2, 3, 4]) >>> list(zip(my_iterator, my_iterator)) [(1, 2), (3, 4)]
So while it may seem like the difference between “lazy iterable” and “iterator” is subtle, these terms really do mean different things. While “lazy iterable” is a very general term without concrete meaning, the word “iterator” implies an object with a very specific set of behaviors.
If you know you can loop over something, it’s an iterable.
If you know the thing you’re looping over happens to compute things as you loop over it, it’s a lazy iterable.
If you know you can pass something to the
next function, it’s an iterator (which are the most common form of lazy iterables).
If you can loop over something multiple times without “exhausting” it, it’s not an iterator. If you can’t pass something to the
next function, it’s not an iterator. Python 3’s
range object is not an iterator. If you’re teaching people about
range objects, please don’t use the word “iterator”. It’s confusing and might cause others to start misusing the word “iterator” as well.
On the other hand, if you see someone else misusing the word iterator don’t be mean. You may want to point out the misuse if it seems important, but keep in mind that I’ve heard long-time Python programmers and experienced Python trainers misuse this word by calling
rangeobjects iterators. Words are important, but language is tricky.
Thanks for joining me on this brief
range and iterator-filled adventure!
本文分享自微信公众号 - Python猫（python_cat）
原文出处及转载信息见文内详细说明，如有侵权，请联系 firstname.lastname@example.org 删除。