English Reading | range Objects Are Not Iterators

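Editor's note: Yesterday I wrote a post titled "Why isn't range an iterator? What type is range, really?" (《为什么range不是迭代器?range到底是什么类型?》). It follows directly from my two earlier posts on iterators, so I didn't explain again what an iterator is or how it differs from an iterable. When it came to showing that range is not an iterator, I only touched on it briefly. That led a reader on one forum to say I had missed the point.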

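He misunderstood. From start to finish, I cared about two questions: why is range not an iterator, and what kind of sequence type is range? In other words, I was interested in the reasons and wanted to explore Python's design thinking, not just restate the already obvious difference between Iterable and Iterator. My reasoning was this: range objects could perfectly well have been designed as iterators, at the cost of some convenience. Nothing made that impossible, so how to design range was a matter of choice.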

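That brings me to the article I'm sharing today. Its author is a Python trainer, consultant, and speaker with many years of experience, and the main question it answers is “is range an iterator?” It spends a fair amount of space arguing, from several angles, that range is not an iterator. I wasn't satisfied with stopping there, so my previous post went a level deeper, asking why, and then asking why again.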

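Despite that difference in focus, this is undeniably a good explanatory article: its topic is clear, its reasoning is well organized, and it is easy to follow. It makes for good reading material, and a Chinese translation is available too, so I'm sharing it here.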


Original title: Python: range is not an iterator!

Author: Trey Hunner

English: http://t.cn/EGSAs5y

Chinese translation: https://zhuanlan.zhihu.com/p/34157478


After my Loop Better talk at PyGotham 2017 someone asked me a great question: iterators are lazy iterables and range is a lazy iterable in Python 3, so is range an iterator?

Unfortunately, I don’t remember the name of the person who asked me this question. I do remember saying something along the lines of “oh I love that question!”

I love this question because range objects in Python 3 (xrange in Python 2) are lazy, but range objects are not iterators and this is something I see folks mix up frequently.

In the last year I’ve heard Python beginners, long-time Python programmers, and even other Python trainers mistakenly refer to Python 3’s range objects as iterators. This distinction is something a lot of people get confused about.

Yes this is confusing

When people talk about iterators and iterables in Python, you’re likely to hear someone repeat the misconception that range is an iterator. This mistake might seem unimportant at first, but I think it’s actually a pretty critical one. If you believe that range objects are iterators, your mental model of how iterators work in Python isn’t clear enough yet. Both range and iterators are “lazy” in a sense, but they’re lazy in fairly different ways.

With this article I’m going to explain how iterators work, how range works, and how the laziness of these two types of “lazy iterables” differs.

But first, I’d like to ask that you do not use the information below as an excuse to be unkind to anyone, whether new learners or experienced Python programmers. Many people have used Python very happily for years without fully understanding the distinction I’m about to explain. You can write many thousands of lines of Python code without having a strong mental model of how iterators work.

What’s an iterator?

In Python an iterable is anything that you can iterate over and an iterator is the thing that does the actual iterating.

Iter-ables are able to be iterated over. Iter-ators are the agents that perform the iteration.

You can get an iterator from any iterable in Python by using the iter function:

>>> iter([1, 2])
<list_iterator object at 0x7f043a081da0>
>>> iter('hello')
<str_iterator object at 0x7f043a081dd8>

Once you have an iterator, the only thing you can do with it is get its next item:

>>> my_iterator = iter([1, 2])
>>> next(my_iterator)
1
>>> next(my_iterator)
2

And you’ll get a StopIteration exception if you ask for the next item but there aren’t any more items:

>>> next(my_iterator)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
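
The iter function, the next function, and the StopIteration exception shown above are roughly the pieces a for loop relies on behind the scenes. Here is a minimal sketch of that idea; manual_loop is a hypothetical helper name used only for illustration, not something from the article or the standard library:

```python
def manual_loop(iterable):
    """Collect items roughly the way a for loop would, using iter() and next()."""
    results = []
    iterator = iter(iterable)        # ask the iterable for an iterator
    while True:
        try:
            item = next(iterator)    # ask the iterator for its next item
        except StopIteration:        # the iterator has no more items
            break
        results.append(item)
    return results


print(manual_loop([1, 2]))   # [1, 2]
print(manual_loop('hi'))     # ['h', 'i']
```

The loop never indexes into the iterable or asks for its length. It only asks for an iterator and then keeps asking that iterator for the next item.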

Both conveniently and somewhat confusingly, all iterators are also iterables, meaning you can get an iterator from an iterator (it’ll give you itself back). Therefore you can iterate over an iterator as well:

>>> my_iterator = iter([1, 2])
>>> [x**2 for x in my_iterator]
[1, 4]

Importantly, iterators are stateful: once you’ve consumed an item from an iterator, it’s gone. So after you’ve looped over an iterator once, it’ll be empty if you try to loop over it again:

>>> my_iterator = iter([1, 2])
>>> [x**2 for x in my_iterator]
[1, 4]
>>> [x**2 for x in my_iterator]
[]

In Python 3, enumerate, zip, reversed, and a number of other built-in functions return iterators:

>>> numbers = [1, 2, 3, 4, 5]
>>> enumerate(numbers)
<enumerate object at 0x7f04384ff678>
>>> zip(numbers, numbers)
<zip object at 0x7f043a085cc8>
>>> reversed(numbers)
<list_reverseiterator object at 0x7f043a081f28>

Generators (whether from generator functions or generator expressions) are one of the simpler ways to create your own iterators:

>>> numbers = [1, 2, 3, 4, 5]
>>> squares = (n**2 for n in numbers)
>>> squares
<generator object <genexpr> at 0x7f043a0832b0>
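
The example above uses a generator expression. The generator function form can be sketched in the same spirit; squares_of is a hypothetical name chosen for illustration:

```python
def squares_of(numbers):
    """Generator function: yields each square lazily, one at a time."""
    for n in numbers:
        yield n ** 2


squares = squares_of([1, 2, 3])
print(iter(squares) is squares)   # True: a generator is its own iterator
print(next(squares))              # 1
print(list(squares))              # [4, 9] (the 1 was already consumed)
print(list(squares))              # [] (the generator is exhausted)
```

Calling squares_of runs none of the function body up front. Each value is computed only when something asks the generator for its next item.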

I often say that iterators are lazy single-use iterables. They’re “lazy” because they have the ability to only compute items as you loop over them. And they’re “single-use” because once you’ve “consumed” an item from an iterator, it’s gone forever. The term “exhausted” is often used for an iterator that has been fully consumed.

That was the quick summary of what iterators are. If you haven’t encountered iterators before, I’d recommend reviewing them a bit further before continuing on. I’ve written an article that explains iterators, and I’ve given a talk, Loop Better (mentioned earlier), in which I dive a bit deeper into iterators.

How is range different?

Okay we’ve reviewed iterators. Let’s talk about range now.

The range object in Python 3 (xrange in Python 2) can be looped over like any other iterable:

>>> for n in range(3):
...     print(n)
...
0
1
2

And because range is an iterable, we can get an iterator from it:

>>> iter(range(3))
<range_iterator object at 0x7f043a0a7f90>

But range objects themselves are not iterators. We cannot call next on a range object:

>>> next(range(3))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'range' object is not an iterator

And unlike an iterator, we can loop over a range object without consuming it:

>>> numbers = range(3)
>>> tuple(numbers)
(0, 1, 2)
>>> tuple(numbers)
(0, 1, 2)

If we did this with an iterator, we’d get no elements the second time we looped:

>>> numbers = iter(range(3))
>>> tuple(numbers)
(0, 1, 2)
>>> tuple(numbers)
()

Unlike zip, enumerate, or generator objects, range objects are not iterators.

So what is range?

The range object is “lazy” in a sense because it doesn’t generate every number that it “contains” when we create it. Instead it gives those numbers to us as we need them when looping over it.

Here is a range object and a generator (which is a type of iterator):

>>> numbers = range(1_000_000)
>>> squares = (n**2 for n in numbers)

Unlike iterators, range objects have a length:

>>> len(numbers)
1000000
>>> len(squares)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: object of type 'generator' has no len()

And they can be indexed:

>>> numbers[-2]
999998
>>> squares[-2]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'generator' object is not subscriptable

And unlike iterators, you can ask them whether they contain things without changing their state:

>>> 0 in numbers
True
>>> 0 in numbers
True
>>> 0 in squares
True
>>> 0 in squares
False

If you’re looking for a description for range objects, you could call them “lazy sequences”. They’re sequences (like lists, tuples, and strings), but they don’t store their items in memory and instead answer questions computationally.

>>> from collections.abc import Sequence
>>> isinstance([1, 2], Sequence)
True
>>> isinstance('hello', Sequence)
True
>>> isinstance(range(3), Sequence)
True
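
To make “answer questions computationally” concrete, here is a rough sketch of a lazy sequence. SimpleRange is a hypothetical, heavily simplified stand-in written for this illustration (step of 1, no slicing); it is not how CPython actually implements range:

```python
from collections.abc import Sequence


class SimpleRange(Sequence):
    """Hypothetical simplified range: step of 1, integer indexes only."""

    def __init__(self, start, stop):
        self.start = start
        self.stop = max(start, stop)   # an "empty" range has length 0

    def __len__(self):
        return self.stop - self.start          # computed, not counted

    def __getitem__(self, index):
        if not isinstance(index, int):
            raise TypeError("SimpleRange indices must be integers")
        if index < 0:
            index += len(self)                 # support negative indexes
        if not 0 <= index < len(self):
            raise IndexError("SimpleRange index out of range")
        return self.start + index              # computed on demand

    def __contains__(self, value):
        # Answered arithmetically, without looping over any items.
        return isinstance(value, int) and self.start <= value < self.stop


numbers = SimpleRange(0, 1_000_000)
print(len(numbers))               # 1000000
print(numbers[-2])                # 999998
print(0 in numbers)               # True
print(list(SimpleRange(3, 6)))    # [3, 4, 5]
```

Only start and stop are stored, so a million-item SimpleRange takes no more memory than a three-item one. Looping works because collections.abc.Sequence supplies an iteration method built on top of indexing, and each loop gets a fresh iterator, so the object is never exhausted.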

Why does this distinction matter?

It might seem like I’m nitpicking in saying that range isn’t an iterator, but I really don’t think I am.

If I tell you something is an iterator, you’ll know that when you call iter on it you’ll always get the same object back (by definition):

>>> my_iterator = iter([4])
>>> iter(my_iterator) is my_iterator
True

And you’ll be certain that you can call next on it because you can call next on all iterators:

>>> next(my_iterator)
4
>>> next(my_iterator)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

And you’ll know that items will be consumed from the iterator as you loop over it. Sometimes this feature can come in handy for processing iterators in particular ways:

>>> my_iterator = iter([1, 2, 3, 4])
>>> list(zip(my_iterator, my_iterator))
[(1, 2), (3, 4)]
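
The zip example above works only because both arguments are the same iterator. A small sketch of that trick, next to what happens when a range object is passed twice instead (pairs is a hypothetical helper name):

```python
def pairs(iterable):
    """Group items into consecutive 2-tuples using one shared iterator."""
    iterator = iter(iterable)
    # Both arguments are the same iterator, so zip pulls items 1 and 2
    # for the first tuple, items 3 and 4 for the second, and so on.
    return list(zip(iterator, iterator))


print(pairs([1, 2, 3, 4]))   # [(1, 2), (3, 4)]
print(pairs(range(4)))       # [(0, 1), (2, 3)]

# A range is not an iterator, so zip gets two independent iterators from it.
numbers = range(4)
print(list(zip(numbers, numbers)))   # [(0, 0), (1, 1), (2, 2), (3, 3)]
```

With an odd number of items, this sketch silently drops the last one, because zip stops as soon as it cannot fill a whole tuple.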

So while it may seem like the difference between “lazy iterable” and “iterator” is subtle, these terms really do mean different things. While “lazy iterable” is a very general term without concrete meaning, the word “iterator” implies an object with a very specific set of behaviors.

When in doubt say “iterable” or “lazy iterable”

If you know you can loop over something, it’s an iterable.

If you know the thing you’re looping over happens to compute things as you loop over it, it’s a lazy iterable.

If you know you can pass something to the next function, it’s an iterator (iterators are the most common form of lazy iterable).

If you can loop over something multiple times without “exhausting” it, it’s not an iterator. If you can’t pass something to the next function, it’s not an iterator. Python 3’s range object is not an iterator. If you’re teaching people about range objects, please don’t use the word “iterator”. It’s confusing and might cause others to start misusing the word “iterator” as well.
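
These rules of thumb can be turned into a quick check. The sketch below relies on the property that calling iter on an iterator returns the same object; describe is a hypothetical helper, and it cannot tell whether an iterable is lazy:

```python
def describe(obj):
    """Classify obj as an iterator, a plain iterable, or neither."""
    try:
        iterator = iter(obj)
    except TypeError:
        return "not iterable"
    # By definition, calling iter() on an iterator returns that same object.
    if iterator is obj:
        return "iterator"
    return "iterable, but not an iterator"


print(describe(range(3)))          # iterable, but not an iterator
print(describe(iter(range(3))))    # iterator
print(describe(zip([1], [2])))     # iterator
print(describe(42))                # not iterable
```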

On the other hand, if you see someone else misusing the word “iterator”, don’t be mean. You may want to point out the misuse if it seems important, but keep in mind that I’ve heard long-time Python programmers and experienced Python trainers misuse this word by calling range objects iterators. Words are important, but language is tricky.

Thanks for joining me on this brief range and iterator-filled adventure!

This article was shared from the WeChat official account Python猫 (python_cat).

Originally published: 2019-01-06
