Understanding Python Through Its Built-in Objects (Part 8)

老齐

Published on 2021-12-08 08:44:38

This article is part of the column: 老齐教室

`bytearray` and `memoryview`: Byte Interfaces

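A bytearray is similar to bytes, except that it is mutable. This is where it proves useful:

Many low-level operations involve byte and bit manipulation, and a bytearray makes changing individual bytes more efficient. Take the following bit tricks, which only handle ASCII letters correctly (note what they do to the space character):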

>>> def upper(s):
...     return ''.join(chr(ord(c) & 223) for c in s)
...
>>> def toggle(s): return ''.join(chr(ord(c) ^ 32) for c in s)
...
>>> def lower(s): return ''.join(chr(ord(c) | 32) for c in s)
...
>>> upper("Lao Qi")
'LAO\x00QI'
>>> toggle("Lao Qi")
'lAO\x00qI'
>>> lower("Lao Qi")
'lao qi'
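
The three helpers above rely on the fact that, in ASCII, an uppercase letter and its lowercase counterpart differ only in the bit worth 32. The following is a minimal sketch that makes this visible; the helper name `show_bits` is made up for illustration and is not part of the original article.

```python
def show_bits(char):
    """Return the 8-bit binary form of an ASCII character's code point."""
    return format(ord(char), '08b')

# 'A' and 'a' differ only in the bit worth 32
print(show_bits('A'))  # 01000001
print(show_bits('a'))  # 01100001

# 223 is 0b11011111: AND-ing with it clears that bit (uppercase)
print(chr(ord('a') & 223))  # A
# OR-ing with 32 sets the bit (lowercase)
print(chr(ord('A') | 32))   # a
# XOR-ing with 32 flips the bit (toggle)
print(chr(ord('a') ^ 32))   # A

# A space is exactly 32, so clearing or flipping the bit yields NUL
print(repr(chr(ord(' ') & 223)))  # '\x00'
```

This also explains the `\x00` in the results of `upper` and `toggle`: the space character loses its only set bit.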

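A byte has a fixed size, but the number of bytes a character occupies depends on the encoding, so a string's length in characters and its length in bytes can differ. For example, encoding with UTF-8, the most common Unicode encoding: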

>>> x = 'I♥🐍'
>>> len(x)
3
>>> x.encode()
b'I\xe2\x99\xa5\xf0\x9f\x90\x8d'
>>> len(x.encode())
8
>>> x[2]
'🐍'
>>> x[2].encode()
b'\xf0\x9f\x90\x8d'
>>> len(x[2].encode())
4

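The string I♥🐍 referenced by x consists of three characters but takes up 8 bytes in total, and the snake emoji 🐍 alone is 4 bytes long. As the following shows, if you read the individual bytes of the emoji, each value always lies between 0 and 255: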

>>> x[2]
'🐍'
>>> b = x[2].encode()
>>> b
b'\xf0\x9f\x90\x8d'  # 4 bytes
>>> b[:1]
b'\xf0'
>>> b[1:2]
b'\x9f'
>>> b[2:3]
b'\x90'
>>> b[3:4]
b'\x8d'
>>> b[0]  # indexing a bytes object gives an integer
240
>>> b[3]
141
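
As a small illustrative sketch (the variable names `sizes` and `byte_values` are chosen here, not taken from the article), the per-character byte counts and the integer values of the bytes can be collected in one go:

```python
text = 'I♥🐍'

# Number of UTF-8 bytes each character occupies
sizes = {char: len(char.encode('utf-8')) for char in text}
print(sizes)  # {'I': 1, '♥': 3, '🐍': 4}

# Iterating over a bytes object yields integers in range(256)
byte_values = list('🐍'.encode('utf-8'))
print(byte_values)  # [240, 159, 144, 141]
```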

Now let's look at an example of bit manipulation on bytes:

def alternate_case(string):
    """Turns a string into alternating uppercase and lowercase characters."""
    array = bytearray(string.encode())
    for index, byte in enumerate(array):
        # Skip anything that is not an ASCII letter (A-Z is 65-90, a-z is 97-122)
        if not ((65 <= byte <= 90) or (97 <= byte <= 122)):
            continue

        if index % 2 == 0:
            array[index] = byte | 32
        else:
            array[index] = byte & ~32

    return array.decode()

>>> alternate_case('Hello WORLD?')
'hElLo wOrLd?'

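This is not a great example, so I won't spend time explaining it in detail, but it does work, and it is more efficient than creating a new bytes object for every character that changes.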
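
To make the difference concrete, here is a minimal sketch contrasting the two types: a bytes object rejects item assignment, while a bytearray accepts it. The variable names are illustrative only.

```python
data = b'hello'
try:
    data[0] = 72  # bytes objects do not support item assignment
except TypeError as exc:
    error = str(exc)
    print(error)

buffer = bytearray(b'hello')
buffer[0] = 72            # overwrite 'h' (104) with 'H' (72) in place
buffer.extend(b' world')  # a bytearray can also grow
print(buffer)         # bytearray(b'Hello world')
print(bytes(buffer))  # b'Hello world'
```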

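Another built-in, memoryview, is much like bytearray, but it can refer to an existing object or a slice of it instead of creating a new copy for itself. It lets you pass around a reference to a section of bytes in memory and edit it in place: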

>>> array = bytearray(range(256))
>>> array
bytearray(b'\x00\x01\x02\x03\x04\x05\x06\x07\x08...
>>> len(array)
256
>>> array_slice = array[65:91]  # Bytes 65 to 90 are uppercase English characters
>>> array_slice
bytearray(b'ABCDEFGHIJKLMNOPQRSTUVWXYZ')
>>> view = memoryview(array)[65:91]  # Does the same thing,
>>> view
<memory at 0x7f438cefe040>  # but doesn't generate a new bytearray by default
>>> bytearray(view)
bytearray(b'ABCDEFGHIJKLMNOPQRSTUVWXYZ')  # It can still be converted, though.
>>> view[0]  # 'A'
65
>>> view[0] += 32  # Turns it lowercase
>>> bytearray(view)
bytearray(b'aBCDEFGHIJKLMNOPQRSTUVWXYZ')  # 'A' is now lowercase.
>>> bytearray(view[10:15])
bytearray(b'KLMNO')
>>> view[10:15] = bytearray(view[10:15]).lower()
>>> bytearray(view)
bytearray(b'aBCDEFGHIJklmnoPQRSTUVWXYZ')  # Modified 'KLMNO' in-place.
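
The key point of the session above is that a slice of a bytearray is a copy, whereas a slice of a memoryview shares memory with the original. A short sketch under that reading (variable names are illustrative):

```python
array = bytearray(b'ABCDEF')

copied = array[0:3]            # slicing a bytearray copies the bytes
view = memoryview(array)[0:3]  # slicing a memoryview does not copy

copied[0] = ord('x')  # changes only the copy
view[1] = ord('y')    # writes through to `array`

print(copied)  # bytearray(b'xBC')
print(array)   # bytearray(b'AyCDEF')

view.release()  # release the underlying buffer when done with the view
```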

`bin`, `hex`, `oct`, `ord`, `chr` and `ascii`: Basic Conversions

The three built-in functions bin, hex and oct handle the most basic number base conversions:

>>> bin(42)
'0b101010'
>>> hex(42)
'0x2a'
>>> oct(42)
'0o52'
>>> 0b101010
42
>>> 0x2a
42
>>> 0o52
42

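This makes it easy to convert between decimal integers and their binary, octal and hexadecimal representations. Note that the prefixed literals are ordinary integers: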

>>> type(0x20)
<class 'int'>
>>> type(0b101010)
<class 'int'>
>>> 0o100 == 64
True

Decimal is the easiest to read, but sometimes another base is the better choice. For example:

>>> bytes([255, 254])
b'\xff\xfe'              # Not very easy to comprehend
>>> # This can be written as:
>>> bytes([0xff, 0xfe])
b'\xff\xfe'              # An exact one-to-one translation
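
Since each byte maps to exactly two hexadecimal digits, bytes objects also offer direct conversions to and from hex strings. A brief sketch using the standard `bytes.hex` and `bytes.fromhex` methods:

```python
raw = bytes([0xff, 0xfe])
print(raw.hex())               # fffe
print(bytes.fromhex('ff fe'))  # b'\xff\xfe' (spaces between bytes are allowed)
print(list(raw))               # [255, 254]
```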

In the next example, the mode argument used when opening a file is written in octal:

>>> import os
>>> os.open('file.txt', os.O_RDWR, mode=384)    # ??? what's 384
>>> # This can be written as:
>>> os.open('file.txt', os.O_RDWR, mode=0o600)  # mode is 600 -> read-write
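
Octal fits here because each octal digit covers exactly three permission bits (read, write, execute). As a sketch of that correspondence, the same value can be built from the named constants in the standard `stat` module:

```python
import stat

# Owner may read and write; group and others get nothing
mode = stat.S_IRUSR | stat.S_IWUSR
print(mode)                  # 384
print(oct(mode))             # 0o600
print(mode == 0o600 == 384)  # True

# stat.filemode renders the permission bits the way `ls -l` does
print(stat.filemode(stat.S_IFREG | mode))  # -rw-------
```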

Note that bin returns a string that carries the 0b prefix. If you want only the binary digits, it is better to use Python's string formatting:

>>> f'{42:b}'
'101010'
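
A few related variations, shown as a sketch: the format specification can add the prefix or zero padding, and `int` converts a binary string back to a number.

```python
n = 42
print(bin(n))              # 0b101010 (a str with the 0b prefix)
print(f'{n:b}')            # 101010
print(f'{n:#b}')           # 0b101010
print(f'{n:08b}')          # 00101010
print(int('101010', 2))    # 42
print(int('0b101010', 0))  # 42 (base 0 infers the base from the prefix)
```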

The built-in functions ord and chr convert between ASCII or Unicode characters and their code points:

>>> ord('x')
120
>>> chr(120)
'x'
>>> ord('🐍')
128013
>>> hex(ord('🐍'))
'0x1f40d'
>>> chr(0x1f40d)
'🐍'
>>> '\U0001f40d'  # The same value, as a unicode escape inside a string
'🐍'
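
The heading of this section also lists `ascii`, which the examples above do not cover. As a brief sketch: `ascii` behaves like `repr`, except that non-ASCII characters are replaced by escape sequences.

```python
snake = ascii('🐍')
print(snake)               # '\U0001f40d'
print(ascii('I♥Python'))   # 'I\u2665Python'
print(repr('I♥Python'))    # 'I♥Python'

# For pure-ASCII text, ascii and repr agree
print(ascii('x') == repr('x'))  # True
```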

`format`: Text Formatting

The built-in function format(value, spec) is another way of writing '{:spec}'.format(value). It can be used to convert a value into a formatted string, for example:

>>> format(42, 'c')             # int to ascii
'*'
>>> format(604, 'f')            # int to float
'604.000000'
>>> format(357/18, '.2f')       # specify decimal precision
'19.83'
>>> format(604, 'x')            # int to hex
'25c'
>>> format(604, 'b')            # int to binary
'1001011100'
>>> format(604, '0>16b')        # binary with zero-padding
'0000001001011100'
>>> format('Python!', '🐍^15')  # centered aligned text
'🐍🐍🐍🐍Python!🐍🐍🐍🐍'
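
Under the hood, `format(value, spec)` calls `value.__format__(spec)`, the same hook that f-strings and `str.format` use. The sketch below shows the equivalent spellings; the `Celsius` class is a hypothetical example written for this illustration.

```python
value = 604
print(format(value, 'x'))     # 25c
print(f'{value:x}')           # 25c
print('{:x}'.format(value))   # 25c
print(value.__format__('x'))  # 25c

# A class can take part by defining __format__ (hypothetical example class)
class Celsius:
    def __init__(self, degrees):
        self.degrees = degrees

    def __format__(self, spec):
        # Reuse float formatting, then append the unit
        return format(self.degrees, spec) + '°C'

print(format(Celsius(21.456), '.1f'))  # 21.5°C
```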

`any` and `all`

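These are two very Pythonic functions. Used well, they make code shorter and more readable, which captures the spirit of Python. For example:

Suppose you are writing an API that validates incoming requests. It takes the JSON data from the requests and checks that each item contains an id field, that the id is a string, and that it is exactly 20 characters long. A common way to write this is: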

def validate_responses(responses):
    for response in responses:
        # Make sure that `id` exists
        if 'id' not in response:
            return False
        # Make sure it is a string
        if not isinstance(response['id'], str):
            return False
        # Make sure it is 20 characters
        if len(response['id']) != 20:
            return False

    # If everything was True so far for every
    # response, then we can return True.
    return True

Rewritten with all, it becomes:

def validate_responses(responses):
    return all(
        'id' in response
        and isinstance(response['id'], str)
        and len(response['id']) == 20
        for response in responses
    )

all takes an iterable of values, typically booleans. If any element is falsy, all returns False; otherwise it returns True.
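
A short usage sketch of the all-based version, with made-up sample data. Note the last case: all returns True for an empty iterable.

```python
def validate_responses(responses):
    return all(
        'id' in response
        and isinstance(response['id'], str)
        and len(response['id']) == 20
        for response in responses
    )

good = {'id': 'a' * 20}  # sample data invented for this example
print(validate_responses([good, good]))               # True
print(validate_responses([good, {'id': 'short'}]))    # False
print(validate_responses([good, {'name': 'no id'}]))  # False
print(validate_responses([]))                         # True
```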

Here is another example, which checks for palindromes:

def contains_palindrome(words):
    for word in words:
        if word == ''.join(reversed(word)):
            return True

    # Found no palindromes in the end
    return False

Compare that with:

def contains_palindrome(words):
    return any(word == ''.join(reversed(word)) for word in words)
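
A quick usage sketch with invented word lists. This variant reverses each word with the slice `word[::-1]` rather than `''.join(reversed(word))`; for strings the two give the same result.

```python
def contains_palindrome(words):
    # word[::-1] is the reversed string
    return any(word == word[::-1] for word in words)

print(contains_palindrome(['python', 'level', 'bytes']))  # True
print(contains_palindrome(['python', 'bytes']))           # False
print(contains_palindrome([]))                            # False
```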

Side note: list comprehensions inside any and all

We could write code that uses any or all with a list comprehension:

>>> any([num == 0 for num in nums])

instead of a generator expression:

>>> any(num == 0 for num in nums)

There is a big difference between the two:

>>> any(num == 10 for num in range(100_000_000))
True
>>> any([num == 10 for num in range(100_000_000)])
True

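The second line, which uses a list comprehension, first stores 100 million values in a list for no good reason and only then runs any; on my machine it also takes more than 10 seconds. The first line, by contrast, is a generator expression: it produces the comparison results for 0 through 10 one at a time and hands them to any, and as soon as the count reaches 10, any stops iterating and returns True almost immediately. In this case that means it does roughly ten million times less work.

So, use a generator expression.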
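
The short-circuit behaviour can be observed without waiting ten seconds. The sketch below uses a smaller range and a hypothetical helper, `make_checker`, that records every number it actually tests:

```python
def make_checker(target, log):
    """Build a predicate that logs each number it is asked about."""
    def check(num):
        log.append(num)  # record every number that actually gets tested
        return num == target
    return check

gen_log = []
list_log = []
gen_check = make_checker(10, gen_log)
list_check = make_checker(10, list_log)

any(gen_check(num) for num in range(1000))     # stops at the first match
any([list_check(num) for num in range(1000)])  # builds the whole list first

print(len(gen_log))   # 11
print(len(list_log))  # 1000
```

With the generator expression only the numbers 0 through 10 are tested; with the list comprehension all 1000 are tested before any even starts.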

★ For more about generators, see 《Python 大学实用教程》 (电子工业出版社).

(End of side note)

[To be continued]

Originally published on 2021-12-02 on the 老齐教室 WeChat official account.
