Notes excerpted from "Dive Into Python 3"

0、In Python 2, the / operator usually meant integer division, but you could make it behave like floating point division by including a special directive in your code. In Python 3, the / operator always means floating point division.
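The difference is easy to check interactively. A minimal sketch with arbitrary example values (the variable names are illustrative only):

```python
# In Python 3, / is always floating point division; // is floor division.
true_quotient = 7 / 2
floor_quotient = 7 // 2
negative_floor = -7 // 2   # floor division rounds toward negative infinity

print(true_quotient, floor_quotient, negative_floor)
```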

1、A list can hold arbitrary objects and can expand dynamically as new items are added. A list is an ordered sequence of items.

2、A tuple is an immutable list. A tuple can not be changed in any way once it is created.

3、A set is an unordered “bag” of unique values. A single set can contain values of any immutable datatype.

4、A dictionary is a set of key-value pairs. Keys are unique and immutable. (The book describes dictionaries as unordered; since Python 3.7, a dict preserves insertion order.)

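5、import os, glob, humansize

     # list comprehension: (filename, stat result) pairs
     metadata_list = [(f, os.stat(f)) for f in glob.glob('*test*.py')]

     # dictionary comprehension: filename -> stat result
     metadata_dict = {f: os.stat(f) for f in glob.glob('*')}

     # dictionary comprehension with a filter; inside braces no backslash is needed to continue the line
     humansize_dict = {os.path.splitext(f)[0]: humansize.approximate_size(meta.st_size)
                       for f, meta in metadata_dict.items() if meta.st_size > 6000}

     # set comprehension
     a_set = {2**x for x in range(10)}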

6、Bytes are not characters; bytes are bytes. Characters are an abstraction. A string is a sequence of those abstractions. In Python 3, all strings are immutable sequences of Unicode characters. The built-in len() function returns the length of the string, i.e. the number of characters. A string is like a tuple of characters.

An immutable sequence of numbers-between-0-and-255 is called a bytes object.

Each item in a string is a string; each item in a bytes object is an integer.

aBuf = b'\xEF\xBB\xBF'

aBuf[-1]     # 191, indexing returns an integer

aBuf[-1:]    # b'\xbf', slicing returns a bytes object

7、To define a bytes object, use the b' ' “byte literal” syntax. Each byte within the byte literal can be an ASCII character or an encoded hexadecimal number from \x00 to \xff (0–255). To convert a bytes object into a mutable bytearray object, use the built-in bytearray() function.

8、bytes objects have a decode() method that takes a character encoding and returns a string, and strings have an encode() method that takes a character encoding and returns a bytes object.
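The round trip between strings and bytes can be sketched as follows. The sample text is an arbitrary assumption; any string that contains non-ASCII characters shows the same difference between character count and byte count.

```python
text = '深入 Python'                 # a str: a sequence of Unicode characters
data = text.encode('utf-8')          # a bytes object
round_trip = data.decode('utf-8')    # back to a str

# len() counts characters for str and bytes for bytes.
print(len(text), len(data))

# Items of a str are str; items of a bytes object are integers.
print(type(text[0]), type(data[0]))

# bytearray() makes a mutable copy of the bytes.
mutable = bytearray(data)
mutable[-1] = ord('N')
changed = bytes(mutable).decode('utf-8')
```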

9、'1MB = 1000{0.modules[humansize].SUFFIXES[1000][0]}'.format(sys)  #compound field names

10、“The rules for parsing an item key are very simple. If it starts with a digit, then it is treated as a number, otherwise it is used as a string.”

11、Within a replacement field, a colon (:) marks the start of the format specifier.
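Items 9 to 11 can be tried without the book's humansize module. In this sketch the suffixes dictionary is a hypothetical stand-in for humansize.SUFFIXES, and the numbers are arbitrary.

```python
# Hypothetical stand-in for humansize.SUFFIXES (not the real module).
suffixes = {1000: ['KB', 'MB', 'GB'], 1024: ['KiB', 'MiB', 'GiB']}

# Compound field name: the item key 1000 starts with a digit, so it is
# treated as a number; the following [0] indexes into the list.
line = '1MB = 1000{0[1000][0]}'.format(suffixes)

# Format specifier after the colon: one decimal place for a float.
size = '{0:.1f} {1}'.format(698.24, 'GB')

# Right-justify the argument within 4 characters.
padded = '{:>4}'.format(7)

print(line, size, repr(padded), sep='\n')
```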

12、compact regular expressions

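import re

s = '100 BROAD ROAD APT. 3'

re.sub(r'\bROAD\b', 'RD.', s)    # '100 BROAD RD. APT. 3'

# To search bytes, the pattern, the replacement and the subject must all be bytes:
# re.sub(rb'\bROAD\b', b'RD.', s.encode('ascii'))

pattern = '^M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$'

re.search(pattern, 'MDLV')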

phonePattern = re.compile(r'(\d{3})\D*(\d{3})\D*(\d{4})\D*(\d*)$')

# Putting it all in parentheses means “match exactly three numeric digits, and then remember them as a group that I can ask for later”.

>>> phonePattern.search('(800)5551212 ext. 1234').groups()
('800', '555', '1212', '1234')
>>> phonePattern.search('800-555-1212').groups()
('800', '555', '1212', '')
>>> phonePattern.search('work 1-(800) 555.1212 #1234').groups()
('800', '555', '1212', '1234')

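if phonePattern.match('800-555-1212'):     # match() only matches at the beginning of the string
    pass

if phonePattern.search('800-555-1212'):    # search() looks for a match anywhere in the string
    pass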


re.sub('([^aeiou])y$', r'\1ies', 'vacancy')    # 'vacancies'

# \1 means “hey, that first group you remembered? Put it right here.”

re.findall('[0-9]+', '16 2-by-4s in rows of 8')    # ['16', '2', '4', '8']

re.findall('[A-Z]+', 'SEND + MORE == MONEY')    # ['SEND', 'MORE', 'MONEY']

re.findall(' s.*? s', "The sixth sick sheikh's sixth sheep's sick.")

# [' sixth s', " sheikh's s", " sheep's s"]    findall() doesn't return overlapping matches

# .*? means the shortest possible series of any characters

13、verbose regular expressions

Python allows you to do this with something called verbose regular expressions. A verbose regular expression is different from a compact regular expression in two ways:

• Whitespace is ignored. Spaces, tabs, and carriage returns are not matched as spaces, tabs, and carriage returns. They’re not matched at all. (If you want to match a space in a verbose regular expression, you’ll need to escape it by putting a backslash in front of it.)

• Comments are ignored. A comment in a verbose regular expression is just like a comment in Python code: it starts with a # character and goes until the end of the line. In this case it’s a comment within a multi-line string instead of within your source code, but it works the same way.

pattern = '''
    ^                   # beginning of string
    M{0,3}              # thousands - 0 to 3 Ms
    (CM|CD|D?C{0,3})    # hundreds - 900 (CM), 400 (CD), 0-300 (0 to 3 Cs),
                        #            or 500-800 (D, followed by 0 to 3 Cs)
    (XC|XL|L?X{0,3})    # tens - 90 (XC), 40 (XL), 0-30 (0 to 3 Xs),
                        #        or 50-80 (L, followed by 0 to 3 Xs)
    (IX|IV|V?I{0,3})    # ones - 9 (IX), 4 (IV), 0-3 (0 to 3 Is),
                        #        or 5-8 (V, followed by 0 to 3 Is)
    $                   # end of string
    '''

re.search(pattern, 'M', re.VERBOSE)

phonePattern = re.compile(r'''
                # don't match beginning of string, number can start anywhere
    (\d{3})     # area code is 3 digits (e.g. '800')
    \D*         # optional separator is any number of non-digits
    (\d{3})     # trunk is 3 digits (e.g. '555')
    \D*         # optional separator
    (\d{4})     # rest of number is 4 digits (e.g. '1212')
    \D*         # optional separator
    (\d*)       # extension is optional and can be any number of digits
    $           # end of string
    ''', re.VERBOSE)

phonePattern.search('work 1-(800) 555.1212 #1234').groups()

14、regular expressions

• ^ matches the beginning of a string.
• $ matches the end of a string.
• \b matches a word boundary.
• \d matches any numeric digit.
• \D matches any non-numeric character.
• x? matches an optional x character (in other words, it matches an x zero or one times).
• x* matches x zero or more times.
• x+ matches x one or more times.
• x{n,m} matches an x character at least n times, but not more than m times.
• (a|b|c) matches exactly one of a, b or c.
• (x) in general is a remembered group. You can get the value of what matched by using the groups() method of the object returned by re.search.

15、This technique of using the values of outside parameters within a dynamic function is called closures. 
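A minimal sketch of a closure. The make_multiplier function is a hypothetical example, not the one used in the book: the inner function keeps using the value of the outer parameter after the outer function has returned.

```python
def make_multiplier(factor):
    """Build and return a function that remembers `factor`."""
    def multiply(value):
        # `factor` comes from the enclosing call, not from this function.
        return value * factor
    return multiply

double = make_multiplier(2)
triple = make_multiplier(3)

print(double(5), triple(5))
```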

16、The with statement creates what’s called a context: when the with block ends, Python will automatically close the file, even if an exception is raised inside the with block.

There’s nothing file-specific about the with statement; it’s just a generic framework for creating runtime contexts and telling objects that they’re entering and exiting a runtime context. If the object in question is a stream object, then it does useful file-like things (like closing the file automatically). But that behavior is defined in the stream object, not in the with statement. 

17、The first argument to the split() method is None, which means “split on any whitespace (tabs or spaces, it makes no difference).” The second argument is 3, which means “split on whitespace 3 times, then leave the rest of the line alone.” 
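The effect of both arguments can be seen in a short sketch. The sample lines are assumptions made up for illustration.

```python
# A rule line with three fields separated by a mix of tabs and spaces.
rule_line = '[sxz]$\t$   es\n'
pattern, search, replace = rule_line.split(None, 3)

# With a maximum of 3 splits there are at most 4 pieces;
# the rest of the line is left alone.
pieces = 'a  b\tc d e f'.split(None, 3)

print(pattern, search, replace, pieces)
```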

18、The presence of the yield x keyword in function body means that this is not a normal function. It is a special kind of function which generates values one at a time. You can think of it as a resumable function. Calling it will return a generator that can be used to generate successive values of x. The next() function takes a generator object and returns its next value.

def fib(max):
    n, a, b = 0, 0, 1
    while n < max:
        yield b
        a, b = b, a + b
        n = n + 1

"yield" pauses a function, and "next()" resumes it where it left off.

Generators are just a simple form of iterators. A function that yields values is a nice, compact way of building an iterator without building an iterator. File objects are iterators too! It’s iterators all the way down.

This is a useful idiom: pass a generator to the list() function, and it will iterate through the entire generator (just like the for loop) and return a list of all the values.

The for loop will automatically call the next() function to get values from the generator and assign them to the for loop index variable.

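There is actually a simpler way to create a generator: a generator expression.

g = (x * x for x in range(10))    # parentheses; not to be confused with the list comprehension g = [ ... ]

g.__next__()    # in Python 2 it was g.next(); you can also call next(g), or loop with: for n in g: print(n)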
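The three ways of consuming a generator mentioned above (next(), list(), and iteration) can be sketched together. The function mirrors the fib generator from these notes, with the parameter renamed to limit so that it does not shadow the built-in max.

```python
def fib(limit):
    """Yield the first `limit` Fibonacci numbers."""
    count, a, b = 0, 0, 1
    while count < limit:
        yield b              # pause here and hand b to the caller
        a, b = b, a + b
        count += 1

gen = fib(5)
first = next(gen)            # runs up to the first yield
second = next(gen)           # resumes after the yield
rest = list(gen)             # drains whatever is left

# A generator expression is consumed the same way.
squares = (x * x for x in range(5))
total = sum(squares)

print(first, second, rest, total)
```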

19、'pass' is a Python reserved word that just means “move along, nothing to see here”.

20、The first argument of every class method, including the __init__() method, is always a reference to the current instance of the class. By convention, this argument is named self.

21、An iterator is just a class that defines an __iter__() method (together with a __next__() method that produces the values).

class Fib:
    '''iterator that yields numbers in the Fibonacci sequence'''
    def __init__(self, maxn):
        self.maxn = maxn

    def __iter__(self):
        self.a = 0
        self.b = 1
        return self

    def __next__(self):
        fib = self.a
        if fib > self.maxn:
            raise StopIteration
        self.a, self.b = self.b, self.a + self.b
        return fib

for n in Fib(1000):
    print(n, end=' ')

After performing beginning-of-iteration initialization, the __iter__() method can return any object that implements a __next__() method. The __next__() method is called whenever someone calls next() on an iterator of an instance of a class.

iter(object) calls object.__iter__() and returns an iterator object;

next(iterator_object) calls iterator_object.__next__() and returns a value.

for n in Fib(1000):
     print(n, end=' ')

A for loop over Fib(1000) first creates the object, which calls __init__() (this step is skipped if the object already exists), then calls __iter__() once, and then calls __next__() repeatedly until it raises a StopIteration exception.

When the __next__() method raises a StopIteration exception, this signals to the caller that the iteration is exhausted. Unlike most exceptions, this is not an error; it’s a normal condition that just means that the iterator has no more values to generate. If the caller is a for loop, it will notice this StopIteration exception and gracefully exit the loop. 

22、When a variable is not defined within any method, it is defined at the class level. It’s a class variable, and although you can access it just like an instance variable (self.rules_filename), it is shared across all instances of the same class.
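A small sketch of the sharing behaviour. The class name and file names are hypothetical placeholders. Note the last step: assigning through an instance creates an instance attribute that hides the class variable for that instance only.

```python
class LazyRules:
    rules_filename = 'rules.txt'    # class variable, defined outside any method

first = LazyRules()
second = LazyRules()

# Changing the class attribute is visible through every instance.
LazyRules.rules_filename = 'other-rules.txt'
seen_by_first = first.rules_filename

# Assigning through an instance shadows the class variable for that instance.
second.rules_filename = 'override.txt'

print(seen_by_first, second.rules_filename, LazyRules.rules_filename)
```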

23、A generator expression is like an anonymous function that yields values. The expression itself looks like a list comprehension, but it’s wrapped in parentheses instead of square brackets.

unique_characters = {'E', 'D', 'M', 'O', 'N', 'S', 'R', 'Y'}

gen = (ord(c) for c in unique_characters)

If you like, you can iterate through all the possible values and return a tuple, list, or set, by passing the generator expression to tuple(), list(), or set(). In these cases, you don’t need an extra set of parentheses — just pass the “bare” expression ord(c) for c in unique_characters to the tuple() function, and Python figures out that it’s a generator expression.

tuple(ord(c) for c in unique_characters)    # (69, 68, 77, 79, 78, 83, 82, 89)

24、The itertools.permutations() function doesn’t have to take a list. It can take any sequence, even a string. The permutations() function takes a sequence and a number, which is the number of items you want in each smaller group. The function returns an iterator.

The itertools.combinations() function returns an iterator containing all the possible combinations of the given sequence of the given length.

The itertools.groupby() function takes a sequence and a key function, and returns an iterator that generates pairs. Each pair contains the result of key_function(each item) and another iterator containing all the items that shared that key result.

The itertools.groupby() function only works if the input sequence is already sorted by the grouping function. 

The itertools.chain() function takes two iterators and returns an iterator that contains all the items from the first iterator, followed by all the items from the second iterator. (Actually, it can take any number of iterators, and it chains them all in the order they were passed to the function.)
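The four itertools functions described in this item can be sketched on small inputs. The sample names are arbitrary; the list is sorted by length first because groupby() only groups adjacent items.

```python
import itertools

perms = list(itertools.permutations('ABC', 2))     # ordered pairs
combos = list(itertools.combinations('ABC', 2))    # unordered pairs

names = ['Alex', 'Anne', 'Dora', 'John', 'Mike', 'Ethan', 'Sarah', 'Chris']
names = sorted(names, key=len)                     # groupby() needs sorted input
groups = {length: list(items)
          for length, items in itertools.groupby(names, len)}

chained = list(itertools.chain(range(0, 3), range(10, 13)))

print(perms, combos, groups, chained, sep='\n')
```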

25、Use the rstrip() string method to strip trailing whitespace from each line. (Strings also have an lstrip() method to strip leading whitespace, and a strip() method which strips both.)

characters = ('S', 'M', 'E', 'D', 'O', 'N', 'R', 'Y')

guess = ('1', '2', '0', '3', '4', '5', '6', '7')

tuple(zip(characters, guess))

(('S', '1'), ('M', '2'), ('E', '0'), ('D', '3'), ('O', '4'), ('N', '5'), ('R', '6'), ('Y', '7'))

dict(zip(characters, guess))

{'E': '0', 'D': '3', 'M': '2', 'O': '4', 'N': '5', 'S': '1', 'R': '6', 'Y': '7'}

# translation_table maps the code point of each letter to its digit

'SEND + MORE == MONEY'.translate(translation_table)    # '1053 + 2460 == 24507'
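The notes do not show how translation_table is built. One way to get a table that str.translate() accepts, assuming the characters and guess tuples above, is str.maketrans(), which converts single-character keys to code points. This is a sketch, not necessarily how the book constructs it.

```python
characters = ('S', 'M', 'E', 'D', 'O', 'N', 'R', 'Y')
guess = ('1', '2', '0', '3', '4', '5', '6', '7')

# str.translate() looks characters up by code point, so the
# single-character keys are converted with str.maketrans().
translation_table = str.maketrans(dict(zip(characters, guess)))

equation = 'SEND + MORE == MONEY'.translate(translation_table)
print(equation)
```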

The second and third parameters passed to the eval() function act as the global and local namespaces for evaluating the expression. 

The subprocess module allows you to run arbitrary shell commands and get the result as a Python string.

eval("__import__('subprocess').getoutput('rm -rf /')", {"__builtins__": None}, {})

# This raises an error: __import__() is itself a built-in function, so it is unavailable when __builtins__ is None.
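The namespace arguments can be tried safely with a harmless expression. In this sketch the expressions are arbitrary examples, and the exact exception type raised for the blocked import differs between Python versions, so the code catches Exception. Restricting the built-ins this way is not a robust sandbox.

```python
# The second and third arguments are the global and local namespaces.
result = eval('x * 5', {}, {'x': 5})

# With __builtins__ set to None, built-in functions such as
# __import__() cannot be reached from the evaluated expression.
try:
    eval("__import__('math').sqrt(25)", {'__builtins__': None}, {})
    blocked = False
except Exception:
    blocked = True

print(result, blocked)
```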

27、Running the script runs unittest.main(), which runs each test case. Each test case is a method within a class in xxxtest.py. There is no required organization of these test classes; they can each contain a single test method, or you can have one class that contains multiple test methods. The only requirement is that each test class must inherit from unittest.TestCase.

The unittest.TestCase class provides the assertRaises method, which takes the following arguments: the exception you’re expecting, the function you’re testing, and the arguments you’re passing to that function. (If the function you’re testing takes more than one argument, pass them all to assertRaises, in order, and it will pass them right along to the function you’re testing.)
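A minimal sketch of such a test class. The function under test, to_positive_int, is a hypothetical example rather than the book's Roman numeral converter. A script would normally end with unittest.main(); here the suite is run programmatically so that the result object can be inspected.

```python
import unittest

def to_positive_int(text, maximum):
    """Hypothetical function under test: parse text as an int in 1..maximum."""
    value = int(text)
    if not 0 < value <= maximum:
        raise ValueError('number out of range')
    return value

class ToPositiveIntTest(unittest.TestCase):
    def test_valid_input(self):
        self.assertEqual(to_positive_int('42', 100), 42)

    def test_out_of_range(self):
        # exception, function, then the arguments to pass to the function
        self.assertRaises(ValueError, to_positive_int, '4000', 3999)

suite = unittest.defaultTestLoader.loadTestsFromTestCase(ToPositiveIntTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```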

28、It is important to understand that modules are only imported once, then cached. If you import an already-imported module, it does nothing. 

29、A file on disk is a sequence of bytes. The default encoding used to read it as text is platform dependent.

30、Python has a built-in function called open(). The open() function returns a stream object, which has methods and attributes for getting information about and manipulating a stream of characters.

31、Once you open a file (with the correct encoding), reading from it is just a matter of calling the stream object’s read() method. The result is a string. The read() method can take an optional parameter, the number of characters to read.

32、The seek() and tell() methods always count bytes, but since you opened this file as text, the read() method counts characters. Chinese characters require multiple bytes to encode in UTF-8.

33、The stream object file still exists; calling its close() method doesn’t destroy the object itself. But it’s not terribly useful. Closed stream objects do have one useful attribute: the closed attribute will confirm that the file is closed.

34、Read a file one line at a time

line_number = 0
with open('examples/favorite-people.txt', encoding='utf-8') as a_file: 
    for a_line in a_file: 
        line_number += 1
        print('{:>4} {}'.format(line_number, a_line.rstrip()))

The stream object is also an iterator which spits out a single line every time you ask for a value.

The format specifier {:>4} means “print this argument right-justified within 4 spaces.” 

The rstrip() string method removes the trailing whitespace, including the carriage return characters.

35、Reading a “string” from a text file only works because you told Python what encoding to use to read a stream of bytes and convert it to a string. 

36、Opening a file in binary mode is simple but subtle. The only difference from opening it in text mode is that the mode parameter contains a 'b' character. A binary stream object has no encoding attribute.

Reading a file in “binary” mode? You’ll get a stream of bytes. Fetching a web page? Calling a web API? They return a stream of bytes, too.

37、Since you opened the file in binary mode, the read() method takes the number of bytes to read, not the number of characters.

38、As long as your functions take a stream object and simply call the object’s read() method, you can handle any input source that acts like a file, without specific code to handle each kind of input.

39、io.StringIO lets you treat a string as a text file. There’s also an io.BytesIO class, which lets you treat a byte array as a binary file.
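Items 38 and 39 fit together: a function that only uses the stream interface works the same on an in-memory stream as on a real file. A minimal sketch, in which count_lines is a hypothetical helper and the sample data is arbitrary:

```python
import io

# A string treated as a text file: read() counts characters.
text_stream = io.StringIO('PapayaWhip is the new black.')
first_word = text_stream.read(10)
rest = text_stream.read()

# Bytes treated as a binary file: read() counts bytes.
byte_stream = io.BytesIO(b'\xde\xd5\xb4\xf8')
first_two = byte_stream.read(2)

def count_lines(stream):
    """Hypothetical helper: works with any iterable, file-like object."""
    return sum(1 for _ in stream)

line_count = count_lines(io.StringIO('a\nb\nc\n'))
print(first_word, rest, first_two, line_count)
```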

40、The gzip module lets you create a stream object for reading or writing a gzip-compressed file.

You should always open gzipped files in binary mode.

import gzip

with gzip.open('out.log.gz', mode='wb') as z_file:
    z_file.write('A nine mile walk is no joke, especially in the rain.'.encode('utf-8'))

41、The print function adds a newline to the end of the string you’re printing, and calls sys.stdout.write.

42、Any class can be a context manager by defining two special methods: __enter__() and __exit__().
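A minimal sketch of a class-based context manager. The Recorder class is a hypothetical example; it only records the order of events, to show that __exit__() runs even when the block raises.

```python
class Recorder:
    """Hypothetical context manager that records when it is entered and exited."""
    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append('enter')
        return self                  # bound to the name after "as"

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.events.append('exit')
        return False                 # do not suppress the exception

recorder = Recorder()
try:
    with recorder as r:
        r.events.append('body')
        raise ValueError('boom')
except ValueError:
    recorder.events.append('handled')

print(recorder.events)
```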

43、The ElementTree library is part of the Python standard library, in xml.etree.ElementTree

ElementTree represents XML elements as {namespace}localname.

In the ElementTree API, an element acts like a list. The items of the list are the element’s children.

XML isn’t just a collection of elements; each element can also have its own set of attributes (.attrib). Once you have a reference to a specific element, you can easily get its attributes as a Python dictionary.

In a boolean context, ElementTree element objects will evaluate to False if they contain no children (i.e. if len(element) is 0). This means that “if element.find('...')” is not testing whether the find() method found a matching element; it’s testing whether that matching element has any child elements! To test whether the find() method returned an element, use “if element.find('...') is not None”.
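These points can be sketched on a small in-memory document. The XML below is a made-up sample in the Atom namespace, not the feed used in the book.

```python
import xml.etree.ElementTree as etree

xml_text = '''<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
  <title>dive into mark</title>
  <entry><title>first</title></entry>
  <entry><title>second</title></entry>
</feed>'''

root = etree.fromstring(xml_text)
ns = '{http://www.w3.org/2005/Atom}'

tag = root.tag                       # elements are named {namespace}localname
child_count = len(root)              # an element acts like a list of its children
lang = root.attrib['{http://www.w3.org/XML/1998/namespace}lang']

title = root.find(ns + 'title')      # found, but it has no child elements
missing = root.find(ns + 'subtitle') # not found: find() returns None

# Compare with None instead of relying on the element's truth value.
title_text = title.text if title is not None else None
print(tag, child_count, lang, title_text, missing)
```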

44、The time module contains a data structure (struct_time) to represent a point in time (with one-second resolution) and functions to manipulate time structs. The strptime() function takes a formatted string and converts it to a struct_time.
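A short sketch of parsing and formatting with the time module. The timestamp is the one quoted later in these notes; the sketch assumes English day and month names, which is what strptime() expects in the default C locale.

```python
import time

# Without a format argument, strptime() expects the layout
# '%a %b %d %H:%M:%S %Y', the same one asctime() produces.
parsed = time.strptime('Fri Mar 27 22:20:42 2009')
formatted = time.asctime(parsed)

# An explicit format string handles other layouts.
date_only = time.strptime('2009-03-27', '%Y-%m-%d')

print(parsed.tm_year, parsed.tm_hour, formatted, date_only.tm_mday)
```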

45、The dump() function in the pickle module takes a serializable Python data structure, serializes it into a binary, Python-specific format using the latest version of the pickle protocol, and saves it to an open file.

The pickle.load() function takes a stream object, reads the serialized data from the stream, creates a new Python object, recreates the serialized data in the new Python object, and returns the new Python object.

The pickle.dumps() function (note the 's' at the end of the function name) performs the same serialization as the pickle.dump() function. Instead of taking a stream object and writing the serialized data to a file on disk, it simply returns the serialized data.

The pickle.loads() function (again, note the 's' at the end of the function name) performs the same deserialization as the pickle.load() function. Instead of taking a stream object and reading the serialized data from a file, it takes a bytes object containing serialized data, such as the one returned by the pickle.dumps() function.
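The four functions can be sketched without touching the disk by using an in-memory binary stream in place of a file. The entry dictionary is made-up sample data.

```python
import io
import pickle

entry = {
    'title': 'Dive into history, 2009 edition',
    'tags': ('diveintopython', 'docbook', 'html'),
    'published': True,
}

# dumps()/loads() work with a bytes object.
blob = pickle.dumps(entry)
restored = pickle.loads(blob)

# dump()/load() work with a binary stream object.
buffer = io.BytesIO()
pickle.dump(entry, buffer)
buffer.seek(0)
from_stream = pickle.load(buffer)

print(type(blob), restored == entry, from_stream == entry)
```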

46、Like the pickle module, the json module defines a dump() function which takes a Python data structure and a writeable stream object. The dump() function serializes the Python data structure and writes it to the stream object. Doing this inside a with statement will ensure that the file is closed properly when we’re done.

JSON is a text-based format. Always open JSON files in text mode with a UTF-8 character encoding.

JSON doesn’t distinguish between tuples and lists; it only has a single list-like datatype, the array, and the json module silently converts both tuples and lists into JSON arrays during serialization. For most uses, you can ignore the difference between tuples and lists, but it’s something to keep in mind as you work with the json module.
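The tuple-to-array conversion is easy to observe with json.dumps() and json.loads(), the string-based counterparts of dump() and load(). The entry dictionary is made-up sample data.

```python
import json

entry = {'tags': ('diveintopython', 'docbook'), 'published': True, 'comments': None}

text = json.dumps(entry)       # the tuple is written as a JSON array
restored = json.loads(text)    # a JSON array always comes back as a list

print(text)
print(restored)
```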

47、The time.asctime() function will convert that nasty-looking time.struct_time into the string 'Fri Mar 27 22:20:42 2009'.

We can use the list() function to convert the bytes object into a list of integers. So b'\xDE\xD5\xB4\xF8' becomes [222, 213, 180, 248]. 

import json
import customserializer

with open('entry.json', 'w', encoding='utf-8') as f:    # shell 1
    json.dump(entry, f, default=customserializer.to_json)

with open('entry.json', 'r', encoding='utf-8') as f:    # shell 2
    entry = json.load(f, object_hook=customserializer.from_json)
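The customserializer module itself is not shown in these notes. A minimal sketch of what such a pair of functions could look like, handling only bytes objects; the '__class__' and '__value__' key names are an assumption of this sketch.

```python
import json

def to_json(python_object):
    """Called by json for objects it cannot serialize natively."""
    if isinstance(python_object, bytes):
        return {'__class__': 'bytes', '__value__': list(python_object)}
    raise TypeError(repr(python_object) + ' is not JSON serializable')

def from_json(json_object):
    """Called by json for every decoded JSON object (dict)."""
    if json_object.get('__class__') == 'bytes':
        return bytes(json_object['__value__'])
    return json_object

entry = {'title': 'example', 'internal_id': b'\xDE\xD5\xB4\xF8'}
text = json.dumps(entry, default=to_json)
restored = json.loads(text, object_hook=from_json)

print(text)
```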

48、The urllib.request.urlopen().read() method always returns a bytes object, not a string. Remember, bytes are bytes; characters are an abstraction. HTTP servers don’t deal in abstractions. If you request a resource, you get bytes. If you want it as a string, you’ll need to determine the character encoding and explicitly convert it to a string.

The response returned from the urllib.request.urlopen() function contains all the HTTP headers the server sent back. Download the actual data by calling response.read().

49、The primary interface to httplib2 is the Http object. Once you have an Http object, retrieving data is as simple as calling the request() method with the address of the data you want. This will issue an HTTP GET request for that URL.

The request() method returns two values. The first is an httplib2.Response object, which contains all the HTTP headers the server returned. For example, a status code of 200 indicates that the request was successful.

The content variable contains the actual data that was returned by the HTTP server. The data is returned as a bytes object, not a string. If you want it as a string, you’ll need to determine the character encoding and convert it yourself.

You should always create an httplib2.Http object with a directory name. Caching is the reason.

httplib2 allows you to add arbitrary HTTP headers to any outgoing request. In order to bypass all caches (not just your local disk cache, but also any caching proxies between you and the remote server), add a no-cache header in the headers dictionary.

HTTP defines Last-Modified and ETag headers for this purpose. These headers are called validators. If the local cache is no longer fresh (expired), a client can send the validators with the next request to see if the data has actually changed. If the data hasn’t changed, the server sends back a 304 status code and no data.

which caused httplib2 to look in its cache.

httplib2 sends the ETag validator back to the server in the If-None-Match header.  httplib2 also sends the Last-Modified validator back to the server in the If-Modified-Since header.

50、Python comes with a utility function to URL-encode a dictionary: urllib.parse.urlencode()
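A quick sketch of urlencode() and its inverse parse_qs(). The field names and values are arbitrary examples.

```python
from urllib.parse import urlencode, parse_qs

data = {'status': 'Test update from Python 3', 'lang': 'en'}

# Spaces become '+', pairs are joined with '&'.
encoded = urlencode(data)

# parse_qs() reverses the encoding; every value comes back as a list.
decoded = parse_qs(encoded)

print(encoded)
print(decoded)
```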

Store your username and password with the add_credentials() method. 

51、If Python sees an __init__.py file in a directory, it assumes that all of the files in that directory are part of the same module. The module’s name is the name of the directory. Files within the directory can reference other files within the same directory, or even within subdirectories.  But the entire collection of files is presented to other Python code as a single module — as if all the functions and classes were in a single .py file.

The __init__.py file doesn’t need to define anything; it can literally be an empty file. Or you can use it to define your main entry point functions. Or you put all your functions in it. Or all but one.

A directory with an __init__.py file is always treated as a multi-file module. Without an __init__.py file, a directory is just a directory of unrelated .py files.

52、Within lists, tuples, sets, and dictionaries, whitespace can appear before and after commas with no ill effects.

53、In Python 2, the global file() function was an alias for the open() function, which was the standard way of opening text files for reading. In Python 3, the global file() function no longer exists, but the open() function still exists.

54、In Python 2, a string was an array of bytes whose character encoding was tracked separately. If you wanted Python 2 to keep track of the character encoding, you had to use a Unicode string (u'') instead. But in Python 3, a string is always what Python 2 called a Unicode string — that is, an array of Unicode characters (of possibly varying byte lengths). 

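55. Similar to C++ operator(), i.e. function objects: suppose login is a Form object. Calling login() invokes its __call__() method, which makes a deep copy of the object and returns it. That is why you see usage such as login_form = login().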
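A minimal sketch of the pattern described above. The Form class here is a hypothetical stand-in written for illustration; it is not the class of any particular form library.

```python
import copy

class Form:
    """Hypothetical form object that can be called like a function."""
    def __init__(self, fields):
        self.fields = fields

    def __call__(self):
        # Calling the instance returns an independent deep copy of it.
        return copy.deepcopy(self)

login = Form({'username': '', 'password': ''})
login_form = login()                     # invokes Form.__call__(login)
login_form.fields['username'] = 'alice'  # does not affect the original

print(login.fields, login_form.fields)
```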

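56. The relationship between yield and coroutines

The traditional producer-consumer model uses one thread to write messages and another thread to read them, with locks controlling the queue and the waiting; a small mistake can lead to a deadlock. With coroutines, after the producer produces a message it jumps directly to the consumer through yield; when the consumer has finished, control switches back to the producer, which continues producing. This is very efficient.

From the documentation of generator.send(value): Resumes the execution and “sends” a value into the generator function. The value argument becomes the result of the current yield expression. The send() method returns the next value yielded by the generator, or raises StopIteration if the generator exits without yielding another value. When send() is called to start the generator, it must be called with None as the argument, because there is no yield expression that could receive the value.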

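def consumer():
    r = ''
    while True:
        n = yield r          # hand r back to the producer, then wait for the next message
        if not n:
            return           # an empty message ends the coroutine
        print('[CONSUMER] Consuming %s...' % n)
        r = '200 OK'

def produce(c):
    next(c)                  # start the coroutine with next(c) or c.send(None)
    n = 0
    while n < 5:
        n = n + 1
        print('[PRODUCER] Producing %s...' % n)
        r = c.send(n)        # send a message to the coroutine and resume it
        print('[PRODUCER] Consumer return: %s' % r)
    c.close()                # close the coroutine; c.throw() would raise an exception inside it instead

if __name__ == '__main__':
    c = consumer()           # calling the function returns a generator (coroutine) object
    produce(c)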

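57. The relationship between yield and contextmanager

A generator function decorated with @contextmanager can be used in a with statement; the value it yields is bound to the variable after "as":

@contextmanager
def some_generator(<arguments>):
        yield <value>

with some_generator(<arguments>) as <variable>:
        ...    # inside the block, <variable> is bound to <value>

>>> import pymongo
>>> class Operation(object):
...     def __init__(self, database,
...                  host='localhost', port=27017):
...         self._db = pymongo.MongoClient(
...                       host, port)[database]
...     def __enter__(self):
...         return self._db
...     def __exit__(self, exc_type, exc_val, exc_tb):
...         # PyMongo API as written in the original notes; newer PyMongo
...         # versions close the connection with MongoClient.close()
...         self._db.connection.disconnect()
...
>>> with Operation(database='test') as db:
...     print(db.test.find_one())
...
>>> from contextlib import contextmanager
>>> @contextmanager
... def operation(database, host='localhost',
...               port=27017):
...     db = pymongo.MongoClient(host, port)[database]
...     yield db
...     db.connection.disconnect()
...
>>> with operation('test') as db:
...     print(db.test.find_one())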
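The same setup, yield, cleanup shape can be tried without a database. In this sketch managed_resource and the dictionary it yields are hypothetical stand-ins for a real connection; the try/finally is an addition so that the cleanup also runs when the with block raises.

```python
from contextlib import contextmanager

events = []

@contextmanager
def managed_resource(name):
    # Everything before the yield plays the role of __enter__().
    events.append('open ' + name)
    resource = {'name': name, 'open': True}
    try:
        yield resource               # the value bound by "as"
    finally:
        # Everything after the yield plays the role of __exit__().
        resource['open'] = False
        events.append('close ' + name)

with managed_resource('db') as res:
    events.append('use ' + res['name'])

print(events)
```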

Excerpted from "Dive Into Python 3".



