The Ultimate Guide to Python Debugging

Deep Learning and Python

Published 2020-07-27 10:51:30



This article is included in the column: Deep Learning and Python

Author | Martin Heinz

Translator | Wang Kunxiang (王坤祥)

Editor | Cai Fangfang (蔡芳芳)

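This article covers some advanced Python debugging techniques. If you are still debugging like a beginner by scattering print calls everywhere, it is time to learn how experienced developers debug Python code more gracefully.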

This article was originally published on martinheinz.dev and was translated and shared by InfoQ China with the original author's permission.

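Even as an experienced developer, and even if you write clean, readable code and test it thoroughly, at some point weird bugs will inevitably show up and you will need to debug them somehow. Many programmers reach for a pile of print statements to see what their code is doing. That approach is crude and tedious; there are much better ways to track down problems in your code, and this article walks through them.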

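Use the logging Module

If you write an application without logging, you will eventually regret it. An application that records nothing while it runs is very hard to troubleshoot. Fortunately, setting up basic logging in Python is easy: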

import logging
logging.basicConfig(
    filename='application.log',
    level=logging.WARNING,
    format= '[%(asctime)s] {%(pathname)s:%(lineno)d} %(levelname)s - %(message)s',
    datefmt='%H:%M:%S'
)
logging.error("Some serious error occurred.")
logging.warning('Function you are using is deprecated.')

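That is all it takes to start writing logs to a file, which will look something like the output below (you can find the file's path with logging.getLoggerClass().root.handlers[0].baseFilename):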

[12:52:35] {<stdin>:1} ERROR - Some serious error occurred.
[12:52:35] {<stdin>:1} WARNING - Function you are using is deprecated.
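To see the file path lookup and the level filtering together, here is a minimal sketch that writes to a throwaway file and reads it back. It assumes Python 3.8+ for force=True, which removes any handlers already attached to the root logger; the temporary path is purely illustrative. Because level=logging.WARNING, the INFO record never reaches the file.

```python
import logging
import os
import tempfile

# Illustrative log location; any writable path works
log_path = os.path.join(tempfile.mkdtemp(), "application.log")

logging.basicConfig(
    filename=log_path,
    level=logging.WARNING,
    format="[%(asctime)s] {%(pathname)s:%(lineno)d} %(levelname)s - %(message)s",
    datefmt="%H:%M:%S",
    force=True,  # Python 3.8+: drop any handlers already on the root logger
)

logging.info("Filtered out: INFO is below WARNING")
logging.error("Some serious error occurred.")
logging.warning("Function you are using is deprecated.")

# The root logger's FileHandler records the (absolute) path it writes to
root_handler = logging.getLoggerClass().root.handlers[0]
print(root_handler.baseFilename)

# Flush the handler, then read back what actually reached the file
root_handler.flush()
with open(root_handler.baseFilename, encoding="utf-8") as f:
    contents = f.read()
print(contents)
```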

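This setup may seem good enough (and often it is), but well-configured, clearly formatted, readable logs make debugging much easier. One way to improve your logging configuration is to use a .ini or .yaml config file. Here is a suggested example: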

version: 1
disable_existing_loggers: true
formatters:
  standard:
    format: "[%(asctime)s] {%(pathname)s:%(lineno)d} %(levelname)s - %(message)s"
    datefmt: '%H:%M:%S'
handlers:
  console:  # handler which will log into stdout
    class: logging.StreamHandler
    level: DEBUG
    formatter: standard  # Use formatter defined above
    stream: ext://sys.stdout
  file:  # handler which will log into file
    class: logging.handlers.RotatingFileHandler
    level: WARNING
    formatter: standard  # Use formatter defined above
    filename: /tmp/warnings.log
    maxBytes: 10485760 # 10MB
    backupCount: 10
    encoding: utf8
root:  # Loggers are organized in hierarchy - this is the root logger config
  level: ERROR
  handlers: [console, file]  # Attaches both handler defined above
loggers:  # Defines descendants of root logger
  mymodule:  # Logger for "mymodule"
    level: INFO
    handlers: [file]  # Will only use "file" handler defined above
    propagate: no  # Will not propagate logs to "root" logger

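Keeping such an extensive configuration inside Python code would make it hard to navigate, edit and maintain. Putting it in a YAML file and loading it at startup avoids that problem and makes the logging setup easy to change later.

If you want to know what all of these configuration fields mean, see the logging configuration documentation; most of them are simply keyword arguments, as in the first example.

With the logging setup defined in a config file, we now need to load it. If you use a YAML file, the simplest way to load it is: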

import yaml
from logging import config
with open("config.yaml", 'rt') as f:
    config_data = yaml.safe_load(f.read())
    config.dictConfig(config_data)

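The Python logging module does not support YAML files directly, but it does support dictionary-based configuration, and yaml.safe_load (from the third-party PyYAML package) turns a YAML file into exactly such a dictionary. If you prefer .ini files, note that the documentation recommends dictionary-based configuration for new applications. For more examples, see the Logging Cookbook.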
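If you would rather not depend on PyYAML, or just want to see what yaml.safe_load hands to dictConfig, the sketch below passes an equivalent Python dict directly. It is an assumption-laden trim of the YAML above: only the console handler is kept so that nothing is written to /tmp, and the logger name mymodule is taken from the example config.

```python
import logging
import logging.config

# Same shape as the YAML config, expressed directly as a dict
# (file handler omitted so the sketch writes no files)
LOGGING_CONFIG = {
    "version": 1,
    "disable_existing_loggers": True,
    "formatters": {
        "standard": {
            "format": "[%(asctime)s] {%(pathname)s:%(lineno)d} %(levelname)s - %(message)s",
            "datefmt": "%H:%M:%S",
        },
    },
    "handlers": {
        "console": {
            "class": "logging.StreamHandler",
            "level": "DEBUG",
            "formatter": "standard",
            "stream": "ext://sys.stdout",
        },
    },
    "root": {"level": "ERROR", "handlers": ["console"]},
    "loggers": {
        "mymodule": {
            "level": "INFO",
            "handlers": ["console"],
            "propagate": False,  # PyYAML loads the YAML value "no" as False
        },
    },
}

logging.config.dictConfig(LOGGING_CONFIG)

log = logging.getLogger("mymodule")
log.info("Printed: INFO meets the mymodule logger's INFO level")
logging.getLogger("other").warning("Dropped: root only lets ERROR and above through")
```

The YAML file and the dict are interchangeable from dictConfig's point of view; YAML simply keeps the configuration out of your source code.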

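Use Logging Decorators

Continuing with logging: you may want to debug how a particular function is being called. You can do this with a logging decorator, without touching the function body: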

from functools import wraps, partial
import logging
def attach_wrapper(obj, func=None):  # Helper function that attaches function as attribute of an object
    if func is None:
        return partial(attach_wrapper, obj)
    setattr(obj, func.__name__, func)
    return func
def log(level, message):  # Actual decorator
    def decorate(func):
        logger = logging.getLogger(func.__module__)  # Setup logger
        formatter = logging.Formatter(
            '%(asctime)s - %(name)s - %(levelname)s - %(message)s')
        handler = logging.StreamHandler()
        handler.setFormatter(formatter)
        logger.addHandler(handler)
        log_message = f"{func.__name__} - {message}"
        @wraps(func)
        def wrapper(*args, **kwargs):  # Logs the message before executing the decorated function
            logger.log(level, log_message)
            return func(*args, **kwargs)
        @attach_wrapper(wrapper)  # Attaches "set_level" to "wrapper" as attribute
        def set_level(new_level):  # Function that allows us to set log level
            nonlocal level
            level = new_level
        @attach_wrapper(wrapper)  # Attaches "set_message" to "wrapper" as attribute
        def set_message(new_message):  # Function that allows us to set message
            nonlocal log_message
            log_message = f"{func.__name__} - {new_message}"
        return wrapper
    return decorate
# Example Usage
@log(logging.WARN, "example-param")
def somefunc(args):
    return args
somefunc("some args")
somefunc.set_level(logging.CRITICAL)  # Change log level by accessing internal decorator function
somefunc.set_message("new-message")  # Change log message by accessing internal decorator function
somefunc("some args")

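To be honest, this one may take a little time to wrap your head around (though in practice you can simply copy and paste it). The trick is that the log function takes the parameters and makes them available to the inner wrapper function. Those parameters are then made adjustable by attaching accessor functions (set_level and set_message) to the decorated function. As for the functools.wraps decorator: without it, the decorated function's name (func.__name__) would be replaced by the wrapper's name. We need functools.wraps here because we want the real function name when debugging. It works by copying the original function's name, docstring and other metadata onto the wrapper, and by storing the original function in __wrapped__, which inspect.signature follows to report the original parameter list.

Here is the output of the code above. Pretty neat, right?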
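To make the effect of functools.wraps concrete, here is a small side-by-side sketch. The decorator names plain and preserving are hypothetical, chosen only for the comparison; both wrappers just forward the call.

```python
import functools
import inspect

def plain(func):
    # No functools.wraps: the wrapper's own metadata leaks out
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper

def preserving(func):
    # functools.wraps copies __name__, __doc__, __qualname__, __module__, ...
    # onto the wrapper and stores the original function in __wrapped__
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper

@plain
def first(a, b=1):
    """Add two numbers."""
    return a + b

@preserving
def second(a, b=1):
    """Add two numbers."""
    return a + b

print(first.__name__, first.__doc__, inspect.signature(first))
# wrapper None (*args, **kwargs)
print(second.__name__, second.__doc__, inspect.signature(second))
# second Add two numbers. (a, b=1)
```

Without wraps, every decorated function would show up in the log output as "wrapper", which defeats the purpose of a logging decorator.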

2020-05-01 14:42:10,289 - __main__ - WARNING - somefunc - example-param
2020-05-01 14:42:10,289 - __main__ - CRITICAL - somefunc - new-message

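Override an Object's __repr__

You can make your code easier to debug by adding a __repr__ method to your classes. It returns the string representation of an instance. Best practice for __repr__ is to return text that could be used to recreate the instance. For example: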

class Circle:
    def __init__(self, x, y, radius):
        self.x = x
        self.y = y
        self.radius = radius
    def __repr__(self):
        return f"Circle({self.x}, {self.y}, {self.radius})"
...
c = Circle(100, 80, 30)
repr(c)
# Circle(100, 80, 30)

If representing the object that way is undesirable or impossible, a good alternative is the <...> form, for example <_io.TextIOWrapper name='somefile.txt' mode='w' encoding='UTF-8'>.
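As an illustration of the <...> convention, here is a sketch with a hypothetical Connection class (not from any real library) standing in for an object that holds live state and so cannot be rebuilt from text. The angle brackets tell the reader "this is a description, not something you can eval()".

```python
class Connection:
    """Hypothetical connection object: it can't be rebuilt from text."""

    def __init__(self, host, port):
        self.host = host
        self.port = port
        self.is_open = True

    def __repr__(self):
        # <...> signals "descriptive, not eval()-able"; !r quotes the string
        return (f"<{type(self).__name__} host={self.host!r} "
                f"port={self.port} open={self.is_open}>")

conn = Connection("db.internal", 5432)
print(repr(conn))  # <Connection host='db.internal' port=5432 open=True>
```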

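Besides __repr__, it is also a good idea to implement __str__, which is what print(instance) calls by default. With both methods in place, you can get a lot of information just by printing a variable.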
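The sketch below revisits the Circle example with both methods, to show which one Python picks in which context. The wording of __str__ is just an illustrative choice. Note that containers such as lists always use the repr of their items, and that str() falls back to __repr__ when __str__ is not defined.

```python
class Circle:
    def __init__(self, x, y, radius):
        self.x = x
        self.y = y
        self.radius = radius

    def __repr__(self):
        # Unambiguous, for developers; ideally eval(repr(obj)) rebuilds it
        return f"{type(self).__name__}({self.x!r}, {self.y!r}, {self.radius!r})"

    def __str__(self):
        # Readable, for users; used by print() and str()
        return f"circle at ({self.x}, {self.y}) with radius {self.radius}"

c = Circle(100, 80, 30)
print(c)         # circle at (100, 80) with radius 30
print([c])       # [Circle(100, 80, 30)]  (lists show the repr of each item)
print(f"{c!r}")  # Circle(100, 80, 30)
```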

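Override __missing__ in Dictionary Subclasses

If for some reason you need to implement a custom dictionary class, you may run into bugs caused by KeyError when code tries to access keys that do not actually exist. To avoid being left without clues while debugging, you can implement the special __missing__ method, which dict.__getitem__ calls whenever a key is not found (instead of raising KeyError right away).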

import logging
class MyDict(dict):
    def __missing__(self, key):  # Called by d[key] when key is absent
        message = f'{key} not present in the dictionary!'
        logging.warning(message)
        return message  # Or raise some error instead

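The implementation above is very simple: it only logs and returns a message about the missing key. You could also log other useful information that gives you more context when something goes wrong.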
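One detail worth knowing while debugging: __missing__ is only consulted by the subscript lookup d[key]. The sketch below uses a hypothetical TrackingDict that records every missed key, and shows that .get() and the in operator bypass the hook entirely.

```python
import logging

class TrackingDict(dict):
    """Hypothetical dict that records every key looked up but not found."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.missing_keys = []

    def __missing__(self, key):
        # dict.__getitem__ calls this only for d[key] when key is absent
        self.missing_keys.append(key)
        logging.warning("%r not present in the dictionary!", key)
        return None  # or raise KeyError(key) to keep the usual behaviour

d = TrackingDict(a=1)
d["a"]        # present: __missing__ is not involved
d["b"]        # absent: __missing__ runs and returns None
d.get("c")    # .get() does NOT call __missing__
"d" in d      # neither does a membership test
print(d.missing_keys)  # ['b']
```

Note that __missing__ does not insert the key, so the dictionary itself is unchanged after a miss.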

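Debug Crashing Applications

If your application crashes before you get a chance to see what is going on inside it, you may find this trick very useful.

Run the application with the -i flag (python3 -i app.py), which drops you into an interactive shell as soon as the program exits. At that point you can inspect the variables and functions in the program's environment.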

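If that is not good enough, you can bring out the heavier tool: pdb, the Python Debugger. pdb has enough features to fill a long article of its own, so here is just one example covering the most important parts. First, here is the crashing script, crashing_app.py (blank lines included so that the line numbers match the traceback below):

SOME_VAR = 42

class SomeError(Exception):
    pass

def func():
    raise SomeError("Something went wrong...")

func()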

Now, if we run it with the -i flag, we get a chance to debug it:

# Run crashing application
~ $ python3 -i crashing_app.py
Traceback (most recent call last):
  File "crashing_app.py", line 9, in <module>
    func()
  File "crashing_app.py", line 7, in func
    raise SomeError("Something went wrong...")
__main__.SomeError: Something went wrong...
>>> # We are now in the interactive shell
>>> import pdb
>>> pdb.pm()  # start Post-Mortem debugger
> .../crashing_app.py(7)func()
-> raise SomeError("Something went wrong...")
(Pdb) # Now we are in debugger and can poke around and run some commands:
(Pdb) p SOME_VAR  # Print value of variable
42
(Pdb) l  # List surrounding code we are working with
  2
  3   class SomeError(Exception):
  4       pass
  5
  6   def func():
  7  ->     raise SomeError("Something went wrong...")
  8
  9   func()
[EOF]
(Pdb)  # Continue debugging... set breakpoints, step through the code, etc.

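The session above shows quite clearly what you can do with pdb. After the program terminates, we land in an interactive session. We import pdb and start the post-mortem debugger, at which point all pdb commands are available. In the example above we print a variable with p and list the surrounding code with l. Most of the time you will probably want to set a breakpoint with b LINE_NO, run the program until the breakpoint is hit (c), then step through the function with s, and optionally print the stack trace with w. For the full list of commands, see the pdb documentation.

Inspect Stack Traces

Say your code is a Flask or Django application running on a remote server, where you cannot get an interactive debugging session. In that case you can use the traceback and sys modules to get more insight into what went wrong in your code: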
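The session above is interactive by design. As a hedged sketch of what pdb.pm() does under the hood, the code below drives a post-mortem session from a script: it captures a traceback, then feeds a pdb.Pdb instance a canned list of commands through its stdin/stdout parameters and collects the output. The function name and local variable are illustrative; in day-to-day use you would simply call pdb.post_mortem(tb) and type the commands yourself.

```python
import io
import pdb

def func():
    local_value = 42  # illustrative state we want to inspect after the crash
    raise RuntimeError("Something went wrong...")

try:
    func()
except RuntimeError as exc:
    tb = exc.__traceback__  # keep the traceback for post-mortem inspection

# Commands the debugger reads instead of waiting for keyboard input:
# print a variable, show the stack, then quit
commands = io.StringIO("p local_value\nw\nq\n")
output = io.StringIO()

debugger = pdb.Pdb(stdin=commands, stdout=output, readrc=False)
debugger.reset()
debugger.interaction(None, tb)  # the same two calls pdb.post_mortem(tb) makes

print(output.getvalue())
```

The debugger starts in the frame where the exception was raised, which is why p local_value works without first moving up or down the stack.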

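import traceback
import sys
class SomeError(Exception):
    pass
def func():
    try:
        raise SomeError("Something went wrong...")
    except SomeError:
        # Print the exception being handled, with its full traceback
        traceback.print_exc(file=sys.stderr)
func()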

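When run, the code above prints the exception that was just raised, along with its traceback. Besides printing exceptions, the traceback module can also print the current stack (traceback.print_stack()) or extract the raw stack frames so you can format and inspect them further (traceback.format_list(traceback.extract_stack())).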
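On a remote server you usually want the traceback in your logs rather than on stderr. The sketch below shows two standard-library options for that: traceback.format_exc() returns the traceback as a string you can store or send anywhere, and Logger.exception() logs at ERROR level and appends the same traceback automatically. SomeError and func mirror the example above.

```python
import logging
import traceback

class SomeError(Exception):
    pass

def func():
    raise SomeError("Something went wrong...")

try:
    func()
except SomeError:
    # Full traceback of the exception being handled, as a string
    exc_text = traceback.format_exc()
    # logger.exception logs at ERROR level and appends the traceback
    logging.getLogger(__name__).exception("func() failed")

# Stack of the *current* call site (no exception needed), one string per frame
stack_lines = traceback.format_list(traceback.extract_stack())

print(exc_text)
print("".join(stack_lines))
```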

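Reload Modules While Debugging

Sometimes you may be debugging or experimenting with a function in an interactive shell while also making changes to it. To make the cycle of running, testing and modifying easier, you can call importlib.reload(module) instead of restarting the interactive session after every change: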

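# Import the module itself: a name bound by "from module import func" keeps pointing at the old function after reload()
>>> import module
>>> module.func()
'This is result...'
# Make some changes to "func" in module.py
>>> module.func()
'This is result...'  # Outdated result
>>> from importlib import reload; module = reload(module)  # Reload "module" after changes made to "func"
>>> module.func()
'New result...'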

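This tip is more about efficiency than debugging. It lets you skip unnecessary steps and makes your workflow faster. Reloading modules on the fly is often handy, because it keeps you from debugging a stale version of code that has since been modified many times, which saves valuable time.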
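Here is a self-contained sketch of the reload cycle that you can run as a script. It writes a hypothetical throwaway module, reload_demo.py, to a temporary directory, "edits" it on disk, and reloads it. It also demonstrates the pitfall mentioned in the session above: a function imported with from ... import keeps referring to the old code.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical throwaway module written to a temp dir for the demo
workdir = Path(tempfile.mkdtemp())
module_file = workdir / "reload_demo.py"
module_file.write_text('def func():\n    return "This is result..."\n')

sys.path.insert(0, str(workdir))
sys.modules.pop("reload_demo", None)  # make sure we import the fresh file
importlib.invalidate_caches()         # the file was created at runtime

import reload_demo
from reload_demo import func as early_func  # binds the *old* function object

print(reload_demo.func())  # This is result...

# "Edit" the module on disk, as you would in your editor
module_file.write_text('def func():\n    return "New result..."\n')

reload_demo = importlib.reload(reload_demo)  # re-executes the module's code
print(reload_demo.func())  # New result...
print(early_func())        # This is result...  (from-imported names stay stale)
```

reload() re-runs the module's code inside the same module object, so only lookups that go through the module (reload_demo.func) see the new definitions.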

About the author:

Martin Heinz, DevOps engineer, currently working at IBM.

Further reading:

https://martinheinz.dev/blog/24

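This article is part of the Tencent Cloud self-media sync program and is shared from a WeChat official account.

Originally published: 2020-07-22. For copyright concerns, contact cloudcommunity@tencent.com to request removal.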


This article is shared from the InfoQ WeChat official account.
