A Python Interpreter Written in Python (Part 7): The Final Installment

Author: 哒呵呵
Published: 2018-08-06 11:31:59
Included in the column: 鸿的学习笔记

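Compiled from: http://www.aosabook.org/en/500L/a-python-interpreter-written-in-python.html Original author: Allison Kaptur. Translator: 鸿. If you spot translation problems or have suggestions, please leave a message on the WeChat public account.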

Byterun

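Byterun has four kinds of objects:

  • The VirtualMachine class, the highest-level structure. It manages the call stack of frames and contains the mapping from instructions to operations.
  • The Frame class. Every Frame instance has one code object and manages a few other necessary bits of state, such as the global and local namespaces, a reference to the calling frame, and the last bytecode instruction executed.
  • The Function class, used in place of real Python functions. Calling a function creates a new frame in the interpreter.
  • The Block class, which just wraps the three attributes of a block.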

The VirtualMachine Class

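Only one VirtualMachine instance is created each time the program runs, because there is only one Python interpreter. VirtualMachine stores the call stack, the exception state, and return values while they are passed between frames. The entry point for executing code is the run_code method, which takes a compiled code object as its argument. It starts by setting up and running a frame. That frame may create other frames; when the first frame returns, execution is finished.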

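# Standard-library modules used by the snippets in this article.
import collections
import dis
import inspect
import operator
import sys
import types

class VirtualMachineError(Exception):
    pass

class VirtualMachine(object):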
    def __init__(self):
        self.frames = []   # The call stack of frames.
        self.frame = None  # The current frame.
        self.return_value = None
        self.last_exception = None
    def run_code(self, code, global_names=None, local_names=None):
        """ An entry point to execute code using the virtual machine."""
        frame = self.make_frame(code, global_names=global_names, 
                                local_names=local_names)
        self.run_frame(frame)
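
To make the entry point more concrete, here is a small sketch of what a compiled code object looks like before it would be handed to run_code. It relies only on the built-in compile function; the source string and the file name "<example>" are made up for illustration and are not part of Byterun.

```python
# Compile a small program into a code object, the kind of value
# that run_code expects as its `code` argument.
source = 'greeting = "hello"\nshout = greeting.upper()\n'
code = compile(source, "<example>", "exec")

# A code object carries the raw bytecode plus the tables that
# instruction arguments index into.
print(type(code).__name__)   # code
print(code.co_names)         # names used by the module-level code
print(code.co_consts)        # constants referenced by the bytecode
print(len(code.co_code))     # number of bytes of raw bytecode
```

The exact bytes in co_code differ between CPython versions, which is why the sketch prints only their count.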

The Frame Class

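A Frame is just a collection of attributes. These include the code object created by the compiler; the local, global, and builtin namespaces; a reference to the previous frame; a data stack; a block stack; and the last instruction executed.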

class Frame(object):
    def __init__(self, code_obj, global_names, local_names, prev_frame):
        self.code_obj = code_obj
        self.global_names = global_names
        self.local_names = local_names
        self.prev_frame = prev_frame
        self.stack = []
        if prev_frame:
            self.builtin_names = prev_frame.builtin_names
        else:
            self.builtin_names = local_names['__builtins__']
            if hasattr(self.builtin_names, '__dict__'):
                self.builtin_names = self.builtin_names.__dict__
        self.last_instruction = 0
        self.block_stack = []

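Next, we add frame manipulation to the virtual machine. There are three helper functions for frames: one creates a new frame, and the other two push frames onto and pop them off the frame stack. A fourth function, run_frame, does the main work of executing a frame; we will come back to it shortly.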

class VirtualMachine(object):
    [... snip ...]
    # Frame manipulation
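    def make_frame(self, code, callargs={}, global_names=None, local_names=None):
        if global_names is not None:
            # Explicit globals were supplied. Reuse them as the locals only
            # when the caller did not pass a separate local namespace, so a
            # function call (which passes {}) gets its own locals.
            if local_names is None:
                local_names = global_names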
        elif self.frames:
            global_names = self.frame.global_names
            local_names = {}
        else:
            global_names = local_names = {
                '__builtins__': __builtins__,
                '__name__': '__main__',
                '__doc__': None,
                '__package__': None,
            }
        local_names.update(callargs)
        frame = Frame(code, global_names, local_names, self.frame)
        return frame
    def push_frame(self, frame):
        self.frames.append(frame)
        self.frame = frame
    def pop_frame(self):
        self.frames.pop()
        if self.frames:
            self.frame = self.frames[-1]
        else:
            self.frame = None
    def run_frame(self):
        pass
        # we'll come back to this shortly
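
The push and pop helpers are easier to follow in isolation. The sketch below keeps only the call-stack bookkeeping; MiniFrame and MiniVM are hypothetical stand-ins invented for this example, not classes from Byterun.

```python
class MiniFrame:
    """Hypothetical stand-in for Frame: a label plus a link to the caller."""
    def __init__(self, name, prev_frame):
        self.name = name
        self.prev_frame = prev_frame


class MiniVM:
    """Keeps only the call-stack bookkeeping from VirtualMachine."""
    def __init__(self):
        self.frames = []   # the call stack
        self.frame = None  # the frame currently executing

    def push_frame(self, frame):
        self.frames.append(frame)
        self.frame = frame

    def pop_frame(self):
        self.frames.pop()
        self.frame = self.frames[-1] if self.frames else None


vm = MiniVM()
module = MiniFrame("module", vm.frame)       # first frame, no caller
vm.push_frame(module)
call = MiniFrame("function call", vm.frame)  # caller is the module frame
vm.push_frame(call)
print(vm.frame.name, "<-", vm.frame.prev_frame.name)

vm.pop_frame()        # the function returns
print(vm.frame.name)  # module
vm.pop_frame()        # the module finishes
print(vm.frame)       # None
```

Once the last frame is popped, self.frame goes back to None, which matches the state of a freshly created virtual machine.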

The Function Class

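The thing to notice about the Function implementation is that calling a function (which invokes its __call__ method) creates a new Frame object and starts running it.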

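class Function(object):
    # Create a realistic function object, defining the things the interpreter
    # expects. This is a comment rather than a docstring because a class
    # docstring would conflict with the '__doc__' entry in __slots__ below.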
    __slots__ = [
        'func_code', 'func_name', 'func_defaults', 'func_globals',
        'func_locals', 'func_dict', 'func_closure',
        '__name__', '__dict__', '__doc__',
        '_vm', '_func',
    ]
    def __init__(self, name, code, globs, defaults, closure, vm):
        """You don't need to follow this closely to understand the interpreter."""
        self._vm = vm
        self.func_code = code
        self.func_name = self.__name__ = name or code.co_name
        self.func_defaults = tuple(defaults)
        self.func_globals = globs
        self.func_locals = self._vm.frame.local_names
        self.__dict__ = {}
        self.func_closure = closure
        self.__doc__ = code.co_consts[0] if code.co_consts else None
        # Sometimes, we need a real Python function.  This is for that.
        kw = {
            'argdefs': self.func_defaults,
        }
        if closure:
            kw['closure'] = tuple(make_cell(0) for _ in closure)
        self._func = types.FunctionType(code, globs, **kw)
    def __call__(self, *args, **kwargs):
        """When calling a Function, make a new frame and run it."""
        callargs = inspect.getcallargs(self._func, *args, **kwargs)
        # Use callargs to provide a mapping of arguments: values to pass into the new 
        # frame.
        frame = self._vm.make_frame(
            self.func_code, callargs, self.func_globals, {}
        )
        return self._vm.run_frame(frame)

def make_cell(value):
    """Create a real Python closure and grab a cell."""
    # Module-level helper (not a method), called from Function.__init__.
    # Thanks to Alex Gaynor for help with this bit of twistiness.
    fn = (lambda x: lambda: x)(value)
    return fn.__closure__[0]
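
Two standard-library pieces do most of the work in Function: inspect.getcallargs maps call arguments to parameter names, and make_cell produces a real closure cell. The sketch below tries both on their own; the greet function is a made-up example.

```python
import inspect


def make_cell(value):
    """Create a real closure and return its single cell object."""
    fn = (lambda x: lambda: x)(value)
    return fn.__closure__[0]


def greet(name, punctuation="!"):
    """Made-up example function with one default argument."""
    return "Hello, " + name + punctuation


# getcallargs binds arguments to parameter names, filling in defaults.
# Function.__call__ feeds a mapping like this into the new frame's locals.
callargs = inspect.getcallargs(greet, "world")
print(callargs)

# A cell is the container a closure uses to hold a captured variable.
cell = make_cell(42)
print(cell.cell_contents)  # 42
```

The Python documentation marks inspect.getcallargs as deprecated in favor of inspect.signature(...).bind(...), so new code may prefer the latter.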

Back on the VirtualMachine object, we add some helper methods for manipulating the data stack:

class VirtualMachine(object):
    [... snip ...]
    # Data stack manipulation
    def top(self):
        return self.frame.stack[-1]
    def pop(self):
        return self.frame.stack.pop()
    def push(self, *vals):
        self.frame.stack.extend(vals)
    def popn(self, n):
        """Pop a number of values from the value stack.
        A list of `n` values is returned, the deepest value first.
        """
        if n:
            ret = self.frame.stack[-n:]
            self.frame.stack[-n:] = []
            return ret
        else:
            return []
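
The behavior of popn is worth trying on a plain list. The sketch below copies the same logic into free functions over a module-level list; that layout is an assumption made to keep the example short.

```python
stack = []


def push(*vals):
    """Push any number of values; the last one ends up on top."""
    stack.extend(vals)


def popn(n):
    """Pop n values and return them with the deepest value first."""
    if n:
        ret = stack[-n:]
        stack[-n:] = []
        return ret
    # Without this guard, stack[-0:] would select the whole list.
    return []


push(1, 2, 3)
top_two = popn(2)
print(top_two)  # [2, 3]
print(stack)    # [1]
print(popn(0))  # []
```

Returning the deepest value first means operands come back in the order they were pushed, which is what binary operators and function calls need.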

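First, parse_byte_and_args takes a bytecode, checks whether it has an argument, and parses the argument if it does. This method also updates the frame's last_instruction attribute. In the bytecode format this code targets, an instruction without an argument is one byte long, and an instruction with an argument is three bytes long, with the last two bytes holding the argument, low byte first. (CPython 3.6 and later use a different, fixed-width encoding.) The meaning of the argument depends on the instruction. For POP_JUMP_IF_FALSE, the argument is the jump target. For BUILD_LIST, it is the number of elements in the list. For LOAD_CONST, it is an index into the list of constants.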

class VirtualMachine(object):
    [... snip ...]
    def parse_byte_and_args(self):
        f = self.frame
        opoffset = f.last_instruction
        byteCode = f.code_obj.co_code[opoffset]
        f.last_instruction += 1
        byte_name = dis.opname[byteCode]
        if byteCode >= dis.HAVE_ARGUMENT:
            # index into the bytecode
            arg = f.code_obj.co_code[f.last_instruction:f.last_instruction+2]  
            f.last_instruction += 2   # advance the instruction pointer
            arg_val = arg[0] + (arg[1] * 256)
            if byteCode in dis.hasconst:   # Look up a constant
                arg = f.code_obj.co_consts[arg_val]
            elif byteCode in dis.hasname:  # Look up a name
                arg = f.code_obj.co_names[arg_val]
            elif byteCode in dis.haslocal: # Look up a local name
                arg = f.code_obj.co_varnames[arg_val]
            elif byteCode in dis.hasjrel:  # Calculate a relative jump
                arg = f.last_instruction + arg_val
            else:
                arg = arg_val
            argument = [arg]
        else:
            argument = []
        return byte_name, argument
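
The one-byte and three-byte layout can be checked by hand on a tiny byte string. The sketch below uses a hypothetical three-entry opcode table and a hypothetical HAVE_ARGUMENT threshold instead of the dis module, because real opcode numbers change between CPython versions.

```python
# Hypothetical opcode numbering, chosen only for this example.
HAVE_ARGUMENT = 90  # opcodes at or above this value take an argument
OPNAMES = {23: "BINARY_ADD", 83: "RETURN_VALUE", 100: "LOAD_CONST"}


def decode(bytecode):
    """Yield (name, argument) pairs from the one-or-three-byte format."""
    i = 0
    while i < len(bytecode):
        op = bytecode[i]
        i += 1
        if op >= HAVE_ARGUMENT:
            # The two argument bytes are stored low byte first.
            low, high = bytecode[i], bytecode[i + 1]
            i += 2
            yield OPNAMES[op], low + high * 256
        else:
            yield OPNAMES[op], None


program = bytes([100, 0, 0, 100, 1, 1, 23, 83])
for name, arg in decode(program):
    print(name, arg)
```

The second LOAD_CONST carries the bytes 1, 1, which decode to 1 + 1 * 256 = 257.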

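The next method is dispatch, which looks up the operation for a given instruction and executes it. In the CPython interpreter, this dispatch is done with a giant switch statement spanning more than 1,500 lines! Here, we instead define a method for each byte name and use getattr to look it up. When no such method exists, UNARY_ and BINARY_ instructions fall back to shared operator helpers. The run_frame method then repeatedly parses and dispatches instructions until the frame finishes.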

class VirtualMachine(object):
    [... snip ...]
    def dispatch(self, byte_name, argument):
        """ Dispatch by bytename to the corresponding methods.
        Exceptions are caught and set on the virtual machine."""
        # When later unwinding the block stack,
        # we need to keep track of why we are doing it.
        why = None
        try:
            bytecode_fn = getattr(self, 'byte_%s' % byte_name, None)
            if bytecode_fn is None:
                if byte_name.startswith('UNARY_'):
                    self.unaryOperator(byte_name[6:])
                elif byte_name.startswith('BINARY_'):
                    self.binaryOperator(byte_name[7:])
                else:
                    raise VirtualMachineError(
                        "unsupported bytecode type: %s" % byte_name
                    )
            else:
                why = bytecode_fn(*argument)
        except:
            # deal with exceptions encountered while executing the op.
            self.last_exception = sys.exc_info()[:2] + (None,)
            why = 'exception'
        return why
    def run_frame(self, frame):
        """Run a frame until it returns (somehow).
        Exceptions are raised, the return value is returned.
        """
        self.push_frame(frame)
        while True:
            byte_name, arguments = self.parse_byte_and_args()
            why = self.dispatch(byte_name, arguments)
            # Deal with any block management we need to do
            while why and frame.block_stack:
                why = self.manage_block_stack(why)
            if why:
                break
        self.pop_frame()
        if why == 'exception':
            exc, val, tb = self.last_exception
            e = exc(val)
            e.__traceback__ = tb
            raise e
        return self.return_value
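
The getattr-based lookup can be seen in a much smaller class. TinyVM below is a hypothetical example written for this note; it supports two instructions and raises ValueError for anything else, where the article's code raises VirtualMachineError.

```python
class TinyVM:
    """Hypothetical two-instruction machine that dispatches via getattr."""
    def __init__(self):
        self.stack = []

    def byte_LOAD_CONST(self, const):
        self.stack.append(const)

    def byte_POP_TOP(self):
        self.stack.pop()

    def dispatch(self, byte_name, argument):
        # Build the method name from the instruction name and look it up.
        handler = getattr(self, "byte_%s" % byte_name, None)
        if handler is None:
            raise ValueError("unsupported bytecode type: %s" % byte_name)
        return handler(*argument)


vm = TinyVM()
vm.dispatch("LOAD_CONST", ["spam"])
vm.dispatch("LOAD_CONST", ["eggs"])
vm.dispatch("POP_TOP", [])
print(vm.stack)  # ['spam']
```

Supporting a new instruction only requires adding a method with the right name; the dispatch code itself never changes.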

The Block Class

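Blocks are used for certain kinds of control flow, specifically exception handling and looping. A block is responsible for making sure the data stack is in the appropriate state when the operation finishes. To keep track of this extra information, the interpreter sets a flag that indicates its state. The flag is a variable called why, which is either None or one of the strings "continue", "break", "exception", or "return". It indicates what kind of manipulation of the block stack and data stack should happen.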

Block = collections.namedtuple("Block", "type, handler, stack_height")
class VirtualMachine(object):
    [... snip ...]
    # Block stack manipulation
    def push_block(self, b_type, handler=None):
        stack_height = len(self.frame.stack)
        self.frame.block_stack.append(Block(b_type, handler, stack_height))
    def pop_block(self):
        return self.frame.block_stack.pop()
    def unwind_block(self, block):
        """Unwind the values on the data stack corresponding to a given block."""
        if block.type == 'except-handler':
            # The exception itself is on the stack as type, value, and traceback.
            offset = 3  
        else:
            offset = 0
        while len(self.frame.stack) > block.stack_height + offset:
            self.pop()
        if block.type == 'except-handler':
            traceback, value, exctype = self.popn(3)
            self.last_exception = exctype, value, traceback
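    def manage_block_stack(self, why):
        """Unwind or redirect the top block according to `why`.
        Returns the new value of `why` (None once it has been handled)."""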
        frame = self.frame
        block = frame.block_stack[-1]
        if block.type == 'loop' and why == 'continue':
            self.jump(self.return_value)
            why = None
            return why
        self.pop_block()
        self.unwind_block(block)
        if block.type == 'loop' and why == 'break':
            why = None
            self.jump(block.handler)
            return why
        if (block.type in ['setup-except', 'finally'] and why == 'exception'):
            self.push_block('except-handler')
            exctype, value, tb = self.last_exception
            self.push(tb, value, exctype)
            self.push(tb, value, exctype) # yes, twice
            why = None
            self.jump(block.handler)
            return why
        elif block.type == 'finally':
            if why in ('return', 'continue'):
                self.push(self.return_value)
            self.push(why)
            why = None
            self.jump(block.handler)
            return why
        return why
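
What a block remembers, and why, is easier to see with plain lists. The sketch below imitates the break-out-of-a-loop path of manage_block_stack under simplifying assumptions: the handler offset 40 and the stack contents are invented, and a jump is reduced to storing the target in a variable.

```python
import collections

Block = collections.namedtuple("Block", "type, handler, stack_height")

data_stack = ["a", "b"]
block_stack = []

# Entering a loop: remember how tall the data stack is right now.
block_stack.append(Block("loop", 40, len(data_stack)))

# The loop body leaves temporaries on the data stack.
data_stack.extend(["iterator", "tmp"])

# A break: pop the block, trim the data stack back to the recorded
# height, then continue at the block's handler offset.
block = block_stack.pop()
del data_stack[block.stack_height:]
next_instruction = block.handler

print(data_stack)        # ['a', 'b']
print(next_instruction)  # 40
```

In the article's code the final step is a call to self.jump(block.handler); the jump method itself is not shown in this excerpt, though it presumably just sets the frame's instruction pointer.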

The Instructions

What follows is a selection of the dozens of methods that implement bytecode instructions. The rest can be found on GitHub:

class VirtualMachine(object):
    [... snip ...]
    ## Stack manipulation
    def byte_LOAD_CONST(self, const):
        self.push(const)
    def byte_POP_TOP(self):
        self.pop()
    ## Names
    def byte_LOAD_NAME(self, name):
        # Look the name up in the locals, then the globals, then the builtins.
        frame = self.frame
        if name in frame.local_names:
            val = frame.local_names[name]
        elif name in frame.global_names:
            val = frame.global_names[name]
        elif name in frame.builtin_names:
            val = frame.builtin_names[name]
        else:
            raise NameError("name '%s' is not defined" % name)
        self.push(val)
    def byte_STORE_NAME(self, name):
        self.frame.local_names[name] = self.pop()
    def byte_LOAD_FAST(self, name):
        if name in self.frame.local_names:
            val = self.frame.local_names[name]
        else:
            raise UnboundLocalError(
                "local variable '%s' referenced before assignment" % name
            )
        self.push(val)
    def byte_STORE_FAST(self, name):
        self.frame.local_names[name] = self.pop()
    def byte_LOAD_GLOBAL(self, name):
        f = self.frame
        if name in f.global_names:
            val = f.global_names[name]
        elif name in f.builtin_names:
            val = f.builtin_names[name]
        else:
            raise NameError("global name '%s' is not defined" % name)
        self.push(val)
    ## Operators
    BINARY_OPERATORS = {
        'POWER':    pow,
        'MULTIPLY': operator.mul,
        'FLOOR_DIVIDE': operator.floordiv,
        'TRUE_DIVIDE':  operator.truediv,
        'MODULO':   operator.mod,
        'ADD':      operator.add,
        'SUBTRACT': operator.sub,
        'SUBSCR':   operator.getitem,
        'LSHIFT':   operator.lshift,
        'RSHIFT':   operator.rshift,
        'AND':      operator.and_,
        'XOR':      operator.xor,
        'OR':       operator.or_,
    }
    def binaryOperator(self, op):
        x, y = self.popn(2)
        self.push(self.BINARY_OPERATORS[op](x, y))
    COMPARE_OPERATORS = [
        operator.lt,
        operator.le,
        operator.eq,
        operator.ne,
        operator.gt,
        operator.ge,
        lambda x, y: x in y,
        lambda x, y: x not in y,
        lambda x, y: x is y,
        lambda x, y: x is not y,
        lambda x, y: issubclass(x, Exception) and issubclass(x, y),
    ]
    def byte_COMPARE_OP(self, opnum):
        x, y = self.popn(2)
        self.push(self.COMPARE_OPERATORS[opnum](x, y))
    ## Attributes and indexing
    def byte_LOAD_ATTR(self, attr):
        obj = self.pop()
        val = getattr(obj, attr)
        self.push(val)
    def byte_STORE_ATTR(self, name):
        val, obj = self.popn(2)
        setattr(obj, name, val)
    ## Building
    def byte_BUILD_LIST(self, count):
        elts = self.popn(count)
        self.push(elts)
    def byte_BUILD_MAP(self, size):
        self.push({})
    def byte_STORE_MAP(self):
        the_map, val, key = self.popn(3)
        the_map[key] = val
        self.push(the_map)
    def byte_LIST_APPEND(self, count):
        val = self.pop()
        the_list = self.frame.stack[-count] # peek
        the_list.append(val)
    ## Jumps
    def byte_JUMP_FORWARD(self, jump):
        self.jump(jump)
    def byte_JUMP_ABSOLUTE(self, jump):
        self.jump(jump)
    def byte_POP_JUMP_IF_TRUE(self, jump):
        val = self.pop()
        if val:
            self.jump(jump)
    def byte_POP_JUMP_IF_FALSE(self, jump):
        val = self.pop()
        if not val:
            self.jump(jump)
    ## Blocks
    def byte_SETUP_LOOP(self, dest):
        self.push_block('loop', dest)
    def byte_GET_ITER(self):
        self.push(iter(self.pop()))
    def byte_FOR_ITER(self, jump):
        iterobj = self.top()
        try:
            v = next(iterobj)
            self.push(v)
        except StopIteration:
            self.pop()
            self.jump(jump)
    def byte_BREAK_LOOP(self):
        return 'break'
    def byte_POP_BLOCK(self):
        self.pop_block()
    ## Functions
    def byte_MAKE_FUNCTION(self, argc):
        name = self.pop()
        code = self.pop()
        defaults = self.popn(argc)
        globs = self.frame.global_names
        fn = Function(name, code, globs, defaults, None, self)
        self.push(fn)
    def byte_CALL_FUNCTION(self, arg):
        lenKw, lenPos = divmod(arg, 256) # KWargs not supported here
        posargs = self.popn(lenPos)
        func = self.pop()
        frame = self.frame
        retval = func(*posargs)
        self.push(retval)
    def byte_RETURN_VALUE(self):
        self.return_value = self.pop()
        return "return"
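
To see a few of these instruction methods cooperate, the sketch below runs a hand-written instruction list on a deliberately reduced stack machine. The run function and the tuple-based program format are assumptions made for this example; the real interpreter reads bytes from a code object instead.

```python
import operator

# A subset of the operator table used by binaryOperator.
BINARY_OPERATORS = {"ADD": operator.add, "MULTIPLY": operator.mul}


def run(instructions):
    """Execute (name, argument) pairs on a fresh data stack."""
    stack = []
    for name, arg in instructions:
        if name == "LOAD_CONST":
            stack.append(arg)
        elif name.startswith("BINARY_"):
            # The right-hand operand was pushed last, so it pops first.
            y = stack.pop()
            x = stack.pop()
            stack.append(BINARY_OPERATORS[name[7:]](x, y))
        elif name == "RETURN_VALUE":
            return stack.pop()
        else:
            raise ValueError("unsupported instruction: %s" % name)


# Roughly the instruction sequence for the expression (7 + 5) * 2.
program = [
    ("LOAD_CONST", 7),
    ("LOAD_CONST", 5),
    ("BINARY_ADD", None),
    ("LOAD_CONST", 2),
    ("BINARY_MULTIPLY", None),
    ("RETURN_VALUE", None),
]
print(run(program))  # 24
```

The name[7:] slice mirrors how dispatch strips the BINARY_ prefix before looking up the operator.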

This article was originally published on 2018-05-09 and is shared from the WeChat public account 鸿的学习笔记.
