前往小程序,Get更优阅读体验!
立即前往
首页
学习
活动
专区
工具
TVP
发布
社区首页 >专栏 >the-super-tiny-compiler

the-super-tiny-compiler

作者头像
tinyant
发布2022-11-16 14:27:12
4950
发布2022-11-16 14:27:12
举报
文章被收录于专栏:webpack@4.46.0源码分析

bable官网推荐的 compiler 原理(实现了一个小型的compiler),本文主要是摘抄思想并理解。

For an awesome tutorial on compilers, check out the-super-tiny-compiler, which also explains how Babel itself works on a high level.

git地址:https://github.com/jamiebuilds/the-super-tiny-compiler

本文案例:将lisp语言描述转为c语言描述

代码语言:javascript
复制
                   LISP                      C
 
    2 + 2          (add 2 2)                 add(2, 2)
    4 - 2          (subtract 4 2)            subtract(4, 2)
    2 + (4 - 2)    (add 2 (subtract 4 2))    add(2, subtract(4, 2))

compiler的流程

Most compilers break down into three primary stages: Parsing, Transformation, and Code Generation

  1. Parsing is taking raw code and turning it into a more abstract representation of the code.
  2. Transformation takes this abstract representation and manipulates to do whatever the compiler wants it to.
  3. Code Generation takes the transformed representation of the code and turns it into new code.

1. Parsing

Parsing typically gets broken down into two phases: Lexical Analysis and Syntactic Analysis.

Lexical Analysis

  • Takes the raw code and splits it apart into these things called tokens by a thing called a tokenizer (or lexer).
  • Tokens are an array of tiny little objects that describe an isolated piece of the syntax. They could be numbers, labels, punctuation, operators, whatever.

Syntactic Analysis

  • Takes the tokens and reformats them into a representation that describes each part of the syntax and their relation to one another. This is known as an intermediate representation or Abstract Syntax Tree.
  • An Abstract Syntax Tree, or AST for short, is a deeply nested object that represents code in a way that is both easy to work with and tells us a lot of information.
案例
代码语言:javascript
复制
const input  = '(add 2 (subtract 4 2))';

词法分析结果: Tokens

代码语言:javascript
复制
[
    { type: 'paren',  value: '('        },
    { type: 'name',   value: 'add'      },
    { type: 'number', value: '2'        },
    { type: 'paren',  value: '('        },
    { type: 'name',   value: 'subtract' },
    { type: 'number', value: '4'        },
    { type: 'number', value: '2'        },
    { type: 'paren',  value: ')'        },
    { type: 'paren',  value: ')'        },
]

语法分析结果:AST

代码语言:javascript
复制
{
 type: 'Program',
 body: [{
   type: 'CallExpression',
   name: 'add',
   params: [{
     type: 'NumberLiteral',
     value: '2',
   }, {
     type: 'CallExpression',
     name: 'subtract',
     params: [{
       type: 'NumberLiteral',
       value: '4',
     }, {
       type: 'NumberLiteral',
       value: '2',
     }]
   }]
 }]
}

2. Transformation

  • This just takes the AST from the last step and makes changes to it. It can manipulate the AST in the same language or it can translate it into an entirely new language.
  • When transforming the AST we can manipulate nodes by adding/removing/replacing properties, we can add new nodes, remove nodes, or we could leave the existing AST alone and create an entirely new one based on it.

AST 转换成 new AST

代码语言:javascript
复制
{
  type: 'Program',
  body: [{
    type: 'ExpressionStatement',
    expression: {
      type: 'CallExpression',
      callee: {
        type: 'Identifier',
        name: 'add'
      },
      arguments: [{
        type: 'NumberLiteral',
        value: '2'
      }, {
        type: 'CallExpression',
        callee: {
          type: 'Identifier',
          name: 'subtract'
        },
        arguments: [{
          type: 'NumberLiteral',
          value: '4'
        }, {
          type: 'NumberLiteral',
          value: '2'
        }]
      }]
    }
  }]
};

3. Code Generation

基于new AST生成代码

代码语言:javascript
复制
const output = 'add(2, subtract(4, 2));';

总结

代码语言:javascript
复制
/** 
 *   1. input  => tokenizer   => tokens
 *   2. tokens => parser      => ast
 *   3. ast    => transformer => newAst
 *   4. newAst => generator   => output
 */

function compiler(input) {
  let tokens = tokenizer(input);
  let ast    = parser(tokens);
  let newAst = transformer(ast);
  let output = codeGenerator(newAst);

  // and simply return the output!
  return output;
}
本文参与 腾讯云自媒体同步曝光计划,分享自作者个人站点/博客。
原始发表:2020-05-27,如有侵权请联系 cloudcommunity@tencent.com 删除

本文分享自 作者个人站点/博客 前往查看

如有侵权,请联系 cloudcommunity@tencent.com 删除。

本文参与 腾讯云自媒体同步曝光计划  ,欢迎热爱写作的你一起参与!

评论
登录后参与评论
0 条评论
热度
最新
推荐阅读
目录
  • compiler的流程
    • 1. Parsing
      • Lexical Analysis
      • Syntactic Analysis
    • 2. Transformation
      • AST 转换成 new AST
    • 3. Code Generation
    • 总结
    领券
    问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档