前往小程序,Get更优阅读体验!
立即前往
首页
学习
活动
专区
工具
TVP
发布
社区首页 >专栏 >DSL parser 和 PromQL

DSL parser 和 PromQL

原创
作者头像
王磊-字节跳动
发布2019-07-07 19:22:48
3K0
发布2019-07-07 19:22:48
举报
文章被收录于专栏:01ZOO

DSL parser 和 PromQL

这是promql的学习笔记,从promql的代码学习中,希望学习到的是:如何实现一个像promql的parser

先看两个算法题

1. Parse Lisp Expression

题目大意:表达式支持以下操作:

  1. 表达式可以是 integer, let-expression, add-expression, mult-expression, assigned variable. 求表达式始终返回一个 integer
  2. (let v1 e1 v2 e2 ... vn en expr) => 赋值后 eval expr
  3. (add e1 e2) => return e1 + e2
  4. (mult e1 e2) => return e1 * e2

例子:

代码语言:txt
复制
Input: (add 1 2)
Output: 3

Input: (mult 3 (add 2 3))
Output: 15

Input: (let x 2 (mult x 5))
Output: 10

Input: (let x 2 (mult x (let x 3 y 4 (add x y))))
Output: 14

Input: (let x 3 x 2 x)
Output: 2

Input: (let x 1 y 2 x (add x y) (add x y))
Output: 5

分析:

  1. 有几种token: 数字, (, ), add, mult, let, 不同token之间除了"()" 其他都有空格隔离
  2. 有赋值,所以有一个context,即变量-》integer的映射关系
  3. 求的是expression,但是里面还有 expression, 递归的解决这个问题
代码语言:txt
复制
import re
class Solution:
    def evaluate(self, expression: str) -> int:
        # 解析成 token list 方便进一步处理
        el = re.split("([ \(\)])", expression)
        el = [c for c in el if c!= "" and c != " "]
        def evalp(el, c):
            """el 是token列表, c 为 context
            """
            head = el[0]
            if head != "(":
                # 此时只能是数字或者变量
                if head.isdigit() or head[0] == "-": 
                    return int(head)
                return c[head]
            
            context = dict(c)
            # 此时为( add/mult/let...
            subels = parse(el[2:len(el)-1])
            if el[1] == "add":
                return evalp(subels[0], context) + evalp(subels[1], context)
            elif el[1] == "mult":
                return evalp(subels[0], context) * evalp(subels[1], context)
            else:
                # 为let
                for i in range(len(subels)//2):
                    context[subels[2*i][0]] = evalp(subels[2*i + 1], context)
                return evalp(subels[-1], context)
        
        def parse(el):
            """解析成子表达式
            """
            par, r, subel = 0, [], []
            for a in el:
                if a == "(": par += 1
                elif a == ")": par -= 1
                subel.append(a)
                if par == 0:
                    r.append(list(subel))
                    subel = []
            if subel: r.append(list(subel))
            return r
                      
        return evalp(el, {})
代码语言:txt
复制
Solution().evaluate("(let x 1 y 2 x (add x y) (add x y))")
代码语言:txt
复制
5

2. Parsing A Boolean Expression

题目大意:

  1. 表达式eval结果是一个bool值, t表示True, f 表示False
  2. 支持&(expr1,expr2,...); |(expr1,expr2,...); !(expr)

例子:

代码语言:txt
复制
Input: expression = "|(&(t,f,t),!(t))"
Output: false

分析:

和上题类似,先解析token,再递归求表达式

代码语言:txt
复制
class Solution:
    def parseBoolExpr(self, expression: str) -> bool:
        # 表达式比较紧凑,不再解析成list
        def evalp(e):
            head = e[0]
            if head == "t" or head == "f":
                return True if head == "t" else False
            
            subes = parse(e[2:len(e)-1])
            if e[0] == "!":
                return not evalp(subes[0])
            elif e[0] == "&":
                return all(evalp(sube) for sube in subes)
            else:
                return any(evalp(sube) for sube in subes)
        
        def parse(e):
            """解析成子表达式
            """
            par, r, sube = 0, [], ""
            for a in e:
                if a == "(": par += 1
                elif a == ")": par -= 1
                if par == 0 and a == ",":
                    r.append(sube)
                    sube = ""
                else:
                    sube += a
            if sube: r.append(sube)
            return r
                      
        return evalp(expression) 
代码语言:txt
复制
Solution().parseBoolExpr("|(f,t)")
代码语言:txt
复制
True

PromQL简介

参考1 参考2

PromQL (Prometheus Query Language) 是 Prometheus 自己开发的数据查询 DSL 语言,语言表现力非常丰富,内置函数很多,在日常数据可视化以及rule 告警中都会使用到它。

在页面 http://localhost:9090/graph 中,输入下面的查询语句,查看结果,例如:

代码语言:txt
复制
http_requests_total{code="200"}
字符串和数字

字符串: 在查询语句中,字符串往往作为查询条件 labels 的值,和 Golang 字符串语法一致,可以使用 "", '', 或者 , 格式如:

代码语言:txt
复制
"this is a string"
'these are unescaped: \n \\ \t'
`these are not unescaped: \n ' " \t`

正数,浮点数: 表达式中可以使用正数或浮点数,例如:

代码语言:txt
复制
3
-2.4
查询结果类型

PromQL 查询结果主要有 3 种类型:

  • 瞬时数据 (Instant vector): 包含一组时序,每个时序只有一个点,例如:http_requests_total
  • 区间数据 (Range vector): 包含一组时序,每个时序有多个点,例如:http_requests_total5m
  • 纯量数据 (Scalar): 纯量只有一个数字,没有时序,例如:count(http_requests_total)
查询条件

Prometheus 存储的是时序数据,而它的时序是由名字和一组标签构成的,其实名字也可以写出标签的形式,例如 http_requests_total 等价于 {name="http_requests_total"}。

一个简单的查询相当于是对各种标签的筛选,例如:

代码语言:txt
复制
http_requests_total{code="200"} // 表示查询名字为 http_requests_total,code 为 "200" 的数据

查询条件支持正则匹配,例如:

代码语言:txt
复制
http_requests_total{code!="200"}  // 表示查询 code 不为 "200" 的数据
http_requests_total{code=~"2.."} // 表示查询 code 为 "2xx" 的数据
http_requests_total{code!~"2.."} // 表示查询 code 不为 "2xx" 的数据
操作符

Prometheus 查询语句中,支持常见的各种表达式操作符,例如

算术运算符:

  • 支持的算术运算符有 +,-,,/,%,^, 例如 http_requests_total 2 表示将 http_requests_total 所有数据 double 一倍。
  • 比较运算符:支持的比较运算符有 ==,!=,>,<,>=,<=, 例如 http_requests_total > 100 表示 http_requests_total 结果中大于 100 的数据。
  • 逻辑运算符:支持的逻辑运算符有 and,or,unless, 例如 http_requests_total == 5 or http_requests_total == 2 表示 http_requests_total 结果中等于 5 或者 2 的数据。
  • 聚合运算符:支持的聚合运算符有 sum,min,max,avg,stddev,stdvar,count,count_values,bottomk,topk,quantile,, 例如 max(http_requests_total) 表示 http_requests_total 结果中最大的数据。注意,和四则运算类型,Prometheus 的运算符也有优先级,它们遵从(^)> (*, /, %) > (+, -) > (==, !=, <=, <, >=, >) > (and, unless) > (or) 的原则。
内置函数

Prometheus 内置不少函数,方便查询以及数据格式化,例如将结果由浮点数转为整数的 floor 和 ceil,

代码语言:txt
复制
floor(avg(http_requests_total{code="200"}))
ceil(avg(http_requests_total{code="200"}))
查看 http_requests_total 5分钟内,平均每秒数据
rate(http_requests_total[5m])

PromQL实现

本质上看,PromQL很类似上面列举的两个算法题,无非是求一个expresion,只是PromQL支持的能力要强大很多。从promql的代码来看看他是怎么实现的,和我们简单expression的实现有什么不同。

通过上面的简介,我们可以大概整理出 PromQL支持的 expresion 格式

  1. Literals: 1, 1.0// Vector matching one to one <vector expr> <bin-op> ignoring(<label list>) <vector expr> <vector expr> <bin-op> on(<label list>) <vector expr> // Vector matching one to many/many to one <vector expr> <bin-op> ignoring(<label list>) group_left(<label list>) <vector expr> <vector expr> <bin-op> ignoring(<label list>) group_right(<label list>) <vector expr> <vector expr> <bin-op> on(<label list>) group_left(<label list>) <vector expr> <vector expr> <bin-op> on(<label list>) group_right(<label list>) <vector expr>
  2. Time series Selectors
    • Instant vector selectors: a, a{k=v}
    • Range vector selectors: <instant_query> '[' <range> ':' [<resolution>] ']' [ offset <duration>]
  3. 支持subQuery, subquery本身是一个 range vector selector
  4. 支持Binary operators:unaryExpr可以总结为 <Vector_selector> | <Matrix_selector> | (+|-) <number_literal> | '(' <expr> ')' 而 binary expr则为:BinaryOperators(InstantVector)
  5. 支持Vector matching:是有相同label的vector直接的计算, 定义见下
  6. 支持Aggregation operators:<aggr-op>([parameter,] <vector expression>) [without|by (<label list>)]
  7. 支持内置function,定义取决于具体的function实现:比如 abs(v instant-vector), rate(v range-vector)

PromQL入口

PromQL本质就是实现下面的interface, 执行一个query,返回结果,支持取消、关闭和获取解析后表达式以及执行统计信息。

代码语言:txt
复制
// A Query is derived from an a raw query string and can be run against an engine
// it is associated with.
type Query interface {
	// Exec processes the query. Can only be called once.
	Exec(ctx context.Context) *Result
	// Close recovers memory used by the query result.
	Close()
	// Statement returns the parsed statement of the query.
	Statement() Statement
	// Stats returns statistics about the lifetime of the query.
	Stats() *stats.QueryTimers
	// Cancel signals that a running query execution should be aborted.
	Cancel()
}

内部去Storage查询,会转化成另一个interface, 这个interface则更直接,支持的是对数据层的查询动作:select series, 获取label key, value

代码语言:txt
复制
// Querier provides reading access to time series data.
type Querier interface {
	// Select returns a set of series that matches the given label matchers.
	Select(*SelectParams, ...*labels.Matcher) (SeriesSet, Warnings, error)

	// LabelValues returns all potential values for a label name.
	LabelValues(name string) ([]string, Warnings, error)

	// LabelNames returns all the unique label names present in the block in sorted order.
	LabelNames() ([]string, Warnings, error)

	// Close releases the resources of the Querier.
	Close() error
}

// query主要有两种
// NewInstantQuery returns an evaluation query for the given expression at the given time.
func (ng *Engine) NewInstantQuery(q storage.Queryable, qs string, ts time.Time) (Query, error) {
	expr, err := ParseExpr(qs)
	// ....
	qry := ng.newQuery(q, expr, ts, ts, 0)
	qry.q = qs
	return qry, nil
}

// NewRangeQuery returns an evaluation query for the given time range and with
// the resolution set by the interval.
func (ng *Engine) NewRangeQuery(q storage.Queryable, qs string, start, end time.Time, interval time.Duration) (Query, error) {
	expr, err := ParseExpr(qs)
	// ....
	qry := ng.newQuery(q, expr, start, end, interval)
	qry.q = qs
	return qry, nil
}

Parser

从上面的代码可以看出,所有Query被计算的第一步是ParseExpr,也是我们想在这里重点关注的内容。

ParseExpr的实现在lex.go 和 parse.go里面

求值过程

lex (token/item 相当于上面的split操作) => parser (Expr, AST语法树) => engine (eval)

lex
代码语言:txt
复制
// lex.go 本质是一个scan expression =》 token的状态机,lexer结构体里面定义了和这个状态机有关的状态信息
// 里面比较有特色的是 state,这里并不向一版的状态机用一个string表示状态,而是给了一个下一个token的处理函数
// 这样这个state,不仅体现了状态,同时把此状态需要的处理函数也传过来了
// lexer holds the state of the scanner.
type lexer struct {
	input   string    // The string being scanned. 原始字符串
	state   stateFn   // The next lexing function to enter. 状态: 解析下一个token需要的函数
	pos     Pos       // Current position in the input. 状态:当前解析位置
	start   Pos       // Start position of this item. 状态:当前token起始位置
	width   Pos       // Width of last rune read from input. 状态
	lastPos Pos       // Position of most recent item returned by nextItem. 状态
	items   chan item // Channel of scanned items. token传输通道

	parenDepth  int  // Nesting depth of ( ) exprs.  状态
	braceOpen   bool // Whether a { is opened.  状态
	bracketOpen bool // Whether a [ is opened.  状态
	gotColon    bool // Whether we got a ':' after [ was opened. 状态
	stringOpen  rune // Quote rune of the string currently being read. 状态

	// seriesDesc is set when a series description for the testing
	// language is lexed.
	seriesDesc bool
}


// token的定义如下
const (
	ItemError ItemType = iota // Error occurred, value is error message
	ItemEOF
	ItemComment
	ItemIdentifier
	ItemMetricIdentifier
	ItemLeftParen
	ItemRightParen
	ItemLeftBrace
	ItemRightBrace
	ItemLeftBracket
	ItemRightBracket
    // 省略
)

// 同时有个几个关键字 =》 token的map,用于转义关键字到token,比如
var key = map[string]ItemType{
	// Operators.
	"and":    ItemLAND,
	"or":     ItemLOR,
	"unless": ItemLUnless,

	// Aggregators.
	"sum":          ItemSum,
	"avg":          ItemAvg,
	"count":        ItemCount,
    // 省略
}


// scan 原始 string需要的几个函数 定义为next, peek, backup 是比较经典的写法
// next returns the next rune in the input.
func (l *lexer) next() rune {
	if int(l.pos) >= len(l.input) {
		l.width = 0
		return eof
	}
	r, w := utf8.DecodeRuneInString(l.input[l.pos:])
	l.width = Pos(w)
	l.pos += l.width
	return r
}

// peek returns but does not consume the next rune in the input.
func (l *lexer) peek() rune {
	r := l.next()
	l.backup()
	return r
}

// backup steps back one rune. Can only be called once per call of next.
func (l *lexer) backup() {
	l.pos -= l.width
}
parse
代码语言:txt
复制
type parser struct {
	lex       *lexer
	token     [3]item
	peekCount int
}

// 入口函数为 (同时支持ParseMetric / ParseMetricSelector这里我们不关注)
// ParseExpr returns the expression parsed from the input.
func ParseExpr(input string) (Expr, error) {
	p := newParser(input)

	expr, err := p.parseExpr()
	if err != nil {
		return nil, err
	}
	err = p.typecheck(expr)
	return expr, err
}

// ParseExpr(input string) (Expr, error) 
// => (p *parser) parseExpr() (expr Expr, err error) 
// => for{ (p *parser) peek() item  ==> (p *parser) expr() Expr}
// peek 即调用lex获取下一个item(跳过comment)
// expr是核心,用于解析token流到 Expr即语法树 AST
// expr parses any expression.
func (p *parser) expr() Expr {
	// Parse the starting expression.
	expr := p.unaryExpr()

	// Loop through the operations and construct a binary operation tree based
	// on the operators' precedence.
	for {
        // 如果不是 operator, 那么结束
		op := p.peek().typ
		if !op.isOperator() {
			// Check for subquery.
			// ... 省略
			return expr
		}
		p.next() // Consume operator.

		// 处理operator的情况,对于operator 肯定有有 lhs, rhs;
        // ... 省略
            
		// Parse the next operand.
		rhs := p.unaryExpr()

		// Assign the new root based on the precedence of the LHS and RHS operators.
		expr = p.balance(expr, op, rhs, vecMatching, returnBool)
	}
}

// left op right with vecMatching => BinaryExpr
func (p *parser) balance(lhs Expr, op ItemType, rhs Expr, vecMatching *VectorMatching, returnBool bool) *BinaryExpr 


// unaryExpr parses a unary expression.
//
//		<Vector_selector> | <Matrix_selector> | (+|-) <number_literal> | '(' <expr> ')'
//
func (p *parser) unaryExpr() Expr {
	switch t := p.peek(); t.typ {
	case ItemADD, ItemSUB:
		p.next()
		e := p.unaryExpr()
        // 省略
		return &UnaryExpr{Op: t.typ, Expr: e}

	case ItemLeftParen:
		// 省略

		return &ParenExpr{Expr: e}
	}
	e := p.primaryExpr()

	// Expression might be followed by a range selector.
	if p.peek().typ == ItemLeftBracket {
		e = p.subqueryOrRangeSelector(e, true)
	}

	// Parse optional offset.
	if p.peek().typ == ItemOffset {
		// 省略
	}

	return e
}

// primaryExpr parses a primary expression.
//
//		<metric_name> | <function_call> | <Vector_aggregation> | <literal>
//
func (p *parser) primaryExpr() Expr {
	// 省略
}


// 这里复用了这个函数做 range的解析,和subquery的解析,通过一个bool值区分使用场景
// subqueryOrRangeSelector parses a Subquery based on given Expr (or)
// a Matrix (a.k.a. range) selector based on a given Vector selector.
//
//		<Vector_selector> '[' <duration> ']' | <Vector_selector> '[' <duration> ':' [<duration>] ']'
//
func (p *parser) subqueryOrRangeSelector(expr Expr, checkRange bool) Expr {
	// 省略
	return &SubqueryExpr{
		Expr:  expr,
		Range: erange,
		Step:  estep,
	}
}

// primaryExpr parses a primary expression.
//
//		<metric_name> | <function_call> | <Vector_aggregation> | <literal>
//
func (p *parser) primaryExpr() Expr {
    // 省略
}

// aggrExpr parses an aggregation expression.
//
//		<aggr_op> (<Vector_expr>) [by|without <labels>]
//		<aggr_op> [by|without <labels>] (<Vector_expr>)
//
func (p *parser) aggrExpr() *AggregateExpr 

// call parses a function call.
//
//		<func_name> '(' [ <arg_expr>, ...] ')'
//
func (p *parser) call(name string) *Call 
ast

经过parse之后,表达式就被转化成一个Expr,也是一个AST

这个AST里面有多个Node,每个Node可能是下面的几种

代码语言:txt
复制
// AggregateExpr represents an aggregation operation on a Vector.
type AggregateExpr struct {
	Op       ItemType // The used aggregation operation.
	Expr     Expr     // The Vector expression over which is aggregated.
	Param    Expr     // Parameter used by some aggregators.
	Grouping []string // The labels by which to group the Vector.
	Without  bool     // Whether to drop the given labels rather than keep them.
}

// BinaryExpr represents a binary expression between two child expressions.
type BinaryExpr struct {
	Op       ItemType // The operation of the expression.
	LHS, RHS Expr     // The operands on the respective sides of the operator.

	// The matching behavior for the operation if both operands are Vectors.
	// If they are not this field is nil.
	VectorMatching *VectorMatching

	// If a comparison operator, return 0/1 rather than filtering.
	ReturnBool bool
}

// Call represents a function call.
type Call struct {
	Func *Function   // The function that was called.
	Args Expressions // Arguments used in the call.
}

// MatrixSelector represents a Matrix selection.
type MatrixSelector struct {
	Name          string
	Range         time.Duration
	Offset        time.Duration
	LabelMatchers []*labels.Matcher

	// The unexpanded seriesSet populated at query preparation time.
	unexpandedSeriesSet storage.SeriesSet
	series              []storage.Series
}

// SubqueryExpr represents a subquery.
type SubqueryExpr struct {
	Expr   Expr
	Range  time.Duration
	Offset time.Duration
	Step   time.Duration
}

// NumberLiteral represents a number.
type NumberLiteral struct {
	Val float64
}

// ParenExpr wraps an expression so it cannot be disassembled as a consequence
// of operator precedence.
type ParenExpr struct {
	Expr Expr
}

// StringLiteral represents a string.
type StringLiteral struct {
	Val string
}

// UnaryExpr represents a unary operation on another expression.
// Currently unary operations are only supported for Scalars.
type UnaryExpr struct {
	Op   ItemType
	Expr Expr
}

// VectorSelector represents a Vector selection.
type VectorSelector struct {
	Name          string
	Offset        time.Duration
	LabelMatchers []*labels.Matcher

	// The unexpanded seriesSet populated at query preparation time.
	unexpandedSeriesSet storage.SeriesSet
	series              []storage.Series
}
engine

engine 得到解析后的Expr之后包装成一个 EvalStmt =》 再包装一个query,再exec的时候求值, 下面是具体的调用链条.

代码语言:txt
复制
// exec executes the query.  调用 execEvalStmt he 
//
// At this point per query only one EvalStmt is evaluated. Alert and record
// statements are not handled by the Engine.
func (ng *Engine) exec(ctx context.Context, q *query) (Value, storage.Warnings, error)

    
// execEvalStmt evaluates the expression of an evaluation statement for the given time range.
// 先后调用 populateSeries 和Eval
func (ng *Engine) execEvalStmt(ctx context.Context, query *query, s *EvalStmt) (Value, storage.Warnings, error) 
    
// 会把相关的时间序列值都populate到相关的node上面去, 即从storage把原始数值都查出来
//     => querier.Select(params, n.LabelMatchers...)
// 注意前面的Expr中 VectorSelector 和 MatrixSelector 是最终能存 时间序列值的 node
func (ng *Engine) populateSeries(ctx context.Context, q storage.Queryable, s *EvalStmt) (storage.Querier, storage.Warnings, error) {
    
func (ev *evaluator) Eval(expr Expr) (v Value, err error)
    
// 核心处理函数,根据不同的Expr类型做处理, 即对已经带了时间序列值对Node做Eval 返回 Value
// eval evaluates the given expression as the given AST expression node requires.
func (ev *evaluator) eval(expr Expr) Value 
    
// eval流程的一个例子
case *BinaryExpr:
return ev.rangeEval(func(v []Value, enh *EvalNodeHelper) Vector {
	return ev.VectorBinop(e.Op, v[0].(Vector), v[1].(Vector), e.VectorMatching, e.ReturnBool, enh)
				}, e.LHS, e.RHS)
    
// rangeEval evaluates the given expressions, and then for each step calls
// the given function with the values computed for each expression at that
// step.  The return value is the combination into time series of all the
// function call results.
func (ev *evaluator) rangeEval(f func([]Value, *EvalNodeHelper) Vector, exprs ...Expr) Matrix 
    
// VectorBinop evaluates a binary operation between two Vectors, excluding set operators.
func (ev *evaluator) VectorBinop(op ItemType, lhs, rhs Vector, matching *VectorMatching, returnBool bool, enh *EvalNodeHelper) Vector 
{
    for { value, keep := vectorElemBinop(op, vl, vr)}
}
    

// vectorElemBinop evaluates a binary operation between two Vector elements.
func vectorElemBinop(op ItemType, lhs, rhs float64) (float64, bool) {
	switch op {
	case ItemADD:
		return lhs + rhs, true

            
// 再看一个function rate 的例子
// delta/rate/increase 背后共享了相同的计算逻辑,仅仅是参数不同。
// === rate(node ValueTypeMatrix) Vector ===
func funcRate(vals []Value, args Expressions, enh *EvalNodeHelper) Vector {
	return extrapolatedRate(vals, args, enh, true, true)
}
            
// extrapolatedRate is a utility function for rate/increase/delta.
// It calculates the rate (allowing for counter resets if isCounter is true),
// extrapolates if the first/last sample is close to the boundary, and returns
// the result as either per-second (if isRate is true) or overall.
func extrapolatedRate(vals []Value, args Expressions, enh *EvalNodeHelper, isCounter bool, isRate bool) Vector             

其他

1. prometheus的metric解析

一般来说手写的parser要比yacc生成的parser效率高,prometheus中promql的实现是手写,但是对metrics的解析则使用了yacc,而metrics的结构也比较简单,实现起来比较容易。代码在pkg/textparse

2. brainfuck

附一个brainfuck语言的解析器, 这个语言比较简单,支持如下几种操作, 不支持函数调用。

代码语言:txt
复制
# >	Move the pointer to the right
# <	Move the pointer to the left
# +	Increment the memory cell under the pointer
# -	Decrement the memory cell under the pointer
# .	Output the character signified by the cell at the pointer
# ,	Input a character and store it in the cell at the pointer
# [	Jump past the matching ] if the cell under the pointer is 0
# ]	Jump back to the matching [ if the cell under the pointer is nonzero
代码语言:txt
复制
from enum import Enum 

class BfOpKind(Enum):
    # 多了几种OP,是因为这里对program做了“编译”优化
    # 比如  LOOP_SET_TO_ZERO replaces [-]
    #      LOOP_MOVE_PTR for loops like [>>>]
    #      LOOP_MOVE_DATA for loops like [-<<<+>>>]
    INVALID_OP = 0
    INC_PTR = 1
    DEC_PTR = 2
    INC_DATA=3
    DEC_DATA=4
    READ_STDIN=5
    WRITE_STDOUT=6
    LOOP_SET_TO_ZERO=7
    LOOP_MOVE_PTR=8
    LOOP_MOVE_DATA=9
    JUMP_IF_DATA_ZERO=10
    JUMP_IF_DATA_NOT_ZERO=11
    
valid_symbol = {">": BfOpKind.INC_PTR,
                "<": BfOpKind.DEC_PTR, 
                "+": BfOpKind.INC_DATA,
                "-": BfOpKind.DEC_DATA, 
                ".": BfOpKind.WRITE_STDOUT, 
                ",": BfOpKind.READ_STDIN, 
                "[": BfOpKind.JUMP_IF_DATA_ZERO, 
                "]": BfOpKind.JUMP_IF_DATA_NOT_ZERO}

def clean_program(p):
    return [a for a in p if a in valid_symbol]
    
class BfOp():
    def __init__(self, opkind, argument=0):
        # op 不仅有opkind, 还有arg 因为 >>> 优化成了 BfOp(INC_PTR, 3)
        # 这样压缩了op,也是一种编译优化
        self.kind, self.argument = opkind, argument
    def __repr__(self):
        for a, b in valid_symbol.items():
            if b == self.kind:
                return a
        return "xxx" 
    
    def __str__(self):
        for a, b in valid_symbol.items():
            if b == self.kind:
                return a
        return "xxx" 
    
def optimize_loop(ops):
    # return op
    ret = None
    if len(ops) == 3:
        # [-] [+] [>] [<]
        op = ops[1]
        if op.kind == BfOpKind.INC_DATA or op.kind == BfOpKind.DEC_DATA:
            ret = BfOp(BfOpKind.LOOP_SET_TO_ZERO)
        elif op.kind == BfOpKind.INC_PTR or op.kind == BfOpKind.DEC_PTR:
            ret = BfOp(BfOpKind.LOOP_MOVE_PTR, op.argument if op.kind == BfOpKind.INC_PTR else -op.argument)
    elif len(ops) == 6:
        # [-<+>] and [->+<]
        if ops[1].kind == BfOpKind.DEC_DATA and ops[3].kind == BfOpKind.INC_DATA and \
             ops[1].argument == 1 and ops[3].argument == 1:
            if ops[2].kind == BfOpKind.INC_PTR and ops[4].kind == BfOpKind.DEC_PTR and \
            ops[2].argument == ops[4].argument:
                ret = BfOp(BfOpKind.LOOP_MOVE_DATA, ops[2].argument)
            elif ops[2].kind == BfOpKind.DEC_PTR and ops[4].kind == BfOpKind.INC_PTR and \
            ops[2].argument == ops[4].argument:
                ret = BfOp(BfOpKind.LOOP_MOVE_DATA, -ops[2].argument)
    #print("optimize", [a for a in ops], "to", ret)
    return ret


def translate_program(p):
    pc, psize, ops = 0, len(p), []
    open_bracket_stack = []
    while pc < psize:
        instruction = p[pc]
        if instruction == "[":
            open_bracket_stack.append(len(ops))
            ops.append(BfOp(BfOpKind.JUMP_IF_DATA_ZERO, 0))
            pc += 1
        elif instruction == ']':
            if not open_bracket_stack:
                print("unmatched closing ']' at pc=", pc)
                exit(1)
            open_bracket_offset = open_bracket_stack.pop()
            ops[open_bracket_offset].argument = len(ops)
            ops.append(BfOp(BfOpKind.JUMP_IF_DATA_NOT_ZERO, open_bracket_offset))
            optimized_op = optimize_loop(ops[open_bracket_offset:])
            if optimized_op:
                # 优化替换
                ops[open_bracket_offset:] = [optimized_op]
            pc += 1
        else:
            pc, start = pc + 1, pc
            while pc < psize and p[pc] == instruction:
                pc += 1
            num_repeats = pc - start
            if instruction not in valid_symbol:
                print("invalid symbol ", instruction, pc)
                exit(1)
            ops.append(BfOp(valid_symbol[instruction], num_repeats))
    return ops
            
def optinterp(p):
    memory, dataptr = [0]*10000, 0
    ops, pc = translate_program(p), 0
    while pc <len(ops):
        op, kind = ops[pc], ops[pc].kind
        if kind == BfOpKind.INC_PTR:
            dataptr += op.argument
        elif kind == BfOpKind.DEC_PTR:
            dataptr -= op.argument
        elif kind == BfOpKind.INC_DATA:
            memory[dataptr] += op.argument
        elif kind ==  BfOpKind.DEC_DATA:
            memory[dataptr] -= op.argument
        elif kind == BfOpKind.READ_STDIN:
            memory[dataptr] == int(input())
        elif kind == BfOpKind.WRITE_STDOUT:
            print(chr(memory[dataptr])*op.argument, end="")
        elif kind == BfOpKind.LOOP_SET_TO_ZERO:
            memory[dataptr] = 0
        elif kind == BfOpKind.LOOP_MOVE_PTR:
            while memory[dataptr]:
                dataptr += op.argument
        elif kind == BfOpKind.LOOP_MOVE_DATA:
            if memory[dataptr]:
                memory[dataptr + op.argument] += memory[dataptr]
                memory[dataptr] = 0
        elif kind == BfOpKind.JUMP_IF_DATA_ZERO:
            if memory[dataptr] == 0:
                pc = op.argument
        elif kind == BfOpKind.JUMP_IF_DATA_NOT_ZERO:
            if memory[dataptr] != 0:
                pc = op.argument
        else:
            print("INVALID_OP encountered on pc=", pc)
        pc += 1
            
def run(p):
    p = clean_program(p)
    optinterp(p)
代码语言:txt
复制
run("++++++++[>++++[>++>+++>+++>+<<<<-]>+>+>->>+[<]<-]>>.>---.+++++++..+++.>>.<-.<.+++.------.--------.>>+.>++.")
代码语言:txt
复制
Hello World!

原创声明:本文系作者授权腾讯云开发者社区发表,未经许可,不得转载。

如有侵权,请联系 cloudcommunity@tencent.com 删除。

原创声明:本文系作者授权腾讯云开发者社区发表,未经许可,不得转载。

如有侵权,请联系 cloudcommunity@tencent.com 删除。

评论
登录后参与评论
0 条评论
热度
最新
推荐阅读
目录
  • DSL parser 和 PromQL
    • 先看两个算法题
      • 1. Parse Lisp Expression
      • 2. Parsing A Boolean Expression
    • PromQL简介
      • PromQL实现
        • PromQL入口
        • Parser
      • 其他
        • 1. prometheus的metric解析
        • 2. brainfuck
    相关产品与服务
    Prometheus 监控服务
    Prometheus 监控服务(TencentCloud Managed Service for Prometheus,TMP)是基于开源 Prometheus 构建的高可用、全托管的服务,与腾讯云容器服务(TKE)高度集成,兼容开源生态丰富多样的应用组件,结合腾讯云可观测平台-告警管理和 Prometheus Alertmanager 能力,为您提供免搭建的高效运维能力,减少开发及运维成本。
    领券
    问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档