golang日志组件使用runtime.Caller性能问题分析

原创

霸王猪

发布于 2019-01-21 21:07:59

10.2K0

背景

runtime.Caller能够拿到当前执行的文件名和行号，这个方法几乎所有的日志组件里都有使用。

func Caller(skip int) (pc uintptr, file string, line int, ok bool) {}

但是问题也很明显，这个函数开销太大。跟了一下代码实现，发现主要就是不停地迭代，而这个迭代过程虽然单次消耗的时间可以忽略不计，但是对于日质量巨大的服务而言影响还是很大的。详情参见下面的代码go/src/runtime/symtab.go#L125

各家的解决方案

网上了解了一下，目前没有找到不使用runtime包获取执行文件和行号的解决方案。既然官方的包很难有优化空间，那可以考虑一下业务侧有什么解决方案。于是查了一些github上比较知名的项目，学习了一下。

beego ( https://github.com/astaxie/beego )

作为国内知名度最高的go-web框架，想必是有可取之处。话不多说上代码：github.com/beego/bee/cmd/commands/generate/generate.go#L91

记录文件名和行号的用法是这样的：

beeLogger.Log.Debugf("GOPATH: %s", utils.FILE(), utils.LINE(), gopath)

我们来看下utils.FILE()和utils.LINE()的实现：github.com/beego/bee/utils/utils.go#L252

// __FILE__ returns the file name in which the function was invoked

func FILE() string {

    _, file, _, _ := runtime.Caller(1)

    return file

}

// __LINE__ returns the line number at which the function was invoked

func LINE() int {

    _, _, line, _ := runtime.Caller(1)

    return line

}

在同一需求场景下，FILE和LINE分别调用了一次能耗较高的runtime.Caller，这个其实一次就行。可能作者出于功能单一性原则硬是分成两个函数分别获取，但是这并不是我们想要的。下一个

log （Google出品）

官方写的日志组件，在log包里的，值得参考一下: /usr/local/go/src/log/log.go#L149

func (l *Logger) Output(calldepth int, s string) error {

    now := time.Now() // get this early.

    var file string

    var line int

    l.mu.Lock()

    defer l.mu.Unlock()

    if l.flag&(Lshortfile|Llongfile) != 0 {

        // Release lock while getting caller info - it's expensive.

        l.mu.Unlock()

        var ok bool

        _, file, line, ok = runtime.Caller(calldepth)

        if !ok {

            file = "???"

            line = 0

        }

        l.mu.Lock()

    }

    ...

    return er

}

注释里说明官方也是知道runtime.Caller性能有问题。这里的Logger里带有一个Mutex锁，方便在高并发或者多协程的时候保护上下文数据一致。

这里值得借鉴的是并没有所有的日志都记录文件名和行号，而是添加了标记位flag，只有在需要的业务场景下(Lshortfile或者Llongfile )，才调用runtime.Caller，避免不必要的开支。

另外一方面，在调用性能比较差的runtime.Caller之前，把锁释放，执行完毕后再把锁加上，继续处理自己后续的业务。这个可以避免锁的时间过长，影响其他业务。

zap （uber出品 5565星 https://github.com/uber-go/zap ）

敢以zap命名就可以看出官方的自信，从官方的benchmark数据来看，的确是算得上很快的：

Log a message and 10 fields:

| Package | Time | Objects Allocated |

| :--- | :---: | :---: |

| zap | 3131 ns/op | 5 allocs/op |

| zap (sugared) | 4173 ns/op | 21 allocs/op |

| zerolog | 16154 ns/op | 90 allocs/op |

| lion | 16341 ns/op | 111 allocs/op |

| go-kit | 17049 ns/op | 126 allocs/op |

| logrus | 23662 ns/op | 142 allocs/op |

| log15 | 36351 ns/op | 149 allocs/op |

| apex/log | 42530 ns/op | 126 allocs/op |

看一下实现行号和文件名的代码：go.uber.org/zap/logger.go#L284

    // Only do further annotation if we're going to write this message; checked

    // entries that exist only for terminal behavior don't benefit from

    // annotation.

    if !willWrite {

        return ce

    }

    // Thread the error output through to the CheckedEntry.

    ce.ErrorOutput = log.errorOutput

    if log.addCaller {

        ce.Entry.Caller = zapcore.NewEntryCaller(runtime.Caller(log.callerSkip + callerSkipOffset))

        if !ce.Entry.Caller.Defined {

            fmt.Fprintf(log.errorOutput, "%v Logger.check error: failed to get caller\n", time.Now().UTC())

            log.errorOutput.Sync()

        }

    }

它们将Caller信息放在CallerEntry.Entry.Caller结构里，并且和google类似，提供了标记位(addCaller)，只有在需要的场景下才会调用runtime.Caller.

值得一提的是，zap提供了Sugar和Desugar的模式，所谓Sugar就是提供程序员更友好方便的日志记录方式，但是牺牲了部分性能，核心就是Caller的skipCaller+2。Desugar就是个逆向的过程，可以将Sugared的结构再次回退到性能较高的原始模式，核心是Caller的callerSkip-2。两者可以在代码里面共同存在使用。

Sugar和Desuagr的实现：

//go.uber.org/zap/logger.go#L126

func (log *Logger) Sugar() *SugaredLogger {

    core := log.clone()

    core.callerSkip += 2

    return &SugaredLogger{core}

}

//go.uber.org/zap/sugar.go#L53

func (s *SugaredLogger) Desugar() *Logger {

    base := s.base.clone()

    base.callerSkip -= 2

    return base

}

顺便在这里提一下受zap启发的zerolog项目( https://github.com/rs/zerolog )，zerolog更是将行号和文件名的开关提到了顶层位置，使用时需要手动触发(log.With().Caller())才会去调用Caller: github.com/rs/zerolog/event.go#L635

//add file name and file line number to log

log.Logger = log.With().Caller().Logger()

log.Info().Msg("hello world")



// Caller adds the file:line of the caller with the zerolog.CallerFieldName key.

func (e *Event) Caller() *Event {

    return e.caller(CallerSkipFrameCount)

}



func (e *Event) caller(skip int) *Event {

    if e == nil {

        return e

    }

    _, file, line, ok := runtime.Caller(skip)

    if !ok {

        return e

    }

    e.buf = enc.AppendString(enc.AppendKey(e.buf, CallerFieldName), file+":"+strconv.Itoa(line))

    return e

}

logrus (9017星，github同类最多星 https://github.com/sirupsen/logrus)

logrus作为GitHub上标星最多的golang日志组件，还是有必要研究一下的。

首先看下核心Logger的结构：github.com/logrus/logger.go#L11

type Logger struct {

    // The logs are `io.Copy`'d to this in a mutex. It's common to set this to a

    // file, or leave it default which is `os.Stderr`. You can also set this to

    // something more adventurous, such as logging to Kafka.

    Out io.Write

    // Hooks for the logger instance. These allow firing events based on logging

    // levels and log entries. For example, to send errors to an error tracking

    // service, log to StatsD or dump the core on fatal errors.

    Hooks LevelHooks

    // All log entries pass through the formatter before logged to Out. The

    // included formatters are `TextFormatter` and `JSONFormatter` for which

    // TextFormatter is the default. In development (when a TTY is attached) it

    // logs with colors, but to a file it wouldn't. You can easily implement you

    // own that implements the `Formatter` interface, see the `README` or included

    // formatters for examples.

    Formatter Formatte



    // Flag for whether to log caller info (off by default)

    ReportCaller bool



    // The logging level the logger should log at. This is typically (and defaults

    // to) `logrus.Info`, which allows Info(), Warn(), Error() and Fatal() to be

    // logged.

    Level Level

    // Used to sync writing to the log. Locking is enabled by Default

    mu MutexWrap

    // Reusable empty entry

    entryPool sync.Pool

    // Function to exit the application, defaults to `os.Exit()`

    ExitFunc exitFunc

}

同样的，标示位ReportCaller用于标识哪些场合需要去获取行号和文件名等信息。而这里提升性能的一种解决方案是使用资源池entryPool sync.Pool，当然这里的资源池不是说runtime.Caller之类的可以复用，而是空的entry实例。使用资源池后，为了避免竞争出问题，自然而然用上了mutex。

值得一提的是这里的Hooks,用于特殊日志出现时执行相应的动作。于是我想到，在设计日志组件的时候，是否可以将需要打印文件名和行号的日志作为特殊日志，为这类特殊日志添加hook，hook里才会去运行runtime.Calle

和上面几个组件不同的是，logrus使用的并不是runtime.Caller，而是获取runtime.Frame。行号和文件名从Frame里获取: github.com/logrus/entry.go#L146

// getCaller retrieves the name of the first non-logrus calling function

func getCaller() \*runtime.Frame {

    // Restrict the lookback frames to avoid runaway lookups

    pcs := make([]uintptr, maximumCallerDepth)

    depth := runtime.Callers(minimumCallerDepth, pcs)

    frames := runtime.CallersFrames(pcs[:depth])

    // cache this package's fully-qualified name

    callerInitOnce.Do(func() {

        logrusPackage = getPackageName(runtime.FuncForPC(pcs[0]).Name())

        // now that we have the cache, we can skip a minimum count of known-logrus functions

        // XXX this is dubious, the number of frames may vary store an entry in a logger interface

        minimumCallerDepth = knownLogrusFrames

    })

    for f, again := frames.Next(); again; f, again = frames.Next() {

        pkg := getPackageName(f.Function)

        // If the caller isn't part of this package, we're done

        if pkg != logrusPackage {

            return &f

        }

    }

    // if we got here, we failed to find the caller's context

    return nil

}

题外话：这里顺带也提一下刚刚看到的logrus的一个issue，个人感觉还是比较有意思的：https://github.com/sirupsen/logrus/issues/63

这个issue是2014年提出，讲的是当时logrus还不支持文件名和行号。而这个issue出现的时候，logrus已经有4年历史了。

不知道这是bug还是feature的程序员当时感觉世界都崩了，都4012年了，竟然还有不支持文件名和行号等stacktrace信息的日志组件。

作者当时也是出于性能考虑，特别谨慎对待这一特性，迟迟不肯加入这个功能，更是希望有需求的开发者，自己在Hook内实现。

还别说，那时候还真有人在Hook里实现了添加行号文件名，详情在这里： https://github.com/onrik/logrus

用法是这样的：

filenameHook := filename.NewHook()

filenameHook.Field = "source"

logrus.AddHook(filenameHook)

看到这些人早期都推荐用Hook，突然有种英雄所见略同的感觉，HIA~HIA~HIA~

精华

上面也参考了不少当下比较知名的框架或者日志组件。总结了以下几个可能是优化点的地方，当然本文并没有深入探讨性能数据，只是参考了各家的思路。

使用标识位。只有在需要用上行号和文件名的场景才改变标记位，然后调用runtime.Caller或者runtimr.Frame。
日志分级，或者logger实例分级(sugar&desugar)。不同级别的日志信息或者logger实例，拥有详细度不同的信息，性能也不一样。
使用资源池复用logger实例。
如果logger实例使用了mutex锁，那么在调用runtime.Caller或者runtimr.Frame之前先将锁释放，调用完毕后再锁上。
将一些不是普遍适用的逻辑，使用Hook的思想，由特殊事件勾起，减少不必要的开销。

参考

logrus没有行号和文件名的issue https://github.com/sirupsen/logrus/issues/63
https://golang.org/pkg/runtime
awesome-go项目 https://github.com/avelino/awesome-go#logging
https://blog.scalyr.com/2018/06/go-logging/
https://talks.godoc.org/github.com/davecheney/go-1.9-release-party/presentation.slide#20

原创声明：本文系作者授权腾讯云开发者社区发表，未经许可，不得转载。

如有侵权，请联系 cloudcommunity@tencent.com 删除。

github

原创声明：本文系作者授权腾讯云开发者社区发表，未经许可，不得转载。

如有侵权，请联系 cloudcommunity@tencent.com 删除。

登录后参与评论

0 条评论

热度