Tech Decode | Analyzing and Locating Memory Problems

Tencent Cloud Audio & Video (腾讯云音视频)

Published 2021-04-29 14:10:48

Included in the column: 音视频咖

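In this issue of Tech Decode, we look at

how to analyze and locate memory problems in programming.

From a language-design perspective, memory management falls into two broad categories: manual memory management and garbage collection (GC). C and C++ use manual memory management, for example, while Java uses garbage collection. This article focuses on manual memory management.

GC automates memory management at the cost of unpredictable stalls during collection, which makes it a poor fit for scenarios with strict real-time requirements. Modern GC implementations keep being optimized to reduce the impact of "stop-the-world" pauses.

A language with GC does not eliminate memory leaks altogether (the term means something slightly different here than in manually managed languages). When a long-lived object keeps holding a reference to a short-lived object, the short-lived object is no longer used but can never be collected, and that is a leak. This kind of leak is common in Android application development; be especially careful with anonymous inner classes. LeakCanary (https://square.github.io/leakcanary/) can be used to detect such leaks.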

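Manual memory management

With manual memory management, reference counting is a common way to avoid memory leaks. In fact, reference counting addresses two major problems:

Memory leak
Double free

Reference counting has one drawback: it cannot handle reference cycles. When a cycle exists, the count always stays above 0 and the objects are never released.

If the cycles can be identified while writing the code, weak references solve the problem. C++11 introduced std::shared_ptr and std::weak_ptr. The shared_ptr and weak_ptr instances that refer to the same object share one control block, which records how many shared_ptr and weak_ptr instances exist. Once every shared_ptr has gone out of scope, the managed object is destroyed, but the memory of the control block itself is not released, because it still has to track how many weak_ptr instances remain. Only after every weak_ptr has been destroyed as well is the control block released. One detail deserves attention: with std::make_shared(), the managed object and the control block may be allocated together in a single larger block. If a weak_ptr is still alive after all shared_ptr instances are gone, it still needs the control block; because the object and the control block share one allocation, the object can only be destroyed, and its memory cannot be returned yet. See the appendix "std::make_shared and std::weak_ptr" for details.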
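The difference between a strong and a weak back-reference can be made visible with a small experiment. This is a minimal sketch; StrongNode, WeakNode, and live_nodes_after_cycle are hypothetical names introduced only for illustration.

```cpp
#include <memory>

// Number of node objects currently alive (demo bookkeeping only).
static int g_live_nodes = 0;

struct StrongNode {
    std::shared_ptr<StrongNode> peer;  // strong reference: can form a cycle
    StrongNode() { ++g_live_nodes; }
    ~StrongNode() { --g_live_nodes; }
};

struct WeakNode {
    std::weak_ptr<WeakNode> peer;  // weak reference: does not keep the peer alive
    WeakNode() { ++g_live_nodes; }
    ~WeakNode() { --g_live_nodes; }
};

// Builds a two-node cycle, drops the local handles, and returns how many
// nodes are still alive at that point.
template <class Node>
int live_nodes_after_cycle() {
    g_live_nodes = 0;
    std::weak_ptr<Node> observer;
    {
        auto a = std::make_shared<Node>();
        auto b = std::make_shared<Node>();
        a->peer = b;
        b->peer = a;
        observer = a;
    }
    const int alive = g_live_nodes;
    // Break the cycle by hand so that the demo itself does not leak.
    if (auto a = observer.lock()) {
        a->peer.reset();
    }
    return alive;
}
```

Calling live_nodes_after_cycle<StrongNode>() reports that both nodes survive the loss of every external handle, while live_nodes_after_cycle<WeakNode>() reports none.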

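Identifying reference cycles, much like identifying the lock-order inversion behind a deadlock, can be very hard. Some cycles are well hidden, especially when different modules are written by different developers and callers and callees reference each other in complicated ways. GC can rely on reachability analysis and thereby solves the cycle problem completely.

Both C and C++ can use reference counting, but only the combination of reference counting and RAII (Resource Acquisition Is Initialization) makes manual memory management nearly as convenient as GC. In C, functions such as hold and release must be called by hand to adjust the count and free the memory. Missing a single release on some code path, typically an error-handling path, causes a leak. RAII adjusts the count automatically in constructors and destructors, so the count stays correct even when an exception is thrown.

RAII can also be used on its own, including for non-memory resources such as file descriptors. One drawback of GC is that it cannot release non-memory resources promptly and automatically: a Java finalizer is not equivalent to a C++ destructor. A finalizer can serve as a last-resort safety net, but it should not be the first choice for closing a file descriptor.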
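As an illustration of RAII for a non-memory resource, here is a minimal sketch. fake_open and fake_close are stand-ins for a real acquire/release pair such as open()/close(); they and ScopedHandle are assumptions made for the example, not an existing API.

```cpp
#include <stdexcept>

// Stand-ins for a real acquire/release pair; they only count, so that the
// effect of RAII is observable.
static int g_open_handles = 0;
int fake_open() { return ++g_open_handles; }
void fake_close(int /*handle*/) { --g_open_handles; }

// Owns one handle and releases it in the destructor.
class ScopedHandle {
public:
    explicit ScopedHandle(int handle) : handle_(handle) {}
    ~ScopedHandle() { fake_close(handle_); }
    ScopedHandle(const ScopedHandle&) = delete;
    ScopedHandle& operator=(const ScopedHandle&) = delete;
    int get() const { return handle_; }

private:
    int handle_;
};

// Every exit path, including the exceptional one, releases the handle.
bool process(bool fail) {
    ScopedHandle handle(fake_open());
    if (fail) {
        throw std::runtime_error("simulated error");
    }
    return handle.get() > 0;
}

// Runs process() and returns the number of handles still open afterwards.
int open_handles_after(bool fail) {
    try {
        process(fail);
    } catch (const std::runtime_error&) {
        // The handle has already been released during stack unwinding.
    }
    return g_open_handles;
}
```

The C equivalent would need an explicit release before every return, which is exactly where error paths tend to be forgotten.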

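Rust addresses memory safety mainly through ownership and borrowing rules checked at compile time, combined with RAII; reference counting (Rc, Arc) is used where shared ownership is required. The language design makes simple cyclic-reference scenarios fail to compile, which lowers the chance of creating cycles, but it cannot rule them out entirely.

Common memory problems include:

Memory leak
Null pointer dereference
Use after free (dangling pointer)
Double free
Out-of-bounds access (buffer overflow, index out of range)
- can occur on both the heap and the stack
Stack overflow
Reading uninitialized data
Misaligned memory address (alignment)
- for example, casting a char* to an int* and then dereferencing it may crash
Memory problems related to thread safety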
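For the alignment item in the list above, the usual remedy is to copy the bytes instead of dereferencing a cast pointer. A minimal sketch follows; load_u32 and round_trip_at_odd_offset are hypothetical helper names.

```cpp
#include <cstdint>
#include <cstring>

// Reads a 32-bit value from an arbitrary, possibly misaligned, byte address.
// memcpy has no alignment requirement.
std::uint32_t load_u32(const unsigned char* p) {
    std::uint32_t value;
    std::memcpy(&value, p, sizeof(value));
    return value;
}

// Stores a value at offset 1 of an aligned buffer, i.e. at an address that is
// not 4-byte aligned, and reads it back.
std::uint32_t round_trip_at_odd_offset(std::uint32_t value) {
    alignas(4) unsigned char buf[8] = {};
    std::memcpy(buf + 1, &value, sizeof(value));
    // The risky form would be: *reinterpret_cast<const std::uint32_t*>(buf + 1)
    return load_u32(buf + 1);
}
```

Because the value is written and read with the same byte order, the round trip does not depend on the endianness of the machine.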

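Some common misconceptions:

Does calling a member function through a null pointer always crash? Not necessarily. It is undefined behavior, but in practice, if the function is non-virtual and does not access member variables directly or indirectly, the call usually does not crash. In that case an ordinary member function behaves much like a static member function.
Does calling a member function through a dangling (wild) pointer always crash? Not necessarily. It depends on whether the object's memory has been reallocated or overwritten, whether member variables are accessed, whether the function is virtual, and so on. The call may not crash immediately yet corrupt memory, leading to wrong behavior or a crash later on: a land mine waiting to go off.
Does malloc always return a null pointer when memory runs out? Not necessarily. This involves memory overcommit: https://www.kernel.org/doc/Documentation/vm/overcommit-accounting. C++ new is somewhat more complicated. With exceptions enabled, a failed allocation may throw std::bad_alloc instead of returning a null pointer. new(std::nothrow) makes new return a null pointer instead of throwing, for example: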

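    #include <iostream>
    #include <new>

    void test() {
        // Plain new reports failure by throwing std::bad_alloc.
        try {
            int *p = new int[1ULL << 50U];
            std::cout << p << '\n';
            delete[] p;
        } catch (const std::bad_alloc &e) {
            std::cout << e.what() << '\n';
        }

        // new(std::nothrow) reports failure by returning nullptr.
        int *p = new(std::nothrow) int[1ULL << 50U];
        if (p == nullptr) {
            std::cout << "Allocation returned nullptr\n";
        }
        delete[] p;
    }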

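In addition, free(NULL) and delete nullptr are both safe; whether to check for a non-null pointer before delete is a matter of code style.

Memory errors lead to undefined behavior and can make the program misbehave, most visibly by crashing. Analyzing, locating, and fixing memory bugs from crashes is mending the fold after the sheep are lost; if the bugs are fixed in time during a staged rollout, it is still not too late.

NDK development is an important part of Android application development, especially for apps with audio and video features. When Java or Kotlin code crashes on an exception, there is usually a clear backtrace, so reconstructing the crash scene is relatively easy. Faced with a native crash and a pile of register values from the crash reporting system, however, some developers may not know where to start. The following uses Android as an example to outline the tools and methods for analyzing native crashes.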

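Classifying native crash scenes

When an Android app crashes natively, in a minority of cases an unrecoverable error occurred and the process aborted on purpose (SIGABRT); in most cases the cause is a memory problem.

When abort() is called deliberately, the reason is usually reported. For example, Android's LOG_FATAL() prints the log message first and then calls abort(). Apps rarely call LOG_FATAL() themselves, but the Android system occasionally does so when it hits an abnormal condition. If the crash reporting system captures complete logs from the crash scene, finding the cause from the logs is fairly easy.

Crashes caused by different memory problems look different, for example:

SIGSEGV: segmentation violation
- an invalid memory address was accessed; it may be a null pointer, a null pointer plus a small offset, or an arbitrary value
SIGILL: illegal instruction
- in a minority of cases caused by the CPU architecture, the toolchain build configuration, and the like
- in most cases the program has run wild and jumped to an address that does not hold valid code
SIGBUS: bus error, for example a misaligned memory address
Out of memory
- the program logic may be correct but simply use too much memory
- or a memory leak may have exhausted memory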

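Crash scene information

A crash reporting system usually reports the following:

Logs
Backtrace (call stack)
Register information
Load addresses of shared libraries

The log may contain both the backtrace and the register information, for example:

Key information in the figure:

Signal: SIGSEGV, meaning an invalid memory reference. There are several kinds of SEGV; the two most common are:
- SEGV_MAPERR: address not mapped to object. The accessed memory is not mapped into the user address space, typically because of a null or dangling pointer
- SEGV_ACCERR: invalid permissions for mapped object. A permission error, such as writing to a read-only memory region. For the others, see asm-generic/siginfo.h
Fault address: the memory address whose access was invalid
Thread ID and thread name: tid 13876 (network_thread)
- There is no uniform API for setting the thread name across platforms, not even among Unix-like systems. For example, this is how GLib (the library underlying GStreamer) sets the thread name: https://gitlab.gnome.org/GNOME/glib/-/blob/master/glib/gthread-posix.c#L1382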

    void
    g_system_thread_set_name (const gchar *name)
    {
    #if defined(HAVE_PTHREAD_SETNAME_NP_WITHOUT_TID)
      pthread_setname_np (name); /* on OS X and iOS */
    #elif defined(HAVE_PTHREAD_SETNAME_NP_WITH_TID)
      pthread_setname_np (pthread_self (), name); /* on Linux and Solaris */
    #elif defined(HAVE_PTHREAD_SETNAME_NP_WITH_TID_AND_ARG)
      pthread_setname_np (pthread_self (), "%s", (gchar *) name); /* on NetBSD */
    #elif defined(HAVE_PTHREAD_SET_NAME_NP)
      pthread_set_name_np (pthread_self (), name); /* on FreeBSD, DragonFlyBSD, OpenBSD */
    #endif
    }

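Registers and backtrace
- Note that, apart from the pc of frame #00, the other values in the figure above are not raw pc register values: the load base address of the shared library has already been subtracted, so they are relative addresses inside the library

Different kinds of information call for different tools:

addr2line, ndk-stack, and similar tools locate the source line from the backtrace
Disassembling the shared library with objdump and combining the result with the pc address and the register values reveals the assembly instruction that crashed and the values of its operands
The relative address inside a shared library can be computed from the pc register value and the library's load address. The backtrace printed by logcat already contains converted addresses, so manual conversion is usually unnecessary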
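When a crash report contains only raw pc values, the conversion mentioned in the last item is simple arithmetic. The sketch below assumes that each mapping is described by its start address, end address, and file offset (as in /proc/<pid>/maps) and that file offsets coincide with the addresses used inside the library, which holds for typical shared libraries linked at base address 0 but is not guaranteed. Mapping, RelativePc, and to_relative are hypothetical names.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// One mapping of a loaded library; the field names are illustrative.
struct Mapping {
    std::uint64_t start;        // first address of the mapping
    std::uint64_t end;          // one past the last address
    std::uint64_t file_offset;  // offset of the mapping inside the file
    std::string path;
};

struct RelativePc {
    std::string path;
    std::uint64_t offset = 0;
};

// Finds the mapping that contains pc and converts pc to a library-relative
// address. Returns false if no mapping contains pc.
bool to_relative(const std::vector<Mapping>& maps, std::uint64_t pc, RelativePc* out) {
    for (const Mapping& m : maps) {
        if (pc >= m.start && pc < m.end) {
            out->path = m.path;
            out->offset = pc - m.start + m.file_offset;
            return true;
        }
    }
    return false;
}
```

With a library mapped at a made-up base of 0x7000000000, a pc of 0x7000091c30 converts to 0x91c30, the form of address that addr2line and objdump expect.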

Basic analysis workflow

Step 0: keep the shared libraries with symbols at build time

If the libraries with symbols were not saved during the build and are regenerated only after a crash occurs, the regenerated libraries may not match the released version. This step is essential for restoring the call stack accurately. Analyzing and debugging without symbols is possible, but considerably more complex.

Also pay special attention to whether unwind tables have been turned off. For example, a release build of webrtc disables unwind tables, which leaves the backtrace incomplete when a crash occurs.

build/config/compiler/BUILD.gn:

    if (!is_nacl) {
      if (exclude_unwind_tables) {
        cflags += [
          "-fno-unwind-tables",
          "-fno-asynchronous-unwind-tables",
        ]
        defines += [ "NO_UNWIND_TABLES" ]
      } else {
        cflags += [ "-funwind-tables" ]
      }
    }

compiler.gni explains why webrtc disables unwind tables:

    # Exclude unwind tables for official builds as unwinding can be done from stack
    # dumps produced by Crashpad at a later time "offline" in the crash server.
    # For unofficial (e.g. development) builds and non-Chrome branded (e.g. Cronet
    # which doesn't use Crashpad, crbug.com/479283) builds it's useful to be able
    # to unwind at runtime.
    exclude_unwind_tables =
        is_official_build || (is_chromecast && !is_cast_desktop_build &&
                              !is_debug && !cast_is_debug && !is_fuchsia)

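Some notes on symbols:

Symbols include debug symbols, function symbols, and so on. The strip command has options that control the strip level: remove only the debug symbols, or remove every symbol that is not needed
The compiler optimization level and the presence of debug symbols are orthogonal: -g is not limited to -O0/-Og
CMake offers several values for CMAKE_BUILD_TYPE:
- Debug
- Release
- RelWithDebInfo
- MinSizeRel
When building an Android shared library with cmake alone, outside Android Studio, RelWithDebInfo produces a release library with symbols, which can then be stripped
To avoid symbol conflicts, several approaches are available, and they can be combined:
- Build-level control, for example in cmake:

    set(CMAKE_C_VISIBILITY_PRESET hidden)
    set(CMAKE_CXX_VISIBILITY_PRESET hidden)

  or pass the flag directly: -fvisibility=hidden
- Register JNI functions explicitly: https://developer.android.com/training/articles/perf-jni
- Use a linker version script: https://ftp.gnu.org/old-gnu/Manuals/ld-2.9.1/html_node/ld_25.html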
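When the default visibility is hidden, the functions that form the public interface have to be exported explicitly. A minimal sketch, assuming GCC or Clang; DEMO_EXPORT and demo_api_double are made-up names for illustration.

```cpp
// Export macro: expands to the visibility attribute on GCC/Clang and to
// nothing elsewhere.
#if defined(__GNUC__) || defined(__clang__)
#define DEMO_EXPORT __attribute__((visibility("default")))
#else
#define DEMO_EXPORT
#endif

namespace {
// Internal linkage: this helper never appears in the dynamic symbol table.
int scale(int x) { return x * 2; }
}  // namespace

// Part of the public interface: stays visible even when the library is built
// with -fvisibility=hidden.
extern "C" DEMO_EXPORT int demo_api_double(int x) {
    return scale(x);
}
```

Everything that is not marked this way stays out of the dynamic symbol table, which avoids clashes with identically named symbols in other libraries.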

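Restoring the call stack

The first step is usually to restore the call stack with addr2line. The NDK provides ndk-stack, a simplified tool that takes the log as input and prints the restored call stack. For example:

    adb logcat > /tmp/foo.txt
    $NDK/ndk-stack -sym $PROJECT_PATH/obj/local/armeabi-v7a -dump /tmp/foo.txt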

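Commonly used addr2line options (https://linux.die.net/man/1/addr2line):

-e filename Specify the name of the executable for which addresses should be translated. The default file is a.out.
-C Decode (demangle) low-level symbol names into user-level names. Without this option the output shows mangled C++ symbols, which are hard to read; after demangling, the class and function names match the source code. There is also a dedicated demangling tool, c++filt. For example:

    $ c++filt -n _Z1fv
    f()

-f Display function names as well as file and line number information. Note that -f works on both unstripped and stripped libraries, but depending on the strip level, the code of several functions may be attributed to a single symbol, so -f does not always yield the correct symbol name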
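Demangling is also available programmatically, which is handy when post-processing crash reports. The sketch below relies on abi::__cxa_demangle from <cxxabi.h>, which is provided by GCC's and Clang's C++ runtimes; the wrapper name demangle is an assumption.

```cpp
#include <cxxabi.h>

#include <cstdlib>
#include <string>

// Demangles an Itanium C++ ABI symbol name. Returns the input unchanged when
// demangling fails.
std::string demangle(const std::string& mangled) {
    int status = 0;
    char* out = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0 || out == nullptr) {
        return mangled;
    }
    std::string result(out);
    std::free(out);  // the buffer is allocated with malloc() by __cxa_demangle
    return result;
}
```

For example, demangle("_Z1fv") yields f(), the same result as the c++filt call above.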

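Once the call stack is restored and combined with the logs, the cause of some crashes is immediately clear, such as a null object pointer. Others remain unexplained, or look like an "impossible crash", and need further analysis.

Disassembly

After addr2line has located the source line, that one line may contain several dereferences or several conditional checks, so it is still unclear which operation triggered the crash (which is also a reminder not to write overly complicated lines of code). Disassembly reveals the exact assembly instruction that crashed.

objdump can be used for disassembly. It has many features; the options commonly used for disassembly are (https://linux.die.net/man/1/objdump):

-d Display the assembler mnemonics for the machine instructions from objfile.
-D Like -d, but disassemble the contents of all sections, not just those expected to contain instructions.
-C Same as the -C option of addr2line: demangle symbol names

For example:

    aarch64-linux-android-objdump -D -C libvlc.so > dump

Search the objdump output for the pc address. For the earlier example, which crashed at 0x91c30: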

    0000000000091c04 <cricket::Connection::ToString() const>:
      91c04:  a9bb7bfd   stp  x29, x30, [sp,#-80]!
      91c08:  f9000bfc   str  x28, [sp,#16]
      91c0c:  a9025ff8   stp  x24, x23, [sp,#32]
      91c10:  a90357f6   stp  x22, x21, [sp,#48]
      91c14:  a9044ff4   stp  x20, x19, [sp,#64]
      91c18:  910003fd   mov  x29, sp
      91c1c:  d104c3ff   sub  sp, sp, #0x130
      91c20:  f9400009   ldr  x9, [x0]
      91c24:  aa0003f4   mov  x20, x0
      91c28:  aa0803f3   mov  x19, x8
      91c2c:  f9400929   ldr  x9, [x9,#16]
      91c30:  d63f0120   blr  x9

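The crash happens at blr x9. The register values at the time of the crash, the fault address, and so on can all be matched against this listing.

Reading and analyzing the disassembly requires some basic knowledge of assembly:

For the ARM instructions, see the Arm Architecture Reference Manual, and take care to distinguish the instruction sets of the different ARM architectures. When building a shared library for armeabi-v7a, Thumb-2 is enabled by default. Thumb-2 mixes 16-bit and 32-bit encodings, which makes the generated library smaller. In objdump output, ARM instructions always advance pc by 4 bytes, whereas Thumb-2 code also contains instructions that advance it by 2 bytes
Besides the instructions themselves, you need to know the ARM ABI, that is, how arguments and return values are passed in C and C++. For the ARM ABI, see https://developer.arm.com/architectures/system-architectures/software-standards/abi.
- Note that although Apple also uses the ARM architecture, Apple's ABI differs from the one defined by ARM. For Apple's ABI, see https://developer.apple.com/documentation/xcode/writing_arm64_code_for_apple_platforms

Because this piece of assembly is simple, it is already possible to conclude that the Connection object is a dangling pointer: a wrong virtual function address was read from its virtual table, and the crash occurred when that address was called.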
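To connect the listing with source code, the sketch below shows the kind of C++ that typically compiles to such a sequence on AArch64: load the vtable pointer from the object (ldr x9, [x0]), load one slot from the vtable (ldr x9, [x9,#16]), then branch to it (blr x9). The classes are hypothetical stand-ins, not the WebRTC definitions. Under the Itanium C++ ABI a virtual destructor usually occupies the first two vtable slots, so the first virtual function declared after it would sit at offset 16; the real layout depends on the compiler and on the actual class.

```cpp
#include <string>

struct ConnectionLike {
    virtual ~ConnectionLike() = default;
    virtual std::string ToString() const { return "ConnectionLike"; }
};

struct DerivedConnection : ConnectionLike {
    std::string ToString() const override { return "DerivedConnection"; }
};

// The call is dispatched through the vtable of *c. If c dangles and its
// memory has been reused, the loaded "function address" is garbage and the
// indirect branch crashes or jumps somewhere unintended.
std::string describe(const ConnectionLike* c) {
    return c->ToString();
}
```

With a valid object, describe() simply returns the string of the dynamic type. The crash in the listing corresponds to the same dispatch performed on memory that no longer holds a valid object.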

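Sometimes the code is complex: the crashing instruction can be identified, but it is not obvious how it maps to the C++ code. A debugger helps to analyze the situation and verify hypotheses.

Debugging with a debugger

Ordinary debugging only needs source-level single-stepping, but for crash analysis stepping by instruction is more convenient (stepi in gdb, si in lldb). Instruction stepping combined with printing register values quickly reveals how the assembly instructions map to the C++ code. For example, debugging can confirm which virtual function's address x9 holds.

[Figure: single-instruction stepping in the debugger]

Root cause analysis

Restoring the call stack, disassembling, and verifying with a debugger clarify the crash scene and reveal the direct cause of the crash. The root cause, however, may still be hidden. For example, an abnormal virtual function address loaded from the virtual table implies that the Connection object is corrupted, but the problem does not necessarily lie in Connection itself. Widen the search instead of staring at the crashing line. Consider the following directions:

Null pointer: a non-virtual function crashes when it accesses a member variable
Dangling pointer
- a non-virtual function crashes when it accesses a member variable
- the crash occurs while resolving a virtual function
- a member function of a member, or a member function of a base-class member
- the memory was corrupted through some other dangling pointer
ABI compatibility problems
- headers and libraries do not match, causing out-of-bounds access or garbled program logic

A debugger is also very useful for root cause analysis. Both gdb and lldb support watchpoints, which help to analyze out-of-bounds reads and writes: the program is paused whenever certain addresses are read or written. For basic usage, see the GDB to LLDB command map (https://lldb.llvm.org/use/map.html).

For how watchpoints are implemented, see "Hardware Breakpoint (or watchpoint) usage in Linux Kernel".

For debugging NDK code in Android Studio, see https://developer.android.com/studio/debug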

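Locating and fixing memory problems through crash analysis is a remedy after the fact; during development we should plan ahead. Several tools make it easy to check for memory problems, and combined with continuous integration they effectively reduce crashes and improve software quality.

Basic techniques

Some basic techniques can be used to check for memory leaks.

top/htop: watch the program's memory usage and its trend; this can reveal large leaks
malloc hook
- collect statistics on memory usage from inside the program
- available on both Android and Linux: https://android.googlesource.com/platform/bionic/+/master/libc/malloc_hooks/README.md and https://man7.org/linux/man-pages/man3/malloc_hook.3.html. Note that the malloc hook mechanism has serious thread-safety problems
Wrap your own memory management functions and add a debug switch that records allocations and frees. For example, mpv uses ta ("Tree Allocator"): https://github.com/mpv-player/mpv/tree/master/ta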

TA ("Tree Allocator") is a wrapper around malloc() and related functions, adding features like automatically freeing sub-trees of memory allocations if a parent allocation is freed.
Generally, the idea is that every TA allocation can have a parent (indicated by the ta_parent argument in allocation function calls). If a parent is freed, its child allocations are automatically freed as well. It is also allowed to free a child before the parent, or to move a child to another parent with ta_set_parent().
It also provides a bunch of convenience macros and debugging facilities.
The TA functions are documented in the implementation files (ta.c, ta_utils.c).
TA is intended to be useable as library independent from mpv. It doesn't depend on anything mpv specific.

Simple instance counting for specific classes
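Per-class counting can be added with very little code. The following is a minimal sketch using a CRTP mixin; InstanceCounter and Packet are hypothetical names, and a production version would probably compile the counter out of release builds.

```cpp
#include <atomic>

// CRTP mixin: every class deriving from InstanceCounter<T> gets its own
// counter of live instances.
template <class T>
class InstanceCounter {
public:
    static int live() { return count().load(std::memory_order_relaxed); }

protected:
    InstanceCounter() { ++count(); }
    InstanceCounter(const InstanceCounter&) { ++count(); }
    InstanceCounter& operator=(const InstanceCounter&) = default;
    ~InstanceCounter() { --count(); }

private:
    static std::atomic<int>& count() {
        static std::atomic<int> value{0};
        return value;
    }
};

// Class under observation.
class Packet : private InstanceCounter<Packet> {
public:
    using InstanceCounter<Packet>::live;
};
```

Logging Packet::live() periodically, or checking it at shutdown, shows whether instances of that class keep accumulating.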

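Valgrind

Introduction

Valgrind is a widely used memory checking tool on Linux. The application is launched under Valgrind, which acts like a virtual machine and tracks its memory allocations, frees, and other operations. The Valgrind suite contains several tools, the most commonly used being memcheck. memcheck detects the following problems:

Use of uninitialized memory
Reading/writing memory after it has been free'd
Reading/writing off the end of malloc'd blocks
Memory leaks

The advantage of Valgrind memcheck is that the application does not need to be recompiled. The drawbacks are that it is slow, 20 to 30 times slower than a normal run, and that it uses more memory. Valgrind is therefore unsuitable for strongly real-time applications such as media players.

In addition, massif is a heap profiler that quantifies the memory usage of each module, so that memory optimization can be targeted.

For the full documentation, see http://valgrind.org/docs/manual/index.html

Early Android versions supported Valgrind, and the AOSP source tree included a Valgrind build. From Android 8.0 onward, however, Valgrind is basically unable to run. Running Valgrind also requires root, so it is hard to find an Android device on which it works. The basic procedure for using Valgrind on Android is outlined below.

Download and build

Download the latest release archive from http://valgrind.org/downloads/current.html
To build for Android, mainly follow README.android in the archive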

    export TOOLCHAINS_PATH=$ANDROID_NDK/toolchains/arm-linux-androideabi-4.9/prebuilt/darwin-x86_64/bin
    export AR=$TOOLCHAINS_PATH/arm-linux-androideabi-ar
    export LD=$TOOLCHAINS_PATH/arm-linux-androideabi-ld
    export CC=$TOOLCHAINS_PATH/arm-linux-androideabi-gcc
    export RANLIB=$TOOLCHAINS_PATH/arm-linux-androideabi-ranlib
    export CPPFLAGS="--sysroot=$ANDROID_NDK/platforms/android-9/arch-arm"
    export CFLAGS="--sysroot=$ANDROID_NDK/platforms/android-9/arch-arm"
    ./configure --prefix=/data/local/tmp/Inst \
        --host=armv7-unknown-linux --target=armv7-unknown-linux \
        --with-tmpdir=/sdcard
    make -j8 && make -j8 install DESTDIR="$PWD/Inst"

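Install Valgrind

First make sure the target device has root access; otherwise the application cannot be started through Valgrind
adb push to the device. Note: the installation directory on the device must be identical to the directory given by --prefix during cross-compilation

    adb push Inst/data/local/tmp/Inst/ /data/local/tmp/

Prepare the application

During memory checking Valgrind can report the offending source line and call stack, provided that the application contains debug symbols

Start the application

Create the Valgrind log output directory: adb shell mkdir /sdcard/valgrind/
adb push start_valgrind.sh to /data/local/tmp/ and make it executable. The content of start_valgrind.sh is as follows (change the package name PACKAGE to match your app)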

    #!/system/bin/sh
    PACKAGE="com.example.helloworld"
    # Callgrind tool
    #VGPARAMS='-v --error-limit=no --trace-children=yes --log-file=/sdcard/valgrind.log.%p --tool=callgrind --callgrind-out-file=/sdcard/callgrind.out.%p'
    # Memcheck tool
    #VGPARAMS='-v --error-limit=no --trace-children=yes --tool=memcheck --leak-check=full --show-reachable=yes'
    VGPARAMS='-v --error-limit=no --trace-children=yes --track-origins=yes --log-file=/sdcard/valgrind/log.%p --tool=memcheck --leak-check=full --show-reachable=yes'
    export TMPDIR=/data/data/$PACKAGE
    exec /data/local/tmp/Inst/bin/valgrind $VGPARAMS $*

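Run on the device:

    PACKAGE="com.example.helloworld"
    setprop "wrap.$PACKAGE" 'logwrapper /data/local/tmp/start_valgrind.sh'
    getprop "wrap.$PACKAGE" # make sure setprop succeeded
    am force-stop $PACKAGE
    am start -a android.intent.action.MAIN -n $PACKAGE/.MainActivity

Once setprop has taken effect, the app can be started either with am start or from the UI. On launch, start_valgrind.sh runs first; it executes Valgrind, which in turn starts the application under test. Wait patiently for the application to start, then exercise it as usual.

Output

While the program runs, Valgrind writes part of its findings (uninitialized reads, out-of-bounds accesses, and so on) to /sdcard/valgrind/. The memory leak summary, however, is produced only after the program has exited completely.

On Android, kill -TERM can be used to make the program exit. Avoid kill -KILL: it terminates the program immediately, and Valgrind cannot write its results.

Checking a demo program on Linux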

    #include <iostream>

    int main(int argc, char *argv[])
    {
        char *p = new char[2];

        p[2] = 123;

        if (p[0] > 'a')
            std::cout << __LINE__ << std::endl;
        else
            std::cout << __LINE__ << std::endl;
        return 0;
    }

Valgrind prints the following:

    ==1157== Memcheck, a memory error detector
    ==1157== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
    ==1157== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
    ==1157== Command: ./foo
    ==1157==
    ==1157== Invalid write of size 1
    ==1157==    at 0x40088B: main (foo.cpp:7)
    ==1157==  Address 0x5ab6c82 is 0 bytes after a block of size 2 alloc'd
    ==1157==    at 0x4C2E80F: operator new[](unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
    ==1157==    by 0x40087E: main (foo.cpp:5)
    ==1157==
    ==1157== Conditional jump or move depends on uninitialised value(s)
    ==1157==    at 0x400897: main (foo.cpp:9)
    ==1157==  Uninitialised value was created by a heap allocation
    ==1157==    at 0x4C2E80F: operator new[](unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
    ==1157==    by 0x40087E: main (foo.cpp:5)
    ==1157==
    12
    ==1157==
    ==1157== HEAP SUMMARY:
    ==1157==     in use at exit: 72,706 bytes in 2 blocks
    ==1157==   total heap usage: 3 allocs, 1 frees, 73,730 bytes allocated
    ==1157==
    ==1157== 2 bytes in 1 blocks are definitely lost in loss record 1 of 2
    ==1157==    at 0x4C2E80F: operator new[](unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
    ==1157==    by 0x40087E: main (foo.cpp:5)
    ==1157==
    ==1157== LEAK SUMMARY:
    ==1157==    definitely lost: 2 bytes in 1 blocks
    ==1157==    indirectly lost: 0 bytes in 0 blocks
    ==1157==      possibly lost: 0 bytes in 0 blocks
    ==1157==    still reachable: 72,704 bytes in 1 blocks
    ==1157==         suppressed: 0 bytes in 0 blocks
    ==1157== Reachable blocks (those to which a pointer was found) are not shown.
    ==1157== To see them, rerun with: --leak-check=full --show-leak-kinds=all
    ==1157==
    ==1157== For counts of detected and suppressed errors, rerun with: -v
    ==1157== ERROR SUMMARY: 3 errors from 3 contexts (suppressed: 0 from 0)

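All three bugs are located: the out-of-bounds access to p[2], the uninitialized p[0], and the memory leak.

Sanitizers

Introduction

The Sanitizers project includes the following tools: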

https://github.com/google/sanitizers

AddressSanitizer (detects addressability issues) and LeakSanitizer (detects memory leaks)
ThreadSanitizer (detects data races and deadlocks) for C++ and Go
MemorySanitizer (detects use of uninitialized memory)
HWASAN, or Hardware-assisted AddressSanitizer, a newer variant of AddressSanitizer that consumes much less memory
UBSan, or UndefinedBehaviorSanitizer

AddressSanitizer

For the implementation principles, see "AddressSanitizer: A Fast Address Sanity Checker".

Platform support:

Features:

Use after free (dangling pointer dereference)
Heap buffer overflow
Stack buffer overflow
Global buffer overflow
Use after return
Use after scope
Initialization order bugs
Memory leaks.

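Note: LeakSanitizer, the memory leak detection feature, currently supports only Linux and macOS, and on macOS it requires a separately installed LLVM toolchain because the one bundled with Xcode does not support it. LeakSanitizer is enabled by default on Linux; on macOS it has to be turned on with an environment variable: ASAN_OPTIONS=detect_leaks=1

Compared with Valgrind, AddressSanitizer has a much smaller impact on execution speed: the program runs at roughly half its normal speed.

AddressSanitizer is supported from LLVM 3.1 and GCC 4.8 onward.

Build options: add -fsanitize=address to cflags, cxxflags, and the link flags. For more readable reports, also set the compiler optimization level, enable debug symbols, skip strip, and so on.

[Figure: enabling ASan in Xcode]

Basic steps for using ASan on Android (https://developer.android.com/ndk/guides/asan):

Modify the build options

    target_compile_options(${TARGET} PUBLIC -fsanitize=address -fno-omit-frame-pointer)
    set_target_properties(${TARGET} PROPERTIES LINK_FLAGS -fsanitize=address)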

Put the ASan runtime libraries (libclang_rt.asan-arm-android.so, libclang_rt.asan-aarch64-android.so, and so on) into the app's native library directory
Add a wrap.sh to the app's native library directory

    #!/system/bin/sh
    HERE="$(cd "$(dirname "$0")" && pwd)"
    export ASAN_OPTIONS=log_to_syslog=false,allow_user_segv_handler=1
    ASAN_LIB=$(ls $HERE/libclang_rt.asan-*-android.so)
    if [ -f "$HERE/libc++_shared.so" ]; then
        # Workaround for https://github.com/android-ndk/ndk/issues/988.
        export LD_PRELOAD="$ASAN_LIB $HERE/libc++_shared.so"
    else
        export LD_PRELOAD="$ASAN_LIB"
    fi
    "$@"

The overall layout is as follows:

    <project root>
    └── app
        └── src
            └── main
                ├── jniLibs
                │   ├── arm64-v8a
                │   │   └── libclang_rt.asan-aarch64-android.so
                │   ├── armeabi-v7a
                │   │   └── libclang_rt.asan-arm-android.so
                │   ├── x86
                │   │   └── libclang_rt.asan-i686-android.so
                │   └── x86_64
                │       └── libclang_rt.asan-x86_64-android.so
                └── resources
                    └── lib
                        ├── arm64-v8a
                        │   └── wrap.sh
                        ├── armeabi-v7a
                        │   └── wrap.sh
                        ├── x86
                        │   └── wrap.sh
                        └── x86_64
                            └── wrap.sh

Build, install, and launch the app. When a memory error occurs, the error report is written to the log

Sample code:

    #include <stdlib.h>
    int main() {
      char *x = (char*)malloc(10 * sizeof(char*));
      free(x);
      return x[5];
    }

Build: /tmp$ gcc -fsanitize=address foo.c -o foo -g -O0

Run:

    /tmp$ ./foo
    =================================================================
    ==6145==ERROR: AddressSanitizer: heap-use-after-free on address 0x60700000dfb5 at pc 0x0000004007d4 bp 0x7ffcf79be420 sp 0x7ffcf79be410
    READ of size 1 at 0x60700000dfb5 thread T0
        #0 0x4007d3 in main /tmp/foo.c:5
        #1 0x7f84f366482f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
        #2 0x4006a8 in _start (/tmp/foo+0x4006a8)
    0x60700000dfb5 is located 5 bytes inside of 80-byte region [0x60700000dfb0,0x60700000e000)
    freed by thread T0 here:
        #0 0x7f84f3aa62ca in __interceptor_free (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x982ca)
        #1 0x400797 in main /tmp/foo.c:4
        #2 0x7f84f366482f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
    previously allocated by thread T0 here:
        #0 0x7f84f3aa6602 in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x98602)
        #1 0x400787 in main /tmp/foo.c:3
        #2 0x7f84f366482f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
    SUMMARY: AddressSanitizer: heap-use-after-free /tmp/foo.c:5 main
    ...

Thread Sanitizer

ThreadSanitizer (TSan) detects data races between threads. It is supported from clang 3.2 and gcc 4.8 onward.

Build options: add -fsanitize=thread to cflags, cxxflags, and the link flags.

Sample code:

    #include <pthread.h>
    #include <stdio.h>

    int Global;

    void *Thread1(void *x) {
      Global++;
      return NULL;
    }

    void *Thread2(void *x) {
      Global--;
      return NULL;
    }

    int main() {
      pthread_t t[2];
      pthread_create(&t[0], NULL, Thread1, NULL);
      pthread_create(&t[1], NULL, Thread2, NULL);
      pthread_join(t[0], NULL);
      pthread_join(t[1], NULL);
    }

Build

    /tmp$ gcc -fsanitize=thread -o foo foo.c -g -O0

Run

    /tmp$ ./foo
    ==================
    WARNING: ThreadSanitizer: data race (pid=6761)
      Read of size 4 at 0x00000060107c by thread T2:
        #0 Thread2 /tmp/foo.c:12 (foo+0x000000400998)
        #1 <null> <null> (libtsan.so.0+0x0000000230d9)
      Previous write of size 4 at 0x00000060107c by thread T1:
        #0 Thread1 /tmp/foo.c:7 (foo+0x00000040095b)
        #1 <null> <null> (libtsan.so.0+0x0000000230d9)
      Location is global 'Global' of size 4 at 0x00000060107c (foo+0x00000060107c)
      Thread T2 (tid=6764, running) created by main thread at:
        #0 pthread_create <null> (libtsan.so.0+0x000000027577)
        #1 main /tmp/foo.c:19 (foo+0x000000400a23)
      Thread T1 (tid=6763, finished) created by main thread at:
        #0 pthread_create <null> (libtsan.so.0+0x000000027577)
        #1 main /tmp/foo.c:18 (foo+0x000000400a04)
    SUMMARY: ThreadSanitizer: data race /tmp/foo.c:12 Thread2
    ==================
    ThreadSanitizer: reported 1 warnings
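One possible fix for the race reported here is to make the shared counter atomic. This is a minimal sketch in C++ rather than the C of the sample; run_without_race is a hypothetical name, and a mutex would work equally well when more than one variable has to be updated together.

```cpp
#include <atomic>
#include <thread>

// Same shape as the racy sample, but the shared counter is atomic, so the
// concurrent increment and decrement are well-defined.
int run_without_race() {
    std::atomic<int> global{0};
    std::thread t1([&global] { global.fetch_add(1, std::memory_order_relaxed); });
    std::thread t2([&global] { global.fetch_sub(1, std::memory_order_relaxed); });
    t1.join();
    t2.join();
    return global.load();
}
```

Built with -fsanitize=thread, a program calling this function should no longer trigger the data race warning, because both accesses are atomic operations.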

Memory Sanitizer

MemorySanitizer (MSan) detects reads of uninitialized memory. It is supported from clang 3.3 onward.

Build option: -fsanitize=memory.

Sample code:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    int main(int argc, char *argv[])
    {
        int n;
        if (n > 0)
            printf("%d\n", n);
        return 0;
    }

Build

    /tmp$ clang -fsanitize=memory -fPIE -pie -fno-omit-frame-pointer -g bar.c -o bar

Run

    /tmp$ ./bar
    ==201433==WARNING: MemorySanitizer: use-of-uninitialized-value
        #0 0x494fc7 in main /tmp/bar.c:7:7
        #1 0x7f9e2ba4b0b2 in __libc_start_main /build/glibc-YbNSs7/glibc-2.31/csu/../csu/libc-start.c:308:16
        #2 0x41c26d in _start (/tmp/bar+0x41c26d)
    SUMMARY: MemorySanitizer: use-of-uninitialized-value /tmp/bar.c:7:7 in main
    Exiting

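Learning from open-source projects: FFmpeg as an example

Multimedia work involves parsing untrusted input, handling I/O errors, hand-written assembly optimizations, and more, so all kinds of memory errors occur. To reduce them, FFmpeg does the following:

Patch review: broadly speaking, code quality is proportional to the number of people taking part in review
FATE testing: FFmpeg Automated Testing Environment
Integrating memory checking tools into the build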

    $ ./configure --help |less
    ...
      --toolchain=NAME         set tool defaults according to NAME
                               (gcc-asan, clang-asan, gcc-msan, clang-msan,
                               gcc-tsan, clang-tsan, gcc-usan, clang-usan,
                               valgrind-massif, valgrind-memcheck,
                               msvc, icl, gcov, llvm-cov, hardened)
    ...

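As shown, simple configure options enable several gcc/clang sanitizers, Valgrind memcheck, and so on. Combined with out-of-tree builds and FATE (FFmpeg Automated Testing Environment), every code change can be built in multiple configurations, with multiple tools checking the unit tests and system tests.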

oss-fuzz https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg

Appendix

std::make_shared and std::weak_ptr

Consider the following code:

    #include <iostream>
    #include <memory>

    class Foo {
    public:
      ~Foo() {
          std::cout << __PRETTY_FUNCTION__ << '\n';
      }

      char buf[256];
    };

    void test() {
      auto a = std::make_shared<Foo>();
      std::weak_ptr<Foo> b(a);
      a = nullptr;
      b.lock();
    }

    int main() {
      test();
      return 0;
    }

First, the size of the memory allocation:

    (lldb) bt
    * thread #1, name = 'foo', stop reason = breakpoint 1.2
      * frame #0: 0x00007ffff7af7260 libc.so.6`__GI___libc_malloc(bytes=272) at malloc.c:3023:1
        frame #1: 0x00007ffff7e60c29 libstdc++.so.6`operator new(unsigned long) + 25
        frame #2: 0x0000000000401d44 foo`__gnu_cxx::new_allocator<std::_Sp_counted_ptr_inplace<Foo, std::allocator<Foo>, (__gnu_cxx::_Lock_policy)2> >::allocate(this=0x00007fffffffdd50, __n=1, (null)=0x0000000000000000) at new_allocator.h:114:27
        frame #3: 0x0000000000401cb4 foo`std::allocator_traits<std::allocator<std::_Sp_counted_ptr_inplace<Foo, std::allocator<Foo>, (__gnu_cxx::_Lock_policy)2> > >::allocate(__a=0x00007fffffffdd50, __n=1)2> >&, unsigned long) at alloc_traits.h:444:20
        frame #4: 0x0000000000401aba foo`std::__allocated_ptr<std::allocator<std::_Sp_counted_ptr_inplace<Foo, std::allocator<Foo>, (__gnu_cxx::_Lock_policy)2> > > std::__allocate_guarded<std::allocator<std::_Sp_counted_ptr_inplace<Foo, std::allocator<Foo>, (__a=0x00007fffffffdd50)2> > >(std::allocator<std::_Sp_counted_ptr_inplace<Foo, std::allocator<Foo>, (__gnu_cxx::_Lock_policy)2> >&) at allocated_ptr.h:97:21
        frame #5: 0x0000000000401950 foo`std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count<Foo, std::allocator<Foo> >(this=0x00007fffffffdea8, __p=0x00007fffffffdea0, __a=_Sp_alloc_shared_tag<std::allocator<Foo> > @ 0x00007fffffffdd68) at shared_ptr_base.h:677:19
        frame #6: 0x00000000004018f0 foo`std::__shared_ptr<Foo, (__gnu_cxx::_Lock_policy)2>::__shared_ptr<std::allocator<Foo> >(this=0x00007fffffffdea0, __tag=_Sp_alloc_shared_tag<std::allocator<Foo> > @ 0x00007fffffffdd98) at shared_ptr_base.h:1344:14
        frame #7: 0x00000000004018a8 foo`std::shared_ptr<Foo>::shared_ptr<std::allocator<Foo> >(this=0x00007fffffffdea0, __tag=_Sp_alloc_shared_tag<std::allocator<Foo> > @ 0x00007fffffffddc8) at shared_ptr.h:359:4
        frame #8: 0x000000000040182b foo`std::shared_ptr<Foo> std::allocate_shared<Foo, std::allocator<Foo> >(__a=0x00007fffffffde40) at shared_ptr.h:701:14
        frame #9: 0x0000000000401494 foo`std::shared_ptr<Foo> std::make_shared<Foo>() at shared_ptr.h:717:14
        frame #10: 0x0000000000401271 foo`test() at foo.cpp:14:12

As shown, std::make_shared<Foo>() requests an allocation of 272 bytes, 16 bytes more than sizeof(Foo).

Next, when the destructor runs:

    (lldb) c
    Process 206788 resuming
    Process 206788 stopped
    * thread #1, name = 'foo', stop reason = breakpoint 3.1
        frame #0: 0x000000000040218c foo`Foo::~Foo(this=0x0000000000418ec0) at foo.cpp:7:17
       4    class Foo {
       5    public:
       6      ~Foo() {
    -> 7          std::cout << __PRETTY_FUNCTION__ << '\n';
       8      }
       9
       10     char buf[256];
    (lldb) bt
    * thread #1, name = 'foo', stop reason = breakpoint 3.1
      * frame #0: 0x000000000040218c foo`Foo::~Foo(this=0x0000000000418ec0) at foo.cpp:7:17
        frame #1: 0x0000000000402179 foo`void __gnu_cxx::new_allocator<Foo>::destroy<Foo>(this=0x0000000000418ec0, __p=0x0000000000418ec0) at new_allocator.h:153:10
        frame #2: 0x0000000000402110 foo`void std::allocator_traits<std::allocator<Foo> >::destroy<Foo>(__a=0x0000000000418ec0, __p=0x0000000000418ec0) at alloc_traits.h:497:8
        frame #3: 0x0000000000401edf foo`std::_Sp_counted_ptr_inplace<Foo, std::allocator<Foo>, (__gnu_cxx::_Lock_policy)2>::_M_dispose(this=0x0000000000418eb0) at shared_ptr_base.h:557:2
        frame #4: 0x00000000004016dc foo`std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release(this=0x0000000000418eb0) at shared_ptr_base.h:155:6
        frame #5: 0x000000000040168a foo`std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count(this=0x00007fffffffddf8) at shared_ptr_base.h:730:11
        frame #6: 0x000000000040252e foo`std::__shared_ptr<Foo, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr(this=0x00007fffffffddf0) at shared_ptr_base.h:1169:31
        frame #7: 0x0000000000402423 foo`std::__shared_ptr<Foo, (__gnu_cxx::_Lock_policy)2>::operator=(this=0x00007fffffffdea0, __r=0x00007fffffffde80)2>&&) at shared_ptr_base.h:1265:2
        frame #8: 0x0000000000401554 foo`std::shared_ptr<Foo>::operator=(this=0x00007fffffffdea0, __r=nullptr) at shared_ptr.h:335:27
        frame #9: 0x0000000000401298 foo`test() at foo.cpp:16:5

The destructor runs at a = nullptr.

Finally, when free is called:

    * thread #1, name = 'foo', stop reason = breakpoint 2.2
        frame #0: 0x00007ffff7af7850 libc.so.6`__GI___libc_free(mem=0x0000000000418eb0) at malloc.c:3087:1
    (lldb) bt
    * thread #1, name = 'foo', stop reason = breakpoint 2.2
      * frame #0: 0x00007ffff7af7850 libc.so.6`__GI___libc_free(mem=0x0000000000418eb0) at malloc.c:3087:1
        frame #1: 0x00000000004022f0 foo`__gnu_cxx::new_allocator<std::_Sp_counted_ptr_inplace<Foo, std::allocator<Foo>, (__gnu_cxx::_Lock_policy)2> >::deallocate(this=0x00007fffffffddb0, __p=0x0000000000418eb0, (null)=1)2>*, unsigned long) at new_allocator.h:128:2
        frame #2: 0x00000000004022c8 foo`std::allocator_traits<std::allocator<std::_Sp_counted_ptr_inplace<Foo, std::allocator<Foo>, (__gnu_cxx::_Lock_policy)2> > >::deallocate(__a=0x00007fffffffddb0, __p=0x0000000000418eb0, __n=1)2> >&, std::_Sp_counted_ptr_inplace<Foo, std::allocator<Foo>, (__gnu_cxx::_Lock_policy)2>*, unsigned long) at alloc_traits.h:470:13
        frame #3: 0x0000000000401c44 foo`std::__allocated_ptr<std::allocator<std::_Sp_counted_ptr_inplace<Foo, std::allocator<Foo>, (__gnu_cxx::_Lock_policy)2> > >::~__allocated_ptr(this=0x00007fffffffdda0) at allocated_ptr.h:73:4
        frame #4: 0x0000000000401f45 foo`std::_Sp_counted_ptr_inplace<Foo, std::allocator<Foo>, (__gnu_cxx::_Lock_policy)2>::_M_destroy(this=0x0000000000418eb0) at shared_ptr_base.h:567:7
        frame #5: 0x00000000004017e9 foo`std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_weak_release(this=0x0000000000418eb0) at shared_ptr_base.h:194:6
        frame #6: 0x000000000040179a foo`std::__weak_count<(__gnu_cxx::_Lock_policy)2>::~__weak_count(this=0x00007fffffffde98) at shared_ptr_base.h:823:11
        frame #7: 0x000000000040175e foo`std::__weak_ptr<Foo, (__gnu_cxx::_Lock_policy)2>::~__weak_ptr(this=0x00007fffffffde90) at shared_ptr_base.h:1596:29
        frame #8: 0x00000000004015e8 foo`std::weak_ptr<Foo>::~weak_ptr(this=0x00007fffffffde90) at shared_ptr_base.h:351:11
        frame #9: 0x00000000004012c4 foo`test() at foo.cpp:18:1

As shown, free() is not called until the weak_ptr goes out of scope. Scenarios that combine make_shared with weak_ptr therefore need careful handling.
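The same effect can be observed without a debugger. The following is a minimal sketch, assuming a standard library in which std::allocate_shared places the object and the control block in one allocation (as libstdc++ does in the traces above). CountingAllocator, Observation, and observe_combined_block are hypothetical names introduced for illustration.

```cpp
#include <cstddef>
#include <memory>
#include <new>

// Bytes currently allocated through CountingAllocator (demo bookkeeping).
static std::size_t g_live_bytes = 0;

// Minimal allocator that counts the bytes it has handed out.
template <class T>
struct CountingAllocator {
    using value_type = T;
    CountingAllocator() = default;
    template <class U>
    CountingAllocator(const CountingAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        g_live_bytes += n * sizeof(T);
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t n) noexcept {
        g_live_bytes -= n * sizeof(T);
        ::operator delete(p);
    }
};

template <class T, class U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) { return true; }
template <class T, class U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) { return false; }

struct Foo {
    char buf[256];
};

struct Observation {
    bool object_destroyed;               // weak_ptr reports the object as gone
    std::size_t bytes_while_weak_alive;  // memory still held at that point
    std::size_t bytes_after_weak_gone;   // memory held once the weak_ptr is gone
};

Observation observe_combined_block() {
    Observation r{};
    auto a = std::allocate_shared<Foo>(CountingAllocator<Foo>{});
    std::weak_ptr<Foo> b(a);
    a.reset();  // destroys Foo, but the combined block stays allocated
    r.object_destroyed = b.expired();
    r.bytes_while_weak_alive = g_live_bytes;
    b.reset();  // last weak reference gone: the block is deallocated
    r.bytes_after_weak_gone = g_live_bytes;
    return r;
}
```

If returning the object's memory promptly matters and weak_ptr instances may outlive the object, constructing with std::shared_ptr<Foo>(new Foo) keeps the object and the control block in separate allocations, at the cost of one extra allocation.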

References

1. Scott Meyers, "Effective Modern C++", 2014.

2. Bjarne Stroustrup, "The C++ Programming Language, 4th Edition", 2013.

3. "Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile", https://developer.arm.com/documentation/ddi0487/latest/.

4. Prasad Krishnan, "Hardware Breakpoint (or watchpoint) usage in Linux Kernel".

5. Konstantin Serebryany, Derek Bruening, et al., "AddressSanitizer: A Fast Address Sanity Checker".