How to Print Thread Stacks for Impala in a CDH Cluster

Fayson
Published 2019-11-11 13:26:44
Collected in the column: Hadoop实操

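About the author: 黄权隆 (Quanlong Huang) is a software engineer at Cloudera and an Apache Impala PMC member and committer. He graduated from the database lab of the networking institute in the Department of Computer Science at Peking University, then worked on the big data infrastructure team at Hulu, where he maintained and extended big data systems with a focus on Impala and HBase. He now works at Cloudera, concentrating on Impala development.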

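The previous article, 《Impala查询卡顿分析案例》 (a case study of a hung Impala query), showed how to print thread stacks for an Impala process. For the JVM part, jstack does the job directly. The C++ part is more tedious: it needs gdb or the breakpad tools, and it also required compiling the source. This article shows how to print the thread stacks of an Impala process in a CDH cluster without compiling anything. Some tools still have to be downloaded the first time, so it is worth setting up the environment on one fixed machine in the cluster and reusing it afterwards.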

1. Generate a Minidump File

Log in to the machine running impalad and find the impalad process ID.

$ ps aux | grep impalad
root      4374  0.0  0.0  12944   972 pts/0    S+   16:49   0:00 grep --color=auto impalad
impala   29645  1.0  3.0 2999416 231972 ?      Sl   16:17   0:20 /opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/lib/impala/sbin-retail/impalad --flagfile=/run/cloudera-scm-agent/process/55-impala-IMPALAD/impala-conf/impalad_flags
impala   29652  0.0  0.1 197888 13556 ?        Sl   16:17   0:00 python2.7 /usr/lib/cmf/agent/build/env/bin/cmf-redactor /usr/lib/cmf/service/impala/impala.sh impalad impalad_flags false
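
The listing also contains the grep command itself and the cmf-redactor wrapper, both of which mention impalad only in their arguments. As a minimal sketch, the helper below (the name impalad_pids is hypothetical, not part of CDH or Impala) keeps only the rows whose executable is impalad. It assumes the standard `ps aux` layout, where column 2 is the PID and column 11 is the start of the command.

```shell
# Hypothetical helper: print the PID of every process whose executable
# is named "impalad", reading `ps aux`-style lines on stdin.
impalad_pids() {
  # Column 2 is the PID; column 11 is the executable path.
  # Match a command that is exactly "impalad" or ends in "/impalad".
  awk '$11 ~ "(^|/)impalad$" { print $2 }'
}
```

Usage: `ps aux | impalad_pids`. For the listing above this leaves only the impalad row.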

Process 29645 above is the impalad process. Send it the SIGUSR1 signal to trigger a minidump:

$ kill -s SIGUSR1 29645

The path of the minidump then appears in /var/log/impalad/impalad.INFO:

Wrote minidump to /var/log/impala-minidumps/impalad/3745e5d7-9281-4548-2fd5b4b1-adc7f7eb.dmp
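
When the log is long, it can help to pull the path out mechanically. The sketch below assumes only the "Wrote minidump to <path>.dmp" wording shown above; the function name latest_minidump_path is hypothetical.

```shell
# Hypothetical helper: print the path from the last
# "Wrote minidump to <path>.dmp" line found on stdin.
latest_minidump_path() {
  # sed keeps only the path of each matching line; tail keeps the newest one.
  sed -n 's/.*Wrote minidump to \(.*\.dmp\).*/\1/p' | tail -n 1
}
```

Usage: `latest_minidump_path < /var/log/impalad/impalad.INFO`.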

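2. Generate Breakpad Symbol Files

2.1 Set Up the Breakpad Tools

The Impala source tree contains a script, bin/dump_breakpad_symbols.py, that generates symbol files in breakpad format. Download the Impala source for the matching version; the releases are listed on Cloudera's GitHub release page: https://github.com/cloudera/Impala/releases

The CDH version in this example is 5.16.2, so download and unpack https://github.com/cloudera/Impala/archive/cdh5.16.2-release.tar.gz (692MB).

Note: the cloudera Impala repo is large (15GB). If you only need the code for one version, there is no need to git clone it.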

wget https://github.com/cloudera/Impala/archive/cdh5.16.2-release.tar.gz
tar zxf cdh5.16.2-release.tar.gz
cd Impala-cdh5.16.2-release

A little environment setup is needed before bin/dump_breakpad_symbols.py can run. Make sure JAVA_HOME points to the correct directory, then run:

# Make sure JAVA_HOME is set and points to the correct directory
$ export JAVA_HOME=/usr/java/jdk1.8.0_162-cloudera
$ source bin/impala-config.sh

# Users in mainland China can use Alibaba Cloud's PyPI mirror
$ export PYPI_MIRROR="http://mirrors.aliyun.com/pypi"
$ $IMPALA_HOME/infra/python/deps/download_requirements

Next, initialize breakpad in the toolchain with bin/bootstrap_toolchain.py. By default this script downloads the entire toolchain, which takes a long time. Only the breakpad part is needed, so modify bin/bootstrap_toolchain.py as follows:

   # LLVM and Kudu are the largest packages. Sort them first so that
   # their download starts as soon as possible.
-  packages = map(Package, ["llvm", "kudu",
-      "avro", "binutils", "boost", "breakpad", "bzip2", "cmake", "crcutil",
-      "flatbuffers", "gcc", "gflags", "glog", "gperftools", "gtest", "libev",
-      "lz4", "openldap", "openssl", "protobuf",
-      "rapidjson", "re2", "snappy", "thrift", "tpc-h", "tpc-ds", "zlib"])
-  packages.insert(0, Package("llvm", "3.9.1-asserts"))
+  packages = map(Package, ["breakpad"])
   bootstrap(toolchain_root, packages)

That is, in the last part of bootstrap_toolchain.py, remove all the other packages and keep only breakpad. Then run the script:

$ bin/bootstrap_toolchain.py
INFO:bootstrap_virtualenv:Creating python virtualenv
INFO:bootstrap_virtualenv:Installing packages into the virtualenv
INFO:bootstrap_virtualenv:Installing stage 2 packages into the virtualenv
2019-11-10 01:31:23,683 Thread-3 INFO: Downloading https://native-toolchain.s3.amazonaws.com/build/257-0847514126/breakpad/97a98836768f8f0154f8f86e5e14c2bb7e74132e-p2-gcc-4.9.2/breakpad-97a98836768f8f0154f8f86e5e14c2bb7e74132e-p2-gcc-4.9.2-ec2-package-ubuntu-16-04.tar.gz to /root/Impala-cdh5.16.2-release/toolchain/breakpad-97a98836768f8f0154f8f86e5e14c2bb7e74132e-p2-gcc-4.9.2-ec2-package-ubuntu-16-04.tar.gz (attempt 1)
2019-11-10 01:31:24,452 Thread-3 INFO: Extracting breakpad-97a98836768f8f0154f8f86e5e14c2bb7e74132e-p2-gcc-4.9.2-ec2-package-ubuntu-16-04.tar.gz

2.2 Generate the Symbol Files

2.2.1 Using the Executable from the Local Parcel

Now dump_breakpad_symbols.py can be used. The earlier ps output showed that the impalad executable is /opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/lib/impala/sbin-retail/impalad. Generate symbol files for it and put them under /tmp/syms:

$ bin/dump_breakpad_symbols.py -f /opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/lib/impala/sbin-retail/impalad -d /tmp/syms
INFO:root:Processing binary file: /opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.
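
2.2.2 Using the Executable from the deb Package

Symbol files generated as in 2.2.1 carry no file names or line numbers. To map the stacks to the source code as closely as possible, download the rpm/deb packages for your system and extract the symbols from them. The packages are available at http://archive.cloudera.com; for example, the cdh5 packages for Ubuntu are all under http://archive.cloudera.com/cdh5/ubuntu. This example uses Ubuntu 16.04, and every version of the impala cdh package can be found under http://archive.cloudera.com/cdh5/ubuntu/xenial/amd64/cdh/pool/contrib/i/impala. Download the following two files:

  • The deb package with the executables (345MB): http://archive.cloudera.com/cdh5/ubuntu/xenial/amd64/cdh/pool/contrib/i/impala/impala_2.12.0+cdh5.16.2+0-1.cdh5.16.2.p0.22~xenial-cdh5.16.2_amd64.deb
  • The deb package with the debug information for those executables (471MB): http://archive.cloudera.com/cdh5/ubuntu/xenial/amd64/cdh/pool/contrib/i/impala/impala-dbg_2.12.0+cdh5.16.2+0-1.cdh5.16.2.p0.22~xenial-cdh5.16.2_amd64.deb

Then run dump_breakpad_symbols.py again, this time on the two packages: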

$ bin/dump_breakpad_symbols.py -r ~/Downloads/impala_2.12.0+cdh5.16.2+0-1.cdh5.16.2.p0.22~xenial-cdh5.16.2_amd64.deb -s ~/Downloads/impala-dbg_2.12.0+cdh5.16.2+0-1.cdh5.16.2.p0.22~xenial-cdh5.16.2_amd64.deb -d /tmp/syms
INFO:root:Extracting to /tmp/tmpBDEwFI: /home/quanlong/Downloads/impala_2.12.0+cdh5.16.2+0-1.cdh5.16.2.p0.22~xenial-cdh5.16.2_amd64.deb
INFO:root:Extracting to /tmp/tmpBDEwFI: /home/quanlong/Downloads/impala-dbg_2.12.0+cdh5.16.2+0-1.cdh5.16.2.p0.22~xenial-cdh5.16.2_amd64.deb
INFO:root:Processing binary file: /tmp/tmpBDEwFI/usr/lib/impala/lib/libstdc++.so.6.0.20
INFO:root:Processing binary file: /tmp/tmpBDEwFI/usr/lib/impala/lib/libgcc_s.so.1
INFO:root:Processing binary file: /tmp/tmpBDEwFI/usr/lib/impala/lib/libkudu_client.so.0.1.0
INFO:root:Processing binary file: /tmp/tmpBDEwFI/usr/lib/impala/lib/libstdc++.so.6
INFO:root:Processing binary file: /tmp/tmpBDEwFI/usr/lib/impala/lib/libkudu_client.so.0
INFO:root:Processing binary file: /tmp/tmpBDEwFI/usr/lib/impala/lib/openssl/libssl.so
INFO:root:Processing binary file: /tmp/tmpBDEwFI/usr/lib/impala/lib/openssl/libcrypto.so
INFO:root:Processing binary file: /tmp/tmpBDEwFI/usr/lib/impala/lib/openssl/libcrypto.so.1.0.0
INFO:root:Processing binary file: /tmp/tmpBDEwFI/usr/lib/impala/lib/openssl/libssl.so.1.0.0
INFO:root:Processing binary file: /tmp/tmpBDEwFI/usr/lib/impala/sbin-debug/libfesupport.so
INFO:root:Processing binary file: /tmp/tmpBDEwFI/usr/lib/impala/sbin-debug/impalad
INFO:root:Processing binary file: /tmp/tmpBDEwFI/usr/lib/impala/sbin-retail/libfesupport.so
INFO:root:Processing binary file: /tmp/tmpBDEwFI/usr/lib/impala/sbin-retail/impalad

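The symbol information in /tmp/syms now includes file names and line numbers.

3. Resolve the Minidump with the Symbol Files

The minidump_stackwalk tool, in the breakpad directory under toolchain in the Impala source tree, resolves a minidump against the symbol files. To write the resolved result to /tmp/resolved.txt and the breakpad log to /tmp/breakpad.log, run: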

$ toolchain/breakpad-97a98836768f8f0154f8f86e5e14c2bb7e74132e-p2/bin/minidump_stackwalk /var/log/impala-minidumps/impalad/3745e5d7-9281-4548-2fd5b4b1-adc7f7eb.dmp /tmp/syms > /tmp/resolved.txt 2>/tmp/breakpad.log

The generated resolved.txt looks like this:

Operating system: Linux
                  0.0.0 Linux 4.4.0-24-generic #43-Ubuntu SMP Wed Jun 8 19:27:37 UTC 2016 x86_64
CPU: amd64
     family 6 model 63 stepping 0
     2 CPUs

GPU: UNKNOWN

Crash reason:  DUMP_REQUESTED
Crash address: 0x217a097
Process uptime: not available

Thread 0 (crashed)
 0  impalad!google_breakpad::ExceptionHandler::WriteMinidump() + 0x57
    rax = 0x0000000002149a7e   rdx = 0x0000000000000000
    rcx = 0x000000000217a07f   rbx = 0x0000000000000000
    rsi = 0x0000000000000001   rdi = 0x00007ffed049f068
    rbp = 0x00007ffed049f770   rsp = 0x00007ffed049efd0
     r8 = 0x0000000000000000    r9 = 0x0000000000000024
    r10 = 0x0000000002288a89   r11 = 0x0000000000000000
    r12 = 0x00007ffed049f630   r13 = 0x0000000000d5cff0
    r14 = 0x0000000000000000   r15 = 0x00007ffed049f690
    rip = 0x000000000217a097
    Found by: given as instruction pointer in context
 1  impalad!google_breakpad::ExceptionHandler::WriteMinidump(std::string const&, bool (*)(google_breakpad::MinidumpDescriptor const&, void*, bool), void*) + 0xf0
    rbx = 0x00007f92561325a0   rbp = 0x00007ffed049f770
    rsp = 0x00007ffed049f620   r12 = 0x00007ffed049f630
    r13 = 0x0000000000d5cff0   r14 = 0x0000000000000000
    r15 = 0x00007ffed049f690   rip = 0x000000000217a960
    Found by: call frame info
 2  libpthread-2.23.so + 0x11390
    rbx = 0x0000000000000000   rbp = 0x00007ffed049fdd0
    rsp = 0x00007ffed049f780   r12 = 0x0000000007ada458
    r13 = 0x0000000007ada480   r14 = 0x0000000000000000
    r15 = 0x00007ffed049fdf0   rip = 0x00007f92556fe390
    Found by: call frame info
 3  impalad!boost::thread::join_noexcept() + 0x5c
    rbp = 0x00007ffed049fdf0   rsp = 0x00007ffed049fde0
    rip = 0x0000000001334cec
    Found by: previous frame's frame pointer
 4  impalad!impala::ThriftServer::Join() [thread.hpp : 767 + 0x8]
    rbx = 0x000000000648b420   rbp = 0x00007ffed049fe80
    rsp = 0x00007ffed049fe40   r12 = 0x00007f91fef44700
    r13 = 0x00007ffed049ff20   r14 = 0x0000000006acbae0
    r15 = 0x0000000000000002   rip = 0x0000000000b34f4f
    Found by: call frame info
 5  impalad!impala::ImpalaServer::Join() [impala-server.cc : 2151 + 0xc]
    rbx = 0x0000000006621800   rbp = 0x00007ffed049feb0
    rsp = 0x00007ffed049fe90   r12 = 0x00007ffed049ffb0
    r13 = 0x00007ffed049ff20   r14 = 0x0000000006acbae0
    r15 = 0x0000000000000002   rip = 0x0000000000c28f8a
    Found by: call frame info
 6  impalad!ImpaladMain(int, char**) [impalad-main.cc : 98 + 0xc]
    rbx = 0x00007ffed049ff90   rbp = 0x00007ffed04a0130
    rsp = 0x00007ffed049fec0   r12 = 0x00007ffed049ffb0
    r13 = 0x00007ffed049ff20   r14 = 0x0000000006acbae0
    r15 = 0x0000000000000002   rip = 0x0000000000c238e1
    Found by: call frame info
......

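The first thread (Thread 0) is marked as crashed, but it is really just the thread that wrote the minidump; the Crash reason above already says DUMP_REQUESTED. When the process actually crashes, a specific reason is reported instead. The resolved output contains many register values, which get in the way of reading, so filter them out: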

grep -v = /tmp/resolved.txt | grep -v 'Found by' | less

This gives a much more readable stack:

Thread 119
 0  libpthread-2.23.so + 0xd360
 1  impalad!impala::io::DiskIoMgr::WorkLoop(impala::io::DiskIoMgr::DiskQueue*) [disk-io-mgr.cc : 977 + 0x5]
 2  impalad!impala::Thread::SuperviseThread(std::string const&, std::string const&, boost::function<void ()>, impala::ThreadDebugInfo const*, impala::Promise<long>*) [function_template.hpp : 767 + 0x7]
 3  impalad!boost::detail::thread_data<boost::_bi::bind_t<void, void (*)(std::string const&, std::string const&, boost::function<void ()>, impala::ThreadDebugInfo const*, impala::Promise<long>*), boost::_bi::list5<boost::_bi::value<std::string>, boost::_bi::value<std::string>, boost::_bi::value<boost::function<void ()> >, boost::_bi::value<impala::ThreadDebugInfo*>, boost::_bi::value<impala::Promise<long>*> > > >::run() [bind.hpp : 525 + 0x6]
 4  impalad!thread_proxy + 0xda
 5  libpthread-2.23.so + 0x76ba
 6  libc-2.23.so + 0x10741d
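
With well over a hundred threads in one dump, it is often handy to look at a single thread. The sketch below is one possible way to do that with awk; the function name thread_stack is hypothetical, and the patterns assume the minidump_stackwalk layout shown above ("Thread N" headers, indented "reg = 0x..." register lines, and "Found by:" lines). Because it matches register lines specifically, it would also keep a frame whose symbol happens to contain "=", which a plain `grep -v =` would drop.

```shell
# Hypothetical helper: print the frames of one thread from
# minidump_stackwalk output on stdin. $1 is the thread number.
thread_stack() {
  awk -v want="$1" '
    # A "Thread N" header switches printing on or off.
    /^Thread [0-9]+/ { in_thread = ($2 == want) }
    # Inside the wanted thread, skip register dumps, "Found by" lines
    # and blank lines; print everything else.
    in_thread && !/^ +[a-z0-9]+ = 0x/ && !/Found by:/ && NF
  '
}
```

Usage: `thread_stack 119 < /tmp/resolved.txt | less`.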

4. A Common Mistake

If the resolved file contains no function names, the symbol files do not match the minidump. breakpad.log may then contain entries like these:

2019-11-09 23:57:23: minidump_processor.cc:201: INFO: Looking at thread /var/log/impala-minidumps/impalad/9e41139b-a5b1-4f94-df3da8b6-c0c66040.dmp:0/155 id 0x73cd
2019-11-09 23:57:23: minidump.cc:473: INFO: MinidumpContext: looks like AMD64 context
2019-11-09 23:57:23: minidump.cc:473: INFO: MinidumpContext: looks like AMD64 context
2019-11-09 23:57:23: simple_symbol_supplier.cc:196: INFO: No symbol file at /tmp/syms/impalad/DD8351C4C1817BE1D142C187FA70CCAC0/impalad.sym
2019-11-09 23:57:23: stackwalker.cc:103: INFO: Couldn't load symbols for: /opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/lib/impala/sbin-retail/impalad|DD8351C4C1817BE1D142C187FA70CCAC0
2019-11-09 23:57:23: simple_symbol_supplier.cc:196: INFO: No symbol file at /tmp/syms/libpthread-2.23.so/23E017CE2254FC6511D9BC8F534BB4F00/libpthread-2.23.so.sym
2019-11-09 23:57:23: stackwalker.cc:103: INFO: Couldn't load symbols for: /lib/x86_64-linux-gnu/libpthread-2.23.so|23E017CE2254FC6511D9BC8F534BB4F00

The key line is "No symbol file at /tmp/syms/impalad/DD...C0/impalad.sym", which means the required symbol file could not be found. Listing /tmp/syms/impalad confirms that the ID does not match; the log asks for DD8351C4C1817BE1D142C187FA70CCAC0:

$ ls /tmp/syms/impalad/
7F9EC4C10024BDC531665853311E9CCE0
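
To make this comparison less error-prone, the expected IDs can be extracted from breakpad.log. The sketch below assumes only the "No symbol file at <dir>/<module>/<ID>/<module>.sym" wording seen in the log above; the function name missing_symbol_ids is hypothetical.

```shell
# Hypothetical helper: for each "No symbol file at ..." line on stdin,
# print "<module> <expected ID>", once per distinct pair.
missing_symbol_ids() {
  # The last three path components are <module>/<ID>/<module>.sym.
  sed -n 's|.*No symbol file at .*/\([^/]*\)/\([0-9A-F]*\)/[^/]*\.sym$|\1 \2|p' | sort -u
}
```

Usage: `missing_symbol_ids < /tmp/breakpad.log`, then compare each ID with the directory names under /tmp/syms/<module>/. Misses for system libraries such as libpthread are expected unless symbols were also dumped for them.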

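This happened because I generated the symbols from the wrong impalad file. The right one is the file the impalad process is actually running, namely /opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/lib/impala/sbin-retail/impalad.

The CDH parcel directory contains several impalad files, so be careful to pick the right one: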

$ find /opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8 -name impalad
/opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/lib/impala/sbin-retail/impalad
/opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/lib/impala/sbin-debug/impalad
/opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/lib/debug/usr/lib/impala/sbin-retail/impalad
/opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/lib/debug/usr/lib/impala/sbin-debug/impalad
/opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/bin/impalad

Better still, dump the symbols from the deb packages, which gives more complete information; see 2.2.2.

5. Summary

Steps:

  1. Trigger a minidump: kill -s SIGUSR1 $PID
  2. Generate the Breakpad symbol files: bin/dump_breakpad_symbols.py -f <impalad file> -d /tmp/syms
  3. Resolve the minidump file: minidump_stackwalk <minidump file> /tmp/syms > /tmp/resolved.txt 2>/tmp/breakpad.log
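
Step 3 is the one most likely to be repeated, so it may be worth wrapping. The sketch below is an illustration only: the function name resolve_minidump and the output-directory layout are hypothetical, and the only real command it runs is minidump_stackwalk with the same two arguments as in the article.

```shell
# Hypothetical wrapper around step 3.
# $1 = path to minidump_stackwalk, $2 = .dmp file,
# $3 = symbol directory, $4 = directory for resolved.txt and breakpad.log.
resolve_minidump() {
  local stackwalk=$1 dump=$2 syms=$3 out_dir=$4
  # Fail early with a clear message instead of producing an empty result.
  [ -x "$stackwalk" ] || { echo "not executable: $stackwalk" >&2; return 1; }
  [ -f "$dump" ] || { echo "no such minidump: $dump" >&2; return 1; }
  [ -d "$syms" ] || { echo "no such symbol directory: $syms" >&2; return 1; }
  mkdir -p "$out_dir" || return 1
  # Stacks go to resolved.txt, breakpad's own logging to breakpad.log.
  "$stackwalk" "$dump" "$syms" > "$out_dir/resolved.txt" 2> "$out_dir/breakpad.log"
}
```

Usage: `resolve_minidump toolchain/breakpad-97a98836768f8f0154f8f86e5e14c2bb7e74132e-p2/bin/minidump_stackwalk /var/log/impala-minidumps/impalad/3745e5d7-9281-4548-2fd5b4b1-adc7f7eb.dmp /tmp/syms /tmp`.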

See the sections above for the environment setup steps.

References

https://cwiki.apache.org/confluence/display/IMPALA/Debugging+Impala+Minidumps

Originally published on 2019-11-11 on the Hadoop实操 WeChat official account.
