Analyzing virtual machine crashes in a KVM environment

惠伟
Published 2021-02-24 11:24:30
Collected in the column: 虚拟化笔记 (Virtualization Notes)

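Virtual machines running in the cloud can fail in two ways. One is a QEMU crash: the qemu process hits a bug in its own code or is killed by the host OOM killer, and you inspect the qemu core with gdb or read the host logs. Far more common are Windows guest blue screens and Linux guest panics. If the qemu process on the host is still healthy after the guest crashes, the cause is most likely the guest itself or the interaction between guest and host.

Either way, the developers have to analyze it first. Only once it is established that the user's own application caused the crash can the blame be handed back; otherwise it is the cloud's problem. So we have no choice but to analyze the guest operating system. Running a cloud is not easy: if it is slow, that is the cloud's fault, and if anything breaks, that is the cloud's fault too. Developers end up doing the work that hardware vendors and operating system vendors used to do. If you buy a Lenovo laptop, install Windows on it, and it blue-screens, will Lenovo or Microsoft care? Move to a virtual machine in the cloud and suddenly everything is the cloud's problem. Life is hard.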

Whether the guest is Windows or Linux, as long as the memory and CPU state at crash time are preserved, the crash can be analyzed with tools.

Windows

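When Windows blue-screens it writes a dmp file, which can be analyzed with WinDbg. But sometimes the user will not let us into their operating system, our own professional ethics keep us from copying the user's disk, and going through nbd is too much hassle. Or the disk is full and the dmp file never reaches it. In those cases we have to stop guest execution at the moment of the blue screen and then export the guest's memory and CPU state. QEMU is the guest's steward and holds all of the guest's information.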

# Deliberately make Windows blue-screen by injecting an NMI
virsh qemu-monitor-command instance-00000b2e --hmp  nmi
# Dump the guest memory to a file on the host
virsh qemu-monitor-command instance-00000b2e --hmp  dump-guest-memory  /home/qemu/instance-00000b2e.dump
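
The two monitor commands above can be wrapped in a small script when the same procedure has to be repeated on many instances. The following Python sketch only builds and runs the same virsh invocations shown above. The helper names (hmp_argv, capture_guest) are made up for illustration, and it assumes virsh is on PATH and the caller is allowed to talk to the domain's monitor. It does not wait for the blue screen to appear between the two steps; whether a pause is needed is left to the operator.

```python
import subprocess


def hmp_argv(domain, *hmp_words):
    """Build argv for: virsh qemu-monitor-command <domain> --hmp <words...>"""
    return ["virsh", "qemu-monitor-command", domain, "--hmp", *hmp_words]


def capture_guest(domain, dump_path, inject_nmi=True, run=subprocess.run):
    """Optionally inject an NMI, then dump guest memory to dump_path on the host.

    `run` is injectable so the command list can be inspected without
    actually calling virsh (hypothetical convenience, not a libvirt API).
    """
    commands = []
    if inject_nmi:
        commands.append(hmp_argv(domain, "nmi"))
    commands.append(hmp_argv(domain, "dump-guest-memory", dump_path))
    for argv in commands:
        run(argv, check=True)  # subprocess.run raises CalledProcessError on failure
    return commands
```

For example, capture_guest("instance-00000b2e", "/home/qemu/instance-00000b2e.dump") issues the same two commands as above; passing inject_nmi=False only dumps memory, which fits the case where the guest has already blue-screened on its own.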

WinDbg cannot read the memory exported by QEMU directly; it has to be converted with a tool first.

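# Install the tools (Volatility 2.6.1 runs on Python 2.7)
git clone https://github.com/volatilityfoundation/volatility
(cd volatility && python setup.py install)
# If the build fails because of inline functions, change those functions to non-inline
git clone https://github.com/gdabah/distorm
cd distorm
python setup.py install
# Copy the library into Python's sys.path, otherwise the import fails
cp build/lib.linux-x86_64-2.7/_distorm3.so  /usr/lib64/python2.7/

# Inspect the guest memory layout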
[root@rg1-ostack37 /home/huiwei]# file instance-00000b2e.dump
instance-00000b2e.dump: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style

[root@rg1-ostack37 /home/huiwei]# /bin/vol.py imageinfo -f instance-00000b2e.dump
Volatility Foundation Volatility Framework 2.6.1
INFO    : volatility.debug    : Determining profile based on KDBG search...
          Suggested Profile(s) : Win7SP1x64, Win7SP0x64, Win2008R2SP0x64, Win2008R2SP1x64_24000, Win2008R2SP1x64_23418, Win2008R2SP1x64, Win7SP1x64_24000, Win7SP1x64_23418
                     AS Layer1 : WindowsAMD64PagedMemory (Kernel AS)
                     AS Layer2 : QemuCoreDumpElf (Unnamed AS)
                     AS Layer3 : FileAddressSpace (/home/huiwei/instance-00000b2e.dump)
                      PAE type : No PAE
                           DTB : 0x187000L
                          KDBG : 0xf8000164f110L
          Number of Processors : 2
     Image Type (Service Pack) : 1
                KPCR for CPU 0 : 0xfffff80001650d00L
                KPCR for CPU 1 : 0xfffff880009c5000L
             KUSER_SHARED_DATA : 0xfffff78000000000L
           Image date and time : 2020-10-22 09:37:00 UTC+0000
     Image local date and time : 2020-10-22 17:37:00 +0800
# Convert the ELF core into a Windows crash dump
[root@rg1-ostack37 /home/huiwei]# /bin/vol.py -f instance-00000b2e.dump --profile Win2008R2SP1x64  raw2dmp -O   instance-00000b2e.dmp
Volatility Foundation Volatility Framework 2.6.1
Writing data (5.00 MB chunks): |........................ (progress bar truncated) ........................|
[root@rg1-ostack37 /home/huiwei]# file instance-00000b2e.dmp
instance-00000b2e.dmp: MS Windows 64bit crash dump, full dump, 1310720 pages
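
As a quick sanity check before copying a multi-gigabyte file to a Windows machine, the file type can also be confirmed from its first bytes, much like the file command does above. The sketch below is Python and rests on two assumptions that are not stated in the original: that Windows crash dumps begin with the ASCII signature PAGEDU64 (64-bit) or PAGEDUMP (32-bit), and that the page count reported by file refers to 4 KiB pages. The function names are illustrative only.

```python
PAGE_SIZE = 4096  # assumed x86-64 base page size, in bytes


def classify_dump(header: bytes) -> str:
    """Guess the dump type from the first bytes of the file."""
    if header.startswith(b"\x7fELF"):
        return "ELF core"
    if header.startswith(b"PAGEDU64"):
        return "Windows 64-bit crash dump"
    if header.startswith(b"PAGEDUMP"):
        return "Windows 32-bit crash dump"
    return "unknown"


def classify_file(path: str) -> str:
    """Read only the first 8 bytes of the file and classify them."""
    with open(path, "rb") as f:
        return classify_dump(f.read(8))


def dump_size_bytes(pages: int) -> int:
    """Size of the physical memory described by a dump with `pages` pages."""
    return pages * PAGE_SIZE
```

With the 1310720 pages reported above, dump_size_bytes gives 5 GiB of guest physical memory, which is also why the converter prints on the order of a thousand 5 MB progress chunks.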

Then load the dmp file into WinDbg and run "!analyze -v" to look at the call stack and the disassembly.

Microsoft (R) Windows Debugger Version 10.0.19041.1 AMD64
Copyright (c) Microsoft Corporation. All rights reserved.


Loading Dump File [C:\Users\huiwei\Downloads\instance-00000b2e.dmp]
Kernel Complete Dump File: Full address space is available

Comment: 'File was converted with Volatility'
Symbol search path is: srv*
Executable search path is: 
Windows 7 Kernel Version 7601 (Service Pack 1) MP (2 procs) Free x64
Product: Server, suite: Enterprise TerminalServer SingleUserTS
Built by: 7601.23714.amd64fre.win7sp1_ldr.170307-1800
Machine Name:
Kernel base = 0xfffff800`01462000 PsLoadedModuleList = 0xfffff800`016a4730
Debug session time: Thu Oct 22 17:37:00.093 2020 (UTC + 8:00)
System Uptime: 0 days 0:05:38.234
Loading Kernel Symbols
...............................................................
................................................................
..
Loading User Symbols
..............................
Loading unloaded module list
.....Unable to enumerate user-mode unloaded modules, NTSTATUS 0xC0000147
Unknown exception - code 45474150 (first/second chance not available)
For analysis of this file, run !analyze -v
0: kd> !analyze -v


STACK_TEXT:  
fffff800`027ddbf8 fffff800`0142b9b8 : 00000000`00000080 00000000`004f4454 00000000`00000000 00000000`00000000 : nt!KeBugCheckEx
fffff800`027ddc00 fffff800`015ebd5f : 00000000`00000001 fffff800`014422b0 00000000`00000000 00000000`0000005c : hal!HalBugCheckSystem+0x160
fffff800`027ddc40 fffff800`014257a1 : 00000000`000006c0 fffff800`027dde30 fffff800`027ddd30 fffff800`014422b0 : nt!WheaReportHwError+0x26f
fffff800`027ddca0 fffff800`0158df21 : fffff800`027dde70 00000000`00000001 00000000`00000001 00000000`00000000 : hal!HalHandleNMI+0x149
fffff800`027ddcd0 fffff800`014ce782 : 00000000`000000b8 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiProcessNMI+0x131
fffff800`027ddd30 fffff800`014ce5e3 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KxNmiInterrupt+0x82
fffff800`027dde70 00000000`00aa2334 : 00000000`12ca36c0 00000000`12db3040 00000000`00602a00 00000000`00000000 : nt!KiNmiInterruptStart+0x163
00000000`12bffc10 00000000`12ca36c0 : 00000000`12db3040 00000000`00602a00 00000000`00000000 00000000`00000000 : 0xaa2334
00000000`12bffc18 00000000`12db3040 : 00000000`00602a00 00000000`00000000 00000000`00000000 00000000`00000000 : 0x12ca36c0
00000000`12bffc20 00000000`00602a00 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x12db3040
00000000`12bffc28 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x602a00

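Look at the disassembly to see which instruction raised the exception, and whether the guest took a VM exit while executing it. If KVM trapped the instruction, it is most likely a KVM problem; if not, the problem lies in the guest itself. For Windows it is enough to work out roughly whether the kernel or a particular driver is at fault. If it is a driver, just uninstall it. Some people like to install all sorts of security components in their Windows guests, which often leads to blue screens or badly degraded performance. Without source code, this is about all that can be done.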
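
To get a first impression of whether only the kernel or also a driver shows up on the stack, the module names can be pulled out of the STACK_TEXT frames. The following is a rough Python sketch, assuming the frame layout shown in the WinDbg output above, where each line ends in module!symbol or module!symbol+0xNN. The function name is hypothetical, and frames that WinDbg could not resolve (bare addresses such as 0xaa2334) are simply skipped.

```python
import re

# Matches the trailing "module!symbol" or "module!symbol+0xNN" of a frame line.
FRAME_RE = re.compile(
    r":\s+(?P<module>[\w.]+)!(?P<symbol>[^\s+]+)(?:\+0x[0-9a-fA-F]+)?\s*$"
)


def modules_in_stack(stack_text: str):
    """Return module names in order of first appearance, top of stack first."""
    seen = []
    for line in stack_text.splitlines():
        m = FRAME_RE.search(line)
        if m and m.group("module") not in seen:
            seen.append(m.group("module"))
    return seen
```

On the stack above this yields only nt and hal, which is what one would expect here: the blue screen was triggered on purpose with an NMI, and the frames (HalHandleNMI, KiProcessNMI) reflect that. A third-party driver name in this list would be the first thing to look at.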

Linux

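Linux is comparatively simple, and there is plenty of material online. Look at the console output or find the log; it looks roughly like the following. Does anyone know how this blob actually gets printed?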

<1>BUG: unable to handle kernel NULL pointer dereference at (null)
<1>IP: [<ffffffffa02480f9>] hello_init+0x19/0x1158 [hello]
<4>PGD 375d4067 PUD 37bff067 PMD 0
<4>Oops: 0002 [#1] SMP
<4>last sysfs file: /sys/devices/system/cpu/online
<4>CPU 1
<4>Modules linked in: hello(+)(U) autofs4 sunrpc ipv6 microcode joydev virtio_balloon sg virtio_console virtio_net i2c_piix4 i2c_core ext4 jbd2 mbcache sd_mod crc_t10dif virtio_scsi pata_acpi ata_generic ata_piix virtio_pci virtio_ring virtio dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
<4>
<4>Pid: 2608, comm: insmod Not tainted 2.6.32-642.1.1.el6.x86_64 #1 QEMU 08335973-8661-4206-927d-2853ebe22ade
<4>RIP: 0010:[<ffffffffa02480f9>]  [<ffffffffa02480f9>] hello_init+0x19/0x1158 [hello]
<4>RSP: 0018:ffff880037137cd8  EFLAGS: 00010296
<4>RAX: 000000000000000f RBX: ffffffffa02480e0 RCX: 0000000000000074
<4>RDX: 000000000000000e RSI: 00000000fffffffe RDI: 0000000000000000
<4>RBP: ffff880037137f08 R08: ffffffffa024b000 R09: 0000000000001967
<4>R10: 000000000000001a R11: 0000000000000000 R12: 0000000000000000
<4>R13: ffff880036cffb40 R14: 00007f6915670010 R15: ffffffff81a96950
<4>FS:  00007f69156b2700(0000) GS:ffff880002300000(0000) knlGS:0000000000000000
<4>CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
<4>CR2: 0000000000000000 CR3: 0000000036c74000 CR4: 00000000001407e0
<4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
<4>Process insmod (pid: 2608, threadinfo ffff880037134000, task ffff880036f87520)
<4>Stack:
<4> ffff880036cffb40 ffff880036cffb4b ffff880037137db8 ffffffff812a3f63
<4><d> 0000000000000005 0000000affffffff ffffffffffffffff ffffea0000c146a0
<4><d> ffff8800000126c0 0000000000000297 ffff880037137d88 0000000000000297
<4>Call Trace:
<4> [<ffffffff812a3f63>] ? pointer+0x233/0x6a0
<4> [<ffffffff812a4978>] ? vsnprintf+0x198/0x5e0
<4> [<ffffffff81014ac9>] ? sched_clock+0x9/0x10
<4> [<ffffffff812a8710>] ? kvasprintf+0x70/0x90
<4> [<ffffffffa02480e0>] ? hello_init+0x0/0x1158 [hello]
<4> [<ffffffff812a8768>] ? kasprintf+0x38/0x40
<4> [<ffffffffa02480e0>] ? hello_init+0x0/0x1158 [hello]
<4> [<ffffffff810020d0>] do_one_initcall+0xc0/0x280
<4> [<ffffffff810c85f1>] sys_init_module+0xe1/0x250
<4> [<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b
<4>Code: 24 a0 31 c0 e8 9c ee 2f e1 c9 c3 0f 1f 80 00 00 00 00 55 48 89 e5 41 57 41 56 41 55 41 54 53 48 81 ec 08 02 00 00 0f 1f 44 00 00 <c6> 04 25 00 00 00 00 12 e8 fa fe ff ff 8b 3d 24 16 00 00 be 00
<1>RIP  [<ffffffffa02480f9>] hello_init+0x19/0x1158 [hello]
<4> RSP <ffff880037137cd8>
<4>CR2: 0000000000000000
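
One field in the output above that is easy to overlook is the number after "Oops:". For a page fault on x86, as in this example, it is the hardware page-fault error code, and its low bits say what kind of access failed. The Python sketch below decodes those bits. The function names are made up for illustration, and the decoding only applies when the oops really comes from a page fault; other oops types use the field differently.

```python
def decode_oops_code(code_text: str) -> dict:
    """Decode the x86 page-fault error code printed after 'Oops:'."""
    code = int(code_text, 16)
    return {
        "page_present": bool(code & 0x1),       # 0: page not present, 1: protection violation
        "write": bool(code & 0x2),              # 0: read access, 1: write access
        "user_mode": bool(code & 0x4),          # 0: kernel mode, 1: user mode
        "reserved_bit": bool(code & 0x8),       # reserved bit set in a page-table entry
        "instruction_fetch": bool(code & 0x10), # fault happened on an instruction fetch
    }


def describe(code_text: str) -> str:
    """Turn the error code into a short human-readable summary."""
    d = decode_oops_code(code_text)
    if d["instruction_fetch"]:
        access = "instruction fetch"
    else:
        access = "write" if d["write"] else "read"
    mode = "user" if d["user_mode"] else "kernel"
    cause = "protection violation" if d["page_present"] else "page not present"
    return f"{mode}-mode {access}, {cause}"
```

For the "Oops: 0002" above, describe("0002") returns "kernel-mode write, page not present". Together with CR2 being 0, which is the faulting address, this already says that kernel code tried to write to address 0.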

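The main thing to focus on is the IP register, which shows which instruction was being executed, as in RIP [<ffffffffa02480f9>] hello_init+0x19/0x1158 [hello] above. Skim the code to decide whether the problem is in vmlinuz or in a ko. If it is vmlinuz, convert /boot/vmlinuz-2.6.32-642.1.1.el6.x86_64 to ELF format and then disassemble it; if it is a ko, disassemble it directly and read the instructions leading up to the IP. It is even more convenient if symbol files are available, or if the kernel is rebuilt with the DEBUG options enabled so that addresses map back to source. The example here is hello.ko, which wrote through a NULL pointer. Disassemble it and take a look: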

objdump -S hello.ko  > hello.s

static int hello_init(void)
{
      e0:       55                      push   %rbp
      e1:       48 89 e5                mov    %rsp,%rbp
      e4:       41 57                   push   %r15
      e6:       41 56                   push   %r14
      e8:       41 55                   push   %r13
      ea:       41 54                   push   %r12
      ec:       53                      push   %rbx
      ed:       48 81 ec 08 02 00 00    sub    $0x208,%rsp
      f4:       e8 00 00 00 00          callq  f9 <init_module+0x19>
      f9:       c6 04 25 00 00 00 00    movb   $0x12,0x0 
      ......................................................
    1232:       e9 1b ef ff ff          jmpq   152 <init_module+0x72>
    1237:       90                      nop

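The instructions of hello_init start at offset 0xe0 in the code section. The IP shows that execution had reached offset 0x19 from the start of hello_init, that is 0xe0+0x19=0xf9, which is movb $0x12,0x0. That fits: it wrote to memory at address 0. The 0x1158 in the IP is the total length of hello_init's instructions, and 0xe0+0x1158=0x1238, one byte past the final nop at 0x1237, so everything checks out.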
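
The same arithmetic can be scripted when there are many oops logs to go through. The Python sketch below parses a symbol+offset/size entry such as the RIP line, maps it onto objdump offsets given the function's start offset, and extracts the faulting bytes that the Code: line marks with angle brackets. All function names are hypothetical, and the regular expression assumes the line format of this 2.6.32 kernel's output.

```python
import re

RIP_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<sym>\w+)"
    r"\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_rip(line: str) -> dict:
    """Parse '[<addr>] symbol+0xoff/0xsize [module]' out of an oops line."""
    m = RIP_RE.search(line)
    if m is None:
        raise ValueError("no symbol+offset/size found in line")
    return {
        "addr": int(m.group("addr"), 16),
        "symbol": m.group("sym"),
        "offset": int(m.group("off"), 16),
        "size": int(m.group("size"), 16),
        "module": m.group("module"),  # None when no [module] suffix is present
    }


def locate_in_objdump(rip: dict, func_start: int):
    """Return (faulting offset, end-of-function offset) in objdump terms."""
    return func_start + rip["offset"], func_start + rip["size"]


def faulting_bytes(code_line: str, count: int = 8):
    """Return the byte marked <xx> in the Code: line plus the bytes after it."""
    tokens = code_line.split("Code:", 1)[1].split()
    for i, tok in enumerate(tokens):
        if tok.startswith("<") and tok.endswith(">"):
            return [tok.strip("<>")] + tokens[i + 1:i + count]
    return []
```

Applied to the RIP line above with func_start=0xe0, locate_in_objdump returns 0xf9 and 0x1238, matching the manual calculation. The bytes returned by faulting_bytes for the Code: line start with c6 04 25 00 00 00 00 12, which is the encoding objdump shows for movb $0x12,0x0 at offset f9.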

Summary

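At the lowest level a computer is the CPU and its instructions; one level up are the operating system, programming languages and compilers. Only with a deep understanding of these can you claim to understand computers. People who work in security and reverse engineering are seriously impressive. Security researchers spend their days analyzing instructions and operating system memory management, calculating over and over, planting instructions in memory and playing the odds until they actually manage to read host data from inside a guest. Reverse engineers are even more impressive: they crack software just by reading instructions. Huge respect. For low-level development, instructions are the barrier to entry.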

References

Extracting vmlinuz and extracting initrd (initramfs) - 笑遍世界

Extracting Windows VM crash dumps
