Analyzing tasklist_lock and process release with crash

Original
Author: johnazhang
Modified 2021-07-05 14:33:47
Published in the column: Linux问题笔记

This article uses crash to analyze a soft lockup hung on tasklist_lock, and looks past the symptom to examine how the kernel behaves when a large number of processes are being released at once.

The panicking stack is stuck in tasklist_read_lock:

crash> bt
PID: 27250 TASK: ffff910d0c4c3f40 CPU: 4 COMMAND: "ZMQbg/1"
#0 [ffff91339f303d38] machine_kexec at ffffffffb8a60b2a
#1 [ffff91339f303d98] __crash_kexec at ffffffffb8b13402
#2 [ffff91339f303e68] panic at ffffffffb9107a75
#3 [ffff91339f303ee8] watchdog_timer_fn at ffffffffb8b3f5f1
#4 [ffff91339f303f20] __hrtimer_run_queues at ffffffffb8abeff6
#5 [ffff91339f303f78] hrtimer_interrupt at ffffffffb8abf58f
#6 [ffff91339f303fc0] local_apic_timer_interrupt at ffffffffb8a5782b
#7 [ffff91339f303fd8] smp_apic_timer_interrupt at ffffffffb91240a3
#8 [ffff91339f303ff0] apic_timer_interrupt at ffffffffb91207f2
--- <IRQ stack> ---
#9 [ffff9134e2a0bda8] apic_timer_interrupt at ffffffffb91207f2
[exception RIP: tasklist_read_lock+34]
RIP: ffffffffb8a8dcd2 RSP: ffff9134e2a0be50 RFLAGS: 00000206
RAX: 0000000000000050 RBX: ffffffffb8abe870 RCX: ffff910d8eb98000
RDX: ffff910d0ce76c00 RSI: 0000000000000246 RDI: 0000000000000246
RBP: ffff9134e2a0be50 R8: ffff910d8eb98028 R9: a000000000000000
R10: b3a8000000000000 R11: 00011a69df917675 R12: 000ba4a5b60d3a4b
R13: 0000000000000000 R14: 0000000000000000 R15: ffff9134e2a0bda8
ORIG_RAX: ffffffffffffff10 CS: 0010 SS: 0018
#10 [ffff9134e2a0be58] do_wait at ffffffffb8a9703b
#11 [ffff9134e2a0bec0] sys_wait4 at ffffffffb8a98290
#12 [ffff9134e2a0bf50] system_call_fastpath at ffffffffb911f7d5
RIP: 00007f11052ef199 RSP: 00007ffffee17100 RFLAGS: 00000202
RAX: 000000000000003d RBX: 00007f11056770d8 RCX: 0000000000000000
RDX: 0000000000000001 RSI: 00007ffffee16f94 RDI: 0000000000007895
RBP: 0000558346cefd70 R8: 0000000000000000 R9: 0000000000000001
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffffee16f94
R13: 0000000000007895 R14: 0000000000000001 R15: 000055834721d068
ORIG_RAX: 000000000000003d CS: 0033 SS: 002b

Locating the corresponding code shows it is waiting on tasklist_waiters.

tasklist_waiters is 80, so the reader stays in the cpu_relax() loop far too long, which is what triggers the soft lockup:
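tasklist_lock on this kernel is a fair rwlock: once writers are queued, new readers back off until the writer queue drains, which is why a reader can sit in cpu_relax() long enough to trip the watchdog. A minimal userspace sketch of that reader-starvation pattern (the class and field names are illustrative, not the kernel's implementation; a Condition stands in for the spin):

```python
import threading

class WriterPreferringRWLock:
    """Toy model of a fair rwlock: readers back off while writers are queued."""
    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False
        self.waiters = 0          # queued writers, analogous to tasklist_waiters

    def read_acquire(self):
        with self._cond:
            # A reader must wait while a writer holds the lock OR writers are
            # queued; in the kernel this wait is a cpu_relax() spin, and with
            # waiters stuck at 80 it spins long enough to trip the watchdog.
            while self._writer or self.waiters:
                self._cond.wait()
            self._readers += 1

    def read_release(self):
        with self._cond:
            self._readers -= 1
            self._cond.notify_all()

    def write_acquire(self):
        with self._cond:
            self.waiters += 1
            while self._writer or self._readers:
                self._cond.wait()
            self.waiters -= 1
            self._writer = True

    def write_release(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()
```

With one queued writer, any new reader blocks until the writer gets through; with 80 writers queued (80 exiting tasks needing the write side), the reader in do_wait() can stall indefinitely.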

crash> p tasklist_waiters
tasklist_waiters = $1 = {
  counter = 80
}

A large number of processes were all spawned by pid 21845 (screenshots omitted, as they contain business-specific process information).

Their stacks all look the same: in the middle of releasing dentries under /proc:

crash> bt 6840
PID: 6840 TASK: ffff910d83a1bf40 CPU: 17 COMMAND: "xxx"
#0 [ffff910d03933b18] __schedule at ffffffffb91128d4
#1 [ffff910d03933ba8] __cond_resched at ffffffffb8acb9e6
#2 [ffff910d03933bc0] _cond_resched at ffffffffb91131ea
#3 [ffff910d03933bd0] dput at ffffffffb8c33261
#4 [ffff910d03933c00] proc_flush_task at ffffffffb8c95f24
#5 [ffff910d03933c78] release_task at ffffffffb8a95fff
#6 [ffff910d03933ce0] do_exit at ffffffffb8a97b0c
#7 [ffff910d03933d78] do_group_exit at ffffffffb8a97fcf
#8 [ffff910d03933da8] get_signal_to_deliver at ffffffffb8aa887e
#9 [ffff910d03933e40] do_signal at ffffffffb8a2a527
#10 [ffff910d03933f30] do_notify_resume at ffffffffb8a2ac22
#11 [ffff910d03933f50] int_signal at ffffffffb911fb1d
RIP: 00007f23194019f5 RSP: 00007f1afeff5720 RFLAGS: 00000246
RAX: fffffffffffffe00 RBX: 00007f1afeff57c0 RCX: ffffffffffffffff
RDX: 0000000000000001 RSI: 0000000000000080 RDI: 00007f1afeff57ec
RBP: 00007f1afeff57e8 R8: 00007f1afeff5700 R9: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f115ef4e0b0
R13: 00007f115ef4e0a8 R14: 00007f115ef4e0a0 R15: 00007f115ef4e088
ORIG_RAX: 00000000000000ca CS: 0033 SS: 002b

crash> bt 6844
PID: 6844 TASK: ffff910d83a1eeb0 CPU: 58 COMMAND: "xxx"
#0 [ffff910d08babb18] __schedule at ffffffffb91128d4
#1 [ffff910d08babba8] __cond_resched at ffffffffb8acb9e6
#2 [ffff910d08babbc0] _cond_resched at ffffffffb91131ea
#3 [ffff910d08babbd0] dput at ffffffffb8c33261
#4 [ffff910d08babc00] proc_flush_task at ffffffffb8c95f24
#5 [ffff910d08babc78] release_task at ffffffffb8a95fff
#6 [ffff910d08babce0] do_exit at ffffffffb8a97b0c
#7 [ffff910d08babd78] do_group_exit at ffffffffb8a97fcf
#8 [ffff910d08babda8] get_signal_to_deliver at ffffffffb8aa887e
#9 [ffff910d08babe40] do_signal at ffffffffb8a2a527
#10 [ffff910d08babf30] do_notify_resume at ffffffffb8a2ac22
#11 [ffff910d08babf50] int_signal at ffffffffb911fb1d
RIP: 00007f23194019f5 RSP: 00007f10148f5720 RFLAGS: 00000246
RAX: fffffffffffffe00 RBX: 00007f10148f57c0 RCX: ffffffffffffffff
RDX: 0000000000000001 RSI: 0000000000000080 RDI: 00007f10148f57ec
RBP: 00007f10148f57e8 R8: 00007f10148f5700 R9: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f102fbfb530
R13: 00007f102fbfb528 R14: 00007f102fbfb520 R15: 00007f102fbfb508
ORIG_RAX: 00000000000000ca CS: 0033 SS: 002b

Meanwhile, 21845 is in do_wait. Even without reading the code, intuition says it has far too many children to ever reap them all:

crash> bt 21845
PID: 21845 TASK: ffff910d3eef1fa0 CPU: 4 COMMAND: "sh"
#0 [ffff910d039dbdb8] __schedule at ffffffffb91128d4
#1 [ffff910d039dbe48] schedule at ffffffffb9112f49
#2 [ffff910d039dbe58] do_wait at ffffffffb8a97176
#3 [ffff910d039dbec0] sys_wait4 at ffffffffb8a98290
#4 [ffff910d039dbf50] system_call_fastpath at ffffffffb911f7d5
RIP: 00007f12dad4f41c RSP: 00007ffecd89efe0 RFLAGS: 00010246
RAX: 000000000000003d RBX: 0000000000000000 RCX: 00007f12dacc05e0
RDX: 0000000000000000 RSI: 00007ffecd89ee50 RDI: ffffffffffffffff
RBP: 00000000012bdf90 R8: 00000000012bdf90 R9: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000001 R14: 00000000012bfe60 R15: 0000000000000000
ORIG_RAX: 000000000000003d CS: 0033 SS: 002b
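What "sh" is doing here is just the ordinary fork-then-reap loop, only at a scale the kernel cannot keep up with. A minimal sketch of that pattern (Linux only; `spawn_and_reap` is an illustrative name, not anything from the source):

```python
import os

def spawn_and_reap(n):
    """Fork n children that exit immediately, then reap them one by one with
    waitpid -- the same wait4 loop "sh" (pid 21845) is stuck in. With tens of
    thousands of children, every release also serializes on tasklist_lock in
    the kernel, so the parent cannot keep up."""
    pids = []
    for _ in range(n):
        pid = os.fork()
        if pid == 0:
            os._exit(0)                      # child: exit at once, become a zombie
        pids.append(pid)
    reaped = 0
    for pid in pids:
        wpid, _status = os.waitpid(pid, 0)   # parent: reap each zombie in turn
        assert wpid == pid
        reaped += 1
    return reaped
```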

The state of tasklist_lock itself is just as grim:

crash> p tasklist_lock
tasklist_lock = $1 = {
  raw_lock = {
    cnts = {
      counter = 255
    },
    lock = {
      val = {
        counter = 34144513
      }
    }
  }
}

Why 21845 spawned so many tasks is for the relevant business team to investigate.

At this point the problem has been located.

Below, we dig into what these spawned processes are doing, as a way of explaining tasklist_lock and task release:

Look at this stack first. From its state we know it has already entered do_exit(), yet it got scheduled away while dput-ing a dentry under /proc?

crash> bt 6840
PID: 6840 TASK: ffff910d83a1bf40 CPU: 17 COMMAND: "xxx"
#0 [ffff910d03933b18] __schedule at ffffffffb91128d4
#1 [ffff910d03933ba8] __cond_resched at ffffffffb8acb9e6
#2 [ffff910d03933bc0] _cond_resched at ffffffffb91131ea
#3 [ffff910d03933bd0] dput at ffffffffb8c33261
#4 [ffff910d03933c00] proc_flush_task at ffffffffb8c95f24
#5 [ffff910d03933c78] release_task at ffffffffb8a95fff
#6 [ffff910d03933ce0] do_exit at ffffffffb8a97b0c
#7 [ffff910d03933d78] do_group_exit at ffffffffb8a97fcf
#8 [ffff910d03933da8] get_signal_to_deliver at ffffffffb8aa887e
#9 [ffff910d03933e40] do_signal at ffffffffb8a2a527
#10 [ffff910d03933f30] do_notify_resume at ffffffffb8a2ac22
#11 [ffff910d03933f50] int_signal at ffffffffb911fb1d
RIP: 00007f23194019f5 RSP: 00007f1afeff5720 RFLAGS: 00000246
RAX: fffffffffffffe00 RBX: 00007f1afeff57c0 RCX: ffffffffffffffff
RDX: 0000000000000001 RSI: 0000000000000080 RDI: 00007f1afeff57ec
RBP: 00007f1afeff57e8 R8: 00007f1afeff5700 R9: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f115ef4e0b0
R13: 00007f115ef4e0a8 R14: 00007f115ef4e0a0 R15: 00007f115ef4e088
ORIG_RAX: 00000000000000ca CS: 0033 SS: 002b

Which line of code does this correspond to?

/usr/src/debug/kernel-3.10.0-862.el7/linux-3.10.0-862.el7.x86_64/fs/proc/base.c: 3038
0xffffffffb8c95f1c <proc_flush_task+364>: mov %r14,%rdi
0xffffffffb8c95f1f <proc_flush_task+367>: callq 0xffffffffb8c33240 <dput>
/usr/src/debug/kernel-3.10.0-862.el7/linux-3.10.0-862.el7.x86_64/fs/proc/base.c: 3077
0xffffffffb8c95f24 <proc_flush_task+372>: mov -0x58(%rbp),%rax

So it looks like the dput(leader) call has not returned yet?
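A quick sanity check: the address 0xffffffffb8c95f24 that shows up as frame #4 is a return address, sitting exactly one 5-byte callq after the call to dput at proc_flush_task+367, which confirms the task is currently inside dput():

```python
# Saved return address in frame #4 (from the bt) and offsets from the
# disassembly above:
ret_addr = 0xffffffffb8c95f24      # proc_flush_task+372
callq_addr = 0xffffffffb8c95f1f    # proc_flush_task+367, "callq dput"
CALLQ_LEN = 5                      # a rel32 call is 5 bytes on x86-64

# The return address is the instruction immediately after the call,
# so frame #4's address proves execution is inside dput(leader):
assert ret_addr == callq_addr + CALLQ_LEN
```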

Find the dentry on the stack and see what leader is:

crash> dis dput
0xffffffffb8c33240 <dput>: nopl 0x0(%rax,%rax,1) [FTRACE NOP]
0xffffffffb8c33245 <dput+5>: push %rbp
0xffffffffb8c33246 <dput+6>: test %rdi,%rdi
0xffffffffb8c33249 <dput+9>: mov %rsp,%rbp
0xffffffffb8c3324c <dput+12>: push %r14
0xffffffffb8c3324e <dput+14>: push %r13
0xffffffffb8c33250 <dput+16>: push %r12
0xffffffffb8c33252 <dput+18>: push %rbx
0xffffffffb8c33253 <dput+19>: mov %rdi,%rbx

#3 [ffff910d03933bd0] dput at ffffffffb8c33261
ffff910d03933bd8: 0000000000000000 ffff910d03933c33
ffff910d03933be8: 0000000000001ab8 ffff91346198efc0
ffff910d03933bf8: ffff910d03933c70 ffffffffb8c95f24

crash> dentry ffff91346198efc0
struct dentry {
  d_flags = 16777356,
  d_seq = {
    sequence = 2
  },
  d_hash = {
    next = 0x0,
    pprev = 0xffffb09559b1ac20
  },
  d_parent = 0xffff910d7f810180,
  d_name = {
    {
      {
        hash = 892875108,
        len = 5
      },
      hash_len = 22367711588
    },
    name = 0xffff91346198eff8 "21852"

crash> inode.i_nlink,__i_nlink 0xffff91351dd6b380
i_nlink = 9
__i_nlink = 9

So its leader is pid 21852.

As for why the leader's dentry is released here, the comment in the source gives the answer: these are the dentries under /proc/<tgid>.
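What proc_flush_task() achieves can also be observed from userspace: a zombie child still has its /proc/<pid> directory, and the entry disappears only once the parent reaps it (release_task -> proc_flush_task -> dput on the /proc dentries). A small demonstration (Linux only; `proc_entry_lifecycle` is an illustrative name):

```python
import os

def proc_entry_lifecycle():
    """Fork a child, let it become a zombie, check /proc/<pid> exists,
    then reap it and check the /proc entry is gone."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)                          # child exits immediately -> zombie
    # Wait until the child is actually a zombie: state "Z" in /proc/<pid>/stat.
    while True:
        with open(f"/proc/{pid}/stat") as f:
            # stat format is "pid (comm) state ..."; split after the comm field.
            if f.read().rsplit(")", 1)[1].split()[0] == "Z":
                break
    before = os.path.exists(f"/proc/{pid}")  # True: the zombie is still in /proc
    os.waitpid(pid, 0)                       # reap -> release_task -> proc_flush_task
    after = os.path.exists(f"/proc/{pid}")   # False: the dentries were flushed
    return before, after
```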

Look at leader 21852's stack:

crash> bt 21852
PID: 21852 TASK: ffff91347f28dee0 CPU: 23 COMMAND: "xxx"
#0 [ffff910d0767bbe8] __schedule at ffffffffb91128d4
#1 [ffff910d0767bc70] __cond_resched at ffffffffb8acb9e6
#2 [ffff910d0767bc88] _cond_resched at ffffffffb91131ea
#3 [ffff910d0767bc98] down_read at ffffffffb9112062
#4 [ffff910d0767bcb0] exit_signals at ffffffffb8aa8cc5
#5 [ffff910d0767bce0] do_exit at ffffffffb8a975d4
#6 [ffff910d0767bd78] do_group_exit at ffffffffb8a97fcf
#7 [ffff910d0767bda8] get_signal_to_deliver at ffffffffb8aa887e
#8 [ffff910d0767be40] do_signal at ffffffffb8a2a527
#9 [ffff910d0767bf30] do_notify_resume at ffffffffb8a2ac22
#10 [ffff910d0767bf50] int_signal at ffffffffb911fb1d
RIP: 00007f23179d4337 RSP: 00007ffd18584c68 RFLAGS: 00000206
RAX: 0000000000000000 RBX: 00007f1e942feec8 RCX: ffffffffffffffff
RDX: 0000000000000006 RSI: 000000000000555c RDI: 000000000000555c
RBP: 00007f2317d66868 R8: 00007f2324ea43d0 R9: 00007f2324ea3ac0
R10: 0000000000000008 R11: 0000000000000206 R12: 00007f1e942fef00
R13: 00007ffd18584e20 R14: 00007ffd18585108 R15: 00007ffd18585107
ORIG_RAX: 00000000000000ea CS: 0033 SS: 002b

21852 has already reached exit_signals, but its parent, stuck in sys_wait with too many children, cannot get around to it?

First, let's see what this lock is:

0xffffffffb8aa8c9c <exit_signals+12>: mov %rdi,%rbx
0xffffffffb8aa8c9f <exit_signals+15>: sub $0x10,%rsp
0xffffffffb8aa8ca3 <exit_signals+19>: mov %gs:0x28,%rax
0xffffffffb8aa8cac <exit_signals+28>: mov %rax,-0x18(%rbp)
0xffffffffb8aa8cb0 <exit_signals+32>: xor %eax,%eax
0xffffffffb8aa8cb2 <exit_signals+34>: mov 0x768(%rdi),%rax
0xffffffffb8aa8cb9 <exit_signals+41>: lea 0x3f0(%rax),%rdi
0xffffffffb8aa8cc0 <exit_signals+48>: callq 0xffffffffb9112050 <down_read>

0xffffffffb9112050 <down_read>: nopl 0x0(%rax,%rax,1) [FTRACE NOP]
0xffffffffb9112055 <down_read+5>: push %rbp
0xffffffffb9112056 <down_read+6>: mov %rsp,%rbp
0xffffffffb9112059 <down_read+9>: push %rbx

#3 [ffff910d0767bc98] down_read at ffffffffb9112062
ffff910d0767bca0: ffff91347f28dee0 ffff910d0767bcd8
ffff910d0767bcb0: ffffffffb8aa8cc5

crash> eval 0xffff91347f28dee0 + 0x768
hexadecimal: ffff91347f28e648
decimal: 18446622253390554696 (-121820318996920)
octal: 1777774423217712163110
binary: 1111111111111111100100010011010001111111001010001110011001001000
crash> rd ffff91347f28e648
ffff91347f28e648: ffff9134f8bb1680 ....4...
crash> eval 0xffff9134f8bb1680 + 0x3f0
hexadecimal: ffff9134f8bb1a70
decimal: 18446622255430179440 (-121818279372176)
octal: 1777774423237056615160
binary: 1111111111111111100100010011010011111000101110110001101001110000

crash> rw_semaphore ffff9134f8bb1a70
struct rw_semaphore {
  {
    count = {
      counter = 0
    },
    __UNIQUE_ID_rh_kabi_hide4 = {
      count = 0
    },
    {<No data fields>}
  },
  wait_lock = {
    raw_lock = {
      val = {
        counter = 0
      }
    }
  },
  osq = {
    tail = {
      counter = 0
    }
  },
  wait_list = {
    next = 0xffff9134f8bb1a80
  },
  owner = 0x1
}
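The eval/rd chase above is just pointer arithmetic against the offsets in the exit_signals disassembly (specific to this 3.10.0-862.el7 build); replicated here so the addresses can be checked:

```python
# mov 0x768(%rdi),%rax  -> rax = task->signal
# lea 0x3f0(%rax),%rdi  -> rdi = &signal->group_rwsem
TASK_21852 = 0xffff91347f28dee0    # task_struct of pid 21852
SIGNAL_OFFSET = 0x768              # offsetof(task_struct, signal) in this build
GROUP_RWSEM_OFFSET = 0x3f0         # offsetof(signal_struct, group_rwsem)

# First "crash> eval": address of the signal pointer inside the task_struct.
assert TASK_21852 + SIGNAL_OFFSET == 0xffff91347f28e648

# "crash> rd" read the pointer value; second eval lands on the rw_semaphore.
SIGNAL_21852 = 0xffff9134f8bb1680
assert SIGNAL_21852 + GROUP_RWSEM_OFFSET == 0xffff9134f8bb1a70
```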

Let's see how many tasks are waiting on this lock:

crash> list -l rwsem_waiter.list -s rwsem_waiter -h 0xffff9134f8bb1a80
ffff9134f8bb1a80
struct rwsem_waiter {
  list = {
    next = 0xffff9134f8bb1a80,
    prev = 0x1
  },
  task = 0x0,
  type = RWSEM_WAITING_FOR_READ
}

Nobody is actually waiting on this rwsem!

crash> task -R signal 21852
PID: 21852 TASK: ffff91347f28dee0 CPU: 23 COMMAND: "xxx"
signal = 0xffff9134f8bb1680,

crash> signal_struct.group_rwsem 0xffff9134f8bb1680
group_rwsem = {
  {
    count = {
      counter = 0
    },
    __UNIQUE_ID_rh_kabi_hide1 = {
      count = 0
    },
    {<No data fields>}
  },
  wait_lock = {
    raw_lock = {
      val = {
        counter = 0
      }
    }
  },
  osq = {
    tail = {
      counter = 0
    }
  },
  wait_list = {
    next = 0xffff9134f8bb1a80
  },
  owner = 0x1
}

This confirms it is the same lock as the one reached via the task_struct.

From the source we know down_read() is might_sleep(), i.e. it may voluntarily reschedule.

So this task was simply scheduled away in _cond_resched(), because there are far too many runnable processes.

Note also that the exit path after exit_signals() must take tasklist_lock as well. It has no chance of getting it, so even when this task is scheduled back in it makes no progress, and process 6840 above still cannot finish releasing its dentry.

So when there are too many processes, all exiting, and those processes are related (in one group), their teardown has ordering dependencies: the kernel cannot keep up, and resource release becomes extremely slow (it appears to never complete).

Finally, let's look at their common parent, 21845, stuck in sys_wait4:

crash> bt 21845
PID: 21845 TASK: ffff910d3eef1fa0 CPU: 4 COMMAND: "sh"
#0 [ffff910d039dbdb8] __schedule at ffffffffb91128d4
#1 [ffff910d039dbe48] schedule at ffffffffb9112f49
#2 [ffff910d039dbe58] do_wait at ffffffffb8a97176
#3 [ffff910d039dbec0] sys_wait4 at ffffffffb8a98290
#4 [ffff910d039dbf50] system_call_fastpath at ffffffffb911f7d5
RIP: 00007f12dad4f41c RSP: 00007ffecd89efe0 RFLAGS: 00010246
RAX: 000000000000003d RBX: 0000000000000000 RCX: 00007f12dacc05e0
RDX: 0000000000000000 RSI: 00007ffecd89ee50 RDI: ffffffffffffffff
RBP: 00000000012bdf90 R8: 00000000012bdf90 R9: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000001 R14: 00000000012bfe60 R15: 0000000000000000
ORIG_RAX: 000000000000003d CS: 0033 SS: 002b

/usr/src/debug/kernel-3.10.0-862.el7/linux-3.10.0-862.el7.x86_64/kernel/exit.c: 1569
0xffffffffb8a97171 <do_wait+497>: callq 0xffffffffb9112f20 <schedule>
/usr/src/debug/kernel-3.10.0-862.el7/linux-3.10.0-862.el7.x86_64/kernel/exit.c: 1570
0xffffffffb8a97176 <do_wait+502>: jmpq 0xffffffffb8a96ff3 <do_wait+115>

The logic is simple: when there is nothing to do it schedules away, and on wakeup it jumps back to the repeat label to reap children. But reaping also needs tasklist_lock, so it cannot succeed either.
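That jmpq back to do_wait+115 is the classic wait-queue loop: scan the children, and if none is reapable, schedule() and rescan on wakeup. A rough userspace analog of the pattern (a Condition stands in for the wait queue and schedule(); in the kernel the scan additionally runs under tasklist_lock, which is exactly where it stalls here -- names are illustrative):

```python
import threading

class ChildTable:
    """Toy analog of do_wait(): the parent sleeps until a child becomes a
    zombie, then reaps it."""
    def __init__(self):
        self._cond = threading.Condition()
        self._zombies = []

    def child_exit(self, pid):
        with self._cond:
            self._zombies.append(pid)
            self._cond.notify_all()      # like __wake_up_parent()

    def do_wait(self):
        with self._cond:
            # repeat: scan children; nothing to reap -> schedule(); rescan on wakeup
            while not self._zombies:
                self._cond.wait()
            return self._zombies.pop(0)  # reap one zombie (release_task analog)
```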

To sum up: there are too many processes and too many of them need reaping, but everyone has to queue on tasklist_lock, so reaping cannot keep up. If new processes keep starting on top of that, things only get worse; and if memory then hits the watermark and everything drops into reclaim, it becomes a dead end.

Original statement: this article is published on the Tencent Cloud Developer Community with the author's authorization and may not be reproduced without permission.

For infringement concerns, contact cloudcommunity@tencent.com for removal.
