前往小程序,Get更优阅读体验!
立即前往
首页
学习
活动
专区
工具
TVP
发布
社区首页 >专栏 >Postgresql进程卡住无法退出原因和解决方法

Postgresql进程卡住无法退出原因和解决方法

作者头像
mingjie
发布2022-09-23 10:21:21
1.8K0
发布2022-09-23 10:21:21
举报

前言

Postgresql进程卡住无法退出怎么办?例如以下场景:

  • kill 进程:无效
  • kill -2/-12 进程:无效
  • pg_terminate_backend(pid):无效

1 模拟卡住

代码语言:javascript
复制
-- pg14下测试通过

cd `pg_config --libdir`/postgresql

cat << EOF > loop.c
#include "postgres.h"
#include "fmgr.h"
#include <unistd.h>
PG_MODULE_MAGIC;
PG_FUNCTION_INFO_V1(loop);
Datum loop(PG_FUNCTION_ARGS)
{
    /* an endless loop */
    while(1) 
    {
        // 注意这里没有加:CHECK_FOR_INTERRUPTS();
        sleep(2); 
    }
}
EOF

gcc -I /data01/bin/pg9000/include/postgresql/server -fPIC -shared -o loop.so loop.c

-- psql
CREATE FUNCTION loop() RETURNS void LANGUAGE c AS 'loop';

-- 卡住
postgres=# 
postgres=# select loop();

2 尝试解决失败的方法

psql执行ctrl+c失败

代码语言:javascript
复制
postgres=# select loop();
^CCancel request sent
^CCancel request sent

kill 失败/kill -2失败/kill -12失败

代码语言:javascript
复制
$ kill 11699
$ kill -2 11699
$ kill -12 11699

pg_terminate_backend失败

代码语言:javascript
复制
postgres=# select pg_cancel_backend(11699);
 pg_cancel_backend 
-------------------
 t
(1 row)

postgres=# select pg_terminate_backend(11699);
 pg_terminate_backend 
----------------------
 t
(1 row)

3 进程在干什么

stracp -p

代码语言:javascript
复制
$ strace -p 11699
Process 11699 attached
restart_syscall(<... resuming interrupted call ...>) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [URG], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], SA_RESTORER|SA_RESTART|SA_NOCLDSTOP, 0x7fbbe4fcc630}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [URG], NULL, 8) = 0
nanosleep({2, 0}, 
0x7fff60c0bb80)       = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [URG], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], SA_RESTORER|SA_RESTART|SA_NOCLDSTOP, 0x7fbbe4fcc630}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [URG], NULL, 8) = 0
nanosleep({2, 0}, 0x7fff60c0bb80)       = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [URG], 8) = 0

gstack

代码语言:javascript
复制
$ gstack 11699
#0  0x00007fbbe45a68d0 in __nanosleep_nocancel () from /lib64/libc.so.6
#1  0x00007fbbe45a6784 in sleep () from /lib64/libc.so.6
#2  0x00007fbbe585e145 in loop () from /data01/bin/pg9000/lib/postgresql/loop.so
#3  0x0000000000728b0f in ExecInterpExpr (state=0x2bf0100, econtext=0x2befe00, isnull=0x7fff60c0c037) at execExprInterp.c:725
#4  0x000000000072a965 in ExecInterpExprStillValid (state=0x2bf0100, econtext=0x2befe00, isNull=0x7fff60c0c037) at execExprInterp.c:1824
#5  0x000000000077eec1 in ExecEvalExprSwitchContext (state=0x2bf0100, econtext=0x2befe00, isNull=0x7fff60c0c037) at ../../../src/include/executor/executor.h:339
#6  0x000000000077ef2a in ExecProject (projInfo=0x2bf00f8) at ../../../src/include/executor/executor.h:373
#7  0x000000000077f118 in ExecResult (pstate=0x2befce8) at nodeResult.c:136
#8  0x000000000073eefa in ExecProcNodeFirst (node=0x2befce8) at execProcnode.c:463
#9  0x00000000007338aa in ExecProcNode (node=0x2befce8) at ../../../src/include/executor/executor.h:257
#10 0x0000000000736130 in ExecutePlan (estate=0x2befab0, planstate=0x2befce8, use_parallel_mode=false, operation=CMD_SELECT, sendTuples=true, numberTuples=0, direction=ForwardScanDirection, dest=0x2bee790, execute_once=true) at execMain.c:1551
#11 0x0000000000733f15 in standard_ExecutorRun (queryDesc=0x2b2c850, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:361
#12 0x00007fbbe589ac5e in pgss_ExecutorRun (queryDesc=0x2b2c850, direction=ForwardScanDirection, count=0, execute_once=true) at pg_stat_statements.c:1003
#13 0x0000000000733d27 in ExecutorRun (queryDesc=0x2b2c850, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:303
#14 0x000000000097cd6d in PortalRunSelect (portal=0x2ba3f90, forward=true, count=0, dest=0x2bee790) at pquery.c:921
#15 0x000000000097ca2c in PortalRun (portal=0x2ba3f90, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x2bee790, altdest=0x2bee790, qc=0x7fff60c0c4e0) at pquery.c:765
#16 0x00000000009766d3 in exec_simple_query (query_string=0x2b064f0 "select loop();") at postgres.c:1213
#17 0x000000000097abf1 in PostgresMain (argc=1, argv=0x7fff60c0c770, dbname=0x2b32c00 "postgres", username=0x2b32bd8 "mingjiegao") at postgres.c:4494
#18 0x00000000008b6de6 in BackendRun (port=0x2b2b720) at postmaster.c:4530
#19 0x00000000008b6765 in BackendStartup (port=0x2b2b720) at postmaster.c:4252
#20 0x00000000008b2bdd in ServerLoop () at postmaster.c:1745
#21 0x00000000008b24af in PostmasterMain (argc=1, argv=0x2b000e0) at postmaster.c:1417
#22 0x00000000007b4d2b in main (argc=1, argv=0x2b000e0) at main.c:209

4 原因&解决方案

4.1 原因

原因是PG当前堆栈没有进入信号相应处理函数,一般就是进入死等堆栈了:

  • 一般能cacnel的堆栈:超时后检查中断,然后继续sleep
代码语言:javascript
复制
while
  sleep(timeout)
  CHECK_FOR_INTERRUPTS();
  • 死等堆栈
代码语言:javascript
复制
while
  sleep()

4.2 解决方案

注意:千万不要kill -9,SIGKILL没有信号处理函数,OS会直接停掉进程;PG父进程发现子进程异常退出,会停掉所有进程,释放共享内存,在重新申请共享内存,拉起所有进程。效果就等于异常重启,启动时肯定会需要时间redo,可能造成几分钟的停止服务。(除非后果可以接受,否则不要kill -9)

执行kill -2或kill -12后或者执行pg_terminate_backend后,主动调用信号处理函数,让PG正常退出。

代码语言:javascript
复制
$ gdb attach 11699
(gdb) p ProcessInterrupts()
[Inferior 1 (process 11699) exited with code 01]
The program being debugged exited while in a function called from GDB.
Evaluation of the expression containing the function
(ProcessInterrupts) will be abandoned.
(gdb) q

服务端效果

代码语言:javascript
复制
postgres=# 
postgres=# select loop();
^CCancel request sent
^CCancel request sent

(GDB调用后)

FATAL:  terminating connection due to administrator command
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
postgres=# \q

服务端日志:server没有其他报错、server没有重启

代码语言:javascript
复制
11699 [local] mingjiegao postgres 2022-08-19 09:41:43 UTC 42723STATEMENT:  CREATE FUNCTION loop() RETURNS void LANGUAGE c AS 'loop';
11699 [local] mingjiegao postgres 2022-08-19 09:42:24 UTC 00000LOG:  statement: select loop();
13083 [local] mingjiegao postgres 2022-08-19 09:45:39 UTC 00000LOG:  statement: select pg_cancel_backend(11699);
13083 [local] mingjiegao postgres 2022-08-19 09:45:47 UTC 00000LOG:  statement: select pg_terminate_backend(11699);
11699 [local] mingjiegao postgres 2022-08-19 09:52:44 UTC 57P01FATAL:  terminating connection due to administrator command
11699 [local] mingjiegao postgres 2022-08-19 09:52:44 UTC 57P01STATEMENT:  select loop();
本文参与 腾讯云自媒体分享计划,分享自作者个人站点/博客。
原始发表:2022-08-19,如有侵权请联系 cloudcommunity@tencent.com 删除

本文分享自 作者个人站点/博客 前往查看

如有侵权,请联系 cloudcommunity@tencent.com 删除。

本文参与 腾讯云自媒体分享计划  ,欢迎热爱写作的你一起参与!

评论
登录后参与评论
0 条评论
热度
最新
推荐阅读
目录
  • 前言
  • 1 模拟卡住
  • 2 尝试解决失败的方法
    • psql执行ctrl+c失败
      • kill 失败/kill -2失败/kill -12失败
        • pg_terminate_backend失败
        • 3 进程在干什么
          • stracp -p
            • gstack
            • 4 原因&解决方案
              • 4.1 原因
                • 4.2 解决方案
                领券
                问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档