Original link: https://www.jianshu.com/u/21add3dce532
Test code: Defunct.java
import java.util.concurrent.TimeUnit;

public class Defunct {
    public static void main(String[] args) {
        // Print a heartbeat line every 30 seconds, forever.
        while (true) {
            System.out.println("test defunct");
            try {
                TimeUnit.SECONDS.sleep(30);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }
}
Startup script: start.sh
#!/bin/bash
nohup java -cp defunct.jar Defunct &
echo "$!"
echo "$!" > pid
Startup script: start_tail.sh (identical, except that it ends with tail)
#!/bin/bash
nohup java -cp defunct.jar Defunct &
echo "$!"
echo "$!" > pid
tail -f nohup.out
Shutdown script: stop.sh (stops the server with kill)
#!/bin/bash
pid=$(cat pid)
echo "$pid"
# kill without an explicit signal sends SIGTERM
kill "$pid"
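Testing with each of the two scripts led to the following conclusions:
Conclusion: surprisingly, the problem could not be reproduced on the game server.
Starting from how zombie processes come about, my first guess was that the parent process, sh, had not called waitpid to reap its java child process.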
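The hypothesis can be illustrated with a minimal sketch, assuming Linux with bash, awk, and /proc. The inner bash starts a short-lived child and then replaces itself (via exec) with sleep, a program that never calls waitpid, so the exited child lingers in state Z while that parent is alive. The helper state_of, the scratch file, and the timings are illustrative choices, not part of the original scripts.

```shell
#!/bin/bash
# Read the one-letter process state (R, S, T, Z, ...) from /proc.
state_of() { awk '/^State:/ {print $2}' "/proc/$1/status" 2>/dev/null; }

pidfile=$(mktemp)   # hypothetical scratch file for passing the child's PID

# The inner bash starts a child that exits after 1s, records its PID, and then
# execs into `sleep 3`, which is now the child's parent but never calls waitpid().
bash -c 'sleep 1 & echo "$!" > "$1"; exec sleep 3' _ "$pidfile" &
parent=$!

sleep 2             # the child has exited by now, yet nobody has reaped it
child=$(cat "$pidfile")
zombie_state=$(state_of "$child")
echo "child $child state: $zombie_state"   # expected: Z

wait "$parent"      # let the non-reaping parent finish
rm -f "$pidfile"
```

The zombie exists only because its parent is alive and not reaping; it holds no memory, just a process-table entry.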
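I searched online for similar Tomcat tail -f problems and thought back to what had happened more than a month earlier. One important detail from that incident was that, during shutdown, the terminal hung after a ctrl key combination. Then it struck me: perhaps I had slipped back then and pressed ctrl+z instead of ctrl+c.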
Run start_tail.sh, then press ctrl+z:
[xx@achilles deploy_defunct]$ sh start_tail.sh
3974
nohup: appending output to `nohup.out'
defunct2
^Z
[2]+ Stopped sh start_tail.sh
Run stop.sh: the process (3974) cannot be stopped:
[xx@achilles deploy_defunct]$ sh stop.sh
3974
[xx@achilles deploy_defunct]$ jps
4146 Jps
3974 Defunct2
12790 SpursLauncher
3726 SpursLauncher
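A likely explanation, stated here as an assumption rather than something the transcript proves: ctrl+z sends SIGTSTP to the whole foreground process group, so the java process is stopped along with the script, and a stopped process does not act on SIGTERM until it is continued. The sketch below assumes Linux with bash, awk, and /proc; sleep stands in for the java process, kill -STOP for ctrl+z, and kill -CONT for fg.

```shell
#!/bin/bash
# Read the one-letter process state (R, S, T, Z, ...) from /proc.
state_of() { awk '/^State:/ {print $2}' "/proc/$1/status" 2>/dev/null; }

sleep 60 &                 # stand-in for the java process
pid=$!

kill -STOP "$pid"          # stand-in for the stop signal caused by ctrl+z
sleep 1
kill -TERM "$pid"          # what stop.sh does
sleep 1
state_after_term=$(state_of "$pid")
echo "state after SIGTERM: $state_after_term"        # expected: T (still stopped)

kill -CONT "$pid"          # stand-in for fg; the pending SIGTERM is now delivered
wait "$pid"
exit_status=$?
echo "exit status after SIGCONT: $exit_status"       # expected: 143 (128 + SIGTERM)
```

SIGKILL is different: it terminates a stopped process without waiting for SIGCONT, which is consistent with kill -9 taking effect where the plain kill did not.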
Try to kill the process with kill -9: the process now shows up as defunct:
[xx@achilles deploy_defunct]$ kill -9 3974
[xx@achilles deploy_defunct]$ jps
3974 Defunct2
12790 SpursLauncher
4314 Jps
3726 SpursLauncher
[xx@achilles deploy_defunct]$ ps -el | grep 3974
0 Z 500 3974 3973 0 80 0 - 0 exit pts/4 00:00:00 java <defunct>
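At this point, bringing the suspended script back to the foreground with fg and then pressing ctrl+c makes the zombie process disappear on its own: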
[xx@achilles deploy_defunct]$ ps -el | grep 3974
0 Z 500 3974 3973 0 80 0 - 0 exit pts/4 00:00:00 java <defunct>
[xx@achilles deploy_defunct]$ fg
sh start_tail.sh
^C
[xx@achilles deploy_defunct]$ ps -el | grep 3974
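The whole sequence can be reproduced without a terminal in a rough sketch, assuming Linux with bash, awk, and /proc. Here an inner bash plays the role of start_tail.sh, sleep plays the java process, kill -STOP replaces ctrl+z, and kill -CONT replaces fg; all names and timings are illustrative. While the parent is stopped it cannot call waitpid, so the killed child stays in state Z; once the parent is resumed, it reaps the child.

```shell
#!/bin/bash
# Read the one-letter process state (R, S, T, Z, ...) from /proc.
state_of() { awk '/^State:/ {print $2}' "/proc/$1/status" 2>/dev/null; }

pidfile=$(mktemp)   # hypothetical scratch file for passing the child's PID

# Stand-in for start_tail.sh: start a child, record its PID, then block.
bash -c 'sleep 60 & echo "$!" > "$1"; wait' _ "$pidfile" &
parent=$!
sleep 1
child=$(cat "$pidfile")

kill -STOP "$parent" "$child"   # ctrl+z stops every process in the foreground group
kill -9 "$child"                # SIGKILL takes effect even on a stopped process
sleep 1
state_while_parent_stopped=$(state_of "$child")
echo "child while parent is stopped: $state_while_parent_stopped"   # expected: Z

kill -CONT "$parent"            # fg resumes the parent, which can now reap the child
wait "$parent"
state_after_resume=$(state_of "$child")
echo "child after parent resumed: ${state_after_resume:-gone}"      # expected: gone
rm -f "$pidfile"
```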
Run the startup script (the one with tail), wait a while (until all the servers have started), then press ctrl+z:
[xx@achilles spurs-2]$ sh start.sh
......
^Z
[1]+ Stopped sh start.sh
Now run shutdown.sh: there is no response at all (it hangs), so the only option is ctrl+c:
[xx@achilles spurs-2]$ sh shutdown.sh
^C
[xx@achilles spurs-2]$ jps
9667 SpursLauncher
9796 Jps
[xx@achilles spurs-2]$ ll /proc/9667 | grep cwd
lrwxrwxrwx 1 xx xx 0 Dec 5 17:32 cwd -> /data/home/user00/xx/achilles/backend/spurs-2
[xx@achilles spurs-2]$ ps -el | grep 9667
0 T 500 9667 9666 7 80 0 - 1442848 signal pts/6 00:00:07 java
[xx@achilles spurs-2]$ ps -el | grep 9666
0 T 500 9666 8959 0 80 0 - 26521 signal pts/6 00:00:00 sh
0 T 500 9667 9666 7 80 0 - 1442848 signal pts/6 00:00:07 java
0 T 500 9669 9666 0 80 0 - 25241 signal pts/6 00:00:00 tail
Running jstack at this point also gets no response (it hangs), so again the only option is ctrl+c:
[xx@achilles spurs-2]$ jstack 9667
^C
Now run kill -9: the java process has become a zombie process:
[xx@achilles spurs-2]$ kill -9 9667
[xx@achilles spurs-2]$ ps -el | grep 9667
0 Z 500 9667 9666 1 80 0 - 0 exit pts/6 00:00:07 java <defunct>
Resume the suspended script with fg and then press ctrl+c: the zombie process disappears, having been reaped successfully:
[xx@achilles spurs-2]$ fg
sh start.sh
^C
[xx@achilles spurs-2]$ ps -el | grep 9666
[xx@achilles spurs-2]$ ps -el | grep 9667
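One way to make a stop script more tolerant of this situation is to check whether the target is stopped and, if so, follow SIGTERM with SIGCONT so that the pending signal gets delivered. This is only a sketch under the same assumptions as before (Linux, bash, awk, /proc); stop_pid is a hypothetical helper, not part of the original stop.sh, and sleep again stands in for the java process. It does not resume a suspended parent script, so a killed child can still remain a zombie until that parent is resumed or exits.

```shell
#!/bin/bash
# Hypothetical helper: send SIGTERM, and wake the target if it is stopped.
stop_pid() {
    local pid=$1 state
    state=$(awk '/^State:/ {print $2}' "/proc/$pid/status" 2>/dev/null)
    kill -TERM "$pid" 2>/dev/null || return 1
    if [ "$state" = "T" ]; then
        # A stopped process leaves SIGTERM pending; SIGCONT lets it be delivered.
        kill -CONT "$pid"
    fi
}

# Usage example with a stand-in process that has been stopped.
sleep 60 &
target=$!
kill -STOP "$target"
sleep 1
stop_pid "$target"
wait "$target"
stop_status=$?
echo "target exited with status $stop_status"   # expected: 143 (128 + SIGTERM)
```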