业务一切正常,突然收到一堆告警,发现全是
java.util.concurrent.RejectedExecutionException
异常报错。
具体看了下代码,里面的执行逻辑也不难,没有外部依赖都是内存多线程cpu类型计算的逻辑。
下面是报错的线程池状态变化,由上到下,按照时间增长,最后达到饱和。
java.util.concurrent.ThreadPoolExecutor@57b5ea0f[Running, pool size = 99, active threads = 75, queued tasks = 5000, completed tasks = 1595259]
java.util.concurrent.ThreadPoolExecutor@57b5ea0f[Running, pool size = 101, active threads = 74, queued tasks = 4999, completed tasks = 1595298]
java.util.concurrent.ThreadPoolExecutor@57b5ea0f[Running, pool size = 90, active threads = 70, queued tasks = 4998, completed tasks = 1595354]
java.util.concurrent.ThreadPoolExecutor@57b5ea0f[Running, pool size = 93, active threads = 75, queued tasks = 5000, completed tasks = 1595378]
java.util.concurrent.ThreadPoolExecutor@57b5ea0f[Running, pool size = 85, active threads = 72, queued tasks = 5000, completed tasks = 1595443]
java.util.concurrent.ThreadPoolExecutor@57b5ea0f[Running, pool size = 83, active threads = 70, queued tasks = 5000, completed tasks = 1595447]
java.util.concurrent.ThreadPoolExecutor@57b5ea0f[Running, pool size = 90, active threads = 81, queued tasks = 5000, completed tasks = 1595476]
java.util.concurrent.ThreadPoolExecutor@57b5ea0f[Running, pool size = 85, active threads = 74, queued tasks = 4994, completed tasks = 1595574]
java.util.concurrent.ThreadPoolExecutor@57b5ea0f[Running, pool size = 91, active threads = 80, queued tasks = 4999, completed tasks = 1595678]
java.util.concurrent.ThreadPoolExecutor@57b5ea0f[Running, pool size = 95, active threads = 85, queued tasks = 5000, completed tasks = 1595696]
java.util.concurrent.ThreadPoolExecutor@57b5ea0f[Running, pool size = 95, active threads = 87, queued tasks = 5000, completed tasks = 1595721]
java.util.concurrent.ThreadPoolExecutor@57b5ea0f[Running, pool size = 93, active threads = 85, queued tasks = 4999, completed tasks = 1595727]
java.util.concurrent.ThreadPoolExecutor@57b5ea0f[Running, pool size = 94, active threads = 85, queued tasks = 4998, completed tasks = 1595737]
...
java.util.concurrent.ThreadPoolExecutor@57b5ea0f[Running, pool size = 109, active threads = 96, queued tasks = 4999, completed tasks = 1596150]
java.util.concurrent.ThreadPoolExecutor@57b5ea0f[Running, pool size = 113, active threads = 102, queued tasks = 4998, completed tasks = 1596191]
java.util.concurrent.ThreadPoolExecutor@57b5ea0f[Running, pool size = 120, active threads = 109, queued tasks = 5000, completed tasks = 1596355]
java.util.concurrent.ThreadPoolExecutor@57b5ea0f[Running, pool size = 120, active threads = 110, queued tasks = 5000, completed tasks = 1596517]
java.util.concurrent.ThreadPoolExecutor@57b5ea0f[Running, pool size = 119, active threads = 110, queued tasks = 4999, completed tasks = 1596519]
java.util.concurrent.ThreadPoolExecutor@57b5ea0f[Running, pool size = 113, active threads = 105, queued tasks = 5000, completed tasks = 1596553]
java.util.concurrent.ThreadPoolExecutor@57b5ea0f[Running, pool size = 122, active threads = 109, queued tasks = 4999, completed tasks = 1596602]
java.util.concurrent.ThreadPoolExecutor@57b5ea0f[Running, pool size = 120, active threads = 120, queued tasks = 5000, completed tasks = 1597061]
java.util.concurrent.ThreadPoolExecutor@57b5ea0f[Running, pool size = 120, active threads = 120, queued tasks = 5000, completed tasks = 1597061]
java.util.concurrent.ThreadPoolExecutor@57b5ea0f[Running, pool size = 120, active threads = 120, queued tasks = 5000, completed tasks = 1597061]
java.util.concurrent.ThreadPoolExecutor@57b5ea0f[Running, pool size = 120, active threads = 120, queued tasks = 5000, completed tasks = 1597061]
java.util.concurrent.ThreadPoolExecutor@57b5ea0f[Running, pool size = 120, active threads = 120, queued tasks = 5000, completed tasks = 1597061]
java.util.concurrent.ThreadPoolExecutor@57b5ea0f[Running, pool size = 120, active threads = 120, queued tasks = 5000, completed tasks = 1597061]
java.util.concurrent.ThreadPoolExecutor@57b5ea0f[Running, pool size = 120, active threads = 120, queued tasks = 5000, completed tasks = 1597061]
java.util.concurrent.ThreadPoolExecutor@57b5ea0f[Running, pool size = 120, active threads = 120, queued tasks = 5000, completed tasks = 1597061]
因为cpu4核,所以coresize = 40,maxsize = 120,都正常。
private static BlockingQueue<Runnable> executorQueue = new LinkedBlockingQueue<>(5000);
private static ExecutorService executorService = new ThreadPoolExecutor(
Runtime.getRuntime().availableProcessors() * 10,
Runtime.getRuntime().availableProcessors() * 30,
60, TimeUnit.SECONDS, executorQueue);
通过对ThreadPoolExecutor类分析,引发java.util.concurrent.RejectedExecutionException主要有两种原因:
因为全机房、只有一个节点报错,而其他机器都正常,在这个一定保证正确的前提下。 1、看了线上的请求量、并没有突增,所以排除外部因素 2、看了代码逻辑,逻辑内部并没有打印报错日志,说明不是线程执行耗时导致后面的其他线程排队 3、排除线程池是否提前关闭。并没有,因为手动没有显示关闭,另外看日志也知道里面的线程数还在变化,所以不存在关闭的说法。
考虑是否是机器本身的原因,后面经过排查,看到那个点线上的cpu使用率突然升高、系统负载突然飙升,网卡流出、流入报文数目、tcp连接数也突然为0。具体干了什么,已经让运营去看了。至少说明我们代码没有问题。
如下图,可以感受下:
线上第一时间出现这个问题,没想2秒,看了下报错代码逻辑,然后重启了这个节点实例(当时重启完后,观察又没问题了,带着疑惑然后才去排查)