我们在JDK 17.0.1上使用ZGC。应用程序成功地运行,没有意外的GC暂停了一段时间,之后它进入了ZGC在完成GC时没有释放多少内存的状态。这将导致从那时起应用程序中的多个周期性分配暂停。
应用程序本身在大约35毫秒的周期间隔内进行大量分配。这大约是2GB/s分配时,它是在高峰负载。在一个典型的上午开始,大约有30分钟的负荷是低的,直到一个点的外部数据开始到达,从那时起负荷是高的。因此,当应用程序运行时,负载配置文件确实会发生变化。由于加载期间的可预测分配,我们使用ZCollectionInterval将集合设置为一个间隔。
即使在负载非常相似的不同机器之间,我们也不能始终如一地复制分配档。奇怪的是,如果应用程序在负载已经很高时重新启动,那么它通常不会遇到问题。
我们以前也玩过UseDynamicNumberOfGCThreads,但是它似乎产生了一些线程,即2。附加的图表使用SoftMaxHeapSize=1G。这有点咄咄逼人,但我们在没有设置SofMaxHeapSize的情况下确实看到了同样的问题。我们还注意到GC的并发标记阶段似乎会增加,从而导致这个问题。在JDK版本的发行说明中,我们也没有看到任何关于ZGC的注意事项,直到17.0.5。
以前有没有人遇到过类似的问题?我们不确定为什么会发生这种情况,或者在设置标志时,这是否是预期的行为。由于还没有设置很多问题,而且发生这种情况的可能性很小,我们希望有人能给出一些建议,看看这是否是一个已知的现有修复问题,还是我们设置的错误。
这些是相关的旗帜。
-XX:-OmitStackTraceInFastThrow
-XX:CICompilerCount=2
-XX:+DisableExplicitGC
-XX:ParallelGCThreads=4
-XX:ConcGCThreads=4
-Xmx8192m
-Xms8192m
-Xss8m
-Xlog:gc*=info:file=${LT_APP_LOG_PATH_FORMAT}.gc:time,level,tags:filecount=32,filesize=64M
-XX:+UseZGC
-XX:+AlwaysPreTouch
-XX:+UseLargePages
-XX:+UseTransparentHugePages
-XX:ZCollectionInterval=4
-Djdk.nio.maxCachedBufferSize=262144
-XX:+HeapDumpOnOutOfMemoryError
-XX:SoftMaxHeapSize=1G
[2022-11-24T09:21:12.378+0530][info][gc,phases ] GC(669) Concurrent Select Relocation Set 8.713ms
[2022-11-24T09:21:12.378+0530][info][gc,phases ] GC(669) Pause Relocate Start 0.008ms
[2022-11-24T09:21:12.420+0530][info][gc,phases ] GC(669) Concurrent Relocate 41.564ms
[2022-11-24T09:21:12.420+0530][info][gc,load ] GC(669) Load: 36.38/31.86/25.10
[2022-11-24T09:21:12.420+0530][info][gc,mmu ] GC(669) MMU: 2ms/99.1%, 5ms/99.6%, 10ms/99.8%, 20ms/99.9%, 50ms/99.9%, 100ms/100.0%
[2022-11-24T09:21:12.420+0530][info][gc,marking ] GC(669) Mark: 4 stripe(s), 2 proactive flush(es), 1 terminate flush(es), 0 completion(s), 0 continuation(s)
[2022-11-24T09:21:12.420+0530][info][gc,marking ] GC(669) Mark Stack Usage: 32M
[2022-11-24T09:21:12.420+0530][info][gc,nmethod ] GC(669) NMethods: 10353 registered, 328 unregistered
[2022-11-24T09:21:12.420+0530][info][gc,metaspace] GC(669) Metaspace: 44M used, 46M committed, 1072M reserved
[2022-11-24T09:21:12.420+0530][info][gc,ref ] GC(669) Soft: 2311 encountered, 98 discovered, 0 enqueued
[2022-11-24T09:21:12.420+0530][info][gc,ref ] GC(669) Weak: 2559 encountered, 652 discovered, 0 enqueued
[2022-11-24T09:21:12.420+0530][info][gc,ref ] GC(669) Final: 56 encountered, 7 discovered, 0 enqueued
[2022-11-24T09:21:12.420+0530][info][gc,ref ] GC(669) Phantom: 27425 encountered, 27313 discovered, 27030 enqueued
[2022-11-24T09:21:12.420+0530][info][gc,reloc ] GC(669) Small Pages: 5764 / 11528M, Empty: 4752M, Relocated: 58M, In-Place: 0
[2022-11-24T09:21:12.420+0530][info][gc,reloc ] GC(669) Medium Pages: 3 / 96M, Empty: 64M, Relocated: 0M, In-Place: 0
[2022-11-24T09:21:12.420+0530][info][gc,reloc ] GC(669) Large Pages: 0 / 0M, Empty: 0M, Relocated: 0M, In-Place: 0
[2022-11-24T09:21:12.420+0530][info][gc,reloc ] GC(669) Forwarding Usage: 29M
[2022-11-24T09:21:12.420+0530][info][gc,heap ] GC(669) Min Capacity: 16384M(100%)
[2022-11-24T09:21:12.420+0530][info][gc,heap ] GC(669) Max Capacity: 16384M(100%)
[2022-11-24T09:21:12.420+0530][info][gc,heap ] GC(669) Soft Max Capacity: 1024M(6%)
[2022-11-24T09:21:12.420+0530][info][gc,heap ] GC(669) Mark Start Mark End Relocate Start Relocate End High Low
[2022-11-24T09:21:12.420+0530][info][gc,heap ] GC(669) Capacity: 16384M (100%) 16384M (100%) 16384M (100%) 16384M (100%) 16384M (100%) 16384M (100%)
[2022-11-24T09:21:12.420+0530][info][gc,heap ] GC(669) Free: 4760M (29%) 0M (0%) 4788M (29%) 5392M (33%) 5394M (33%) 0M (0%)
[2022-11-24T09:21:12.420+0530][info][gc,heap ] GC(669) Used: 11624M (71%) 16384M (100%) 11596M (71%) 10992M (67%) 16384M (100%) 10990M (67%)
[2022-11-24T09:21:12.420+0530][info][gc,heap ] GC(669) Live: - 5860M (36%) 5860M (36%) 5860M (36%) - -
[2022-11-24T09:21:12.420+0530][info][gc,heap ] GC(669) Allocated: - 4760M (29%) 4788M (29%) 4858M (30%) - -
[2022-11-24T09:21:12.420+0530][info][gc,heap ] GC(669) Garbage: - 5763M (35%) 947M (6%) 273M (2%) - -
[2022-11-24T09:21:12.420+0530][info][gc,heap ] GC(669) Reclaimed: - - 4816M (29%) 5490M (34%) - -
[2022-11-24T09:21:12.420+0530][info][gc ] GC(669) Garbage Collection (Allocation Stall) 11624M(71%)->10992M(67%)
[2022-11-24T09:21:12.479+0530][info][gc,start ] GC(670) Garbage Collection (Allocation Stall)
[2022-11-24T09:21:12.479+0530][info][gc,ref ] GC(670) Clearing All SoftReferences
[2022-11-24T09:21:12.479+0530][info][gc,task ] GC(670) Using 4 workers
[2022-11-24T09:21:12.479+0530][info][gc,phases ] GC(670) Pause Mark Start 0.009ms
[2022-11-24T09:21:13.180+0530][info][gc,stats ] === Garbage Collection Statistics =======================================================================================================================
[2022-11-24T09:21:13.180+0530][info][gc,stats ] Last 10s Last 10m Last 10h Total
[2022-11-24T09:21:13.180+0530][info][gc,stats ] Avg / Max Avg / Max Avg / Max Avg / Max
[2022-11-24T09:21:13.180+0530][info][gc,stats ] Collector: Garbage Collection Cycle 3410.686 / 3486.475 844.670 / 3486.475 464.952 / 3486.475 464.952 / 3486.475 ms
[2022-11-24T09:21:13.180+0530][info][gc,stats ] Contention: Mark Segment Reset Contention 7 / 15 3 / 30 1 / 30 1 / 30 ops/s
[2022-11-24T09:21:13.180+0530][info][gc,stats ] Contention: Mark SeqNum Reset Contention 0 / 1 0 / 3 0 / 3 0 / 3 ops/s
[2022-11-24T09:21:13.180+0530][info][gc,stats ] Critical: Allocation Stall 1 / 5 0 / 7 0 / 7 0 / 7 ops/s
[2022-11-24T09:21:13.180+0530][info][gc,stats ] Critical: Allocation Stall 677.602 / 900.858 303.367 / 900.858 303.367 / 900.858 303.367 / 900.858 ms
[2022-11-24T09:21:13.180+0530][info][gc,stats ] Critical: GC Locker Stall 0 / 0 0 / 0 0 / 1 0 / 1 ops/s
[2022-11-24T09:21:13.180+0530][info][gc,stats ] Critical: GC Locker Stall 0.000 / 0.000 0.000 / 0.000 0.033 / 0.058 0.033 / 0.058 ms
[2022-11-24T09:21:13.180+0530][info][gc,stats ] Critical: Relocation Stall 0 / 0 0 / 0 0 / 0 0 / 0 ops/s
[2022-11-24T09:21:13.180+0530][info][gc,stats ] Critical: Relocation Stall 0.000 / 0.000 0.000 / 0.000 0.000 / 0.000 0.000 / 0.000 ms
[2022-11-24T09:21:13.180+0530][info][gc,stats ] Memory: Allocation Rate 1479 / 1860 1003 / 2264 329 / 2264 329 / 2264 MB/s
[2022-11-24T09:21:13.180+0530][info][gc,stats ] Memory: Out Of Memory 0 / 0 0 / 0 0 / 0 0 / 0 ops/s
[2022-11-24T09:21:13.180+0530][info][gc,stats ] Memory: Page Cache Flush 0 / 0 0 / 32 0 / 32 0 / 32 MB/s
[2022-11-24T09:21:13.180+0530][info][gc,stats ] Memory: Page Cache Hit L1 198 / 827 177 / 901 59 / 901 59 / 901 ops/s
[2022-11-24T09:21:13.180+0530][info][gc,stats ] Memory: Page Cache Hit L2 541 / 930 316 / 1052 103 / 1052 103 / 1052 ops/s
[2022-11-24T09:21:13.180+0530][info][gc,stats ] Memory: Page Cache Hit L3 1 / 15 12 / 657 4 / 657 4 / 657 ops/s
[2022-11-24T09:21:13.180+0530][info][gc,stats ] Memory: Page Cache Miss 0 / 0 0 / 1 0 / 1 0 / 1 ops/s
[2022-11-24T09:21:13.180+0530][info][gc,stats ] Memory: Uncommit 0 / 0 0 / 0 0 / 0 0 / 0 MB/s
[2022-11-24T09:21:13.180+0530][info][gc,stats ] Memory: Undo Object Allocation Failed 0 / 0 0 / 1 0 / 1 0 / 1 ops/s
[2022-11-24T09:21:13.180+0530][info][gc,stats ] Memory: Undo Object Allocation Succeeded 0 / 3 4 / 855 1 / 855 1 / 855 ops/s
[2022-11-24T09:21:13.180+0530][info][gc,stats ] Memory: Undo Page Allocation 0 / 3 0 / 4 0 / 4 0 / 4 ops/s
[2022-11-24T09:21:13.180+0530][info][gc,stats ] Phase: Concurrent Mark 3356.120 / 3437.684 812.574 / 3437.684 445.012 / 3437.684 445.012 / 3437.684 ms
[2022-11-24T09:21:13.180+0530][info][gc,stats ] Phase: Concurrent Mark Continue 0.000 / 0.000 0.000 / 0.000 0.000 / 0.000 0.000 / 0.000 ms
[2022-11-24T09:21:13.180+0530][info][gc,stats ] Phase: Concurrent Mark Free 0.001 / 0.001 0.001 / 0.002 0.001 / 0.002 0.001 / 0.002 ms
[2022-11-24T09:21:13.180+0530][info][gc,stats ] Phase: Concurrent Process Non-Strong References 8.650 / 8.848 12.746 / 47.124 8.067 / 47.124 8.067 / 47.124 ms
[2022-11-24T09:21:13.180+0530][info][gc,stats ] Phase: Concurrent Relocate 36.550 / 41.564 14.436 / 55.995 8.357 / 55.995 8.357 / 55.995 ms
[2022-11-24T09:21:13.180+0530][info][gc,stats ] Phase: Concurrent Reset Relocation Set 0.067 / 0.080 0.023 / 0.080 0.013 / 0.080 0.013 / 0.080 ms
[2022-11-24T09:21:13.180+0530][info][gc,stats ] Phase: Concurrent Select Relocation Set 7.214 / 8.713 3.382 / 29.365 2.579 / 29.365 2.579 / 29.365 ms
[2022-11-24T09:21:13.180+0530][info][gc,stats ] Phase: Pause Mark End 0.011 / 0.011 0.011 / 0.018 0.009 / 0.018 0.009 / 0.018 ms
[2022-11-24T09:21:13.180+0530][info][gc,stats ] Phase: Pause Mark Start 0.008 / 0.009 0.008 / 0.016 0.008 / 0.016 0.008 / 0.016 ms
[2022-11-24T09:21:13.180+0530][info][gc,stats ] Phase: Pause Relocate Start 0.008 / 0.008 0.007 / 0.013 0.006 / 0.013 0.006 / 0.013 ms
[2022-11-24T09:21:13.180+0530][info][gc,stats ] Subphase: Concurrent Classes Purge 0.266 / 0.497 0.411 / 5.255 0.251 / 5.255 0.251 / 5.255 ms
[2022-11-24T09:21:13.180+0530][info][gc,stats ] Subphase: Concurrent Classes Unlink 5.018 / 5.358 9.216 / 28.725 5.991 / 28.725 5.991 / 28.725 ms
[2022-11-24T09:21:13.180+0530][info][gc,stats ] Subphase: Concurrent Mark 3352.212 / 3437.123 855.661 / 3437.123 454.341 / 3437.123 454.341 / 3437.123 ms
[2022-11-24T09:21:13.180+0530][info][gc,stats ] Subphase: Concurrent Mark Try Flush 0.030 / 0.055 0.760 / 42.368 0.442 / 42.368 0.442 / 42.368 ms
[2022-11-24T09:21:13.180+0530][info][gc,stats ] Subphase: Concurrent Mark Try Terminate 0.703 / 1.059 1.393 / 27.549 1.126 / 27.549 1.126 / 27.549 ms
[2022-11-24T09:21:13.180+0530][info][gc,stats ] Subphase: Concurrent References Enqueue 0.003 / 0.004 0.003 / 0.013 0.002 / 0.013 0.002 / 0.013 ms
[2022-11-24T09:21:13.180+0530][info][gc,stats ] Subphase: Concurrent References Process 3.055 / 3.524 1.183 / 9.454 0.655 / 9.454 0.655 / 9.454 ms
[2022-11-24T09:21:13.180+0530][info][gc,stats ] Subphase: Concurrent Roots ClassLoaderDataGraph 0.090 / 0.223 0.135 / 12.400 0.105 / 12.400 0.105 / 12.400 ms
[2022-11-24T09:21:13.180+0530][info][gc,stats ] Subphase: Concurrent Roots CodeCache 0.000 / 0.000 0.000 / 0.000 0.000 / 0.000 0.000 / 0.000 ms
[2022-11-24T09:21:13.180+0530][info][gc,stats ] Subphase: Concurrent Roots JavaThreads 0.411 / 1.162 0.520 / 12.391 0.407 / 12.391 0.407 / 12.391 ms
[2022-11-24T09:21:13.180+0530][info][gc,stats ] Subphase: Concurrent Roots OopStorageSet 0.018 / 0.038 0.051 / 12.043 0.030 / 12.043 0.030 / 12.043 ms
[2022-11-24T09:21:13.180+0530][info][gc,stats ] Subphase: Concurrent Weak Roots OopStorageSet 0.259 / 0.263 0.449 / 4.890 0.331 / 4.890 0.331 / 4.890 ms
[2022-11-24T09:21:13.180+0530][info][gc,stats ] Subphase: Pause Mark Try Complete 0.000 / 0.000 0.000 / 0.000 0.000 / 0.000 0.000 / 0.000 ms
[2022-11-24T09:21:13.180+0530][info][gc,stats ] System: Java Threads 28 / 28 26 / 29 21 / 29 21 / 29 threads
[2022-11-24T09:21:13.180+0530][info][gc,stats ] =========================================================================================================================================================
[2022-11-24T09:21:15.944+0530][info][gc,phases ] GC(670) Concurrent Mark 3465.071ms
[2022-11-24T09:21:15.944+0530][info][gc,phases ] GC(670) Pause Mark End 0.011ms
[2022-11-24T09:21:15.944+0530][info][gc,phases ] GC(670) Concurrent Mark Free 0.001ms
[2022-11-24T09:21:15.953+0530][info][gc,phases ] GC(670) Concurrent Process Non-Strong References 8.537ms
[2022-11-24T09:21:15.953+0530][info][gc,phases ] GC(670) Concurrent Reset Relocation Set 0.062ms
[2022-11-24T09:21:15.953+0530][info][gc ] Allocation Stall (lt-primary-pool-2) 704.524ms
[2022-11-24T09:21:15.953+0530][info][gc ] Allocation Stall (lt-primary-pool-0) 704.168ms
[2022-11-24T09:21:15.953+0530][info][gc ] Allocation Stall (lt-secondary-pool-0) 711.668ms
[2022-11-24T09:21:15.955+0530][info][gc ] Allocation Stall (lt-primary-pool-1) 707.109ms
[2022-11-24T09:21:15.955+0530][info][gc ] Allocation Stall (lt-secondary-pool-1) 713.196ms
[2022-11-24T09:21:15.957+0530][info][gc,phases ] GC(670) Concurrent Select Relocation Set 4.082ms
java version "17.0.1" 2021-10-19 LTS
Java(TM) SE Runtime Environment (build 17.0.1+12-LTS-39)
Java HotSpot(TM) 64-Bit Server VM (build 17.0.1+12-LTS-39, mixed mode, sharing)
在GC日志上用GCViewer创建的图形。
ZGC清理问题
当一切都很好的时候,没有问题分配,
发布于 2022-12-02 08:11:57
我们没有找到解决这些问题的理由,但是,我们确实在更仔细地审查这一问题方面取得了一些进展。
当从github检查OpenJDK的ZGC代码时。
它有代码来计算分配率。我们还启用了ZGC的调试日志记录。
-Xlog:gc*=debug:file=gc.log:time,level,tags
与下面最后一行类似的日志记录预测,在应用程序启动的较长时间内,分配率为0。我们怀疑这可能是这个问题的原因之一。然而,在查看GCGraphs时,它并没有完全解释为什么当正常函数中有足够的内存时,它会陷入某种恐慌状态。
我们设置-XX:ZAllocationSpikeTolerance=10
,以确保估计的内存使用量接近2000 be /s的正常分配速率。从那时起,ZGC的内存使用量预测似乎是正确的。到目前为止,我们已经在我们的服务上运行这个设置一段时间了。我们还删除了-XX:SoftMaxHeapSize=
,因为GC所需的频率似乎是由XX:ZAllocationSpikeTolerance
正确估计的。
示例调试日志记录的预测分配率。
[2022-12-02T12:45:29.825+0530][debug][gc,director ] Rule: Allocation Rate (Dynamic GC Workers), MaxAllocRate: 9624.6MB/s (+/-0.6%), Free: 3412MB, GCCPUTime: 0.136, GCDuration: 0.136s, TimeUntilOOM: 0.352s, TimeUntilGC: 0.116s, GCWorkers: 1 -> 1
[2022-12-02T12:45:29.825+0530][debug][gc,director ] Rule: High Usage, Free: 3412MB(83.3%)
[2022-12-02T12:45:29.825+0530][debug][gc,director ] Rule: Proactive, AcceptableGCInterval: 3.394s, TimeSinceLastGC: 0.473s, TimeUntilGC: 2.921s
[2022-12-02T12:45:29.925+0530][debug][gc,alloc ] Allocation Rate: 958.0MB/s, Predicted: 960.9MB/s, Avg: 954.4(+/-5.7)MB/s
[2022-12-02T12:45:29.925+0530][debug][gc,director ] Rule: Timer, Interval: 4.000s, TimeUntilGC: 3.427s
[2022-12-02T12:45:29.925+0530][debug][gc,director ] Select GC Workers (Normal), AvoidLongGCWorkers: 0.013, AvoidOOMGCWorkers: 0.393, LastGCWorkers: 1.000, GCWorkers: 0.393
https://stackoverflow.com/questions/74563986
复制相似问题