JVM的Finalization Delay引起的OOM

今天在压力测试环境某一个服务出现crash了,经过一番检查,终于发现是由于JVM的Finalization Delay引起的,这个问题比较特殊,这里记录一下。

这个服务是用Java写的,主要完成的功能是根据特定的指令文件生成mp4文件,用到的java库主要有javacv,这个库底层其实是使用JNI调用操作系统里安装的ffmpeg。

检查日志文件

首先检查日志文件,发现日志里出现了OOM的报错

java.lang.OutOfMemoryError: null
	at sun.misc.Unsafe.allocateMemory(Native Method) ~[na:1.7.0_79]
	at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:127) ~[na:1.7.0_79]
	at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306) ~[na:1.7.0_79]
	at org.bytedeco.javacv.Frame.<init>(Frame.java:105) ~[javacv-1.1.jar:1.1]
	at org.bytedeco.javacv.Java2DFrameConverter.getFrame(Java2DFrameConverter.java:712) ~[javacv-1.1.jar:1.1]
	at org.bytedeco.javacv.Java2DFrameConverter.getFrame(Java2DFrameConverter.java:679) ~[javacv-1.1.jar:1.1]
	at org.bytedeco.javacv.Java2DFrameConverter.getFrame(Java2DFrameConverter.java:673) ~[javacv-1.1.jar:1.1]
	at org.bytedeco.javacv.Java2DFrameConverter.convert(Java2DFrameConverter.java:62) ~[javacv-1.1.jar:1.1]

很明显这里是申请DirectByteBuffer出现了OOM,所以应该是Direct Memory申请太多了。为了确认问题,将服务跑起来,使用jconsole看了下JVM的堆内存使用情况,发现堆内存使用一直都是比较稳定的,但使用top -p ${pid}查看进程占用的内存,发现RES字段的值一直是在增长的,而且增长得很快,不到半个小时值就从原来的500M增长到1.6G。

分析代码

接下来看一下关键代码

public void encodeFrame(BufferedImage image, long frameTime) {
    try {
        long t = frameTime * 1000L;
        if(t>recorder.getTimestamp()) {
            recorder.setTimestamp(t);
        }
        Frame frame = java2dConverter.convert(image);
        recorder.record(frame);
    } catch (FrameRecorder.Exception e) {
        log.error("JavaCVMp4Encoder encode frame error.", e);
    }
}

业务层是不停地调用encodeFrame将每一张图片编码到mp4文件里。

java2dConverter.convert(image);这句代码的实现里会申请一个DirectByteBuffer。如下面的代码。

public Frame(int width, int height, int depth, int channels) {
    int pixelSize = Math.abs(depth) / 8;
    this.imageWidth = width;
    this.imageHeight = height;
    this.imageDepth = depth;
    this.imageChannels = channels;
    this.imageStride = ((imageWidth * imageChannels * pixelSize + 7) & ~7) / pixelSize; // 8-byte aligned
    this.image = new Buffer[1];

    ByteBuffer buffer = ByteBuffer.allocateDirect(imageHeight * imageStride * pixelSize).order(ByteOrder.nativeOrder());
    switch (imageDepth) {
        case DEPTH_BYTE:
        case DEPTH_UBYTE:  image[0] = buffer;                  break;
        case DEPTH_SHORT:
        case DEPTH_USHORT: image[0] = buffer.asShortBuffer();  break;
        case DEPTH_INT:    image[0] = buffer.asIntBuffer();    break;
        case DEPTH_LONG:   image[0] = buffer.asLongBuffer();   break;
        case DEPTH_FLOAT:  image[0] = buffer.asFloatBuffer();  break;
        case DEPTH_DOUBLE: image[0] = buffer.asDoubleBuffer(); break;
        default: throw new UnsupportedOperationException("Unsupported depth value: " + imageDepth);
    }
}

这里ByteBuffer.allocateDirect方法申请的DirectByteBuffer并不是Java堆内存,而是直接在C堆上申请的。而DirectByteBuffer申请的C堆内存释放很特殊,并不是简单地由JVM GC完成的。

先看一下DirectByteBuffer的定义

class DirectByteBuffer extends MappedByteBuffer implements DirectBuffer {
  ...
  protected static final Unsafe unsafe = Bits.unsafe();
  ...
  private static class Deallocator
        implements Runnable
    {

        private static Unsafe unsafe = Unsafe.getUnsafe();

        private long address;
        private long size;
        private int capacity;

        private Deallocator(long address, long size, int capacity) {
            assert (address != 0);
            this.address = address;
            this.size = size;
            this.capacity = capacity;
        }

        public void run() {
            if (address == 0) {
                // Paranoia
                return;
            }
            unsafe.freeMemory(address);
            address = 0;
            Bits.unreserveMemory(size, capacity);
        }

    }

    private final Cleaner cleaner;
    ...

    DirectByteBuffer(int cap) {                   // package-private

        super(-1, 0, cap, cap);
        boolean pa = VM.isDirectMemoryPageAligned();
        int ps = Bits.pageSize();
        long size = Math.max(1L, (long)cap + (pa ? ps : 0));
        Bits.reserveMemory(size, cap);

        long base = 0;
        try {
            base = unsafe.allocateMemory(size);
        } catch (OutOfMemoryError x) {
            Bits.unreserveMemory(size, cap);
            throw x;
        }
        unsafe.setMemory(base, size, (byte) 0);
        if (pa && (base % ps != 0)) {
            // Round up to page boundary
            address = base + ps - (base & (ps - 1));
        } else {
            address = base;
        }
        cleaner = Cleaner.create(this, new Deallocator(base, size, cap));
        att = null;



    }
    ...
}

可以看到创建DirectByteBuffer对象时实际上使用unsafe.allocateMemory申请一块C堆内存的。DirectByteBuffer对象内部有一个Cleaner cleaner,看样子应该是这个东东负责对申请申请的C堆内存进行释放的。看一下Cleaner的定义:

public class Cleaner extends PhantomReference<Object> {
...
    private static final ReferenceQueue<Object> dummyQueue = new ReferenceQueue();
        private Cleaner(Object var1, Runnable var2) {
        super(var1, dummyQueue);
        this.thunk = var2;
    }

    public static Cleaner create(Object var0, Runnable var1) {
        return var1 == null?null:add(new Cleaner(var0, var1));
    }

    public void clean() {
        if(remove(this)) {
            try {
                this.thunk.run();
            } catch (final Throwable var2) {
                AccessController.doPrivileged(new PrivilegedAction() {
                    public Void run() {
                        if(System.err != null) {
                            (new Error("Cleaner terminated abnormally", var2)).printStackTrace();
                        }

                        System.exit(1);
                        return null;
                    }
                });
            }

        }
    }
...
}

原来Cleaner实际上是一个PhantomReference

PhantomReference虚引用主要用来跟踪对象被垃圾回收器回收的活动。虚引用与软引用和弱引用的一个区别在于:虚引用必须和引用队列 (ReferenceQueue)联合使用。当垃圾回收器准备回收一个对象时,如果发现它还有虚引用,就会在回收对象的内存之前,把这个虚引用加入到与之关联的引用队列中。

在这个场景里也就是说当JVM垃圾回收器准备回收某个DirectByteBuffer对象时,发现这个DirectByteBuffer对象有虚引用,就会将虚引用加入到与之关联的引用队列中。将虚引用加入到与之关联的引用队列中有什么作用?看一下Reference的实现代码

public abstract class Reference<T> {
  ...
  private T referent;
  ...

  static private class Lock { }
    private static Lock lock = new Lock();


    /* List of References waiting to be enqueued.  The collector adds
     * References to this list, while the Reference-handler thread removes
     * them.  This list is protected by the above lock object. The
     * list uses the discovered field to link its elements.
     */
    private static Reference<Object> pending = null;

    /* High-priority thread to enqueue pending References
     */
    private static class ReferenceHandler extends Thread {

        private static void ensureClassInitialized(Class<?> clazz) {
            try {
                Class.forName(clazz.getName(), true, clazz.getClassLoader());
            } catch (ClassNotFoundException e) {
                throw (Error) new NoClassDefFoundError(e.getMessage()).initCause(e);
            }
        }

        static {
            // pre-load and initialize InterruptedException and Cleaner classes
            // so that we don't get into trouble later in the run loop if there's
            // memory shortage while loading/initializing them lazily.
            ensureClassInitialized(InterruptedException.class);
            ensureClassInitialized(Cleaner.class);
        }

        ReferenceHandler(ThreadGroup g, String name) {
            super(g, name);
        }

        public void run() {
            while (true) {
                tryHandlePending(true);
            }
        }
    }

    /**
     * Try handle pending {@link Reference} if there is one.<p>
     * Return {@code true} as a hint that there might be another
     * {@link Reference} pending or {@code false} when there are no more pending
     * {@link Reference}s at the moment and the program can do some other
     * useful work instead of looping.
     *
     * @param waitForNotify if {@code true} and there was no pending
     *                      {@link Reference}, wait until notified from VM
     *                      or interrupted; if {@code false}, return immediately
     *                      when there is no pending {@link Reference}.
     * @return {@code true} if there was a {@link Reference} pending and it
     *         was processed, or we waited for notification and either got it
     *         or thread was interrupted before being notified;
     *         {@code false} otherwise.
     */
    static boolean tryHandlePending(boolean waitForNotify) {
        Reference<Object> r;
        Cleaner c;
        try {
            synchronized (lock) {
                if (pending != null) {
                    r = pending;
                    // 'instanceof' might throw OutOfMemoryError sometimes
                    // so do this before un-linking 'r' from the 'pending' chain...
                    c = r instanceof Cleaner ? (Cleaner) r : null;
                    // unlink 'r' from 'pending' chain
                    pending = r.discovered;
                    r.discovered = null;
                } else {
                    // The waiting on the lock may cause an OutOfMemoryError
                    // because it may try to allocate exception objects.
                    if (waitForNotify) {
                        lock.wait();
                    }
                    // retry if waited
                    return waitForNotify;
                }
            }
        } catch (OutOfMemoryError x) {
            // Give other threads CPU time so they hopefully drop some live references
            // and GC reclaims some space.
            // Also prevent CPU intensive spinning in case 'r instanceof Cleaner' above
            // persistently throws OOME for some time...
            Thread.yield();
            // retry
            return true;
        } catch (InterruptedException x) {
            // retry
            return true;
        }

        // Fast path for cleaners
        if (c != null) {
            c.clean();
            return true;
        }

        ReferenceQueue<? super Object> q = r.queue;
        if (q != ReferenceQueue.NULL) q.enqueue(r);
        return true;
    }

    static {
        ThreadGroup tg = Thread.currentThread().getThreadGroup();
        for (ThreadGroup tgn = tg;
             tgn != null;
             tg = tgn, tgn = tg.getParent());
        Thread handler = new ReferenceHandler(tg, "Reference Handler");
        /* If there were a special system-only priority greater than
         * MAX_PRIORITY, it would be used here
         */
        handler.setPriority(Thread.MAX_PRIORITY);
        handler.setDaemon(true);
        handler.start();

        // provide access in SharedSecrets
        SharedSecrets.setJavaLangRefAccess(new JavaLangRefAccess() {
            @Override
            public boolean tryHandlePendingReference() {
                return tryHandlePending(false);
            }
        });
    }
    ...
    Reference(T referent, ReferenceQueue<? super T> queue) {
        this.referent = referent;
        this.queue = (queue == null) ? ReferenceQueue.NULL : queue;
    }

}

这里代码看着有些糊涂,并没有代码给pending这个类变量赋值,为啥ReferenceHandler这个线程执行体里又在读取它的值,但看了看private static Reference<Object> pending = null;这一行上面的注释,想了想终于明白了,原来JVM垃圾回收器将将虚引用加入到与之关联的引用队列后,JVM垃圾回收器又负责逐个将引用队列中的引用拿出来赋于pending,然后通知ReferenceHandler线程,ReferenceHandler线程拿到引用后,发现如果是Cleaner,则调用其clean方法。然后终于与DirectByteBuffer里的Deallocator接上了,最终DirectByteBuffer申请的C堆内存被释放。

既然DirectByteBuffer申请的C堆内存释放是自动的,为啥在这个场景里会出现OOM呢?查阅java的bug记录,终于找到原因。http://bugs.java.com/bugdatabase/view_bug.do?bug_id=4857305http://bugs.java.com/bugdatabase/view_bug.do?bug_id=4469299

意思是如果DirectByteBuffer创建得过于频繁,服务器的CPU太繁忙,C堆内存还是会OOM的,原因是JVM来不及进行GC及Finalization,大量对象的销毁工作被推后,最终C堆内存无法得到释放。

解决方案

bug记录提到了3个解决方案:

Insert occasional explicit System.gc() invocations to ensure that direct buffers are reclaimed.

Reduce the size of the young generation to force more frequent GCs.

Explicitly pool direct buffers at the application level.

我这里采用了第一个解决方案,代码如下:

public void encodeFrame(BufferedImage image, long frameTime) {
    try {
        long t = frameTime * 1000L;
        if(t>recorder.getTimestamp()) {
            recorder.setTimestamp(t);
        }
        Frame frame = java2dConverter.convert(image);
        recorder.record(frame);
        if(System.currentTimeMillis() - lastGCTime > 60000){
            System.gc();
            System.runFinalization();
            lastGCTime = System.currentTimeMillis();
            Thread.yield();
            TimeUnit.SECONDS.sleep(3);
        }
    } catch (FrameRecorder.Exception e) {
        log.error("JavaCVMp4Encoder encode frame error.", e);
    }
}

意思是说每隔1分钟显式地调用System.gc();System.runFinalization();,并让出CPU休息3秒钟。经过长达10几个小时的测试,目前一切都正常了。

本文参与腾讯云自媒体分享计划,欢迎正在阅读的你也加入,一起分享。

发表于

我来说两句

0 条评论
登录 后参与评论

相关文章

来自专栏我的小碗汤

golang 设置 http response 响应头与坑

之前遇到个问题,在一段代码中这样设置WriteHeader,最后在header中取Name时怎么也取不到。

852
来自专栏数据结构与算法

BZOJ1269: [AHOI2006]文本编辑器editor

Descriptio 这些日子,可可不和卡卡一起玩了,原来可可正废寝忘食的想做一个简单而高效的文本编辑器。你能帮助他吗? 为了明确任务目标,可可对“文本编辑器...

2757
来自专栏mini188

java中的锁

java中有哪些锁 这个问题在我看了一遍<java并发编程>后尽然无法回答,说明自己对于锁的概念了解的不够。于是再次翻看了一下书里的内容,突然有点打开脑门的感觉...

3279
来自专栏恰同学骚年

Hadoop学习笔记—7.计数器与自定义计数器

  在上图所示中,计数器有19个,分为四个组:File Output Format Counters、FileSystemCounters、File Input...

1022
来自专栏向治洪

JNI动态注册native方法及JNI数据使用

前言 或许你知道了jni的简单调用,其实不算什么百度谷歌一大把,虽然这些jni绝大多数情况下都不会让我们安卓工程师来弄,毕竟还是有点难,但是我们还是得打破砂锅知...

2126
来自专栏Hongten

JSP 九大内置对象

① out - javax.servlet.jsp.jspWriter    out对象用于把结果输出到网页上。

3712
来自专栏喔家ArchiSelf

从构造函数看线程安全

线程是编程中常用而且强大的手段,在使用过程中,我们经常面对的就是线程安全问题了。对于Java中常见的数据结构而言,一般的,ArrayList是非线程安全的,Ve...

1032
来自专栏Spark生态圈

[Spark SQL] 源码解析之Analyzer

Analyzer模块将Unresolved LogicalPlan结合元数据catalog进行绑定,最终转化为Resolved LogicalPlan。跟着代码...

1232
来自专栏冰霜之地

高效的序列化/反序列化数据方式 Protobuf

上篇文章中其实已经讲过了 encode 的过程,这篇文章以 golang 为例,从代码实现的层面讲讲序列化和反序列化的过程。

4915
来自专栏coolblog.xyz技术专栏

MyBatis 源码分析 - SQL 的执行过程

本篇文章较为详细的介绍了 MyBatis 执行 SQL 的过程。该过程本身比较复杂,牵涉到的技术点比较多。包括但不限于 Mapper 接口代理类的生成、接口方法...

4182

扫码关注云+社区