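The Mystery of Crashes Caused by ContentProvider

ContentProvider (hereafter CP) is one of Android's four core components. It exposes database-style create, read, update, and delete operations and also works across processes. In a cross-process call, the process that supplies the data is called the Server process and the process that requests it is called the Client process. While enjoying this convenience, we may not realize that the Client process is at risk of being killed.

1. Log Analysis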

06-06 21:57:52.892   916  2275 I ActivityManager: Start proc 26931:com.example.music/u0a103 for content provider com.example.music/.sharedfileaccessor.ContentProviderImpl
06-06 21:57:53.393   916   941 I ActivityManager: Process com.example.music (pid 26931) has died
06-06 21:57:53.423   916   941 I ActivityManager: Killing 16141:com.example.music:service/u0a103 (adj 2): depends on provider com.example.music/.sharedfileaccessor.ContentProviderImpl in dying proc com.example.music

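These are the three key log lines showing the Client process being killed by ActivityManagerService (hereafter AMS):

  1. Line 1: the CP's Server process is not running, so AMS starts it first;
  2. Line 2: shortly after starting, the Server process dies for some reason (in practice, possibly the low memory killer);
  3. Line 3: because the CP's Server process has died, AMS kills the CP's Client process.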
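When triaging a crash report, it can help to search the log for this kill pattern mechanically. The following is a minimal sketch, assuming the log lines have the same shape as the excerpt above. The class name ProviderKillScanner and its regular expression are illustrative and not part of Android, and other Android versions may format the line differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ProviderKillScanner {
    // Matches the AMS "Killing ..." line from the log excerpt above.
    // Groups: 1 = client pid, 2 = client process name, 3 = adj,
    //         4 = provider component, 5 = dying server process.
    private static final Pattern KILL = Pattern.compile(
            "Killing (\\d+):(\\S+)/\\S+ \\(adj (-?\\d+)\\): "
            + "depends on provider (\\S+) in dying proc (\\S+)");

    /** Returns one human-readable summary per matching log line. */
    static List<String> scan(List<String> lines) {
        List<String> hits = new ArrayList<>();
        for (String line : lines) {
            Matcher m = KILL.matcher(line);
            if (m.find()) {
                hits.add("client " + m.group(2) + " (pid " + m.group(1)
                        + ") killed; provider " + m.group(4)
                        + " was hosted by dying proc " + m.group(5));
            }
        }
        return hits;
    }
}
```

Feeding the three log lines above into scan() should report a single hit for the com.example.music:service client.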

So what mechanism causes an innocent Client process to be killed by AMS? Answering that requires reading the source code.

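2. Cleaning Up the CPs of a Dead Process

First, let's dig into the Android source (the analysis below is based on 6.x) and start from the "has died" log line to see what cleanup AMS performs for a process that has already died.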

final void appDiedLocked(ProcessRecord app, int pid, IApplicationThread thread,
        boolean fromBinderDied) {
    // Clean up already done if the process has been re-started.
    if (app.pid == pid && app.thread != null &&
            app.thread.asBinder() == thread.asBinder()) {
        boolean doLowMem = app.instrumentationClass == null;
        boolean doOomAdj = doLowMem;
        if (!app.killedByAm) {
            // The "has died" log is printed here!!!
            Slog.i(TAG, "Process " + app.processName + " (pid " + pid
                    + ") has died");
            mAllowLowerMemLevel = true;
        } else {
            // Note that we always want to do oom adj to update our state with the
            // new number of procs.
            mAllowLowerMemLevel = false;
            doLowMem = false;
        }
        EventLog.writeEvent(EventLogTags.AM_PROC_DIED, app.userId, app.pid, app.processName);
        if (DEBUG_CLEANUP) Slog.v(TAG_CLEANUP,
            "Dying app: " + app + ", pid: " + pid + ", thread: " + thread.asBinder());
        handleAppDiedLocked(app, false, true);
        if (doOomAdj) {
            updateOomAdjLocked();
        }
        if (doLowMem) {
            doLowMemReportIfNeededLocked(app);
        }
    } else if (app.pid != pid) {
        // A new process has already been started.
        Slog.i(TAG, "Process " + app.processName + " (pid " + pid
                + ") has died and restarted (pid " + app.pid + ").");
        EventLog.writeEvent(EventLogTags.AM_PROC_DIED, app.userId, app.pid, app.processName);
    } else if (DEBUG_PROCESSES) {
        Slog.d(TAG_PROCESSES, "Received spurious death notification for thread "
                + thread.asBinder());
    }
}

The "has died" log line is printed in AMS's appDiedLocked() method. Execution then continues into handleAppDiedLocked().

private final void handleAppDiedLocked(ProcessRecord app,
        boolean restarting, boolean allowRestart) {
    int pid = app.pid;
    boolean kept = cleanUpApplicationRecordLocked(app, restarting, allowRestart, -1);
}

private final boolean cleanUpApplicationRecordLocked(ProcessRecord app,
        boolean restarting, boolean allowRestart, int index) {
    // Take care of any launching providers waiting for this process.
    if (cleanupAppInLaunchingProvidersLocked(app, false)) {
        restart = true;
    }
}

boolean cleanupAppInLaunchingProvidersLocked(ProcessRecord app, boolean alwaysBad) {
    // Look through the content providers we are waiting to have launched,
    // and if any run in this process then either schedule a restart of
    // the process or kill the client waiting for it if this process has
    // gone bad.
    boolean restart = false;
    for (int i = mLaunchingProviders.size() - 1; i >= 0; i--) {
        ContentProviderRecord cpr = mLaunchingProviders.get(i);
        if (cpr.launchingApp == app) {
            if (!alwaysBad && !app.bad && cpr.hasConnectionOrHandle()) {
                restart = true;
            } else {
                removeDyingProviderLocked(app, cpr, true);
            }
        }
    }
    return restart;
}

private final boolean removeDyingProviderLocked(ProcessRecord proc,
        ContentProviderRecord cpr, boolean always) {
    for (int i = cpr.connections.size() - 1; i >= 0; i--) {
        ContentProviderConnection conn = cpr.connections.get(i);
        if (conn.waiting) {
            // If this connection is waiting for the provider, then we don't
            // need to mess with its process unless we are always removing
            // or for some reason the provider is not currently launching.
            if (inLaunching && !always) {
                continue;
            }
        }
        //Got the information of the Client Process of this ContentProvider!!!
        ProcessRecord capp = conn.client;
        conn.dead = true;
        // Important check: stableCount must be greater than 0.
        if (conn.stableCount > 0) {
            if (!capp.persistent && capp.thread != null
                    && capp.pid != 0
                    && capp.pid != MY_PID) {

                //This is exactly where the Client Process is killed!!!
                capp.kill("depends on provider "
                        + cpr.name.flattenToShortString()
                        + " in dying proc " + (proc != null ? proc.processName : "??")
                        + " (adj " + (proc != null ? proc.setAdj : "??") + ")", true);
            }
        } else if (capp.thread != null && conn.provider.provider != null) {
        }
    }
}

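Following the call chain handleAppDiedLocked() -> cleanUpApplicationRecordLocked() -> cleanupAppInLaunchingProvidersLocked() -> removeDyingProviderLocked(), we finally reach the place where the Client process is killed, and the message it logs matches the log line we captured.

However, even inside removeDyingProviderLocked(), several conditions must hold before the code that kills the Client process is reached. The most important one is conn.stableCount > 0. So when is the stableCount of a ContentProviderConnection (hereafter CPC) incremented, and when is it decremented?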
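To make those conditions easier to see in isolation, here is a simplified model of the decision in removeDyingProviderLocked(). It is a sketch under assumptions, not AMS code: the Conn class, its field names, and clientsToKill() are hypothetical stand-ins for the ContentProviderConnection and ProcessRecord fields used in the excerpt, and the MY_PID value is only a placeholder.

```java
import java.util.ArrayList;
import java.util.List;

class DyingProviderModel {
    // Placeholder for system_server's own pid (MY_PID in the excerpt).
    // 916 is taken from the log sample and is purely illustrative.
    static final int MY_PID = 916;

    /** Hypothetical, flattened view of one connection and its client. */
    static final class Conn {
        final String clientName;
        final int clientPid;
        final boolean clientPersistent;
        final boolean clientAttached;   // stands in for capp.thread != null
        final boolean waiting;
        final int stableCount;

        Conn(String clientName, int clientPid, boolean clientPersistent,
                boolean clientAttached, boolean waiting, int stableCount) {
            this.clientName = clientName;
            this.clientPid = clientPid;
            this.clientPersistent = clientPersistent;
            this.clientAttached = clientAttached;
            this.waiting = waiting;
            this.stableCount = stableCount;
        }
    }

    /** Returns the names of the clients the excerpt's conditions would kill. */
    static List<String> clientsToKill(List<Conn> conns,
            boolean inLaunching, boolean always) {
        List<String> killed = new ArrayList<>();
        for (Conn c : conns) {
            // A client still waiting on a launching provider is left alone
            // unless the caller asked to always remove.
            if (c.waiting && inLaunching && !always) {
                continue;
            }
            // Only a stable reference puts the client at risk.
            if (c.stableCount > 0 && !c.clientPersistent && c.clientAttached
                    && c.clientPid != 0 && c.clientPid != MY_PID) {
                killed.add(c.clientName);
            }
        }
        return killed;
    }
}
```

In this model a client holding only unstable references (stableCount == 0) or a persistent client is never added to the kill list, which mirrors the branch structure of the excerpt.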

3. Incrementing the CPC's stableCount

stableCount is incremented in AMS's incProviderCountLocked() method. Inside AMS the call chain is AMS.getContentProvider() -> AMS.getContentProviderImpl() -> AMS.incProviderCountLocked():

ContentProviderConnection incProviderCountLocked(ProcessRecord r,
        final ContentProviderRecord cpr, IBinder externalProcessToken, boolean stable) {
    if (r != null) {
        for (int i=0; i<r.conProviders.size(); i++) {
            ContentProviderConnection conn = r.conProviders.get(i);
            if (conn.provider == cpr) {
                if (stable) {
                    //The stableCount is increased here!!!
                    conn.stableCount++;
                    conn.numStableIncs++;
                }
                // An existing connection was found, so reuse it.
                return conn;
            }
        }
        // No connection to this provider was found in conProviders, so create
        // a new one and, for a stable reference, initialize stableCount to 1.
        ContentProviderConnection conn = new ContentProviderConnection(cpr, r);
        if (stable) {
            conn.stableCount = 1;
            conn.numStableIncs = 1;
        }
        cpr.connections.add(conn);
        r.conProviders.add(conn);
        return conn;
    }
    // Handling of callers without a ProcessRecord is omitted from this excerpt.
    return null;
}

private ContentProviderHolder getContentProviderImpl(IApplicationThread caller,
        String name, IBinder token, boolean stable, int userId) {
    synchronized(this) {
        boolean providerRunning = cpr != null && cpr.proc != null && !cpr.proc.killed;
        if (providerRunning) {
            conn = incProviderCountLocked(r, cpr, token, stable);
        }

        if (!providerRunning) {
            conn = incProviderCountLocked(r, cpr, token, stable);
        }
        checkTime(startTime, "getContentProviderImpl: done!");
    }
}

@Override
public final ContentProviderHolder getContentProvider(
        IApplicationThread caller, String name, int userId, boolean stable) {
    return getContentProviderImpl(caller, name, null, stable, userId);
}

AMS.getContentProvider() is called from ActivityThread (hereafter AT).

public final IContentProvider acquireProvider(
        Context c, String auth, int userId, boolean stable) {
    try {
        holder = ActivityManagerNative.getDefault().getContentProvider(
                getApplicationThread(), auth, userId, stable);
    } catch (RemoteException ex) {
        throw ex.rethrowFromSystemServer();
    }
    return holder.provider;
}

AT.acquireProvider() is in turn called from ContextImpl.

private static final class ApplicationContentResolver extends ContentResolver {
    private final ActivityThread mMainThread;
    private final UserHandle mUser;

    public ApplicationContentResolver(
            Context context, ActivityThread mainThread, UserHandle user) {
        super(context);
        mMainThread = Preconditions.checkNotNull(mainThread);
        mUser = Preconditions.checkNotNull(user);
    }

    @Override
    protected IContentProvider acquireProvider(Context context, String auth) {
        return mMainThread.acquireProvider(context,
                ContentProvider.getAuthorityWithoutUserId(auth),
                resolveUserIdFromAuthority(auth), true);
    }
}

private ContextImpl(ContextImpl container, ActivityThread mainThread,
        LoadedApk packageInfo, IBinder activityToken, UserHandle user, int flags,
        Display display, Configuration overrideConfiguration, int createDisplayWithId) {
    //Create a new instance in the constructor.
    mContentResolver = new ApplicationContentResolver(this, mainThread, user);
}

@Override
public ContentResolver getContentResolver() {
    return mContentResolver;
}

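ContextImpl is the implementation class behind the Context that Android app developers work with all the time. Its constructor instantiates mContentResolver, which is what getContentResolver() returns, and getContentResolver() is a method we always call when using a ContentProvider.

mContentResolver is of type ApplicationContentResolver (hereafter ACR), an implementation of ContentResolver (hereafter CR). ACR's acquireProvider() simply returns the result of AT.acquireProvider(). ACR.acquireProvider() is itself called from CR.acquireProvider():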

public final IContentProvider acquireProvider(Uri uri) {
    if (!SCHEME_CONTENT.equals(uri.getScheme())) {
        return null;
    }
    final String auth = uri.getAuthority();
    if (auth != null) {
        // calls the abstract method, which is implemented in the ApplicationContentResolver
        return acquireProvider(mContext, auth);
    }
    return null;
}
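The guard logic in CR.acquireProvider() can be tried outside Android with a small sketch. This is an approximation under assumptions: it uses java.net.URI instead of android.net.Uri, and the AuthorityResolver class and authorityFor() method are hypothetical names introduced only for illustration.

```java
import java.net.URI;

class AuthorityResolver {
    static final String SCHEME_CONTENT = "content";

    /**
     * Returns the authority that would be handed on to the provider lookup,
     * or null when the URI cannot name a provider (wrong scheme or no authority).
     */
    static String authorityFor(URI uri) {
        if (!SCHEME_CONTENT.equals(uri.getScheme())) {
            return null;
        }
        return uri.getAuthority();
    }
}
```

Only a content:// URI with a non-empty authority gets past this check; everything after it (the stable acquire and the AMS round trip) happens solely for such URIs.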

In each of CR's insert, delete, update, and query methods, calls to acquireProvider() and releaseProvider() appear in pairs.

public final @Nullable Uri insert(@RequiresPermission.Write @NonNull Uri url,
            @Nullable ContentValues values) {
    IContentProvider provider = acquireProvider(url);
    try {
    } catch (RemoteException e) {
    } finally {
        releaseProvider(provider);
    }
}

public final int delete(@RequiresPermission.Write @NonNull Uri url, @Nullable String where,
        @Nullable String[] selectionArgs) {
    IContentProvider provider = acquireProvider(url);
    try {
    } catch (RemoteException e) {
    } finally {
        releaseProvider(provider);
    }
}

public final @Nullable Cursor query(final @RequiresPermission.Read @NonNull Uri uri,
        @Nullable String[] projection, @Nullable String selection,
        @Nullable String[] selectionArgs, @Nullable String sortOrder,
        @Nullable CancellationSignal cancellationSignal) {
    try {
        try {
            qCursor = unstableProvider.query(mPackageName, uri, projection,
                    selection, selectionArgs, sortOrder, remoteCancellationSignal);
        } catch (DeadObjectException e) {
            stableProvider = acquireProvider(uri);
        }
    } catch (RemoteException e) {
        // Arbitrary and not worth documenting, as Activity
        // Manager will kill this process shortly anyway.
        return null;
    } finally {
        if (stableProvider != null) {
            releaseProvider(stableProvider);
        }
    }
}

4. Decrementing the CPC's stableCount

A call to CR.releaseProvider() is very likely what decrements stableCount, so let's keep tracing the code to confirm it. ACR.releaseProvider() calls AT.releaseProvider() directly.

private static final class ApplicationContentResolver extends ContentResolver {
    @Override
    public boolean releaseProvider(IContentProvider provider) {
        return mMainThread.releaseProvider(provider, true);
    }
}

Sure enough, stableCount is decremented by 1 in AT.releaseProvider().

public final boolean releaseProvider(IContentProvider provider, boolean stable) {
    synchronized (mProviderMap) {
        if (stable) {
            prc.stableCount -= 1;
        }
    }
}

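At this point we understand how AMS comes to kill a CP's Client process: if the Server process dies while a CR method call is in progress, then when AMS cleans up the dead Server process's CPs, it kills every Client process whose connection has stableCount > 0.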
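The risk window can be pictured with a toy model of the acquire/release pairing. This is a deliberately simplified sketch, not the framework's bookkeeping: StableRefWindow, call(), and killedIfServerDiesNow() are hypothetical names, and a single counter stands in for the client-side and AMS-side reference counts.

```java
import java.util.function.Supplier;

class StableRefWindow {
    // Models the stable reference count for one client/provider pair.
    private int stableCount;

    /** Runs an operation the way CR's insert()/delete() wrap their work. */
    <T> T call(Supplier<T> operation) {
        stableCount++;              // acquireProvider(), stable reference
        try {
            return operation.get();
        } finally {
            stableCount--;          // releaseProvider() in the finally block
        }
    }

    /** The condition AMS checks while cleaning up a dead Server process. */
    boolean killedIfServerDiesNow() {
        return stableCount > 0;
    }
}
```

For example:

```java
StableRefWindow window = new StableRefWindow();
boolean before = window.killedIfServerDiesNow();             // no call in flight
boolean during = window.call(window::killedIfServerDiesNow); // evaluated mid-call
boolean after = window.killedIfServerDiesNow();              // reference released
```

In this model the client is exposed only while a call is in flight, so the longer a provider operation runs, the wider the window in which a Server death takes the Client down with it.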

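5. Another Scenario in Which AMS Kills the Client Process

In AMS.getContentProviderImpl(), if the CP's Server process is not running, startProcessLocked() is called to start it, and then incProviderCountLocked() is called to increment stableCount. Once the Server process has started, it calls AMS.attachApplicationLocked(). Two pieces of its logic matter here:

  • It sends mHandler a CONTENT_PROVIDER_PUBLISH_TIMEOUT_MSG message delayed by 10 seconds;
  • It calls ActivityThread.bindApplication().
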
private ContentProviderHolder getContentProviderImpl(IApplicationThread caller,
        String name, IBinder token, boolean stable, int userId) {
    synchronized(this) {
        boolean providerRunning = cpr != null && cpr.proc != null && !cpr.proc.killed;
        if (!providerRunning) {
            // If the provider is not already being launched, then get it
            // started.
            if (i >= N) {
                try {
                    ProcessRecord proc = getProcessRecordLocked(
                            cpi.processName, cpr.appInfo.uid, false);
                    if (proc != null && proc.thread != null && !proc.killed) {
                    } else {
                        proc = startProcessLocked(cpi.processName,
                                cpr.appInfo, false, 0, "content provider",
                                new ComponentName(cpi.applicationInfo.packageName,
                                        cpi.name), false, false, false);
                    }
                }
            }
            //increase stableCount
            conn = incProviderCountLocked(r, cpr, token, stable);
        }
    }
}

private final boolean attachApplicationLocked(IApplicationThread thread,
        int pid) {
    if (providers != null && checkAppInLaunchingProvidersLocked(app)) {
        Message msg = mHandler.obtainMessage(CONTENT_PROVIDER_PUBLISH_TIMEOUT_MSG);
        msg.obj = app;
        mHandler.sendMessageDelayed(msg, CONTENT_PROVIDER_PUBLISH_TIMEOUT);
    }
    try {
        ProfilerInfo profilerInfo = profileFile == null ? null
                : new ProfilerInfo(profileFile, profileFd, samplingInterval, profileAutoStop);
        thread.bindApplication(processName, appInfo, providers, app.instrumentationClass,
                profilerInfo, app.instrumentationArguments, app.instrumentationWatcher,
                app.instrumentationUiAutomationConnection, testMode,
                mBinderTransactionTrackingEnabled, enableTrackAllocation,
                isRestrictedBackupMode || !normalMode, app.persistent,
                new Configuration(mConfiguration), app.compat,
                getCommonServicesLocked(app.isolated),
                mCoreSettingsObserver.getCoreSettingsLocked());
    }
}

Handling CONTENT_PROVIDER_PUBLISH_TIMEOUT_MSG calls AMS.processContentProviderPublishTimedOutLocked() -> AMS.cleanupAppInLaunchingProvidersLocked(). The latter was covered above: when it finds CPC.stableCount > 0, AMS kills the Client process.

final class MainHandler extends Handler {
    public MainHandler(Looper looper) {
        super(looper, null, true);
    }

    @Override
    public void handleMessage(Message msg) {
        switch (msg.what) {
        case CONTENT_PROVIDER_PUBLISH_TIMEOUT_MSG: {
            ProcessRecord app = (ProcessRecord)msg.obj;
            synchronized (ActivityManagerService.this) {
                processContentProviderPublishTimedOutLocked(app);
            }
        } break;
        }
    }
}

private final void processContentProviderPublishTimedOutLocked(ProcessRecord app) {
    cleanupAppInLaunchingProvidersLocked(app, true);
    removeProcessLocked(app, false, true, "timeout publishing content providers");
}

With only a 10-second countdown, let's see how AT.bindApplication() comes to the rescue. bindApplication() sends a BIND_APPLICATION message to the Handler.

private class ApplicationThread extends ApplicationThreadNative {
    public final void bindApplication(String processName, ApplicationInfo appInfo,
            List<ProviderInfo> providers, ComponentName instrumentationName,
            ProfilerInfo profilerInfo, Bundle instrumentationArgs,
            IInstrumentationWatcher instrumentationWatcher,
            IUiAutomationConnection instrumentationUiConnection, int debugMode,
            boolean enableBinderTracking, boolean trackAllocation,
            boolean isRestrictedBackupMode, boolean persistent, Configuration config,
            CompatibilityInfo compatInfo, Map<String, IBinder> services, Bundle coreSettings) {
        sendMessage(H.BIND_APPLICATION, data);
    }
}

When AT's Handler processes the message, it calls handleBindApplication().

            case BIND_APPLICATION:
                handleBindApplication(data);
                break;

The chain continues as AT.handleBindApplication() -> AT.installContentProviders(). Two things in the latter deserve attention:

  • installProvider() calls ContentProvider.attachInfo();
  • it calls AMS.publishContentProviders().

private void handleBindApplication(AppBindData data) {
    try {
        if (!data.restrictedBackupMode) {
            if (!ArrayUtils.isEmpty(data.providers)) {
                installContentProviders(app, data.providers);
            }
        }
    }
}

private void installContentProviders(
        Context context, List<ProviderInfo> providers) {
    final ArrayList<IActivityManager.ContentProviderHolder> results =
            new ArrayList<IActivityManager.ContentProviderHolder>();

    for (ProviderInfo cpi : providers) {
        IActivityManager.ContentProviderHolder cph = installProvider(context, null, cpi,
                false /*noisy*/, true /*noReleaseNeeded*/, true /*stable*/);
        if (cph != null) {
            cph.noReleaseNeeded = true;
            results.add(cph);
        }
    }

    try {
        ActivityManagerNative.getDefault().publishContentProviders(
            getApplicationThread(), results);
    } catch (RemoteException ex) {
        throw ex.rethrowFromSystemServer();
    }
}

private IActivityManager.ContentProviderHolder installProvider(Context context,
        IActivityManager.ContentProviderHolder holder, ProviderInfo info,
        boolean noisy, boolean noReleaseNeeded, boolean stable) {
    ContentProvider localProvider = null;
    IContentProvider provider;
    if (holder == null || holder.provider == null) {
        try {
            //attachInfo calls the ContentProvider.onCreate() method
            localProvider.attachInfo(c, info);
        }
    }
}

CP.attachInfo() calls CP.onCreate(), which is an abstract method that we must implement when writing our own CP.

public void attachInfo(Context context, ProviderInfo info) {
    attachInfo(context, info, false);
}

private void attachInfo(Context context, ProviderInfo info, boolean testing) {
    mNoPerms = testing;

    /*
     * Only allow it to be set once, so after the content service gives
     * this to us clients can't change it.
     */
    if (mContext == null) {
        ContentProvider.this.onCreate();
    }
}

public abstract boolean onCreate();

In AMS.publishContentProviders() we finally find the key that defuses the time bomb: CONTENT_PROVIDER_PUBLISH_TIMEOUT_MSG is removed!

public final void publishContentProviders(IApplicationThread caller,
        List<ContentProviderHolder> providers) {
    synchronized (this) {
        final int N = providers.size();
        for (int i = 0; i < N; i++) {
            if (dst != null) {
                if (wasInLaunchingProviders) {
                    mHandler.removeMessages(CONTENT_PROVIDER_PUBLISH_TIMEOUT_MSG, r);
                }
            }
        }
    }
}

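So AMS gives the CP's Server process 10 seconds to start up and finish preparing the CP (which includes CP.onCreate()); otherwise the Client process cannot escape being killed.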
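The arm-then-disarm behaviour of the publish timeout can be sketched with a virtual clock, so that no real waiting is needed. This is a toy model under assumptions: PublishTimeoutModel and its methods are hypothetical names, the list of pending entries stands in for mHandler's message queue, and the model records only the killed client, leaving out the removal of the timed-out Server process.

```java
import java.util.ArrayList;
import java.util.List;

class PublishTimeoutModel {
    // The 10-second window described above, in milliseconds.
    static final long CONTENT_PROVIDER_PUBLISH_TIMEOUT = 10_000;

    /** One armed timeout: the launching server and the client waiting on it. */
    private static final class Pending {
        final String serverProc;
        final String stableClient;
        final long fireAt;

        Pending(String serverProc, String stableClient, long fireAt) {
            this.serverProc = serverProc;
            this.stableClient = stableClient;
            this.fireAt = fireAt;
        }
    }

    private final List<Pending> pending = new ArrayList<>();
    private final List<String> killedClients = new ArrayList<>();
    private long now; // virtual clock in milliseconds

    /** Models attachApplicationLocked(): arm the timeout for this server. */
    void onServerAttached(String serverProc, String stableClient) {
        pending.add(new Pending(serverProc, stableClient,
                now + CONTENT_PROVIDER_PUBLISH_TIMEOUT));
    }

    /** Models publishContentProviders(): the server published in time. */
    void onProvidersPublished(String serverProc) {
        pending.removeIf(p -> p.serverProc.equals(serverProc));
    }

    /** Advances the virtual clock and fires every timeout that has expired. */
    void advance(long millis) {
        now += millis;
        List<Pending> expired = new ArrayList<>();
        for (Pending p : pending) {
            if (p.fireAt <= now) {
                expired.add(p);
            }
        }
        pending.removeAll(expired);
        for (Pending p : expired) {
            // The client holding a stable reference pays for the slow server.
            killedClients.add(p.stableClient);
        }
    }

    List<String> killedClients() {
        return killedClients;
    }
}
```

If onProvidersPublished() is called before the virtual clock passes the 10-second mark, the pending entry is removed and advance() never kills anyone. If it is not, the waiting client ends up in killedClients().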

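6. Summary

When choosing ContentProvider as a cross-process communication mechanism, we need to account for the possibility that the Client process gets killed, because this does not appear to be fully avoidable.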

7. References

Understanding how ContentProvider works: http://gityuan.com/2016/07/30/content-provider/
ContentProvider reference counting: http://gityuan.com/2016/05/03/content_provider_release/

8. Source Files

frameworks/base/services/core/java/com/android/server/am/ActivityManagerService.java
frameworks/base/core/java/android/app/ActivityThread.java
frameworks/base/core/java/android/app/ContextImpl.java
frameworks/base/core/java/android/content/ContentResolver.java

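Original-content notice: this article is published in the Cloud+ Community column with the author's authorization and may not be reproduced without permission.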
