前往小程序,Get更优阅读体验!
立即前往
首页
学习
活动
专区
工具
TVP
发布
社区首页 >专栏 >死锁问题分析的利器——valgrind的DRD和Helgrind

死锁问题分析的利器——valgrind的DRD和Helgrind

作者头像
方亮
发布2019-01-16 17:14:19
1.6K0
发布2019-01-16 17:14:19
举报
文章被收录于专栏:方亮方亮

在《DllMain中不当操作导致死锁问题的分析--死锁介绍》一文中,我们介绍了死锁产生的原因。一般来说,如果我们对线程同步技术掌握不牢,或者同步方案混乱,极容易导致死锁。本文我们将介绍如何使用valgrind排查死锁问题。(转载请指明出于breaksoftware的csdn博客)

        构造一个场景

代码语言:javascript
复制
#include <pthread.h>

pthread_mutex_t s_mutex_a;
pthread_mutex_t s_mutex_b;
pthread_barrier_t s_barrier;

void lock() {
    pthread_mutex_lock(&s_mutex_b);
    {
        pthread_barrier_wait(&s_barrier);

        pthread_mutex_lock(&s_mutex_a);
        pthread_mutex_unlock(&s_mutex_a);
    }
    pthread_mutex_unlock(&s_mutex_b);
}

static void* thread_routine(void* arg) {
    pthread_mutex_lock(&s_mutex_a);
    {
        pthread_barrier_wait(&s_barrier);

        pthread_mutex_lock(&s_mutex_b);
        pthread_mutex_unlock(&s_mutex_b);
    }
    pthread_mutex_unlock(&s_mutex_a);
}

int main(int argc, char** argv) {
    pthread_t tid;

    pthread_mutex_init(&s_mutex_a, 0);
    pthread_mutex_init(&s_mutex_b, 0);
    pthread_barrier_init(&s_barrier, 0, 2);

    pthread_create(&tid, 0, &thread_routine, 0);

    lock();

    pthread_join(tid, 0);
    pthread_cancel(tid);

    pthread_barrier_destroy(&s_barrier);
    pthread_mutex_destroy(&s_mutex_a);
    pthread_mutex_destroy(&s_mutex_b);

    return 0;
}

        这段代码我们只要关注lock和thread_routine两个方法。

        lock方法在主线程中执行,它先给s_mutex_b上锁,然后通过屏障s_barrier等待线程也执行到屏障处(第21行)。

        thread_routine是线程函数,它先给s_mutex_a上锁,然后通过屏障s_barrier等待主线程也执行到屏障处(第10行)。

        主线程和子线程都执行到屏障处后,屏障被打开,它们继续向下执行:主线程执行到第12行试图获取s_mutex_a;子线程执行到第23行试图获取s_mutex_b。由于这两个互斥量已经被占用,所以产生死锁。

        这是通过代码分析出来的,但是对于比较大的工程项目,我们则需要通过工具来分析。下面我们使用valgrind来分析

代码语言:javascript
复制
valgrind --tool=drd --trace-mutex=yes ./dead_lock

        我们使用上面指令,让valgrind把互斥量相关的信息给打印出来

代码语言:javascript
复制
==4749== [1] mutex_init      mutex 0x30a040
==4749== [1] mutex_init      mutex 0x30a0a0
==4749== [1] mutex_init      mutex 0x1ffefffe10
==4749== [1] mutex_ignore_ordering mutex 0x1ffefffe10
==4749== [1] mutex_trylock   mutex 0x1ffefffe10 rc 0 owner 0
==4749== [1] post_mutex_lock mutex 0x1ffefffe10 rc 0 owner 0
==4749== [1] mutex_unlock    mutex 0x1ffefffe10 rc 1
==4749== [2] mutex_trylock   mutex 0x1ffefffe10 rc 0 owner 1
==4749== [2] post_mutex_lock mutex 0x1ffefffe10 rc 0 owner 1
==4749== [2] mutex_unlock    mutex 0x1ffefffe10 rc 1
==4749== [2] mutex_trylock   mutex 0x30a040 rc 0 owner 0
==4749== [2] post_mutex_lock mutex 0x30a040 rc 0 owner 0
==4749== [1] cond_post_wait  mutex 0x1ffefffe10 rc 0 owner 2
==4749== [1] mutex_unlock    mutex 0x1ffefffe10 rc 1
==4749== [1] mutex_destroy   mutex 0x1ffefffe10 rc 0 owner 1
==4749== [1] mutex_trylock   mutex 0x30a0a0 rc 0 owner 0
==4749== [1] post_mutex_lock mutex 0x30a0a0 rc 0 owner 0
==4749== [1] mutex_trylock   mutex 0x30a040 rc 1 owner 2
==4749== [2] mutex_trylock   mutex 0x30a0a0 rc 1 owner 1

        第18行显示线程1试图给0x30a040互斥量上锁,但是该互斥量的所有者(owner)是线程2。

        第19行显示线程2试图给0x30a0a0互斥量上锁,但是该互斥量的所有者(owner)是线程1。

        如此我们便可以确定这段程序卡住是因为死锁导致的。

        但是DRD有个问题,不能指出发生死锁的位置。这个时候Helgrind该出场了。

代码语言:javascript
复制
valgrind --tool=helgrind ./dead_lock 

        helgrind执行时,如果发生死锁,需要ctrl+C来终止运行,于是可以得到如下结果

代码语言:javascript
复制
==5373== Process terminating with default action of signal 2 (SIGINT)
==5373==    at 0x4E5310D: __lll_lock_wait (lowlevellock.S:135)
==5373==    by 0x4E4C022: pthread_mutex_lock (pthread_mutex_lock.c:78)
==5373==    by 0x4C33FD6: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==5373==    by 0x108A11: lock (dead_lock.c:12)
==5373==    by 0x108AF4: main (dead_lock.c:38)
==5373== ---Thread-Announcement------------------------------------------
==5373== 
==5373== Thread #2 was created
==5373==    at 0x518287E: clone (clone.S:71)
==5373==    by 0x4E49EC4: create_thread (createthread.c:100)
==5373==    by 0x4E49EC4: pthread_create@@GLIBC_2.2.5 (pthread_create.c:797)
==5373==    by 0x4C36A27: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==5373==    by 0x108AEA: main (dead_lock.c:36)
==5373== 
==5373== ----------------------------------------------------------------
==5373== 
==5373== Thread #2: Exiting thread still holds 1 lock
==5373==    at 0x4E5310D: __lll_lock_wait (lowlevellock.S:135)
==5373==    by 0x4E4C022: pthread_mutex_lock (pthread_mutex_lock.c:78)
==5373==    by 0x4C33FD6: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==5373==    by 0x108A5C: thread_routine (dead_lock.c:23)
==5373==    by 0x4C36C26: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==5373==    by 0x4E496DA: start_thread (pthread_create.c:463)
==5373==    by 0x518288E: clone (clone.S:95)
==5373== 
==5373== ---Thread-Announcement------------------------------------------
==5373== 
==5373== Thread #1 is the program's root thread
==5373== 
==5373== ----------------------------------------------------------------
==5373== 
==5373== Thread #1: Exiting thread still holds 1 lock
==5373==    at 0x4E5310D: __lll_lock_wait (lowlevellock.S:135)
==5373==    by 0x4E4C022: pthread_mutex_lock (pthread_mutex_lock.c:78)
==5373==    by 0x4C33FD6: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==5373==    by 0x108A11: lock (dead_lock.c:12)
==5373==    by 0x108AF4: main (dead_lock.c:38)

        第22和37行分别显示子线程和主线程在中断之前,都锁在哪行,这样就更容易定位问题了。

本文参与 腾讯云自媒体分享计划,分享自作者个人站点/博客。
原始发表:2018年08月02日,如有侵权请联系 cloudcommunity@tencent.com 删除

本文分享自 作者个人站点/博客 前往查看

如有侵权,请联系 cloudcommunity@tencent.com 删除。

本文参与 腾讯云自媒体分享计划  ,欢迎热爱写作的你一起参与!

评论
登录后参与评论
0 条评论
热度
最新
推荐阅读
相关产品与服务
腾讯云代码分析
腾讯云代码分析(内部代号CodeDog)是集众多代码分析工具的云原生、分布式、高性能的代码综合分析跟踪管理平台,其主要功能是持续跟踪分析代码,观测项目代码质量,支撑团队传承代码文化。
领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档