
Valgrind's DRD: A Handy Tool for Finding Mutexes and Read-Write Locks That Are Held Too Long

Author: 方亮
Published: 2019-01-16 16:58:29

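In multithreaded programs, several threads may read or write the same memory at the same time. To keep the data correct, we usually protect it with synchronization primitives such as mutexes and read-write locks. (When reposting, please credit breaksoftware's CSDN blog.)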

        A mutex is used as follows:

  pthread_mutex_lock(&mutex);
  // do something
  pthread_mutex_unlock(&mutex);

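        The business code goes on line 2. Once one thread has acquired the lock, every other thread that wants the same mutex must wait until it is released (line 3). In other words, those threads cannot continue until the business code on line 2 has finished.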
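        To make the pattern concrete, here is a minimal sketch in which several threads increment one shared counter under a mutex. The names counter_mutex, increment_worker, run_counter and the MAX_THREADS cap are made up for this illustration and are not part of hold_lock.c.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 16 /* arbitrary cap chosen for this sketch */

static pthread_mutex_t counter_mutex = PTHREAD_MUTEX_INITIALIZER;
static long counter = 0;               /* shared data guarded by counter_mutex */
static long increments_per_thread = 0; /* written before the threads start */

static void *increment_worker(void *unused) {
  (void)unused;
  for (long i = 0; i < increments_per_thread; i++) {
    pthread_mutex_lock(&counter_mutex);
    counter++; /* the "do something" part of the pattern */
    pthread_mutex_unlock(&counter_mutex);
  }
  return NULL;
}

/* Starts nthreads workers, waits for all of them, and returns the counter. */
long run_counter(int nthreads, long per_thread) {
  pthread_t tids[MAX_THREADS];
  assert(nthreads >= 0 && nthreads <= MAX_THREADS);
  counter = 0;
  increments_per_thread = per_thread;
  for (int i = 0; i < nthreads; i++) {
    int rc = pthread_create(&tids[i], NULL, increment_worker, NULL);
    assert(rc == 0);
    (void)rc;
  }
  for (int i = 0; i < nthreads; i++)
    pthread_join(tids[i], NULL);
  return counter;
}
```

        Because every increment happens between lock and unlock, no update is lost, so run_counter(4, 10000) should return 40000.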

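        If the business code is slow, the efficiency of the whole program drops sharply: many threads sit waiting and the CPU is not fully used, which defeats the purpose of multithreading. Controlling lock granularity is therefore a very important optimization.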
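        One common way to reduce lock granularity is to run the slow part without the lock and hold the mutex only while touching shared data. The sketch below assumes the slow work needs no shared state; expensive_compute, fine_grained_worker and run_fine_grained are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 16 /* arbitrary cap chosen for this sketch */

static pthread_mutex_t total_mutex = PTHREAD_MUTEX_INITIALIZER;
static long total = 0; /* shared data guarded by total_mutex */

/* Stand-in for slow business code that touches no shared data. */
long expensive_compute(long x) {
  long acc = 0;
  for (long i = 0; i < 100000; i++)
    acc += (x + i) % 7;
  return acc;
}

static void *fine_grained_worker(void *arg) {
  long input = *(const long *)arg;
  long result = expensive_compute(input); /* slow part, no lock held */
  pthread_mutex_lock(&total_mutex);
  total += result;                        /* short critical section */
  pthread_mutex_unlock(&total_mutex);
  return NULL;
}

/* Runs nthreads workers with inputs 0..nthreads-1 and returns the sum. */
long run_fine_grained(int nthreads) {
  pthread_t tids[MAX_THREADS];
  long inputs[MAX_THREADS];
  assert(nthreads >= 0 && nthreads <= MAX_THREADS);
  total = 0;
  for (int i = 0; i < nthreads; i++) {
    inputs[i] = i;
    int rc = pthread_create(&tids[i], NULL, fine_grained_worker, &inputs[i]);
    assert(rc == 0);
    (void)rc;
  }
  for (int i = 0; i < nthreads; i++)
    pthread_join(tids[i], NULL);
  return total;
}
```

        Had expensive_compute been called between lock and unlock, the workers would run one after another; here the mutex is held only for a single addition.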

        In a large project, however, mutexes may be taken in many places. How do we find out which lock is the inefficient one? This is where valgrind comes in.

        Let's build an example:

/** Hold several types of synchronization objects locked as long as specified.
 */

#define _GNU_SOURCE 1

#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

static void delay_ms(const int ms) {
  struct timespec ts;
  assert(ms >= 0);
  ts.tv_sec = ms / 1000;
  ts.tv_nsec = (ms % 1000) * 1000 * 1000;
  nanosleep(&ts, 0);
}

void double_lock_mutex(const int ms) {
  pthread_mutex_t     mutex;
  pthread_mutexattr_t mutexattr;

  fprintf(stderr, "Locking mutex ...\n");

  pthread_mutexattr_init(&mutexattr);
  pthread_mutexattr_settype(&mutexattr, PTHREAD_MUTEX_RECURSIVE);
  pthread_mutex_init(&mutex, &mutexattr);
  pthread_mutexattr_destroy(&mutexattr);
  pthread_mutex_lock(&mutex);
  delay_ms(ms);
  pthread_mutex_lock(&mutex);
  pthread_mutex_unlock(&mutex);
  pthread_mutex_unlock(&mutex);
  pthread_mutex_destroy(&mutex);
}

int main(int argc, char** argv) {
  int interval = 0;
  int optchar;

  while ((optchar = getopt(argc, argv, "i:")) != -1) {
    switch (optchar) {
    case 'i':
      interval = atoi(optarg);
      break;
    default:
      fprintf(stderr, "Usage: %s [-i <interval time in ms>].\n", argv[0]);
      break;
    }
  }

  double_lock_mutex(interval);

  fprintf(stderr, "Done.\n");

  return 0;
}

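        delay_ms sleeps for the number of milliseconds passed in on the command line. It stands in for the business code: lower the value to simulate fast business code, raise it to simulate slow business code.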
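        The millisecond-to-timespec arithmetic inside delay_ms can be isolated and checked on its own. The sketch below also shows one possible variant that resumes the sleep when a signal interrupts nanosleep; ms_to_timespec and delay_ms_retry are hypothetical helpers, not part of the original file.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <time.h>

/* Splits a millisecond count into whole seconds plus leftover nanoseconds. */
struct timespec ms_to_timespec(int ms) {
  struct timespec ts;
  assert(ms >= 0);
  ts.tv_sec = ms / 1000;
  ts.tv_nsec = (long)(ms % 1000) * 1000L * 1000L;
  return ts;
}

/* Like delay_ms, but continues sleeping if a signal interrupts nanosleep.
 * Returns 0 on success and -1 on any other nanosleep failure. */
int delay_ms_retry(int ms) {
  struct timespec req = ms_to_timespec(ms);
  struct timespec rem;
  while (nanosleep(&req, &rem) == -1) {
    if (errno != EINTR)
      return -1;
    req = rem; /* sleep for whatever time is still left */
  }
  return 0;
}
```

        For example, 1500 ms becomes tv_sec = 1 and tv_nsec = 500000000, and the 10000 ms used later in this article becomes tv_sec = 10 and tv_nsec = 0.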

        Compile it with the following command:

gcc hold_lock.c -g -lpthread -o hold_lock

        The resulting binary can be run like this:

./hold_lock -i 10000

        The program finishes after 10000 ms (10 s), which is equivalent to a piece of complex business code running for 10 seconds.

        Next, we use the following valgrind command to check how long the lock is held:

valgrind --tool=drd --exclusive-threshold=10 ./hold_lock -i 20

        This time the business code runs for only 20 ms, while --exclusive-threshold=10 tells DRD to report every exclusive lock that is held for more than 10 ms.

==4000== drd, a thread error detector
==4000== Copyright (C) 2006-2017, and GNU GPL'd, by Bart Van Assche.
==4000== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==4000== Command: ./hold_lock -i 20
==4000== 
Locking mutex ...
==4000== Acquired at:
==4000==    at 0x4C39193: pthread_mutex_lock (in /usr/lib/valgrind/vgpreload_drd-amd64-linux.so)
==4000==    by 0x108E1E: double_lock_mutex (hold_lock.c:31)
==4000==    by 0x109029: main (hold_lock.c:80)
==4000== Lock on mutex 0x1ffefffe60 was held during 22 ms (threshold: 10 ms).
==4000==    at 0x4C3A123: pthread_mutex_unlock (in /usr/lib/valgrind/vgpreload_drd-amd64-linux.so)
==4000==    by 0x108E4C: double_lock_mutex (hold_lock.c:35)
==4000==    by 0x109029: main (hold_lock.c:80)
==4000== mutex 0x1ffefffe60 was first observed at:
==4000==    at 0x4C385F0: pthread_mutex_init (in /usr/lib/valgrind/vgpreload_drd-amd64-linux.so)
==4000==    by 0x108E06: double_lock_mutex (hold_lock.c:29)
==4000==    by 0x109029: main (hold_lock.c:80)
==4000== 
Done.
==4000== 
==4000== For counts of detected and suppressed errors, rerun with: -v
==4000== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

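        Line 11 of the output shows that the mutex was held for 22 ms. It was first observed (initialized) on line 29 of hold_lock.c (output line 17), first locked on line 31 (output line 9), and finally unlocked on line 35 (output line 13). This is how we can track down exclusive locks that are held for more than 10 ms.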
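        To get a feel for what the threshold check means, the hold time can also be measured inside the program with a monotonic clock. This is only a rough in-process analogue and not how DRD is implemented; hold_and_measure_ms and exceeds_threshold are hypothetical names, and the strict "greater than" comparison is an assumption based on the "held longer than" wording of the option.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <time.h>

/* Current monotonic time in nanoseconds. */
static long long now_ns(void) {
  struct timespec ts;
  clock_gettime(CLOCK_MONOTONIC, &ts);
  return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Sleeps for ms milliseconds (simulated business code). */
static void sleep_ms(int ms) {
  struct timespec ts;
  ts.tv_sec = ms / 1000;
  ts.tv_nsec = (long)(ms % 1000) * 1000000L;
  nanosleep(&ts, NULL);
}

/* Locks a mutex, sleeps work_ms while holding it, and returns the
 * measured hold time in whole milliseconds. */
long hold_and_measure_ms(int work_ms) {
  pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
  assert(work_ms >= 0);
  pthread_mutex_lock(&mutex);
  long long start = now_ns();
  sleep_ms(work_ms);
  long long end = now_ns();
  pthread_mutex_unlock(&mutex);
  pthread_mutex_destroy(&mutex);
  return (long)((end - start) / 1000000LL);
}

/* Assumed comparison: a lock is reported when held longer than the threshold. */
int exceeds_threshold(long held_ms, long threshold_ms) {
  return held_ms > threshold_ms;
}
```

        With this sketch, exceeds_threshold(hold_and_measure_ms(20), 10) is true, which matches the 22 ms report above in spirit; the exact number DRD prints also includes its own instrumentation overhead.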

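        Now let's change the command so that the business code runs (sleeps) for 9 ms. That is close to the 10 ms boundary, and the measured hold time includes some overhead on top of the sleep itself (a 20 ms sleep was reported as 22 ms above). If we run the command below several times, some runs report a hold time above 10 ms and others do not.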

valgrind --tool=drd --exclusive-threshold=10 ./hold_lock -i 9
==4026== Command: ./hold_lock -i 9
==4026== 
Locking mutex ...
Done.
==4026== 

        The output above is from a run that stayed under 10 ms; the output below is from a run that exceeded it.

==4027== 
Locking mutex ...
==4027== Acquired at:
==4027==    at 0x4C39193: pthread_mutex_lock (in /usr/lib/valgrind/vgpreload_drd-amd64-linux.so)
==4027==    by 0x108E1E: double_lock_mutex (hold_lock.c:31)
==4027==    by 0x109029: main (hold_lock.c:80)
==4027== Lock on mutex 0x1ffefffe60 was held during 11 ms (threshold: 10 ms).
==4027==    at 0x4C3A123: pthread_mutex_unlock (in /usr/lib/valgrind/vgpreload_drd-amd64-linux.so)
==4027==    by 0x108E4C: double_lock_mutex (hold_lock.c:35)
==4027==    by 0x109029: main (hold_lock.c:80)
==4027== mutex 0x1ffefffe60 was first observed at:
==4027==    at 0x4C385F0: pthread_mutex_init (in /usr/lib/valgrind/vgpreload_drd-amd64-linux.so)
==4027==    by 0x108E06: double_lock_mutex (hold_lock.c:29)
==4027==    by 0x109029: main (hold_lock.c:80)
==4027== 
Done.
==4027== 

        Besides mutexes, the same approach also works for read-write locks.

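        A read-write lock is also known as a shared-exclusive lock. While the write lock is held, every other attempt to acquire the lock has to wait (exclusive). While a read lock is held, other threads can still acquire read locks (shared), but a writer has to wait until all read locks have been released.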
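        These rules can be observed with the non-blocking pthread_rwlock_tryrdlock and pthread_rwlock_trywrlock calls issued from a second thread. The sketch below is only an illustration: probe_result, probe_thread and probe_while_holding are hypothetical names, and it assumes no writer is queued on the lock while the probe runs.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Return codes of the two try-lock calls made by the probing thread. */
struct probe_result { int rd; int wr; };
struct probe_args { pthread_rwlock_t *lock; struct probe_result out; };

static void *probe_thread(void *p) {
  struct probe_args *a = p;
  a->out.rd = pthread_rwlock_tryrdlock(a->lock);
  if (a->out.rd == 0)
    pthread_rwlock_unlock(a->lock);
  a->out.wr = pthread_rwlock_trywrlock(a->lock);
  if (a->out.wr == 0)
    pthread_rwlock_unlock(a->lock);
  return NULL;
}

/* mode 'r': the caller holds a read lock while the probe runs;
 * mode 'w': the caller holds the write lock;
 * any other value: the lock is free. */
struct probe_result probe_while_holding(char mode) {
  pthread_rwlock_t lock;
  struct probe_args args;
  pthread_t tid;

  pthread_rwlock_init(&lock, NULL);
  if (mode == 'r')
    pthread_rwlock_rdlock(&lock);
  else if (mode == 'w')
    pthread_rwlock_wrlock(&lock);

  args.lock = &lock;
  args.out.rd = 0;
  args.out.wr = 0;
  int rc = pthread_create(&tid, NULL, probe_thread, &args);
  assert(rc == 0);
  (void)rc;
  pthread_join(tid, NULL);

  if (mode == 'r' || mode == 'w')
    pthread_rwlock_unlock(&lock);
  pthread_rwlock_destroy(&lock);
  return args.out;
}
```

        While the caller holds a read lock, the probe's read attempt succeeds (0) and its write attempt fails with EBUSY; while the caller holds the write lock, both attempts fail with EBUSY.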

        Here is an example in which a write lock is held for a long time:

void write_lock(const int ms) {
  pthread_rwlock_t    rwlock;

  fprintf(stderr, "Locking rwlock exclusively ...\n");

  pthread_rwlock_init(&rwlock, 0);
  pthread_rwlock_wrlock(&rwlock);
  delay_ms(ms);
  pthread_rwlock_unlock(&rwlock);
  pthread_rwlock_destroy(&rwlock);
}

        With write_lock added to hold_lock.c and called from main, we again use the exclusive-threshold option to detect it:

valgrind --tool=drd --exclusive-threshold=10 ./hold_lock -i 20

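        This produces the result below, which is read the same way as before. (Note that the line numbers here refer to my source file, not to the per-snippet line numbers that CSDN displays.)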

==4074== 
Locking rwlock exclusively ...
==4074== Acquired at:
==4074==    at 0x4C41404: pthread_rwlock_wrlock (in /usr/lib/valgrind/vgpreload_drd-amd64-linux.so)
==4074==    by 0x108EC6: write_lock (hold_lock.c:45)
==4074==    by 0x109033: main (hold_lock.c:81)
==4074== Lock on rwlock 0x1ffefffe50 was held during 22 ms (threshold: 10 ms).
==4074==    at 0x4C428D5: pthread_rwlock_unlock (in /usr/lib/valgrind/vgpreload_drd-amd64-linux.so)
==4074==    by 0x108EDC: write_lock (hold_lock.c:47)
==4074==    by 0x109033: main (hold_lock.c:81)
==4074== rwlock 0x1ffefffe50 was first observed at:
==4074==    at 0x4C40685: pthread_rwlock_init (in /usr/lib/valgrind/vgpreload_drd-amd64-linux.so)
==4074==    by 0x108EBA: write_lock (hold_lock.c:44)
==4074==    by 0x109033: main (hold_lock.c:81)
==4074== 
Done.

        Finally, here is a case in which a read lock is held for a long time:

void read_lock(const int ms) {
  pthread_rwlock_t    rwlock;

  fprintf(stderr, "Locking rwlock shared ...\n");

  pthread_rwlock_init(&rwlock, 0);
  pthread_rwlock_rdlock(&rwlock);
  delay_ms(ms);
  pthread_rwlock_rdlock(&rwlock);
  pthread_rwlock_unlock(&rwlock);
  pthread_rwlock_unlock(&rwlock);
  pthread_rwlock_destroy(&rwlock);
}

        Because a read lock is not an exclusive lock, exclusive-threshold cannot be used to analyze it; we need shared-threshold instead:

valgrind --tool=drd --shared-threshold=10 ./hold_lock -i 20

        The result is read the same way as before:

Locking rwlock shared ...
==4122== Acquired at:
==4122==    at 0x4C40FB4: pthread_rwlock_rdlock (in /usr/lib/valgrind/vgpreload_drd-amd64-linux.so)
==4122==    by 0x108F56: read_lock (hold_lock.c:57)
==4122==    by 0x10903D: main (hold_lock.c:82)
==4122== Lock on rwlock 0x1ffefffe50 was held during 21 ms (threshold: 10 ms).
==4122==    at 0x4C428D5: pthread_rwlock_unlock (in /usr/lib/valgrind/vgpreload_drd-amd64-linux.so)
==4122==    by 0x108F84: read_lock (hold_lock.c:61)
==4122==    by 0x10903D: main (hold_lock.c:82)
==4122== rwlock 0x1ffefffe50 was first observed at:
==4122==    at 0x4C40685: pthread_rwlock_init (in /usr/lib/valgrind/vgpreload_drd-amd64-linux.so)
==4122==    by 0x108F4A: read_lock (hold_lock.c:56)
==4122==    by 0x10903D: main (hold_lock.c:82)
==4122== 
Done.

Originally published on 2018-08-02 on the author's personal site/blog.