Q1 Debugging multithreaded programs with gdb: how do you track down a deadlock?
A1
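Note: the formatting here is not great; you can read the original post via its link instead.
Basic gdb usage
info threads (list all threads)
thread N (switch to thread number N)
thread apply all break demo.cpp:42 (run the command in every thread)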
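Example: N threads run function A and M threads run function B, and all of them compete to acquire and release resources C and D.
It is not clear which thread acquires or releases a resource first.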
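The scenario above can be sketched as a small test program to practice on. This is only an illustration: the names resource_c, resource_d, worker_a, worker_b and run_workers are hypothetical and do not come from the original demo.cpp. Both functions take the locks in the same order here so that the program terminates.

```cpp
#include <pthread.h>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for resources C and D.
static pthread_mutex_t resource_c = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t resource_d = PTHREAD_MUTEX_INITIALIZER;
static long shared_counter = 0;  // only touched while both locks are held

// Function A: takes C first, then D.
static void* worker_a(void*) {
    pthread_mutex_lock(&resource_c);
    pthread_mutex_lock(&resource_d);
    ++shared_counter;
    pthread_mutex_unlock(&resource_d);
    pthread_mutex_unlock(&resource_c);
    return nullptr;
}

// Function B: same order (C, then D), so no deadlock is possible.
// Swapping the two lock calls (D first, then C) gives the classic
// lock-order deadlock that the gdb steps in this answer help diagnose.
static void* worker_b(void*) {
    pthread_mutex_lock(&resource_c);
    pthread_mutex_lock(&resource_d);
    ++shared_counter;
    pthread_mutex_unlock(&resource_d);
    pthread_mutex_unlock(&resource_c);
    return nullptr;
}

// Starts n threads of A and m threads of B, joins them all,
// and returns how many increments happened.
long run_workers(int n, int m) {
    shared_counter = 0;
    std::vector<pthread_t> threads(static_cast<std::size_t>(n + m));
    for (int i = 0; i < n + m; ++i) {
        int rc = pthread_create(&threads[i], nullptr,
                                i < n ? worker_a : worker_b, nullptr);
        assert(rc == 0);
        (void)rc;
    }
    for (pthread_t t : threads) {
        pthread_join(t, nullptr);
    }
    return shared_counter;
}
```

Call run_workers(4, 3) from main and build with g++ -g -pthread so that gdb has line information.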
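Problem 1: which thread ID should you follow?
The order of the code and the actual order of execution differ, so reading the source usually does not pinpoint the thread quickly. Which thread ID should you use?
thread apply all break demo.cpp:19
thread apply all break demo.cpp:42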
Shared resource: pthread_mutex_t mymutex
Two threads contend for it at the same time.
p mymutex
The output shows which thread currently holds the mutex (figure above). The current thread is LWP 5882, but the mutex is locked by __owner = 5883.
* 1 Thread 0x7ffff7fe1780 (LWP 5882)
2 Thread 0x7ffff6d6d700 (LWP 5883)
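The thread ID is found: __owner = 5883 matches LWP 5883, which is gdb thread 2.
Summary:
pthread_mutex_t.__data.__owner is a TID (the kernel thread ID that gdb shows as the LWP), not the pthread_t returned by pthread_self().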
typedef union
{
  struct __pthread_mutex_s
  {
    int __lock;
    unsigned int __count;
    int __owner;
#if __WORDSIZE == 64
    unsigned int __nusers;
#endif
    /* KIND must stay at this position in the structure to maintain
       binary compatibility.  */
    int __kind;
#if __WORDSIZE == 64
    int __spins;
    __pthread_list_t __list;
# define __PTHREAD_MUTEX_HAVE_PREV 1
#else
    unsigned int __nusers;
    __extension__ union
    {
      int __spins;
      __pthread_slist_t __list;
    };
#endif
  } __data;
  char __size[__SIZEOF_PTHREAD_MUTEX_T];
  long int __align;
} pthread_mutex_t;
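To see why __owner lines up with the LWP number rather than with pthread_self(), a thread can compare its own kernel TID with the field after taking the lock. This is a minimal sketch that assumes glibc on Linux; __data.__owner is an internal field that may change between versions, so treat it as a debugging aid only. The helper names current_tid, mutex_owner_tid and owner_matches_caller are made up for this illustration.

```cpp
#include <pthread.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

// Kernel thread ID of the calling thread. This is the number gdb prints
// as "LWP" in the output of "info threads".
pid_t current_tid() {
    return static_cast<pid_t>(syscall(SYS_gettid));
}

// Reads the owner recorded inside the mutex.
// Assumption: glibc layout, where __data.__owner holds the holder's TID.
pid_t mutex_owner_tid(const pthread_mutex_t* m) {
    return static_cast<pid_t>(m->__data.__owner);
}

// Locks a fresh mutex and checks that the recorded owner is the caller.
bool owner_matches_caller() {
    pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    pthread_mutex_lock(&m);
    bool same = (mutex_owner_tid(&m) == current_tid());
    pthread_mutex_unlock(&m);
    pthread_mutex_destroy(&m);
    return same;
}
```

Logging current_tid() when a thread starts makes it easier to match a log line to the __owner value printed by p mymutex.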
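Problem 2: by default gdb debugs the current (main) thread
thread apply all command runs the command in every thread; here it sets a breakpoint on the given line in all threads (a plain break already applies to all threads as well).
You will then notice a problem: while stepping with next, gdb keeps jumping back and forth between different threads (if anyone finds otherwise, please let me know).
A thread is the smallest unit of CPU scheduling, and because of time slicing the CPU keeps switching between threads. Note that this is about threads, not processes; a process can be thought of as a main thread.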
set scheduler-locking on
With this set, only the current thread runs while you debug; the other threads stay stopped.
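Problem 3: what if the process calls fork?
If you need to debug the child process, run the following after starting gdb:
(gdb) set follow-fork-mode child
(gdb) set detach-on-fork off
List the processes being debugged: info inferiors
Switch to another process: inferior ID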
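A tiny fork target is handy for trying these settings. The sketch below is an assumption for illustration only (the names child_work and fork_and_wait and the exit code 7 are arbitrary); put a breakpoint on child_work to check that gdb really followed the child.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Work done only in the child; a convenient place for a breakpoint.
static int child_work() {
    return 7;  // arbitrary exit code, chosen for illustration
}

// Forks once. The child exits with child_work()'s value and the parent
// waits for it and returns that exit code, or -1 on any failure.
int fork_and_wait() {
    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        _exit(child_work());  // child: leave without running parent cleanup
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With detach-on-fork off, both processes should show up in info inferiors once the fork has happened.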
You can also report lock timeouts through logging or some other means, then run pstack PID to look at the stacks of the process.
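One way to get such a timeout log is pthread_mutex_timedlock. The following is a rough sketch under the assumption that the platform provides this call (Linux with glibc does); lock_or_log, demo_mutex, contender and contended_lock_succeeds are hypothetical names, and the 50 ms timeout is arbitrary.

```cpp
#include <pthread.h>
#include <cerrno>
#include <cstdio>
#include <ctime>

// Tries to lock m for at most timeout_ms milliseconds.
// Logs to stderr and returns false if the lock could not be taken in time.
bool lock_or_log(pthread_mutex_t* m, long timeout_ms, const char* name) {
    timespec deadline;
    // pthread_mutex_timedlock takes an absolute CLOCK_REALTIME deadline.
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }
    int rc = pthread_mutex_timedlock(m, &deadline);
    if (rc == ETIMEDOUT) {
        std::fprintf(stderr, "lock timeout on %s after %ld ms\n",
                     name, timeout_ms);
    }
    return rc == 0;
}

static pthread_mutex_t demo_mutex = PTHREAD_MUTEX_INITIALIZER;

// Second thread: tries to take demo_mutex and reports the result.
static void* contender(void* result) {
    bool ok = lock_or_log(&demo_mutex, 50, "demo_mutex");
    if (ok) {
        pthread_mutex_unlock(&demo_mutex);
    }
    *static_cast<bool*>(result) = ok;
    return nullptr;
}

// If hold is true the caller keeps demo_mutex locked while the second
// thread tries to take it, so the attempt times out and gets logged.
bool contended_lock_succeeds(bool hold) {
    bool got = false;
    pthread_t t;
    if (hold) {
        pthread_mutex_lock(&demo_mutex);
    }
    pthread_create(&t, nullptr, contender, &got);
    pthread_join(t, nullptr);
    if (hold) {
        pthread_mutex_unlock(&demo_mutex);
    }
    return got;
}
```

When the timeout message appears in the log, that is the moment to run pstack (or attach gdb) while the threads are still stuck.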
The Helgrind manual (linked at the end) gives a detailed example and explanation:
#include <pthread.h>

int var = 0;

void* child_fn ( void* arg ) {
   var++; /* Unprotected relative to parent */ /* this is line 6 */
   return NULL;
}

int main ( void ) {
   pthread_t child;
   pthread_create(&child, NULL, child_fn, NULL);
   var++; /* Unprotected relative to child */ /* this is line 13 */
   pthread_join(child, NULL);
   return 0;
}
The variable var is not protected by a lock.
==7066== Possible data race during read of size 4 at 0x601040 by thread #1
==7066== Locks held: none
==7066==    at 0x4006C1: main (lock.cpp:13)
==7066==
==7066== This conflicts with a previous write of size 4 by thread #2
==7066== Locks held: none
==7066==    at 0x400691: child_fn(void*) (lock.cpp:6)
==7066==    by 0x4C3094E: mythread_wrapper (hg_intercepts.c:389)
==7066==    by 0x50B2DF4: start_thread (in /usr/lib64/libpthread-2.17.so)
==7066==    by 0x5BDD1AC: clone (in /usr/lib64/libc-2.17.so)
==7066== Address 0x601040 is 0 bytes inside data symbol "var"
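For comparison, here is a sketch of the same program with both increments guarded by a mutex, which is the usual way to remove this kind of report. The names var_mutex and run_locked_increment are assumptions added for illustration; they are not part of the Helgrind example.

```cpp
#include <pthread.h>

static int var = 0;
// Hypothetical mutex added to protect var.
static pthread_mutex_t var_mutex = PTHREAD_MUTEX_INITIALIZER;

// Child thread: increments var while holding the lock.
static void* child_fn(void*) {
    pthread_mutex_lock(&var_mutex);
    var++;
    pthread_mutex_unlock(&var_mutex);
    return nullptr;
}

// Parent side: starts the child, does its own increment under the same
// lock, joins the child, and returns the final value of var.
int run_locked_increment() {
    var = 0;  // no other thread exists yet, so no lock is needed here
    pthread_t child;
    pthread_create(&child, nullptr, child_fn, nullptr);
    pthread_mutex_lock(&var_mutex);
    var++;
    pthread_mutex_unlock(&var_mutex);
    pthread_join(child, nullptr);
    return var;
}
```

Because every access to var between thread creation and join happens with var_mutex held, the two increments can no longer overlap, and the result is always 2.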
References
http://www.cnblogs.com/zhuyp1015/p/3618863.html
kill -11 is not advisable; use gcore instead http://blog.csdn.net/pbymw8iwm/article/details/7876797
pthread_mutex_t struct: What does lock stand for http://stackoverflow.com/questions/23449508/pthread-mutex-t-struct-what-does-lock-stand-for
Understanding deadlock behavior with gdb http://stackoverflow.com/questions/21017794/understanding-deadlock-behavior-with-gdb
Helgrind: a thread error detector http://valgrind.org/docs/manual/hg-manual.html