Update, 4/10 2012: Fixed by a libc patch.
I have a problem canceling threads that are blocked in pthread_cond_wait when the mutex they use has the PTHREAD_PRIO_INHERIT attribute set. This only happens on certain platforms, though.
The following minimal example demonstrates this (compile with g++ <filename>.cpp -lpthread):
#include <pthread.h>
#include <unistd.h>   // sleep()
#include <iostream>

pthread_mutex_t mutex;
pthread_cond_t cond;

// Cancellation cleanup handler: releases the mutex that pthread_cond_wait re-acquires on cancel
void clean(void *arg) {
    std::cout << "clean: Unlocking mutex..." << std::endl;
    pthread_mutex_unlock((pthread_mutex_t*)arg);
    std::cout << "clean: Mutex unlocked..." << std::endl;
}

void *threadFunc(void *arg) {
    int ret = 0;
    pthread_mutexattr_t mutexAttr;
    ret = pthread_mutexattr_init(&mutexAttr); std::cout << "ret = " << ret << std::endl;
    //Comment out the following line, and everything works
    ret = pthread_mutexattr_setprotocol(&mutexAttr, PTHREAD_PRIO_INHERIT); std::cout << "ret = " << ret << std::endl;
    ret = pthread_mutex_init(&mutex, &mutexAttr); std::cout << "ret = " << ret << std::endl;
    ret = pthread_cond_init(&cond, 0); std::cout << "ret = " << ret << std::endl;
    std::cout << "threadFunc: Init done, entering wait..." << std::endl;
    pthread_cleanup_push(clean, (void *) &mutex);
    ret = pthread_mutex_lock(&mutex); std::cout << "ret = " << ret << std::endl;
    // Nothing ever signals cond, so the thread stays in the wait until it is cancelled
    while(1) {
        ret = pthread_cond_wait(&cond, &mutex); std::cout << "ret = " << ret << std::endl;
    }
    pthread_cleanup_pop(1);
    return 0;
}

int main() {
    pthread_t thread;
    int ret = 0;
    ret = pthread_create(&thread, 0, threadFunc, 0); std::cout << "ret = " << ret << std::endl;
    std::cout << "main: Thread created, waiting a bit..." << std::endl;
    sleep(2);
    std::cout << "main: Cancelling threadFunc..." << std::endl;
    ret = pthread_cancel(thread); std::cout << "ret = " << ret << std::endl;
    std::cout << "main: Joining threadFunc..." << std::endl;
    ret = pthread_join(thread, NULL); std::cout << "ret = " << ret << std::endl;
    std::cout << "main: Joined threadFunc, done!" << std::endl;
    return 0;
}
Every time I run it, main() hangs on pthread_join(). A gdb backtrace shows the following:
Thread 2 (Thread 0xb7d15b70 (LWP 257)):
#0 0xb7fde430 in __kernel_vsyscall ()
#1 0xb7fcf362 in __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/i386/i686/../i486/lowlevellock.S:142
#2 0xb7fcc9f9 in __condvar_w_cleanup () at ../nptl/sysdeps/unix/sysv/linux/i386/i686/../i486/pthread_cond_wait.S:434
#3 0x08048fbe in threadFunc (arg=0x0) at /home/pthread_cond_wait.cpp:22
#4 0xb7fc8ca0 in start_thread (arg=0xb7d15b70) at pthread_create.c:301
#5 0xb7de73ae in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:130
Thread 1 (Thread 0xb7d166d0 (LWP 254)):
#0 0xb7fde430 in __kernel_vsyscall ()
#1 0xb7fc9d64 in pthread_join (threadid=3083950960, thread_return=0x0) at pthread_join.c:89
#2 0x0804914a in main () at /home/pthread_cond_wait.cpp:41
If PTHREAD_PRIO_INHERIT isn't set on the mutex, everything works as it should, and the program exits cleanly.
Platforms with problems:
- Embedded AMD Fusion board, running a PTXDist based 32-bit Linux 3.2.9-rt16 (with RTpatch 16). We are using the newest OSELAS i686 cross toolchain (2011.11.1), using gcc 4.6.2, glibc 2.14.1, binutils 2.21.1a, kernel 2.6.39.
- The same board with the 2011.03.1 toolchain (gcc 4.5.2 / glibc 2.13 / binutils 2.18 / kernel 2.6.36).
Platforms with no problems:
- Our own ARM-board, also running a PTXDist Linux (32-bit 2.6.29.6-rt23), using OSELAS arm-v4t cross toolchain (1.99.3) with gcc 4.3.2 / glibc 2.8 / binutils 2.18 / kernel 2.6.27.
- My laptop (Intel Core i7), running 64-bit Ubuntu 11.04 (virtualized / kernel 2.6.38.15-generic), gcc 4.5.2 / eglibc 2.13-0ubuntu13.1 / binutils 2.21.0.20110327.
I have been looking around the net for solutions, and have come across a few patches that I've tried, but without any effect:
- Making the condition variables priority inheritance aware.
- Handling EAGAIN from FUTEX_WAIT_REQUEUE_PI
Are we doing something wrong in our code, which just happens to work on certain platforms, or is this a bug in the underlying systems? If anyone has any idea about where to look, or knows of any patches or similar to try out, I'd be happy to hear about it.
Thanks!
Updates:
- libc-help mailing list discussion
- glibc bug report
1 solution
This has been fixed by a libc patch. I've confirmed it to work on my own problematic platform (our custom AMD Fusion board), patched onto glibc-2.14.1.
Thanks go out to Siddhesh Poyarekar for the fix!