代码本来在Windows上开发的,功能基本完毕迁移到Linux上,结果一跑,乱象重重。这里只列出两个。
一崩溃:
/mnt/diskc/db/app/bin/mysqld: double free or corruption (out): 0x00007f09cc00c900 ***
======= Backtrace: =========
/lib64/libc.so.6[0x31434760e6]
/lib64/libc.so.6[0x3143478c13]
/lib64/libc.so.6(fclose+0x14d)[0x314346674d]
/mnt/diskc/db/app/bin/mysqld(……内部代码……)[0x781d59] …… /lib64/libpthread.so.0[0x3143c07851]
/lib64/libc.so.6(clone+0x6d)[0x31434e890d]
二挂死:
查询一张表时正常返回结果,然后正常shutdown关闭程序,就挂住了。用pstack可以打出调用栈来。
Thread 2 (Thread 0x7f8a6c074700 (LWP 1506)):
#0 0x00000031434f806e in __lll_lock_wait_private () from /lib64/libc.so.6
#1 0x0000003143466815 in _L_lock_36 () from /lib64/libc.so.6
#2 0x0000003143466652 in fclose@@GLIBC_2.2.5 () from /lib64/libc.so.6
#3 0x00000000007828f6 in …………内部代码………… ()
#5 0x00000000005e9e7d in closefrm(TABLE*, bool) ()
#6 0x0000000000543122 in free_cache_entry(TABLE*) ()
#7 0x000000000054be7c in close_cached_tables(THD*, TABLE_LIST*, bool, unsigned long) ()
#8 0x000000000054c1a4 in table_def_start_shutdown() ()
#9 0x0000000000512f6c in clean_up(bool) ()
#10 0x00000000005131ee in unireg_end() ()
#11 0x0000000000514b82 in kill_server ()
#12 0x0000000000514c3e in kill_server_thread ()
#13 0x0000003143c07851 in start_thread () from /lib64/libpthread.so.0
#14 0x00000031434e890d in clone () from /lib64/libc.so.6
一头乱麻,索性就从最容易重现的第二个挂死现象查起了。
谷歌一搜fclose __lll_lock_wait_private ,神啊,多是说程序clone之后再fclose有过这样的问题,有的还提供了对glibc的patch,确认了一下,我们的glibc库已经很新了。而且我们的程序虽然有多线程,也对一个文件打开过两次,但这两次操作是在一个线程内部的,所以跟网上的信息不太相符。
后来自己写了一个测试程序,在一个程序内两次fopen,再依次关闭,也都正常。又改成主线程里开一次,子线程里开一次,再关,仍都正常。
最后没办法仔细查代码,发现关于文件打开我们的流程是:
OpenFile(FILE* fp)
{
if (fp != NULL)
fclose(fp); fopen(fp,,,,,);
}
而在某些处理后需要关闭时,操作为:
fclose(fp);
虽然不太确定,但做事总要干净些,于是把关闭操作改为:
fclose(fp);
fp = NULL;
搞完了,一切正常了。说明我们有良好的编码习惯是多么重要。另外出Bug时还是要多查自己的代码,基础库的问题没那么容易被我们撞上的。
查查man fclose,说了两次关闭的结果然是不确定的。
RETURN VALUE
Upon successful completion 0 is returned. Otherwise, EOF is returned and errno is
set to indicate the error. In either case any further access (including
another call to fclose()) to the stream results in undefined behavior.
再附上我自己的测试多次fopen的代码。
#include <stdio.h>
#include <pthread.h> char szPath[312] = {0};
char readbuf[312] = {0}; void* thread_func(void* p)
{
printf("\n\n....hello from son thread %d\n", pthread_self());
FILE *fp_r = NULL;
fp_r = fopen(szPath, "rb+");
printf("fp_r=%d after fopen rb+ \n", fp_r); if (fp_r == NULL)
return NULL; if (fread(readbuf, 10, 1, fp_r) < 0 )
{
printf("read fp_r error!!\n");
return NULL;
}
printf("read fp_r ok!!\n");
sleep(10); fclose(fp_r);
printf("close fp_r ok!!\n"); pthread_exit(NULL);
} int main()
{
FILE *fp_w = NULL; pthread_t t_id ; sprintf(szPath, "/disk1/tmp"); fp_w = fopen(szPath, "wb+");
printf("fp_w=%d after fopen wb+ \n", fp_w); if (fp_w == NULL)
{
return 0;
} if (fwrite(szPath, sizeof(szPath), 1, fp_w) != 1)
{
return -1;
} fclose(fp_w);
printf("fp_w=%d after fclose wb+ \n", fp_w);
fp_w = NULL; fp_w = fopen(szPath, "rb+");
printf("fp_w=%d after fopen rb+ \n", fp_w); if (fp_w == NULL)
return -1; if (fread(readbuf, 10, 1, fp_w) < 0)
{
printf("read fp_w error!!\n");
return -1;
}
printf("read fp_w ok!!\n"); if (pthread_create(&t_id, NULL, &thread_func, (void*)&t_id) != 0)
{
printf(" pthread_create fail!! \n");
return -1;
}
printf("create thread done.\n");
sleep(3); fclose(fp_w);
printf("close fp_w ok!!\n");
pthread_join(t_id,NULL);
printf("-----------------------\n"); return 0;
}
编译命令为:gcc tfclose.c -lpthread