Linux: an outage caused by a misbehaving read-write lock

Date: 2021-09-01 23:48:07

Environment:

WARNING: kernel version inconsistency between vmlinux and dumpfile

      KERNEL: vmlinux-47.90
DUMPFILE: vmcore [PARTIAL DUMP]
CPUS:
DATE: Wed Nov ::
UPTIME: ::
LOAD AVERAGE: 484.39, 481.11, 385.18
TASKS:
NODENAME: ycby25-3kh_2
RELEASE: 3.0.-0.47.-default
VERSION: # SMP Wed Oct :: UTC (56c73f1)
MACHINE: x86_64 ( Mhz)
MEMORY: 255.6 GB
PANIC: "[18477.566692] Kernel panic - not syncing: hung_task: blocked tasks"
PID:
COMMAND: "khungtaskd"
TASK: ffff881fc096e080 [THREAD_INFO: ffff881fc0970000]
CPU:
STATE: TASK_RUNNING (PANIC)

The last messages in dmesg:

[17013.334105] show_signal_msg:  callbacks suppressed
[17013.334110] CMoniterThread[]: segfault at ip 00007f13a100c699 sp 00007f138371fc30 error in libCommonUtilitiesLib.so[7f13a0fdd000+4d000]
[18477.566473] INFO: task dev_rdwriter: blocked for more than seconds.
[18477.566475] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[18477.566477] dev_rdwriter D 0x00000000
[18477.566479] ffff88173e599d70 ffff88173e598010
[18477.566483] ffff88173e599fd8
[18477.566486] ffff88173e599fd8 ffff88173e5964c0 ffff881fc3378580
[18477.566489] Call Trace:
[18477.566499] [<ffffffff81467485>] rwsem_down_failed_common+0xb5/0x160
[18477.566505] [<ffffffff81264d13>] call_rwsem_down_write_failed+0x13/0x20
[18477.566509] [<ffffffff8146679c>] down_write+0x1c/0x20
[18477.566541] [<ffffffffa05ddb6c>] xfs_ilock+0xec/0x100 [xfs]
[18477.566629] [<ffffffffa0604e47>] xfs_file_fallocate+0xc7/0x190 [xfs]
[18477.566665] [<ffffffff8115d629>] do_fallocate+0x129/0x130
[18477.566669] [<ffffffff8115d676>] sys_fallocate+0x46/0x70
[18477.566673] [<ffffffff8146f5f2>] system_call_fastpath+0x16/0x1b
[18477.566690] [<00007f344662b010>] 0x7f344662b00f
[18477.566692] Kernel panic - not syncing: hung_task: blocked tasks
[18477.566698] Pid: , comm: khungtaskd Tainted: G ENX 3.0.-0.47.-default #
[18477.566701] Call Trace:
[18477.566707] [<ffffffff81004b95>] dump_trace+0x75/0x300
[18477.566712] [<ffffffff81464663>] dump_stack+0x69/0x6f
[18477.566717] [<ffffffff8146471f>] panic+0xb6/0x224
[18477.566722] [<ffffffff810c8731>] check_hung_uninterruptible_tasks+0x1e1/0x1f0
[18477.566726] [<ffffffff810c8787>] watchdog+0x47/0x50
[18477.566730] [<ffffffff810845f6>] kthread+0x96/0xa0
[18477.566735] [<ffffffff81470764>] kernel_thread_helper+0x4/0x10
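Process 23683 had been blocked for a long time; the hung-task timeout was set to 1200 s at the time. The analysis below looks for the cause of the block: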
crash> bt
PID: TASK: ffff88173e5964c0 CPU: COMMAND: "dev_rdwriter"
# [ffff88173e599c30] schedule at ffffffff814652b9
# [ffff88173e599d78] rwsem_down_failed_common at ffffffff81467485
# [ffff88173e599dd8] call_rwsem_down_write_failed at ffffffff81264d13
# [ffff88173e599e18] down_write at ffffffff8146679c
# [ffff88173e599e20] xfs_ilock at ffffffffa05ddb6c [xfs]
# [ffff88173e599e50] xfs_file_fallocate at ffffffffa0604e47 [xfs]
# [ffff88173e599f20] do_fallocate at ffffffff8115d629
# [ffff88173e599f50] sys_fallocate at ffffffff8115d676
# [ffff88173e599f80] system_call_fastpath at ffffffff8146f5f2
RIP: 00007f344662b010 RSP: 00007f2db9d0a2a0 RFLAGS:
RAX: 000000000000011d RBX: ffffffff8146f5f2 RCX:
RDX: RSI: RDI:
RBP: R8: 00007f340a0ecb60 R9: 0000000000005c83
R10: R11: R12:
R13: 0000000009b20a20 R14: 00007f3408e01328 R15:
ORIG_RAX: 000000000000011d CS: SS: 002b

The stack shows the task looping while it waits for the semaphore:

static struct rw_semaphore __sched *
rwsem_down_failed_common(struct rw_semaphore *sem,
                         unsigned int flags, signed long adjustment)
{
    struct rwsem_waiter waiter;
    struct task_struct *tsk = current;
    signed long count;

    set_task_state(tsk, TASK_UNINTERRUPTIBLE);

    /* set up my own style of waitqueue */
    spin_lock_irq(&sem->wait_lock);
    waiter.task = tsk;
    waiter.flags = flags;
    get_task_struct(tsk);

    if (list_empty(&sem->wait_list))
        adjustment += RWSEM_WAITING_BIAS;
    list_add_tail(&waiter.list, &sem->wait_list);

    /* we're now waiting on the lock, but no longer actively locking */
    count = rwsem_atomic_update(adjustment, sem);

    /* If there are no active locks, wake the front queued process(es) up.
     *
     * Alternatively, if we're called from a failed down_write(), there
     * were already threads queued before us and there are no active
     * writers, the lock must be read owned; so we try to wake any read
     * locks that were queued ahead of us. */
    if (count == RWSEM_WAITING_BIAS)
        sem = __rwsem_do_wake(sem, RWSEM_WAKE_NO_ACTIVE);
    else if (count > RWSEM_WAITING_BIAS &&
             adjustment == -RWSEM_ACTIVE_WRITE_BIAS)
        sem = __rwsem_do_wake(sem, RWSEM_WAKE_READ_OWNED);

    spin_unlock_irq(&sem->wait_lock);

    /* wait to be given the lock */
    for (;;) {
        /* The task entered this loop and never left. The exit condition
         * is that this waiter's task pointer has been cleared. */
        if (!waiter.task)
            break;
        schedule();
        set_task_state(tsk, TASK_UNINTERRUPTIBLE);
    }

    tsk->state = TASK_RUNNING;

    return sem;
}

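Because the task stayed in the uninterruptible state longer than the threshold, the hung-task detector eventually triggered the crash. The loop tests whether waiter.task is NULL because of how the lock is released: the releasing side looks at the wait queue and, if there is a waiter, unlinks it from the queue, sets waiter->task = NULL, and then wakes the waiting task. The woken task checks whether its own waiter.task is NULL to decide whether it must keep waiting.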
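The handshake described above can be modelled in a few lines of userspace C. This is only a sketch under simplifying assumptions: `struct task`, `struct sem_model`, `model_wake_first` and `model_must_keep_waiting` are hypothetical names, the queue is singly linked, and there is no real locking or scheduling. It only shows the order of operations on the release side and the condition tested on the waiting side.

```c
#include <stddef.h>

/* Hypothetical userspace stand-ins; not the kernel definitions. */
struct task {
    int woken;              /* set when the "wakeup" is delivered */
};

struct waiter {
    struct waiter *next;    /* simplified singly linked queue */
    struct task *task;      /* non-NULL while the waiter still has to wait */
};

struct sem_model {
    struct waiter *head;    /* first queued waiter, or NULL */
};

/* Release side: unlink the first waiter, clear its task pointer, wake it. */
static void model_wake_first(struct sem_model *sem)
{
    struct waiter *w = sem->head;
    struct task *t;

    if (!w)
        return;
    sem->head = w->next;
    t = w->task;
    w->task = NULL;         /* this is the "lock granted" signal */
    t->woken = 1;           /* stands in for waking the task */
}

/* Waiter side: the condition tested inside the for (;;) loop. */
static int model_must_keep_waiting(const struct waiter *w)
{
    return w->task != NULL;
}
```

If nobody ever runs the release side, as happens when a holder never unlocks, `model_must_keep_waiting` stays true forever, which is the state the dump shows.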

Next, find out which semaphore is being waited on:

crash> bt -f
PID: TASK: ffff88173e5964c0 CPU: COMMAND: "dev_rdwriter"
# [ffff88173e599c30] schedule at ffffffff814652b9
ffff88173e599c38: ffff88173e598010
ffff88173e599c48:
ffff88173e599c58:
ffff88173e599c68: ffff88173e599fd8 ffff88173e599fd8
ffff88173e599c78: ffff88173e5964c0
ffff88173e599c88: ffff881fc3378580 ffff881f22341000
ffff88173e599c98: 00000000ffffff9c ffffffff81169f93
ffff88173e599ca8: ffff88173e599d58 ffff88173e599de8
ffff88173e599cb8: ffff881a8a7b21c0 ffff881a8a7b21c0
ffff88173e599cc8: ffff88173d8aa0c0 ffff88173d8afd90
ffff88173e599cd8: ffff88173e599dd8
ffff88173e599ce8: ffff8805b87fa240 ffffffff811695d8
ffff88173e599cf8: ffff88173e599d48
ffff88173e599d08: ffff88173e599dd8 ffffffff8116db50
ffff88173e599d18: ffff88173e599e58 ffffffff8117661e
ffff88173e599d28: ffff88173e599de8 ffff88173e599e58
ffff88173e599d38: ffff8805b848f990 ffff88173e599e58
ffff88173e599d48: ffff8805b848f8a8
ffff88173e599d58: ffff8805b848f8b0 ffffffffffffffff
ffff88173e599d68: ffff88173e5964c0
ffff88173e599d78: ffffffff81467485
# [ffff88173e599d78] rwsem_down_failed_common at ffffffff81467485
ffff88173e599d80: ffff881706047ca8 ffff8805b848f8b8---------------------the waiter starts here, at ffff88173e599d80; ffff881706047ca8 is its list.next and ffff8805b848f8b8 its list.prev
ffff88173e599d90: ffff88173e5964c0 ffff881f00000002
ffff88173e599da0: ffff881f22341000
ffff88173e599db0: ffffffffffffffa1
ffff88173e599dc0: ffff8805b848f800 ffff8805b848f990
ffff88173e599dd0: ffffffff81264d13
# [ffff88173e599dd8] call_rwsem_down_write_failed at ffffffff81264d13
ffff88173e599de0:
ffff88173e599df0: ffff88171291ff20 ffff881ae2c64bc0
ffff88173e599e00:
ffff88173e599e10: ffff8805b848f8a8 ffffffff8146679c------------------ffff8805b848f8a8 is the semaphore
# [ffff88173e599e18] down_write at ffffffff8146679c
ffff88173e599e20: ffffffffa05ddb6c

From the disassembly,

0xffffffff81467476 <rwsem_down_failed_common+>:      cmpq   $0x0,0x10(%rsp)
0xffffffff8146747c <rwsem_down_failed_common+>: je 0xffffffff814674a9 <rwsem_down_failed_common+>
0xffffffff8146747e <rwsem_down_failed_common+>: xchg %ax,%ax
/usr/src/linux-3.0.-0.47./lib/rwsem.c:
0xffffffff81467480 <rwsem_down_failed_common+>: callq 0xffffffff81465600 <schedule>

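This confirms that the waiter sits at rsp, which is ffff88173e599d80 in this frame: the cmpq tests waiter.task at 0x10(%rsp). The first word stored there, ffff881706047ca8, is waiter.list.next, that is, the next waiter in the queue.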

Because call_rwsem_down_write_failed is an assembly stub rather than C, keep unwinding up to xfs_ilock. That gives an xfs_inode_t at 0xffff8805b848f800 and a read-write semaphore at 0xffff8805b848f8a8:

crash> struct -x rw_semaphore 0xffff8805b848f8a8
struct rw_semaphore {
  count = 0xffffffff00000001,
  wait_lock = {
    {
      rlock = {
        raw_lock = {
          slock = 0x70007
        }
      }
    }
  },
  wait_list = {
    next = 0xffff88173e599d80,
    prev = 0xffff881ec528bd80
  }
}

From the layout of rw_semaphore, the address of wait_list is ffff8805b848f8a8 + 0x10, that is, ffff8805b848f8b8.
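Both offsets used so far, 0x10(%rsp) for waiter.task and +0x10 for wait_list, follow from the structure layouts. The sketch below checks them with `offsetof` on userspace mirrors of the two structures. The mirrors are assumptions: they suppose a 64-bit build, a 4-byte spinlock with no debug fields, and field order as it appears in the dump. `mock_spinlock_t` and the two helper functions are hypothetical names.

```c
#include <stddef.h>

/* Minimal stand-ins for kernel types; assumed layouts, not copied headers. */
struct list_head {
    struct list_head *next, *prev;
};

struct task_struct;                 /* opaque in this sketch */

typedef struct {
    unsigned int slock;             /* 4-byte lock word, no debug fields */
} mock_spinlock_t;

struct rwsem_waiter {
    struct list_head list;          /* 0x00: next, 0x08: prev */
    struct task_struct *task;       /* 0x10 */
    unsigned int flags;             /* 0x18 */
};

struct rw_semaphore {
    long count;                     /* 0x00 */
    mock_spinlock_t wait_lock;      /* 0x08, padded up to 0x10 */
    struct list_head wait_list;     /* 0x10 */
};

/* Offset of the field tested by "cmpq $0x0,0x10(%rsp)". */
static size_t waiter_task_offset(void)
{
    return offsetof(struct rwsem_waiter, task);
}

/* Offset added to the semaphore address to reach the list head. */
static size_t sem_wait_list_offset(void)
{
    return offsetof(struct rw_semaphore, wait_list);
}
```

On a 64-bit target both helpers return 0x10. On the live dump, crash's own `struct -o` output would be the authoritative source for these offsets.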

crash> list  rwsem_waiter.list -H ffff8805b848f8b8 -s rwsem_waiter.task
ffff88173e599d80
task = 0xffff88173e5964c0
ffff881706047ca8
task = 0xffff8817060442c0
ffff881706057ca8
task = 0xffff8817060543c0
ffff88170605bca8
task = 0xffff881706058400
ffff883ec2437d80
task = 0xffff883e23c2c380
ffff883e54a0fd80
task = 0xffff883998b36200
ffff881ec528bd80
task = 0xffff881cd766e300

New waiters are appended to the tail of the wait queue:

    if (list_empty(&sem->wait_list))
        adjustment += RWSEM_WAITING_BIAS;
    list_add_tail(&waiter.list, &sem->wait_list);
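To see why tail insertion makes head.next the oldest waiter, here is a small userspace sketch of a circular doubly linked list. It is written to behave like the kernel's list_add_tail, but it is a reimplementation for illustration, and `init_list_head` is a hypothetical helper name.

```c
#include <stddef.h>

struct list_head {
    struct list_head *next, *prev;
};

/* An empty list is a head that points at itself. */
static void init_list_head(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Insert node just before head, i.e. at the tail of the queue. */
static void list_add_tail(struct list_head *node, struct list_head *head)
{
    struct list_head *last = head->prev;

    node->prev = last;
    node->next = head;
    last->next = node;
    head->prev = node;
}
```

After adding a, b and c in that order, head.next is a and head.prev is c, so walking from head.next visits the waiters in arrival order. That is the order the `list` output above prints.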

So the first waiter is the list head's next entry, ffff88173e599d80, and its task is:

crash> task 0xffff88173e5964c0
PID: TASK: ffff88173e5964c0 CPU: COMMAND: "dev_rdwriter"
struct task_struct {
  state = ,-----------------------------TASK_UNINTERRUPTIBLE
  stack = 0xffff88173e598000,
  usage = {
    counter =
  },

Its stack is:

crash> bt
PID: TASK: ffff88173e5964c0 CPU: COMMAND: "dev_rdwriter"
# [ffff88173e599c30] schedule at ffffffff814652b9
# [ffff88173e599d78] rwsem_down_failed_common at ffffffff81467485
# [ffff88173e599dd8] call_rwsem_down_write_failed at ffffffff81264d13
# [ffff88173e599e18] down_write at ffffffff8146679c
# [ffff88173e599e20] xfs_ilock at ffffffffa05ddb6c [xfs]
# [ffff88173e599e50] xfs_file_fallocate at ffffffffa0604e47 [xfs]
# [ffff88173e599f20] do_fallocate at ffffffff8115d629
# [ffff88173e599f50] sys_fallocate at ffffffff8115d676
# [ffff88173e599f80] system_call_fastpath at ffffffff8146f5f2
RIP: 00007f344662b010 RSP: 00007f2db9d0a2a0 RFLAGS:
RAX: 000000000000011d RBX: ffffffff8146f5f2 RCX:
RDX: RSI: RDI:
RBP: R8: 00007f340a0ecb60 R9: 0000000000005c83
R10: R11: R12:
R13: 0000000009b20a20 R14: 00007f3408e01328 R15:
ORIG_RAX: 000000000000011d CS: SS: 002b

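Clearly, this first waiter is trying to take the write lock. A write lock is exclusive, and whether the policy favors readers or writers, a waiter should not normally wait this long. The only explanation is that someone took the lock and never released it.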
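The count field in the semaphore dump, 0xffffffff00000001, points the same way. The sketch below decodes it, assuming the x86_64 bias constants of this kernel generation (a 32-bit active mask and a waiting bias of -2^32); the macro values are written out here as an assumption, and the helper names are hypothetical.

```c
#include <stdint.h>

/* Assumed x86_64 bias constants for this kernel generation. */
#define RWSEM_ACTIVE_MASK   INT64_C(0xffffffff)
#define RWSEM_ACTIVE_BIAS   INT64_C(1)
#define RWSEM_WAITING_BIAS  (-RWSEM_ACTIVE_MASK - 1)

/* Reinterpret the raw hex value printed by crash as a signed count. */
static int64_t rwsem_count_from_raw(uint64_t raw)
{
    return (int64_t)raw;
}

/* Number of active holders, kept in the low 32 bits of count. */
static int64_t rwsem_active_holders(int64_t count)
{
    return count & RWSEM_ACTIVE_MASK;
}

/* True when count is exactly "waiters queued plus one active holder". */
static int rwsem_is_waiting_plus_one(int64_t count)
{
    return count == RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS;
}
```

Under these assumptions the dumped value decodes to one active holder plus the waiting bias. Because the wait list is not empty, the waiting bias is accounted for by the queue, so the single active holder must be a reader: an active writer with waiters queued would contribute a second waiting bias. One read lock is therefore still held while every task in the list waits.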

Walking through the other waiting tasks turned up the following:

crash> task 0xffff8817060442c0
PID: TASK: ffff8817060442c0 CPU: COMMAND: "21-IFileSender"
struct task_struct {
  state = ,
  stack = 0xffff881706046000,
  usage = {
    counter =
  },

crash> bt
PID: TASK: ffff8817060442c0 CPU: COMMAND: "21-IFileSender"
# [ffff881706047b58] schedule at ffffffff814652b9
# [ffff881706047ca0] rwsem_down_failed_common at ffffffff81467485
# [ffff881706047d00] call_rwsem_down_read_failed at ffffffff81264ce4
# [ffff881706047d48] down_read at ffffffff8146677e
# [ffff881706047d50] xfs_ilock at ffffffffa05ddb3c [xfs]
# [ffff881706047d80] caq_xfs_file_splice_read at ffffffffa06a1850 [pagecachelimit]--------this is code I added
# [ffff881706047dc0] splice_direct_to_actor at ffffffff81188c6c
# [ffff881706047e30] do_sendfile at ffffffffa07119ab [witdriver]
# [ffff881706047ec0] my_sendfile at ffffffffa071db3a [witdriver]
# [ffff881706047f80] system_call_fastpath at ffffffff8146f5f2
RIP: 00007f13992bf3c9 RSP: 00007f136ab5f3e8 RFLAGS:
RAX: 00000000000000b5 RBX: ffffffff8146f5f2 RCX: 00007f136ab5f42c
RDX: RSI: 00007f136ab5e730 RDI:
RBP: 00007f136ab5f800 R8: 00007f136ab5f800 R9:
R10: 0000000000049fff R11: R12:
R13: R14: R15: 0000000000042e28
ORIG_RAX: 00000000000000b5 CS: SS: 002b

One of the waiting paths runs through code I had added myself, so I reviewed that code right away and found this:

static ssize_t
caq_xfs_file_splice_read(
    struct file *infilp,
    loff_t *ppos,
    struct pipe_inode_info *pipe,
    size_t count,
    unsigned int flags)
{
    loff_t isize, left;
    int ret;
    struct xfs_inode *ip = XFS_I(infilp->f_mapping->host);

    xfs_rw_ilock(ip, XFS_IOLOCK_SHARED);    /* the lock is taken here */

    isize = i_size_read(infilp->f_mapping->host);
    if (unlikely(*ppos >= isize))    /* read offset is at or past the end of the file */
        return 0;    /* the bug: returns without releasing the lock */

    left = isize - *ppos;
    if (unlikely(left < count))    /* stop reading at the end of the file */
        count = left;

    ret = caq___generic_file_splice_read(infilp, ppos, pipe, count,
                                         flags);
    if (ret > 0) {
        *ppos += ret;    /* advance the current file offset */
    }

    xfs_rw_iunlock(ip, XFS_IOLOCK_SHARED);

    return ret;
}

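The bug is obvious: on an error branch the function takes the read lock and then returns without releasing it. The workload that triggers it is several threads writing one file while several threads read it, with a reader passing in an offset that is beyond the file size.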
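One common way to avoid this class of bug is to route every exit through a single unlock label. The sketch below shows the pattern in userspace C with a mock lock that only counts holders. `struct mock_inode`, `mock_lock_shared`, `mock_unlock_shared` and `mock_splice_read` are hypothetical stand-ins, the data transfer is replaced by a trivial assignment, and returning 0 on the early exit is an assumption. It illustrates the control flow and is not the fix that was actually applied.

```c
/* Hypothetical stand-in for the inode and its shared I/O lock. */
struct mock_inode {
    int shared_holders;     /* how many readers currently hold the lock */
    long long size;         /* file size in bytes */
};

static void mock_lock_shared(struct mock_inode *ip)
{
    ip->shared_holders++;
}

static void mock_unlock_shared(struct mock_inode *ip)
{
    ip->shared_holders--;
}

/* Same control flow as the buggy function, with one exit that unlocks. */
static long long mock_splice_read(struct mock_inode *ip, long long *ppos,
                                  long long count)
{
    long long left;
    long long ret = 0;      /* assumed return value for the early exit */

    mock_lock_shared(ip);

    if (*ppos >= ip->size)
        goto out_unlock;    /* the early exit still drops the lock */

    left = ip->size - *ppos;
    if (left < count)
        count = left;

    ret = count;            /* stands in for the real splice helper */
    if (ret > 0)
        *ppos += ret;

out_unlock:
    mock_unlock_shared(ip);
    return ret;
}
```

With this shape, a read that starts past the end of the file returns with the holder count back at zero, so a later writer is not blocked by a reader that has already left.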
