share_pool: Fix concurrency problem when a process adding sp_group is killed (34116d14) · Commit · openeuler / Kernel

Commit 34116d14, authored Oct 30, 2021 by Tang Yizhou; committed by Yang Yingliang on Oct 30, 2021

share_pool: Fix concurrency problem when a process adding sp_group is killed

ascend inclusion
category: perf
bugzilla: https://gitee.com/openeuler/kernel/issues/I4EUVI
CVE: NA

-------------------------------------------------

We encountered the following problem:

[ 3057. 75094] share pool: task add group failed, current thread is killed
[ 3057. 75152] [ascend] [drv_buff] [buff_mv_pid_node_to_recycle_list 872] <rosnode:12273,12273> release empty list node pid 12273, group_id 1
[ 3057. 76380] [ascend] [ERROR] [drv_buff] [buff_req_ioctl_pid_add_group 443] <rosnode:12297,12297> pid add group failed, pid:12297, grp_id:1, ret -512
[ 3057. 76382] [ascend] [drv_buff] [buff_ioctl 841] <rosnode:12297,12297> buff_req_ioctl_handlers failed. ret:-512
[ 3057. 76452] Unable to handle kernel paging request at virtual address dead000000000108
[ 3057. 76454] Mem abort info:
[ 3057. 76456]   ESR = 0x96000044
[ 3057. 76457]   Exception class = DABT (current EL), IL = 32 bits
[ 3057. 76458]   SET = 0, FnV = 0
[ 3057. 76459]   EA = 0, S1PTW = 0
[ 3057. 76460] Data abort info:
[ 3057. 76461]   ISV = 0, ISS = 0x00000044
[ 3057. 76462]   CM = 0, WnR = 1
[ 3057. 76463] [dead000000000108] address between user and kernel address ranges
[ 3057. 76466] Internal error: Oops: 96000044 [#1] SMP
[ 3057. 76469] Process rosnode (pid: 12308, stack limit = 0x0000000012aa85df)
[ 3057. 76473] CPU: 10 PID: 12308 Comm: rosnode Tainted: P         C O      4.19.95-1.h1.AOS2.0.aarch64 #1
[ 3057. 76474] Hardware name: evb (DT)
[ 3057. 76476] pstate: 20400009 (nzCv daif +PAN -UAO)
[ 3057. 76483] pc : sp_group_exit+0x94/0x130
[ 3057. 76486] lr : sp_group_exit+0x48/0x130
[ 3057. 76486] sp : ffff00001a163c10
[ 3057. 76487] pmr_save: 000000e0
[ 3057. 76489] x29: ffff00001a163c10 x28: ffff800887e2a940
[ 3057. 76491] x27: 0000000000000000 x26: ffff800d8098ca40
[ 3057. 76492] x25: ffff80089a879168 x24: ffff00001a163dd0
[ 3057. 76494] x23: 0000000000000000 x22: 0000000000000002
[ 3057. 76495] x21: ffff800896e73088 x20: ffff80089a879100
[ 3057. 76496] x19: ffff800896e73000 x18: ffff7e002ca9a4f4
[ 3057. 76498] x17: 0000000000000001 x16: 0000000000000001
[ 3057. 76499] x15: 0400000000000000 x14: ffff800bd5d0d050
[ 3057. 76500] x13: 0000000000000001 x12: 0000000000000000
[ 3057. 76502] x11: 0000000000000000 x10: 00000000000009e0
[ 3057. 76503] x9 : ffff00001a163a90 x8 : ffff800887e2b380
[ 3057. 76505] x7 : 00000000000000b4 x6 : 0000001b5b9081bb
[ 3057. 76506] x5 : dead000000000100 x4 : dead000000000200
[ 3057. 76507] x3 : dead000000000100 x2 : dead000000000200
[ 3057. 76508] x1 : ffff800d81365400 x0 : ffff800896e73088
[ 3057. 76510] Call trace:
[ 3057. 76513]  sp_group_exit+0x94/0x130
[ 3057. 76517]  mmput+0x20/0x170
[ 3057. 76519]  do_exit+0x338/0xb38
[ 3057. 76520]  do_group_exit+0x3c/0xe8
[ 3057. 76522]  get_signal+0x14c/0x7d8
[ 3057. 76524]  do_signal+0x88/0x290
[ 3057. 76525]  do_notify_resume+0x150/0x3c8
[ 3057. 76528]  work_pending+0x8/0x10
[ 3057. 76530] Code: d2804004 f2fbd5a5 f2fbd5a4 aa1503e0 (f9000462)
[ 3057. 76534] [kbox] unable to set sctrl register, 				maybe the domain is not SD, continue
[ 3057. 76535] [kbox] catch die event on cpu 10
[ 3057. 76537] [kbox] catch die event, start logging
[ 3057. 76540] [kbox] die info:Oops:0044
[ 3057. 76540] [kbox] start to collect

If process A is adding process B to an sp_group and B is killed in the
meantime, the call to sp_group_add_task for B fails and

list_del(&mm->sp_node);

is executed on the error path. sp_group_exit for B executes the same
statement, so mm->sp_node is removed from the list twice.

The first list_del sets sp_node->next to LIST_POISON1, which is
dead000000000100 on arm64. The second list_del then writes to
sp_node->next->prev, at LIST_POISON1 + 8 = dead000000000108, which is
the faulting address in the log above.
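
The relationship between the poison value and the faulting address can be reproduced in user space. The following is a minimal sketch, not kernel code: the list helpers are simplified stand-ins for the kernel's list.h, the poison constants are hard-coded to the arm64 values seen in the register dump (x3 = dead000000000100, x2 = dead000000000200), and double_del_fault_addr is a hypothetical helper name. It assumes a 64-bit (LP64) build and only computes the address a second list_del would store to, without performing the store.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* User-space stand-ins for the kernel poison values (arm64 values assumed). */
#define POISON_POINTER_DELTA 0xdead000000000000ULL
#define LIST_POISON1 (0x100ULL + POISON_POINTER_DELTA)
#define LIST_POISON2 (0x200ULL + POISON_POINTER_DELTA)

struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *head)
{
	head->next = head->prev = head;
}

static void list_add(struct list_head *node, struct list_head *head)
{
	node->next = head->next;
	node->prev = head;
	head->next->prev = node;
	head->next = node;
}

/* Same shape as the kernel's list_del(): unlink, then poison both pointers. */
static void list_del(struct list_head *entry)
{
	entry->next->prev = entry->prev;
	entry->prev->next = entry->next;
	entry->next = (struct list_head *)(uintptr_t)LIST_POISON1;
	entry->prev = (struct list_head *)(uintptr_t)LIST_POISON2;
}

/*
 * Hypothetical helper: delete a node once, then return the address that the
 * first store of a second list_del() (entry->next->prev = ...) would hit.
 */
uint64_t double_del_fault_addr(void)
{
	struct list_head procs, sp_node;

	list_init(&procs);
	list_add(&sp_node, &procs);
	list_del(&sp_node);              /* first delete, as on the error path */
	assert(procs.next == &procs);    /* the list is empty again */
	assert((uint64_t)(uintptr_t)sp_node.next == LIST_POISON1);

	return (uint64_t)(uintptr_t)sp_node.next +
	       offsetof(struct list_head, prev);
}
```

On an LP64 build the returned value is LIST_POISON1 plus the offset of prev (8 bytes), which matches the address reported in the "Unable to handle kernel paging request" line.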

Signed-off-by: Tang Yizhou <tangyizhou@huawei.com>
Reviewed-by: Ding Tianhong <dingtianhong@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

Parent: cc18ead6


@@ -3089,22 +3089,22 @@ void sp_group_exit(struct mm_struct *mm)
 	 * because the last owner of this mm is in exiting procedure:
 	 * do_exit() -> exit_mm() -> mmput() -> THIS function.
 	 */
-	down_write(&spg->rw_lock);
-	if (spg_valid(spg) && atomic_read(&mm->mm_users) == MM_WOULD_FREE) {
+	if (atomic_read(&mm->mm_users) == MM_WOULD_FREE) {
+		down_write(&spg->rw_lock);
 		/* a dead group should NOT be reactive again */
-		if (list_is_singular(&spg->procs))
+		if (spg_valid(spg) && list_is_singular(&spg->procs))
 			is_alive = spg->is_alive = false;
-		list_del(&mm->sp_node);   /* affect spg->procs */
+		if (mm->sp_group)  /* concurrency handle of sp_group_add_task */
+			list_del(&mm->sp_node);   /* affect spg->procs */
 		up_write(&spg->rw_lock);
 		if (!is_alive)
 			blocking_notifier_call_chain(&sp_notifier_chain, 0,
 						     mm->sp_group);
 		/* match with get_task_mm() in sp_group_add_task() */
 		atomic_dec(&mm->mm_users);
 		return;
 	}
-	up_write(&spg->rw_lock);
 }
 void sp_group_post_exit(struct mm_struct *mm)
 ...
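
The guard added in sp_group_exit can be illustrated with a single-threaded replay of the racy interleaving. This is a sketch under stated assumptions, not the share_pool implementation: struct group, struct mm, node_unlink, add_task_fail_path, exit_path and simulate_race are hypothetical names, locking is reduced to comments, and the sketch assumes that the failure path of sp_group_add_task clears mm->sp_group after unlinking the node, which the diff implies but does not show.

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

/* Hypothetical, stripped-down stand-ins for sp_group and mm_struct. */
struct group {
	struct list_head procs;
};

struct mm {
	struct group *sp_group;
	struct list_head sp_node;
};

static int unlink_count; /* how many times sp_node was unlinked */

static void node_unlink(struct list_head *entry)
{
	entry->next->prev = entry->prev;
	entry->prev->next = entry->next;
	entry->next = entry->prev = NULL;
	unlink_count++;
}

static void join_group(struct mm *mm, struct group *g)
{
	mm->sp_node.next = g->procs.next;
	mm->sp_node.prev = &g->procs;
	g->procs.next->prev = &mm->sp_node;
	g->procs.next = &mm->sp_node;
	mm->sp_group = g;
}

/* Assumed error path of the add operation: undo the membership. */
static void add_task_fail_path(struct mm *mm)
{
	/* the group's rw_lock would be held for writing here */
	node_unlink(&mm->sp_node);
	mm->sp_group = NULL;
}

/* Exit path after the fix: membership is re-checked under the lock. */
static void exit_path(struct mm *mm, struct group *spg)
{
	(void)spg; /* only needed for locking in the real code */
	/* down_write(&spg->rw_lock) would be taken here */
	if (mm->sp_group)
		node_unlink(&mm->sp_node);
	/* up_write(&spg->rw_lock) would be released here */
}

/* Replays the interleaving from the commit message; returns unlink count. */
int simulate_race(void)
{
	struct group g;
	struct mm mm;
	struct group *spg;

	g.procs.next = g.procs.prev = &g.procs;
	unlink_count = 0;
	join_group(&mm, &g);

	spg = mm.sp_group;          /* exit path samples the group pointer */
	add_task_fail_path(&mm);    /* adder hits the "thread is killed" path */
	exit_path(&mm, spg);        /* exit path resumes with a stale spg */

	assert(g.procs.next == &g.procs); /* list is consistent and empty */
	return unlink_count;
}
```

With the guard in place the node is unlinked exactly once. Without the mm->sp_group check, exit_path would unlink an already-removed node, which is the user-space analogue of the second list_del in the oops above.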
