- 28 June 2022, 6 commits
-
Submitted by Takashi Iwai

stable inclusion from stable-v5.10.109
commit db03abd0dae07396559fd94b1a8ef54903be2073
bugzilla: https://gitee.com/openeuler/kernel/issues/I574AE
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=db03abd0dae07396559fd94b1a8ef54903be2073

--------------------------------

commit 455c5653 upstream.

This is essentially a revert of the commit dc865fb9 ("ASoC: sti: Use snd_pcm_stop_xrun() helper"), which converted the manual snd_pcm_stop() calls to snd_pcm_stop_xrun(). The commit above introduced a deadlock, as snd_pcm_stop_xrun() itself takes the PCM stream lock while the caller already holds it. Since the conversion was done only for consistency and the open-coded snd_pcm_stop() call with the XRUN state is correct usage, revert the commit as the fix.

Fixes: dc865fb9 ("ASoC: sti: Use snd_pcm_stop_xrun() helper")
Reported-by: Daniel Palmer <daniel@0x0f.com>
Cc: Arnaud POULIQUEN <arnaud.pouliquen@st.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20220315091319.3351522-1-daniel@0x0f.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Reviewed-by: Arnaud Pouliquen <arnaud.pouliquen@foss.st.com>
Link: https://lore.kernel.org/r/20220315164158.19804-1-tiwai@suse.de
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yu Liao <liaoyu15@huawei.com>
Reviewed-by: Wei Li <liwei391@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
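A minimal sketch of the locking pattern at issue, assuming the caller already holds the PCM stream lock (illustrative, not the exact sti driver code):

	/* snd_pcm_stop_xrun() takes the stream lock itself, so calling
	 * it from a context that already holds the lock self-deadlocks. */
	snd_pcm_stream_lock(substream);
	/* ... overrun handling that must keep the lock held ... */
	/* BAD:  snd_pcm_stop_xrun(substream);  -- re-takes the stream lock */
	/* OK:   the lock is already held, call snd_pcm_stop() directly    */
	snd_pcm_stop(substream, SNDRV_PCM_STATE_XRUN);
	snd_pcm_stream_unlock(substream);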
-
Submitted by Oliver Graute

stable inclusion from stable-v5.10.109
commit 56dc187b35d5a0ac9d08560684721abf3aefa4df
bugzilla: https://gitee.com/openeuler/kernel/issues/I574AE
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=56dc187b35d5a0ac9d08560684721abf3aefa4df

--------------------------------

commit b6821b0d upstream.

In rare cases the display is flipped or mirrored. This was observed more often in a low-temperature environment. A clean reset on init_display() should help to get the registers into a sane state.

Fixes: ef8f3177 ("staging: fbtft: use init function instead of init sequence")
Cc: stable@vger.kernel.org
Signed-off-by: Oliver Graute <oliver.graute@kococonnector.com>
Link: https://lore.kernel.org/r/20220210085322.15676-1-oliver.graute@kococonnector.com
[sudip: adjust context]
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yu Liao <liaoyu15@huawei.com>
Reviewed-by: Wei Li <liwei391@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Tadeusz Struk

stable inclusion from stable-v5.10.109
commit 351493858ebc192c4526182f4c5819466e345659
bugzilla: https://gitee.com/openeuler/kernel/issues/I574AE
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=351493858ebc192c4526182f4c5819466e345659

--------------------------------

commit 2e8e4c8f upstream.

When an invalid (non-existing) handle is used in a TPM command that uses the resource manager interface (/dev/tpmrm0), the resource manager tries to load it from its internal cache, but fails, and tpm_dev_transmit returns an -EINVAL error to the caller. The existing async handler doesn't handle these error cases currently, and the condition in the poll handler never returns a mask with EPOLLIN set. The result is that the poll call blocks and the application gets stuck until the user_read_timer wakes it up after 120 sec.

Change the tpm_dev_async_work function to handle error conditions returned from tpm_dev_transmit so that they are also reflected in the poll mask and a correct error code can be passed back to the caller.

Cc: Jarkko Sakkinen <jarkko@kernel.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: <linux-integrity@vger.kernel.org>
Cc: <stable@vger.kernel.org>
Cc: <linux-kernel@vger.kernel.org>
Fixes: 9e1b74a6 ("tpm: add support for nonblocking operation")
Tested-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Tadeusz Struk <tstruk@gmail.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Cc: Tadeusz Struk <tadeusz.struk@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yu Liao <liaoyu15@huawei.com>
Reviewed-by: Wei Li <liwei391@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
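A hedged sketch of the key change in tpm_dev_async_work() as the commit describes it (surrounding locking and field details abridged and approximate):

	ret = tpm_dev_transmit(priv->chip, priv->space, priv->data_buffer,
			       sizeof(priv->data_buffer));
	tpm_put_ops(priv->chip);
	/* Store negative error codes too, not only a positive response
	 * size, so poll() sees a non-zero response_length and reports
	 * EPOLLIN instead of blocking until user_read_timer fires. */
	if (ret != 0) {
		priv->response_length = ret;
		mod_timer(&priv->user_read_timer, jiffies + (120 * HZ));
	}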
-
Submitted by Michal Koutný

stable inclusion from stable-v5.10.109
commit ea21245cdcab3f2b46aecd421ac5f5753a1cf88d
bugzilla: https://gitee.com/openeuler/kernel/issues/I574AE
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=ea21245cdcab3f2b46aecd421ac5f5753a1cf88d

--------------------------------

commit 467a726b upstream.

The idea is to check: a) the owning user_ns of cgroup_ns, b) capabilities in init_user_ns.

The commit 24f60085 ("cgroup-v1: Require capabilities to set release_agent") got this wrong in the write handler of release_agent, since it checked the user_ns of the opener (which may differ from the owning user_ns of cgroup_ns). Secondly, to avoid a possibly confused deputy, the capability of the opener must be checked.

Fixes: 24f60085 ("cgroup-v1: Require capabilities to set release_agent")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/stable/20220216121142.GB30035@blackbody.suse.cz/
Signed-off-by: Michal Koutný <mkoutny@suse.com>
Reviewed-by: Masami Ichikawa (CIP) <masami.ichikawa@cybertrust.co.jp>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yu Liao <liaoyu15@huawei.com>
Reviewed-by: Wei Li <liwei391@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
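A hedged sketch of the corrected check in the release_agent write handler, following the upstream fix (context and field names may differ slightly in this tree):

	/* a) the cgroup namespace must be owned by init_user_ns,
	 * b) the *opener's* credentials must carry CAP_SYS_ADMIN there,
	 *    to avoid a confused deputy via a passed-on file descriptor */
	ctx = of->priv;
	if (ctx->ns->user_ns != &init_user_ns ||
	    !file_ns_capable(of->file, &init_user_ns, CAP_SYS_ADMIN))
		return -EPERM;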
-
Submitted by Chen Li

stable inclusion from stable-v5.10.109
commit 9eeaa2d7d58ae7fe66bdb016a03fe251c48fe222
bugzilla: https://gitee.com/openeuler/kernel/issues/I574AE
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=9eeaa2d7d58ae7fe66bdb016a03fe251c48fe222

--------------------------------

commit 839a534f upstream.

In d_make_root, when we fail to allocate a dentry for the root inode, we iput the root inode and return NULL from this function. So we do not need to release this inode again at d_make_root's caller.

Signed-off-by: Chen Li <chenli@uniontech.com>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Cc: Tadeusz Struk <tadeusz.struk@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yu Liao <liaoyu15@huawei.com>
Reviewed-by: Wei Li <liwei391@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
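A short sketch of the caller-side pattern this implies (a generic fill_super-style error path; the label name is illustrative):

	sb->s_root = d_make_root(root_inode);
	if (!sb->s_root) {
		err = -ENOMEM;
		goto out_error;	/* no iput(root_inode) here:
				 * d_make_root() already dropped the
				 * inode when dentry allocation failed */
	}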
-
Submitted by Tadeusz Struk

stable inclusion from stable-v5.10.109
commit ae8ec5eabb1a0e672e054ef50374f3d8508d6828
bugzilla: https://gitee.com/openeuler/kernel/issues/I574AE
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=ae8ec5eabb1a0e672e054ef50374f3d8508d6828

--------------------------------

commit 5e34af41 upstream.

Syzbot found a kernel bug in the ipv6 stack:
LINK: https://syzkaller.appspot.com/bug?id=205d6f11d72329ab8d62a610c44c5e7e25415580

The reproducer triggers it by sending a crafted message via a sendmmsg() call, which triggers skb_over_panic and crashes the kernel:

	skbuff: skb_over_panic: text:ffffffff84647fb4 len:65575 put:65575
	head:ffff888109ff0000 data:ffff888109ff0088 tail:0x100af end:0xfec0
	dev:<NULL>

Update the check that prevents an invalid packet with an MTU equal to the fragment header size from eating up all the space for the payload.

The reproducer can be found here:
LINK: https://syzkaller.appspot.com/text?tag=ReproC&x=1648c83fb00000

Reported-by: syzbot+e223cf47ec8ae183f2a0@syzkaller.appspotmail.com
Signed-off-by: Tadeusz Struk <tadeusz.struk@linaro.org>
Acked-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20220310232538.1044947-1-tadeusz.struk@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yu Liao <liaoyu15@huawei.com>
Reviewed-by: Wei Li <liwei391@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
- 22 June 2022, 34 commits
-
Submitted by Huaixin Chang

mainline inclusion from mainline-v5.15-rc4
commit d73df887
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5CPWE
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d73df887b6b8174dfbb7f5f878fbd1e0e2eb3f08

--------------------------------

Basic description of usage and effect for CFS Bandwidth Control Burst.

Co-developed-by: Shanpei Chen <shanpeic@linux.alibaba.com>
Signed-off-by: Shanpei Chen <shanpeic@linux.alibaba.com>
Co-developed-by: Tianchen Ding <dtcccc@linux.alibaba.com>
Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>
Signed-off-by: Huaixin Chang <changhuaixin@linux.alibaba.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20210830032215.16302-3-changhuaixin@linux.alibaba.com
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Reviewed-by: Chen Hui <judy.chenhui@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Huaixin Chang

mainline inclusion from mainline-v5.15-rc4
commit bcb1704a
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5CPWE
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=bcb1704a1ed2de580a46f28922e223a65f16e0f5

--------------------------------

Two new statistics are introduced to show the internals of the burst feature and explain why burst helps or not:

	nr_bursts:  number of periods in which a bandwidth burst occurs
	burst_time: cumulative wall-time (in nanoseconds) that any CPU has
	            used above quota in the respective periods

Co-developed-by: Shanpei Chen <shanpeic@linux.alibaba.com>
Signed-off-by: Shanpei Chen <shanpeic@linux.alibaba.com>
Co-developed-by: Tianchen Ding <dtcccc@linux.alibaba.com>
Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>
Signed-off-by: Huaixin Chang <changhuaixin@linux.alibaba.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20210830032215.16302-2-changhuaixin@linux.alibaba.com
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Reviewed-by: Chen Hui <judy.chenhui@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Huaixin Chang

mainline inclusion from mainline-v5.13-rc6
commit f4183717
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5CPWE
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f4183717b370ad28dd0c0d74760142b20e6e7931

--------------------------------

The CFS bandwidth controller limits CPU requests of a task group to quota during each period. However, parallel workloads might be bursty so that they get throttled even when their average utilization is under quota. And they are latency sensitive at the same time so that throttling them is undesired.

We borrow time now against our future underrun, at the cost of increased interference against the other system users. All nicely bounded.

Traditional (UP-EDF) bandwidth control is something like:

	(U = \Sum u_i) <= 1

This guarantees both that every deadline is met and that the system is stable. After all, if U were > 1, then for every second of walltime, we'd have to run more than a second of program time, and obviously miss our deadline, but the next deadline will be further out still; there is never time to catch up, unbounded fail.

This work observes that a workload doesn't always execute the full quota; this enables one to describe u_i as a statistical distribution. For example, have u_i = {x,e}_i, where x is the p(95) and x+e the p(100) (the traditional WCET). This effectively allows u to be smaller, increasing the efficiency (we can pack more tasks in the system), but at the cost of missing deadlines when all the odds line up. However, it does maintain stability, since every overrun must be paired with an underrun as long as our x is above the average.

That is, suppose we have 2 tasks, both specifying a p(95) value; then we have a p(95)*p(95) = 90.25% chance both tasks are within their quota and everything is good. At the same time we have a p(5)p(5) = 0.25% chance both tasks will exceed their quota at the same time (guaranteed deadline fail). Somewhere in between there's a threshold where one exceeds and the other doesn't underrun enough to compensate; this depends on the specific CDFs. At the same time, we can say that the worst case deadline miss will be \Sum e_i; that is, there is a bounded tardiness (under the assumption that x+e is indeed WCET).

The benefit of burst is seen when testing with schbench. The default values of kernel.sched_cfs_bandwidth_slice_us (5ms) and CONFIG_HZ (1000) are used.

	mkdir /sys/fs/cgroup/cpu/test
	echo $$ > /sys/fs/cgroup/cpu/test/cgroup.procs
	echo 100000 > /sys/fs/cgroup/cpu/test/cpu.cfs_quota_us
	echo 100000 > /sys/fs/cgroup/cpu/test/cpu.cfs_burst_us

	./schbench -m 1 -t 3 -r 20 -c 80000 -R 10

The average CPU usage is at 80%. I ran this 10 times, and got long tail latency 6 times and got throttled 8 times. Tail latencies are shown below, and it wasn't the worst case.

	Latency percentiles (usec)
		50.0000th: 19872
		75.0000th: 21344
		90.0000th: 22176
		95.0000th: 22496
		*99.0000th: 22752
		99.5000th: 22752
		99.9000th: 22752
		min=0, max=22727
	rps: 9.90 p95 (usec) 22496 p99 (usec) 22752
	p95/cputime 28.12% p99/cputime 28.44%

The interference when using burst is valued by the possibility of missing the deadline and the average WCET. Test results showed that when there are many cgroups or the CPU is under-utilized, the interference is limited. More details are shown in:
https://lore.kernel.org/lkml/5371BD36-55AE-4F71-B9D7-B86DC32E3D2B@linux.alibaba.com/

Co-developed-by: Shanpei Chen <shanpeic@linux.alibaba.com>
Signed-off-by: Shanpei Chen <shanpeic@linux.alibaba.com>
Co-developed-by: Tianchen Ding <dtcccc@linux.alibaba.com>
Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>
Signed-off-by: Huaixin Chang <changhuaixin@linux.alibaba.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Ben Segall <bsegall@google.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20210621092800.23714-2-changhuaixin@linux.alibaba.com
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Reviewed-by: Chen Hui <judy.chenhui@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by tatataeki

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4MC3F
CVE: NA

----------------------------------

Multiple operations on cgroups in cgroup v1 depend on the status of the cgroup. The status of the current cgroup can be displayed in cgroup v2, but it cannot be displayed in cgroup v1, so a flag_stat file is added to the memory cgroup to display the status of the current cgroup and its sub-cgroups.

Testing result:

List the status of user.slice:

	[root@test user.slice]# cat memory.flag_stat
	NO_REF 0
	ONLINE 1
	RELEASED 0
	VISIBLE 1
	DYING 0
	CHILD_NO_REF 0
	CHILD_ONLINE 1
	CHILD_RELEASED 0
	CHILD_VISIBLE 1
	CHILD_DYING 0

Create a new cgroup in user.slice:

	[root@test user.slice]# mkdir user-test

List the status of user.slice after the operation above:

	[root@test user.slice]# cat memory.flag_stat
	NO_REF 0
	ONLINE 1
	RELEASED 0
	VISIBLE 1
	DYING 0
	CHILD_NO_REF 0
	CHILD_ONLINE 2
	CHILD_RELEASED 0
	CHILD_VISIBLE 2
	CHILD_DYING 0

Signed-off-by: tatataeki <shengzeyu19_98@163.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Gou Hao

uniontech inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I40JRR
CVE: NA

-------------------

After allocating the sbi->persisters memory, dep_init() calls dep_fini() when an error happens. Because sbi->persisters is not zeroed, dep_fini() can be called with sbi->persisters[] uninitialized, and thus kthread_stop() can be called with a random value.

Signed-off-by: Gou Hao <gouhao@uniontech.com>
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
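A hedged sketch of the fix this describes: zero the array at allocation time so the error path only stops threads that were actually started (names follow the commit text; the surrounding filesystem code may differ):

	/* zeroed allocation: every not-yet-created persister slot is NULL */
	sbi->persisters = kcalloc(num_persisters, sizeof(*sbi->persisters),
				  GFP_KERNEL);
	if (!sbi->persisters)
		return -ENOMEM;

	/* ... on a later error, dep_fini() can now safely do: */
	for (i = 0; i < num_persisters; i++) {
		if (sbi->persisters[i]) {	/* skip uninitialized slots */
			kthread_stop(sbi->persisters[i]);
			sbi->persisters[i] = NULL;
		}
	}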
-
Submitted by Namjae Jeon

mainline inclusion from mainline-v5.19-rc1
commit f26967b9
category: bugfix
bugzilla: 186929, https://gitee.com/src-openeuler/kernel/issues/I5D82L
CVE: CVE-2022-1973
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f26967b9f7a830e228bb13fb41bd516ddd9d789d

----------------------------------------------------------------

log_read_rst() returns an ENOMEM error when there is not enough memory. In this case, if info is returned without initialization, the caller attempts to kfree the uninitialized info->r_page pointer. This patch moves the memset initialization code to before the log_read_rst() call.

Reported-by: Gerald Lee <sundaywind2004@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Signed-off-by: ZhaoLong Wang <wangzhaolong1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
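A sketch of the ordering fix described above (the surrounding ntfs3 code is abridged; the point is only zero-then-read):

	struct restart_info info;

	memset(&info, 0, sizeof(struct restart_info));	/* zero first...   */
	err = log_read_rst(log, lsn, true, &info);	/* ...then read    */
	if (err)
		goto out;	/* cleanup's kfree(info.r_page) now sees a
				 * NULL pointer instead of stack garbage  */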
-
Submitted by ChenXiaoSong

hulk inclusion
category: bugfix
bugzilla: 186345, https://gitee.com/openeuler/kernel/issues/I4T2WV
CVE: NA

--------------------------------

This reverts commit ce368536.

filemap_sample_wb_err() will return 0 if nobody has seen the error yet; then filemap_check_wb_err() will return the unchanged writeback error, and the async write() becomes a sync write().

Reproducer:

	nfs server                      | nfs client
	--------------------------------|----------------------------------------------
	# No space left on server       |
	fallocate -l 100G /server/nospc |
	                                | mount -t nfs $nfs_server_ip:/ /mnt
	                                |
	                                | # Expected error: No space left on device
	                                | dd if=/dev/zero of=/mnt/file count=1 ibs=1K
	                                |
	                                | # Release space on mountpoint
	                                | rm /mnt/nospc
	                                |
	                                | # Very very slow
	                                | dd if=/dev/zero of=/mnt/file count=1 ibs=1K

Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com>
Reviewed-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Pavel Reichl

mainline inclusion from stable-v5.13-rc1
commit 92cf7d36
category: bugfix
bugzilla: 186908, https://gitee.com/openeuler/kernel/issues/I4KIAO
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=92cf7d36384b99d5a57bf4422904a3c16dc4527a

--------------------------------

Skip the warnings about a mount option being deprecated if we are remounting and the deprecated option's state is not changing.

Bug: https://bugzilla.kernel.org/show_bug.cgi?id=211605
Fix-suggested-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Pavel Reichl <preichl@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Guo Xuenan <guoxuenan@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Pavel Reichl

mainline inclusion from stable-v5.13-rc1
commit 0f98b4ec
category: bugfix
bugzilla: 186908, https://gitee.com/openeuler/kernel/issues/I4KIAO
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0f98b4ece18da9d8287bb4cc4e8f78b8760ea0d0

--------------------------------

Rename the mp variable to parsing_mp so it is easy to distinguish between the current mount point handle and the handle for the mount point whose mount options are being parsed.

Suggested-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Pavel Reichl <preichl@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Guo Xuenan <guoxuenan@huawei.com>
Conflicts:
	fs/xfs/xfs_super.c
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Xiyu Yang

mainline inclusion from mainline-v5.16-rc1
commit 31d21d21
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5C8IW
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=31d21d219b51dcfb16e18427eddae5394d402820

--------------------------------

The refcount_t type and corresponding API can protect refcounters from accidental underflow and overflow, and further use-after-free situations.

Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/1626674355-55795-1-git-send-email-xiyuyang19@fudan.edu.cn
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Li Nan <linan122@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
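A hedged sketch of the conversion pattern such commits apply (upstream this one targets an ext4 I/O-end reference count; the struct name and release helper are abridged from the commit reference):

	#include <linux/refcount.h>

	struct ext4_io_end {
		refcount_t count;	/* was: atomic_t count; */
		/* ... */
	};

	/* init:  atomic_set(&io_end->count, 1)        becomes: */
	refcount_set(&io_end->count, 1);

	/* get:   atomic_inc(&io_end->count)           becomes: */
	refcount_inc(&io_end->count);

	/* put:   atomic_dec_and_test(&io_end->count)  becomes: */
	if (refcount_dec_and_test(&io_end->count))
		ext4_release_io_end(io_end);	/* last reference gone */

The refcount API saturates on overflow and warns on underflow, which is what closes the use-after-free window the commit mentions.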
-
Submitted by Michael Ellerman

mainline inclusion from mainline-v5.19-rc2
commit 8e127844
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5C43D?from=project-issue
CVE: CVE-2022-32981
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?id=8e1278444446fc97778a5e5c99bca1ce0bbc5ec9

--------------------------------

The ptrace PEEKUSR/POKEUSR (aka PEEKUSER/POKEUSER) API allows a process to read/write registers of another process.

To get/set a register, the API takes an index into an imaginary address space called the "USER area", where the registers of the process are laid out in some fashion. The kernel then maps that index to a particular register in its own data structures and gets/sets the value.

The API only allows a single machine-word to be read/written at a time. So 4 bytes on 32-bit kernels and 8 bytes on 64-bit kernels.

The way floating point registers (FPRs) are addressed is somewhat complicated, because double precision float values are 64-bit even on 32-bit CPUs. That means on 32-bit kernels each FPR occupies two word-sized locations in the USER area. On 64-bit kernels each FPR occupies one word-sized location in the USER area.

Internally the kernel stores the FPRs in an array of u64s, or if VSX is enabled, an array of pairs of u64s where one half of each pair stores the FPR. Which half of the pair stores the FPR depends on the kernel's endianness.

To handle the different layouts of the FPRs depending on VSX/no-VSX and big/little endian, the TS_FPR() macro was introduced.

Unfortunately the TS_FPR() macro does not take into account the fact that the addressing of each FPR differs between 32-bit and 64-bit kernels. It just takes the index into the "USER area" passed from userspace and indexes into the fp_state.fpr array.

On 32-bit there are 64 indexes that address FPRs, but only 32 entries in the fp_state.fpr array, meaning the user can read/write 256 bytes past the end of the array. Because the fp_state sits in the middle of the thread_struct there are various fields that can be overwritten, including some pointers. As such it may be exploitable.

It has also been observed to cause systems to hang or otherwise misbehave when using gdbserver, and is probably the root cause of this report which could not be easily reproduced:
https://lore.kernel.org/linuxppc-dev/dc38afe9-6b78-f3f5-666b-986939e40fc6@keymile.com/

Rather than trying to make the TS_FPR() macro even more complicated to fix the bug, or add more macros, instead add a special case for 32-bit kernels. This is more obvious and hopefully avoids a similar bug happening again in future.

Note that because 32-bit kernels never have VSX enabled the code doesn't need to consider TS_FPRWIDTH/OFFSET at all. Add a BUILD_BUG_ON() to ensure that 32-bit && VSX is never enabled.

Fixes: 87fec051 ("powerpc: PTRACE_PEEKUSR/PTRACE_POKEUSER of FPR registers in little endian builds")
Cc: stable@vger.kernel.org # v3.13+
Reported-by: Ariel Miculas <ariel.miculas@belden.com>
Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220609133245.573565-1-mpe@ellerman.id.au
Signed-off-by: Yipeng Zou <zouyipeng@huawei.com>
Reviewed-by: Zhang Jianhua <chris.zjh@huawei.com>
Reviewed-by: Liao Chang <liaochang1@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
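A hedged sketch of the 32-bit special case in the PEEKUSR path, close to the upstream fix (the real code lives in the powerpc ptrace FPR helpers; context abridged):

	unsigned int fpidx = index - PT_FPR0;

	if (fpidx < (PT_FPSCR - PT_FPR0)) {
		if (IS_ENABLED(CONFIG_PPC32))
			/* On 32-bit the USER-area index counts 32-bit
			 * words, and each 64-bit FPR spans two of them,
			 * so index the FPR array as raw u32 words
			 * instead of going through TS_FPR(). */
			*data = ((u32 *)child->thread.fp_state.fpr)[fpidx];
		else
			memcpy(data, &child->thread.TS_FPR(fpidx),
			       sizeof(u64));
	} else {
		*data = child->thread.fp_state.fpscr;
	}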
-
Submitted by Wenpeng Liang

mainline inclusion from mainline-v5.18-rc1
commit 73f7e056
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5A9XK
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/commit/?id=73f7e05609ec

Abstract the alloc_cqc() into several parts and separate the process unrelated to allocating the CQC.

Link: https://lore.kernel.org/r/20220302064830.61706-10-liangwenpeng@huawei.com
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Zhengfeng Luo <luozhengfeng@h-partners.com>
Reviewed-by: Yangyang Li <liyangyang20@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Chengchang Tang

mainline inclusion from mainline-v5.18-rc1
commit b65afbd2
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5A9XK
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/commit/?id=b65afbd2a05c

Abstract the alloc_srqc() into several parts and separate the alloc_srqn() from the alloc_srqc().

Link: https://lore.kernel.org/r/20220302064830.61706-9-liangwenpeng@huawei.com
Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Zhengfeng Luo <luozhengfeng@h-partners.com>
Reviewed-by: Yangyang Li <liyangyang20@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Wenpeng Liang

mainline inclusion from mainline-v5.18-rc1
commit 904de76c
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5A9XK
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/commit/?id=904de76c42b7

hns_roce_alloc_cmd_mailbox() never returns NULL, so the check should be IS_ERR(). And the error code should be converted into the function's return value.

Link: https://lore.kernel.org/r/20220302064830.61706-8-liangwenpeng@huawei.com
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Zhengfeng Luo <luozhengfeng@h-partners.com>
Reviewed-by: Yangyang Li <liyangyang20@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
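A minimal sketch of the corrected pattern, assuming a caller that previously tested for NULL:

	mailbox = hns_roce_alloc_cmd_mailbox(hr_dev);
	/* the allocator returns an ERR_PTR() on failure, never NULL,
	 * so a NULL check can never fire */
	if (IS_ERR(mailbox))
		return PTR_ERR(mailbox);	/* propagate the real error */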
-
Submitted by Chengchang Tang

mainline inclusion from mainline-v5.18-rc1
commit cf7f8f5c
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5A9XK
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/commit/?id=cf7f8f5c1c54

Remove duplicate code for creating and destroying hardware contexts via mailbox.

Link: https://lore.kernel.org/r/20220302064830.61706-7-liangwenpeng@huawei.com
Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Zhengfeng Luo <luozhengfeng@h-partners.com>
Reviewed-by: Yangyang Li <liyangyang20@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Chengchang Tang

mainline inclusion from mainline-v5.18-rc1
commit 162e29fe
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5A9XK
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/commit/?id=162e29feabba

The current mailbox functions have too many parameters, making the code difficult to maintain. So construct a new structure, mbox_msg, to pass the information needed by the mailbox.

Link: https://lore.kernel.org/r/20220302064830.61706-6-liangwenpeng@huawei.com
Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Zhengfeng Luo <luozhengfeng@h-partners.com>
Reviewed-by: Yangyang Li <liyangyang20@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Wenpeng Liang

mainline inclusion from mainline-v5.18-rc1
commit e50cda2b
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5A9XK
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/commit/?id=e50cda2b9f84

The "op" field of the mailbox occupies 8 bits, so the parameter "op" should be of type u8.

Link: https://lore.kernel.org/r/20220302064830.61706-5-liangwenpeng@huawei.com
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Zhengfeng Luo <luozhengfeng@h-partners.com>
Reviewed-by: Yangyang Li <liyangyang20@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Wenpeng Liang

mainline inclusion from mainline-v5.18-rc1
commit 479dc93b
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5A9XK
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/commit/?id=479dc93ba75d

The parameter "out_param" of the mailbox is always null when the context is destroyed, so remove the function parameter "mailbox".

Link: https://lore.kernel.org/r/20220302064830.61706-4-liangwenpeng@huawei.com
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Zhengfeng Luo <luozhengfeng@h-partners.com>
Reviewed-by: Yangyang Li <liyangyang20@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Chengchang Tang

mainline inclusion from mainline-v5.18-rc1
commit 0018ed4b
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5A9XK
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/commit/?id=0018ed4bb07f

The value of the function parameter "timeout" is unique. Therefore, it is unnecessary to specify the "timeout" value each time, so remove it.

Link: https://lore.kernel.org/r/20220302064830.61706-3-liangwenpeng@huawei.com
Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Haoyue Xu <xuhaoyue1@hisilicon.com>
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Zhengfeng Luo <luozhengfeng@h-partners.com>
Reviewed-by: Yangyang Li <liyangyang20@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Chengchang Tang

mainline inclusion from mainline-v5.18-rc1
commit 5a32949d
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5A9XK
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/commit/?id=5a32949d81cc

The parameter "op_modifier" is only used for HIP06. It is useless for HIP08 and later versions. After removing HIP06, this parameter is no longer used, so remove it.

Link: https://lore.kernel.org/r/20220302064830.61706-2-liangwenpeng@huawei.com
Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Haoyue Xu <xuhaoyue1@hisilicon.com>
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Zhengfeng Luo <luozhengfeng@h-partners.com>
Reviewed-by: Yangyang Li <liyangyang20@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Willy Tarreau

mainline inclusion from mainline-v5.18-rc6
commit 4c2c8f03
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5C3A9
CVE: CVE-2022-32296
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4c2c8f03a5ab7cb04ec64724d7d176d00bcc91e5

--------------------------------

Moshe Kol, Amit Klein, and Yossi Gilad reported being able to accurately identify a client by forcing it to emit only 40 times more connections than there are entries in the table_perturb[] table. The previous two improvements, consisting of resalting the secret every 10s and adding randomness to each port selection, only slightly improved the situation, and the current value of 2^8 was too small, as it's not very difficult to make a client emit 10k connections in less than 10 seconds.

Thus we're increasing the perturb table from 2^8 to 2^16 so that the same precision now requires 2.6M connections, which is more difficult in this time frame and harder to hide as a background activity. The impact is that the table now uses 256 kB instead of 1 kB, which could mostly affect devices making frequent outgoing connections. However, such components usually target a small set of destinations (load balancers, database clients, perf assessment tools), and in practice only a few entries will be visited, like before.

A live test at 1 million connections per second showed no performance difference from the previous value.

Reported-by: Moshe Kol <moshe.kol@mail.huji.ac.il>
Reported-by: Yossi Gilad <yossi.gilad@mail.huji.ac.il>
Reported-by: Amit Klein <aksecurity@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Conflicts:
	net/ipv4/inet_hashtables.c
Signed-off-by: Baisong Zhong <zhongbaisong@huawei.com>
Reviewed-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
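A hedged sketch of the change's core: growing the perturbation table from 2^8 to 2^16 u32 entries (constant names follow the upstream patch; the allocation details in this tree may differ given the noted conflicts):

	/* was 8: 256 entries (1 kB); now 16: 64K entries (256 kB) */
	#define INET_TABLE_PERTURB_SHIFT 16
	#define INET_TABLE_PERTURB_SIZE (1 << INET_TABLE_PERTURB_SHIFT)
	static u32 *table_perturb;

	/* boot-time allocation sized for the larger table */
	table_perturb = alloc_large_system_hash("Table-perturb",
						sizeof(*table_perturb),
						INET_TABLE_PERTURB_SIZE,
						0, 0, NULL, NULL,
						INET_TABLE_PERTURB_SIZE,
						INET_TABLE_PERTURB_SIZE);

With u32 entries this matches the sizes quoted in the message: 2^8 * 4 B = 1 kB before, 2^16 * 4 B = 256 kB after.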
-
Submitted by Eric Dumazet

stable inclusion from stable-v5.10.119
commit 33f1b4a27abced7ae0f740d2ec3040debf7c4b3c
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5C3A9
CVE: CVE-2022-32296
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=33f1b4a27abced7ae0f740d2ec3040debf7c4b3c

--------------------------------

commit 190cc824 upstream.

RFC 6056 (Recommendations for Transport-Protocol Port Randomization) provides a good summary of why source port selection needs extra care. David Dworken reminded us that linux implements Algorithm 3 as described in RFC 6056 3.3.3.

Quoting David:

	In the context of the web, this creates an interesting info leak
	where websites can count how many TCP connections a user's
	computer is establishing over time. For example, this allows a
	website to count exactly how many subresources a third party
	website loaded.

	This also allows:
	- Distinguishing between different users behind a VPN based on
	  distinct source port ranges.
	- Tracking users over time across multiple networks.
	- Covert communication channels between different browsers/browser
	  profiles running on the same computer
	- Tracking what applications are running on a computer based on
	  the pattern of how fast source ports are getting incremented.

Section 3.3.4 describes an enhancement that reduces attackers' ability to use the basic information currently stored in the shared 'u32 hint'. This change also decreases the collision rate when multiple applications need to connect() to different destinations.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: David Dworken <ddworken@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Stefan Ghinea <stefan.ghinea@windriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Conflicts:
	net/ipv4/inet_hashtables.c
Signed-off-by: Baisong Zhong <zhongbaisong@huawei.com>
Reviewed-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Baokun Li

hulk inclusion
category: bugfix
bugzilla: 186777, https://gitee.com/openeuler/kernel/issues/I5C568
CVE: NA

--------------------------------

ext4_mb_normalize_request() can move the logical start of allocated blocks to reduce fragmentation and better utilize preallocation. However, the logical block requested as the start of the allocation (ac->ac_o_ex.fe_logical) should always be covered by the allocated blocks, so we should check for that by changing "and" to "or" in the assertion.

Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
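A sketch of the corrected assertion in ext4_mb_normalize_request(), following the corresponding upstream fix: complain when the chosen range [start, start + size) fails to cover the originally requested logical block on either side.

	if (start + size <= ac->ac_o_ex.fe_logical ||	/* was: && */
	    start > ac->ac_o_ex.fe_logical) {
		ext4_msg(ac->ac_sb, KERN_ERR,
			 "start %lu, size %lu, fe_logical %lu",
			 (unsigned long)start, (unsigned long)size,
			 (unsigned long)ac->ac_o_ex.fe_logical);
		BUG();
	}

With "&&" the two conditions can never both hold (a range cannot end before and start after the same block), so the old assertion was dead code; "||" makes each violation trip it.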
-
Submitted by Baokun Li

hulk inclusion
category: bugfix
bugzilla: 186777, https://gitee.com/openeuler/kernel/issues/I5C568
CVE: NA

--------------------------------

Hulk Robot reported a BUG_ON:

==================================================================
kernel BUG at fs/ext4/mballoc.c:3211!
[...]
RIP: 0010:ext4_mb_mark_diskspace_used.cold+0x85/0x136f
[...]
Call Trace:
 ext4_mb_new_blocks+0x9df/0x5d30
 ext4_ext_map_blocks+0x1803/0x4d80
 ext4_map_blocks+0x3a4/0x1a10
 ext4_writepages+0x126d/0x2c30
 do_writepages+0x7f/0x1b0
 __filemap_fdatawrite_range+0x285/0x3b0
 file_write_and_wait_range+0xb1/0x140
 ext4_sync_file+0x1aa/0xca0
 vfs_fsync_range+0xfb/0x260
 do_fsync+0x48/0xa0
[...]
==================================================================

The above issue may happen as follows:

do_fsync
 vfs_fsync_range
  ext4_sync_file
   file_write_and_wait_range
    __filemap_fdatawrite_range
     do_writepages
      ext4_writepages
       mpage_map_and_submit_extent
        mpage_map_one_extent
         ext4_map_blocks
          ext4_mb_new_blocks
           ext4_mb_normalize_request
            >>> start + size <= ac->ac_o_ex.fe_logical
           ext4_mb_regular_allocator
            ext4_mb_simple_scan_group
             ext4_mb_use_best_found
              ext4_mb_new_preallocation
               ext4_mb_new_inode_pa
                ext4_mb_use_inode_pa
                 >>> set ac->ac_b_ex.fe_len <= 0
          ext4_mb_mark_diskspace_used
           >>> BUG_ON(ac->ac_b_ex.fe_len <= 0);

We can easily reproduce this problem with the following commands:

	fallocate -l100M disk
	mkfs.ext4 -b 1024 -g 256 disk
	mount disk /mnt
	fsstress -d /mnt -l 0 -n 1000 -p 1

The size must be smaller than or equal to EXT4_BLOCKS_PER_GROUP. Therefore, "start + size <= ac->ac_o_ex.fe_logical" may occur when the size is truncated. So start should be the start position of the group where ac_o_ex.fe_logical is located after alignment. In addition, when the value of fe_logical or EXT4_BLOCKS_PER_GROUP is very large, the value calculated by start_off is more accurate.

Fixes: cd648b8a ("ext4: trim allocation requests to group size")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Zhengchao Shao

hulk inclusion
category: bugfix
bugzilla: 186807, https://gitee.com/openeuler/kernel/issues/I5ATLD
CVE: NA

--------------------------------

When we clean up a namespace, we have to notify every netdevice that the device is down. If too many devices are registered, the notifications take too much CPU time and cause a CPU soft-lockup. The reproducer follows:

	NIFS=50
	for ((i=0; i<$NIFS; i++))
	do
		ip netns add dummy-ns$i
		ip netns exec dummy-ns$i ip link set lo up
	done

	for ((j=0; j<$NIFS; j++))
	do
		for ((i=0; i<1000; i++))
		do
			if=eth$j$i
			ip netns exec dummy-ns$j ip link add $if type dummy
			ip netns exec dummy-ns$j ip link set $if up
		done
	done

	for ((i=0; i<$NIFS; i++))
	do
		ip netns del dummy-ns$i
	done

The test results in the following stack, so the cleanup work must sleep for a while when notifying device down/change:

watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [kworker/u8:5:288]
Modules linked in:
CPU: 0 PID: 288 Comm: kworker/u8:5 Tainted: G B 5.10.0+ #5
Hardware name: linux,dummy-virt (DT)
Workqueue: netns cleanup_net
pstate: 20000005 (nzCv daif -PAN -UAO -TCO BTYPE=--)
pc : atomic_set include/asm-generic/atomic-instrumented.h:46 [inline]
pc : __alloc_skb+0x268/0x450 net/core/skbuff.c:241
lr : atomic_set include/asm-generic/atomic-instrumented.h:46 [inline]
lr : __alloc_skb+0x268/0x450 net/core/skbuff.c:241
sp : ffff000015607610
x29: ffff000015607610 x28: 00000000ffffffff
x27: 0000000000000001 x26: ffff0000cc9400e0
x25: ffff0000c745c1be x24: 1fffe00002ac0ed0
x23: 0000000000000000 x22: ffff0000cc9400c0
x21: ffff0000c745c234 x20: ffff0000cc940000
x19: ffff0000c745c140 x18: 0000000000000000
x17: 0000000000000000 x16: 0000000000000000
x15: 0000000000000000 x14: 1fffe00002ac0f00
x13: 0000000000000000 x12: ffff80001992801d
x11: 1fffe0001992801c x10: ffff80001992801c
x9 : dfffa00000000000 x8 : ffff0000cc9400e3
x7 : 0000000000000001 x6 : ffff80001992801c
x5 : ffff0000cc9400e0 x4 : dfffa00000000000
x3 : ffffa00011529a78 x2 : 0000000000000003
x1 : 0000000000000000 x0 : ffff0000cc9400e0
Call trace:
 atomic_set include/asm-generic/atomic-instrumented.h:46 [inline]
 __alloc_skb+0x268/0x450 net/core/skbuff.c:241
 alloc_skb include/linux/skbuff.h:1107 [inline]
 nlmsg_new include/net/netlink.h:958 [inline]
 rtmsg_ifa+0xf4/0x1e0 net/ipv4/devinet.c:1900
 __inet_del_ifa+0x328/0x650 net/ipv4/devinet.c:427
 inet_del_ifa net/ipv4/devinet.c:465 [inline]
 inetdev_destroy net/ipv4/devinet.c:318 [inline]
 inetdev_event+0x2ac/0xac0 net/ipv4/devinet.c:1599
 notifier_call_chain kernel/notifier.c:83 [inline]
 raw_notifier_call_chain+0x94/0xd0 kernel/notifier.c:410
 call_netdevice_notifiers_info+0x9c/0x14c net/core/dev.c:2047
 call_netdevice_notifiers_extack net/core/dev.c:2059 [inline]
 call_netdevice_notifiers net/core/dev.c:2073 [inline]
 rollback_registered_many+0x3d0/0x7dc net/core/dev.c:9558
 unregister_netdevice_many+0x40/0x1b0 net/core/dev.c:10779
 default_device_exit_batch+0x24c/0x2a0 net/core/dev.c:11262
 ops_exit_list+0xb4/0xd0 net/core/net_namespace.c:192
 cleanup_net+0x2b8/0x540 net/core/net_namespace.c:608
 process_one_work+0x3ec/0xa40 kernel/workqueue.c:2279
 worker_thread+0x110/0x8b0 kernel/workqueue.c:2425
 kthread+0x1ac/0x1fc kernel/kthread.c:313
 ret_from_fork+0x10/0x18 arch/arm64/kernel/entry.S:1034

Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
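A hedged sketch of the mitigation the message calls for: yielding the CPU periodically while notifying devices during namespace cleanup (illustrative placement; the actual patch may hook a different loop):

	list_for_each_entry(dev, head, unreg_list) {
		/* tell protocols this device is going down */
		call_netdevice_notifiers(NETDEV_DOWN, dev);
		/* with tens of thousands of netdevices, give the
		 * scheduler a chance between notifications so the
		 * cleanup kworker doesn't trip the soft-lockup watchdog */
		cond_resched();
	}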
-
Submitted by Coly Li

mainline inclusion from v5.19-rc1
commit a1a2d8f0
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I59A5L?from=project-issue
CVE: N/A

-----------------------------------------

The kworker routine update_writeback_rate() is scheduled to update the writeback rate every 5 seconds by default. Before calling __update_writeback_rate() to do the real job, the semaphore dc->writeback_lock should be held by the kworker routine.

At the same time, the bcache writeback thread routine bch_writeback_thread() also needs to hold dc->writeback_lock before flushing dirty data back into the backing device. If the dirty data set is large, it might take a very long time for bch_writeback_thread() to scan all dirty buckets and release dc->writeback_lock. In such a case, update_writeback_rate() can be starved for long enough that the kernel reports a soft lockup warning started like:

	watchdog: BUG: soft lockup - CPU#246 stuck for 23s! [kworker/246:31:179713]

Such a soft lockup condition is unnecessary, because after the writeback thread finishes its job and releases dc->writeback_lock, the kworker update_writeback_rate() may continue to work and everything is fine indeed.

This patch avoids the unnecessary soft lockup by the following method,
- Add a new member to struct cached_dev - dc->rate_update_retry (0 by default).
- In update_writeback_rate(), call down_read_trylock(&dc->writeback_lock) first; if it fails then lock contention happens.
- If dc->rate_update_retry <= BCH_WBRATE_UPDATE_MAX_SKIPS (15), don't acquire the lock and reschedule the kworker for the next try.
- If dc->rate_update_retry > BCH_WBRATE_UPDATE_MAX_SKIPS, don't retry anymore and call down_read(&dc->writeback_lock) to wait for the lock.

By the above method, in the worst case update_writeback_rate() may retry for 1+ minutes before blocking on dc->writeback_lock by calling down_read(). For a 4TB cache device with 1TB dirty data, 90%+ of the unnecessary soft lockup warning messages can be avoided.

When retrying to acquire dc->writeback_lock in update_writeback_rate(), of course the writeback rate cannot be updated. It is fair, because when the kworker is blocked on the lock contention of dc->writeback_lock, the writeback rate cannot be updated either.

This change follows Jens Axboe's suggestion for a clearer and simpler version.

Signed-off-by: Coly Li <colyli@suse.de>
Link: https://lore.kernel.org/r/20220528124550.32834-2-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
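A hedged sketch of the retry logic described in the list above (field and constant names per the commit text; the surrounding kworker boilerplate is abridged):

	static void update_writeback_rate(struct work_struct *work)
	{
		struct cached_dev *dc =
			container_of(to_delayed_work(work),
				     struct cached_dev,
				     writeback_rate_update);
		/* ... */
		if (!down_read_trylock(&dc->writeback_lock)) {
			dc->rate_update_retry++;
			if (dc->rate_update_retry <=
			    BCH_WBRATE_UPDATE_MAX_SKIPS) {
				/* contended: skip this round, try again */
				schedule_delayed_work(
					&dc->writeback_rate_update,
					dc->writeback_rate_update_seconds * HZ);
				return;
			}
			/* too many skips: block on the lock as before */
			down_read(&dc->writeback_lock);
			dc->rate_update_retry = 0;
		}
		/* __update_writeback_rate(dc);
		 * up_read(&dc->writeback_lock); ... */
	}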
-
Submitted by Jia-Ju Bai

mainline inclusion from v5.19-rc1
commit 40f567bb
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I59A5L?from=project-issue
CVE: N/A

------------------------------------------

The call to kzalloc() in detached_dev_do_request() can fail, so its return value should be checked.

Fixes: bc082a55 ("bcache: fix inaccurate io state for detached bcache devices")
Reported-by: TOTE Robot <oslab@tsinghua.edu.cn>
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Signed-off-by: Coly Li <colyli@suse.de>
Link: https://lore.kernel.org/r/20220527152818.27545-4-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
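A sketch of the missing-check fix in detached_dev_do_request() (error-path details hedged; the idea is to fail the bio rather than dereference a NULL pointer):

	ddip = kzalloc(sizeof(struct detached_dev_io_private), GFP_NOIO);
	if (!ddip) {
		bio->bi_status = BLK_STS_RESOURCE;
		bio->bi_end_io(bio);	/* complete the I/O with an error */
		return;
	}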
-
Submitted by Coly Li

mainline inclusion from v5.19-rc1
commit 7d6b902e
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I59A5L?from=project-issue
CVE: N/A

---------------------------------------

The local variables check_state (in bch_btree_check()) and state (in bch_sectors_dirty_init()) should be fully filled with 0, because before they were allocated on the stack, they were dynamically allocated with kzalloc().

Signed-off-by: Coly Li <colyli@suse.de>
Link: https://lore.kernel.org/r/20220527152818.27545-2-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
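A minimal sketch of the point: kzalloc() returned zeroed memory implicitly, while stack variables contain garbage unless zeroed explicitly.

	struct btree_check_state check_state;

	/* was: check_state = kzalloc(sizeof(*check_state), GFP_KERNEL);
	 * on the stack, the zeroing must be explicit: */
	memset(&check_state, 0, sizeof(struct btree_check_state));

	/* same for "struct bch_dirty_init_state state"
	 * in bch_sectors_dirty_init() */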
-
Submitted by Coly Li

mainline inclusion from v5.19-rc1
commit 32feee36
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I59A5L?from=project-issue
CVE: N/A

---------------------------------

The journal no-space deadlock was reported from time to time. Such a deadlock can happen in the following situation.

When all journal buckets are fully filled by active jsets under heavy write I/O load, the cache set registration (after a reboot) will load all active jsets and insert them into the btree again (which is called journal replay). If a journaled bkey is inserted into a btree node and results in a btree node split, a new journal request might be triggered. For example, if the btree grows one more level after the node split, the root node record in the cache device super block will be upgraded by bch_journal_meta() from bch_btree_set_root(). But there is no space in the journal buckets; the journal replay has to wait for a new journal bucket to be reclaimed after at least one journal bucket has been replayed. This is one example of how the journal no-space deadlock happens.

The solution to avoid the deadlock is to reserve 1 journal bucket at run time, and only permit the reserved journal bucket to be used during the cache set registration procedure for things like journal replay. Then the journal space will never be fully filled, and there is no chance for the journal no-space deadlock to happen anymore.

This patch adds a new member "bool do_reserve" to struct journal. It is initialized to 0 (false) when struct journal is allocated, and set to 1 (true) by bch_journal_space_reserve() when all initialization is done in run_cache_set(). At run time, when journal_reclaim() tries to allocate a new journal bucket, free_journal_buckets() is called to check whether there are enough free journal buckets to use. If there is only 1 free journal bucket and journal->do_reserve is 1 (true), the last bucket is reserved and free_journal_buckets() will return 0 to indicate no free journal bucket. Then journal_reclaim() will give up and try next time to see whether there is a free journal bucket to allocate. By this method, there is always 1 journal bucket reserved at run time.

During the cache set registration, journal->do_reserve is 0 (false), so the reserved journal bucket can be used to avoid the no-space deadlock.

Reported-by: Nikhil Kshirsagar <nkshirsagar@gmail.com>
Signed-off-by: Coly Li <colyli@suse.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20220524102336.10684-5-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
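A hedged sketch of the reservation check described above, close to the upstream helper (field names per the commit text; the ring-buffer index arithmetic is the usual cur_idx/discard_idx wraparound):

	static unsigned int free_journal_buckets(struct cache_set *c)
	{
		struct journal *j = &c->journal;
		struct cache *ca = c->cache;
		struct journal_device *ja = &ca->journal;
		unsigned int n;

		/* in case njournal_buckets is not a power of 2 */
		if (ja->cur_idx >= ja->discard_idx)
			n = ca->sb.njournal_buckets +
			    ja->discard_idx - ja->cur_idx;
		else
			n = ja->discard_idx - ja->cur_idx;

		/* hold one bucket back once do_reserve is set at run time;
		 * during registration (do_reserve == 0) it stays usable */
		if (n > (1 + j->do_reserve))
			return n - (1 + j->do_reserve);

		return 0;
	}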
-
Submitted by Coly Li

mainline inclusion from v5.19-rc1
commit 80db4e47
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I59A5L?from=project-issue
CVE: N/A

----------------------------------------

After making bch_sectors_dirty_init() multithreaded, the existing incremental dirty sector counting in bch_root_node_dirty_init() doesn't release btree occupation after iterating 500000 (INIT_KEYS_EACH_TIME) bkeys. Because a read lock is added on the btree root node to prevent the btree from being split during the dirty sectors counting, other I/O requesters have no chance to gain the write lock even after restarting bcache_btree(). That is to say, the incremental dirty sectors counting is incompatible with the multithreaded bch_sectors_dirty_init(). We have to choose one and drop the other.

In my testing, with 512-byte random writes, I generated 1.2T of dirty data and a btree with 400K nodes. With a single thread and incremental dirty sectors counting, it takes 30+ minutes to register the backing device. With multithreaded dirty sectors counting, the backing device registration can be accomplished within 2 minutes.

The 30+ minutes vs. 2 minutes difference made me decide to keep the multithreaded bch_sectors_dirty_init() and drop the incremental dirty sectors counting. This is what this patch does. But INIT_KEYS_EACH_TIME is kept; in sectors_dirty_init_fn() the CPU will be released by cond_resched() after every INIT_KEYS_EACH_TIME keys iterated. This is to avoid the watchdog reporting a bogus soft lockup warning.

Fixes: b144e45f ("bcache: make bch_sectors_dirty_init() to be multithreaded")
Signed-off-by: Coly Li <colyli@suse.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20220524102336.10684-4-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Coly Li

mainline inclusion from v5.19-rc1
commit 4dc34ae1
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I59A5L?from=project-issue
CVE: N/A

-------------------------------------------

Commit b144e45f ("bcache: make bch_sectors_dirty_init() to be multithreaded") makes bch_sectors_dirty_init() much faster when counting dirty sectors by iterating all dirty keys in the btree. But it isn't in ideal shape yet; it can still be improved.

This patch does the following changes to improve the current parallel dirty keys iteration on the btree,
- Add a read lock to the root node when multiple threads iterate the btree, to prevent the root node from getting split by I/Os from other registered bcache devices.
- Remove the local variable "char name[32]" and generate the kernel thread name string directly when calling kthread_run().
- Allocate "struct bch_dirty_init_state state" directly on the stack and avoid the unnecessary dynamic memory allocation for it.
- Decrease BCH_DIRTY_INIT_THRD_MAX from 64 to 12, which is enough indeed.
- Increase &state->started to count a created kernel thread only after it has been created successfully.
- When waiting for all dirty key counting threads to finish, use wait_event() to replace wait_event_interruptible().

With the above changes, the code is clearer, and some potential error conditions are avoided.

Fixes: b144e45f ("bcache: make bch_sectors_dirty_init() to be multithreaded")
Signed-off-by: Coly Li <colyli@suse.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20220524102336.10684-3-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Coly Li

mainline inclusion from v5.19-rc1
commit 62253644
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I59A5L?from=project-issue
CVE: N/A

----------------------------------------

Commit 8e710227 ("bcache: make bch_btree_check() to be multithreaded") makes bch_btree_check() much faster when checking all btree nodes during cache device registration. But it isn't in ideal shape yet; it can still be improved.

This patch does the following things to improve the current parallel btree node check by multiple threads in bch_btree_check(),
- Add a read lock to the root node while checking all the btree nodes with multiple threads. Although currently it is not mandatory, it is good to have a read lock in the code logic.
- Remove the local variable 'char name[32]', and generate the kernel thread name string directly when calling kthread_run().
- Allocate the local variable "struct btree_check_state check_state" on the stack and avoid unnecessary dynamic memory allocation for it.
- Reduce BCH_BTR_CHKTHREAD_MAX from 64 to 12, which is enough indeed.
- Increase check_state->started to count a created kernel thread only after it has been created successfully.
- When waiting for all checking kernel threads to finish, use wait_event() to replace wait_event_interruptible().

With this change, the code is clearer, and some potential error conditions are avoided.

Fixes: 8e710227 ("bcache: make bch_btree_check() to be multithreaded")
Signed-off-by: Coly Li <colyli@suse.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20220524102336.10684-2-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Mingzhe Zou

mainline inclusion from v5.18-rc1
commit 887554ab
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I59A5L?from=project-issue
CVE: N/A

----------------------------------------

When multiple threads check btree nodes in parallel, the main thread waits for all threads to stop or for the CACHE_SET_IO_DISABLE flag:

	wait_event_interruptible(check_state->wait,
				 atomic_read(&check_state->started) == 0 ||
				 test_bit(CACHE_SET_IO_DISABLE, &c->flags));

However, bch_btree_node_read and bch_btree_node_read_done may call bch_cache_set_error, and then CACHE_SET_IO_DISABLE will be set. If the flag is already set, the main thread returns an error while some threads may still be running and read a NULL pointer, and the kernel will crash.

This patch changes the event wait condition: the main thread must wait for all threads to stop.

Fixes: 8e710227 ("bcache: make bch_btree_check() to be multithreaded")
Signed-off-by: Mingzhe Zou <mingzhe.zou@easystack.cn>
Cc: stable@vger.kernel.org # v5.7+
Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
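A hedged before/after sketch of the condition change (the exact wait primitive in the tree may differ; the point is dropping the early-exit flag test):

	/* before: could return while checker threads were still running */
	wait_event_interruptible(check_state->wait,
				 atomic_read(&check_state->started) == 0 ||
				 test_bit(CACHE_SET_IO_DISABLE, &c->flags));

	/* after: always wait until every checker thread has stopped */
	wait_event_interruptible(check_state->wait,
				 atomic_read(&check_state->started) == 0);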
-
Submitted by Mingzhe Zou

mainline inclusion from v5.18-rc1
commit 7b1002f7
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I59A5L?from=project-issue
CVE: N/A

--------------------------------------

When attaching a cached device (a.k.a. backing device) to a cache device, bch_sectors_dirty_init() is called to count dirty sectors and stripes (see what bcache_dev_sectors_dirty_add() does) on the cache device.

When bcache_dev_sectors_dirty_add() is called, a set_bit(stripe, d->full_dirty_stripes) or clear_bit(stripe, d->full_dirty_stripes) operation will always be performed. In full_dirty_stripes, each bit represents stripe_size (8192) sectors (512B), so 1 bit = 4MB (8192 * 512), and each CPU cache line = 64B = 512 bits, covering 2048MB of data.

When 20 threads process a cached disk with 100G of dirty data, a single thread processes about 23M at a time, and 20 threads total 460M. The full_dirty_stripes bits corresponding to this 460M of data are likely to fall in the same CPU cache line. When one of these threads performs a set_bit or clear_bit operation, the same CPU cache line of the other threads becomes invalid and must be read from main memory again. Compared with a single thread, the time of a bcache_dev_sectors_dirty_add() call is increased by about 50 times in our test (100G dirty data, 20 threads, bcache_dev_sectors_dirty_add() called more than 20 million times).

This patch tries test_bit before the set_bit or clear_bit operation. Therefore, a lot of forced set and clear operations are avoided, and most bcache_dev_sectors_dirty_add() calls will only read the CPU cache line.

Signed-off-by: Mingzhe Zou <mingzhe.zou@easystack.cn>
Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
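A sketch of the test-before-write pattern from the commit: only dirty the shared cache line when the bit actually needs to change (surrounding stripe bookkeeping abridged).

	if (sectors_dirty == d->stripe_size) {
		/* read-only fast path: skip set_bit if already set */
		if (!test_bit(stripe, d->full_dirty_stripes))
			set_bit(stripe, d->full_dirty_stripes);
	} else {
		/* likewise, skip clear_bit if already clear */
		if (test_bit(stripe, d->full_dirty_stripes))
			clear_bit(stripe, d->full_dirty_stripes);
	}

Unconditional set_bit/clear_bit are atomic read-modify-write operations that invalidate the cache line on every other CPU even when the bit value doesn't change; the extra test_bit turns the common case into a plain shared read.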
-