提交 · a2f889939449193efd37454e6ed43d2ba9a46044 · openeuler / Kernel

08 12月, 2022 2 次提交

signal handling: don't use BUG_ON() for debugging · a2f88993

由 Xia Fukun 提交于 12月 08, 2022

stable inclusion
from stable-v4.19.267
commit 93d9cef55f8fe463e3b9f6c73c7a32619222c657
category: bugfix
bugzilla: 187828, https://gitee.com/openeuler/kernel/issues/I63UEU
CVE: NA

--------------------------------

[ Upstream commit a382f8fe ]

These are indeed "should not happen" situations, but it turns out recent
changes made the 'task_is_stopped_or_trace()' case trigger (fix for that
exists, is pending more testing), and the BUG_ON() makes it
unnecessarily hard to actually debug for no good reason.

It's been that way for a long time, but let's make it clear: BUG_ON() is
not good for debugging, and should never be used in situations where you
could just say "this shouldn't happen, but we can continue".

Use WARN_ON_ONCE() instead to make sure it gets logged, and then just
continue running.  Instead of making the system basically unusuable
because you crashed the machine while potentially holding some very core
locks (eg this function is commonly called while holding 'tasklist_lock'
for writing).
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NXia Fukun <xiafukun@huawei.com>
Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>

a2f88993

ida: don't use BUG_ON() for debugging · eaab6483

由 Xia Fukun 提交于 12月 08, 2022

stable inclusion
from stable-v4.19.267
commit 33d2f83e3f2c1fdabb365d25bed3aa630041cbc0
category: bugfix
bugzilla: 188002, https://gitee.com/openeuler/kernel/issues/I63UEU
CVE: NA

--------------------------------

commit fc82bbf4 upstream.

This is another old BUG_ON() that just shouldn't exist (see also commit
a382f8fe: "signal handling: don't use BUG_ON() for debugging").

In fact, as Matthew Wilcox points out, this condition shouldn't really
even result in a warning, since a negative id allocation result is just
a normal allocation failure:

  "I wonder if we should even warn here -- sure, the caller is trying to
   free something that wasn't allocated, but we don't warn for
   kfree(NULL)"

and goes on to point out how that current error check is only causing
people to unnecessarily do their own index range checking before freeing
it.

This was noted by Itay Iellin, because the bluetooth HCI socket cookie
code does *not* do that range checking, and ends up just freeing the
error case too, triggering the BUG_ON().

The HCI code requires CAP_NET_RAW, and seems to just result in an ugly
splat, but there really is no reason to BUG_ON() here, and we have
generally striven for allocation models where it's always ok to just do

    free(alloc());

even if the allocation were to fail for some random reason (usually
obviously that "random" reason being some resource limit).

Fixes: 88eca020 ("ida: simplified functions for id allocation")
Reported-by: NItay Iellin <ieitayie@gmail.com>
Suggested-by: NMatthew Wilcox <willy@infradead.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NXia Fukun <xiafukun@huawei.com>
Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>

eaab6483

06 12月, 2022 1 次提交

!272 [openEuler-1.0-LTS] Add MWAIT Cx support for Zhaoxin CPUs. · 75ea48ac

由 openeuler-ci-bot 提交于 12月 06, 2022

Merge Pull Request from: @leoliu-oc 
 
When the processor is idle，low-power idle states (C-states) can be used to save power. For Zhaoxin processors，there are two methods to enter idle states. One is HLT instruction and legacy method of I/O reads from the CPI-defined register (known as P_LVLx)，the other one is MWAIT instruction with idle states hints.

Default for legacy operating system，HLT and P_LVLx I/O reads are used for Zhaoxin Processors to enter idle states, but we have checked on some Zhaoxin platform that MWAIT instruction is more efficient than P_LVLx I/O reads and HLT, so we add MWAIT Cx support for Zhaoxin Processors.

### Issue
https://gitee.com/openeuler/kernel/issues/I62TOM

### Test
N/A

### Known Issue
N/A

### Default config change
N/A

 
 
Link:https://gitee.com/openeuler/kernel/pulls/272 
Reviewed-by: Laibin Qiu <qiulaibin@huawei.com> 
Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>

75ea48ac

05 12月, 2022 3 次提交

Bluetooth: L2CAP: Fix u8 overflow · dde9a0f9

由 Sungwoo Kim 提交于 12月 05, 2022

maillist inclusion
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I63D3E
CVE: CVE-2022-45934

--------------------------------

By keep sending L2CAP_CONF_REQ packets, chan->num_conf_rsp increases
multiple times and eventually it will wrap around the maximum number
(i.e., 255).
This patch prevents this by adding a boundary check with
L2CAP_MAX_CONF_RSP

Btmon log:
Bluetooth monitor ver 5.64
= Note: Linux version 6.1.0-rc2 (x86_64)                               0.264594
= Note: Bluetooth subsystem version 2.22                               0.264636
@ MGMT Open: btmon (privileged) version 1.22                  {0x0001} 0.272191
= New Index: 00:00:00:00:00:00 (Primary,Virtual,hci0)          [hci0] 13.877604
@ RAW Open: 9496 (privileged) version 2.22                   {0x0002} 13.890741
= Open Index: 00:00:00:00:00:00                                [hci0] 13.900426
(...)
> ACL Data RX: Handle 200 flags 0x00 dlen 1033             #32 [hci0] 14.273106
        invalid packet size (12 != 1033)
        08 00 01 00 02 01 04 00 01 10 ff ff              ............
> ACL Data RX: Handle 200 flags 0x00 dlen 1547             #33 [hci0] 14.273561
        invalid packet size (14 != 1547)
        0a 00 01 00 04 01 06 00 40 00 00 00 00 00        ........@.....
> ACL Data RX: Handle 200 flags 0x00 dlen 2061             #34 [hci0] 14.274390
        invalid packet size (16 != 2061)
        0c 00 01 00 04 01 08 00 40 00 00 00 00 00 00 04  ........@.......
> ACL Data RX: Handle 200 flags 0x00 dlen 2061             #35 [hci0] 14.274932
        invalid packet size (16 != 2061)
        0c 00 01 00 04 01 08 00 40 00 00 00 07 00 03 00  ........@.......
= bluetoothd: Bluetooth daemon 5.43                                   14.401828
> ACL Data RX: Handle 200 flags 0x00 dlen 1033             #36 [hci0] 14.275753
        invalid packet size (12 != 1033)
        08 00 01 00 04 01 04 00 40 00 00 00              ........@...
Signed-off-by: NSungwoo Kim <iam@sung-woo.kim>
Signed-off-by: NLuiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: NBaisong Zhong <zhongbaisong@huawei.com>
Reviewed-by: NLiu Jian <liujian56@huawei.com>
Reviewed-by: NYue Haibing <yuehaibing@huawei.com>
Reviewed-by: NXiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>

dde9a0f9

l2tp: Don't sleep and disable BH under writer-side sk_callback_lock · 08d3ac9e

由 Jakub Sitnicki 提交于 12月 05, 2022

mainline inclusion
from mainline-v6.1-rc7
commit af295e85
category: bugfix
bugzilla: 188056, https://gitee.com/src-openeuler/kernel/issues/I62RNU
CVE: CVE-2022-4129

--------------------------------

When holding a reader-writer spin lock we cannot sleep. Calling
setup_udp_tunnel_sock() with write lock held violates this rule, because we
end up calling percpu_down_read(), which might sleep, as syzbot reports
[1]:

 __might_resched.cold+0x222/0x26b kernel/sched/core.c:9890
 percpu_down_read include/linux/percpu-rwsem.h:49 [inline]
 cpus_read_lock+0x1b/0x140 kernel/cpu.c:310
 static_key_slow_inc+0x12/0x20 kernel/jump_label.c:158
 udp_tunnel_encap_enable include/net/udp_tunnel.h:187 [inline]
 setup_udp_tunnel_sock+0x43d/0x550 net/ipv4/udp_tunnel_core.c:81
 l2tp_tunnel_register+0xc51/0x1210 net/l2tp/l2tp_core.c:1509
 pppol2tp_connect+0xcdc/0x1a10 net/l2tp/l2tp_ppp.c:723

Trim the writer-side critical section for sk_callback_lock down to the
minimum, so that it covers only operations on sk_user_data.

Also, when grabbing the sk_callback_lock, we always need to disable BH, as
Eric points out. Failing to do so leads to deadlocks because we acquire
sk_callback_lock in softirq context, which can get stuck waiting on us if:

1) it runs on the same CPU, or

       CPU0
       ----
  lock(clock-AF_INET6);
  <Interrupt>
    lock(clock-AF_INET6);

2) lock ordering leads to priority inversion

       CPU0                    CPU1
       ----                    ----
  lock(clock-AF_INET6);
                               local_irq_disable();
                               lock(&tcp_hashinfo.bhash[i].lock);
                               lock(clock-AF_INET6);
  <Interrupt>
    lock(&tcp_hashinfo.bhash[i].lock);

... as syzbot reports [2,3]. Use the _bh variants for write_(un)lock.

[1] https://lore.kernel.org/netdev/0000000000004e78ec05eda79749@google.com/
[2] https://lore.kernel.org/netdev/000000000000e38b6605eda76f98@google.com/
[3] https://lore.kernel.org/netdev/000000000000dfa31e05eda76f75@google.com/

v2:
- Check and set sk_user_data while holding sk_callback_lock for both
  L2TP encapsulation types (IP and UDP) (Tetsuo)

Cc: Tom Parkin <tparkin@katalix.com>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Fixes: b68777d5 ("l2tp: Serialize access to sk_user_data with sk_callback_lock")
Reported-by: NEric Dumazet <edumazet@google.com>
Reported-by: syzbot+703d9e154b3b58277261@syzkaller.appspotmail.com
Reported-by: syzbot+50680ced9e98a61f7698@syzkaller.appspotmail.com
Reported-by: syzbot+de987172bb74a381879b@syzkaller.appspotmail.com
Signed-off-by: NJakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NLu Wei <luwei32@huawei.com>
Reviewed-by: NLiu Jian <liujian56@huawei.com>
Reviewed-by: NYue Haibing <yuehaibing@huawei.com>
Reviewed-by: NYue Haibing <yuehaibing@huawei.com>
Reviewed-by: NWang Weiyang <wangweiyang2@huawei.com>
Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>

08d3ac9e

l2tp: Serialize access to sk_user_data with sk_callback_lock · 9e00bd66

由 Jakub Sitnicki 提交于 12月 05, 2022

mainline inclusion
from mainline-v6.1-rc6
commit b68777d5
category: bugfix
bugzilla: 188056, https://gitee.com/src-openeuler/kernel/issues/I62RNU
CVE: CVE-2022-4129

--------------------------------

sk->sk_user_data has multiple users, which are not compatible with each
other. Writers must synchronize by grabbing the sk->sk_callback_lock.

l2tp currently fails to grab the lock when modifying the underlying tunnel
socket fields. Fix it by adding appropriate locking.

We err on the side of safety and grab the sk_callback_lock also inside the
sk_destruct callback overridden by l2tp, even though there should be no
refs allowing access to the sock at the time when sk_destruct gets called.

v4:
- serialize write to sk_user_data in l2tp sk_destruct

v3:
- switch from sock lock to sk_callback_lock
- document write-protection for sk_user_data

v2:
- update Fixes to point to origin of the bug
- use real names in Reported/Tested-by tags

Cc: Tom Parkin <tparkin@katalix.com>
Fixes: 3557baab ("[L2TP]: PPP over L2TP driver core")
Reported-by: NHaowei Yan <g1042620637@gmail.com>
Signed-off-by: NJakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

conflict:
	net/l2tp/l2tp_core.c
Signed-off-by: NLu Wei <luwei32@huawei.com>
Reviewed-by: NLiu Jian <liujian56@huawei.com>
Reviewed-by: NYue Haibing <yuehaibing@huawei.com>
Reviewed-by: NWang Weiyang <wangweiyang2@huawei.com>
Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>

9e00bd66

29 11月, 2022 5 次提交

!288 Add support for ConnectX6 Lx and ConnectX6Dx with openEuler inbox driver · 150b3f83

由 openeuler-ci-bot 提交于 11月 29, 2022

Merge Pull Request from: @meitingli 
 
Add the upcoming ConnectX-6 Dx, ConnectX-6 LX device ID.
In addition, add "ConnectX Family mlx5Gen Virtual Function" device ID. 
Every new HCA VF will be identified with this device ID. 
Different VF models will be distinguished by their revision id.
 
 
Link:https://gitee.com/openeuler/kernel/pulls/288 
Reviewed-by: Laibin Qiu <qiulaibin@huawei.com> 
Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>

150b3f83

net/mlx5: Update the list of the PCI supported devices · a42e2dc5

由 Shani Shapp 提交于 11月 12, 2019

mainline inclusion
from mainline-v5.4
commit b7eca940
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I63MQP
CVE: NA

----------------------------------------------------------------------

Add the upcoming ConnectX-6 LX device ID.

Fixes: 85327a9c ("net/mlx5: Update the list of the PCI supported devices")
Signed-off-by: NShani Shapp <shanish@mellanox.com>
Reviewed-by: NEran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
Signed-off-by: NMeiting Li <limeiting1@huawei.com>

a42e2dc5

net/mlx5: Update the list of the PCI supported devices · 4975dd57

由 Eran Ben Elisha 提交于 1月 27, 2019

mainline inclusion
from mainline-v5.1-rc1
commit 85327a9c
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I63MQP
CVE: NA

----------------------------------------------------------------------

Add the upcoming ConnectX-6 Dx.

In addition, add "ConnectX Family mlx5Gen Virtual Function" device ID.
Every new HCA VF will be identified with this device ID. Different VF
models will be distinguished by their revision id.
Signed-off-by: NEran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: NAya Levin <ayal@mellanox.com>
Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
Signed-off-by: NMeiting Li <limeiting1@huawei.com>

4975dd57

drivers: net: slip: fix NPD bug in sl_tx_timeout() · 599915d8

由 Duoming Zhou 提交于 11月 29, 2022

stable inclusion
from stable-4.19.239
commit 753b9d220a7d36dac70e7c6d05492d10d6f9dd36
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I62KQZ
CVE: CVE-2022-41858

--------------------------------

[ Upstream commit ec4eb8a8 ]

When a slip driver is detaching, the slip_close() will act to
cleanup necessary resources and sl->tty is set to NULL in
slip_close(). Meanwhile, the packet we transmit is blocked,
sl_tx_timeout() will be called. Although slip_close() and
sl_tx_timeout() use sl->lock to synchronize, we don`t judge
whether sl->tty equals to NULL in sl_tx_timeout() and the
null pointer dereference bug will happen.

   (Thread 1)                 |      (Thread 2)
                              | slip_close()
                              |   spin_lock_bh(&sl->lock)
                              |   ...
...                           |   sl->tty = NULL //(1)
sl_tx_timeout()               |   spin_unlock_bh(&sl->lock)
  spin_lock(&sl->lock);       |
  ...                         |   ...
  tty_chars_in_buffer(sl->tty)|
    if (tty->ops->..) //(2)   |
    ...                       |   synchronize_rcu()

We set NULL to sl->tty in position (1) and dereference sl->tty
in position (2).

This patch adds check in sl_tx_timeout(). If sl->tty equals to
NULL, sl_tx_timeout() will goto out.
Signed-off-by: NDuoming Zhou <duoming@zju.edu.cn>
Reviewed-by: NJiri Slaby <jirislaby@kernel.org>
Link: https://lore.kernel.org/r/20220405132206.55291-1-duoming@zju.edu.cnSigned-off-by: NJakub Kicinski <kuba@kernel.org>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NLiu Jian <liujian56@huawei.com>
Reviewed-by: NYue Haibing <yuehaibing@huawei.com>
Reviewed-by: NXiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>

599915d8

staging: rtl8712: fix use after free bugs · 3e0deabf

由 Dan Carpenter 提交于 11月 29, 2022

stable inclusion
from stable-v5.10.142
commit 19e3f69d19801940abc2ac37c169882769ed9770
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I62AX2
CVE: CVE-2022-4095

--------------------------------

_Read/Write_MACREG callbacks are NULL so the read/write_macreg_hdl()
functions don't do anything except free the "pcmd" pointer.  It
results in a use after free.  Delete them.

Fixes: 2865d42c ("staging: r8712u: Add the new driver to the mainline kernel")
Cc: stable <stable@kernel.org>
Reported-by: NZheng Wang <hackerzheng666@gmail.com>
Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/r/Yw4ASqkYcUhUfoY2@kiliSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NGuan Jing <guanjing6@huawei.com>
Reviewed-by: NZhang Qiao <zhangqiao22@huawei.com>
Reviewed-by: NChen Hui <judy.chenhui@huawei.com>
Reviewed-by: NXiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>

3e0deabf

27 11月, 2022 1 次提交

x86/tsc: use topology_max_packages() in tsc watchdog check · 4f283abb

由 Feng Tang 提交于 11月 27, 2022

hulk inclusion
category: bugfix
bugzilla: 187942, https://gitee.com/openeuler/kernel/issues/I5U037
CVE: NA

-------------------------------

Commit b50db709 ("x86/tsc: Disable clocksource watchdog for TSC
on qualified platorms") was introduced to solve problem that
sometimes TSC clocksource is wrongly judged as unstable by watchdog
like 'jiffies', HPET, etc.

In it, the hardware socket number is a key factor for judging
whether to disable the watchdog for TSC, and 'nr_online_nodes' was
chosen as an estimation due to it is needed in early boot phase
before registering 'tsc-early' clocksource, where all none-boot
CPUs are not brought up yet.

In recent patch review, Dave Hansen pointed out there are many
cases that 'nr_online_nodes' could have issue, like:
* numa emulation (numa=fake=4 etc.)
* numa=off
* platforms with CPU+DRAM nodes, CPU-less HBM nodes, CPU-less
  persistent memory nodes.

Peter Zijlstra suggested to use logical package ids, but it is
only usable after smp_init() and all CPUs are initialized.

One solution is to skip the watchdog for 'tsc-early' clocksource,
and move the check after smp_init(), while before 'tsc'
clocksoure is registered, where topology_max_packages() could
be used as a much more accurate socket number.
Signed-off-by: NFeng Tang <feng.tang@intel.com>

Conflict:
	arch/x86/kernel/tsc.c
Signed-off-by: NYu Liao <liaoyu15@huawei.com>
Reviewed-by: NXiongfeng Wang <wangxiongfeng2@huawei.com>
Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>

4f283abb

26 11月, 2022 2 次提交

scsi: hisi_sas: Set iptt aborted flag when receiving an abnormal CQ · 4cccc16a

由 Xingui Yang 提交于 11月 26, 2022

driver inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I62ZXO
CVE: NA

------------------------------------------------

During the write I/O, when the SAS PHY switch is tested, the hardware
may reports two CQs for one IO. the first cq indicates invalid port when
DPH scheduling, the second cq indicates that response frame has been
written to the memory but the I/O is ended abnormally due to I/O data
underload. So set iptt aborted flag when receiving an abnormal CQ, then the
host will discards the IPTT frame received from the SAS hard disk.
Signed-off-by: NXingui Yang <yangxingui@huawei.com>
Reviewed-by: Nkang fenglong <kangfenglong@huawei.com>
Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>

4cccc16a

ext4: fix bug in extents parsing when eh_entries == 0 and eh_depth > 0 · bc9ebdce

由 Luís Henriques 提交于 11月 26, 2022

mainline inclusion
from mainline-v6.0-rc7
commit 29a5b8a1
category: bugfix
bugzilla: 187444, https://gitee.com/openeuler/kernel/issues/I6261Z
CVE: NA

--------------------------------

When walking through an inode extents, the ext4_ext_binsearch_idx() function
assumes that the extent header has been previously validated.  However, there
are no checks that verify that the number of entries (eh->eh_entries) is
non-zero when depth is > 0.  And this will lead to problems because the
EXT_FIRST_INDEX() and EXT_LAST_INDEX() will return garbage and result in this:

[  135.245946] ------------[ cut here ]------------
[  135.247579] kernel BUG at fs/ext4/extents.c:2258!
[  135.249045] invalid opcode: 0000 [#1] PREEMPT SMP
[  135.250320] CPU: 2 PID: 238 Comm: tmp118 Not tainted 5.19.0-rc8+ #4
[  135.252067] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b-rebuilt.opensuse.org 04/01/2014
[  135.255065] RIP: 0010:ext4_ext_map_blocks+0xc20/0xcb0
[  135.256475] Code:
[  135.261433] RSP: 0018:ffffc900005939f8 EFLAGS: 00010246
[  135.262847] RAX: 0000000000000024 RBX: ffffc90000593b70 RCX: 0000000000000023
[  135.264765] RDX: ffff8880038e5f10 RSI: 0000000000000003 RDI: ffff8880046e922c
[  135.266670] RBP: ffff8880046e9348 R08: 0000000000000001 R09: ffff888002ca580c
[  135.268576] R10: 0000000000002602 R11: 0000000000000000 R12: 0000000000000024
[  135.270477] R13: 0000000000000000 R14: 0000000000000024 R15: 0000000000000000
[  135.272394] FS:  00007fdabdc56740(0000) GS:ffff88807dd00000(0000) knlGS:0000000000000000
[  135.274510] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  135.276075] CR2: 00007ffc26bd4f00 CR3: 0000000006261004 CR4: 0000000000170ea0
[  135.277952] Call Trace:
[  135.278635]  <TASK>
[  135.279247]  ? preempt_count_add+0x6d/0xa0
[  135.280358]  ? percpu_counter_add_batch+0x55/0xb0
[  135.281612]  ? _raw_read_unlock+0x18/0x30
[  135.282704]  ext4_map_blocks+0x294/0x5a0
[  135.283745]  ? xa_load+0x6f/0xa0
[  135.284562]  ext4_mpage_readpages+0x3d6/0x770
[  135.285646]  read_pages+0x67/0x1d0
[  135.286492]  ? folio_add_lru+0x51/0x80
[  135.287441]  page_cache_ra_unbounded+0x124/0x170
[  135.288510]  filemap_get_pages+0x23d/0x5a0
[  135.289457]  ? path_openat+0xa72/0xdd0
[  135.290332]  filemap_read+0xbf/0x300
[  135.291158]  ? _raw_spin_lock_irqsave+0x17/0x40
[  135.292192]  new_sync_read+0x103/0x170
[  135.293014]  vfs_read+0x15d/0x180
[  135.293745]  ksys_read+0xa1/0xe0
[  135.294461]  do_syscall_64+0x3c/0x80
[  135.295284]  entry_SYSCALL_64_after_hwframe+0x46/0xb0

This patch simply adds an extra check in __ext4_ext_check(), verifying that
eh_entries is not 0 when eh_depth is > 0.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=215941
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216283
Cc: Baokun Li <libaokun1@huawei.com>
Cc: stable@kernel.org
Signed-off-by: NLuís Henriques <lhenriques@suse.de>
Reviewed-by: NJan Kara <jack@suse.cz>
Reviewed-by: NBaokun Li <libaokun1@huawei.com>
Link: https://lore.kernel.org/r/20220822094235.2690-1-lhenriques@suse.deSigned-off-by: NTheodore Ts'o <tytso@mit.edu>
Signed-off-by: NBaokun Li <libaokun1@huawei.com>
Reviewed-by: NZhang Yi <yi.zhang@huawei.com>
Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>

bc9ebdce

24 11月, 2022 1 次提交

Add MWAIT Cx support for Zhaoxin CPUs. · e1b6487f

由 leoliu 提交于 9月 28, 2022

zhaoxin inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I62TOM
CVE: NA

----------------------------------------------------------------

When the processor is idle，low-power idle states (C-states) can be used
to save power. For Zhaoxin processors，there are two methods to enter idle
states. One is HLT instruction and legacy method of I/O reads from the
ACPI-defined register (known as P_LVLx)，the other one is MWAIT
instruction with idle states hints.

Default for legacy operating system，HLT and P_LVLx I/O reads are used for
Zhaoxin Processors to enter idle states, but we have checked on some
Zhaoxin platform that MWAIT instruction is more efficient than P_LVLx I/O
reads and HLT, so we add MWAIT Cx support for Zhaoxin Processors.
Signed-off-by: Nleoliu <leoliu@zhaoxin.com>

e1b6487f

21 11月, 2022 1 次提交

svm: Delete unused ioctl command · ace4dc02

由 Wang Wensheng 提交于 11月 21, 2022

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I61RA3
CVE: NA

-------------------------------

The following two ioctl commands are not in used at all. Delete those
implementation.

SVM_IOCTL_SET_RC
SVM_IOCTL_REMAP_PROC
Signed-off-by: NWang Wensheng <wangwensheng4@huawei.com>
Reviewed-by: Nchenweilong <chenweilong@huawei.com>
Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>

ace4dc02

19 11月, 2022 4 次提交

Revert "posix-cpu-timers: Make timespec to nsec conversion safe" · 73149b1f

由 Yuyao Lin 提交于 11月 19, 2022

Offering: HULK
hulk inclusion
category: bugfix
bugzilla: 186339

--------------------------------

This reverts commit 098b0e01.

Function timespec64_to_ns() Add the upper and lower limits check in
commit c0b52741 ("time: Prevent undefined behaviour in timespec64_to_ns()"),
timespec64_to_ktime() only check the upper limits, so revert this
patch can fix overflow.
Signed-off-by: NYuyao Lin <linyuyao1@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>
Signed-off-by: NLaibin Qiu <qiulaibin@huawei.com>

73149b1f

block: limit request dispatch loop duration · 99aa811a

由 Shin'ichiro Kawasaki 提交于 11月 19, 2022

mainline inclusion
from mainline-v5.18-rc1
commit 572299f0
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I61Z7G
CVE: NA

--------------------------------

When IO requests are made continuously and the target block device
handles requests faster than request arrival, the request dispatch loop
keeps on repeating to dispatch the arriving requests very long time,
more than a minute. Since the loop runs as a workqueue worker task, the
very long loop duration triggers workqueue watchdog timeout and BUG [1].

To avoid the very long loop duration, break the loop periodically. When
opportunity to dispatch requests still exists, check need_resched(). If
need_resched() returns true, the dispatch loop already consumed its time
slice, then reschedule the dispatch work and break the loop. With heavy
IO load, need_resched() does not return true for 20~30 seconds. To cover
such case, check time spent in the dispatch loop with jiffies. If more
than 1 second is spent, reschedule the dispatch work and break the loop.

[1]

[  609.691437] BUG: workqueue lockup - pool cpus=10 node=1 flags=0x0 nice=-20 stuck for 35s!
[  609.701820] Showing busy workqueues and worker pools:
[  609.707915] workqueue events: flags=0x0
[  609.712615]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
[  609.712626]     pending: drm_fb_helper_damage_work [drm_kms_helper]
[  609.712687] workqueue events_freezable: flags=0x4
[  609.732943]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
[  609.732952]     pending: pci_pme_list_scan
[  609.732968] workqueue events_power_efficient: flags=0x80
[  609.751947]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
[  609.751955]     pending: neigh_managed_work
[  609.752018] workqueue kblockd: flags=0x18
[  609.769480]   pwq 21: cpus=10 node=1 flags=0x0 nice=-20 active=3/256 refcnt=4
[  609.769488]     in-flight: 1020:blk_mq_run_work_fn
[  609.769498]     pending: blk_mq_timeout_work, blk_mq_run_work_fn
[  609.769744] pool 21: cpus=10 node=1 flags=0x0 nice=-20 hung=35s workers=2 idle: 67
[  639.899730] BUG: workqueue lockup - pool cpus=10 node=1 flags=0x0 nice=-20 stuck for 66s!
[  639.909513] Showing busy workqueues and worker pools:
[  639.915404] workqueue events: flags=0x0
[  639.920197]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
[  639.920215]     pending: drm_fb_helper_damage_work [drm_kms_helper]
[  639.920365] workqueue kblockd: flags=0x18
[  639.939932]   pwq 21: cpus=10 node=1 flags=0x0 nice=-20 active=3/256 refcnt=4
[  639.939942]     in-flight: 1020:blk_mq_run_work_fn
[  639.939955]     pending: blk_mq_timeout_work, blk_mq_run_work_fn
[  639.940212] pool 21: cpus=10 node=1 flags=0x0 nice=-20 hung=66s workers=2 idle: 67

Fixes: 6e6fcbc2 ("blk-mq: support batching dispatch in case of io")
Signed-off-by: NShin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Cc: stable@vger.kernel.org # v5.10+
Link: https://lore.kernel.org/linux-block/20220310091649.zypaem5lkyfadymg@shindev/
Link: https://lore.kernel.org/r/20220318022641.133484-1-shinichiro.kawasaki@wdc.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NYu Kuai <yukuai3@huawei.com>
Reviewed-by: NJason Yan <yanaijie@huawei.com>
Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>

99aa811a

Bluetooth: L2CAP: Fix accepting connection request for invalid SPSM · 814dc650

由 Luiz Augusto von Dentz 提交于 11月 19, 2022

mainline inclusion
from mainline-v6.1-rc4
commit 711f8c3f
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5ZNPH
CVE: CVE-2022-42896

--------------------------------

The Bluetooth spec states that the valid range for SPSM is from
0x0001-0x00ff so it is invalid to accept values outside of this range:

  BLUETOOTH CORE SPECIFICATION Version 5.3 | Vol 3, Part A
  page 1059:
  Table 4.15: L2CAP_LE_CREDIT_BASED_CONNECTION_REQ SPSM ranges

CVE: CVE-2022-42896
CC: stable@vger.kernel.org
Reported-by: NTamás Koczka <poprdi@google.com>
Signed-off-by: NLuiz Augusto von Dentz <luiz.von.dentz@intel.com>
Reviewed-by: NTedd Ho-Jeong An <tedd.an@intel.com>
Conflicts:
	net/bluetooth/l2cap_core.c
Signed-off-by: NZiyang Xuan <william.xuanziyang@huawei.com>
Reviewed-by: NYue Haibing <yuehaibing@huawei.com>
Reviewed-by: NLiu Jian <liujian56@huawei.com>
Reviewed-by: NXiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>

814dc650

Bluetooth: L2CAP: Fix attempting to access uninitialized memory · 21714ac1

由 Luiz Augusto von Dentz 提交于 11月 19, 2022

stable inclusion
from stable-v4.19.265
commit 36919a82f335784d86b4def308739559bb47943d
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5ZNRS
CVE: CVE-2022-42895

--------------------------------

commit b1a2cd50 upstream.

On l2cap_parse_conf_req the variable efs is only initialized if
remote_efs has been set.

CVE: CVE-2022-42895
CC: stable@vger.kernel.org
Reported-by: NTamás Koczka <poprdi@google.com>
Signed-off-by: NLuiz Augusto von Dentz <luiz.von.dentz@intel.com>
Reviewed-by: NTedd Ho-Jeong An <tedd.an@intel.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NBaisong Zhong <zhongbaisong@huawei.com>
Reviewed-by: NLiu Jian <liujian56@huawei.com>
Reviewed-by: NYue Haibing <yuehaibing@huawei.com>
Reviewed-by: NWang Weiyang <wangweiyang2@huawei.com>
Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>

21714ac1

15 11月, 2022 1 次提交

block: check flags of claimed slave bdev to fix uaf for bd_holder_dir · 7fcfabe9

由 Li Lingfeng 提交于 11月 15, 2022

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I60QE9
CVE: NA

--------------------------------

As explained in 0eb44012 ("block: fix use after free for
bd_holder_dir"), we should make sure the "disk" is still live and
then grab a reference to 'bd_holder_dir'. However, the "disk"
should be "the claimed slave bdev" rather than "the holding disk".

Fixes: 0eb44012 ("block: fix use after free for bd_holder_dir")
Signed-off-by: NLi Lingfeng <lilingfeng3@huawei.com>
Reviewed-by: NYu Kuai <yukuai3@huawei.com>
Reviewed-by: NJason Yan <yanaijie@huawei.com>
Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>

7fcfabe9

14 11月, 2022 2 次提交

block: fix use after free for bd_holder_dir · 0eb44012

由 Yu Kuai 提交于 11月 14, 2022

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I60QE9
CVE: NA

--------------------------------

Currently, the caller of bd_link_disk_holer() get 'bdev' by
blkdev_get_by_dev(), which will look up 'bdev' by inode number 'dev'.
Howerver, it's possible that del_gendisk() can be called currently, and
'bd_holder_dir' can be freed before bd_link_disk_holer() access it, thus
use after free is triggered.

t1:				t2:
bdev = blkdev_get_by_dev
				del_gendisk
				 kobject_put(bd_holder_dir)
				  kobject_free()
bd_link_disk_holder

Fix the problem by checking disk is still live and grabbing a reference
to 'bd_holder_dir' first in bd_link_disk_holder().

Link: https://lore.kernel.org/all/20221103025541.1875809-3-yukuai1@huaweicloud.com/Signed-off-by: NYu Kuai <yukuai3@huawei.com>
Reviewed-by: NJason Yan <yanaijie@huawei.com>
Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>

0eb44012

Revert "block: Fix UAF in bd_link_disk_holder()" · 9237bc1d

由 Yu Kuai 提交于 11月 14, 2022

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I60QE9
CVE: NA

--------------------------------

This reverts commit 01b1ec1d.

Official solution will be applied to mainline, and this solution can't
fix uaf for 'bd_holder_dir' thoroughly. hence revert this temporary
solution. Officail solution will be backported in the next patch.
Signed-off-by: NYu Kuai <yukuai3@huawei.com>
Reviewed-by: NJason Yan <yanaijie@huawei.com>
Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>

9237bc1d

08 11月, 2022 17 次提交

init/main.c: return 1 from handled __setup() functions · d484e833

由 Randy Dunlap 提交于 11月 08, 2022

stable inclusion
from stable-4.19.238
commit c7daf1b4ad809692d5c26f33c02ed8a031066548
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5X41F
CVE: NA

--------------------------------

[ Upstream commit f9a40b08 ]

initcall_blacklist() should return 1 to indicate that it handled its
cmdline arguments.

set_debug_rodata() should return 1 to indicate that it handled its
cmdline arguments.  Print a warning if the option string is invalid.

This prevents these strings from being added to the 'init' program's
environment as they are not init arguments/parameters.

Link: https://lkml.kernel.org/r/20220221050901.23985-1-rdunlap@infradead.orgSigned-off-by: NRandy Dunlap <rdunlap@infradead.org>
Reported-by: NIgor Zhbanov <i.zhbanov@omprussia.ru>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NLin Yujun <linyujun809@huawei.com>
Reviewed-by: NZhang Jianhua <chris.zjh@huawei.com>
Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>

d484e833

x86/pm: Save the MSR validity status at context setup · 47477ca7

由 Pawan Gupta 提交于 11月 08, 2022

stable inclusion
from stable-4.19.238
commit c7daf1b4ad809692d5c26f33c02ed8a031066548
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5X41F
CVE: NA

--------------------------------

commit 73924ec4 upstream.

The mechanism to save/restore MSRs during S3 suspend/resume checks for
the MSR validity during suspend, and only restores the MSR if its a
valid MSR.  This is not optimal, as an invalid MSR will unnecessarily
throw an exception for every suspend cycle.  The more invalid MSRs,
higher the impact will be.

Check and save the MSR validity at setup.  This ensures that only valid
MSRs that are guaranteed to not throw an exception will be attempted
during suspend.

Fixes: 7a9c2dd0 ("x86/pm: Introduce quirk framework to save/restore extra MSR registers around suspend/resume")
Suggested-by: NDave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: NPawan Gupta <pawan.kumar.gupta@linux.intel.com>
Reviewed-by: NDave Hansen <dave.hansen@linux.intel.com>
Acked-by: NBorislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NLin Yujun <linyujun809@huawei.com>
Reviewed-by: NZhang Jianhua <chris.zjh@huawei.com>
Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>

47477ca7

x86/speculation: Restore speculation related MSRs during S3 resume · 2b91784d

由 Pawan Gupta 提交于 11月 08, 2022

stable inclusion
from stable-4.19.238
commit c7daf1b4ad809692d5c26f33c02ed8a031066548
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5X41F
CVE: NA

--------------------------------

commit e2a1256b upstream.

After resuming from suspend-to-RAM, the MSRs that control CPU's
speculative execution behavior are not being restored on the boot CPU.

These MSRs are used to mitigate speculative execution vulnerabilities.
Not restoring them correctly may leave the CPU vulnerable.  Secondary
CPU's MSRs are correctly being restored at S3 resume by
identify_secondary_cpu().

During S3 resume, restore these MSRs for boot CPU when restoring its
processor state.

Fixes: 77243971 ("x86/bugs/intel: Set proper CPU features and setup RDS")
Reported-by: NNeelima Krishnan <neelima.krishnan@intel.com>
Signed-off-by: NPawan Gupta <pawan.kumar.gupta@linux.intel.com>
Tested-by: NNeelima Krishnan <neelima.krishnan@intel.com>
Acked-by: NBorislav Petkov <bp@suse.de>
Reviewed-by: NDave Hansen <dave.hansen@linux.intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NLin Yujun <linyujun809@huawei.com>
Reviewed-by: NZhang Jianhua <chris.zjh@huawei.com>
Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>

2b91784d

x86/cpu: Load microcode during restore_processor_state() · 27dd57ae

由 Borislav Petkov 提交于 11月 08, 2022

stable inclusion
from stable-4.19.238
commit c7daf1b4ad809692d5c26f33c02ed8a031066548
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5X41F
CVE: NA

--------------------------------

commit f9e14dbb upstream.

When resuming from system sleep state, restore_processor_state()
restores the boot CPU MSRs. These MSRs could be emulated by microcode.
If microcode is not loaded yet, writing to emulated MSRs leads to
unchecked MSR access error:

  ...
  PM: Calling lapic_suspend+0x0/0x210
  unchecked MSR access error: WRMSR to 0x10f (tried to write 0x0...0) at rIP: ... (native_write_msr)
  Call Trace:
    <TASK>
    ? restore_processor_state
    x86_acpi_suspend_lowlevel
    acpi_suspend_enter
    suspend_devices_and_enter
    pm_suspend.cold
    state_store
    kobj_attr_store
    sysfs_kf_write
    kernfs_fop_write_iter
    new_sync_write
    vfs_write
    ksys_write
    __x64_sys_write
    do_syscall_64
    entry_SYSCALL_64_after_hwframe
   RIP: 0033:0x7fda13c260a7

To ensure microcode emulated MSRs are available for restoration, load
the microcode on the boot CPU before restoring these MSRs.

  [ Pawan: write commit message and productize it. ]

Fixes: e2a1256b ("x86/speculation: Restore speculation related MSRs during S3 resume")
Reported-by: NKyle D. Pelton <kyle.d.pelton@intel.com>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Signed-off-by: NPawan Gupta <pawan.kumar.gupta@linux.intel.com>
Tested-by: NKyle D. Pelton <kyle.d.pelton@intel.com>
Cc: stable@vger.kernel.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215841
Link: https://lore.kernel.org/r/4350dfbf785cd482d3fafa72b2b49c83102df3ce.1650386317.git.pawan.kumar.gupta@linux.intel.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NLin Yujun <linyujun809@huawei.com>
Reviewed-by: NZhang Jianhua <chris.zjh@huawei.com>
Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>

27dd57ae

genirq: Synchronize interrupt thread startup · 950faec0

由 Thomas Pfaff 提交于 11月 08, 2022

stable inclusion
from stable-4.19.238
commit c7daf1b4ad809692d5c26f33c02ed8a031066548
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5X41F
CVE: NA

--------------------------------

commit 8707898e upstream.

A kernel hang can be observed when running setserial in a loop on a kernel
with force threaded interrupts. The sequence of events is:

   setserial
     open("/dev/ttyXXX")
       request_irq()
     do_stuff()
      -> serial interrupt
         -> wake(irq_thread)
	      desc->threads_active++;
     close()
       free_irq()
         kthread_stop(irq_thread)
     synchronize_irq() <- hangs because desc->threads_active != 0

The thread is created in request_irq() and woken up, but does not get on a
CPU to reach the actual thread function, which would handle the pending
wake-up. kthread_stop() sets the should stop condition which makes the
thread immediately exit, which in turn leaves the stale threads_active
count around.

This problem was introduced with commit 519cc865, which addressed a
interrupt sharing issue in the PCIe code.

Before that commit free_irq() invoked synchronize_irq(), which waits for
the hard interrupt handler and also for associated threads to complete.

To address the PCIe issue synchronize_irq() was replaced with
__synchronize_hardirq(), which only waits for the hard interrupt handler to
complete, but not for threaded handlers.

This was done under the assumption, that the interrupt thread already
reached the thread function and waits for a wake-up, which is guaranteed to
be handled before acting on the stop condition. The problematic case, that
the thread would not reach the thread function, was obviously overlooked.

Make sure that the interrupt thread is really started and reaches
thread_fn() before returning from __setup_irq().

This utilizes the existing wait queue in the interrupt descriptor. The
wait queue is unused for non-shared interrupts. For shared interrupts the
usage might cause a spurious wake-up of a waiter in synchronize_irq() or the
completion of a threaded handler might cause a spurious wake-up of the
waiter for the ready flag. Both are harmless and have no functional impact.

[ tglx: Amended changelog ]

Fixes: 519cc865 ("genirq: Synchronize only with single thread on free_irq()")
Signed-off-by: NThomas Pfaff <tpfaff@pcs.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NMarc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/552fe7b4-9224-b183-bb87-a8f36d335690@pcs.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NLin Yujun <linyujun809@huawei.com>
Reviewed-by: NZhang Jianhua <chris.zjh@huawei.com>
Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>

950faec0

nvme: Fix IOC_PR_CLEAR and IOC_PR_RELEASE ioctls for nvme devices · 3b23e85f

由 Michael Kelley 提交于 11月 08, 2022

stable inclusion
from stable-v4.19.261
commit 5f7fd71e5bebf337769f20dd125822ce63266e4d
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5ZXGL
CVE: NA

--------------------------------

[ Upstream commit c292a337 ]

The IOC_PR_CLEAR and IOC_PR_RELEASE ioctls are
non-functional on NVMe devices because the nvme_pr_clear()
and nvme_pr_release() functions set the IEKEY field incorrectly.
The IEKEY field should be set only when the key is zero (i.e,
not specified).  The current code does it backwards.

Furthermore, the NVMe spec describes the persistent
reservation "clear" function as an option on the reservation
release command. The current implementation of nvme_pr_clear()
erroneously uses the reservation register command.

Fix these errors. Note that NVMe version 1.3 and later specify
that setting the IEKEY field will return an error of Invalid
Field in Command.  The fix will set IEKEY when the key is zero,
which is appropriate as these ioctls consider a zero key to
be "unspecified", and the intention of the spec change is
to require a valid key.

Tested on a version 1.4 PCI NVMe device in an Azure VM.

Fixes: 1673f1f0 ("nvme: move block_device_operations and ns/ctrl freeing to common code")
Fixes: 1d277a63 ("NVMe: Add persistent reservation ops")
Signed-off-by: NMichael Kelley <mikelley@microsoft.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Conflicts:
	drivers/nvme/host/core.c
Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>
Reviewed-by: NJason Yan <yanaijie@huawei.com>
Reviewed-by: NYu Kuai <yukuai3@huawei.com>
Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>

3b23e85f

once: add DO_ONCE_SLOW() for sleepable contexts · 3743e9b5

由 Eric Dumazet 提交于 11月 08, 2022

stable inclusion
from stable-v4.19.262
commit f5686a03b138f6330eeda082ee4f96c8109f56f3
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5ZXGL
CVE: NA

--------------------------------

[ Upstream commit 62c07983 ]

Christophe Leroy reported a ~80ms latency spike
happening at first TCP connect() time.

This is because __inet_hash_connect() uses get_random_once()
to populate a perturbation table which became quite big
after commit 4c2c8f03 ("tcp: increase source port perturb table to 2^16")

get_random_once() uses DO_ONCE(), which block hard irqs for the duration
of the operation.

This patch adds DO_ONCE_SLOW() which uses a mutex instead of a spinlock
for operations where we prefer to stay in process context.

Then __inet_hash_connect() can use get_random_slow_once()
to populate its perturbation table.

Fixes: 4c2c8f03 ("tcp: increase source port perturb table to 2^16")
Fixes: 190cc824 ("tcp: change source port randomizarion at connect() time")
Reported-by: NChristophe Leroy <christophe.leroy@csgroup.eu>
Link: https://lore.kernel.org/netdev/CANn89iLAEYBaoYajy0Y9UmGFff5GPxDUoG-ErVB2jDdRNQ5Tug@mail.gmail.com/T/#tSigned-off-by: NEric Dumazet <edumazet@google.com>
Cc: Willy Tarreau <w@1wt.eu>
Tested-by: NChristophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NSasha Levin <sashal@kernel.org>

One conflict occurs because the commit 4c2c8f03 ("tcp: increase
source port perturb table to 2^16") is integrated but the commit
e9261476 ("tcp: dynamically allocate the perturb table used by
source port") is not integrated.
One conflict occurs because the commit 1027b96e ("once: Fix panic
when module unload") is not integrated.

Conflicts:
	net/ipv4/inet_hashtables.c
	lib/once.c
Signed-off-by: NLiu Jian <liujian56@huawei.com>
Reviewed-by: NYue Haibing <yuehaibing@huawei.com>
Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>

3743e9b5

inet: fully convert sk->sk_rx_dst to RCU rules · 4c4298bf

由 Eric Dumazet 提交于 11月 08, 2022

stable inclusion
from stable-v4.19.262
commit 75a578000ae5e511e5d0e8433c94a14d9c99c412
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5ZXGL
CVE: NA

--------------------------------

commit 8f905c0e upstream.

syzbot reported various issues around early demux,
one being included in this changelog [1]

sk->sk_rx_dst is using RCU protection without clearly
documenting it.

And following sequences in tcp_v4_do_rcv()/tcp_v6_do_rcv()
are not following standard RCU rules.

[a]    dst_release(dst);
[b]    sk->sk_rx_dst = NULL;

They look wrong because a delete operation of RCU protected
pointer is supposed to clear the pointer before
the call_rcu()/synchronize_rcu() guarding actual memory freeing.

In some cases indeed, dst could be freed before [b] is done.

We could cheat by clearing sk_rx_dst before calling
dst_release(), but this seems the right time to stick
to standard RCU annotations and debugging facilities.

[1]
BUG: KASAN: use-after-free in dst_check include/net/dst.h:470 [inline]
BUG: KASAN: use-after-free in tcp_v4_early_demux+0x95b/0x960 net/ipv4/tcp_ipv4.c:1792
Read of size 2 at addr ffff88807f1cb73a by task syz-executor.5/9204

CPU: 0 PID: 9204 Comm: syz-executor.5 Not tainted 5.16.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
 print_address_description.constprop.0.cold+0x8d/0x320 mm/kasan/report.c:247
 __kasan_report mm/kasan/report.c:433 [inline]
 kasan_report.cold+0x83/0xdf mm/kasan/report.c:450
 dst_check include/net/dst.h:470 [inline]
 tcp_v4_early_demux+0x95b/0x960 net/ipv4/tcp_ipv4.c:1792
 ip_rcv_finish_core.constprop.0+0x15de/0x1e80 net/ipv4/ip_input.c:340
 ip_list_rcv_finish.constprop.0+0x1b2/0x6e0 net/ipv4/ip_input.c:583
 ip_sublist_rcv net/ipv4/ip_input.c:609 [inline]
 ip_list_rcv+0x34e/0x490 net/ipv4/ip_input.c:644
 __netif_receive_skb_list_ptype net/core/dev.c:5508 [inline]
 __netif_receive_skb_list_core+0x549/0x8e0 net/core/dev.c:5556
 __netif_receive_skb_list net/core/dev.c:5608 [inline]
 netif_receive_skb_list_internal+0x75e/0xd80 net/core/dev.c:5699
 gro_normal_list net/core/dev.c:5853 [inline]
 gro_normal_list net/core/dev.c:5849 [inline]
 napi_complete_done+0x1f1/0x880 net/core/dev.c:6590
 virtqueue_napi_complete drivers/net/virtio_net.c:339 [inline]
 virtnet_poll+0xca2/0x11b0 drivers/net/virtio_net.c:1557
 __napi_poll+0xaf/0x440 net/core/dev.c:7023
 napi_poll net/core/dev.c:7090 [inline]
 net_rx_action+0x801/0xb40 net/core/dev.c:7177
 __do_softirq+0x29b/0x9c2 kernel/softirq.c:558
 invoke_softirq kernel/softirq.c:432 [inline]
 __irq_exit_rcu+0x123/0x180 kernel/softirq.c:637
 irq_exit_rcu+0x5/0x20 kernel/softirq.c:649
 common_interrupt+0x52/0xc0 arch/x86/kernel/irq.c:240
 asm_common_interrupt+0x1e/0x40 arch/x86/include/asm/idtentry.h:629
RIP: 0033:0x7f5e972bfd57
Code: 39 d1 73 14 0f 1f 80 00 00 00 00 48 8b 50 f8 48 83 e8 08 48 39 ca 77 f3 48 39 c3 73 3e 48 89 13 48 8b 50 f8 48 89 38 49 8b 0e <48> 8b 3e 48 83 c3 08 48 83 c6 08 eb bc 48 39 d1 72 9e 48 39 d0 73
RSP: 002b:00007fff8a413210 EFLAGS: 00000283
RAX: 00007f5e97108990 RBX: 00007f5e97108338 RCX: ffffffff81d3aa45
RDX: ffffffff81d3aa45 RSI: 00007f5e97108340 RDI: ffffffff81d3aa45
RBP: 00007f5e97107eb8 R08: 00007f5e97108d88 R09: 0000000093c2e8d9
R10: 0000000000000000 R11: 0000000000000000 R12: 00007f5e97107eb0
R13: 00007f5e97108338 R14: 00007f5e97107ea8 R15: 0000000000000019
 </TASK>

Allocated by task 13:
 kasan_save_stack+0x1e/0x50 mm/kasan/common.c:38
 kasan_set_track mm/kasan/common.c:46 [inline]
 set_alloc_info mm/kasan/common.c:434 [inline]
 __kasan_slab_alloc+0x90/0xc0 mm/kasan/common.c:467
 kasan_slab_alloc include/linux/kasan.h:259 [inline]
 slab_post_alloc_hook mm/slab.h:519 [inline]
 slab_alloc_node mm/slub.c:3234 [inline]
 slab_alloc mm/slub.c:3242 [inline]
 kmem_cache_alloc+0x202/0x3a0 mm/slub.c:3247
 dst_alloc+0x146/0x1f0 net/core/dst.c:92
 rt_dst_alloc+0x73/0x430 net/ipv4/route.c:1613
 ip_route_input_slow+0x1817/0x3a20 net/ipv4/route.c:2340
 ip_route_input_rcu net/ipv4/route.c:2470 [inline]
 ip_route_input_noref+0x116/0x2a0 net/ipv4/route.c:2415
 ip_rcv_finish_core.constprop.0+0x288/0x1e80 net/ipv4/ip_input.c:354
 ip_list_rcv_finish.constprop.0+0x1b2/0x6e0 net/ipv4/ip_input.c:583
 ip_sublist_rcv net/ipv4/ip_input.c:609 [inline]
 ip_list_rcv+0x34e/0x490 net/ipv4/ip_input.c:644
 __netif_receive_skb_list_ptype net/core/dev.c:5508 [inline]
 __netif_receive_skb_list_core+0x549/0x8e0 net/core/dev.c:5556
 __netif_receive_skb_list net/core/dev.c:5608 [inline]
 netif_receive_skb_list_internal+0x75e/0xd80 net/core/dev.c:5699
 gro_normal_list net/core/dev.c:5853 [inline]
 gro_normal_list net/core/dev.c:5849 [inline]
 napi_complete_done+0x1f1/0x880 net/core/dev.c:6590
 virtqueue_napi_complete drivers/net/virtio_net.c:339 [inline]
 virtnet_poll+0xca2/0x11b0 drivers/net/virtio_net.c:1557
 __napi_poll+0xaf/0x440 net/core/dev.c:7023
 napi_poll net/core/dev.c:7090 [inline]
 net_rx_action+0x801/0xb40 net/core/dev.c:7177
 __do_softirq+0x29b/0x9c2 kernel/softirq.c:558

Freed by task 13:
 kasan_save_stack+0x1e/0x50 mm/kasan/common.c:38
 kasan_set_track+0x21/0x30 mm/kasan/common.c:46
 kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
 ____kasan_slab_free mm/kasan/common.c:366 [inline]
 ____kasan_slab_free mm/kasan/common.c:328 [inline]
 __kasan_slab_free+0xff/0x130 mm/kasan/common.c:374
 kasan_slab_free include/linux/kasan.h:235 [inline]
 slab_free_hook mm/slub.c:1723 [inline]
 slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1749
 slab_free mm/slub.c:3513 [inline]
 kmem_cache_free+0xbd/0x5d0 mm/slub.c:3530
 dst_destroy+0x2d6/0x3f0 net/core/dst.c:127
 rcu_do_batch kernel/rcu/tree.c:2506 [inline]
 rcu_core+0x7ab/0x1470 kernel/rcu/tree.c:2741
 __do_softirq+0x29b/0x9c2 kernel/softirq.c:558

Last potentially related work creation:
 kasan_save_stack+0x1e/0x50 mm/kasan/common.c:38
 __kasan_record_aux_stack+0xf5/0x120 mm/kasan/generic.c:348
 __call_rcu kernel/rcu/tree.c:2985 [inline]
 call_rcu+0xb1/0x740 kernel/rcu/tree.c:3065
 dst_release net/core/dst.c:177 [inline]
 dst_release+0x79/0xe0 net/core/dst.c:167
 tcp_v4_do_rcv+0x612/0x8d0 net/ipv4/tcp_ipv4.c:1712
 sk_backlog_rcv include/net/sock.h:1030 [inline]
 __release_sock+0x134/0x3b0 net/core/sock.c:2768
 release_sock+0x54/0x1b0 net/core/sock.c:3300
 tcp_sendmsg+0x36/0x40 net/ipv4/tcp.c:1441
 inet_sendmsg+0x99/0xe0 net/ipv4/af_inet.c:819
 sock_sendmsg_nosec net/socket.c:704 [inline]
 sock_sendmsg+0xcf/0x120 net/socket.c:724
 sock_write_iter+0x289/0x3c0 net/socket.c:1057
 call_write_iter include/linux/fs.h:2162 [inline]
 new_sync_write+0x429/0x660 fs/read_write.c:503
 vfs_write+0x7cd/0xae0 fs/read_write.c:590
 ksys_write+0x1ee/0x250 fs/read_write.c:643
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

The buggy address belongs to the object at ffff88807f1cb700
 which belongs to the cache ip_dst_cache of size 176
The buggy address is located 58 bytes inside of
 176-byte region [ffff88807f1cb700, ffff88807f1cb7b0)
The buggy address belongs to the page:
page:ffffea0001fc72c0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x7f1cb
flags: 0xfff00000000200(slab|node=0|zone=1|lastcpupid=0x7ff)
raw: 00fff00000000200 dead000000000100 dead000000000122 ffff8881413bb780
raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 0, migratetype Unmovable, gfp_mask 0x112a20(GFP_ATOMIC|__GFP_NOWARN|__GFP_NORETRY|__GFP_HARDWALL), pid 5, ts 108466983062, free_ts 108048976062
 prep_new_page mm/page_alloc.c:2418 [inline]
 get_page_from_freelist+0xa72/0x2f50 mm/page_alloc.c:4149
 __alloc_pages+0x1b2/0x500 mm/page_alloc.c:5369
 alloc_pages+0x1a7/0x300 mm/mempolicy.c:2191
 alloc_slab_page mm/slub.c:1793 [inline]
 allocate_slab mm/slub.c:1930 [inline]
 new_slab+0x32d/0x4a0 mm/slub.c:1993
 ___slab_alloc+0x918/0xfe0 mm/slub.c:3022
 __slab_alloc.constprop.0+0x4d/0xa0 mm/slub.c:3109
 slab_alloc_node mm/slub.c:3200 [inline]
 slab_alloc mm/slub.c:3242 [inline]
 kmem_cache_alloc+0x35c/0x3a0 mm/slub.c:3247
 dst_alloc+0x146/0x1f0 net/core/dst.c:92
 rt_dst_alloc+0x73/0x430 net/ipv4/route.c:1613
 __mkroute_output net/ipv4/route.c:2564 [inline]
 ip_route_output_key_hash_rcu+0x921/0x2d00 net/ipv4/route.c:2791
 ip_route_output_key_hash+0x18b/0x300 net/ipv4/route.c:2619
 __ip_route_output_key include/net/route.h:126 [inline]
 ip_route_output_flow+0x23/0x150 net/ipv4/route.c:2850
 ip_route_output_key include/net/route.h:142 [inline]
 geneve_get_v4_rt+0x3a6/0x830 drivers/net/geneve.c:809
 geneve_xmit_skb drivers/net/geneve.c:899 [inline]
 geneve_xmit+0xc4a/0x3540 drivers/net/geneve.c:1082
 __netdev_start_xmit include/linux/netdevice.h:4994 [inline]
 netdev_start_xmit include/linux/netdevice.h:5008 [inline]
 xmit_one net/core/dev.c:3590 [inline]
 dev_hard_start_xmit+0x1eb/0x920 net/core/dev.c:3606
 __dev_queue_xmit+0x299a/0x3650 net/core/dev.c:4229
page last free stack trace:
 reset_page_owner include/linux/page_owner.h:24 [inline]
 free_pages_prepare mm/page_alloc.c:1338 [inline]
 free_pcp_prepare+0x374/0x870 mm/page_alloc.c:1389
 free_unref_page_prepare mm/page_alloc.c:3309 [inline]
 free_unref_page+0x19/0x690 mm/page_alloc.c:3388
 qlink_free mm/kasan/quarantine.c:146 [inline]
 qlist_free_all+0x5a/0xc0 mm/kasan/quarantine.c:165
 kasan_quarantine_reduce+0x180/0x200 mm/kasan/quarantine.c:272
 __kasan_slab_alloc+0xa2/0xc0 mm/kasan/common.c:444
 kasan_slab_alloc include/linux/kasan.h:259 [inline]
 slab_post_alloc_hook mm/slab.h:519 [inline]
 slab_alloc_node mm/slub.c:3234 [inline]
 kmem_cache_alloc_node+0x255/0x3f0 mm/slub.c:3270
 __alloc_skb+0x215/0x340 net/core/skbuff.c:414
 alloc_skb include/linux/skbuff.h:1126 [inline]
 alloc_skb_with_frags+0x93/0x620 net/core/skbuff.c:6078
 sock_alloc_send_pskb+0x783/0x910 net/core/sock.c:2575
 mld_newpack+0x1df/0x770 net/ipv6/mcast.c:1754
 add_grhead+0x265/0x330 net/ipv6/mcast.c:1857
 add_grec+0x1053/0x14e0 net/ipv6/mcast.c:1995
 mld_send_initial_cr.part.0+0xf6/0x230 net/ipv6/mcast.c:2242
 mld_send_initial_cr net/ipv6/mcast.c:1232 [inline]
 mld_dad_work+0x1d3/0x690 net/ipv6/mcast.c:2268
 process_one_work+0x9b2/0x1690 kernel/workqueue.c:2298
 worker_thread+0x658/0x11f0 kernel/workqueue.c:2445

Memory state around the buggy address:
 ffff88807f1cb600: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff88807f1cb680: fb fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc
>ffff88807f1cb700: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                        ^
 ffff88807f1cb780: fb fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc
 ffff88807f1cb800: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 41063e9d ("ipv4: Early TCP socket demux.")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20211220143330.680945-1-eric.dumazet@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
[cmllamas: fixed trivial merge conflict]
Signed-off-by: NCarlos Llamas <cmllamas@google.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

Conflicts:
	net/ipv4/af_inet.c
Signed-off-by: NDong Chenchen <dongchenchen2@huawei.com>
Reviewed-by: NYue Haibing <yuehaibing@huawei.com>
Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>

4c4298bf

ext4: continue to expand file system when the target size doesn't reach · 3b8b0d3f

由 Jerry Lee 李修賢提交于 11月 08, 2022

stable inclusion
from stable-v4.19.262
commit f2180ad6a43501597d20eacad0c6f146c51d4bbd
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5ZXGL
CVE: NA

--------------------------------

commit df3cb754 upstream.

When expanding a file system from (16TiB-2MiB) to 18TiB, the operation
exits early which leads to result inconsistency between resize2fs and
Ext4 kernel driver.

=== before ===
○ → resize2fs /dev/mapper/thin
resize2fs 1.45.5 (07-Jan-2020)
Filesystem at /dev/mapper/thin is mounted on /mnt/test; on-line resizing required
old_desc_blocks = 2048, new_desc_blocks = 2304
The filesystem on /dev/mapper/thin is now 4831837696 (4k) blocks long.

[  865.186308] EXT4-fs (dm-5): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
[  912.091502] dm-4: detected capacity change from 34359738368 to 38654705664
[  970.030550] dm-5: detected capacity change from 34359734272 to 38654701568
[ 1000.012751] EXT4-fs (dm-5): resizing filesystem from 4294966784 to 4831837696 blocks
[ 1000.012878] EXT4-fs (dm-5): resized filesystem to 4294967296

=== after ===
[  129.104898] EXT4-fs (dm-5): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
[  143.773630] dm-4: detected capacity change from 34359738368 to 38654705664
[  198.203246] dm-5: detected capacity change from 34359734272 to 38654701568
[  207.918603] EXT4-fs (dm-5): resizing filesystem from 4294966784 to 4831837696 blocks
[  207.918754] EXT4-fs (dm-5): resizing filesystem from 4294967296 to 4831837696 blocks
[  207.918758] EXT4-fs (dm-5): Converting file system to meta_bg
[  207.918790] EXT4-fs (dm-5): resizing filesystem from 4294967296 to 4831837696 blocks
[  221.454050] EXT4-fs (dm-5): resized to 4658298880 blocks
[  227.634613] EXT4-fs (dm-5): resized filesystem to 4831837696
Signed-off-by: NJerry Lee <jerrylee@qnap.com>
Link: https://lore.kernel.org/r/PU1PR04MB22635E739BD21150DC182AC6A18C9@PU1PR04MB2263.apcprd04.prod.outlook.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>

3b8b0d3f

nvme: copy firmware_rev on each init · 8f5816c2

由 Keith Busch 提交于 11月 08, 2022

stable inclusion
from stable-v4.19.262
commit 366a2b3110c69f919fb3277acc1a0bb8cd8a8dbd
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5ZXGL
CVE: NA

--------------------------------

[ Upstream commit a8eb6c1b ]

The firmware revision can change on after a reset so copy the most
recent info each time instead of just the first time, otherwise the
sysfs firmware_rev entry may contain stale data.
Reported-by: NJeff Lien <jeff.lien@wdc.com>
Signed-off-by: NKeith Busch <kbusch@kernel.org>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Reviewed-by: NChaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: NChao Leng <lengchao@huawei.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>

8f5816c2

net: If sock is dead don't access sock's sk_wq in sk_stream_wait_memory · 86774965

由 Liu Jian 提交于 11月 08, 2022

stable inclusion
from stable-v4.19.262
commit 5fe03917bb017d9af68a95f989f1c122eebc69a6
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5ZXGL
CVE: NA

--------------------------------

[ Upstream commit 3f8ef65a ]

Fixes the below NULL pointer dereference:

  [...]
  [   14.471200] Call Trace:
  [   14.471562]  <TASK>
  [   14.471882]  lock_acquire+0x245/0x2e0
  [   14.472416]  ? remove_wait_queue+0x12/0x50
  [   14.473014]  ? _raw_spin_lock_irqsave+0x17/0x50
  [   14.473681]  _raw_spin_lock_irqsave+0x3d/0x50
  [   14.474318]  ? remove_wait_queue+0x12/0x50
  [   14.474907]  remove_wait_queue+0x12/0x50
  [   14.475480]  sk_stream_wait_memory+0x20d/0x340
  [   14.476127]  ? do_wait_intr_irq+0x80/0x80
  [   14.476704]  do_tcp_sendpages+0x287/0x600
  [   14.477283]  tcp_bpf_push+0xab/0x260
  [   14.477817]  tcp_bpf_sendmsg_redir+0x297/0x500
  [   14.478461]  ? __local_bh_enable_ip+0x77/0xe0
  [   14.479096]  tcp_bpf_send_verdict+0x105/0x470
  [   14.479729]  tcp_bpf_sendmsg+0x318/0x4f0
  [   14.480311]  sock_sendmsg+0x2d/0x40
  [   14.480822]  ____sys_sendmsg+0x1b4/0x1c0
  [   14.481390]  ? copy_msghdr_from_user+0x62/0x80
  [   14.482048]  ___sys_sendmsg+0x78/0xb0
  [   14.482580]  ? vmf_insert_pfn_prot+0x91/0x150
  [   14.483215]  ? __do_fault+0x2a/0x1a0
  [   14.483738]  ? do_fault+0x15e/0x5d0
  [   14.484246]  ? __handle_mm_fault+0x56b/0x1040
  [   14.484874]  ? lock_is_held_type+0xdf/0x130
  [   14.485474]  ? find_held_lock+0x2d/0x90
  [   14.486046]  ? __sys_sendmsg+0x41/0x70
  [   14.486587]  __sys_sendmsg+0x41/0x70
  [   14.487105]  ? intel_pmu_drain_pebs_core+0x350/0x350
  [   14.487822]  do_syscall_64+0x34/0x80
  [   14.488345]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
  [...]

The test scenario has the following flow:

thread1                               thread2
-----------                           ---------------
 tcp_bpf_sendmsg
  tcp_bpf_send_verdict
   tcp_bpf_sendmsg_redir              sock_close
    tcp_bpf_push_locked                 __sock_release
     tcp_bpf_push                         //inet_release
      do_tcp_sendpages                    sock->ops->release
       sk_stream_wait_memory          	   // tcp_close
          sk_wait_event                      sk->sk_prot->close
           release_sock(__sk);
            ***
                                                lock_sock(sk);
                                                  __tcp_close
                                                    sock_orphan(sk)
                                                      sk->sk_wq  = NULL
                                                release_sock
            ****
           lock_sock(__sk);
          remove_wait_queue(sk_sleep(sk), &wait);
             sk_sleep(sk)
             //NULL pointer dereference
             &rcu_dereference_raw(sk->sk_wq)->wait

While waiting for memory in thread1, the socket is released with its wait
queue because thread2 has closed it. This caused by tcp_bpf_send_verdict
didn't increase the f_count of psock->sk_redir->sk_socket->file in thread1.

We should check if SOCK_DEAD flag is set on wakeup in sk_stream_wait_memory
before accessing the wait queue.
Suggested-by: NJakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: NLiu Jian <liujian56@huawei.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NJohn Fastabend <john.fastabend@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/bpf/20220823133755.314697-2-liujian56@huawei.comSigned-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>

86774965

can: bcm: check the result of can_send() in bcm_can_tx() · 4b29a149

由 Ziyang Xuan 提交于 11月 08, 2022

stable inclusion
from stable-v4.19.262
commit dae06957f856eb699f2a504a46891718c9b1e0d3
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5ZXGL
CVE: NA

--------------------------------

[ Upstream commit 3fd7bfd2 ]

If can_send() fail, it should not update frames_abs counter
in bcm_can_tx(). Add the result check for can_send() in bcm_can_tx().
Suggested-by: NMarc Kleine-Budde <mkl@pengutronix.de>
Suggested-by: NOliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: NZiyang Xuan <william.xuanziyang@huawei.com>
Link: https://lore.kernel.org/all/9851878e74d6d37aee2f1ee76d68361a46f89458.1663206163.git.william.xuanziyang@huawei.comAcked-by: NOliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: NMarc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>

4b29a149

xfrm: Update ipcomp_scratches with NULL when freed · 16cebfda

由 Khalid Masum 提交于 11月 08, 2022

stable inclusion
from stable-v4.19.262
commit 1e8abde895b3ac6a368cbdb372e8800c49e73a28
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5ZXGL
CVE: NA

--------------------------------

[ Upstream commit 8a04d2fc ]

Currently if ipcomp_alloc_scratches() fails to allocate memory
ipcomp_scratches holds obsolete address. So when we try to free the
percpu scratches using ipcomp_free_scratches() it tries to vfree non
existent vm area. Described below:

static void * __percpu *ipcomp_alloc_scratches(void)
{
        ...
        scratches = alloc_percpu(void *);
        if (!scratches)
                return NULL;
ipcomp_scratches does not know about this allocation failure.
Therefore holding the old obsolete address.
        ...
}

So when we free,

static void ipcomp_free_scratches(void)
{
        ...
        scratches = ipcomp_scratches;
Assigning obsolete address from ipcomp_scratches

        if (!scratches)
                return;

        for_each_possible_cpu(i)
               vfree(*per_cpu_ptr(scratches, i));
Trying to free non existent page, causing warning: trying to vfree
existent vm area.
        ...
}

Fix this breakage by updating ipcomp_scrtches with NULL when scratches
is freed
Suggested-by: NHerbert Xu <herbert@gondor.apana.org.au>
Reported-by: syzbot+5ec9bb042ddfe9644773@syzkaller.appspotmail.com
Tested-by: syzbot+5ec9bb042ddfe9644773@syzkaller.appspotmail.com
Signed-off-by: NKhalid Masum <khalid.masum.92@gmail.com>
Acked-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>

16cebfda

tcp: annotate data-race around tcp_md5sig_pool_populated · e68b9f9b

由 Eric Dumazet 提交于 11月 08, 2022

stable inclusion
from stable-v4.19.262
commit 5c4e1b8939195fe27b05d791577f92445b139a3e
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5ZXGL
CVE: NA

--------------------------------

[ Upstream commit aacd467c ]

tcp_md5sig_pool_populated can be read while another thread
changes its value.

The race has no consequence because allocations
are protected with tcp_md5sig_mutex.

This patch adds READ_ONCE() and WRITE_ONCE() to document
the race and silence KCSAN.
Reported-by: NAbhishek Shah <abhishek.shah@columbia.edu>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>

e68b9f9b

tcp: fix tcp_cwnd_validate() to not forget is_cwnd_limited · 4e3f7a25

由 Neal Cardwell 提交于 11月 08, 2022

stable inclusion
from stable-v4.19.262
commit a434d10e7a90e301ea4a1826ee758b53c79d7de8
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5ZXGL
CVE: NA

--------------------------------

[ Upstream commit f4ce91ce ]

This commit fixes a bug in the tracking of max_packets_out and
is_cwnd_limited. This bug can cause the connection to fail to remember
that is_cwnd_limited is true, causing the connection to fail to grow
cwnd when it should, causing throughput to be lower than it should be.

The following event sequence is an example that triggers the bug:

 (a) The connection is cwnd_limited, but packets_out is not at its
     peak due to TSO deferral deciding not to send another skb yet.
     In such cases the connection can advance max_packets_seq and set
     tp->is_cwnd_limited to true and max_packets_out to a small
     number.

(b) Then later in the round trip the connection is pacing-limited (not
     cwnd-limited), and packets_out is larger. In such cases the
     connection would raise max_packets_out to a bigger number but
     (unexpectedly) flip tp->is_cwnd_limited from true to false.

This commit fixes that bug.

One straightforward fix would be to separately track (a) the next
window after max_packets_out reaches a maximum, and (b) the next
window after tp->is_cwnd_limited is set to true. But this would
require consuming an extra u32 sequence number.

Instead, to save space we track only the most important
information. Specifically, we track the strongest available signal of
the degree to which the cwnd is fully utilized:

(1) If the connection is cwnd-limited then we remember that fact for
the current window.

(2) If the connection not cwnd-limited then we track the maximum
number of outstanding packets in the current window.

In particular, note that the new logic cannot trigger the buggy
(a)/(b) sequence above because with the new logic a condition where
tp->packets_out > tp->max_packets_out can only trigger an update of
tp->is_cwnd_limited if tp->is_cwnd_limited is false.

This first showed up in a testing of a BBRv2 dev branch, but this
buggy behavior highlighted a general issue with the
tcp_cwnd_validate() logic that can cause cwnd to fail to increase at
the proper rate for any TCP congestion control, including Reno or
CUBIC.

Fixes: ca8a2263 ("tcp: make cwnd-limited checks measurement-based, and gentler")
Signed-off-by: NNeal Cardwell <ncardwell@google.com>
Signed-off-by: NKevin(Yudong) Yang <yyd@google.com>
Signed-off-by: NYuchung Cheng <ycheng@google.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>

4e3f7a25

ext4: fix null-ptr-deref in ext4_write_info · 4700f752

由 Baokun Li 提交于 11月 08, 2022

stable inclusion
from stable-v4.19.262
commit 947264e00c46de19a016fd81218118c708fed2f3
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5ZXGL
CVE: NA

--------------------------------

commit f9c1f248 upstream.

I caught a null-ptr-deref bug as follows:

==================================================================
KASAN: null-ptr-deref in range [0x0000000000000068-0x000000000000006f]
CPU: 1 PID: 1589 Comm: umount Not tainted 5.10.0-02219-dirty #339
RIP: 0010:ext4_write_info+0x53/0x1b0
[...]
Call Trace:
 dquot_writeback_dquots+0x341/0x9a0
 ext4_sync_fs+0x19e/0x800
 __sync_filesystem+0x83/0x100
 sync_filesystem+0x89/0xf0
 generic_shutdown_super+0x79/0x3e0
 kill_block_super+0xa1/0x110
 deactivate_locked_super+0xac/0x130
 deactivate_super+0xb6/0xd0
 cleanup_mnt+0x289/0x400
 __cleanup_mnt+0x16/0x20
 task_work_run+0x11c/0x1c0
 exit_to_user_mode_prepare+0x203/0x210
 syscall_exit_to_user_mode+0x5b/0x3a0
 do_syscall_64+0x59/0x70
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
 ==================================================================

Above issue may happen as follows:
-------------------------------------
exit_to_user_mode_prepare
 task_work_run
  __cleanup_mnt
   cleanup_mnt
    deactivate_super
     deactivate_locked_super
      kill_block_super
       generic_shutdown_super
        shrink_dcache_for_umount
         dentry = sb->s_root
         sb->s_root = NULL              <--- Here set NULL
        sync_filesystem
         __sync_filesystem
          sb->s_op->sync_fs > ext4_sync_fs
           dquot_writeback_dquots
            sb->dq_op->write_info > ext4_write_info
             ext4_journal_start(d_inode(sb->s_root), EXT4_HT_QUOTA, 2)
              d_inode(sb->s_root)
               s_root->d_inode          <--- Null pointer dereference

To solve this problem, we use ext4_journal_start_sb directly
to avoid s_root being used.

Cc: stable@kernel.org
Signed-off-by: NBaokun Li <libaokun1@huawei.com>
Reviewed-by: NJan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220805123947.565152-1-libaokun1@huawei.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>

4700f752

Revert "fs: check FMODE_LSEEK to control internal pipe splicing" · d5e86eaf

由 Sasha Levin 提交于 11月 08, 2022

stable inclusion
from stable-v4.19.262
commit 6d43e94b8daf009694d709c5919d67067936577a
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5ZXGL
CVE: NA

--------------------------------

This reverts commit fd0a6e99b61e6c08fa5cf585d54fd956f70c73a6.

Which was upstream commit 97ef77c5.

The commit is missing dependencies and breaks NFS tests, remove it for
now.
Reported-by: NSaeed Mirzamohammadi <saeed.mirzamohammadi@oracle.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>

d5e86eaf

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功