1. 27 January 2021, 40 commits
    • net: cdc_ncm: correct overhead in delayed_ndp_size · b2090592
      Jouni K. Seppänen committed
      stable inclusion
      from stable-5.10.8
      commit b044a949a5c5ddbe61a806bba44aab6148a6f356
      bugzilla: 47450
      
      --------------------------------
      
      [ Upstream commit 7a68d725 ]
      
      Aligning to tx_ndp_modulus is not sufficient because the next align
      call can be cdc_ncm_align_tail, which can add up to ctx->tx_modulus +
      ctx->tx_remainder - 1 bytes. This used to lead to occasional crashes
      on a Huawei 909s-120 LTE module as follows:
      
      - the condition marked /* if there is a remaining skb [...] */ is true
        so the swaps happen
      - skb_out is set from ctx->tx_curr_skb
      - skb_out->len is exactly 0x3f52
      - ctx->tx_curr_size is 0x4000 and delayed_ndp_size is 0xac
        (note that the sum of skb_out->len and delayed_ndp_size is 0x3ffe)
      - the for loop over n is executed once
      - the cdc_ncm_align_tail call marked /* align beginning of next frame */
        increases skb_out->len to 0x3f56 (the sum is now 0x4002)
      - the condition marked /* check if we had enough room left [...] */ is
        false so we break out of the loop
      - the condition marked /* If requested, put NDP at end of frame. */ is
        true so the NDP is written into skb_out
      - now skb_out->len is 0x4002, so padding_count is minus two interpreted
        as an unsigned number, which is used as the length argument to memset,
        leading to a crash with various symptoms but usually including
      
      > Call Trace:
      >  <IRQ>
      >  cdc_ncm_fill_tx_frame+0x83a/0x970 [cdc_ncm]
      >  cdc_mbim_tx_fixup+0x1d9/0x240 [cdc_mbim]
      >  usbnet_start_xmit+0x5d/0x720 [usbnet]
      
      The cdc_ncm_align_tail call first aligns on a ctx->tx_modulus
      boundary (adding at most ctx->tx_modulus-1 bytes), then adds
      ctx->tx_remainder bytes. Alternatively, the next alignment call can
      occur in cdc_ncm_ndp16 or cdc_ncm_ndp32, in which case at most
      ctx->tx_ndp_modulus-1 bytes are added.
      
      A similar problem has occurred before, and the code is nontrivial to
      reason about, so add a guard before the crashing call. By that time it
      is too late to prevent any memory corruption (we'll have written past
      the end of the buffer already) but we can at least try to get a warning
      written into an on-disk log by avoiding the hard crash caused by padding
      past the buffer with a huge number of zeros.
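      The arithmetic above can be modelled in a few lines of userspace C (all names and the guard's shape are hypothetical, not the driver's actual code): once skb_out->len exceeds ctx->tx_curr_size, the unsigned subtraction that computes the padding length wraps to a huge value, and the guard refuses to pad in that case.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the bug described above (names hypothetical): skb_len can
 * end up larger than tx_curr_size after the NDP is appended, so the
 * unsigned subtraction wraps to a huge value that would be handed to
 * memset() as a length. The guard refuses to pad in that case. */
static size_t padding_count(size_t tx_curr_size, size_t skb_len)
{
    if (skb_len > tx_curr_size)
        return 0; /* too late to undo the overwrite, but avoid the crash */
    return tx_curr_size - skb_len;
}
```

With the commit's values, 0x4000 - 0x4002 interpreted as unsigned is (size_t)-2, which is exactly the "minus two interpreted as an unsigned number" in the description.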
      Signed-off-by: Jouni K. Seppänen <jks@iki.fi>
      Fixes: 4a0e3e98 ("cdc_ncm: Add support for moving NDP to end of NCM frame")
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=209407
      Reported-by: kernel test robot <lkp@intel.com>
      Reviewed-by: Bjørn Mork <bjorn@mork.no>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      b2090592
    • btrfs: shrink delalloc pages instead of full inodes · 69837585
      Josef Bacik committed
      stable inclusion
      from stable-5.10.8
      commit e3b5252b5cdb4458527aa2356277700d21bf625f
      bugzilla: 47450
      
      --------------------------------
      
      [ Upstream commit e076ab2a ]
      
      Commit 38d715f4 ("btrfs: use btrfs_start_delalloc_roots in
      shrink_delalloc") cleaned up how we do delalloc shrinking by utilizing
      some infrastructure we have in place to flush inodes that we use for
      device replace and snapshot.  However this introduced a pretty serious
      performance regression.  To reproduce, the user untarred the source
      tarball of Firefox (360MiB xz compressed/1.5GiB uncompressed) and saw
      it take anywhere from 5 to 20 times as long to untar on 5.10 as on
      5.9. This was observed on fast devices (SSD and better) but not on
      HDD.
      
      The root cause is that we previously used the normal writeback path
      to reclaim delalloc space, providing it with the number of pages we
      wanted to flush.  The referenced commit changed this to flush that
      many inodes instead, which drastically increased the amount of space
      being flushed in certain cases and severely hurt performance.
      
      We cannot revert this patch unfortunately because of 3d45f221
      ("btrfs: fix deadlock when cloning inline extent and low on free
      metadata space") which requires the ability to skip flushing inodes that
      are being cloned in certain scenarios, which means we need to keep using
      our flushing infrastructure or risk re-introducing the deadlock.
      
      Instead to fix this problem we can go back to providing
      btrfs_start_delalloc_roots with a number of pages to flush, and then set
      up a writeback_control and utilize sync_inode() to handle the flushing
      for us.  This gives us the same behavior we had prior to the fix, while
      still allowing us to avoid the deadlock that was fixed by Filipe.  I
      redid the user's original test and got the following results on one of
      our test machines (256GiB of RAM, 56 cores, 2TiB Intel NVMe drive):
      
        5.9		0m54.258s
        5.10		1m26.212s
        5.10+patch	0m38.800s
      
      5.10+patch is significantly faster than plain 5.9 because of my patch
      series "Change data reservations to use the ticketing infra" which
      contained the patch that introduced the regression, but generally
      improved the overall ENOSPC flushing mechanisms.
      
      Additional testing on a consumer-grade SSD (8GiB RAM, 8 CPUs) confirms
      the results:
      
        5.10.5            4m00s
        5.10.5+patch      1m08s
        5.11-rc2          5m14s
        5.11-rc2+patch    1m30s
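      The difference between the two strategies can be sketched with a toy model (the functions and numbers are hypothetical, not btrfs code): flushing by inode writes back every dirty page of each chosen inode, while flushing by pages, as restored here, stops once the requested page budget is spent.

```c
#include <assert.h>

/* Toy contrast of the two reclaim strategies (names and numbers
 * hypothetical): page-granular flushing respects a budget, while
 * inode-granular flushing writes back whole inodes regardless. */
static long flush_by_pages(const long *dirty_pages, int n, long budget)
{
    long written = 0;
    for (int i = 0; i < n && written < budget; i++) {
        long take = dirty_pages[i];
        if (take > budget - written)
            take = budget - written; /* stop at the requested amount */
        written += take;
    }
    return written;
}

static long flush_by_inodes(const long *dirty_pages, int n, int nr_inodes)
{
    long written = 0;
    for (int i = 0; i < n && i < nr_inodes; i++)
        written += dirty_pages[i]; /* whole inode, ignoring any budget */
    return written;
}
```

With a budget of 64 pages, the first strategy writes 64 pages; picking just two inodes with the second can write thousands, which is the over-flushing the commit describes.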
      Reported-by: René Rebe <rene@exactcode.de>
      Fixes: 38d715f4 ("btrfs: use btrfs_start_delalloc_roots in shrink_delalloc")
      CC: stable@vger.kernel.org # 5.10
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Tested-by: David Sterba <dsterba@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      [ add my test results ]
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      69837585
    • btrfs: fix deadlock when cloning inline extent and low on free metadata space · 24648205
      Filipe Manana committed
      stable inclusion
      from stable-5.10.8
      commit 17243f73ad742363721e1288fb74e7b151c801f7
      bugzilla: 47450
      
      --------------------------------
      
      [ Upstream commit 3d45f221 ]
      
      When cloning an inline extent there are cases where we can not just copy
      the inline extent from the source range to the target range (e.g. when the
      target range starts at an offset greater than zero). In such cases we copy
      the inline extent's data into a page of the destination inode and then
      dirty that page. However, after that we will need to start a transaction
      for each processed extent and, if we are ever low on available metadata
      space, we may need to flush existing delalloc for all dirty inodes in an
      attempt to release metadata space - if that happens we may deadlock:
      
      * the async reclaim task queued a delalloc work to flush delalloc for
        the destination inode of the clone operation;
      
      * the task executing that delalloc work gets blocked waiting for the
        range with the dirty page to be unlocked, which is currently locked
        by the task doing the clone operation;
      
      * the async reclaim task blocks waiting for the delalloc work to complete;
      
      * the cloning task is waiting on the waitqueue of its reservation ticket
        while holding the range with the dirty page locked in the inode's
        io_tree;
      
      * if metadata space is not released by some other task (like delalloc for
        some other inode completing for example), the clone task waits forever
        and as a consequence the delalloc work and async reclaim tasks will hang
        forever as well. Releasing more space on the other hand may require
        starting a transaction, which will hang as well when trying to reserve
        metadata space, resulting in a deadlock between all these tasks.
      
      When this happens, traces like the following show up in dmesg/syslog:
      
        [87452.323003] INFO: task kworker/u16:11:1810830 blocked for more than 120 seconds.
        [87452.323644]       Tainted: G    B   W         5.10.0-rc4-btrfs-next-73 #1
        [87452.324248] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
        [87452.324852] task:kworker/u16:11  state:D stack:    0 pid:1810830 ppid:     2 flags:0x00004000
        [87452.325520] Workqueue: btrfs-flush_delalloc btrfs_work_helper [btrfs]
        [87452.326136] Call Trace:
        [87452.326737]  __schedule+0x5d1/0xcf0
        [87452.327390]  schedule+0x45/0xe0
        [87452.328174]  lock_extent_bits+0x1e6/0x2d0 [btrfs]
        [87452.328894]  ? finish_wait+0x90/0x90
        [87452.329474]  btrfs_invalidatepage+0x32c/0x390 [btrfs]
        [87452.330133]  ? __mod_memcg_state+0x8e/0x160
        [87452.330738]  __extent_writepage+0x2d4/0x400 [btrfs]
        [87452.331405]  extent_write_cache_pages+0x2b2/0x500 [btrfs]
        [87452.332007]  ? lock_release+0x20e/0x4c0
        [87452.332557]  ? trace_hardirqs_on+0x1b/0xf0
        [87452.333127]  extent_writepages+0x43/0x90 [btrfs]
        [87452.333653]  ? lock_acquire+0x1a3/0x490
        [87452.334177]  do_writepages+0x43/0xe0
        [87452.334699]  ? __filemap_fdatawrite_range+0xa4/0x100
        [87452.335720]  __filemap_fdatawrite_range+0xc5/0x100
        [87452.336500]  btrfs_run_delalloc_work+0x17/0x40 [btrfs]
        [87452.337216]  btrfs_work_helper+0xf1/0x600 [btrfs]
        [87452.337838]  process_one_work+0x24e/0x5e0
        [87452.338437]  worker_thread+0x50/0x3b0
        [87452.339137]  ? process_one_work+0x5e0/0x5e0
        [87452.339884]  kthread+0x153/0x170
        [87452.340507]  ? kthread_mod_delayed_work+0xc0/0xc0
        [87452.341153]  ret_from_fork+0x22/0x30
        [87452.341806] INFO: task kworker/u16:1:2426217 blocked for more than 120 seconds.
        [87452.342487]       Tainted: G    B   W         5.10.0-rc4-btrfs-next-73 #1
        [87452.343274] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
        [87452.344049] task:kworker/u16:1   state:D stack:    0 pid:2426217 ppid:     2 flags:0x00004000
        [87452.344974] Workqueue: events_unbound btrfs_async_reclaim_metadata_space [btrfs]
        [87452.345655] Call Trace:
        [87452.346305]  __schedule+0x5d1/0xcf0
        [87452.346947]  ? kvm_clock_read+0x14/0x30
        [87452.347676]  ? wait_for_completion+0x81/0x110
        [87452.348389]  schedule+0x45/0xe0
        [87452.349077]  schedule_timeout+0x30c/0x580
        [87452.349718]  ? _raw_spin_unlock_irqrestore+0x3c/0x60
        [87452.350340]  ? lock_acquire+0x1a3/0x490
        [87452.351006]  ? try_to_wake_up+0x7a/0xa20
        [87452.351541]  ? lock_release+0x20e/0x4c0
        [87452.352040]  ? lock_acquired+0x199/0x490
        [87452.352517]  ? wait_for_completion+0x81/0x110
        [87452.353000]  wait_for_completion+0xab/0x110
        [87452.353490]  start_delalloc_inodes+0x2af/0x390 [btrfs]
        [87452.353973]  btrfs_start_delalloc_roots+0x12d/0x250 [btrfs]
        [87452.354455]  flush_space+0x24f/0x660 [btrfs]
        [87452.355063]  btrfs_async_reclaim_metadata_space+0x1bb/0x480 [btrfs]
        [87452.355565]  process_one_work+0x24e/0x5e0
        [87452.356024]  worker_thread+0x20f/0x3b0
        [87452.356487]  ? process_one_work+0x5e0/0x5e0
        [87452.356973]  kthread+0x153/0x170
        [87452.357434]  ? kthread_mod_delayed_work+0xc0/0xc0
        [87452.357880]  ret_from_fork+0x22/0x30
        (...)
        < stack traces of several tasks waiting for the locks of the inodes of the
          clone operation >
        (...)
        [92867.444138] RSP: 002b:00007ffc3371bbe8 EFLAGS: 00000246 ORIG_RAX: 0000000000000052
        [92867.444624] RAX: ffffffffffffffda RBX: 00007ffc3371bea0 RCX: 00007f61efe73f97
        [92867.445116] RDX: 0000000000000000 RSI: 0000560fbd5d7a40 RDI: 0000560fbd5d8960
        [92867.445595] RBP: 00007ffc3371beb0 R08: 0000000000000001 R09: 0000000000000003
        [92867.446070] R10: 00007ffc3371b996 R11: 0000000000000246 R12: 0000000000000000
        [92867.446820] R13: 000000000000001f R14: 00007ffc3371bea0 R15: 00007ffc3371beb0
        [92867.447361] task:fsstress        state:D stack:    0 pid:2508238 ppid:2508153 flags:0x00004000
        [92867.447920] Call Trace:
        [92867.448435]  __schedule+0x5d1/0xcf0
        [92867.448934]  ? _raw_spin_unlock_irqrestore+0x3c/0x60
        [92867.449423]  schedule+0x45/0xe0
        [92867.449916]  __reserve_bytes+0x4a4/0xb10 [btrfs]
        [92867.450576]  ? finish_wait+0x90/0x90
        [92867.451202]  btrfs_reserve_metadata_bytes+0x29/0x190 [btrfs]
        [92867.451815]  btrfs_block_rsv_add+0x1f/0x50 [btrfs]
        [92867.452412]  start_transaction+0x2d1/0x760 [btrfs]
        [92867.453216]  clone_copy_inline_extent+0x333/0x490 [btrfs]
        [92867.453848]  ? lock_release+0x20e/0x4c0
        [92867.454539]  ? btrfs_search_slot+0x9a7/0xc30 [btrfs]
        [92867.455218]  btrfs_clone+0x569/0x7e0 [btrfs]
        [92867.455952]  btrfs_clone_files+0xf6/0x150 [btrfs]
        [92867.456588]  btrfs_remap_file_range+0x324/0x3d0 [btrfs]
        [92867.457213]  do_clone_file_range+0xd4/0x1f0
        [92867.457828]  vfs_clone_file_range+0x4d/0x230
        [92867.458355]  ? lock_release+0x20e/0x4c0
        [92867.458890]  ioctl_file_clone+0x8f/0xc0
        [92867.459377]  do_vfs_ioctl+0x342/0x750
        [92867.459913]  __x64_sys_ioctl+0x62/0xb0
        [92867.460377]  do_syscall_64+0x33/0x80
        [92867.460842]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
        (...)
        < stack traces of more tasks blocked on metadata reservation like the clone
          task above, because the async reclaim task has deadlocked >
        (...)
      
      Another thing to notice is that the worker task that is deadlocked when
      trying to flush the destination inode of the clone operation is at
      btrfs_invalidatepage(). This is simply because the clone operation has a
      destination offset greater than the i_size and we only update the i_size
      of the destination file after cloning an extent (just like we do in the
      buffered write path).
      
      Since the async reclaim path uses btrfs_start_delalloc_roots() to trigger
      the flushing of delalloc for all inodes that have delalloc, add a runtime
      flag to an inode to signal it should not be flushed, and for inodes with
      that flag set, start_delalloc_inodes() will simply skip them. When the
      cloning code needs to dirty a page to copy an inline extent, set that flag
      on the inode and then clear it when the clone operation finishes.
      
      This could be sporadically triggered with test case generic/269 from
      fstests, which exercises many fsstress processes running in parallel with
      several dd processes filling up the entire filesystem.
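      The skip logic described above can be sketched as a toy model (types and names hypothetical, not the btrfs structures): the cloning task sets a runtime flag on the destination inode before dirtying its page, and the delalloc walker skips any inode carrying the flag, breaking the wait cycle.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal sketch of the fix (types hypothetical): inodes marked as
 * mid-clone are skipped by the delalloc walker, since flushing them
 * would block on a range the cloning task holds locked. */
struct toy_inode {
    bool no_delalloc_flush; /* set while an inline extent is being cloned */
};

static int start_delalloc_inodes(struct toy_inode *inodes, int n)
{
    int started = 0;
    for (int i = 0; i < n; i++) {
        if (inodes[i].no_delalloc_flush)
            continue; /* flushing this inode now could deadlock */
        started++;    /* queue delalloc work for this inode */
    }
    return started;
}
```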
      
      CC: stable@vger.kernel.org # 5.9+
      Fixes: 05a5a762 ("Btrfs: implement full reflink support for inline extents")
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      24648205
    • btrfs: skip unnecessary searches for xattrs when logging an inode · 5ca00258
      Filipe Manana committed
      stable inclusion
      from stable-5.10.8
      commit 87738164592fdd531b068d069911aaa9f3c41c9d
      bugzilla: 47450
      
      --------------------------------
      
      [ Upstream commit f2f121ab ]
      
      Every time we log an inode we look up xattrs in the fs/subvol tree
      and, if there are any, log them into the log tree. However it is very
      common to have inodes without any xattrs, so the search wastes time
      and, more importantly, adds contention on the fs/subvol tree locks,
      either making the logging code block and wait for tree locks or
      making other concurrent operations block and wait.
      
      The most typical use cases where xattrs are used are when capabilities or
      ACLs are defined for an inode, or when SELinux is enabled.
      
      This change makes the logging code detect when an inode does not have
      xattrs and skip the xattrs search the next time the inode is logged,
      unless the inode is evicted and loaded again or an xattr is added to
      the inode. This skips the search for inodes that never have xattrs
      and are fsynced with some frequency.
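      The caching behaviour can be sketched as follows (the function is a hypothetical userspace model, not btrfs code): after one search finds no xattrs, a per-inode runtime flag suppresses the search on every later log of the same in-memory inode.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the optimisation (names hypothetical): count how many
 * fs/subvol tree searches n consecutive logs of one inode would do. */
static int tree_searches_for_n_logs(int n)
{
    bool no_xattrs = false; /* runtime flag on the in-memory inode */
    int searches = 0;

    for (int i = 0; i < n; i++) {
        if (no_xattrs)
            continue;      /* skip the fs/subvol tree search entirely */
        searches++;        /* expensive search under tree locks */
        no_xattrs = true;  /* nothing found: remember for next time */
    }
    return searches;
}
```

Eviction or adding an xattr would clear the flag, forcing a fresh search on the next log.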
      
      The following script that calls dbench was used to measure the impact of
        this change on a VM with 8 CPUs, 16GiB of RAM, using a raw NVMe device
      directly (no intermediary filesystem on the host) and using a non-debug
      kernel (default configuration on Debian distributions):
      
        $ cat test.sh
        #!/bin/bash
      
        DEV=/dev/sdk
        MNT=/mnt/sdk
        MOUNT_OPTIONS="-o ssd"
      
        mkfs.btrfs -f -m single -d single $DEV
        mount $MOUNT_OPTIONS $DEV $MNT
      
        dbench -D $MNT -t 200 40
      
        umount $MNT
      
      The results before this change:
      
       Operation      Count    AvgLat    MaxLat
       ----------------------------------------
       NTCreateX    5761605     0.172   312.057
       Close        4232452     0.002    10.927
       Rename        243937     1.406   277.344
       Unlink       1163456     0.631   298.402
       Deltree          160    11.581   221.107
       Mkdir             80     0.003     0.005
       Qpathinfo    5221410     0.065   122.309
       Qfileinfo     915432     0.001     3.333
       Qfsinfo       957555     0.003     3.992
       Sfileinfo     469244     0.023    20.494
       Find         2018865     0.448   123.659
       WriteX       2874851     0.049   118.529
       ReadX        9030579     0.004    21.654
       LockX          18754     0.003     4.423
       UnlockX        18754     0.002     0.331
       Flush         403792    10.944   359.494
      
      Throughput 908.444 MB/sec  40 clients  40 procs  max_latency=359.500 ms
      
      The results after this change:
      
       Operation      Count    AvgLat    MaxLat
       ----------------------------------------
       NTCreateX    6442521     0.159   230.693
       Close        4732357     0.002    10.972
       Rename        272809     1.293   227.398
       Unlink       1301059     0.563   218.500
       Deltree          160     7.796    54.887
       Mkdir             80     0.008     0.478
       Qpathinfo    5839452     0.047   124.330
       Qfileinfo    1023199     0.001     4.996
       Qfsinfo      1070760     0.003     5.709
       Sfileinfo     524790     0.033    21.765
       Find         2257658     0.314   125.611
       WriteX       3211520     0.040   232.135
       ReadX        10098969     0.004    25.340
       LockX          20974     0.003     1.569
       UnlockX        20974     0.002     3.475
       Flush         451553    10.287   331.037
      
      Throughput 1011.77 MB/sec  40 clients  40 procs  max_latency=331.045 ms
      
      +10.8% throughput, -8.2% max latency
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      5ca00258
    • scsi: ufs: Fix -Wsometimes-uninitialized warning · 78da0297
      Arnd Bergmann committed
      stable inclusion
      from stable-5.10.8
      commit e28ace868c1e945f8c61cee147168e26d6c9f2d6
      bugzilla: 47450
      
      --------------------------------
      
      [ Upstream commit 4c60244d ]
      
      clang complains about a possible code path in which a variable is used
      without an initialization:
      
      drivers/scsi/ufs/ufshcd.c:7690:3: error: variable 'sdp' is used uninitialized whenever 'if' condition is false [-Werror,-Wsometimes-uninitialized]
                      BUG_ON(1);
                      ^~~~~~~~~
      include/asm-generic/bug.h:63:36: note: expanded from macro 'BUG_ON'
       #define BUG_ON(condition) do { if (unlikely(condition)) BUG(); } while (0)
                                         ^~~~~~~~~~~~~~~~~~~
      
      Turn the BUG_ON(1) into an unconditional BUG() that makes it clear to clang
      that this code path is never hit.
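      A simplified userspace analogue of the warning's shape (not the kernel macros or the ufshcd code): with a conditional trap such as BUG_ON(1), the analyser assumes the branch may fall through with the variable still uninitialised; an unconditional noreturn call, like BUG() or abort(), makes the path obviously dead.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy illustration (hypothetical function): the default arm cannot fall
 * through because abort() is noreturn, so v is always initialised when
 * it is read. Replacing abort() with "if (1) abort();" would reintroduce
 * a path the analyser cannot prove dead. */
static int lookup(int key)
{
    int v;

    switch (key) {
    case 0:
        v = 10;
        break;
    case 1:
        v = 20;
        break;
    default:
        abort(); /* unconditional: v cannot be read uninitialised */
    }
    return v;
}
```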
      
      Link: https://lore.kernel.org/r/20201203223137.1205933-1-arnd@kernel.org
      Fixes: 4f3e900b ("scsi: ufs: Clear UAC for FFU and RPMB LUNs")
      Reviewed-by: Avri Altman <avri.altman@wdc.com>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      78da0297
    • io_uring: Fix return value from alloc_fixed_file_ref_node · 65a3dac2
      Matthew Wilcox (Oracle) committed
      stable inclusion
      from stable-5.10.8
      commit 458b40598dc0ccbbb1d3522f56a287ea0a127165
      bugzilla: 47450
      
      --------------------------------
      
      [ Upstream commit 3e2224c5 ]
      
      alloc_fixed_file_ref_node() currently returns an ERR_PTR on failure.
      io_sqe_files_unregister() expects it to return NULL and since it can only
      return -ENOMEM, it makes more sense to change alloc_fixed_file_ref_node()
      to behave that way.
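      The convention change can be sketched with toy types (hypothetical, not the io_uring structs): since allocation failure is the only error, returning NULL lets callers use a plain NULL check instead of decoding an ERR_PTR-style encoded pointer.

```c
#include <assert.h>
#include <stdlib.h>

/* Sketch of the fix (toy types): the allocator's only failure mode is
 * -ENOMEM, so NULL carries the same information as an encoded error
 * pointer while matching what the caller already checks for. */
struct ref_node {
    int refs;
};

static struct ref_node *alloc_ref_node(void)
{
    struct ref_node *node = malloc(sizeof(*node));

    if (!node)
        return NULL; /* was: ERR_PTR(-ENOMEM) */
    node->refs = 1;
    return node;
}
```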
      
      Fixes: 1ffc5422 ("io_uring: fix io_sqe_files_unregister() hangs")
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      65a3dac2
    • drm/panfrost: Don't corrupt the queue mutex on open/close · 94be15c2
      Steven Price committed
      stable inclusion
      from stable-5.10.8
      commit 51495b719515ddae417e4bafc7e100c34833af4b
      bugzilla: 47450
      
      --------------------------------
      
      [ Upstream commit a17d609e ]
      
      The mutex within the panfrost_queue_state should have the lifetime of
      the queue, however it was erroneously initialised/destroyed during
      panfrost_job_{open,close} which is called every time a client
      opens/closes the drm node.
      
      Move the initialisation/destruction to panfrost_job_{init,fini} where it
      belongs.
      
      Fixes: 1a11a88c ("drm/panfrost: Fix job timeout handling")
      Signed-off-by: Steven Price <steven.price@arm.com>
      Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
      Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20201029170047.30564-1-steven.price@arm.com
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      94be15c2
    • iommu/arm-smmu-qcom: Initialize SCTLR of the bypass context · 58871446
      Bjorn Andersson committed
      stable inclusion
      from stable-5.10.8
      commit 9d7751a39a19b0090300b2b0498e397f9047e125
      bugzilla: 47450
      
      --------------------------------
      
      [ Upstream commit aded8c7c ]
      
      On SM8150 it's occasionally observed that the boot hangs in between the
      writing of SMEs and context banks in arm_smmu_device_reset().
      
      The problem seems to coincide with a display refresh happening after
      updating the stream mapping, but before clearing - and thereby
      disabling translation in - the context bank picked to emulate
      translation bypass.
      
      Resolve this by explicitly disabling the bypass context already in
      cfg_probe.
      
      Fixes: f9081b8f ("iommu/arm-smmu-qcom: Implement S2CR quirk")
      Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
      Link: https://lore.kernel.org/r/20210106005038.4152731-1-bjorn.andersson@linaro.org
      Signed-off-by: Will Deacon <will@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      58871446
    • RDMA/hns: Avoid filling sl in high 3 bits of vlan_id · b971c668
      Weihang Li committed
      stable inclusion
      from stable-5.10.8
      commit 85bbe2e64ab430af3c27a0bc4e22dae04a5e10e6
      bugzilla: 47450
      
      --------------------------------
      
      [ Upstream commit 94a8c4df ]
      
      Only the low 12 bits of vlan_id are valid, and the service level has
      already been filled into the Address Vector, so there is no need to
      pack sl into vlan_id in the Address Vector.
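      The masking rule can be sketched in a couple of lines (the helper is hypothetical; the 12-bit VLAN ID width comes from 802.1Q): only the low 12 bits are kept, so nothing packed into the high bits survives.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the field rule (helper hypothetical): a VLAN ID is 12 bits,
 * so the service level must not be packed into bits 15:13 of the vlan
 * field once SL is already carried in the Address Vector. */
#define VLAN_VID_MASK 0x0fffu

static uint16_t av_vlan_id(uint16_t vlan_id)
{
    return vlan_id & VLAN_VID_MASK; /* keep only the valid low 12 bits */
}
```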
      
      Fixes: 7406c003 ("RDMA/hns: Only record vlan info for HIP08")
      Link: https://lore.kernel.org/r/1607650657-35992-5-git-send-email-liweihang@huawei.com
      Signed-off-by: Weihang Li <liweihang@huawei.com>
      Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      b971c668
    • io_uring: patch up IOPOLL overflow_flush sync · 52927550
      Pavel Begunkov committed
      stable inclusion
      from stable-5.10.8
      commit 85e25e2370a20352b72af34940fb32746a64fc28
      bugzilla: 47450
      
      --------------------------------
      
      commit 6c503150 upstream
      
      IOPOLL skips completion locking but keeps it under uring_lock, thus
      io_cqring_overflow_flush() and so io_cqring_events() need additional
      locking with uring_lock in some cases for IOPOLL.
      
      Remove __io_cqring_overflow_flush() from io_cqring_events(), introduce a
      wrapper around flush doing needed synchronisation and call it by hand.
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      52927550
    • io_uring: limit {io|sq}poll submit locking scope · 6e523e3c
      Pavel Begunkov committed
      stable inclusion
      from stable-5.10.8
      commit bc924dd21ecf8a8363091ef02fdac3115d024b91
      bugzilla: 47450
      
      --------------------------------
      
      commit 89448c47 upstream
      
      We don't need to take uring_lock for SQPOLL|IOPOLL to do
      io_cqring_overflow_flush() when cq_overflow_list is empty, so remove
      it from the hot path.
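      The hot-path shape can be sketched as a toy model (hypothetical function, not io_uring code): take the lock and flush only when the overflow list is non-empty, so the common empty case stays lock-free.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the change (names hypothetical): returns how many
 * lock-protected flushes a submit-path check performs. */
static int flushes_with_lock(bool overflow_list_empty)
{
    if (overflow_list_empty)
        return 0; /* hot path: nothing to flush, no lock taken */
    return 1;     /* take uring_lock, flush overflow, drop the lock */
}
```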
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      6e523e3c
    • io_uring: synchronise IOPOLL on task_submit fail · 4283556d
      Pavel Begunkov committed
      stable inclusion
      from stable-5.10.8
      commit 1d5e50da5cc7483849b815ee34559be4f3902a3b
      bugzilla: 47450
      
      --------------------------------
      
      commit 81b6d05c upstream
      
      io_req_task_submit() might be called for IOPOLL, do the fail path under
      uring_lock to comply with IOPOLL synchronisation based solely on it.
      
      Cc: stable@vger.kernel.org # 5.5+
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      4283556d
    • powerpc/32s: Fix RTAS machine check with VMAP stack · 3a92caae
      Christophe Leroy committed
      stable inclusion
      from stable-5.10.8
      commit bca9ca5a603f6c5586a7dfd35e06abe6d5fcd559
      bugzilla: 47450
      
      --------------------------------
      
      [ Upstream commit 98bf2d3f ]
      
      When we have VMAP stack, exception prolog 1 sets r1, not r11.
      
      When it is not an RTAS machine check, don't trash r1 because it is
      needed by prolog 1.
      
      Fixes: da7bb43a ("powerpc/32: Fix vmap stack - Properly set r1 before activating MMU")
      Fixes: d2e00603 ("powerpc/32: Use SPRN_SPRG_SCRATCH2 in exception prologs")
      Cc: stable@vger.kernel.org # v5.10+
      Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      [mpe: Squash in fixup for RTAS machine check from Christophe]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/bc77d61d1c18940e456a2dee464f1e2eda65a3f0.1608621048.git.christophe.leroy@csgroup.eu
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      3a92caae
    • ARM: 9031/1: hyp-stub: remove unused .L__boot_cpu_mode_offset symbol · 89842dc5
      Ard Biesheuvel committed
      mainline inclusion
      from mainline-5.11-rc1
      commit 6c7a6d22
      category: bugfix
      bugzilla: 46882
      CVE: NA
      
      --------------------------------
      
      Commit aaac3733 ("ARM: kvm: replace open coded VA->PA calculations
      with adr_l call") removed all uses of .L__boot_cpu_mode_offset, so there
      is no longer a need to define it.
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
      Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      89842dc5
    • ARM: kvm: replace open coded VA->PA calculations with adr_l call · f0514be8
      Ard Biesheuvel committed
      mainline inclusion
      from mainline-5.11-rc1
      commit aaac3733
      category: bugfix
      bugzilla: 46882
      CVE: NA
      
      -------------------------------------------------
      Replace the open coded calculations of the actual physical address
      of the KVM stub vector table with a single adr_l invocation.
      Reviewed-by: Nicolas Pitre <nico@fluxnic.net>
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
      (cherry picked from commit aaac3733)
      Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com>
      Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      f0514be8
    • ARM: head.S: use PC relative insn sequence to calculate PHYS_OFFSET · 622b2462
      Ard Biesheuvel committed
      mainline inclusion
      from mainline-5.11-rc1
      commit 3bcf906b
      category: bugfix
      bugzilla: 46882
      CVE: NA
      
      -------------------------------------------------
      Replace the open coded arithmetic with a simple adr_l/sub pair. This
      removes some open coded arithmetic involving virtual addresses, avoids
      literal pools on v7+, and slightly reduces the footprint of the code.
Reviewed-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
(cherry picked from commit 3bcf906b)
Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      622b2462
    • A
      ARM: sleep.S: use PC-relative insn sequence for sleep_save_sp/mpidr_hash · 4743e04d
Committed by Ard Biesheuvel
      mainline inclusion
      from mainline-5.11-rc1
      commit d74d2b22
      category: bugfix
      bugzilla: 46882
      CVE: NA
      
      -------------------------------------------------
      Replace the open coded PC relative offset calculations with adr_l and
      ldr_l invocations. This removes some open coded PC relative arithmetic,
      avoids literal pools on v7+, and slightly reduces the footprint of the
      code. Note that ALT_SMP() expects a single instruction so move the macro
      invocation after it.
Reviewed-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
(cherry picked from commit d74d2b22)
Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      4743e04d
    • A
      ARM: head: use PC-relative insn sequence for __smp_alt · b4428c3b
Committed by Ard Biesheuvel
      mainline inclusion
      from mainline-5.11-rc1
      commit 59d2f282
      category: bugfix
      bugzilla: 46882
      CVE: NA
      
      -------------------------------------------------
      Now that calling __do_fixup_smp_on_up() can be done without passing
      the physical-to-virtual offset in r3, we can replace the open coded
      PC relative offset calculations with a pair of adr_l invocations. This
      removes some open coded arithmetic involving virtual addresses, avoids
      literal pools on v7+, and slightly reduces the footprint of the code.
Reviewed-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
(cherry picked from commit 59d2f282)
Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      b4428c3b
    • A
      ARM: kernel: use relative references for UP/SMP alternatives · ec0036de
Committed by Ard Biesheuvel
      mainline inclusion
      from mainline-5.11-rc1
      commit 450abd38
      category: bugfix
      bugzilla: 46882
      CVE: NA
      
      -------------------------------------------------
      Currently, the .alt.smp.init section contains the virtual addresses
      of the patch sites. Since patching may occur both before and after
      switching into virtual mode, this requires some manual handling of
      the address when applying the UP alternative.
      
      Let's simplify this by using relative offsets in the table entries:
      this allows us to simply add each entry's address to its contents,
      regardless of whether we are running in virtual mode or not.
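The addressing trick described above can be sketched in C. This is a hypothetical model of a relative table entry, not the kernel's actual .alt.smp.init layout:

```c
#include <stdint.h>

/* A relative entry stores the signed offset from the entry itself to
 * the patch site. Recovering the site is just "entry address plus
 * entry contents", which works the same before and after the switch
 * into virtual mode, because both addresses move together. */
static uint32_t *patch_site(const int32_t *entry)
{
    return (uint32_t *)((uintptr_t)entry + *entry);
}
```

Because the offset is relative, relocating the whole image shifts the entry and the site by the same amount, so the sum stays correct in either addressing mode.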
Reviewed-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
(cherry picked from commit 450abd38)
Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      ec0036de
    • A
      ARM: head.S: use PC-relative insn sequence for secondary_data · 88142794
Committed by Ard Biesheuvel
      mainline inclusion
      from mainline-5.11-rc1
      commit 91580f0d
      category: bugfix
      bugzilla: 46882
      CVE: NA
      
      -------------------------------------------------
      Replace the open coded PC relative offset calculations with adr_l
      and ldr_l invocations. This removes some open coded arithmetic
      involving virtual addresses, avoids literal pools on v7+, and slightly
      reduces the footprint of the code.
      
      Note that it also removes a stale comment about the contents of r6.
Reviewed-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
(cherry picked from commit 91580f0d)
Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      88142794
    • A
      ARM: head-common.S: use PC-relative insn sequence for idmap creation · d2cfd32f
Committed by Ard Biesheuvel
      mainline inclusion
      from mainline-5.11-rc1
      commit 172c34c9
      category: bugfix
      bugzilla: 46882
      CVE: NA
      
      -------------------------------------------------
      Replace the open coded PC relative offset calculations involving
      __turn_mmu_on and __turn_mmu_on_end with a pair of adr_l invocations.
      This removes some open coded arithmetic involving virtual addresses,
      avoids literal pools on v7+, and slightly reduces the footprint of the
      code.
Reviewed-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
(cherry picked from commit 172c34c9)
Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      d2cfd32f
    • A
      ARM: head-common.S: use PC-relative insn sequence for __proc_info · 05f583e8
Committed by Ard Biesheuvel
      mainline inclusion
      from mainline-5.11-rc1
      commit 62c4a2e2
      category: bugfix
      bugzilla: 46882
      CVE: NA
      
      -------------------------------------------------
      Replace the open coded PC relative offset calculations with a pair of
      adr_l invocations. This removes some open coded arithmetic involving
      virtual addresses, avoids literal pools on v7+, and slightly reduces
      the footprint of the code.
Reviewed-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
(cherry picked from commit 62c4a2e2)
Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      05f583e8
    • A
      ARM: efistub: replace adrl pseudo-op with adr_l macro invocation · 5e9de2ca
Committed by Ard Biesheuvel
      mainline inclusion
      from mainline-5.11-rc1
      commit 67e3f828
      category: bugfix
      bugzilla: 46882
      CVE: NA
      
      -------------------------------------------------
      The ARM 'adrl' pseudo instruction is a bit problematic, as it does not
      exist in Thumb mode, and it is not implemented by Clang either. Since
      the Thumb variant has a slightly bigger range, it is sometimes necessary
      to emit the 'adrl' variant in ARM mode where Thumb mode can use adr just
fine. However, that still leaves the Clang issue, as Clang does not
appear likely to support this pseudo-instruction any time soon.
      
      So let's switch to the adr_l macro, which works for both ARM and Thumb,
      and has unlimited range.
Reviewed-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
(cherry picked from commit 67e3f828)
Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      5e9de2ca
    • A
      ARM: p2v: reduce p2v alignment requirement to 2 MiB · b9012d8b
Committed by Ard Biesheuvel
      mainline inclusion
      from mainline-5.11-rc1
      commit 9443076e
      category: bugfix
      bugzilla: 46882
      CVE: NA
      
      -------------------------------------------------
      The ARM kernel's linear map starts at PAGE_OFFSET, which maps to a
      physical address (PHYS_OFFSET) that is platform specific, and is
      discovered at boot. Since we don't want to slow down translations
      between physical and virtual addresses by keeping the offset in a
      variable in memory, we implement this by patching the code performing
      the translation, and putting the offset between PAGE_OFFSET and the
      start of physical RAM directly into the instruction opcodes.
      
      As we only patch up to 8 bits of offset, yielding 4 GiB >> 8 == 16 MiB
      of granularity, we have to round up PHYS_OFFSET to the next multiple if
      the start of physical RAM is not a multiple of 16 MiB. This wastes some
      physical RAM, since the memory that was skipped will now live below
      PAGE_OFFSET, making it inaccessible to the kernel.
      
      We can improve this by changing the patchable sequences and the patching
      logic to carry more bits of offset: 11 bits gives us 4 GiB >> 11 == 2 MiB
      of granularity, and so we will never waste more than that amount by
      rounding up the physical start of DRAM to the next multiple of 2 MiB.
      (Note that 2 MiB granularity guarantees that the linear mapping can be
      created efficiently, whereas less than 2 MiB may result in the linear
mapping needing another level of page tables.)
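The granularity figures quoted above follow from simple arithmetic; a quick check in plain C (illustration only, not kernel code):

```c
/* 2^bits distinct patchable offset values over a 4 GiB address space
 * give an alignment granularity of (4 GiB >> bits). */
static unsigned long long p2v_granularity(unsigned int bits)
{
    return (4ULL << 30) >> bits;
}
```

With 8 patched bits this yields 16 MiB, and with 11 bits it drops to 2 MiB, matching the figures in the text.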
      
      This helps Zhen Lei's scenario, where the start of DRAM is known to be
      occupied. It also helps EFI boot, which relies on the firmware's page
      allocator to allocate space for the decompressed kernel as low as
      possible. And if the KASLR patches ever land for 32-bit, it will give
      us 3 more bits of randomization of the placement of the kernel inside
      the linear region.
      
      For the ARM code path, it simply comes down to using two add/sub
      instructions instead of one for the carryless version, and patching
      each of them with the correct immediate depending on the rotation
      field. For the LPAE calculation, which has to deal with a carry, it
patches the MOVW instruction with up to 12 bits of offset (but we only
need 11 bits anyway).
      
      For the Thumb2 code path, patching more than 11 bits of displacement
      would be somewhat cumbersome, but the 11 bits we need fit nicely into
      the second word of the u16[2] opcode, so we simply update the immediate
      assignment and the left shift to create an addend of the right magnitude.
Suggested-by: Zhen Lei <thunder.leizhen@huawei.com>
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
(cherry picked from commit 9443076e)
Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      b9012d8b
    • A
      ARM: p2v: switch to MOVW for Thumb2 and ARM/LPAE · 5247796f
Committed by Ard Biesheuvel
      mainline inclusion
      from mainline-5.11-rc1
      commit e8e00f5a
      category: bugfix
      bugzilla: 46882
      CVE: NA
      
      -------------------------------------------------
      In preparation for reducing the phys-to-virt minimum relative alignment
      from 16 MiB to 2 MiB, switch to patchable sequences involving MOVW
      instructions that can more easily be manipulated to carry a 12-bit
      immediate. Note that the non-LPAE ARM sequence is not updated: MOVW
      may not be supported on non-LPAE platforms, and the sequence itself
      can be updated more easily to apply the 12 bits of displacement.
      
      For Thumb2, which has many more versions of opcodes, switch to a sequence
      that can be patched by the same patching code for both versions. Note
      that the Thumb2 opcodes for MOVW and MVN are unambiguous, and have no
      rotation bits in their immediate fields, so there is no need to use
      placeholder constants in the asm blocks.
      
      While at it, drop the 'volatile' qualifiers from the asm blocks: the
      code does not have any side effects that are invisible to the compiler,
      so it is free to omit these sequences if the outputs are not used.
Suggested-by: Russell King <linux@armlinux.org.uk>
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
(cherry picked from commit e8e00f5a)
Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      5247796f
    • A
      ARM: p2v: simplify __fixup_pv_table() · 9054cf12
Committed by Ard Biesheuvel
      mainline inclusion
      from mainline-5.11-rc1
      commit 0e3db6c9
      category: bugfix
      bugzilla: 46882
      CVE: NA
      
      -------------------------------------------------
      Declutter the code in __fixup_pv_table() by using the new adr_l/str_l
      macros to take PC relative references to external symbols, and by
      using the value of PHYS_OFFSET passed in r8 to calculate the p2v
      offset.
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
(cherry picked from commit 0e3db6c9)
Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      9054cf12
    • A
      ARM: p2v: use relative references in patch site arrays · 971f3cf8
Committed by Ard Biesheuvel
      mainline inclusion
      from mainline-5.11-rc1
      commit 2730e8ea
      category: bugfix
      bugzilla: 46882
      CVE: NA
      
      -------------------------------------------------
      Free up a register in the p2v patching code by switching to relative
      references, which don't require keeping the phys-to-virt displacement
      live in a register.
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
(cherry picked from commit 2730e8ea)
Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      971f3cf8
    • A
      ARM: p2v: drop redundant 'type' argument from __pv_stub · 56050b42
Committed by Ard Biesheuvel
      mainline inclusion
      from mainline-5.11-rc1
      commit 0869f3b9
      category: bugfix
      bugzilla: 46882
      CVE: NA
      
      -------------------------------------------------
      We always pass the same value for 'type' so pull it into the __pv_stub
      macro itself.
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
(cherry picked from commit 0869f3b9)
Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      56050b42
    • A
      ARM: p2v: factor out BE8 handling · 2d1d9a64
Committed by Ard Biesheuvel
      mainline inclusion
      from mainline-5.11-rc1
      commit 7a94849e
      category: bugfix
      bugzilla: 46882
      CVE: NA
      
      -------------------------------------------------
      The big and little endian versions of the ARM p2v patching routine only
      differ in the values of the constants, so factor those out into macros
      so that we only have one version of the logic sequence to maintain.
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
(cherry picked from commit 7a94849e)
Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      2d1d9a64
    • A
      ARM: p2v: factor out shared loop processing · c103fe82
Committed by Ard Biesheuvel
      mainline inclusion
      from mainline-5.11-rc1
      commit 4b16421c
      category: bugfix
      bugzilla: 46882
      CVE: NA
      
      -------------------------------------------------
      The ARM and Thumb2 versions of the p2v patching loop have some overlap
      at the end of the loop, so factor that out. As numeric labels are not
      required to be unique, and may therefore be ambiguous, use named local
      labels for the start and end of the loop instead.
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
(cherry picked from commit 4b16421c)
Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      c103fe82
    • A
      ARM: p2v: move patching code to separate assembler source file · 9930ac07
Committed by Ard Biesheuvel
      mainline inclusion
      from mainline-5.11-rc1
      commit eae78e1a
      category: bugfix
      bugzilla: 46882
      CVE: NA
      
      -------------------------------------------------
      Move the phys2virt patching code into a separate .S file before doing
      some work on it.
Suggested-by: Nicolas Pitre <nico@fluxnic.net>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
(cherry picked from commit eae78e1a)
Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      9930ac07
    • A
      ARM: module: add support for place relative relocations · 4084b9ae
Committed by Ard Biesheuvel
      mainline inclusion
      from mainline-5.11-rc1
      commit 22f2d230
      category: bugfix
      bugzilla: 46882
      CVE: NA
      
      -------------------------------------------------
      When using the new adr_l/ldr_l/str_l macros to refer to external symbols
      from modules, the linker may emit place relative ELF relocations that
      need to be fixed up by the module loader. So add support for these.
Reviewed-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
(cherry picked from commit 22f2d230)
Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      4084b9ae
    • A
      ARM: assembler: introduce adr_l, ldr_l and str_l macros · f49945c1
Committed by Ard Biesheuvel
      mainline inclusion
      from mainline-5.11-rc1
      commit 0b167463
      category: bugfix
      bugzilla: 46882
      CVE: NA
      
      -------------------------------------------------
      Like arm64, ARM supports position independent code sequences that
      produce symbol references with a greater reach than the ordinary
      adr/ldr instructions. Since on ARM, the adrl pseudo-instruction is
only supported in ARM mode (and not at all when using Clang), having
an adr_l macro like we do on arm64 is useful, and increases symmetry
      as well.
      
      Currently, we use open coded instruction sequences involving literals
      and arithmetic operations. Instead, we can use movw/movt pairs on v7
      CPUs, circumventing the D-cache entirely.
      
      E.g., on v7+ CPUs, we can emit a PC-relative reference as follows:
      
             movw         <reg>, #:lower16:<sym> - (1f + 8)
             movt         <reg>, #:upper16:<sym> - (1f + 8)
        1:   add          <reg>, <reg>, pc
      
      For older CPUs, we can emit the literal into a subsection, allowing it
      to be emitted out of line while retaining the ability to perform
      arithmetic on label offsets.
      
      E.g., on pre-v7 CPUs, we can emit a PC-relative reference as follows:
      
             ldr          <reg>, 2f
        1:   add          <reg>, <reg>, pc
             .subsection  1
        2:   .long        <sym> - (1b + 8)
             .previous
      
      This is allowed by the assembler because, unlike ordinary sections,
      subsections are combined into a single section in the object file, and
      so the label references are not true cross-section references that are
      visible as relocations. (Subsections have been available in binutils
      since 2004 at least, so they should not cause any issues with older
      toolchains.)
      
      So use the above to implement the macros mov_l, adr_l, ldr_l and str_l,
      all of which will use movw/movt pairs on v7 and later CPUs, and use
      PC-relative literals otherwise.
Reviewed-by: Nicolas Pitre <nico@fluxnic.net>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
(cherry picked from commit 0b167463)
Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      f49945c1
    • D
      scsi: target: Fix XCOPY NAA identifier lookup · 2dc991b9
Committed by David Disseldorp
      stable inclusion
      from stable-5.10.7
      commit 6f1e88527c1869de08632efa2cc796e0131850dc
      bugzilla: 47429
      
      --------------------------------
      
      commit 2896c938 upstream.
      
      When attempting to match EXTENDED COPY CSCD descriptors with corresponding
      se_devices, target_xcopy_locate_se_dev_e4() currently iterates over LIO's
      global devices list which includes all configured backstores.
      
      This change ensures that only initiator-accessible backstores are
      considered during CSCD descriptor lookup, according to the session's
      se_node_acl LUN list.
      
      To avoid LUN removal race conditions, device pinning is changed from being
      configfs based to instead using the se_node_acl lun_ref.
      
      Reference: CVE-2020-28374
      Fixes: cbf031f4 ("target: Add support for EXTENDED_COPY copy offload emulation")
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: David Disseldorp <ddiss@suse.de>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      2dc991b9
    • P
      rtlwifi: rise completion at the last step of firmware callback · c3b744f9
Committed by Ping-Ke Shih
      stable inclusion
      from stable-5.10.7
      commit 513729aecb53cdd0ba4e5e5aebc8b2fddcb0131e
      bugzilla: 47429
      
      --------------------------------
      
      commit 4dfde294 upstream.
      
request_firmware_nowait(), which schedules another work item, is used to
load firmware while the USB device is probing. If the device is unplugged
before the firmware work runs, the disconnect ops run first and a
use-after-free follows. Though we wait for completion of the firmware
work before freeing the hw, the firmware callback raises the completion
too early, so move it to the last step.
      
      usb 5-1: Direct firmware load for rtlwifi/rtl8192cufw.bin failed with error -2
      rtlwifi: Loading alternative firmware rtlwifi/rtl8192cufw.bin
      rtlwifi: Selected firmware is not available
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      
      ==================================================================
      BUG: KASAN: use-after-free in rtl_fw_do_work.cold+0x68/0x6a drivers/net/wireless/realtek/rtlwifi/core.c:93
      Write of size 4 at addr ffff8881454cff50 by task kworker/0:6/7379
      
      CPU: 0 PID: 7379 Comm: kworker/0:6 Not tainted 5.10.0-rc7-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Workqueue: events request_firmware_work_func
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x107/0x163 lib/dump_stack.c:118
       print_address_description.constprop.0.cold+0xae/0x4c8 mm/kasan/report.c:385
       __kasan_report mm/kasan/report.c:545 [inline]
       kasan_report.cold+0x1f/0x37 mm/kasan/report.c:562
       rtl_fw_do_work.cold+0x68/0x6a drivers/net/wireless/realtek/rtlwifi/core.c:93
       request_firmware_work_func+0x12c/0x230 drivers/base/firmware_loader/main.c:1079
       process_one_work+0x933/0x1520 kernel/workqueue.c:2272
       worker_thread+0x64c/0x1120 kernel/workqueue.c:2418
       kthread+0x38c/0x460 kernel/kthread.c:292
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:296
      
      The buggy address belongs to the page:
      page:00000000f54435b3 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1454cf
      flags: 0x200000000000000()
      raw: 0200000000000000 0000000000000000 ffffea00051533c8 0000000000000000
      raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff8881454cfe00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
       ffff8881454cfe80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
      >ffff8881454cff00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
                                                       ^
       ffff8881454cff80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
       ffff8881454d0000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
      
      Reported-by: syzbot+65be4277f3c489293939@syzkaller.appspotmail.com
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20201214053106.7748-1-pkshih@realtek.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
      c3b744f9
    • M
      xsk: Fix memory leak for failed bind · 2fa8ad24
Committed by Magnus Karlsson
      stable inclusion
      from stable-5.10.7
      commit 0fae7d269ef7343e052bb66d4f79022e4456fe82
      bugzilla: 47429
      
      --------------------------------
      
      commit 8bee6833 upstream.
      
      Fix a possible memory leak when a bind of an AF_XDP socket fails. When
      the fill and completion rings are created, they are tied to the
      socket. But when the buffer pool is later created at bind time, the
      ownership of these two rings are transferred to the buffer pool as
      they might be shared between sockets (and the buffer pool cannot be
      created until we know what we are binding to). So, before the buffer
      pool is created, these two rings are cleaned up with the socket, and
      after they have been transferred they are cleaned up together with
      the buffer pool.
      
The problem is that ownership was transferred before it was absolutely
certain that the buffer pool could be created and initialized
correctly, so when one of these errors occurred, the fill and
completion rings belonged neither to the socket nor to the pool and
were therefore leaked. Solve this by moving the ownership transfer
to the point where the buffer pool has been completely set up and
there is no way it can fail.
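A hypothetical model of the ordering fix (simplified structures, not the real xsk types): ownership of the rings moves to the pool only once every fallible step has succeeded, so on failure the rings stay reachable, and freeable, through the socket.

```c
#include <stddef.h>

struct ring { int id; };
struct xsk_sock_model { struct ring *fq, *cq; };
struct xsk_pool_model { struct ring *fq, *cq; };

/* setup_ok models "all fallible pool initialization succeeded". */
static int pool_bind(struct xsk_sock_model *xs, struct xsk_pool_model *pool,
                     int setup_ok)
{
    pool->fq = pool->cq = NULL;
    if (!setup_ok)
        return -1;              /* rings still belong to the socket */
    /* nothing can fail past this point: transfer ownership now */
    pool->fq = xs->fq;  xs->fq = NULL;
    pool->cq = xs->cq;  xs->cq = NULL;
    return 0;
}
```

On the failure path the socket's teardown frees the rings as before; only a fully constructed pool ever takes them over.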
      
      Fixes: 7361f9c3 ("xsk: Move fill and completion rings to buffer pool")
      Reported-by: syzbot+cfa88ddd0655afa88763@syzkaller.appspotmail.com
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Link: https://lore.kernel.org/bpf/20201214085127.3960-1-magnus.karlsson@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      2fa8ad24
    • P
      KVM: x86: fix shift out of bounds reported by UBSAN · 2ff92444
Committed by Paolo Bonzini
      stable inclusion
      from stable-5.10.7
      commit 563135ec664ffb80a2297e94d618b04b228a1262
      bugzilla: 47429
      
      --------------------------------
      
      commit 2f80d502 upstream.
      
      Since we know that e >= s, we can reassociate the left shift,
      changing the shifted number from 1 to 2 in exchange for
      decreasing the right hand side by 1.
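The reassociation can be illustrated with a mask-building helper (a hypothetical stand-in for the kernel's expression, not its exact code): for a bit range [s, e] with e >= s, shifting 2 by (e - s) keeps the shift count below 64 even when the range spans all 64 bits, whereas shifting 1 by (e - s + 1) is undefined behavior for a full-width range.

```c
#include <stdint.h>

/* Build a mask covering bits s..e inclusive, assuming e >= s.
 * (2 << (e - s)) equals (1 << (e - s + 1)) mathematically, but the
 * former never shifts by 64, which UBSAN would flag as out of bounds. */
static uint64_t range_mask(unsigned int s, unsigned int e)
{
    return ((2ULL << (e - s)) - 1) << s;
}
```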
      
      Reported-by: syzbot+e87846c48bf72bc85311@syzkaller.appspotmail.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      2ff92444
    • Y
      x86/mtrr: Correct the range check before performing MTRR type lookups · 87fb483d
Committed by Ying-Tsun Huang
      stable inclusion
      from stable-5.10.7
      commit 02ccda90ef7e23a225b68789bce9e8353f9caa1f
      bugzilla: 47429
      
      --------------------------------
      
      commit cb7f4a8b upstream.
      
      In mtrr_type_lookup(), if the input memory address region is not in the
      MTRR, over 4GB, and not over the top of memory, a write-back attribute
      is returned. These condition checks are for ensuring the input memory
      address region is actually mapped to the physical memory.
      
However, if the end address is exactly aligned with the top of memory,
the condition check treats the address as over the top of memory, and
the write-back attribute is not returned.
      
This hits a real use case with NVDIMM: the nd_pmem module tries to map
NVDIMMs as cacheable memory when NVDIMMs are connected. If an NVDIMM is
the last of the DIMMs, its performance becomes very low, since it is
aligned with the top of memory and its memory type is uncached-minus.
      
      Move the input end address change to inclusive up into
      mtrr_type_lookup(), before checking for the top of memory in either
      mtrr_type_lookup_{variable,fixed}() helpers.
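The effect of the inclusive-end conversion can be sketched as follows. This is a simplified model of the comparison, not the actual mtrr_type_lookup() logic:

```c
#include <stdint.h>

/* With an exclusive end (start + size), a region whose end is exactly
 * aligned with the top of memory compares as ">= top" and is wrongly
 * treated as past the top; converting the end to inclusive first
 * (start + size - 1) makes the comparison come out right. */
static int over_top(uint64_t end, uint64_t top)
{
    return end >= top;
}
```

For a 4 KiB region ending exactly at a 4 GiB top of memory, the exclusive end trips the check while the inclusive end does not.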
      
       [ bp: Massage commit message. ]
      
      Fixes: 0cc705f5 ("x86/mm/mtrr: Clean up mtrr_type_lookup()")
Signed-off-by: Ying-Tsun Huang <ying-tsun.huang@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20201215070721.4349-1-ying-tsun.huang@amd.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      87fb483d
    • D
      dmaengine: idxd: off by one in cleanup code · 06164803
Committed by Dan Carpenter
      stable inclusion
      from stable-5.10.7
      commit 6e3c67976eda30959833d852bc13c7d0a342cfa9
      bugzilla: 47429
      
      --------------------------------
      
      commit ff58f7dd upstream.
      
The cleanup is off by one: it starts at "i" when it should start at
"i - 1", and as written it never unregisters the zeroth element in the
array.
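The pattern the fix restores is the usual unwind loop; a hypothetical sketch (fake_reg/fake_unreg are illustration-only helpers, not idxd functions):

```c
static int regs_done, unregs_done, fail_at;

/* Fails once regs_done reaches fail_at, modelling a registration error. */
static int fake_reg(int obj)
{
    (void)obj;
    if (regs_done == fail_at)
        return -1;
    regs_done++;
    return 0;
}

static void fake_unreg(int obj) { (void)obj; unregs_done++; }

/* On failure at element i, elements 0 .. i-1 were registered, so the
 * unwind must start at i - 1 and run down through element 0. */
static int register_all(const int *objs, int n)
{
    int i;

    for (i = 0; i < n; i++)
        if (fake_reg(objs[i]) < 0)
            goto err;
    return 0;
err:
    while (--i >= 0)
        fake_unreg(objs[i]);
    return -1;
}
```

Starting the unwind at "i" instead would touch the element that was never registered and skip element 0, which is exactly the off-by-one described above.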
      
      Fixes: c52ca478 ("dmaengine: idxd: add configuration component of driver")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/X9nFeojulsNqUSnG@mwanda
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      06164803
    • P
      netfilter: nft_dynset: report EOPNOTSUPP on missing set feature · 44a4771c
Committed by Pablo Neira Ayuso
      stable inclusion
      from stable-5.10.7
      commit 8b109f4cd1dc2224f900702483be81d61beab864
      bugzilla: 47429
      
      --------------------------------
      
      commit 95cd4bca upstream.
      
If userspace requests a feature which is not available in the original
set definition, then bail out with EOPNOTSUPP. If userspace sends
unsupported dynset flags (a new feature not supported by this kernel),
then report EOPNOTSUPP to userspace. EINVAL should only be used to
report malformed netlink messages from userspace.
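The errno policy described above amounts to a simple flag check; a hypothetical sketch (the helper name and flag values are illustration-only, not the nft_dynset code):

```c
#include <errno.h>
#include <stdint.h>

/* Unknown feature bits from userspace mean "this kernel does not
 * support the feature", so the answer is EOPNOTSUPP; EINVAL stays
 * reserved for genuinely malformed messages. */
static int dynset_check_flags(uint32_t requested, uint32_t supported)
{
    if (requested & ~supported)
        return -EOPNOTSUPP;
    return 0;
}
```

A newer userspace talking to an older kernel then gets a clear "not supported" answer instead of a misleading "invalid message".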
      
      Fixes: 22fe54d5 ("netfilter: nf_tables: add support for dynamic set updates")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      44a4771c