1. 27 January 2021, 40 commits
    • net: cdc_ncm: correct overhead in delayed_ndp_size · b2090592
      Jouni K. Seppänen committed
      stable inclusion
      from stable-5.10.8
      commit b044a949a5c5ddbe61a806bba44aab6148a6f356
      bugzilla: 47450
      
      --------------------------------
      
      [ Upstream commit 7a68d725 ]
      
      Aligning to tx_ndp_modulus is not sufficient because the next align
      call can be cdc_ncm_align_tail, which can add up to ctx->tx_modulus +
      ctx->tx_remainder - 1 bytes. This used to lead to occasional crashes
      on a Huawei 909s-120 LTE module as follows:
      
      - the condition marked /* if there is a remaining skb [...] */ is true
        so the swaps happen
      - skb_out is set from ctx->tx_curr_skb
      - skb_out->len is exactly 0x3f52
      - ctx->tx_curr_size is 0x4000 and delayed_ndp_size is 0xac
        (note that the sum of skb_out->len and delayed_ndp_size is 0x3ffe)
      - the for loop over n is executed once
      - the cdc_ncm_align_tail call marked /* align beginning of next frame */
        increases skb_out->len to 0x3f56 (the sum is now 0x4002)
      - the condition marked /* check if we had enough room left [...] */ is
        false so we break out of the loop
      - the condition marked /* If requested, put NDP at end of frame. */ is
        true so the NDP is written into skb_out
      - now skb_out->len is 0x4002, so padding_count is minus two interpreted
        as an unsigned number, which is used as the length argument to memset,
        leading to a crash with various symptoms but usually including
      
      > Call Trace:
      >  <IRQ>
      >  cdc_ncm_fill_tx_frame+0x83a/0x970 [cdc_ncm]
      >  cdc_mbim_tx_fixup+0x1d9/0x240 [cdc_mbim]
      >  usbnet_start_xmit+0x5d/0x720 [usbnet]
      
      The cdc_ncm_align_tail call first aligns on a ctx->tx_modulus
      boundary (adding at most ctx->tx_modulus-1 bytes), then adds
      ctx->tx_remainder bytes. Alternatively, the next alignment call can
      occur in cdc_ncm_ndp16 or cdc_ncm_ndp32, in which case at most
      ctx->tx_ndp_modulus-1 bytes are added.
      
      A similar problem has occurred before, and the code is nontrivial to
      reason about, so add a guard before the crashing call. By that time it
      is too late to prevent any memory corruption (we'll have written past
      the end of the buffer already) but we can at least try to get a warning
      written into an on-disk log by avoiding the hard crash caused by padding
      past the buffer with a huge number of zeros.
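      The arithmetic above can be modelled in a few lines of userspace C (all names and the guard's shape are hypothetical, not the driver's actual code): once skb_out->len exceeds ctx->tx_curr_size, the unsigned subtraction that computes the padding length wraps to a huge value, and the guard refuses to pad in that case.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the bug described above (names hypothetical): skb_len can
 * end up larger than tx_curr_size after the NDP is appended, so the
 * unsigned subtraction wraps to a huge value that would be handed to
 * memset() as a length. The guard refuses to pad in that case. */
static size_t padding_count(size_t tx_curr_size, size_t skb_len)
{
    if (skb_len > tx_curr_size)
        return 0; /* too late to undo the overwrite, but avoid the crash */
    return tx_curr_size - skb_len;
}
```

With the commit's values, 0x4000 - 0x4002 interpreted as unsigned is (size_t)-2, which is exactly the "minus two interpreted as an unsigned number" in the description.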
      Signed-off-by: Jouni K. Seppänen <jks@iki.fi>
      Fixes: 4a0e3e98 ("cdc_ncm: Add support for moving NDP to end of NCM frame")
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=209407
      Reported-by: kernel test robot <lkp@intel.com>
      Reviewed-by: Bjørn Mork <bjorn@mork.no>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      b2090592
    • btrfs: shrink delalloc pages instead of full inodes · 69837585
      Josef Bacik committed
      stable inclusion
      from stable-5.10.8
      commit e3b5252b5cdb4458527aa2356277700d21bf625f
      bugzilla: 47450
      
      --------------------------------
      
      [ Upstream commit e076ab2a ]
      
      Commit 38d715f4 ("btrfs: use btrfs_start_delalloc_roots in
      shrink_delalloc") cleaned up how we do delalloc shrinking by utilizing
      some infrastructure we have in place to flush inodes that we use for
      device replace and snapshot.  However this introduced a pretty serious
      performance regression.  To reproduce, the user untarred the source
      tarball of Firefox (360MiB xz compressed/1.5GiB uncompressed) and saw
      it take anywhere from 5 to 20 times as long to untar on 5.10 as on
      5.9. This was observed on fast devices (SSD and better) but not on
      HDD.
      
      The root cause is that we previously used the normal writeback path
      to reclaim delalloc space, providing it with the number of pages we
      wanted to flush.  The referenced commit changed this to flush that
      many inodes instead, which drastically increased the amount of space
      being flushed in certain cases and severely hurt performance.
      
      We cannot revert this patch unfortunately because of 3d45f221
      ("btrfs: fix deadlock when cloning inline extent and low on free
      metadata space") which requires the ability to skip flushing inodes that
      are being cloned in certain scenarios, which means we need to keep using
      our flushing infrastructure or risk re-introducing the deadlock.
      
      Instead to fix this problem we can go back to providing
      btrfs_start_delalloc_roots with a number of pages to flush, and then set
      up a writeback_control and utilize sync_inode() to handle the flushing
      for us.  This gives us the same behavior we had prior to the fix, while
      still allowing us to avoid the deadlock that was fixed by Filipe.  I
      redid the user's original test and got the following results on one of
      our test machines (256GiB of RAM, 56 cores, 2TiB Intel NVMe drive):
      
        5.9		0m54.258s
        5.10		1m26.212s
        5.10+patch	0m38.800s
      
      5.10+patch is significantly faster than plain 5.9 because of my patch
      series "Change data reservations to use the ticketing infra" which
      contained the patch that introduced the regression, but generally
      improved the overall ENOSPC flushing mechanisms.
      
      Additional testing on a consumer-grade SSD (8GiB RAM, 8 CPUs) confirms
      the results:
      
        5.10.5            4m00s
        5.10.5+patch      1m08s
        5.11-rc2          5m14s
        5.11-rc2+patch    1m30s
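      The difference between the two strategies can be sketched with a toy model (the functions and numbers are hypothetical, not btrfs code): flushing by inode writes back every dirty page of each chosen inode, while flushing by pages, as restored here, stops once the requested page budget is spent.

```c
#include <assert.h>

/* Toy contrast of the two reclaim strategies (names and numbers
 * hypothetical): page-granular flushing respects a budget, while
 * inode-granular flushing writes back whole inodes regardless. */
static long flush_by_pages(const long *dirty_pages, int n, long budget)
{
    long written = 0;
    for (int i = 0; i < n && written < budget; i++) {
        long take = dirty_pages[i];
        if (take > budget - written)
            take = budget - written; /* stop at the requested amount */
        written += take;
    }
    return written;
}

static long flush_by_inodes(const long *dirty_pages, int n, int nr_inodes)
{
    long written = 0;
    for (int i = 0; i < n && i < nr_inodes; i++)
        written += dirty_pages[i]; /* whole inode, ignoring any budget */
    return written;
}
```

With a budget of 64 pages, the first strategy writes 64 pages; picking just two inodes with the second can write thousands, which is the over-flushing the commit describes.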
      Reported-by: René Rebe <rene@exactcode.de>
      Fixes: 38d715f4 ("btrfs: use btrfs_start_delalloc_roots in shrink_delalloc")
      CC: stable@vger.kernel.org # 5.10
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Tested-by: David Sterba <dsterba@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      [ add my test results ]
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      69837585
    • btrfs: fix deadlock when cloning inline extent and low on free metadata space · 24648205
      Filipe Manana committed
      stable inclusion
      from stable-5.10.8
      commit 17243f73ad742363721e1288fb74e7b151c801f7
      bugzilla: 47450
      
      --------------------------------
      
      [ Upstream commit 3d45f221 ]
      
      When cloning an inline extent there are cases where we can not just copy
      the inline extent from the source range to the target range (e.g. when the
      target range starts at an offset greater than zero). In such cases we copy
      the inline extent's data into a page of the destination inode and then
      dirty that page. However, after that we will need to start a transaction
      for each processed extent and, if we are ever low on available metadata
      space, we may need to flush existing delalloc for all dirty inodes in an
      attempt to release metadata space - if that happens we may deadlock:
      
      * the async reclaim task queued a delalloc work to flush delalloc for
        the destination inode of the clone operation;
      
      * the task executing that delalloc work gets blocked waiting for the
        range with the dirty page to be unlocked, which is currently locked
        by the task doing the clone operation;
      
      * the async reclaim task blocks waiting for the delalloc work to complete;
      
      * the cloning task is waiting on the waitqueue of its reservation ticket
        while holding the range with the dirty page locked in the inode's
        io_tree;
      
      * if metadata space is not released by some other task (like delalloc for
        some other inode completing for example), the clone task waits forever
        and as a consequence the delalloc work and async reclaim tasks will hang
        forever as well. Releasing more space on the other hand may require
        starting a transaction, which will hang as well when trying to reserve
        metadata space, resulting in a deadlock between all these tasks.
      
      When this happens, traces like the following show up in dmesg/syslog:
      
        [87452.323003] INFO: task kworker/u16:11:1810830 blocked for more than 120 seconds.
        [87452.323644]       Tainted: G    B   W         5.10.0-rc4-btrfs-next-73 #1
        [87452.324248] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
        [87452.324852] task:kworker/u16:11  state:D stack:    0 pid:1810830 ppid:     2 flags:0x00004000
        [87452.325520] Workqueue: btrfs-flush_delalloc btrfs_work_helper [btrfs]
        [87452.326136] Call Trace:
        [87452.326737]  __schedule+0x5d1/0xcf0
        [87452.327390]  schedule+0x45/0xe0
        [87452.328174]  lock_extent_bits+0x1e6/0x2d0 [btrfs]
        [87452.328894]  ? finish_wait+0x90/0x90
        [87452.329474]  btrfs_invalidatepage+0x32c/0x390 [btrfs]
        [87452.330133]  ? __mod_memcg_state+0x8e/0x160
        [87452.330738]  __extent_writepage+0x2d4/0x400 [btrfs]
        [87452.331405]  extent_write_cache_pages+0x2b2/0x500 [btrfs]
        [87452.332007]  ? lock_release+0x20e/0x4c0
        [87452.332557]  ? trace_hardirqs_on+0x1b/0xf0
        [87452.333127]  extent_writepages+0x43/0x90 [btrfs]
        [87452.333653]  ? lock_acquire+0x1a3/0x490
        [87452.334177]  do_writepages+0x43/0xe0
        [87452.334699]  ? __filemap_fdatawrite_range+0xa4/0x100
        [87452.335720]  __filemap_fdatawrite_range+0xc5/0x100
        [87452.336500]  btrfs_run_delalloc_work+0x17/0x40 [btrfs]
        [87452.337216]  btrfs_work_helper+0xf1/0x600 [btrfs]
        [87452.337838]  process_one_work+0x24e/0x5e0
        [87452.338437]  worker_thread+0x50/0x3b0
        [87452.339137]  ? process_one_work+0x5e0/0x5e0
        [87452.339884]  kthread+0x153/0x170
        [87452.340507]  ? kthread_mod_delayed_work+0xc0/0xc0
        [87452.341153]  ret_from_fork+0x22/0x30
        [87452.341806] INFO: task kworker/u16:1:2426217 blocked for more than 120 seconds.
        [87452.342487]       Tainted: G    B   W         5.10.0-rc4-btrfs-next-73 #1
        [87452.343274] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
        [87452.344049] task:kworker/u16:1   state:D stack:    0 pid:2426217 ppid:     2 flags:0x00004000
        [87452.344974] Workqueue: events_unbound btrfs_async_reclaim_metadata_space [btrfs]
        [87452.345655] Call Trace:
        [87452.346305]  __schedule+0x5d1/0xcf0
        [87452.346947]  ? kvm_clock_read+0x14/0x30
        [87452.347676]  ? wait_for_completion+0x81/0x110
        [87452.348389]  schedule+0x45/0xe0
        [87452.349077]  schedule_timeout+0x30c/0x580
        [87452.349718]  ? _raw_spin_unlock_irqrestore+0x3c/0x60
        [87452.350340]  ? lock_acquire+0x1a3/0x490
        [87452.351006]  ? try_to_wake_up+0x7a/0xa20
        [87452.351541]  ? lock_release+0x20e/0x4c0
        [87452.352040]  ? lock_acquired+0x199/0x490
        [87452.352517]  ? wait_for_completion+0x81/0x110
        [87452.353000]  wait_for_completion+0xab/0x110
        [87452.353490]  start_delalloc_inodes+0x2af/0x390 [btrfs]
        [87452.353973]  btrfs_start_delalloc_roots+0x12d/0x250 [btrfs]
        [87452.354455]  flush_space+0x24f/0x660 [btrfs]
        [87452.355063]  btrfs_async_reclaim_metadata_space+0x1bb/0x480 [btrfs]
        [87452.355565]  process_one_work+0x24e/0x5e0
        [87452.356024]  worker_thread+0x20f/0x3b0
        [87452.356487]  ? process_one_work+0x5e0/0x5e0
        [87452.356973]  kthread+0x153/0x170
        [87452.357434]  ? kthread_mod_delayed_work+0xc0/0xc0
        [87452.357880]  ret_from_fork+0x22/0x30
        (...)
        < stack traces of several tasks waiting for the locks of the inodes of the
          clone operation >
        (...)
        [92867.444138] RSP: 002b:00007ffc3371bbe8 EFLAGS: 00000246 ORIG_RAX: 0000000000000052
        [92867.444624] RAX: ffffffffffffffda RBX: 00007ffc3371bea0 RCX: 00007f61efe73f97
        [92867.445116] RDX: 0000000000000000 RSI: 0000560fbd5d7a40 RDI: 0000560fbd5d8960
        [92867.445595] RBP: 00007ffc3371beb0 R08: 0000000000000001 R09: 0000000000000003
        [92867.446070] R10: 00007ffc3371b996 R11: 0000000000000246 R12: 0000000000000000
        [92867.446820] R13: 000000000000001f R14: 00007ffc3371bea0 R15: 00007ffc3371beb0
        [92867.447361] task:fsstress        state:D stack:    0 pid:2508238 ppid:2508153 flags:0x00004000
        [92867.447920] Call Trace:
        [92867.448435]  __schedule+0x5d1/0xcf0
        [92867.448934]  ? _raw_spin_unlock_irqrestore+0x3c/0x60
        [92867.449423]  schedule+0x45/0xe0
        [92867.449916]  __reserve_bytes+0x4a4/0xb10 [btrfs]
        [92867.450576]  ? finish_wait+0x90/0x90
        [92867.451202]  btrfs_reserve_metadata_bytes+0x29/0x190 [btrfs]
        [92867.451815]  btrfs_block_rsv_add+0x1f/0x50 [btrfs]
        [92867.452412]  start_transaction+0x2d1/0x760 [btrfs]
        [92867.453216]  clone_copy_inline_extent+0x333/0x490 [btrfs]
        [92867.453848]  ? lock_release+0x20e/0x4c0
        [92867.454539]  ? btrfs_search_slot+0x9a7/0xc30 [btrfs]
        [92867.455218]  btrfs_clone+0x569/0x7e0 [btrfs]
        [92867.455952]  btrfs_clone_files+0xf6/0x150 [btrfs]
        [92867.456588]  btrfs_remap_file_range+0x324/0x3d0 [btrfs]
        [92867.457213]  do_clone_file_range+0xd4/0x1f0
        [92867.457828]  vfs_clone_file_range+0x4d/0x230
        [92867.458355]  ? lock_release+0x20e/0x4c0
        [92867.458890]  ioctl_file_clone+0x8f/0xc0
        [92867.459377]  do_vfs_ioctl+0x342/0x750
        [92867.459913]  __x64_sys_ioctl+0x62/0xb0
        [92867.460377]  do_syscall_64+0x33/0x80
        [92867.460842]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
        (...)
        < stack traces of more tasks blocked on metadata reservation like the clone
          task above, because the async reclaim task has deadlocked >
        (...)
      
      Another thing to notice is that the worker task that is deadlocked when
      trying to flush the destination inode of the clone operation is at
      btrfs_invalidatepage(). This is simply because the clone operation has a
      destination offset greater than the i_size and we only update the i_size
      of the destination file after cloning an extent (just like we do in the
      buffered write path).
      
      Since the async reclaim path uses btrfs_start_delalloc_roots() to trigger
      the flushing of delalloc for all inodes that have delalloc, add a runtime
      flag to an inode to signal it should not be flushed, and for inodes with
      that flag set, start_delalloc_inodes() will simply skip them. When the
      cloning code needs to dirty a page to copy an inline extent, set that flag
      on the inode and then clear it when the clone operation finishes.
      
      This could be sporadically triggered with test case generic/269 from
      fstests, which exercises many fsstress processes running in parallel with
      several dd processes filling up the entire filesystem.
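      The skip logic described above can be sketched as a toy model (types and names hypothetical, not the btrfs structures): the cloning task sets a runtime flag on the destination inode before dirtying its page, and the delalloc walker skips any inode carrying the flag, breaking the wait cycle.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal sketch of the fix (types hypothetical): inodes marked as
 * mid-clone are skipped by the delalloc walker, since flushing them
 * would block on a range the cloning task holds locked. */
struct toy_inode {
    bool no_delalloc_flush; /* set while an inline extent is being cloned */
};

static int start_delalloc_inodes(struct toy_inode *inodes, int n)
{
    int started = 0;
    for (int i = 0; i < n; i++) {
        if (inodes[i].no_delalloc_flush)
            continue; /* flushing this inode now could deadlock */
        started++;    /* queue delalloc work for this inode */
    }
    return started;
}
```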
      
      CC: stable@vger.kernel.org # 5.9+
      Fixes: 05a5a762 ("Btrfs: implement full reflink support for inline extents")
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      24648205
    • btrfs: skip unnecessary searches for xattrs when logging an inode · 5ca00258
      Filipe Manana committed
      stable inclusion
      from stable-5.10.8
      commit 87738164592fdd531b068d069911aaa9f3c41c9d
      bugzilla: 47450
      
      --------------------------------
      
      [ Upstream commit f2f121ab ]
      
      Every time we log an inode we look up xattrs in the fs/subvol tree
      and, if there are any, log them into the log tree. However it is very
      common to have inodes without any xattrs, so the search wastes time
      and, more importantly, adds contention on the fs/subvol tree locks,
      either making the logging code block and wait for tree locks or
      making other concurrent operations block and wait.
      
      The most typical use cases where xattrs are used are when capabilities or
      ACLs are defined for an inode, or when SELinux is enabled.
      
      This change makes the logging code detect when an inode does not have
      xattrs and skip the xattrs search the next time the inode is logged,
      unless the inode is evicted and loaded again or an xattr is added to
      the inode. This skips the search for inodes that never have xattrs
      and are fsynced with some frequency.
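      The caching behaviour can be sketched as follows (the function is a hypothetical userspace model, not btrfs code): after one search finds no xattrs, a per-inode runtime flag suppresses the search on every later log of the same in-memory inode.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the optimisation (names hypothetical): count how many
 * fs/subvol tree searches n consecutive logs of one inode would do. */
static int tree_searches_for_n_logs(int n)
{
    bool no_xattrs = false; /* runtime flag on the in-memory inode */
    int searches = 0;

    for (int i = 0; i < n; i++) {
        if (no_xattrs)
            continue;      /* skip the fs/subvol tree search entirely */
        searches++;        /* expensive search under tree locks */
        no_xattrs = true;  /* nothing found: remember for next time */
    }
    return searches;
}
```

Eviction or adding an xattr would clear the flag, forcing a fresh search on the next log.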
      
      The following script that calls dbench was used to measure the impact of
        this change on a VM with 8 CPUs, 16GiB of RAM, using a raw NVMe device
      directly (no intermediary filesystem on the host) and using a non-debug
      kernel (default configuration on Debian distributions):
      
        $ cat test.sh
        #!/bin/bash
      
        DEV=/dev/sdk
        MNT=/mnt/sdk
        MOUNT_OPTIONS="-o ssd"
      
        mkfs.btrfs -f -m single -d single $DEV
        mount $MOUNT_OPTIONS $DEV $MNT
      
        dbench -D $MNT -t 200 40
      
        umount $MNT
      
      The results before this change:
      
       Operation      Count    AvgLat    MaxLat
       ----------------------------------------
       NTCreateX    5761605     0.172   312.057
       Close        4232452     0.002    10.927
       Rename        243937     1.406   277.344
       Unlink       1163456     0.631   298.402
       Deltree          160    11.581   221.107
       Mkdir             80     0.003     0.005
       Qpathinfo    5221410     0.065   122.309
       Qfileinfo     915432     0.001     3.333
       Qfsinfo       957555     0.003     3.992
       Sfileinfo     469244     0.023    20.494
       Find         2018865     0.448   123.659
       WriteX       2874851     0.049   118.529
       ReadX        9030579     0.004    21.654
       LockX          18754     0.003     4.423
       UnlockX        18754     0.002     0.331
       Flush         403792    10.944   359.494
      
      Throughput 908.444 MB/sec  40 clients  40 procs  max_latency=359.500 ms
      
      The results after this change:
      
       Operation      Count    AvgLat    MaxLat
       ----------------------------------------
       NTCreateX    6442521     0.159   230.693
       Close        4732357     0.002    10.972
       Rename        272809     1.293   227.398
       Unlink       1301059     0.563   218.500
       Deltree          160     7.796    54.887
       Mkdir             80     0.008     0.478
       Qpathinfo    5839452     0.047   124.330
       Qfileinfo    1023199     0.001     4.996
       Qfsinfo      1070760     0.003     5.709
       Sfileinfo     524790     0.033    21.765
       Find         2257658     0.314   125.611
       WriteX       3211520     0.040   232.135
       ReadX        10098969     0.004    25.340
       LockX          20974     0.003     1.569
       UnlockX        20974     0.002     3.475
       Flush         451553    10.287   331.037
      
      Throughput 1011.77 MB/sec  40 clients  40 procs  max_latency=331.045 ms
      
      +10.8% throughput, -8.2% max latency
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      5ca00258
    • scsi: ufs: Fix -Wsometimes-uninitialized warning · 78da0297
      Arnd Bergmann committed
      stable inclusion
      from stable-5.10.8
      commit e28ace868c1e945f8c61cee147168e26d6c9f2d6
      bugzilla: 47450
      
      --------------------------------
      
      [ Upstream commit 4c60244d ]
      
      clang complains about a possible code path in which a variable is used
      without an initialization:
      
      drivers/scsi/ufs/ufshcd.c:7690:3: error: variable 'sdp' is used uninitialized whenever 'if' condition is false [-Werror,-Wsometimes-uninitialized]
                      BUG_ON(1);
                      ^~~~~~~~~
      include/asm-generic/bug.h:63:36: note: expanded from macro 'BUG_ON'
       #define BUG_ON(condition) do { if (unlikely(condition)) BUG(); } while (0)
                                         ^~~~~~~~~~~~~~~~~~~
      
      Turn the BUG_ON(1) into an unconditional BUG() that makes it clear to clang
      that this code path is never hit.
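      A simplified userspace analogue of the warning's shape (not the kernel macros or the ufshcd code): with a conditional trap such as BUG_ON(1), the analyser assumes the branch may fall through with the variable still uninitialised; an unconditional noreturn call, like BUG() or abort(), makes the path obviously dead.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy illustration (hypothetical function): the default arm cannot fall
 * through because abort() is noreturn, so v is always initialised when
 * it is read. Replacing abort() with "if (1) abort();" would reintroduce
 * a path the analyser cannot prove dead. */
static int lookup(int key)
{
    int v;

    switch (key) {
    case 0:
        v = 10;
        break;
    case 1:
        v = 20;
        break;
    default:
        abort(); /* unconditional: v cannot be read uninitialised */
    }
    return v;
}
```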
      
      Link: https://lore.kernel.org/r/20201203223137.1205933-1-arnd@kernel.org
      Fixes: 4f3e900b ("scsi: ufs: Clear UAC for FFU and RPMB LUNs")
      Reviewed-by: Avri Altman <avri.altman@wdc.com>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      78da0297
    • io_uring: Fix return value from alloc_fixed_file_ref_node · 65a3dac2
      Matthew Wilcox (Oracle) committed
      stable inclusion
      from stable-5.10.8
      commit 458b40598dc0ccbbb1d3522f56a287ea0a127165
      bugzilla: 47450
      
      --------------------------------
      
      [ Upstream commit 3e2224c5 ]
      
      alloc_fixed_file_ref_node() currently returns an ERR_PTR on failure.
      io_sqe_files_unregister() expects it to return NULL and since it can only
      return -ENOMEM, it makes more sense to change alloc_fixed_file_ref_node()
      to behave that way.
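      The convention change can be sketched with toy types (hypothetical, not the io_uring structs): since allocation failure is the only error, returning NULL lets callers use a plain NULL check instead of decoding an ERR_PTR-style encoded pointer.

```c
#include <assert.h>
#include <stdlib.h>

/* Sketch of the fix (toy types): the allocator's only failure mode is
 * -ENOMEM, so NULL carries the same information as an encoded error
 * pointer while matching what the caller already checks for. */
struct ref_node {
    int refs;
};

static struct ref_node *alloc_ref_node(void)
{
    struct ref_node *node = malloc(sizeof(*node));

    if (!node)
        return NULL; /* was: ERR_PTR(-ENOMEM) */
    node->refs = 1;
    return node;
}
```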
      
      Fixes: 1ffc5422 ("io_uring: fix io_sqe_files_unregister() hangs")
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      65a3dac2
    • drm/panfrost: Don't corrupt the queue mutex on open/close · 94be15c2
      Steven Price committed
      stable inclusion
      from stable-5.10.8
      commit 51495b719515ddae417e4bafc7e100c34833af4b
      bugzilla: 47450
      
      --------------------------------
      
      [ Upstream commit a17d609e ]
      
      The mutex within the panfrost_queue_state should have the lifetime of
      the queue, however it was erroneously initialised/destroyed during
      panfrost_job_{open,close} which is called every time a client
      opens/closes the drm node.
      
      Move the initialisation/destruction to panfrost_job_{init,fini} where it
      belongs.
      
      Fixes: 1a11a88c ("drm/panfrost: Fix job timeout handling")
      Signed-off-by: Steven Price <steven.price@arm.com>
      Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
      Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20201029170047.30564-1-steven.price@arm.com
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      94be15c2
    • iommu/arm-smmu-qcom: Initialize SCTLR of the bypass context · 58871446
      Bjorn Andersson committed
      stable inclusion
      from stable-5.10.8
      commit 9d7751a39a19b0090300b2b0498e397f9047e125
      bugzilla: 47450
      
      --------------------------------
      
      [ Upstream commit aded8c7c ]
      
      On SM8150 it's occasionally observed that the boot hangs in between the
      writing of SMEs and context banks in arm_smmu_device_reset().
      
      The problem seems to coincide with a display refresh happening after
      updating the stream mapping, but before clearing - and thereby
      disabling translation in - the context bank picked to emulate
      translation bypass.
      
      Resolve this by explicitly disabling the bypass context already in
      cfg_probe.
      
      Fixes: f9081b8f ("iommu/arm-smmu-qcom: Implement S2CR quirk")
      Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
      Link: https://lore.kernel.org/r/20210106005038.4152731-1-bjorn.andersson@linaro.org
      Signed-off-by: Will Deacon <will@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      58871446
    • RDMA/hns: Avoid filling sl in high 3 bits of vlan_id · b971c668
      Weihang Li committed
      stable inclusion
      from stable-5.10.8
      commit 85bbe2e64ab430af3c27a0bc4e22dae04a5e10e6
      bugzilla: 47450
      
      --------------------------------
      
      [ Upstream commit 94a8c4df ]
      
      Only the low 12 bits of vlan_id are valid, and the service level has
      already been filled into the Address Vector, so there is no need to
      pack sl into vlan_id in the Address Vector.
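      The masking rule can be sketched in a couple of lines (the helper is hypothetical; the 12-bit VLAN ID width comes from 802.1Q): only the low 12 bits are kept, so nothing packed into the high bits survives.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the field rule (helper hypothetical): a VLAN ID is 12 bits,
 * so the service level must not be packed into bits 15:13 of the vlan
 * field once SL is already carried in the Address Vector. */
#define VLAN_VID_MASK 0x0fffu

static uint16_t av_vlan_id(uint16_t vlan_id)
{
    return vlan_id & VLAN_VID_MASK; /* keep only the valid low 12 bits */
}
```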
      
      Fixes: 7406c003 ("RDMA/hns: Only record vlan info for HIP08")
      Link: https://lore.kernel.org/r/1607650657-35992-5-git-send-email-liweihang@huawei.com
      Signed-off-by: Weihang Li <liweihang@huawei.com>
      Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      b971c668
    • io_uring: patch up IOPOLL overflow_flush sync · 52927550
      Pavel Begunkov committed
      stable inclusion
      from stable-5.10.8
      commit 85e25e2370a20352b72af34940fb32746a64fc28
      bugzilla: 47450
      
      --------------------------------
      
      commit 6c503150 upstream
      
      IOPOLL skips completion locking but keeps it under uring_lock, thus
      io_cqring_overflow_flush() and so io_cqring_events() need additional
      locking with uring_lock in some cases for IOPOLL.
      
      Remove __io_cqring_overflow_flush() from io_cqring_events(), introduce a
      wrapper around flush doing needed synchronisation and call it by hand.
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      52927550
    • io_uring: limit {io|sq}poll submit locking scope · 6e523e3c
      Pavel Begunkov committed
      stable inclusion
      from stable-5.10.8
      commit bc924dd21ecf8a8363091ef02fdac3115d024b91
      bugzilla: 47450
      
      --------------------------------
      
      commit 89448c47 upstream
      
      We don't need to take uring_lock for SQPOLL|IOPOLL to do
      io_cqring_overflow_flush() when cq_overflow_list is empty, so remove
      it from the hot path.
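      The hot-path shape can be sketched as a toy model (hypothetical function, not io_uring code): take the lock and flush only when the overflow list is non-empty, so the common empty case stays lock-free.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the change (names hypothetical): returns how many
 * lock-protected flushes a submit-path check performs. */
static int flushes_with_lock(bool overflow_list_empty)
{
    if (overflow_list_empty)
        return 0; /* hot path: nothing to flush, no lock taken */
    return 1;     /* take uring_lock, flush overflow, drop the lock */
}
```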
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      6e523e3c
    • io_uring: synchronise IOPOLL on task_submit fail · 4283556d
      Pavel Begunkov committed
      stable inclusion
      from stable-5.10.8
      commit 1d5e50da5cc7483849b815ee34559be4f3902a3b
      bugzilla: 47450
      
      --------------------------------
      
      commit 81b6d05c upstream
      
      io_req_task_submit() might be called for IOPOLL, do the fail path under
      uring_lock to comply with IOPOLL synchronisation based solely on it.
      
      Cc: stable@vger.kernel.org # 5.5+
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      4283556d
    • powerpc/32s: Fix RTAS machine check with VMAP stack · 3a92caae
      Christophe Leroy committed
      stable inclusion
      from stable-5.10.8
      commit bca9ca5a603f6c5586a7dfd35e06abe6d5fcd559
      bugzilla: 47450
      
      --------------------------------
      
      [ Upstream commit 98bf2d3f ]
      
      When we have VMAP stack, exception prolog 1 sets r1, not r11.
      
      When it is not an RTAS machine check, don't trash r1 because it is
      needed by prolog 1.
      
      Fixes: da7bb43a ("powerpc/32: Fix vmap stack - Properly set r1 before activating MMU")
      Fixes: d2e00603 ("powerpc/32: Use SPRN_SPRG_SCRATCH2 in exception prologs")
      Cc: stable@vger.kernel.org # v5.10+
      Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      [mpe: Squash in fixup for RTAS machine check from Christophe]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/bc77d61d1c18940e456a2dee464f1e2eda65a3f0.1608621048.git.christophe.leroy@csgroup.eu
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      3a92caae
    • ARM: 9031/1: hyp-stub: remove unused .L__boot_cpu_mode_offset symbol · 89842dc5
      Ard Biesheuvel committed
      mainline inclusion
      from mainline-5.11-rc1
      commit 6c7a6d22
      category: bugfix
      bugzilla: 46882
      CVE: NA
      
      --------------------------------
      
      Commit aaac3733 ("ARM: kvm: replace open coded VA->PA calculations
      with adr_l call") removed all uses of .L__boot_cpu_mode_offset, so there
      is no longer a need to define it.
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
      Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      89842dc5
    • ARM: kvm: replace open coded VA->PA calculations with adr_l call · f0514be8
      Ard Biesheuvel committed
      mainline inclusion
      from mainline-5.11-rc1
      commit aaac3733
      category: bugfix
      bugzilla: 46882
      CVE: NA
      
      -------------------------------------------------
      Replace the open coded calculations of the actual physical address
      of the KVM stub vector table with a single adr_l invocation.
      Reviewed-by: Nicolas Pitre <nico@fluxnic.net>
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
      (cherry picked from commit aaac3733)
      Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com>
      Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      f0514be8
    • ARM: head.S: use PC relative insn sequence to calculate PHYS_OFFSET · 622b2462
      Ard Biesheuvel committed
      mainline inclusion
      from mainline-5.11-rc1
      commit 3bcf906b
      category: bugfix
      bugzilla: 46882
      CVE: NA
      
      -------------------------------------------------
      Replace the open coded arithmetic with a simple adr_l/sub pair. This
      removes some open coded arithmetic involving virtual addresses, avoids
      literal pools on v7+, and slightly reduces the footprint of the code.
Reviewed-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
(cherry picked from commit 3bcf906b)
Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      622b2462
    • A
      ARM: sleep.S: use PC-relative insn sequence for sleep_save_sp/mpidr_hash · 4743e04d
Committed by Ard Biesheuvel
      mainline inclusion
      from mainline-5.11-rc1
      commit d74d2b22
      category: bugfix
      bugzilla: 46882
      CVE: NA
      
      -------------------------------------------------
      Replace the open coded PC relative offset calculations with adr_l and
      ldr_l invocations. This removes some open coded PC relative arithmetic,
      avoids literal pools on v7+, and slightly reduces the footprint of the
      code. Note that ALT_SMP() expects a single instruction so move the macro
      invocation after it.
Reviewed-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
(cherry picked from commit d74d2b22)
Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      4743e04d
    • A
      ARM: head: use PC-relative insn sequence for __smp_alt · b4428c3b
Committed by Ard Biesheuvel
      mainline inclusion
      from mainline-5.11-rc1
      commit 59d2f282
      category: bugfix
      bugzilla: 46882
      CVE: NA
      
      -------------------------------------------------
      Now that calling __do_fixup_smp_on_up() can be done without passing
      the physical-to-virtual offset in r3, we can replace the open coded
      PC relative offset calculations with a pair of adr_l invocations. This
      removes some open coded arithmetic involving virtual addresses, avoids
      literal pools on v7+, and slightly reduces the footprint of the code.
Reviewed-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
(cherry picked from commit 59d2f282)
Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      b4428c3b
    • A
      ARM: kernel: use relative references for UP/SMP alternatives · ec0036de
Committed by Ard Biesheuvel
      mainline inclusion
      from mainline-5.11-rc1
      commit 450abd38
      category: bugfix
      bugzilla: 46882
      CVE: NA
      
      -------------------------------------------------
      Currently, the .alt.smp.init section contains the virtual addresses
      of the patch sites. Since patching may occur both before and after
      switching into virtual mode, this requires some manual handling of
      the address when applying the UP alternative.
      
      Let's simplify this by using relative offsets in the table entries:
      this allows us to simply add each entry's address to its contents,
      regardless of whether we are running in virtual mode or not.
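The addressing trick described above can be sketched in C. This is a hypothetical model of a relative table entry, not the kernel's actual .alt.smp.init layout:

```c
#include <stdint.h>

/* A relative entry stores the signed offset from the entry itself to
 * the patch site. Recovering the site is just "entry address plus
 * entry contents", which works the same before and after the switch
 * into virtual mode, because both addresses move together. */
static uint32_t *patch_site(const int32_t *entry)
{
    return (uint32_t *)((uintptr_t)entry + *entry);
}
```

Because the offset is relative, relocating the whole image shifts the entry and the site by the same amount, so the sum stays correct in either addressing mode.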
Reviewed-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
(cherry picked from commit 450abd38)
Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      ec0036de
    • A
      ARM: head.S: use PC-relative insn sequence for secondary_data · 88142794
Committed by Ard Biesheuvel
      mainline inclusion
      from mainline-5.11-rc1
      commit 91580f0d
      category: bugfix
      bugzilla: 46882
      CVE: NA
      
      -------------------------------------------------
      Replace the open coded PC relative offset calculations with adr_l
      and ldr_l invocations. This removes some open coded arithmetic
      involving virtual addresses, avoids literal pools on v7+, and slightly
      reduces the footprint of the code.
      
      Note that it also removes a stale comment about the contents of r6.
Reviewed-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
(cherry picked from commit 91580f0d)
Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      88142794
    • A
      ARM: head-common.S: use PC-relative insn sequence for idmap creation · d2cfd32f
Committed by Ard Biesheuvel
      mainline inclusion
      from mainline-5.11-rc1
      commit 172c34c9
      category: bugfix
      bugzilla: 46882
      CVE: NA
      
      -------------------------------------------------
      Replace the open coded PC relative offset calculations involving
      __turn_mmu_on and __turn_mmu_on_end with a pair of adr_l invocations.
      This removes some open coded arithmetic involving virtual addresses,
      avoids literal pools on v7+, and slightly reduces the footprint of the
      code.
Reviewed-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
(cherry picked from commit 172c34c9)
Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      d2cfd32f
    • A
      ARM: head-common.S: use PC-relative insn sequence for __proc_info · 05f583e8
Committed by Ard Biesheuvel
      mainline inclusion
      from mainline-5.11-rc1
      commit 62c4a2e2
      category: bugfix
      bugzilla: 46882
      CVE: NA
      
      -------------------------------------------------
      Replace the open coded PC relative offset calculations with a pair of
      adr_l invocations. This removes some open coded arithmetic involving
      virtual addresses, avoids literal pools on v7+, and slightly reduces
      the footprint of the code.
Reviewed-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
(cherry picked from commit 62c4a2e2)
Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      05f583e8
    • A
      ARM: efistub: replace adrl pseudo-op with adr_l macro invocation · 5e9de2ca
Committed by Ard Biesheuvel
      mainline inclusion
      from mainline-5.11-rc1
      commit 67e3f828
      category: bugfix
      bugzilla: 46882
      CVE: NA
      
      -------------------------------------------------
      The ARM 'adrl' pseudo instruction is a bit problematic, as it does not
      exist in Thumb mode, and it is not implemented by Clang either. Since
      the Thumb variant has a slightly bigger range, it is sometimes necessary
      to emit the 'adrl' variant in ARM mode where Thumb mode can use adr just
fine. However, that still leaves the Clang issue, as Clang does not
appear likely to support this pseudo-instruction any time soon.
      
      So let's switch to the adr_l macro, which works for both ARM and Thumb,
      and has unlimited range.
Reviewed-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
(cherry picked from commit 67e3f828)
Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      5e9de2ca
    • A
      ARM: p2v: reduce p2v alignment requirement to 2 MiB · b9012d8b
Committed by Ard Biesheuvel
      mainline inclusion
      from mainline-5.11-rc1
      commit 9443076e
      category: bugfix
      bugzilla: 46882
      CVE: NA
      
      -------------------------------------------------
      The ARM kernel's linear map starts at PAGE_OFFSET, which maps to a
      physical address (PHYS_OFFSET) that is platform specific, and is
      discovered at boot. Since we don't want to slow down translations
      between physical and virtual addresses by keeping the offset in a
      variable in memory, we implement this by patching the code performing
      the translation, and putting the offset between PAGE_OFFSET and the
      start of physical RAM directly into the instruction opcodes.
      
      As we only patch up to 8 bits of offset, yielding 4 GiB >> 8 == 16 MiB
      of granularity, we have to round up PHYS_OFFSET to the next multiple if
      the start of physical RAM is not a multiple of 16 MiB. This wastes some
      physical RAM, since the memory that was skipped will now live below
      PAGE_OFFSET, making it inaccessible to the kernel.
      
      We can improve this by changing the patchable sequences and the patching
      logic to carry more bits of offset: 11 bits gives us 4 GiB >> 11 == 2 MiB
      of granularity, and so we will never waste more than that amount by
      rounding up the physical start of DRAM to the next multiple of 2 MiB.
      (Note that 2 MiB granularity guarantees that the linear mapping can be
      created efficiently, whereas less than 2 MiB may result in the linear
mapping needing another level of page tables.)
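The granularity figures quoted above follow from simple arithmetic; a quick check in plain C (illustration only, not kernel code):

```c
/* 2^bits distinct patchable offset values over a 4 GiB address space
 * give an alignment granularity of (4 GiB >> bits). */
static unsigned long long p2v_granularity(unsigned int bits)
{
    return (4ULL << 30) >> bits;
}
```

With 8 patched bits this yields 16 MiB, and with 11 bits it drops to 2 MiB, matching the figures in the text.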
      
      This helps Zhen Lei's scenario, where the start of DRAM is known to be
      occupied. It also helps EFI boot, which relies on the firmware's page
      allocator to allocate space for the decompressed kernel as low as
      possible. And if the KASLR patches ever land for 32-bit, it will give
      us 3 more bits of randomization of the placement of the kernel inside
      the linear region.
      
      For the ARM code path, it simply comes down to using two add/sub
      instructions instead of one for the carryless version, and patching
      each of them with the correct immediate depending on the rotation
      field. For the LPAE calculation, which has to deal with a carry, it
patches the MOVW instruction with up to 12 bits of offset (but we only
need 11 bits anyway).
      
      For the Thumb2 code path, patching more than 11 bits of displacement
      would be somewhat cumbersome, but the 11 bits we need fit nicely into
      the second word of the u16[2] opcode, so we simply update the immediate
      assignment and the left shift to create an addend of the right magnitude.
Suggested-by: Zhen Lei <thunder.leizhen@huawei.com>
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
(cherry picked from commit 9443076e)
Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      b9012d8b
    • A
      ARM: p2v: switch to MOVW for Thumb2 and ARM/LPAE · 5247796f
Committed by Ard Biesheuvel
      mainline inclusion
      from mainline-5.11-rc1
      commit e8e00f5a
      category: bugfix
      bugzilla: 46882
      CVE: NA
      
      -------------------------------------------------
      In preparation for reducing the phys-to-virt minimum relative alignment
      from 16 MiB to 2 MiB, switch to patchable sequences involving MOVW
      instructions that can more easily be manipulated to carry a 12-bit
      immediate. Note that the non-LPAE ARM sequence is not updated: MOVW
      may not be supported on non-LPAE platforms, and the sequence itself
      can be updated more easily to apply the 12 bits of displacement.
      
      For Thumb2, which has many more versions of opcodes, switch to a sequence
      that can be patched by the same patching code for both versions. Note
      that the Thumb2 opcodes for MOVW and MVN are unambiguous, and have no
      rotation bits in their immediate fields, so there is no need to use
      placeholder constants in the asm blocks.
      
      While at it, drop the 'volatile' qualifiers from the asm blocks: the
      code does not have any side effects that are invisible to the compiler,
      so it is free to omit these sequences if the outputs are not used.
Suggested-by: Russell King <linux@armlinux.org.uk>
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
(cherry picked from commit e8e00f5a)
Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      5247796f
    • A
      ARM: p2v: simplify __fixup_pv_table() · 9054cf12
Committed by Ard Biesheuvel
      mainline inclusion
      from mainline-5.11-rc1
      commit 0e3db6c9
      category: bugfix
      bugzilla: 46882
      CVE: NA
      
      -------------------------------------------------
      Declutter the code in __fixup_pv_table() by using the new adr_l/str_l
      macros to take PC relative references to external symbols, and by
      using the value of PHYS_OFFSET passed in r8 to calculate the p2v
      offset.
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
(cherry picked from commit 0e3db6c9)
Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      9054cf12
    • A
      ARM: p2v: use relative references in patch site arrays · 971f3cf8
Committed by Ard Biesheuvel
      mainline inclusion
      from mainline-5.11-rc1
      commit 2730e8ea
      category: bugfix
      bugzilla: 46882
      CVE: NA
      
      -------------------------------------------------
      Free up a register in the p2v patching code by switching to relative
      references, which don't require keeping the phys-to-virt displacement
      live in a register.
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
(cherry picked from commit 2730e8ea)
Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      971f3cf8
    • A
      ARM: p2v: drop redundant 'type' argument from __pv_stub · 56050b42
Committed by Ard Biesheuvel
      mainline inclusion
      from mainline-5.11-rc1
      commit 0869f3b9
      category: bugfix
      bugzilla: 46882
      CVE: NA
      
      -------------------------------------------------
      We always pass the same value for 'type' so pull it into the __pv_stub
      macro itself.
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
(cherry picked from commit 0869f3b9)
Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      56050b42
    • A
      ARM: p2v: factor out BE8 handling · 2d1d9a64
Committed by Ard Biesheuvel
      mainline inclusion
      from mainline-5.11-rc1
      commit 7a94849e
      category: bugfix
      bugzilla: 46882
      CVE: NA
      
      -------------------------------------------------
      The big and little endian versions of the ARM p2v patching routine only
      differ in the values of the constants, so factor those out into macros
      so that we only have one version of the logic sequence to maintain.
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
(cherry picked from commit 7a94849e)
Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      2d1d9a64
    • A
      ARM: p2v: factor out shared loop processing · c103fe82
Committed by Ard Biesheuvel
      mainline inclusion
      from mainline-5.11-rc1
      commit 4b16421c
      category: bugfix
      bugzilla: 46882
      CVE: NA
      
      -------------------------------------------------
      The ARM and Thumb2 versions of the p2v patching loop have some overlap
      at the end of the loop, so factor that out. As numeric labels are not
      required to be unique, and may therefore be ambiguous, use named local
      labels for the start and end of the loop instead.
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
(cherry picked from commit 4b16421c)
Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      c103fe82
    • A
      ARM: p2v: move patching code to separate assembler source file · 9930ac07
Committed by Ard Biesheuvel
      mainline inclusion
      from mainline-5.11-rc1
      commit eae78e1a
      category: bugfix
      bugzilla: 46882
      CVE: NA
      
      -------------------------------------------------
      Move the phys2virt patching code into a separate .S file before doing
      some work on it.
Suggested-by: Nicolas Pitre <nico@fluxnic.net>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
(cherry picked from commit eae78e1a)
Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      9930ac07
    • A
      ARM: module: add support for place relative relocations · 4084b9ae
Committed by Ard Biesheuvel
      mainline inclusion
      from mainline-5.11-rc1
      commit 22f2d230
      category: bugfix
      bugzilla: 46882
      CVE: NA
      
      -------------------------------------------------
      When using the new adr_l/ldr_l/str_l macros to refer to external symbols
      from modules, the linker may emit place relative ELF relocations that
      need to be fixed up by the module loader. So add support for these.
Reviewed-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
(cherry picked from commit 22f2d230)
Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      4084b9ae
    • A
      ARM: assembler: introduce adr_l, ldr_l and str_l macros · f49945c1
Committed by Ard Biesheuvel
      mainline inclusion
      from mainline-5.11-rc1
      commit 0b167463
      category: bugfix
      bugzilla: 46882
      CVE: NA
      
      -------------------------------------------------
      Like arm64, ARM supports position independent code sequences that
      produce symbol references with a greater reach than the ordinary
      adr/ldr instructions. Since on ARM, the adrl pseudo-instruction is
only supported in ARM mode (and not at all when using Clang), having
an adr_l macro like we do on arm64 is useful, and increases symmetry
      as well.
      
      Currently, we use open coded instruction sequences involving literals
      and arithmetic operations. Instead, we can use movw/movt pairs on v7
      CPUs, circumventing the D-cache entirely.
      
      E.g., on v7+ CPUs, we can emit a PC-relative reference as follows:
      
             movw         <reg>, #:lower16:<sym> - (1f + 8)
             movt         <reg>, #:upper16:<sym> - (1f + 8)
        1:   add          <reg>, <reg>, pc
      
      For older CPUs, we can emit the literal into a subsection, allowing it
      to be emitted out of line while retaining the ability to perform
      arithmetic on label offsets.
      
      E.g., on pre-v7 CPUs, we can emit a PC-relative reference as follows:
      
             ldr          <reg>, 2f
        1:   add          <reg>, <reg>, pc
             .subsection  1
        2:   .long        <sym> - (1b + 8)
             .previous
      
      This is allowed by the assembler because, unlike ordinary sections,
      subsections are combined into a single section in the object file, and
      so the label references are not true cross-section references that are
      visible as relocations. (Subsections have been available in binutils
      since 2004 at least, so they should not cause any issues with older
      toolchains.)
      
      So use the above to implement the macros mov_l, adr_l, ldr_l and str_l,
      all of which will use movw/movt pairs on v7 and later CPUs, and use
      PC-relative literals otherwise.
Reviewed-by: Nicolas Pitre <nico@fluxnic.net>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
(cherry picked from commit 0b167463)
Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      f49945c1
    • D
      scsi: target: Fix XCOPY NAA identifier lookup · 2dc991b9
Committed by David Disseldorp
      stable inclusion
      from stable-5.10.7
      commit 6f1e88527c1869de08632efa2cc796e0131850dc
      bugzilla: 47429
      
      --------------------------------
      
      commit 2896c938 upstream.
      
      When attempting to match EXTENDED COPY CSCD descriptors with corresponding
      se_devices, target_xcopy_locate_se_dev_e4() currently iterates over LIO's
      global devices list which includes all configured backstores.
      
      This change ensures that only initiator-accessible backstores are
      considered during CSCD descriptor lookup, according to the session's
      se_node_acl LUN list.
      
      To avoid LUN removal race conditions, device pinning is changed from being
      configfs based to instead using the se_node_acl lun_ref.
      
      Reference: CVE-2020-28374
      Fixes: cbf031f4 ("target: Add support for EXTENDED_COPY copy offload emulation")
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: David Disseldorp <ddiss@suse.de>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      2dc991b9
    • P
      rtlwifi: rise completion at the last step of firmware callback · c3b744f9
Committed by Ping-Ke Shih
      stable inclusion
      from stable-5.10.7
      commit 513729aecb53cdd0ba4e5e5aebc8b2fddcb0131e
      bugzilla: 47429
      
      --------------------------------
      
      commit 4dfde294 upstream.
      
request_firmware_nowait(), which schedules another work item, is used to
load firmware while the USB device is probing. If the device is unplugged
before the firmware work runs, the disconnect ops run first and a
use-after-free follows. Though we wait for completion of the firmware
work before freeing the hw, the firmware callback raises the completion
too early, so move it to the last step.
      
      usb 5-1: Direct firmware load for rtlwifi/rtl8192cufw.bin failed with error -2
      rtlwifi: Loading alternative firmware rtlwifi/rtl8192cufw.bin
      rtlwifi: Selected firmware is not available
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      
      ==================================================================
      BUG: KASAN: use-after-free in rtl_fw_do_work.cold+0x68/0x6a drivers/net/wireless/realtek/rtlwifi/core.c:93
      Write of size 4 at addr ffff8881454cff50 by task kworker/0:6/7379
      
      CPU: 0 PID: 7379 Comm: kworker/0:6 Not tainted 5.10.0-rc7-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Workqueue: events request_firmware_work_func
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x107/0x163 lib/dump_stack.c:118
       print_address_description.constprop.0.cold+0xae/0x4c8 mm/kasan/report.c:385
       __kasan_report mm/kasan/report.c:545 [inline]
       kasan_report.cold+0x1f/0x37 mm/kasan/report.c:562
       rtl_fw_do_work.cold+0x68/0x6a drivers/net/wireless/realtek/rtlwifi/core.c:93
       request_firmware_work_func+0x12c/0x230 drivers/base/firmware_loader/main.c:1079
       process_one_work+0x933/0x1520 kernel/workqueue.c:2272
       worker_thread+0x64c/0x1120 kernel/workqueue.c:2418
       kthread+0x38c/0x460 kernel/kthread.c:292
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:296
      
      The buggy address belongs to the page:
      page:00000000f54435b3 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1454cf
      flags: 0x200000000000000()
      raw: 0200000000000000 0000000000000000 ffffea00051533c8 0000000000000000
      raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff8881454cfe00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
       ffff8881454cfe80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
      >ffff8881454cff00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
                                                       ^
       ffff8881454cff80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
       ffff8881454d0000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
      
      Reported-by: syzbot+65be4277f3c489293939@syzkaller.appspotmail.com
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20201214053106.7748-1-pkshih@realtek.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
      c3b744f9
    • M
      xsk: Fix memory leak for failed bind · 2fa8ad24
Committed by Magnus Karlsson
      stable inclusion
      from stable-5.10.7
      commit 0fae7d269ef7343e052bb66d4f79022e4456fe82
      bugzilla: 47429
      
      --------------------------------
      
      commit 8bee6833 upstream.
      
      Fix a possible memory leak when a bind of an AF_XDP socket fails. When
      the fill and completion rings are created, they are tied to the
      socket. But when the buffer pool is later created at bind time, the
      ownership of these two rings are transferred to the buffer pool as
      they might be shared between sockets (and the buffer pool cannot be
      created until we know what we are binding to). So, before the buffer
      pool is created, these two rings are cleaned up with the socket, and
      after they have been transferred they are cleaned up together with
      the buffer pool.
      
The problem is that ownership was transferred before it was absolutely
certain that the buffer pool could be created and initialized
correctly, so when one of these errors occurred, the fill and
completion rings belonged neither to the socket nor to the pool and
were therefore leaked. Solve this by moving the ownership transfer
to the point where the buffer pool has been completely set up and
there is no way it can fail.
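A hypothetical model of the ordering fix (simplified structures, not the real xsk types): ownership of the rings moves to the pool only once every fallible step has succeeded, so on failure the rings stay reachable, and freeable, through the socket.

```c
#include <stddef.h>

struct ring { int id; };
struct xsk_sock_model { struct ring *fq, *cq; };
struct xsk_pool_model { struct ring *fq, *cq; };

/* setup_ok models "all fallible pool initialization succeeded". */
static int pool_bind(struct xsk_sock_model *xs, struct xsk_pool_model *pool,
                     int setup_ok)
{
    pool->fq = pool->cq = NULL;
    if (!setup_ok)
        return -1;              /* rings still belong to the socket */
    /* nothing can fail past this point: transfer ownership now */
    pool->fq = xs->fq;  xs->fq = NULL;
    pool->cq = xs->cq;  xs->cq = NULL;
    return 0;
}
```

On the failure path the socket's teardown frees the rings as before; only a fully constructed pool ever takes them over.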
      
      Fixes: 7361f9c3 ("xsk: Move fill and completion rings to buffer pool")
      Reported-by: syzbot+cfa88ddd0655afa88763@syzkaller.appspotmail.com
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Link: https://lore.kernel.org/bpf/20201214085127.3960-1-magnus.karlsson@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      2fa8ad24
    • P
      KVM: x86: fix shift out of bounds reported by UBSAN · 2ff92444
Committed by Paolo Bonzini
      stable inclusion
      from stable-5.10.7
      commit 563135ec664ffb80a2297e94d618b04b228a1262
      bugzilla: 47429
      
      --------------------------------
      
      commit 2f80d502 upstream.
      
      Since we know that e >= s, we can reassociate the left shift,
      changing the shifted number from 1 to 2 in exchange for
      decreasing the right hand side by 1.
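The reassociation can be illustrated with a mask-building helper (a hypothetical stand-in for the kernel's expression, not its exact code): for a bit range [s, e] with e >= s, shifting 2 by (e - s) keeps the shift count below 64 even when the range spans all 64 bits, whereas shifting 1 by (e - s + 1) is undefined behavior for a full-width range.

```c
#include <stdint.h>

/* Build a mask covering bits s..e inclusive, assuming e >= s.
 * (2 << (e - s)) equals (1 << (e - s + 1)) mathematically, but the
 * former never shifts by 64, which UBSAN would flag as out of bounds. */
static uint64_t range_mask(unsigned int s, unsigned int e)
{
    return ((2ULL << (e - s)) - 1) << s;
}
```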
      
      Reported-by: syzbot+e87846c48bf72bc85311@syzkaller.appspotmail.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      2ff92444
    • Y
      x86/mtrr: Correct the range check before performing MTRR type lookups · 87fb483d
Committed by Ying-Tsun Huang
      stable inclusion
      from stable-5.10.7
      commit 02ccda90ef7e23a225b68789bce9e8353f9caa1f
      bugzilla: 47429
      
      --------------------------------
      
      commit cb7f4a8b upstream.
      
      In mtrr_type_lookup(), if the input memory address region is not in the
      MTRR, over 4GB, and not over the top of memory, a write-back attribute
      is returned. These condition checks are for ensuring the input memory
      address region is actually mapped to the physical memory.
      
However, if the end address is exactly aligned with the top of memory,
the condition check treats the address as over the top of memory, and
the write-back attribute is not returned.
      
This hits a real use case with NVDIMM: the nd_pmem module tries to map
NVDIMMs as cacheable memory when NVDIMMs are connected. If an NVDIMM is
the last of the DIMMs, its performance becomes very low, since it is
aligned with the top of memory and its memory type is uncached-minus.
      
      Move the input end address change to inclusive up into
      mtrr_type_lookup(), before checking for the top of memory in either
      mtrr_type_lookup_{variable,fixed}() helpers.
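The effect of the inclusive-end conversion can be sketched as follows. This is a simplified model of the comparison, not the actual mtrr_type_lookup() logic:

```c
#include <stdint.h>

/* With an exclusive end (start + size), a region whose end is exactly
 * aligned with the top of memory compares as ">= top" and is wrongly
 * treated as past the top; converting the end to inclusive first
 * (start + size - 1) makes the comparison come out right. */
static int over_top(uint64_t end, uint64_t top)
{
    return end >= top;
}
```

For a 4 KiB region ending exactly at a 4 GiB top of memory, the exclusive end trips the check while the inclusive end does not.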
      
       [ bp: Massage commit message. ]
      
      Fixes: 0cc705f5 ("x86/mm/mtrr: Clean up mtrr_type_lookup()")
Signed-off-by: Ying-Tsun Huang <ying-tsun.huang@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20201215070721.4349-1-ying-tsun.huang@amd.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      87fb483d
    • D
      dmaengine: idxd: off by one in cleanup code · 06164803
Committed by Dan Carpenter
      stable inclusion
      from stable-5.10.7
      commit 6e3c67976eda30959833d852bc13c7d0a342cfa9
      bugzilla: 47429
      
      --------------------------------
      
      commit ff58f7dd upstream.
      
The cleanup is off by one: it starts at "i" when it should start at
"i - 1", and as written it never unregisters the zeroth element in the
array.
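The pattern the fix restores is the usual unwind loop; a hypothetical sketch (fake_reg/fake_unreg are illustration-only helpers, not idxd functions):

```c
static int regs_done, unregs_done, fail_at;

/* Fails once regs_done reaches fail_at, modelling a registration error. */
static int fake_reg(int obj)
{
    (void)obj;
    if (regs_done == fail_at)
        return -1;
    regs_done++;
    return 0;
}

static void fake_unreg(int obj) { (void)obj; unregs_done++; }

/* On failure at element i, elements 0 .. i-1 were registered, so the
 * unwind must start at i - 1 and run down through element 0. */
static int register_all(const int *objs, int n)
{
    int i;

    for (i = 0; i < n; i++)
        if (fake_reg(objs[i]) < 0)
            goto err;
    return 0;
err:
    while (--i >= 0)
        fake_unreg(objs[i]);
    return -1;
}
```

Starting the unwind at "i" instead would touch the element that was never registered and skip element 0, which is exactly the off-by-one described above.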
      
      Fixes: c52ca478 ("dmaengine: idxd: add configuration component of driver")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/X9nFeojulsNqUSnG@mwanda
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      06164803
    • P
      netfilter: nft_dynset: report EOPNOTSUPP on missing set feature · 44a4771c
Committed by Pablo Neira Ayuso
      stable inclusion
      from stable-5.10.7
      commit 8b109f4cd1dc2224f900702483be81d61beab864
      bugzilla: 47429
      
      --------------------------------
      
      commit 95cd4bca upstream.
      
If userspace requests a feature which is not available in the original
set definition, then bail out with EOPNOTSUPP. If userspace sends
unsupported dynset flags (a new feature not supported by this kernel),
then report EOPNOTSUPP to userspace. EINVAL should only be used to
report malformed netlink messages from userspace.
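The errno policy described above amounts to a simple flag check; a hypothetical sketch (the helper name and flag values are illustration-only, not the nft_dynset code):

```c
#include <errno.h>
#include <stdint.h>

/* Unknown feature bits from userspace mean "this kernel does not
 * support the feature", so the answer is EOPNOTSUPP; EINVAL stays
 * reserved for genuinely malformed messages. */
static int dynset_check_flags(uint32_t requested, uint32_t supported)
{
    if (requested & ~supported)
        return -EOPNOTSUPP;
    return 0;
}
```

A newer userspace talking to an older kernel then gets a clear "not supported" answer instead of a misleading "invalid message".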
      
      Fixes: 22fe54d5 ("netfilter: nf_tables: add support for dynamic set updates")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      44a4771c