1. 08 Jul, 2021 2 commits
  2. 06 Jul, 2021 9 commits
    • iommu/vt-d: Fix general protection fault in aux_detach_device() · 172b3700
      Liu Yi L committed
      mainline inclusion
      from mainline-v5.11-rc3
      commit 18abda7a
      category: bugfix
      bugzilla: 108082
      CVE: NA
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=18abda7a2d555783d28ea1701f3ec95e96237a86
      
      -------------------------------------------------------------------------
      
      Aux-domain attach and detach operations are not tracked, so some data
      structures might be used after free. This causes general protection faults
      when multiple subdevices are created and assigned to the same guest machine:
      
        | general protection fault, probably for non-canonical address 0xdead000000000100: 0000 [#1] SMP NOPTI
        | RIP: 0010:intel_iommu_aux_detach_device+0x12a/0x1f0
        | [...]
        | Call Trace:
        |  iommu_aux_detach_device+0x24/0x70
        |  vfio_mdev_detach_domain+0x3b/0x60
        |  ? vfio_mdev_set_domain+0x50/0x50
        |  iommu_group_for_each_dev+0x4f/0x80
        |  vfio_iommu_detach_group.isra.0+0x22/0x30
        |  vfio_iommu_type1_detach_group.cold+0x71/0x211
        |  ? find_exported_symbol_in_section+0x4a/0xd0
        |  ? each_symbol_section+0x28/0x50
        |  __vfio_group_unset_container+0x4d/0x150
        |  vfio_group_try_dissolve_container+0x25/0x30
        |  vfio_group_put_external_user+0x13/0x20
        |  kvm_vfio_group_put_external_user+0x27/0x40 [kvm]
        |  kvm_vfio_destroy+0x45/0xb0 [kvm]
        |  kvm_put_kvm+0x1bb/0x2e0 [kvm]
        |  kvm_vm_release+0x22/0x30 [kvm]
        |  __fput+0xcc/0x260
        |  ____fput+0xe/0x10
        |  task_work_run+0x8f/0xb0
        |  do_exit+0x358/0xaf0
        |  ? wake_up_state+0x10/0x20
        |  ? signal_wake_up_state+0x1a/0x30
        |  do_group_exit+0x47/0xb0
        |  __x64_sys_exit_group+0x18/0x20
        |  do_syscall_64+0x57/0x1d0
        |  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fix the crash by tracking the subdevices when attaching and detaching
      aux-domains.
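
The effect of tracking attach and detach can be illustrated with a small user-space sketch. Every name below (`struct subdev_track`, `track_attach`, `track_detach`) is hypothetical and is not the kernel's data structure; the sketch only assumes that a per-(domain, device) user count is kept, so that setup happens on the first attach and teardown happens exactly once, on the last detach.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TRACKED 8

/* Hypothetical record: one entry per (domain, physical device) pair. */
struct subdev_track {
    int domain_id;
    int device_id;
    int users;          /* subdevices currently attached through this pair */
};

struct tracker {
    struct subdev_track entries[MAX_TRACKED];
};

/* Find the live entry for a pair, or NULL if the pair is not attached. */
static struct subdev_track *lookup(struct tracker *t, int domain_id, int device_id)
{
    assert(t != NULL);
    for (size_t i = 0; i < MAX_TRACKED; i++) {
        struct subdev_track *e = &t->entries[i];
        if (e->users > 0 && e->domain_id == domain_id && e->device_id == device_id)
            return e;
    }
    return NULL;
}

/* Returns 1 on the first attach (setup needed), 0 if the pair was already
 * set up, and -1 if the table is full. */
static int track_attach(struct tracker *t, int domain_id, int device_id)
{
    struct subdev_track *e = lookup(t, domain_id, device_id);
    if (e) {
        e->users++;
        return 0;
    }
    for (size_t i = 0; i < MAX_TRACKED; i++) {
        e = &t->entries[i];
        if (e->users == 0) {
            e->domain_id = domain_id;
            e->device_id = device_id;
            e->users = 1;
            return 1;
        }
    }
    return -1;
}

/* Returns 1 when the last user detached (teardown is now safe), 0 if other
 * users remain, and -1 if the pair was never attached. */
static int track_detach(struct tracker *t, int domain_id, int device_id)
{
    struct subdev_track *e = lookup(t, domain_id, device_id);
    if (!e)
        return -1;
    e->users--;
    return e->users == 0;
}
```

With such a count, a second detach for a pair that has already been torn down is reported as an error instead of touching freed state.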
      
      Fixes: 67b8e02b ("iommu/vt-d: Aux-domain specific domain attach/detach")
      Co-developed-by: Xin Zeng <xin.zeng@intel.com>
      Signed-off-by: Xin Zeng <xin.zeng@intel.com>
      Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
      Acked-by: Lu Baolu <baolu.lu@linux.intel.com>
      Link: https://lore.kernel.org/r/1609949037-25291-3-git-send-email-yi.l.liu@intel.com
      Signed-off-by: Will Deacon <will@kernel.org>
      Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
      Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • seccomp/cache: Report cache data through /proc/pid/seccomp_cache · 39be1ac0
      YiFei Zhu committed
      stable inclusion
      from stable-5.11-rc1
      commit 0d8315dd
      bugzilla: 167382
      CVE: N/A
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=0d8315dddd2899f519fe1ca3d4d5cdaf44ea421e
      
      -------------------------------------------------
      
      Currently the kernel does not provide an infrastructure to translate
      architecture numbers to a human-readable name. Translating syscall
      numbers to syscall names is possible through FTRACE_SYSCALL
      infrastructure but it does not provide support for compat syscalls.
      
      This will create a file for each PID as /proc/pid/seccomp_cache.
      The file will be empty when no seccomp filters are loaded, or be
      in the format of:
      <arch name> <decimal syscall number> <ALLOW | FILTER>
      where ALLOW means the cache is guaranteed to allow the syscall,
      and FILTER means the cache will pass the syscall to the BPF filter.
      
      For the docker default profile on x86_64 it looks like:
      x86_64 0 ALLOW
      x86_64 1 ALLOW
      x86_64 2 ALLOW
      x86_64 3 ALLOW
      [...]
      x86_64 132 ALLOW
      x86_64 133 ALLOW
      x86_64 134 FILTER
      x86_64 135 FILTER
      x86_64 136 FILTER
      x86_64 137 ALLOW
      x86_64 138 ALLOW
      x86_64 139 FILTER
      x86_64 140 ALLOW
      x86_64 141 ALLOW
      [...]
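
The line format above is simple enough to parse with `sscanf`. The following is a minimal user-space sketch, not code from the commit; `struct cache_entry` and `parse_cache_line` are hypothetical names, and the 31-character limit on the architecture name is an assumption made for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum cache_verdict { VERDICT_ALLOW, VERDICT_FILTER };

/* Hypothetical in-memory form of one seccomp_cache line. */
struct cache_entry {
    char arch[32];
    int syscall_nr;
    enum cache_verdict verdict;
};

/* Parse one "<arch name> <decimal syscall number> <ALLOW | FILTER>" line.
 * Returns 0 on success and -1 if the line does not match the format. */
static int parse_cache_line(const char *line, struct cache_entry *out)
{
    char verdict[8];

    assert(line != NULL && out != NULL);
    if (sscanf(line, "%31s %d %7s", out->arch, &out->syscall_nr, verdict) != 3)
        return -1;
    if (out->syscall_nr < 0)
        return -1;
    if (strcmp(verdict, "ALLOW") == 0)
        out->verdict = VERDICT_ALLOW;
    else if (strcmp(verdict, "FILTER") == 0)
        out->verdict = VERDICT_FILTER;
    else
        return -1;
    return 0;
}
```

A caller would read the file line by line (for example with `fgets`) and pass each line to `parse_cache_line`; elision markers such as `[...]` in the listing above are rejected rather than parsed.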
      
      This file is guarded by CONFIG_SECCOMP_CACHE_DEBUG with a default
      of N because I think certain users of seccomp might not want the
      application to know which syscalls are definitely usable. For
      the same reason, it is also guarded by CAP_SYS_ADMIN.

      Suggested-by: Jann Horn <jannh@google.com>
      Link: https://lore.kernel.org/lkml/CAG48ez3Ofqp4crXGksLmZY6=fGrF_tWyUCg7PBkAetvbbOPeOA@mail.gmail.com/
      Signed-off-by: YiFei Zhu <yifeifz2@illinois.edu>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Link: https://lore.kernel.org/r/94e663fa53136f5a11f432c661794d1ee7060779.1605101222.git.yifeifz2@illinois.edu
      Signed-off-by: GONG, Ruiqi <gongruiqi1@huawei.com>
      Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • mm/swap: fix pte_same_as_swp() not removing uffd-wp bit when compare · d568aeab
      Peter Xu committed
      stable inclusion
      from stable-5.10.46
      commit 12eb3c2c1a4f6e7c30de2aa0a09cb1b9e19fa9c0
      bugzilla: 168323
      CVE: NA
      
      --------------------------------
      
      commit 099dd687 upstream.
      
      I found by pure code review that pte_same_as_swp() of unuse_vma()
      didn't take the uffd-wp bit into account when comparing ptes.
      pte_same_as_swp() returning a false negative could cause swapoff to
      fail for swap ptes that were write-protected by userfaultfd.
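
The false negative can be reproduced with a toy model of the comparison. This is a sketch under stated assumptions: the bit positions `SWP_SOFT_DIRTY` and `SWP_UFFD_WP` are invented for the example and do not match any architecture, and the function names are hypothetical stand-ins for the kernel helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits, chosen only for this sketch. */
#define SWP_SOFT_DIRTY (UINT64_C(1) << 1)
#define SWP_UFFD_WP    (UINT64_C(1) << 2)

/* Strip every software flag that may be set on a swap pte. */
static uint64_t swp_clear_flags(uint64_t pte)
{
    return pte & ~(SWP_SOFT_DIRTY | SWP_UFFD_WP);
}

/* Buggy variant: only soft-dirty is ignored, so a uffd-wp swap pte never
 * compares equal to the bare swap entry. */
static bool same_as_swp_buggy(uint64_t pte, uint64_t swp_pte)
{
    return (pte & ~SWP_SOFT_DIRTY) == swp_pte;
}

/* Fixed variant: both flags are ignored before comparing. */
static bool same_as_swp_fixed(uint64_t pte, uint64_t swp_pte)
{
    /* The reference value is expected to be a bare swap entry. */
    assert((swp_pte & (SWP_SOFT_DIRTY | SWP_UFFD_WP)) == 0);
    return swp_clear_flags(pte) == swp_pte;
}
```

For an entry `0x1000` with the uffd-wp bit set, the buggy variant reports a mismatch while the fixed variant reports a match, which is the difference that decides whether swapoff finds the pte.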
      
      Link: https://lkml.kernel.org/r/20210603180546.9083-1-peterx@redhat.com
      Fixes: f45ec5ff ("userfaultfd: wp: support swap and page migration")
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Acked-by: Hugh Dickins <hughd@google.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: <stable@vger.kernel.org> [5.7+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • mm: relocate 'write_protect_seq' in struct mm_struct · c6746072
      Feng Tang committed
      stable inclusion
      from stable-5.10.46
      commit 103c4a08baec6723cf2d4999c873a1634f8d6bc0
      bugzilla: 168323
      CVE: NA
      
      --------------------------------
      
      [ Upstream commit 2e302543 ]
      
      0day robot reported a 9.2% regression for will-it-scale mmap1 test
      case[1], caused by commit 57efa1fe ("mm/gup: prevent gup_fast from
      racing with COW during fork").
      
      Further debugging shows that the regression occurs because that commit
      changes the offset of the hot field 'mmap_lock' inside structure
      'mm_struct', and with it the cache alignment.

      From the perf data, contention on 'mmap_lock' is very severe and takes
      around 95% of CPU cycles. 'mmap_lock' is a rw_semaphore:
      
              struct rw_semaphore {
                      atomic_long_t count;	/* 8 bytes */
                      atomic_long_t owner;	/* 8 bytes */
                      struct optimistic_spin_queue osq; /* spinner MCS lock */
                      ...
      
      Before commit 57efa1fe added 'write_protect_seq', the structure happened
      to have a near-optimal cache alignment layout, as Linus explained:
      
       "and before the addition of the 'write_protect_seq' field, the
        mmap_sem was at offset 120 in 'struct mm_struct'.
      
        Which meant that count and owner were in two different cachelines,
        and then when you have contention and spend time in
        rwsem_down_write_slowpath(), this is probably *exactly* the kind
        of layout you want.
      
        Because first the rwsem_write_trylock() will do a cmpxchg on the
        first cacheline (for the optimistic fast-path), and then in the
        case of contention, rwsem_down_write_slowpath() will just access
        the second cacheline.
      
        Which is probably just optimal for a load that spends a lot of
        time contended - new waiters touch that first cacheline, and then
        they queue themselves up on the second cacheline."
      
      After the commit, the rw_semaphore is at offset 128, which means the
      'count' and 'owner' fields are now in the same cacheline, which causes
      more cache bouncing.
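
The offsets quoted above can be checked with simple arithmetic. The sketch below assumes a 64-byte cache line and the 8-byte `count` field immediately followed by `owner`, as in the struct excerpt; the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define CACHELINE_BYTES 64  /* assumed cache line size */
#define COUNT_BYTES 8       /* size of 'count'; 'owner' follows directly */

/* Index of the cache line that contains the given byte offset. */
static size_t cacheline_of(size_t offset)
{
    return offset / CACHELINE_BYTES;
}

/* Nonzero if 'count' and 'owner' of a rw_semaphore placed at lock_offset
 * land in the same cache line. */
static int count_owner_share_line(size_t lock_offset)
{
    /* An 8-byte atomic is expected to be naturally aligned. */
    assert(lock_offset % COUNT_BYTES == 0);
    return cacheline_of(lock_offset) == cacheline_of(lock_offset + COUNT_BYTES);
}
```

At offset 120, `count` occupies bytes 120 to 127 (line 1) and `owner` starts at byte 128 (line 2), so the two fields are split. At offset 128 both fields fall in line 2, which matches the layout change described in the message.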
      
      Currently there are 3 "#ifdef CONFIG_XXX" before 'mmap_lock' which will
      affect its offset:
      
        CONFIG_MMU
        CONFIG_MEMBARRIER
        CONFIG_HAVE_ARCH_COMPAT_MMAP_BASES
      
      The layout above is for a 64-bit system with 0day's default kernel config
      (similar to RHEL-8.3's config), in which all 3 of these options are 'y'.
      The layout can vary with different kernel configs.

      Changing the layout of a structure is usually a double-edged sword, as it
      can help one case but hurt others.  For this case, one solution relies on
      the newly added 'write_protect_seq' being a 4-byte seqcount_t (when
      CONFIG_DEBUG_LOCK_ALLOC=n): placing it into an existing 4-byte hole in
      'mm_struct' does not change the alignment of other fields, and it fixes
      the regression.
      
      Link: https://lore.kernel.org/lkml/20210525031636.GB7744@xsang-OptiPlex-9020/ [1]
      Reported-by: kernel test robot <oliver.sang@intel.com>
      Signed-off-by: Feng Tang <feng.tang@intel.com>
      Reviewed-by: John Hubbard <jhubbard@nvidia.com>
      Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
      Cc: Peter Xu <peterx@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • regulator: bd70528: Fix off-by-one for buck123 .n_voltages setting · f66b9197
      Axel Lin committed
      stable inclusion
      from stable-5.10.46
      commit 0609c36696e7668d265c29ee88bad079201f700f
      bugzilla: 168323
      CVE: NA
      
      --------------------------------
      
      [ Upstream commit 0514582a ]
      
      The valid selectors for bd70528 bucks are 0 ~ 0xf, so the .n_voltages
      should be 16 (0x10). Use 0x10 to make it consistent with BD70528_LDO_VOLTS.
      Also remove redundant defines for BD70528_BUCK_VOLTS.

      Signed-off-by: Axel Lin <axel.lin@ingics.com>
      Acked-by: Matti Vaittinen <matti.vaittinen@fi.rohmeurope.com>
      Link: https://lore.kernel.org/r/20210523071045.2168904-1-axel.lin@ingics.com
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • ptp: improve max_adj check against unreasonable values · c8b68d15
      Jakub Kicinski committed
      stable inclusion
      from stable-5.10.46
      commit 9a479495629246c5dcfec55f7f425f5149f29ac0
      bugzilla: 168323
      CVE: NA
      
      --------------------------------
      
      [ Upstream commit 475b92f9 ]
      
      Scaled PPM conversion to PPB may (on 64bit systems) result
      in a value larger than s32 can hold (freq/scaled_ppm is a long).
      This means the kernel will not correctly reject unreasonably
      high ->freq values (e.g. > 4294967295ppb, 281474976645 scaled PPM).
      
      The conversion is equivalent to a division by ~66 (65.536),
      so the value of ppb is always smaller than ppm, but not small
      enough to assume narrowing the type from long -> s32 is okay.
      
      Note that reasonable user space (e.g. ptp4l) will not use such
      high values, anyway, 4289046510ppb ~= 4.3x, so the fix is
      somewhat pedantic.
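
The narrowing problem can be shown in user space. The sketch below assumes the conversion is done as "add 1, multiply by 125, shift right by 13" (1000/65536 equals 125/8192); the function names and the `max_adj` value used with them are illustrative, not the driver's code. Converting an out-of-range value to `int32_t` and right-shifting a negative value are implementation-defined in C; common compilers wrap and shift arithmetically.

```c
#include <assert.h>
#include <stdint.h>

/* Scaled ppm carries 16 fractional bits, so
 * ppb = scaled_ppm * 1000 / 65536 = scaled_ppm * 125 / 8192. */
static int64_t scaled_ppm_to_ppb(int64_t scaled_ppm)
{
    int64_t ppb = 1 + scaled_ppm;

    ppb *= 125;
    ppb >>= 13;
    return ppb;
}

/* Range check performed in 64 bits: large requests are rejected. */
static int freq_out_of_range(int64_t scaled_ppm, int32_t max_adj)
{
    int64_t ppb = scaled_ppm_to_ppb(scaled_ppm);

    assert(max_adj >= 0);
    return ppb > max_adj || ppb < -(int64_t)max_adj;
}

/* Narrowing variant, kept only to illustrate the problem: the value is
 * truncated to 32 bits before it is compared. */
static int freq_out_of_range_narrowed(int64_t scaled_ppm, int32_t max_adj)
{
    int32_t ppb = (int32_t)scaled_ppm_to_ppb(scaled_ppm);

    assert(max_adj >= 0);
    return ppb > max_adj || ppb < -max_adj;
}
```

For 281474976645 scaled ppm the conversion yields 4294967295 ppb, which no longer fits in 32 bits, so a check on the truncated value can accept a request that the 64-bit check rejects.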
      
      Fixes: d39a7435 ("ptp: validate the requested frequency adjustment.")
      Fixes: d94ba80e ("ptp: Added a brand new class driver for ptp clocks.")
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Acked-by: Richard Cochran <richardcochran@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • net: make get_net_ns return error if NET_NS is disabled · 991bf60b
      Changbin Du committed
      stable inclusion
      from stable-5.10.46
      commit 4abfd597fe60bfa677bfe177e3a6a551e3a3f792
      bugzilla: 168323
      CVE: NA
      
      --------------------------------
      
      [ Upstream commit ea6932d7 ]
      
      There is a panic in socket ioctl cmd SIOCGSKNS when NET_NS is not enabled.
      The reason is that nsfs tries to access ns->ops but the proc_ns_operations
      is not implemented in this case.
      
      [7.670023] Unable to handle kernel NULL pointer dereference at virtual address 00000010
      [7.670268] pgd = 32b54000
      [7.670544] [00000010] *pgd=00000000
      [7.671861] Internal error: Oops: 5 [#1] SMP ARM
      [7.672315] Modules linked in:
      [7.672918] CPU: 0 PID: 1 Comm: systemd Not tainted 5.13.0-rc3-00375-g6799d4f2 #16
      [7.673309] Hardware name: Generic DT based system
      [7.673642] PC is at nsfs_evict+0x24/0x30
      [7.674486] LR is at clear_inode+0x20/0x9c
      
      The same applies to the tun SIOCGSKNS command.

      To fix this problem, we make get_net_ns() return -EINVAL when NET_NS is
      disabled. Also move it to its proper place, net/core/net_namespace.c.
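
The error-return approach can be sketched in user space with stand-ins for the kernel's error-pointer helpers. Everything here is an assumption made for illustration: `err_ptr`, `is_err`, `ptr_err` and `get_net_ns_sketch` are hypothetical names, and the `net_ns_enabled` argument replaces what would be a compile-time CONFIG_NET_NS choice.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* User-space stand-ins for encoding an errno value in a pointer. */
static void *err_ptr(long err)
{
    return (void *)(intptr_t)err;
}

static int is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

static long ptr_err(const void *p)
{
    return (long)(intptr_t)p;
}

struct ns_common {
    int unused;
};

/* With namespaces disabled, report -EINVAL instead of handing back a
 * namespace object that has no operations behind it. */
static struct ns_common *get_net_ns_sketch(struct ns_common *ns, int net_ns_enabled)
{
    if (!net_ns_enabled)
        return err_ptr(-EINVAL);
    assert(ns != NULL);
    return ns;
}
```

A caller checks the result with `is_err()` before using it and propagates `ptr_err()` to user space, so the ioctl fails cleanly instead of dereferencing missing operations.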

      Signed-off-by: Changbin Du <changbin.du@gmail.com>
      Fixes: c62cce2c ("net: add an ioctl to get a socket network namespace")
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: David Laight <David.Laight@ACULAB.COM>
      Cc: Christian Brauner <christian.brauner@ubuntu.com>
      Suggested-by: Jakub Kicinski <kuba@kernel.org>
      Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • net/mlx5e: Fix page reclaim for dead peer hairpin · c17cc0e6
      Dima Chumak committed
      stable inclusion
      from stable-5.10.46
      commit be7f3f401d224e1efe8112b2fa8b837eeb8c5e52
      bugzilla: 168323
      CVE: NA
      
      --------------------------------
      
      [ Upstream commit a3e5fd93 ]
      
      When adding a hairpin flow, a firmware-side send queue is created for
      the peer net device, which claims some host memory pages for its
      internal ring buffer. If the peer net device is removed/unbound before
      the hairpin flow is deleted, then the send queue is not destroyed which
      leads to a stack trace on pci device remove:
      
      [ 748.005230] mlx5_core 0000:08:00.2: wait_func:1094:(pid 12985): MANAGE_PAGES(0x108) timeout. Will cause a leak of a command resource
      [ 748.005231] mlx5_core 0000:08:00.2: reclaim_pages:514:(pid 12985): failed reclaiming pages: err -110
      [ 748.001835] mlx5_core 0000:08:00.2: mlx5_reclaim_root_pages:653:(pid 12985): failed reclaiming pages (-110) for func id 0x0
      [ 748.002171] ------------[ cut here ]------------
      [ 748.001177] FW pages counter is 4 after reclaiming all pages
      [ 748.001186] WARNING: CPU: 1 PID: 12985 at drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c:685 mlx5_reclaim_startup_pages+0x34b/0x460 [mlx5_core]
      [  +0.002771] Modules linked in: cls_flower mlx5_ib mlx5_core ptp pps_core act_mirred sch_ingress openvswitch nsh xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi rdma_cm ib_umad ib_ipoib iw_cm ib_cm ib_uverbs ib_core overlay fuse [last unloaded: pps_core]
      [ 748.007225] CPU: 1 PID: 12985 Comm: tee Not tainted 5.12.0+ #1
      [ 748.001376] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
      [ 748.002315] RIP: 0010:mlx5_reclaim_startup_pages+0x34b/0x460 [mlx5_core]
      [ 748.001679] Code: 28 00 00 00 0f 85 22 01 00 00 48 81 c4 b0 00 00 00 31 c0 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 c7 c7 40 cc 19 a1 e8 9f 71 0e e2 <0f> 0b e9 30 ff ff ff 48 c7 c7 a0 cc 19 a1 e8 8c 71 0e e2 0f 0b e9
      [ 748.003781] RSP: 0018:ffff88815220faf8 EFLAGS: 00010286
      [ 748.001149] RAX: 0000000000000000 RBX: ffff8881b4900280 RCX: 0000000000000000
      [ 748.001445] RDX: 0000000000000027 RSI: 0000000000000004 RDI: ffffed102a441f51
      [ 748.001614] RBP: 00000000000032b9 R08: 0000000000000001 R09: ffffed1054a15ee8
      [ 748.001446] R10: ffff8882a50af73b R11: ffffed1054a15ee7 R12: fffffbfff07c1e30
      [ 748.001447] R13: dffffc0000000000 R14: ffff8881b492cba8 R15: 0000000000000000
      [ 748.001429] FS:  00007f58bd08b580(0000) GS:ffff8882a5080000(0000) knlGS:0000000000000000
      [ 748.001695] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 748.001309] CR2: 000055a026351740 CR3: 00000001d3b48006 CR4: 0000000000370ea0
      [ 748.001506] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [ 748.001483] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [ 748.001654] Call Trace:
      [ 748.000576]  ? mlx5_satisfy_startup_pages+0x290/0x290 [mlx5_core]
      [ 748.001416]  ? mlx5_cmd_teardown_hca+0xa2/0xd0 [mlx5_core]
      [ 748.001354]  ? mlx5_cmd_init_hca+0x280/0x280 [mlx5_core]
      [ 748.001203]  mlx5_function_teardown+0x30/0x60 [mlx5_core]
      [ 748.001275]  mlx5_uninit_one+0xa7/0xc0 [mlx5_core]
      [ 748.001200]  remove_one+0x5f/0xc0 [mlx5_core]
      [ 748.001075]  pci_device_remove+0x9f/0x1d0
      [ 748.000833]  device_release_driver_internal+0x1e0/0x490
      [ 748.001207]  unbind_store+0x19f/0x200
      [ 748.000942]  ? sysfs_file_ops+0x170/0x170
      [ 748.001000]  kernfs_fop_write_iter+0x2bc/0x450
      [ 748.000970]  new_sync_write+0x373/0x610
      [ 748.001124]  ? new_sync_read+0x600/0x600
      [ 748.001057]  ? lock_acquire+0x4d6/0x700
      [ 748.000908]  ? lockdep_hardirqs_on_prepare+0x400/0x400
      [ 748.001126]  ? fd_install+0x1c9/0x4d0
      [ 748.000951]  vfs_write+0x4d0/0x800
      [ 748.000804]  ksys_write+0xf9/0x1d0
      [ 748.000868]  ? __x64_sys_read+0xb0/0xb0
      [ 748.000811]  ? filp_open+0x50/0x50
      [ 748.000919]  ? syscall_enter_from_user_mode+0x1d/0x50
      [ 748.001223]  do_syscall_64+0x3f/0x80
      [ 748.000892]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      [ 748.001026] RIP: 0033:0x7f58bcfb22f7
      [ 748.000944] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
      [ 748.003925] RSP: 002b:00007fffd7f2aaa8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      [ 748.001732] RAX: ffffffffffffffda RBX: 000000000000000d RCX: 00007f58bcfb22f7
      [ 748.001426] RDX: 000000000000000d RSI: 00007fffd7f2abc0 RDI: 0000000000000003
      [ 748.001746] RBP: 00007fffd7f2abc0 R08: 0000000000000000 R09: 0000000000000001
      [ 748.001631] R10: 00000000000001b6 R11: 0000000000000246 R12: 000000000000000d
      [ 748.001537] R13: 00005597ac2c24a0 R14: 000000000000000d R15: 00007f58bd084700
      [ 748.001564] irq event stamp: 0
      [ 748.000787] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
      [ 748.001399] hardirqs last disabled at (0): [<ffffffff813132cf>] copy_process+0x146f/0x5eb0
      [ 748.001854] softirqs last  enabled at (0): [<ffffffff8131330e>] copy_process+0x14ae/0x5eb0
      [ 748.013431] softirqs last disabled at (0): [<0000000000000000>] 0x0
      [ 748.001492] ---[ end trace a6fabd773d1c51ae ]---
      
      Fix by destroying the send queue of a hairpin peer net device that is
      being removed/unbound, which returns the allocated ring buffer pages to
      the host.
      
      Fixes: 4d8fcf21 ("net/mlx5e: Avoid unbounded peer devices when unpairing TC hairpin rules")
      Signed-off-by: Dima Chumak <dchumak@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • blk-mq: add new API of blk_mq_hctx_set_fq_lock_class · 5476d7ce
      Ming Lei committed
      mainline inclusion
      from mainline-5.11-rc1
      commit fb01a293
      category: bugfix
      bugzilla: 108493
      CVE: NA
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fb01a2932e81a1fb2273f87ff92dc8172b8880ee
      
      ---------------------------
      
      flush_end_io() may be called recursively from some driver, such as
      nvme-loop, so lockdep may complain 'possible recursive locking'.
      Commit b3c6a599 ("block: Fix a lockdep complaint triggered by
      request queue flushing") tried to address this issue by assigning a
      dynamically allocated per-flush-queue lock class. This solution
      adds synchronize_rcu() to each hctx's release handler, and causes a
      horrible SCSI MQ probe delay (more than half an hour on megaraid sas).
      
      Add a new API, blk_mq_hctx_set_fq_lock_class(), for these drivers, so
      that a driver-specific lock class is all that is needed to avoid the
      lockdep warning of 'possible recursive locking'.

      Tested-by: Kashyap Desai <kashyap.desai@broadcom.com>
      Reported-by: Qian Cai <cai@redhat.com>
      Cc: Sumit Saxena <sumit.saxena@broadcom.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kashyap Desai <kashyap.desai@broadcom.com>
      Cc: Bart Van Assche <bvanassche@acm.org>
      Cc: Hannes Reinecke <hare@suse.de>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Reviewed-by: Hannes Reinecke <hare@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: yangerkun <yangerkun@huawei.com>
      Reviewed-by: Jason Yan <yanaijie@huawei.com>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
  3. 03 Jul, 2021 13 commits
  4. 15 Jun, 2021 7 commits
  5. 04 Jun, 2021 9 commits