1. 10 6月, 2021 2 次提交
  2. 27 5月, 2021 4 次提交
  3. 26 5月, 2021 2 次提交
    • T
      SUNRPC: More fixes for backlog congestion · e86be3a0
      Trond Myklebust 提交于
      Ensure that we fix the XPRT_CONGESTED starvation issue for RDMA as well
      as socket based transports.
      Ensure we always initialise the request after waking up from the backlog
      list.
      
      Fixes: e877a88d ("SUNRPC in case of backlog, hand free slots directly to waiting task")
      Signed-off-by: NTrond Myklebust <trond.myklebust@hammerspace.com>
      e86be3a0
    • V
      net: zero-initialize tc skb extension on allocation · 9453d45e
      Vlad Buslov 提交于
      Function skb_ext_add() doesn't initialize created skb extension with any
      value and leaves it up to the user. However, since extension of type
      TC_SKB_EXT originally contained only single value tc_skb_ext->chain its
      users used to just assign the chain value without setting whole extension
      memory to zero first. This assumption changed when TC_SKB_EXT extension was
      extended with additional fields but not all users were updated to
      initialize the new fields which leads to use of uninitialized memory
      afterwards. UBSAN log:
      
      [  778.299821] UBSAN: invalid-load in net/openvswitch/flow.c:899:28
      [  778.301495] load of value 107 is not a valid value for type '_Bool'
      [  778.303215] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.12.0-rc7+ #2
      [  778.304933] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
      [  778.307901] Call Trace:
      [  778.308680]  <IRQ>
      [  778.309358]  dump_stack+0xbb/0x107
      [  778.310307]  ubsan_epilogue+0x5/0x40
      [  778.311167]  __ubsan_handle_load_invalid_value.cold+0x43/0x48
      [  778.312454]  ? memset+0x20/0x40
      [  778.313230]  ovs_flow_key_extract.cold+0xf/0x14 [openvswitch]
      [  778.314532]  ovs_vport_receive+0x19e/0x2e0 [openvswitch]
      [  778.315749]  ? ovs_vport_find_upcall_portid+0x330/0x330 [openvswitch]
      [  778.317188]  ? create_prof_cpu_mask+0x20/0x20
      [  778.318220]  ? arch_stack_walk+0x82/0xf0
      [  778.319153]  ? secondary_startup_64_no_verify+0xb0/0xbb
      [  778.320399]  ? stack_trace_save+0x91/0xc0
      [  778.321362]  ? stack_trace_consume_entry+0x160/0x160
      [  778.322517]  ? lock_release+0x52e/0x760
      [  778.323444]  netdev_frame_hook+0x323/0x610 [openvswitch]
      [  778.324668]  ? ovs_netdev_get_vport+0xe0/0xe0 [openvswitch]
      [  778.325950]  __netif_receive_skb_core+0x771/0x2db0
      [  778.327067]  ? lock_downgrade+0x6e0/0x6f0
      [  778.328021]  ? lock_acquire+0x565/0x720
      [  778.328940]  ? generic_xdp_tx+0x4f0/0x4f0
      [  778.329902]  ? inet_gro_receive+0x2a7/0x10a0
      [  778.330914]  ? lock_downgrade+0x6f0/0x6f0
      [  778.331867]  ? udp4_gro_receive+0x4c4/0x13e0
      [  778.332876]  ? lock_release+0x52e/0x760
      [  778.333808]  ? dev_gro_receive+0xcc8/0x2380
      [  778.334810]  ? lock_downgrade+0x6f0/0x6f0
      [  778.335769]  __netif_receive_skb_list_core+0x295/0x820
      [  778.336955]  ? process_backlog+0x780/0x780
      [  778.337941]  ? mlx5e_rep_tc_netdevice_event_unregister+0x20/0x20 [mlx5_core]
      [  778.339613]  ? seqcount_lockdep_reader_access.constprop.0+0xa7/0xc0
      [  778.341033]  ? kvm_clock_get_cycles+0x14/0x20
      [  778.342072]  netif_receive_skb_list_internal+0x5f5/0xcb0
      [  778.343288]  ? __kasan_kmalloc+0x7a/0x90
      [  778.344234]  ? mlx5e_handle_rx_cqe_mpwrq+0x9e0/0x9e0 [mlx5_core]
      [  778.345676]  ? mlx5e_xmit_xdp_frame_mpwqe+0x14d0/0x14d0 [mlx5_core]
      [  778.347140]  ? __netif_receive_skb_list_core+0x820/0x820
      [  778.348351]  ? mlx5e_post_rx_mpwqes+0xa6/0x25d0 [mlx5_core]
      [  778.349688]  ? napi_gro_flush+0x26c/0x3c0
      [  778.350641]  napi_complete_done+0x188/0x6b0
      [  778.351627]  mlx5e_napi_poll+0x373/0x1b80 [mlx5_core]
      [  778.352853]  __napi_poll+0x9f/0x510
      [  778.353704]  ? mlx5_flow_namespace_set_mode+0x260/0x260 [mlx5_core]
      [  778.355158]  net_rx_action+0x34c/0xa40
      [  778.356060]  ? napi_threaded_poll+0x3d0/0x3d0
      [  778.357083]  ? sched_clock_cpu+0x18/0x190
      [  778.358041]  ? __common_interrupt+0x8e/0x1a0
      [  778.359045]  __do_softirq+0x1ce/0x984
      [  778.359938]  __irq_exit_rcu+0x137/0x1d0
      [  778.360865]  irq_exit_rcu+0xa/0x20
      [  778.361708]  common_interrupt+0x80/0xa0
      [  778.362640]  </IRQ>
      [  778.363212]  asm_common_interrupt+0x1e/0x40
      [  778.364204] RIP: 0010:native_safe_halt+0xe/0x10
      [  778.365273] Code: 4f ff ff ff 4c 89 e7 e8 50 3f 40 fe e9 dc fe ff ff 48 89 df e8 43 3f 40 fe eb 90 cc e9 07 00 00 00 0f 00 2d 74 05 62 00 fb f4 <c3> 90 e9 07 00 00 00 0f 00 2d 64 05 62 00 f4 c3 cc cc 0f 1f 44 00
      [  778.369355] RSP: 0018:ffffffff84407e48 EFLAGS: 00000246
      [  778.370570] RAX: ffff88842de46a80 RBX: ffffffff84425840 RCX: ffffffff83418468
      [  778.372143] RDX: 000000000026f1da RSI: 0000000000000004 RDI: ffffffff8343af5e
      [  778.373722] RBP: fffffbfff0884b08 R08: 0000000000000000 R09: ffff88842de46bcb
      [  778.375292] R10: ffffed1085bc8d79 R11: 0000000000000001 R12: 0000000000000000
      [  778.376860] R13: ffffffff851124a0 R14: 0000000000000000 R15: dffffc0000000000
      [  778.378491]  ? rcu_eqs_enter.constprop.0+0xb8/0xe0
      [  778.379606]  ? default_idle_call+0x5e/0xe0
      [  778.380578]  default_idle+0xa/0x10
      [  778.381406]  default_idle_call+0x96/0xe0
      [  778.382350]  do_idle+0x3d4/0x550
      [  778.383153]  ? arch_cpu_idle_exit+0x40/0x40
      [  778.384143]  cpu_startup_entry+0x19/0x20
      [  778.385078]  start_kernel+0x3c7/0x3e5
      [  778.385978]  secondary_startup_64_no_verify+0xb0/0xbb
      
      Fix the issue by providing new function tc_skb_ext_alloc() that allocates
      tc skb extension and initializes its memory to 0 before returning it to the
      caller. Change all existing users to use new API instead of calling
      skb_ext_add() directly.
      
      Fixes: 038ebb1a ("net/sched: act_ct: fix miss set mru for ovs after defrag in act_ct")
      Fixes: d29334c1 ("net/sched: act_api: fix miss set post_ct for ovs after do conntrack in act_ct")
      Signed-off-by: NVlad Buslov <vladbu@nvidia.com>
      Acked-by: NCong Wang <cong.wang@bytedance.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9453d45e
  4. 25 5月, 2021 2 次提交
  5. 23 5月, 2021 1 次提交
  6. 22 5月, 2021 1 次提交
  7. 20 5月, 2021 1 次提交
  8. 19 5月, 2021 8 次提交
  9. 18 5月, 2021 1 次提交
    • D
      NFC: nci: fix memory leak in nci_allocate_device · e0652f8b
      Dongliang Mu 提交于
      nfcmrvl_disconnect fails to free the hci_dev field in struct nci_dev.
      Fix this by freeing hci_dev in nci_free_device.
      
      BUG: memory leak
      unreferenced object 0xffff888111ea6800 (size 1024):
        comm "kworker/1:0", pid 19, jiffies 4294942308 (age 13.580s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 00 60 fd 0c 81 88 ff ff  .........`......
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<000000004bc25d43>] kmalloc include/linux/slab.h:552 [inline]
          [<000000004bc25d43>] kzalloc include/linux/slab.h:682 [inline]
          [<000000004bc25d43>] nci_hci_allocate+0x21/0xd0 net/nfc/nci/hci.c:784
          [<00000000c59cff92>] nci_allocate_device net/nfc/nci/core.c:1170 [inline]
          [<00000000c59cff92>] nci_allocate_device+0x10b/0x160 net/nfc/nci/core.c:1132
          [<00000000006e0a8e>] nfcmrvl_nci_register_dev+0x10a/0x1c0 drivers/nfc/nfcmrvl/main.c:153
          [<000000004da1b57e>] nfcmrvl_probe+0x223/0x290 drivers/nfc/nfcmrvl/usb.c:345
          [<00000000d506aed9>] usb_probe_interface+0x177/0x370 drivers/usb/core/driver.c:396
          [<00000000bc632c92>] really_probe+0x159/0x4a0 drivers/base/dd.c:554
          [<00000000f5009125>] driver_probe_device+0x84/0x100 drivers/base/dd.c:740
          [<000000000ce658ca>] __device_attach_driver+0xee/0x110 drivers/base/dd.c:846
          [<000000007067d05f>] bus_for_each_drv+0xb7/0x100 drivers/base/bus.c:431
          [<00000000f8e13372>] __device_attach+0x122/0x250 drivers/base/dd.c:914
          [<000000009cf68860>] bus_probe_device+0xc6/0xe0 drivers/base/bus.c:491
          [<00000000359c965a>] device_add+0x5be/0xc30 drivers/base/core.c:3109
          [<00000000086e4bd3>] usb_set_configuration+0x9d9/0xb90 drivers/usb/core/message.c:2164
          [<00000000ca036872>] usb_generic_driver_probe+0x8c/0xc0 drivers/usb/core/generic.c:238
          [<00000000d40d36f6>] usb_probe_device+0x5c/0x140 drivers/usb/core/driver.c:293
          [<00000000bc632c92>] really_probe+0x159/0x4a0 drivers/base/dd.c:554
      
      Reported-by: syzbot+19bcfc64a8df1318d1c3@syzkaller.appspotmail.com
      Fixes: 11f54f22 ("NFC: nci: Add HCI over NCI protocol support")
      Signed-off-by: NDongliang Mu <mudongliangabcd@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e0652f8b
  10. 15 5月, 2021 5 次提交
    • M
      mm/filemap: fix readahead return types · 076171a6
      Matthew Wilcox (Oracle) 提交于
      A readahead request will not allocate more memory than can be represented
      by a size_t, even on systems that have HIGHMEM available.  Change the
      length functions from returning an loff_t to a size_t.
      
      Link: https://lkml.kernel.org/r/20210510201201.1558972-1-willy@infradead.org
      Fixes: 32c0a6bc ("btrfs: add and use readahead_batch_length")
      Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
      Reviewed-by: NDarrick J. Wong <djwong@kernel.org>
      Reported-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      076171a6
    • M
      mm: fix struct page layout on 32-bit systems · 9ddb3c14
      Matthew Wilcox (Oracle) 提交于
      32-bit architectures which expect 8-byte alignment for 8-byte integers and
      need 64-bit DMA addresses (arm, mips, ppc) had their struct page
      inadvertently expanded in 2019.  When the dma_addr_t was added, it forced
      the alignment of the union to 8 bytes, which inserted a 4 byte gap between
      'flags' and the union.
      
      Fix this by storing the dma_addr_t in one or two adjacent unsigned longs.
      This restores the alignment to that of an unsigned long.  We always
      store the low bits in the first word to prevent the PageTail bit from
      being inadvertently set on a big endian platform.  If that happened,
      get_user_pages_fast() racing against a page which was freed and
      reallocated to the page_pool could dereference a bogus compound_head(),
      which would be hard to trace back to this cause.
      
      Link: https://lkml.kernel.org/r/20210510153211.1504886-1-willy@infradead.org
      Fixes: c25fff71 ("mm: add dma_addr_t to struct page")
      Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
      Acked-by: NIlias Apalodimas <ilias.apalodimas@linaro.org>
      Acked-by: NJesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Tested-by: NMatteo Croce <mcroce@linux.microsoft.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9ddb3c14
    • P
      mm/hugetlb: fix F_SEAL_FUTURE_WRITE · 22247efd
      Peter Xu 提交于
      Patch series "mm/hugetlb: Fix issues on file sealing and fork", v2.
      
      Hugh reported issue with F_SEAL_FUTURE_WRITE not applied correctly to
      hugetlbfs, which I can easily verify using the memfd_test program, which
      seems that the program is hardly run with hugetlbfs pages (as by default
      shmem).
      
      Meanwhile I found another probably even more severe issue on that hugetlb
      fork won't wr-protect child cow pages, so child can potentially write to
      parent private pages.  Patch 2 addresses that.
      
      After this series applied, "memfd_test hugetlbfs" should start to pass.
      
      This patch (of 2):
      
      F_SEAL_FUTURE_WRITE is missing for hugetlb starting from the first day.
      There is a test program for that and it fails constantly.
      
      $ ./memfd_test hugetlbfs
      memfd-hugetlb: CREATE
      memfd-hugetlb: BASIC
      memfd-hugetlb: SEAL-WRITE
      memfd-hugetlb: SEAL-FUTURE-WRITE
      mmap() didn't fail as expected
      Aborted (core dumped)
      
      I think it's probably because no one is really running the hugetlbfs test.
      
      Fix it by checking FUTURE_WRITE also in hugetlbfs_file_mmap() as what we
      do in shmem_mmap().  Generalize a helper for that.
      
      Link: https://lkml.kernel.org/r/20210503234356.9097-1-peterx@redhat.com
      Link: https://lkml.kernel.org/r/20210503234356.9097-2-peterx@redhat.com
      Fixes: ab3948f5 ("mm/memfd: add an F_SEAL_FUTURE_WRITE seal to memfd")
      Signed-off-by: NPeter Xu <peterx@redhat.com>
      Reported-by: NHugh Dickins <hughd@google.com>
      Reviewed-by: NMike Kravetz <mike.kravetz@oracle.com>
      Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      22247efd
    • Y
      net: sched: fix tx action rescheduling issue during deactivation · 102b55ee
      Yunsheng Lin 提交于
      Currently qdisc_run() checks the STATE_DEACTIVATED of lockless
      qdisc before calling __qdisc_run(), which ultimately clear the
      STATE_MISSED when all the skb is dequeued. If STATE_DEACTIVATED
      is set before clearing STATE_MISSED, there may be rescheduling
      of net_tx_action() at the end of qdisc_run_end(), see below:
      
      CPU0(net_tx_atcion)  CPU1(__dev_xmit_skb)  CPU2(dev_deactivate)
                .                   .                     .
                .            set STATE_MISSED             .
                .           __netif_schedule()            .
                .                   .           set STATE_DEACTIVATED
                .                   .                qdisc_reset()
                .                   .                     .
                .<---------------   .              synchronize_net()
      clear __QDISC_STATE_SCHED  |  .                     .
                .                |  .                     .
                .                |  .            some_qdisc_is_busy()
                .                |  .               return *false*
                .                |  .                     .
        test STATE_DEACTIVATED   |  .                     .
      __qdisc_run() *not* called |  .                     .
                .                |  .                     .
         test STATE_MISS         |  .                     .
       __netif_schedule()--------|  .                     .
                .                   .                     .
                .                   .                     .
      
      __qdisc_run() is not called by net_tx_atcion() in CPU0 because
      CPU2 has set STATE_DEACTIVATED flag during dev_deactivate(), and
      STATE_MISSED is only cleared in __qdisc_run(), __netif_schedule
      is called at the end of qdisc_run_end(), causing tx action
      rescheduling problem.
      
      qdisc_run() called by net_tx_action() runs in the softirq context,
      which should has the same semantic as the qdisc_run() called by
      __dev_xmit_skb() protected by rcu_read_lock_bh(). And there is a
      synchronize_net() between STATE_DEACTIVATED flag being set and
      qdisc_reset()/some_qdisc_is_busy in dev_deactivate(), we can safely
      bail out for the deactived lockless qdisc in net_tx_action(), and
      qdisc_reset() will reset all skb not dequeued yet.
      
      So add the rcu_read_lock() explicitly to protect the qdisc_run()
      and do the STATE_DEACTIVATED checking in net_tx_action() before
      calling qdisc_run_begin(). Another option is to do the checking in
      the qdisc_run_end(), but it will add unnecessary overhead for
      non-tx_action case, because __dev_queue_xmit() will not see qdisc
      with STATE_DEACTIVATED after synchronize_net(), the qdisc with
      STATE_DEACTIVATED can only be seen by net_tx_action() because of
      __netif_schedule().
      
      The STATE_DEACTIVATED checking in qdisc_run() is to avoid race
      between net_tx_action() and qdisc_reset(), see:
      commit d518d2ed ("net/sched: fix race between deactivation
      and dequeue for NOLOCK qdisc"). As the bailout added above for
      deactived lockless qdisc in net_tx_action() provides better
      protection for the race without calling qdisc_run() at all, so
      remove the STATE_DEACTIVATED checking in qdisc_run().
      
      After qdisc_reset(), there is no skb in qdisc to be dequeued, so
      clear the STATE_MISSED in dev_reset_queue() too.
      
      Fixes: 6b3ba914 ("net: sched: allow qdiscs to handle locking")
      Acked-by: NJakub Kicinski <kuba@kernel.org>
      Signed-off-by: NYunsheng Lin <linyunsheng@huawei.com>
      V8: Clearing STATE_MISSED before calling __netif_schedule() has
          avoid the endless rescheduling problem, but there may still
          be a unnecessary rescheduling, so adjust the commit log.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      102b55ee
    • Y
      net: sched: fix packet stuck problem for lockless qdisc · a90c57f2
      Yunsheng Lin 提交于
      Lockless qdisc has below concurrent problem:
          cpu0                 cpu1
           .                     .
      q->enqueue                 .
           .                     .
      qdisc_run_begin()          .
           .                     .
      dequeue_skb()              .
           .                     .
      sch_direct_xmit()          .
           .                     .
           .                q->enqueue
           .             qdisc_run_begin()
           .            return and do nothing
           .                     .
      qdisc_run_end()            .
      
      cpu1 enqueue a skb without calling __qdisc_run() because cpu0
      has not released the lock yet and spin_trylock() return false
      for cpu1 in qdisc_run_begin(), and cpu0 do not see the skb
      enqueued by cpu1 when calling dequeue_skb() because cpu1 may
      enqueue the skb after cpu0 calling dequeue_skb() and before
      cpu0 calling qdisc_run_end().
      
      Lockless qdisc has below another concurrent problem when
      tx_action is involved:
      
      cpu0(serving tx_action)     cpu1             cpu2
                .                   .                .
                .              q->enqueue            .
                .            qdisc_run_begin()       .
                .              dequeue_skb()         .
                .                   .            q->enqueue
                .                   .                .
                .             sch_direct_xmit()      .
                .                   .         qdisc_run_begin()
                .                   .       return and do nothing
                .                   .                .
       clear __QDISC_STATE_SCHED    .                .
       qdisc_run_begin()            .                .
       return and do nothing        .                .
                .                   .                .
                .            qdisc_run_end()         .
      
      This patch fixes the above data race by:
      1. If the first spin_trylock() return false and STATE_MISSED is
         not set, set STATE_MISSED and retry another spin_trylock() in
         case other CPU may not see STATE_MISSED after it releases the
         lock.
      2. reschedule if STATE_MISSED is set after the lock is released
         at the end of qdisc_run_end().
      
      For tx_action case, STATE_MISSED is also set when cpu1 is at the
      end if qdisc_run_end(), so tx_action will be rescheduled again
      to dequeue the skb enqueued by cpu2.
      
      Clear STATE_MISSED before retrying a dequeuing when dequeuing
      returns NULL in order to reduce the overhead of the second
      spin_trylock() and __netif_schedule() calling.
      
      Also clear the STATE_MISSED before calling __netif_schedule()
      at the end of qdisc_run_end() to avoid doing another round of
      dequeuing in the pfifo_fast_dequeue().
      
      The performance impact of this patch, tested using pktgen and
      dummy netdev with pfifo_fast qdisc attached:
      
       threads  without+this_patch   with+this_patch      delta
          1        2.61Mpps            2.60Mpps           -0.3%
          2        3.97Mpps            3.82Mpps           -3.7%
          4        5.62Mpps            5.59Mpps           -0.5%
          8        2.78Mpps            2.77Mpps           -0.3%
         16        2.22Mpps            2.22Mpps           -0.0%
      
      Fixes: 6b3ba914 ("net: sched: allow qdiscs to handle locking")
      Acked-by: NJakub Kicinski <kuba@kernel.org>
      Tested-by: NJuergen Gross <jgross@suse.com>
      Signed-off-by: NYunsheng Lin <linyunsheng@huawei.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a90c57f2
  11. 14 5月, 2021 4 次提交
  12. 13 5月, 2021 2 次提交
  13. 12 5月, 2021 3 次提交
  14. 11 5月, 2021 4 次提交
    • O
      kyber: fix out of bounds access when preempted · efed9a33
      Omar Sandoval 提交于
      __blk_mq_sched_bio_merge() gets the ctx and hctx for the current CPU and
      passes the hctx to ->bio_merge(). kyber_bio_merge() then gets the ctx
      for the current CPU again and uses that to get the corresponding Kyber
      context in the passed hctx. However, the thread may be preempted between
      the two calls to blk_mq_get_ctx(), and the ctx returned the second time
      may no longer correspond to the passed hctx. This "works" accidentally
      most of the time, but it can cause us to read garbage if the second ctx
      came from an hctx with more ctx's than the first one (i.e., if
      ctx->index_hw[hctx->type] > hctx->nr_ctx).
      
      This manifested as this UBSAN array index out of bounds error reported
      by Jakub:
      
      UBSAN: array-index-out-of-bounds in ../kernel/locking/qspinlock.c:130:9
      index 13106 is out of range for type 'long unsigned int [128]'
      Call Trace:
       dump_stack+0xa4/0xe5
       ubsan_epilogue+0x5/0x40
       __ubsan_handle_out_of_bounds.cold.13+0x2a/0x34
       queued_spin_lock_slowpath+0x476/0x480
       do_raw_spin_lock+0x1c2/0x1d0
       kyber_bio_merge+0x112/0x180
       blk_mq_submit_bio+0x1f5/0x1100
       submit_bio_noacct+0x7b0/0x870
       submit_bio+0xc2/0x3a0
       btrfs_map_bio+0x4f0/0x9d0
       btrfs_submit_data_bio+0x24e/0x310
       submit_one_bio+0x7f/0xb0
       submit_extent_page+0xc4/0x440
       __extent_writepage_io+0x2b8/0x5e0
       __extent_writepage+0x28d/0x6e0
       extent_write_cache_pages+0x4d7/0x7a0
       extent_writepages+0xa2/0x110
       do_writepages+0x8f/0x180
       __writeback_single_inode+0x99/0x7f0
       writeback_sb_inodes+0x34e/0x790
       __writeback_inodes_wb+0x9e/0x120
       wb_writeback+0x4d2/0x660
       wb_workfn+0x64d/0xa10
       process_one_work+0x53a/0xa80
       worker_thread+0x69/0x5b0
       kthread+0x20b/0x240
       ret_from_fork+0x1f/0x30
      
      Only Kyber uses the hctx, so fix it by passing the request_queue to
      ->bio_merge() instead. BFQ and mq-deadline just use that, and Kyber can
      map the queues itself to avoid the mismatch.
      
      Fixes: a6088845 ("block: kyber: make kyber more friendly with merging")
      Reported-by: NJakub Kicinski <kuba@kernel.org>
      Signed-off-by: NOmar Sandoval <osandov@fb.com>
      Link: https://lore.kernel.org/r/c7598605401a48d5cfeadebb678abd10af22b83f.1620691329.git.osandov@fb.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
      efed9a33
    • A
      spi: Switch to signed types for *_native_cs SPI controller fields · 35f3f850
      Andy Shevchenko 提交于
      While fixing undefined behaviour the commit f60d7270 ("spi: Avoid
      undefined behaviour when counting unused native CSs") missed the case
      when all CSs are GPIOs and thus unused_native_cs will be evaluated to
      -1 in unsigned representation. This will falsely trigger a condition
      in the spi_get_gpio_descs().
      
      Switch to signed types for *_native_cs SPI controller fields to fix above.
      
      Fixes: f60d7270 ("spi: Avoid undefined behaviour when counting unused native CSs")
      Signed-off-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com>
      Link: https://lore.kernel.org/r/20210510131242.49455-1-andriy.shevchenko@linux.intel.comSigned-off-by: NMark Brown <broonie@kernel.org>
      35f3f850
    • N
      stack: Replace "o" output with "r" input constraint · 2515dd6c
      Nick Desaulniers 提交于
      "o" isn't a common asm() constraint to use; it triggers an assertion in
      assert-enabled builds of LLVM that it's not recognized when targeting
      aarch64 (though it appears to fall back to "m"). It's fixed in LLVM 13 now,
      but there isn't really a good reason to use "o" in particular here. To
      avoid causing build issues for those using assert-enabled builds of earlier
      LLVM versions, the constraint needs changing.
      
      Instead, if the point is to retain the __builtin_alloca(), make ptr appear
      to "escape" via being an input to an empty inline asm block. This is
      preferable anyways, since otherwise this looks like a dead store.
      
      While the use of "r" was considered in
      
        https://lore.kernel.org/lkml/202104011447.2E7F543@keescook/
      
      it was only tested as an output (which looks like a dead store, and wasn't
      sufficient).
      
      Use "r" as an input constraint instead, which behaves correctly across
      compilers and architectures.
      
      Fixes: 39218ff4 ("stack: Optionally randomize kernel stack offset each syscall")
      Signed-off-by: NNick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Tested-by: NKees Cook <keescook@chromium.org>
      Tested-by: NNathan Chancellor <nathan@kernel.org>
      Reviewed-by: NNathan Chancellor <nathan@kernel.org>
      Link: https://reviews.llvm.org/D100412
      Link: https://bugs.llvm.org/show_bug.cgi?id=49956
      Link: https://lore.kernel.org/r/20210419231741.4084415-1-keescook@chromium.org
      2515dd6c
    • T
      PM: runtime: Fix unpaired parent child_count for force_resume · c745253e
      Tony Lindgren 提交于
      As pm_runtime_need_not_resume() relies also on usage_count, it can return
      a different value in pm_runtime_force_suspend() compared to when called in
      pm_runtime_force_resume(). Different return values can happen if anything
      calls PM runtime functions in between, and causes the parent child_count
      to increase on every resume.
      
      So far I've seen the issue only for omapdrm that does complicated things
      with PM runtime calls during system suspend for legacy reasons:
      
      omap_atomic_commit_tail() for omapdrm.0
       dispc_runtime_get()
        wakes up 58000000.dss as it's the dispc parent
         dispc_runtime_resume()
          rpm_resume() increases parent child_count
       dispc_runtime_put() won't idle, PM runtime suspend blocked
      pm_runtime_force_suspend() for 58000000.dss, !pm_runtime_need_not_resume()
       __update_runtime_status()
      system suspended
      pm_runtime_force_resume() for 58000000.dss, pm_runtime_need_not_resume()
       pm_runtime_enable() only called because of pm_runtime_need_not_resume()
      omap_atomic_commit_tail() for omapdrm.0
       dispc_runtime_get()
        wakes up 58000000.dss as it's the dispc parent
         dispc_runtime_resume()
          rpm_resume() increases parent child_count
       dispc_runtime_put() won't idle, PM runtime suspend blocked
      ...
      rpm_suspend for 58000000.dss but parent child_count is now unbalanced
      
      Let's fix the issue by adding a flag for needs_force_resume and use it in
      pm_runtime_force_resume() instead of pm_runtime_need_not_resume().
      
      Additionally omapdrm system suspend could be simplified later on to avoid
      lots of unnecessary PM runtime calls and the complexity it adds. The
      driver can just use internal functions that are shared between the PM
      runtime and system suspend related functions.
      
      Fixes: 4918e1f8 ("PM / runtime: Rework pm_runtime_force_suspend/resume()")
      Signed-off-by: NTony Lindgren <tony@atomide.com>
      Reviewed-by: NUlf Hansson <ulf.hansson@linaro.org>
      Tested-by: NTomi Valkeinen <tomi.valkeinen@ideasonboard.com>
      Cc: 4.16+ <stable@vger.kernel.org> # 4.16+
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      c745253e