1. 08 1月, 2021 9 次提交
    • J
      blk-mq-debugfs: Add decode for BLK_MQ_F_TAG_HCTX_SHARED · 02f938e9
      John Garry 提交于
      Showing the hctx flags for when BLK_MQ_F_TAG_HCTX_SHARED is set gives
      something like:
      
      root@debian:/home/john# more /sys/kernel/debug/block/sda/hctx0/flags
      alloc_policy=FIFO SHOULD_MERGE|TAG_QUEUE_SHARED|3
      
      Add the decoding for that flag.
      
      Fixes: 32bc15af ("blk-mq: Facilitate a shared sbitmap per tagset")
      Signed-off-by: NJohn Garry <john.garry@huawei.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      02f938e9
    • J
      block/rnbd-clt: avoid module unload race with close confirmation · 3a21777c
      Jack Wang 提交于
      We had kernel panic, it is caused by unload module and last
      close confirmation.
      
      call trace:
      [1196029.743127]  free_sess+0x15/0x50 [rtrs_client]
      [1196029.743128]  rtrs_clt_close+0x4c/0x70 [rtrs_client]
      [1196029.743129]  ? rnbd_clt_unmap_device+0x1b0/0x1b0 [rnbd_client]
      [1196029.743130]  close_rtrs+0x25/0x50 [rnbd_client]
      [1196029.743131]  rnbd_client_exit+0x93/0xb99 [rnbd_client]
      [1196029.743132]  __x64_sys_delete_module+0x190/0x260
      
      And in the crashdump confirmation kworker is also running.
      PID: 6943   TASK: ffff9e2ac8098000  CPU: 4   COMMAND: "kworker/4:2"
       #0 [ffffb206cf337c30] __schedule at ffffffff9f93f891
       #1 [ffffb206cf337cc8] schedule at ffffffff9f93fe98
       #2 [ffffb206cf337cd0] schedule_timeout at ffffffff9f943938
       #3 [ffffb206cf337d50] wait_for_completion at ffffffff9f9410a7
       #4 [ffffb206cf337da0] __flush_work at ffffffff9f08ce0e
       #5 [ffffb206cf337e20] rtrs_clt_close_conns at ffffffffc0d5f668 [rtrs_client]
       #6 [ffffb206cf337e48] rtrs_clt_close at ffffffffc0d5f801 [rtrs_client]
       #7 [ffffb206cf337e68] close_rtrs at ffffffffc0d26255 [rnbd_client]
       #8 [ffffb206cf337e78] free_sess at ffffffffc0d262ad [rnbd_client]
       #9 [ffffb206cf337e88] rnbd_clt_put_dev at ffffffffc0d266a7 [rnbd_client]
      
      The problem is both code path try to close same session, which lead to
      panic.
      
      To fix it, just skip the sess if the refcount already drop to 0.
      
      Fixes: f7a7a5c2 ("block/rnbd: client: main functionality")
      Signed-off-by: NJack Wang <jinpu.wang@cloud.ionos.com>
      Reviewed-by: NGioh Kim <gi-oh.kim@cloud.ionos.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      3a21777c
    • S
      block/rnbd: Adding name to the Contributors List · ef8048dd
      Swapnil Ingle 提交于
      Adding name to the Contributors List
      Signed-off-by: NSwapnil Ingle <ingleswapnil@gmail.com>
      Acked-by: NJack Wang <jinpu.wang@cloud.ionos.com>
      Acked-by: NDanil Kipnis <danil.kipnis@cloud.ionos.com>
      Signed-off-by: NJack Wang <jinpu.wang@cloud.ionos.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      ef8048dd
    • G
      block/rnbd-clt: Fix sg table use after free · 80f99093
      Guoqing Jiang 提交于
      Since dynamically allocate sglist is used for rnbd_iu, we can't free sg
      table after send_usr_msg since the callback function (cqe.done) could
      still access the sglist.
      
      Otherwise KASAN reports UAF issue:
      
      [ 4856.600257] BUG: KASAN: use-after-free in dma_direct_unmap_sg+0x53/0x290
      [ 4856.600772] Read of size 4 at addr ffff888206af3a98 by task swapper/1/0
      
      [ 4856.601729] CPU: 1 PID: 0 Comm: swapper/1 Kdump: loaded Tainted: G        W         5.10.0-pserver #5.10.0-1+feature+linux+next+20201214.1025+0910d71
      [ 4856.601748] Hardware name: Supermicro Super Server/X11DDW-L, BIOS 3.3 02/21/2020
      [ 4856.601766] Call Trace:
      [ 4856.601785]  <IRQ>
      [ 4856.601822]  dump_stack+0x99/0xcb
      [ 4856.601856]  ? dma_direct_unmap_sg+0x53/0x290
      [ 4856.601888]  print_address_description.constprop.7+0x1e/0x230
      [ 4856.601913]  ? freeze_kernel_threads+0x73/0x73
      [ 4856.601965]  ? mark_held_locks+0x29/0xa0
      [ 4856.602019]  ? dma_direct_unmap_sg+0x53/0x290
      [ 4856.602039]  ? dma_direct_unmap_sg+0x53/0x290
      [ 4856.602079]  kasan_report.cold.9+0x37/0x7c
      [ 4856.602188]  ? mlx5_ib_post_recv+0x430/0x520 [mlx5_ib]
      [ 4856.602209]  ? dma_direct_unmap_sg+0x53/0x290
      [ 4856.602256]  dma_direct_unmap_sg+0x53/0x290
      [ 4856.602366]  complete_rdma_req+0x188/0x4b0 [rtrs_client]
      [ 4856.602451]  ? rtrs_clt_close+0x80/0x80 [rtrs_client]
      [ 4856.602535]  ? mlx5_ib_poll_cq+0x48b/0x16e0 [mlx5_ib]
      [ 4856.602589]  ? radix_tree_insert+0x3a0/0x3a0
      [ 4856.602610]  ? do_raw_spin_lock+0x119/0x1d0
      [ 4856.602647]  ? rwlock_bug.part.1+0x60/0x60
      [ 4856.602740]  rtrs_clt_rdma_done+0x3f7/0x670 [rtrs_client]
      [ 4856.602804]  ? rtrs_clt_rdma_cm_handler+0xda0/0xda0 [rtrs_client]
      [ 4856.602857]  ? check_flags.part.31+0x6c/0x1f0
      [ 4856.602927]  ? rcu_read_lock_sched_held+0xaf/0xe0
      [ 4856.602963]  ? rcu_read_lock_bh_held+0xc0/0xc0
      [ 4856.603137]  __ib_process_cq+0x10a/0x350 [ib_core]
      [ 4856.603309]  ib_poll_handler+0x41/0x1c0 [ib_core]
      [ 4856.603358]  irq_poll_softirq+0xe6/0x280
      [ 4856.603392]  ? lockdep_hardirqs_on_prepare+0x111/0x210
      [ 4856.603446]  __do_softirq+0x10d/0x646
      [ 4856.603540]  asm_call_irq_on_stack+0x12/0x20
      [ 4856.603563]  </IRQ>
      
      [ 4856.605096] Allocated by task 8914:
      [ 4856.605510]  kasan_save_stack+0x19/0x40
      [ 4856.605532]  __kasan_kmalloc.constprop.7+0xc1/0xd0
      [ 4856.605552]  __kmalloc+0x155/0x320
      [ 4856.605574]  __sg_alloc_table+0x155/0x1c0
      [ 4856.605594]  sg_alloc_table+0x1f/0x50
      [ 4856.605620]  send_msg_sess_info+0x119/0x2e0 [rnbd_client]
      [ 4856.605646]  remap_devs+0x71/0x210 [rnbd_client]
      [ 4856.605676]  init_sess+0xad8/0xe10 [rtrs_client]
      [ 4856.605706]  rtrs_clt_reconnect_work+0xd6/0x170 [rtrs_client]
      [ 4856.605728]  process_one_work+0x521/0xa90
      [ 4856.605748]  worker_thread+0x65/0x5b0
      [ 4856.605769]  kthread+0x1f2/0x210
      [ 4856.605789]  ret_from_fork+0x22/0x30
      
      [ 4856.606159] Freed by task 8914:
      [ 4856.606559]  kasan_save_stack+0x19/0x40
      [ 4856.606580]  kasan_set_track+0x1c/0x30
      [ 4856.606601]  kasan_set_free_info+0x1b/0x30
      [ 4856.606622]  __kasan_slab_free+0x108/0x150
      [ 4856.606642]  slab_free_freelist_hook+0x64/0x190
      [ 4856.606661]  kfree+0xe2/0x650
      [ 4856.606681]  __sg_free_table+0xa4/0x100
      [ 4856.606707]  send_msg_sess_info+0x1d6/0x2e0 [rnbd_client]
      [ 4856.606733]  remap_devs+0x71/0x210 [rnbd_client]
      [ 4856.606763]  init_sess+0xad8/0xe10 [rtrs_client]
      [ 4856.606792]  rtrs_clt_reconnect_work+0xd6/0x170 [rtrs_client]
      [ 4856.606813]  process_one_work+0x521/0xa90
      [ 4856.606833]  worker_thread+0x65/0x5b0
      [ 4856.606853]  kthread+0x1f2/0x210
      [ 4856.606872]  ret_from_fork+0x22/0x30
      
      The solution is to free iu's sgtable after the iu is not used anymore.
      And also move sg_alloc_table into rnbd_get_iu accordingly.
      
      Fixes: 5a1328d0 ("block/rnbd-clt: Dynamically allocate sglist for rnbd_iu")
      Signed-off-by: NGuoqing Jiang <guoqing.jiang@cloud.ionos.com>
      Signed-off-by: NJack Wang <jinpu.wang@cloud.ionos.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      80f99093
    • J
      block/rnbd-srv: Fix use after free in rnbd_srv_sess_dev_force_close · 1a84e7c6
      Jack Wang 提交于
      KASAN detect following BUG:
      [  778.215311] ==================================================================
      [  778.216696] BUG: KASAN: use-after-free in rnbd_srv_sess_dev_force_close+0x38/0x60 [rnbd_server]
      [  778.219037] Read of size 8 at addr ffff88b1d6516c28 by task tee/8842
      
      [  778.220500] CPU: 37 PID: 8842 Comm: tee Kdump: loaded Not tainted 5.10.0-pserver #5.10.0-1+feature+linux+next+20201214.1025+0910d71
      [  778.220529] Hardware name: Supermicro Super Server/X11DDW-L, BIOS 3.3 02/21/2020
      [  778.220555] Call Trace:
      [  778.220609]  dump_stack+0x99/0xcb
      [  778.220667]  ? rnbd_srv_sess_dev_force_close+0x38/0x60 [rnbd_server]
      [  778.220715]  print_address_description.constprop.7+0x1e/0x230
      [  778.220750]  ? freeze_kernel_threads+0x73/0x73
      [  778.220896]  ? rnbd_srv_sess_dev_force_close+0x38/0x60 [rnbd_server]
      [  778.220932]  ? rnbd_srv_sess_dev_force_close+0x38/0x60 [rnbd_server]
      [  778.220994]  kasan_report.cold.9+0x37/0x7c
      [  778.221066]  ? kobject_put+0x80/0x270
      [  778.221102]  ? rnbd_srv_sess_dev_force_close+0x38/0x60 [rnbd_server]
      [  778.221184]  rnbd_srv_sess_dev_force_close+0x38/0x60 [rnbd_server]
      [  778.221240]  rnbd_srv_dev_session_force_close_store+0x6a/0xc0 [rnbd_server]
      [  778.221304]  ? sysfs_file_ops+0x90/0x90
      [  778.221353]  kernfs_fop_write+0x141/0x240
      [  778.221451]  vfs_write+0x142/0x4d0
      [  778.221553]  ksys_write+0xc0/0x160
      [  778.221602]  ? __ia32_sys_read+0x50/0x50
      [  778.221684]  ? lockdep_hardirqs_on_prepare+0x13d/0x210
      [  778.221718]  ? syscall_enter_from_user_mode+0x1c/0x50
      [  778.221821]  do_syscall_64+0x33/0x40
      [  778.221862]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [  778.221896] RIP: 0033:0x7f4affdd9504
      [  778.221928] Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 48 8d 05 f9 61 0d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
      [  778.221956] RSP: 002b:00007fffebb36b28 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      [  778.222011] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f4affdd9504
      [  778.222038] RDX: 0000000000000002 RSI: 00007fffebb36c50 RDI: 0000000000000003
      [  778.222066] RBP: 00007fffebb36c50 R08: 0000556a151aa600 R09: 00007f4affeb1540
      [  778.222094] R10: fffffffffffffc19 R11: 0000000000000246 R12: 0000556a151aa520
      [  778.222121] R13: 0000000000000002 R14: 00007f4affea6760 R15: 0000000000000002
      
      [  778.222764] Allocated by task 3212:
      [  778.223285]  kasan_save_stack+0x19/0x40
      [  778.223316]  __kasan_kmalloc.constprop.7+0xc1/0xd0
      [  778.223347]  kmem_cache_alloc_trace+0x186/0x350
      [  778.223382]  rnbd_srv_rdma_ev+0xf16/0x1690 [rnbd_server]
      [  778.223422]  process_io_req+0x4d1/0x670 [rtrs_server]
      [  778.223573]  __ib_process_cq+0x10a/0x350 [ib_core]
      [  778.223709]  ib_cq_poll_work+0x31/0xb0 [ib_core]
      [  778.223743]  process_one_work+0x521/0xa90
      [  778.223773]  worker_thread+0x65/0x5b0
      [  778.223802]  kthread+0x1f2/0x210
      [  778.223833]  ret_from_fork+0x22/0x30
      
      [  778.224296] Freed by task 8842:
      [  778.224800]  kasan_save_stack+0x19/0x40
      [  778.224829]  kasan_set_track+0x1c/0x30
      [  778.224860]  kasan_set_free_info+0x1b/0x30
      [  778.224889]  __kasan_slab_free+0x108/0x150
      [  778.224919]  slab_free_freelist_hook+0x64/0x190
      [  778.224947]  kfree+0xe2/0x650
      [  778.224982]  rnbd_destroy_sess_dev+0x2fa/0x3b0 [rnbd_server]
      [  778.225011]  kobject_put+0xda/0x270
      [  778.225046]  rnbd_srv_sess_dev_force_close+0x30/0x60 [rnbd_server]
      [  778.225081]  rnbd_srv_dev_session_force_close_store+0x6a/0xc0 [rnbd_server]
      [  778.225111]  kernfs_fop_write+0x141/0x240
      [  778.225140]  vfs_write+0x142/0x4d0
      [  778.225169]  ksys_write+0xc0/0x160
      [  778.225198]  do_syscall_64+0x33/0x40
      [  778.225227]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      [  778.226506] The buggy address belongs to the object at ffff88b1d6516c00
                      which belongs to the cache kmalloc-512 of size 512
      [  778.227464] The buggy address is located 40 bytes inside of
                      512-byte region [ffff88b1d6516c00, ffff88b1d6516e00)
      
      The problem is in the sess_dev release function we call
      rnbd_destroy_sess_dev, and could free the sess_dev already, but we still
      set the keep_id in rnbd_srv_sess_dev_force_close, which lead to use
      after free.
      
      To fix it, move the keep_id before the sysfs removal, and cache the
      rnbd_srv_session for lock accessing,
      
      Fixes: 78699805 ("block/rnbd-srv: close a mapped device from server side.")
      Signed-off-by: NJack Wang <jinpu.wang@cloud.ionos.com>
      Reviewed-by: NGuoqing Jiang <guoqing.jiang@cloud.ionos.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      1a84e7c6
    • J
      block/rnbd: Select SG_POOL for RNBD_CLIENT · 74acfa99
      Jack Wang 提交于
      lkp reboot following build error:
       drivers/block/rnbd/rnbd-clt.c: In function 'rnbd_softirq_done_fn':
      >> drivers/block/rnbd/rnbd-clt.c:387:2: error: implicit declaration of function 'sg_free_table_chained' [-Werror=implicit-function-declaration]
           387 |  sg_free_table_chained(&iu->sgt, RNBD_INLINE_SG_CNT);
               |  ^~~~~~~~~~~~~~~~~~~~~
      
      The reason is CONFIG_SG_POOL is not enabled in the config, to
      avoid such failure, select SG_POOL in Kconfig for RNBD_CLIENT.
      
      Fixes: 5a1328d0 ("block/rnbd-clt: Dynamically allocate sglist for rnbd_iu")
      Reported-by: Nkernel test robot <lkp@intel.com>
      Signed-off-by: NJack Wang <jinpu.wang@cloud.ionos.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      74acfa99
    • C
      block: pre-initialize struct block_device in bdev_alloc_inode · 2d2f6f1b
      Christoph Hellwig 提交于
      bdev_evict_inode and bdev_free_inode are also called for the root inode
      of bdevfs, for which bdev_alloc is never called.  Move the zeroing o
      f struct block_device and the initialization of the bd_bdi field into
      bdev_alloc_inode to make sure they are initialized for the root inode
      as well.
      
      Fixes: e6cb5382 ("block: initialize struct block_device in bdev_alloc")
      Reported-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      Tested-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      2d2f6f1b
    • J
      Merge tag 'nvme-5.11-2021-01-07' of git://git.infradead.org/nvme into block-5.11 · 04b1ecb6
      Jens Axboe 提交于
      Pull NVMe updates from Christoph:
      
      "nvme updates for 5.11:
      
       - fix a race in the nvme-tcp send code (Sagi Grimberg)
       - fix a list corruption in an nvme-rdma error path (Israel Rukshin)
       - avoid a possible double fetch in nvme-pci (Lalithambika Krishnakumar)
       - add the susystem NQN quirk for a Samsung driver (Gopal Tiwari)
       - fix two compiler warnings in nvme-fcloop (James Smart)
       - don't call sleeping functions from irq context in nvme-fc (James Smart)
       - remove an unused argument (Max Gurtovoy)
       - remove unused exports (Minwoo Im)"
      
      * tag 'nvme-5.11-2021-01-07' of git://git.infradead.org/nvme:
        nvme: remove the unused status argument from nvme_trace_bio_complete
        nvmet-rdma: Fix list_del corruption on queue establishment failure
        nvme: unexport functions with no external caller
        nvme: avoid possible double fetch in handling CQE
        nvme-tcp: Fix possible race of io_work and direct send
        nvme-pci: mark Samsung PM1725a as IGNORE_DEV_SUBNQN
        nvme-fcloop: Fix sscanf type and list_first_entry_or_null warnings
        nvme-fc: avoid calling _nvme_fc_abort_outstanding_ios from interrupt context
      04b1ecb6
    • S
      fs: Fix freeze_bdev()/thaw_bdev() accounting of bd_fsfreeze_sb · 04a6a536
      Satya Tangirala 提交于
      freeze/thaw_bdev() currently use bdev->bd_fsfreeze_count to infer
      whether or not bdev->bd_fsfreeze_sb is valid (it's valid iff
      bd_fsfreeze_count is non-zero). thaw_bdev() doesn't nullify
      bd_fsfreeze_sb.
      
      But this means a freeze_bdev() call followed by a thaw_bdev() call can
      leave bd_fsfreeze_sb with a non-null value, while bd_fsfreeze_count is
      zero. If freeze_bdev() is called again, and this time
      get_active_super() returns NULL (e.g. because the FS is unmounted),
      we'll end up with bd_fsfreeze_count > 0, but bd_fsfreeze_sb is
      *untouched* - it stays the same (now garbage) value. A subsequent
      thaw_bdev() will decide that the bd_fsfreeze_sb value is legitimate
      (since bd_fsfreeze_count > 0), and attempt to use it.
      
      Fix this by always setting bd_fsfreeze_sb to NULL when
      bd_fsfreeze_count is successfully decremented to 0 in thaw_sb().
      Alternatively, we could set bd_fsfreeze_sb to whatever
      get_active_super() returns in freeze_bdev() whenever bd_fsfreeze_count
      is successfully incremented to 1 from 0 (which can be achieved cleanly
      by moving the line currently setting bd_fsfreeze_sb to immediately
      after the "sync:" label, but it might be a little too subtle/easily
      overlooked in future).
      
      This fixes the currently panicking xfstests generic/085.
      
      Fixes: 040f04bd ("fs: simplify freeze_bdev/thaw_bdev")
      Signed-off-by: NSatya Tangirala <satyat@google.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      04a6a536
  2. 06 1月, 2021 11 次提交
    • M
      nvme: remove the unused status argument from nvme_trace_bio_complete · 2b59787a
      Max Gurtovoy 提交于
      The only used argument in this function is the "req".
      Signed-off-by: NMax Gurtovoy <mgurtovoy@nvidia.com>
      Reviewed-by: NMinwoo Im <minwoo.im.dev@gmail.com>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      2b59787a
    • I
      nvmet-rdma: Fix list_del corruption on queue establishment failure · 9ceb7863
      Israel Rukshin 提交于
      When a queue is in NVMET_RDMA_Q_CONNECTING state, it may has some
      requests at rsp_wait_list. In case a disconnect occurs at this
      state, no one will empty this list and will return the requests to
      free_rsps list. Normally nvmet_rdma_queue_established() free those
      requests after moving the queue to NVMET_RDMA_Q_LIVE state, but in
      this case __nvmet_rdma_queue_disconnect() is called before. The
      crash happens at nvmet_rdma_free_rsps() when calling
      list_del(&rsp->free_list), because the request exists only at
      the wait list. To fix the issue, simply clear rsp_wait_list when
      destroying the queue.
      Signed-off-by: NIsrael Rukshin <israelr@nvidia.com>
      Reviewed-by: NMax Gurtovoy <mgurtovoy@nvidia.com>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      9ceb7863
    • M
      nvme: unexport functions with no external caller · 9b66fc02
      Minwoo Im 提交于
      There are no callers for nvme_reset_ctrl_sync() and
      nvme_alloc_request_qid() so that we keep the symbols exported.
      
      Unexport those functions, mark them static and update the header file
      respectively.
      Signed-off-by: NMinwoo Im <minwoo.im.dev@gmail.com>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      9b66fc02
    • L
      nvme: avoid possible double fetch in handling CQE · 62df8016
      Lalithambika Krishnakumar 提交于
      While handling the completion queue, keep a local copy of the command id
      from the DMA-accessible completion entry. This silences a time-of-check
      to time-of-use (TOCTOU) warning from KF/x[1], with respect to a
      Thunderclap[2] vulnerability analysis. The double-read impact appears
      benign.
      
      There may be a theoretical window for @command_id to be used as an
      adversary-controlled array-index-value for mounting a speculative
      execution attack, but that mitigation is saved for a potential follow-on.
      A man-in-the-middle attack on the data payload is out of scope for this
      analysis and is hopefully mitigated by filesystem integrity mechanisms.
      
      [1] https://github.com/intel/kernel-fuzzer-for-xen-project
      [2] http://thunderclap.io/thunderclap-paper-ndss2019.pdfSigned-off-by: NLalithambika Krishna Kumar <lalithambika.krishnakumar@intel.com>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      62df8016
    • S
      nvme-tcp: Fix possible race of io_work and direct send · 5c11f7d9
      Sagi Grimberg 提交于
      We may send a request (with or without its data) from two paths:
      
        1. From our I/O context nvme_tcp_io_work which is triggered from:
          - queue_rq
          - r2t reception
          - socket data_ready and write_space callbacks
        2. Directly from queue_rq if the send_list is empty (because we want to
           save the context switch associated with scheduling our io_work).
      
      However, given that now we have the send_mutex, we may run into a race
      condition where none of these contexts will send the pending payload to
      the controller. Both io_work send path and queue_rq send path
      opportunistically attempt to acquire the send_mutex however queue_rq only
      attempts to send a single request, and if io_work context fails to
      acquire the send_mutex it will complete without rescheduling itself.
      
      The race can trigger with the following sequence:
      
        1. queue_rq sends request (no incapsule data) and blocks
        2. RX path receives r2t - prepares data PDU to send, adds h2cdata PDU
           to the send_list and schedules io_work
        3. io_work triggers and cannot acquire the send_mutex - because of (1),
           ends without self rescheduling
        4. queue_rq completes the send, and completes
      
      ==> no context will send the h2cdata - timeout.
      
      Fix this by having queue_rq sending as much as it can from the send_list
      such that if it still has any left, its because the socket buffer is
      full and the socket write_space callback will trigger, thus guaranteeing
      that a context will be scheduled to send the h2cdata PDU.
      
      Fixes: db5ad6b7 ("nvme-tcp: try to send request in queue_rq context")
      Reported-by: NPotnuri Bharat Teja <bharat@chelsio.com>
      Reported-by: NSamuel Jones <sjones@kalrayinc.com>
      Signed-off-by: NSagi Grimberg <sagi@grimberg.me>
      Tested-by: NPotnuri Bharat Teja <bharat@chelsio.com>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      5c11f7d9
    • G
      nvme-pci: mark Samsung PM1725a as IGNORE_DEV_SUBNQN · 7ee5c78c
      Gopal Tiwari 提交于
      A system with more than one of these SSDs will only have one usable.
      Hence the kernel fails to detect nvme devices due to duplicate cntlids.
      
      [    6.274554] nvme nvme1: Duplicate cntlid 33 with nvme0, rejecting
      [    6.274566] nvme nvme1: Removing after probe failure status: -22
      
      Adding the NVME_QUIRK_IGNORE_DEV_SUBNQN quirk to resolves the issue.
      Signed-off-by: NGopal Tiwari <gtiwari@redhat.com>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      7ee5c78c
    • J
      nvme-fcloop: Fix sscanf type and list_first_entry_or_null warnings · 2b54996b
      James Smart 提交于
      Kernel robot had the following warnings:
      
      >> fcloop.c:1506:6: warning: %x in format string (no. 1) requires
      >> 'unsigned int *' but the argument type is 'signed int *'.
      >> [invalidScanfArgType_int]
      >>    if (sscanf(buf, "%x:%d:%d", &opcode, &starting, &amount) != 3)
      >>        ^
      
      Resolve by changing opcode from and int to an unsigned int
      
      and
      
      >>  fcloop.c:1632:32: warning: Uninitialized variable: lport [uninitvar]
      >>     ret = __wait_localport_unreg(lport);
      >>                                  ^
      
      >>  fcloop.c:1615:28: warning: Uninitialized variable: nport [uninitvar]
      >>     ret = __remoteport_unreg(nport, rport);
      >>                              ^
      
      These aren't actual issues as the values are assigned prior to use.
      It appears the tool doesn't understand list_first_entry_or_null().
      Regardless, quiet the tool by initializing the pointers to NULL at
      declaration.
      Signed-off-by: NJames Smart <james.smart@broadcom.com>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      2b54996b
    • J
      nvme-fc: avoid calling _nvme_fc_abort_outstanding_ios from interrupt context · 19fce047
      James Smart 提交于
      Recent patches changed calling sequences. nvme_fc_abort_outstanding_ios
      used to be called from a timeout or work context. Now it is being called
      in an io completion context, which can be an interrupt handler.
      Unfortunately, the abort outstanding ios routine attempts to stop nvme
      queues and nested routines that may try to sleep, which is in conflict
      with the interrupt handler.
      
      Correct replacing the direct call with a work element scheduling, and the
      abort outstanding ios routine will be called in the work element.
      
      Fixes: 95ced8a2 ("nvme-fc: eliminate terminate_io use by nvme_fc_error_recovery")
      Signed-off-by: NJames Smart <james.smart@broadcom.com>
      Reported-by: NDaniel Wagner <dwagner@suse.de>
      Tested-by: NDaniel Wagner <dwagner@suse.de>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      19fce047
    • M
      block: fix use-after-free in disk_part_iter_next · aebf5db9
      Ming Lei 提交于
      Make sure that bdgrab() is done on the 'block_device' instance before
      referring to it for avoiding use-after-free.
      
      Cc: <stable@vger.kernel.org>
      Reported-by: syzbot+825f0f9657d4e528046e@syzkaller.appspotmail.com
      Signed-off-by: NMing Lei <ming.lei@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      aebf5db9
    • J
      bfq: Fix computation of shallow depth · 6d4d2735
      Jan Kara 提交于
      BFQ computes number of tags it allows to be allocated for each request type
      based on tag bitmap. However it uses 1 << bitmap.shift as number of
      available tags which is wrong. 'shift' is just an internal bitmap value
      containing logarithm of how many bits bitmap uses in each bitmap word.
      Thus number of tags allowed for some request types can be far to low.
      Use proper bitmap.depth which has the number of tags instead.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      6d4d2735
    • T
      blk-iocost: fix NULL iocg deref from racing against initialization · d16baa3f
      Tejun Heo 提交于
      When initializing iocost for a queue, its rqos should be registered before
      the blkcg policy is activated to allow policy data initiailization to lookup
      the associated ioc. This unfortunately means that the rqos methods can be
      called on bios before iocgs are attached to all existing blkgs.
      
      While the race is theoretically possible on ioc_rqos_throttle(), it mostly
      happened in ioc_rqos_merge() due to the difference in how they lookup ioc.
      The former determines it from the passed in @rqos and then bails before
      dereferencing iocg if the looked up ioc is disabled, which most likely is
      the case if initialization is still in progress. The latter looked up ioc by
      dereferencing the possibly NULL iocg making it a lot more prone to actually
      triggering the bug.
      
      * Make ioc_rqos_merge() use the same method as ioc_rqos_throttle() to look
        up ioc for consistency.
      
      * Make ioc_rqos_throttle() and ioc_rqos_merge() test for NULL iocg before
        dereferencing it.
      
      * Explain the danger of NULL iocgs in blk_iocost_init().
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reported-by: NJonathan Lemon <bsd@fb.com>
      Cc: stable@vger.kernel.org # v5.4+
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      d16baa3f
  3. 04 1月, 2021 2 次提交
  4. 30 12月, 2020 2 次提交
  5. 28 12月, 2020 8 次提交
  6. 27 12月, 2020 1 次提交
  7. 26 12月, 2020 7 次提交
    • L
      Merge tag 'pci-v5.11-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci · 40f78232
      Linus Torvalds 提交于
      Pull PCI fixes from Bjorn Helgaas:
      
       - Fix a tegra enumeration regression (Rob Herring)
      
       - Fix a designware-host check that warned on *success*, not failure
         (Alexander Lobakin)
      
      * tag 'pci-v5.11-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
        PCI: dwc: Fix inverted condition of DMA mask setup warning
        PCI: tegra: Fix host link initialization
      40f78232
    • N
      mfd: ab8500-debugfs: Remove extraneous curly brace · c9a3c4e6
      Nathan Chancellor 提交于
      Clang errors:
      
        drivers/mfd/ab8500-debugfs.c:1526:2: error: non-void function does not return a value [-Werror,-Wreturn-type]
                }
                ^
        drivers/mfd/ab8500-debugfs.c:1528:2: error: expected identifier or '('
        return 0;
                ^
        drivers/mfd/ab8500-debugfs.c:1529:1: error: extraneous closing brace ('}')
        }
        ^
        3 errors generated.
      
      The cleanup in ab8500_interrupts_show left a curly brace around, remove
      it to fix the error.
      
      Fixes: 886c8121 ("mfd: ab8500-debugfs: Remove the racy fiddling with irq_desc")
      Signed-off-by: NNathan Chancellor <natechancellor@gmail.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c9a3c4e6
    • A
      PCI: dwc: Fix inverted condition of DMA mask setup warning · 99e629f1
      Alexander Lobakin 提交于
      Commit 660c4865 ("PCI: dwc: Set 32-bit DMA mask for MSI target address
      allocation") added dma_mask_set() call to explicitly set 32-bit DMA mask
      for MSI message mapping, but for now it throws a warning on ret == 0, while
      dma_set_mask() returns 0 in case of success.
      
      Fix this by inverting the condition.
      
      [bhelgaas: join string to make it greppable]
      Fixes: 660c4865 ("PCI: dwc: Set 32-bit DMA mask for MSI target address allocation")
      Link: https://lore.kernel.org/r/20201222150708.67983-1-alobakin@pm.meSigned-off-by: NAlexander Lobakin <alobakin@pm.me>
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      99e629f1
    • R
      PCI: tegra: Fix host link initialization · 275e88b0
      Rob Herring 提交于
      Commit b9ac0f9d ("PCI: dwc: Move dw_pcie_setup_rc() to DWC common
      code") broke enumeration of downstream devices on Tegra:
      
      In non-working case (next-20201211):
      
        0001:00:00.0 PCI bridge: NVIDIA Corporation Device 1ad2 (rev a1)
        0001:01:00.0 SATA controller: Marvell Technology Group Ltd. Device 9171 (rev 13)
        0005:00:00.0 PCI bridge: NVIDIA Corporation Device 1ad0 (rev a1)
      
      In working case (v5.10-rc7):
      
        0001:00:00.0 PCI bridge: Molex Incorporated Device 1ad2 (rev a1)
        0001:01:00.0 SATA controller: Marvell Technology Group Ltd. Device 9171 (rev 13)
        0005:00:00.0 PCI bridge: Molex Incorporated Device 1ad0 (rev a1)
        0005:01:00.0 PCI bridge: PLX Technology, Inc. Device 3380 (rev ab)
        0005:02:02.0 PCI bridge: PLX Technology, Inc. Device 3380 (rev ab)
        0005:03:00.0 USB controller: PLX Technology, Inc. Device 3380 (rev ab)
      
      The problem seems to be dw_pcie_setup_rc() is now called twice before and
      after the link up handling. The fix is to move Tegra's link up handling to
      .start_link() function like other DWC drivers. Tegra is a bit more
      complicated than others as it re-inits the whole DWC controller to retry
      the link. With this, the initialization ordering is restored to match the
      prior sequence.
      
      Fixes: b9ac0f9d ("PCI: dwc: Move dw_pcie_setup_rc() to DWC common code")
      Link: https://lore.kernel.org/r/20201218143905.1614098-1-robh@kernel.orgReported-by: NMian Yousaf Kaukab <ykaukab@suse.de>
      Tested-by: NMian Yousaf Kaukab <ykaukab@suse.de>
      Signed-off-by: NRob Herring <robh@kernel.org>
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Cc: Thierry Reding <thierry.reding@gmail.com>
      Cc: Jonathan Hunter <jonathanh@nvidia.com>
      Cc: Vidya Sagar <vidyas@nvidia.com>
      275e88b0
    • L
      drm/amd/display: avoid uninitialized variable warning · 61d79136
      Linus Torvalds 提交于
      clang (quite rightly) complains fairly loudly about the newly added
      mpc1_get_mpc_out_mux() function returning an uninitialized value if the
      'opp_id' checks don't pass.
      
      This may not happen in practice, but the code really shouldn't return
      garbage if the sanity checks don't pass.
      
      So just initialize 'val' to zero to avoid the issue.
      
      Fixes: 110b055b ("drm/amd/display: add getter routine to retrieve mpcc mux")
      Cc: Josip Pavic <Josip.Pavic@amd.com>
      Cc: Bindu Ramamurthy <bindu.r@amd.com>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      61d79136
    • L
      Merge tag 'perf-tools-2020-12-24' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux · 5814bc2d
      Linus Torvalds 提交于
      Pull more perf tools updates from Arnaldo Carvalho de Melo:
      
       - Refactor 'perf stat' per CPU/socket/die/thread aggregation fixing use
         cases in ARM machines.
      
       - Fix memory leak when synthesizing SDT probes in 'perf probe'.
      
       - Update kernel header copies related to KVM, epol_pwait. msr-index and
         powerpc and s390 syscall tables.
      
      * tag 'perf-tools-2020-12-24' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (24 commits)
        perf probe: Fix memory leak when synthesizing SDT probes
        perf stat aggregation: Add separate thread member
        perf stat aggregation: Add separate core member
        perf stat aggregation: Add separate die member
        perf stat aggregation: Add separate socket member
        perf stat aggregation: Add separate node member
        perf stat aggregation: Start using cpu_aggr_id in map
        perf cpumap: Drop in cpu_aggr_map struct
        perf cpumap: Add new map type for aggregation
        perf stat: Replace aggregation ID with a struct
        perf cpumap: Add new struct for cpu aggregation
        perf cpumap: Use existing allocator to avoid using malloc
        perf tests: Improve topology test to check all aggregation types
        perf tools: Update s390's syscall.tbl copy from the kernel sources
        perf tools: Update powerpc's syscall.tbl copy from the kernel sources
        perf s390: Move syscall.tbl check into check-headers.sh
        perf powerpc: Move syscall.tbl check to check-headers.sh
        tools headers UAPI: Synch KVM's svm.h header with the kernel
        tools kvm headers: Update KVM headers from the kernel sources
        tools headers UAPI: Sync KVM's vmx.h header with the kernel sources
        ...
      5814bc2d
    • L
      Merge branch 'for-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/jlawall/linux · 42dc45e8
      Linus Torvalds 提交于
      Pull coccinelle updates from Julia Lawall.
      
      * 'for-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/jlawall/linux:
        scripts: coccicheck: Correct usage of make coccicheck
        coccinelle: update expiring email addresses
        coccinnelle: Remove ptr_ret script
        kbuild: do not use scripts/ld-version.sh for checking spatch version
        remove boolinit.cocci
      42dc45e8