  1. 21 Dec, 2019 1 commit
    • dm mpath: remove harmful bio-based optimization · 980b632f
      Committed by Mike Snitzer
      commit dbaf971c9cdf10843071a60dcafc1aaab3162354 upstream.
      
      Removes the branching for edge-case where no SCSI device handler
      exists.  The __map_bio_fast() method was far too limited, by only
      selecting a new pathgroup or path IFF there was a path failure, fix this
      by eliminating it in favor of __map_bio().  __map_bio()'s extra SCSI
      device handler specific MPATHF_PG_INIT_REQUIRED test is not in the fast
      path anyway.
      
      This change restores full path selector functionality for bio-based
      configurations that don't have a SCSI device handler.  But it should be
      noted that the path selectors do have an impact on performance for
      certain networks that are extremely fast (and don't require frequent
      switching).
      
      Fixes: 8d47e659 ("dm mpath: remove unnecessary NVMe branching in favor of scsi_dh checks")
      Cc: stable@vger.kernel.org
      Reported-by: Drew Hastings <dhastings@crucialwebhost.com>
      Suggested-by: Martin Wilck <mwilck@suse.de>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  2. 16 Sep, 2019 1 commit
    • dm mpath: fix missing call of path selector type->end_io · 69409854
      Committed by Yufen Yu
      [ Upstream commit 5de719e3d01b4abe0de0d7b857148a880ff2a90b ]
      
      After commit 396eaf21 ("blk-mq: improve DM's blk-mq IO merging via
      blk_insert_cloned_request feedback"), map_request() will requeue the tio
      when the issued clone request returns BLK_STS_RESOURCE or
      BLK_STS_DEV_RESOURCE.
      
      Thus, if the device driver status is an error, a tio may be requeued
      multiple times until the return value is not DM_MAPIO_REQUEUE.  That means
      type->start_io may be called multiple times, while type->end_io is only
      called when the IO completes.
      
      In fact, even without commit 396eaf21, setup_clone() failure can
      also cause tio requeue and associated missed call to type->end_io.
      
      The service-time path selector selects path based on in_flight_size,
      which is increased by st_start_io() and decreased by st_end_io().
      Missed calls to st_end_io() can lead to an incorrect in_flight_size count
      and will cause the selector to make the wrong choice.  In addition, the
      queue-length path selector will also be affected.
      
      To fix the problem, call type->end_io in ->release_clone_rq before the tio
      is requeued.  map_info is passed to ->release_clone_rq() for the
      map_request() error paths that result in a requeue.
      
      Fixes: 396eaf21 ("blk-mq: improve DM's blk-mq IO merging via blk_insert_cloned_request feedback")
      Cc: stable@vger.kernel.org
      Signed-off-by: Yufen Yu <yuyufen@huawei.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
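The accounting problem described in this commit can be illustrated with a small user-space sketch. It is not the kernel code: `struct path_stats` and every `sketch_*` function are hypothetical names invented here, and the simulation only assumes what the message states, namely that start_io adds to in_flight_size and end_io subtracts from it.

```c
#include <stddef.h>

/* Hypothetical stand-in for a path selector's per-path state. */
struct path_stats {
    size_t in_flight_size;  /* bytes currently counted as outstanding */
};

/* Runs whenever an I/O of nr_bytes is mapped to the path. */
static void sketch_start_io(struct path_stats *p, size_t nr_bytes)
{
    p->in_flight_size += nr_bytes;
}

/* Runs when the I/O completes, or when it is released for requeue. */
static void sketch_end_io(struct path_stats *p, size_t nr_bytes)
{
    p->in_flight_size -= nr_bytes;
}

/*
 * Simulate one I/O that is requeued `requeues` times and then completes.
 * With balance_on_requeue == 0, end_io runs only at completion (the
 * pattern the commit describes as broken).  With a non-zero value, end_io
 * also runs on every requeue (the pattern the fix introduces).
 * Returns the byte count still recorded as in flight afterwards.
 */
static size_t sketch_leftover_in_flight(unsigned int requeues,
                                        size_t nr_bytes,
                                        int balance_on_requeue)
{
    struct path_stats p = { 0 };
    unsigned int i;

    for (i = 0; i < requeues; i++) {
        sketch_start_io(&p, nr_bytes);
        if (balance_on_requeue)
            sketch_end_io(&p, nr_bytes);
    }

    /* Final, successful attempt: always balanced. */
    sketch_start_io(&p, nr_bytes);
    sketch_end_io(&p, nr_bytes);

    return p.in_flight_size;
}
```

In this sketch, three unbalanced requeues of a 4096-byte I/O leave 12288 bytes permanently counted against the path, which is the kind of drift that would bias a selector comparing in-flight sizes.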
  3. 26 May, 2019 1 commit
  4. 18 Sep, 2018 1 commit
    • dm mpath: fix attached_handler_name leak and dangling hw_handler_name pointer · b592211c
      Committed by Mike Snitzer
      Commit e8f74a0f ("dm mpath: eliminate need to use
      scsi_device_from_queue") introduced 2 regressions:
      1) memory leak occurs if attached_handler_name is not assigned to
         m->hw_handler_name
      2) m->hw_handler_name can become a dangling pointer if the
         RETAIN_ATTACHED_HW_HANDLER flag is set and scsi_dh_attach() returns
         -EBUSY.
      
      Fix both of these by clearing 'attached_handler_name' pointer passed to
      setup_scsi_dh() after it is assigned to m->hw_handler_name.  And if
      setup_scsi_dh() doesn't consume 'attached_handler_name' parse_path()
      will kfree() it.
      
      Fixes: e8f74a0f ("dm mpath: eliminate need to use scsi_device_from_queue")
      Cc: stable@vger.kernel.org # 4.16+
      Reported-by: Bart Van Assche <bvanassche@acm.org>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
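The ownership hand-off this commit describes (the callee clears the caller's pointer once it has consumed the string, so the caller can free unconditionally) can be sketched in user-space C. The struct and both functions below are hypothetical stand-ins, not the dm-mpath code, and `free()` stands in for `kfree()`.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the multipath state that owns the name. */
struct sketch_multipath {
    char *hw_handler_name;
};

/*
 * Consume *name only if no handler name is stored yet.  When the string
 * is consumed, *name is cleared so the caller no longer holds a pointer
 * to memory that it does not own.
 */
static void sketch_setup_handler(struct sketch_multipath *m, char **name)
{
    if (!m->hw_handler_name) {
        m->hw_handler_name = *name;
        *name = NULL;
    }
}

/* Caller side: allocate, offer the string, then free whatever is left. */
static int sketch_parse_path(struct sketch_multipath *m, const char *detected)
{
    char *name = malloc(strlen(detected) + 1);

    if (!name)
        return -1;
    strcpy(name, detected);

    sketch_setup_handler(m, &name);

    /* Frees the string if it was not consumed; free(NULL) is a no-op. */
    free(name);
    return 0;
}
```

Because the pointer is cleared exactly when ownership moves, the single `free()` at the end covers both cases: nothing leaks when the name is not consumed, and nothing stored is freed when it is.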
  5. 14 May, 2018 1 commit
  6. 05 Apr, 2018 1 commit
  7. 04 Apr, 2018 2 commits
  8. 30 Mar, 2018 1 commit
  9. 15 Mar, 2018 1 commit
  10. 14 Mar, 2018 2 commits
  11. 07 Mar, 2018 1 commit
    • dm mpath: remove unnecessary NVMe branching in favor of scsi_dh checks · 8d47e659
      Committed by Mike Snitzer
      This eliminates the "queue_mode" configuration's "nvme" mode.  There
      wasn't anything NVMe-specific about that mode.  It was named "nvme"
      because it was a short name for the mode.  But the entire point of the
      mode was to optimize the multipath target for underlying devices that
      are _not_ SCSI-based.  Devices that aren't SCSI have no need for the
      various SCSI device handler (scsi_dh) specific code in DM multipath.
      
      But rather than narrowly define this scsi_dh vs not branching in terms
      of "nvme": invert the logic so that we're just checking whether a
      multipath device is layered on SCSI devices with scsi_dh attached.
      
      This allows any future storage technology to avoid scsi_dh specific code
      in the multipath target too.
      
      Fixes: 848b8aef ("dm mpath: optimize NVMe bio-based support")
      Suggested-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  12. 30 Jan, 2018 1 commit
  13. 17 Jan, 2018 3 commits
  14. 11 Jan, 2018 1 commit
  15. 07 Jan, 2018 2 commits
  16. 05 Jan, 2018 1 commit
    • dm mpath: implement NVMe bio-based support · cd025384
      Committed by Mike Snitzer
      This DM multipath NVMe bio-based support requires CONFIG_NVME_MULTIPATH
      to not be set.  In the future hopefully NVMe multipath and DM multipath
      can co-exist more seamlessly.  But as is, if CONFIG_NVME_MULTIPATH=Y
      then all the individual NVMe paths will remain hidden to upper layers and
      as such DM multipath will not be able to manage them.
      
      Though NVMe's native multipathing doesn't multipath namespaces across
      subsystems; so technically a user _could_ use CONFIG_NVME_MULTIPATH=Y
      and also use DM multipath to multipath across subsystems.

      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  17. 03 Jan, 2018 1 commit
  18. 20 Dec, 2017 3 commits
  19. 17 Dec, 2017 1 commit
  20. 08 Dec, 2017 1 commit
    • dm mpath: fix bio-based multipath queue_if_no_path handling · c1fd0abe
      Committed by Mike Snitzer
      Commit ca5beb76 ("dm mpath: micro-optimize the hot path relative to
      MPATHF_QUEUE_IF_NO_PATH") caused bio-based DM-multipath to fail mptest's
      "test_02_sdev_delete".
      
      Restoring the logic that existed prior to commit ca5beb76 fixes this
      bio-based DM-multipath regression.  Also verified all mptest tests pass
      with request-based DM-multipath.
      
      This commit effectively reverts commit ca5beb76 -- but it does so
      without reintroducing the need to take the m->lock spinlock in
      must_push_back_{rq,bio}.
      
      Fixes: ca5beb76 ("dm mpath: micro-optimize the hot path relative to MPATHF_QUEUE_IF_NO_PATH")
      Cc: stable@vger.kernel.org # 4.12+
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  21. 04 Dec, 2017 1 commit
    • dm: fix various targets to dm_register_target after module __init resources created · 7e6358d2
      Committed by monty_pavel@sina.com
      A NULL pointer is seen if two concurrent "vgchange -ay -K <vg name>"
      processes race to load the dm-thin-pool module:
      
       PID: 25992 TASK: ffff883cd7d23500 CPU: 4 COMMAND: "vgchange"
        #0 [ffff883cd743d600] machine_kexec at ffffffff81038fa9
        #1 [ffff883cd743d660] crash_kexec at ffffffff810c5992
        #2 [ffff883cd743d730] oops_end at ffffffff81515c90
        #3 [ffff883cd743d760] no_context at ffffffff81049f1b
        #4 [ffff883cd743d7b0] __bad_area_nosemaphore at ffffffff8104a1a5
        #5 [ffff883cd743d800] bad_area at ffffffff8104a2ce
        #6 [ffff883cd743d830] __do_page_fault at ffffffff8104aa6f
        #7 [ffff883cd743d950] do_page_fault at ffffffff81517bae
        #8 [ffff883cd743d980] page_fault at ffffffff81514f95
           [exception RIP: kmem_cache_alloc+108]
           RIP: ffffffff8116ef3c RSP: ffff883cd743da38 RFLAGS: 00010046
           RAX: 0000000000000004 RBX: ffffffff81121b90 RCX: ffff881bf1e78cc0
           RDX: 0000000000000000 RSI: 00000000000000d0 RDI: 0000000000000000
           RBP: ffff883cd743da68 R8: ffff881bf1a4eb00 R9: 0000000080042000
           R10: 0000000000002000 R11: 0000000000000000 R12: 00000000000000d0
           R13: 0000000000000000 R14: 00000000000000d0 R15: 0000000000000246
           ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
        #9 [ffff883cd743da70] mempool_alloc_slab at ffffffff81121ba5
       #10 [ffff883cd743da80] mempool_create_node at ffffffff81122083
       #11 [ffff883cd743dad0] mempool_create at ffffffff811220f4
       #12 [ffff883cd743dae0] pool_ctr at ffffffffa08de049 [dm_thin_pool]
       #13 [ffff883cd743dbd0] dm_table_add_target at ffffffffa0005f2f [dm_mod]
       #14 [ffff883cd743dc30] table_load at ffffffffa0008ba9 [dm_mod]
       #15 [ffff883cd743dc90] ctl_ioctl at ffffffffa0009dc4 [dm_mod]
      
      The race results in a NULL pointer because:
      
      Process A (vgchange -ay -K):
       	a. send DM_LIST_VERSIONS_CMD ioctl;
       	b. pool_target not registered;
       	c. modprobe dm_thin_pool and wait until end.
      
      Process B (vgchange -ay -K):
       	a. send DM_LIST_VERSIONS_CMD ioctl;
       	b. pool_target registered;
       	c. table_load->dm_table_add_target->pool_ctr;
       	d. _new_mapping_cache is NULL and panic.
      Note:
       	1. process A and process B are two concurrent processes.
       	2. pool_target can be detected by process B but
       	_new_mapping_cache initialization has not ended.
      
      To fix dm-thin-pool, and other targets (cache, multipath, and snapshot)
      with the same problem, simply call dm_register_target() only after all
      resources created during module init (as labelled with __init) are ready.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: monty <monty_pavel@sina.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
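The ordering issue in this commit can be shown with a single-threaded sketch that samples what a constructor call would observe partway through module init. All names here are hypothetical; the real race involves dm_register_target() and the target's caches, and this sketch only models the two possible orderings.

```c
#include <stdlib.h>

/* Hypothetical module state: a cache the constructor needs, and a flag
 * standing in for "the target type is visible to table loads". */
static void *sketch_cache;
static int sketch_registered;

/*
 * What a constructor call would experience right now:
 *  -1  target not registered, so the load fails cleanly
 *  -2  target registered but the cache is missing (the NULL pointer case)
 *   0  target registered and fully initialised
 */
static int sketch_ctr(void)
{
    if (!sketch_registered)
        return -1;
    if (!sketch_cache)
        return -2;
    return 0;
}

static void sketch_reset(void)
{
    free(sketch_cache);
    sketch_cache = NULL;
    sketch_registered = 0;
}

/* Problematic order: register first, create resources afterwards.
 * Returns what a concurrent table load would see between the two steps. */
static int sketch_init_register_first(void)
{
    int seen;

    sketch_reset();
    sketch_registered = 1;
    seen = sketch_ctr();
    sketch_cache = malloc(64);
    return seen;
}

/* Fixed order: create resources first, register last. */
static int sketch_init_register_last(void)
{
    int seen;

    sketch_reset();
    sketch_cache = malloc(64);
    seen = sketch_ctr();
    sketch_registered = 1;
    return seen;
}
```

With registration last, the intermediate state is only ever "not registered yet", so a racing loader fails cleanly or retries instead of dereferencing a NULL cache.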
  22. 17 Nov, 2017 1 commit
  23. 24 Oct, 2017 1 commit
  24. 20 Oct, 2017 1 commit
    • bitops: Introduce assign_bit() · 5307e2ad
      Committed by Lukas Wunner
      A common idiom is to assign a value to a bit with:
      
          if (value)
              set_bit(nr, addr);
          else
              clear_bit(nr, addr);
      
      Likewise common is the one-line expression variant:
      
          value ? set_bit(nr, addr) : clear_bit(nr, addr);
      
      Commit 9a8ac3ae ("dm mpath: cleanup QUEUE_IF_NO_PATH bit
      manipulation by introducing assign_bit()") introduced assign_bit()
      to the md subsystem for brevity.
      
      Make it available to others, specifically gpiolib and the upcoming
      driver for Maxim MAX3191x industrial serializer chips.
      
      As requested by Peter Zijlstra, change the argument order to reflect
      traditional "dst = src" in C, hence "assign_bit(nr, addr, value)".
      
      Cc: Bart Van Assche <bart.vanassche@wdc.com>
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: Linus Walleij <linus.walleij@linaro.org>
      Cc: Neil Brown <neilb@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Acked-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
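As an illustration of the idiom and the "assign_bit(nr, addr, value)" argument order described above, here is a minimal user-space sketch. The `sketch_*` helpers are plain, non-atomic stand-ins written for this example; the kernel's set_bit() and clear_bit() are atomic operations, which this sketch does not attempt to reproduce.

```c
#include <limits.h>
#include <stdbool.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Non-atomic stand-in for set_bit(): set bit nr in the bitmap at addr. */
static void sketch_set_bit(unsigned int nr, unsigned long *addr)
{
    addr[nr / SKETCH_BITS_PER_LONG] |= 1UL << (nr % SKETCH_BITS_PER_LONG);
}

/* Non-atomic stand-in for clear_bit(): clear bit nr in the bitmap. */
static void sketch_clear_bit(unsigned int nr, unsigned long *addr)
{
    addr[nr / SKETCH_BITS_PER_LONG] &= ~(1UL << (nr % SKETCH_BITS_PER_LONG));
}

/* Read back bit nr. */
static bool sketch_test_bit(unsigned int nr, const unsigned long *addr)
{
    return (addr[nr / SKETCH_BITS_PER_LONG] >>
            (nr % SKETCH_BITS_PER_LONG)) & 1UL;
}

/* The helper itself: destination (nr, addr) first, value last,
 * mirroring "dst = src". */
static void sketch_assign_bit(unsigned int nr, unsigned long *addr, bool value)
{
    if (value)
        sketch_set_bit(nr, addr);
    else
        sketch_clear_bit(nr, addr);
}
```

A call such as `sketch_assign_bit(3, flags, enabled)` then replaces the four-line if/else at each call site.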
  25. 28 Aug, 2017 5 commits
  26. 24 Aug, 2017 1 commit
    • block: replace bi_bdev with a gendisk pointer and partitions index · 74d46992
      Committed by Christoph Hellwig
      This way we don't need a block_device structure to submit I/O.  The
      block_device has different life time rules from the gendisk and
      request_queue and is usually only available when the block device node
      is open.  Other callers need to explicitly create one (e.g. the lightnvm
      passthrough code, or the new nvme multipathing code).
      
      For the actual I/O path all that we need is the gendisk, which exists
      once per block device.  But given that the block layer also does
      partition remapping we additionally need a partition index, which is
      used for said remapping in generic_make_request.
      
      Note that all the block drivers generally want request_queue or
      sometimes the gendisk, so this removes a layer of indirection all
      over the stack.

      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  27. 14 Jun, 2017 1 commit
  28. 09 Jun, 2017 2 commits
    • block: switch bios to blk_status_t · 4e4cbee9
      Committed by Christoph Hellwig
      Replace bi_error with a new bi_status to allow for a clear conversion.
      Note that device mapper overloaded bi_error with a private value, which
      we'll have to keep around at least for now and thus propagate to a
      proper blk_status_t value.

      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • block: introduce new block status code type · 2a842aca
      Committed by Christoph Hellwig
      Currently we use normal Linux errno values in the block layer, and while
      we accept any error a few have overloaded magic meanings.  This patch
      instead introduces a new blk_status_t value that holds block layer specific
      status codes and explicitly explains their meaning.  Helpers to convert from
      and to the previous special meanings are provided for now, but I suspect
      we want to get rid of them in the long run - those drivers that have an
      errno input (e.g. networking) usually get errnos that don't know about
      the special block layer overloads, and similarly returning them to userspace
      will usually return something that strictly speaking isn't correct
      for file system operations, but that's left as an exercise for later.
      
      For now the set of errors is a very limited set that closely corresponds
      to the previous overloaded errno values, but there is some low hanging
      fruit to improve it.
      
      blk_status_t (ab)uses the sparse __bitwise annotations to allow for sparse
      typechecking, so that we can easily catch places passing the wrong values.

      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
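The idea of a dedicated status type with conversion helpers, annotated so that the sparse checker can flag mixed-up values, can be sketched as follows. Everything named `sketch_*` or `SKETCH_*` is hypothetical: the numeric codes, the two-entry mapping, and the fallback to an I/O error are choices made for this illustration, not the kernel's actual table.

```c
#include <errno.h>
#include <stdint.h>

/* Under sparse (__CHECKER__ defined) the attributes make the type
 * distinct from plain integers; for a normal compiler they vanish. */
#ifdef __CHECKER__
#define SKETCH_BITWISE __attribute__((bitwise))
#define SKETCH_FORCE   __attribute__((force))
#else
#define SKETCH_BITWISE
#define SKETCH_FORCE
#endif

typedef uint8_t SKETCH_BITWISE sketch_status_t;

/* Hypothetical status codes; the values are arbitrary for this sketch. */
#define SKETCH_STS_OK    ((SKETCH_FORCE sketch_status_t)0)
#define SKETCH_STS_NOSPC ((SKETCH_FORCE sketch_status_t)1)
#define SKETCH_STS_IOERR ((SKETCH_FORCE sketch_status_t)2)

/* Convert a negative errno into a status; unknown errors collapse to
 * SKETCH_STS_IOERR (an assumption made by this sketch). */
static sketch_status_t sketch_errno_to_status(int err)
{
    if (err == 0)
        return SKETCH_STS_OK;
    if (err == -ENOSPC)
        return SKETCH_STS_NOSPC;
    return SKETCH_STS_IOERR;
}

/* Convert a status back into a negative errno for legacy callers. */
static int sketch_status_to_errno(sketch_status_t status)
{
    if (status == SKETCH_STS_OK)
        return 0;
    if (status == SKETCH_STS_NOSPC)
        return -ENOSPC;
    return -EIO;
}
```

The round trip is lossy by design in this sketch: an errno with no dedicated code, such as -EINVAL, comes back as -EIO, which mirrors the commit's point that only a limited set of errors carries specific meaning.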