1. 28 1月, 2020 1 次提交
  2. 13 11月, 2019 3 次提交
  3. 07 11月, 2019 1 次提交
  4. 23 8月, 2019 1 次提交
    • M
      dm: make dm_table_find_target return NULL · 123d87d5
      Mikulas Patocka 提交于
      Currently, if we pass too high sector number to dm_table_find_target, it
      returns zeroed dm_target structure and callers test if the structure is
      zeroed with the macro dm_target_is_valid.
      
      However, returning NULL is common practice to indicate errors.
      
      This patch refactors the dm code, so that dm_table_find_target returns
      NULL and its callers test the returned value for NULL. The macro
      dm_target_is_valid is deleted. In alloc_targets, we no longer allocate an
      extra zeroed target.
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      123d87d5
  5. 12 7月, 2019 1 次提交
  6. 06 7月, 2019 2 次提交
  7. 22 5月, 2019 1 次提交
    • M
      dm: make sure to obey max_io_len_target_boundary · 51b86f9a
      Michael Lass 提交于
      Commit 61697a6a ("dm: eliminate 'split_discard_bios' flag from DM
      target interface") incorrectly removed code from
      __send_changing_extent_only() that is required to impose a per-target IO
      boundary on IO that exceeds max_io_len_target_boundary().  Otherwise
      "special" IO (e.g. DISCARD, WRITE SAME, WRITE ZEROES) can write beyond
      where allowed.
      
      Fix this by restoring the max_io_len_target_boundary() limit in
      __send_changing_extent_only()
      
      Fixes: 61697a6a ("dm: eliminate 'split_discard_bios' flag from DM target interface")
      Cc: stable@vger.kernel.org # 5.1+
      Signed-off-by: NMichael Lass <bevan@bi-co.net>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      51b86f9a
  8. 21 5月, 2019 1 次提交
    • D
      dax: Arrange for dax_supported check to span multiple devices · 7bf7eac8
      Dan Williams 提交于
      Pankaj reports that starting with commit ad428cdb "dax: Check the
      end of the block-device capacity with dax_direct_access()" device-mapper
      no longer allows dax operation. This results from the stricter checks in
      __bdev_dax_supported() that validate that the start and end of a
      block-device map to the same 'pagemap' instance.
      
      Teach the dax-core and device-mapper to validate the 'pagemap' on a
      per-target basis. This is accomplished by refactoring the
      bdev_dax_supported() internals into generic_fsdax_supported() which
      takes a sector range to validate. Consequently generic_fsdax_supported()
      is suitable to be used in a device-mapper ->iterate_devices() callback.
      A new ->dax_supported() operation is added to allow composite devices to
      split and route upper-level bdev_dax_supported() requests.
      
      Fixes: ad428cdb ("dax: Check the end of the block-device...")
      Cc: <stable@vger.kernel.org>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Reported-by: NPankaj Gupta <pagupta@redhat.com>
      Reviewed-by: NPankaj Gupta <pagupta@redhat.com>
      Tested-by: NPankaj Gupta <pagupta@redhat.com>
      Tested-by: NVaibhav Jain <vaibhav@linux.ibm.com>
      Reviewed-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      7bf7eac8
  9. 16 5月, 2019 1 次提交
  10. 26 4月, 2019 1 次提交
  11. 05 4月, 2019 1 次提交
    • M
      dm: disable DISCARD if the underlying storage no longer supports it · bcb44433
      Mike Snitzer 提交于
      Storage devices which report supporting discard commands like
      WRITE_SAME_16 with unmap, but reject discard commands sent to the
      storage device.  This is a clear storage firmware bug but it doesn't
      change the fact that should a program cause discards to be sent to a
      multipath device layered on this buggy storage, all paths can end up
      failed at the same time from the discards, causing possible I/O loss.
      
      The first discard to a path will fail with Illegal Request, Invalid
      field in cdb, e.g.:
       kernel: sd 8:0:8:19: [sdfn] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
       kernel: sd 8:0:8:19: [sdfn] tag#0 Sense Key : Illegal Request [current]
       kernel: sd 8:0:8:19: [sdfn] tag#0 Add. Sense: Invalid field in cdb
       kernel: sd 8:0:8:19: [sdfn] tag#0 CDB: Write same(16) 93 08 00 00 00 00 00 a0 08 00 00 00 80 00 00 00
       kernel: blk_update_request: critical target error, dev sdfn, sector 10487808
      
      The SCSI layer converts this to the BLK_STS_TARGET error number, the sd
      device disables its support for discard on this path, and because of the
      BLK_STS_TARGET error multipath fails the discard without failing any
      path or retrying down a different path.  But subsequent discards can
      cause path failures.  Any discards sent to the path which already failed
      a discard ends up failing with EIO from blk_cloned_rq_check_limits with
      an "over max size limit" error since the discard limit was set to 0 by
      the sd driver for the path.  As the error is EIO, this now fails the
      path and multipath tries to send the discard down the next path.  This
      cycle continues as discards are sent until all paths fail.
      
      Fix this by training DM core to disable DISCARD if the underlying
      storage already did so.
      
      Also, fix branching in dm_done() and clone_endio() to reflect the
      mutually exclussive nature of the IO operations in question.
      
      Cc: stable@vger.kernel.org
      Reported-by: NDavid Jeffery <djeffery@redhat.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      bcb44433
  12. 02 4月, 2019 1 次提交
    • M
      dm: revert 8f50e358 ("dm: limit the max bio size as BIO_MAX_PAGES * PAGE_SIZE") · 75ae1936
      Mikulas Patocka 提交于
      The limit was already incorporated to dm-crypt with commit 4e870e94
      ("dm crypt: fix error with too large bios"), so we don't need to apply
      it globally to all targets. The quantity BIO_MAX_PAGES * PAGE_SIZE is
      wrong anyway because the variable ti->max_io_len it is supposed to be in
      the units of 512-byte sectors not in bytes.
      
      Reduction of the limit to 1048576 sectors could even cause data
      corruption in rare cases - suppose that we have a dm-striped device with
      stripe size 768MiB. The target will call dm_set_target_max_io_len with
      the value 1572864. The buggy code would reduce it to 1048576. Now, the
      dm-core will errorneously split the bios on 1048576-sector boundary
      insetad of 1572864-sector boundary and pass these stripe-crossing bios
      to the striped target.
      
      Cc: stable@vger.kernel.org # v4.16+
      Fixes: 8f50e358 ("dm: limit the max bio size as BIO_MAX_PAGES * PAGE_SIZE")
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Acked-by: NMing Lei <ming.lei@redhat.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      75ae1936
  13. 06 3月, 2019 2 次提交
  14. 21 2月, 2019 1 次提交
  15. 20 2月, 2019 1 次提交
  16. 07 2月, 2019 2 次提交
    • M
      dm: don't use bio_trim() afterall · fa8db494
      Mike Snitzer 提交于
      bio_trim() has an early return, which makes it _not_ idempotent, if the
      offset is 0 and the bio's bi_size already matches the requested size.
      Prior to DM, all users of bio_trim() were fine with this.  But DM has
      exposed the fact that bio_trim()'s early return is incompatible with a
      cloned bio whose integrity payload must be trimmed via
      bio_integrity_trim().
      
      Fix this by reverting DM back to doing the equivalent of bio_trim() but
      in an idempotent manner (so bio_integrity_trim is always performed).
      
      Follow-on work is needed to assess what benefit bio_trim()'s early
      return is providing to its existing callers.
      Reported-by: NMilan Broz <gmazyland@gmail.com>
      Fixes: 57c36519 ("dm: fix clone_bio() to trigger blk_recount_segments()")
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      fa8db494
    • M
      dm: add memory barrier before waitqueue_active · 645efa84
      Mikulas Patocka 提交于
      Block core changes to switch bio-based IO accounting to be percpu had a
      side-effect of altering DM core to now rely on calling waitqueue_active
      (in both bio-based and request-based) to check if another task is in
      dm_wait_for_completion().
      
      A memory barrier is needed before calling waitqueue_active().  DM core
      doesn't piggyback on a preceding memory barrier so it must explicitly
      use its own.
      
      For more details on why using waitqueue_active() without a preceding
      barrier is unsafe, please see the comment before the waitqueue_active()
      definition in include/linux/wait.h.
      
      Add the missing memory barrier by switching to using wq_has_sleeper().
      
      Fixes: 6f757231 ("dm: remove the pending IO accounting")
      Fixes: c4576aed ("dm: fix request-based dm's use of dm_wait_for_completion")
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      645efa84
  17. 23 1月, 2019 2 次提交
  18. 22 1月, 2019 2 次提交
    • M
      dm: fix redundant IO accounting for bios that need splitting · a1e1cb72
      Mike Snitzer 提交于
      The risk of redundant IO accounting was not taken into consideration
      when commit 18a25da8 ("dm: ensure bio submission follows a
      depth-first tree walk") introduced IO splitting in terms of recursion
      via generic_make_request().
      
      Fix this by subtracting the split bio's payload from the IO stats that
      were already accounted for by start_io_acct() upon dm_make_request()
      entry.  This repeat oscillation of the IO accounting, up then down,
      isn't ideal but refactoring DM core's IO splitting to pre-split bios
      _before_ they are accounted turned out to be an excessive amount of
      change that will need a full development cycle to refine and verify.
      
      Before this fix:
      
        /dev/mapper/stripe_dev is a 4-way stripe using a 32k chunksize, so
        bios are split on 32k boundaries.
      
        # fio --name=16M --filename=/dev/mapper/stripe_dev --rw=write --bs=64k --size=16M \
          	--iodepth=1 --ioengine=libaio --direct=1 --refill_buffers
      
        with debugging added:
        [103898.310264] device-mapper: core: start_io_acct: dm-2 WRITE bio->bi_iter.bi_sector=0 len=128
        [103898.318704] device-mapper: core: __split_and_process_bio: recursing for following split bio:
        [103898.329136] device-mapper: core: start_io_acct: dm-2 WRITE bio->bi_iter.bi_sector=64 len=64
        ...
      
        16M written yet 136M (278528 * 512b) accounted:
        # cat /sys/block/dm-2/stat | awk '{ print $7 }'
        278528
      
      After this fix:
      
        16M written and 16M (32768 * 512b) accounted:
        # cat /sys/block/dm-2/stat | awk '{ print $7 }'
        32768
      
      Fixes: 18a25da8 ("dm: ensure bio submission follows a depth-first tree walk")
      Cc: stable@vger.kernel.org # 4.16+
      Reported-by: NBryan Gurney <bgurney@redhat.com>
      Reviewed-by: NMing Lei <ming.lei@redhat.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      a1e1cb72
    • M
      dm: fix clone_bio() to trigger blk_recount_segments() · 57c36519
      Mike Snitzer 提交于
      DM's clone_bio() now benefits from using bio_trim() by fixing the fact
      that clone_bio() wasn't clearing BIO_SEG_VALID like bio_trim() does;
      which triggers blk_recount_segments() via bio_phys_segments().
      Reviewed-by: NMing Lei <ming.lei@redhat.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      57c36519
  19. 20 12月, 2018 1 次提交
    • J
      dm: don't reuse bio for flushes · dbe3ece1
      Jens Axboe 提交于
      DM currently has a statically allocated bio that it uses to issue empty
      flushes. It doesn't submit this bio, it just uses it for maintaining
      state while setting up clones. Multiple users can access this bio at the
      same time. This wasn't previously an issue, even if it was a bit iffy,
      but with the blkg associations it can become one.
      
      We setup the blkg association, then clone bio's and submit, then remove
      the blkg assocation again. But since we can have multiple tasks doing
      this at the same time, against multiple blkg's, then we can either lose
      references to a blkg, or put it twice. The latter causes complaints on
      the percpu ref being <= 0 when released, and can cause use-after-free as
      well. Ming reports that xfstest generic/475 triggers this:
      
      ------------[ cut here ]------------
      percpu ref (blkg_release) <= 0 (0) after switching to atomic
      WARNING: CPU: 13 PID: 0 at lib/percpu-refcount.c:155 percpu_ref_switch_to_atomic_rcu+0x2c9/0x4a0
      
      Switch to just using an on-stack bio for this, and get rid of the
      embedded bio.
      
      Fixes: 5cdf2e3f ("blkcg: associate blkg when associating a device")
      Reported-by: NMing Lei <ming.lei@redhat.com>
      Tested-by: NMing Lei <ming.lei@redhat.com>
      Reviewed-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      dbe3ece1
  20. 18 12月, 2018 3 次提交
  21. 11 12月, 2018 2 次提交
    • M
      dm: fix request-based dm's use of dm_wait_for_completion · c4576aed
      Mike Snitzer 提交于
      The md->wait waitqueue is used by both bio-based and request-based DM.
      Commit dbd3bbd2 ("dm rq: leverage blk_mq_queue_busy() to check for
      outstanding IO") lost sight of the requirement that
      dm_wait_for_completion() must work with all types of DM devices.
      
      Fix md_in_flight() to call the blk-mq or bio-based method accordingly.
      
      Fixes: dbd3bbd2 ("dm rq: leverage blk_mq_queue_busy() to check for outstanding IO")
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      c4576aed
    • J
      dm: fix inflight IO check · b7934ba4
      Jens Axboe 提交于
      After switching to percpu inflight counters, the inflight check
      is totally buggy. It's perfectly valid for some counters to be
      non-zero while having a total inflight IO count of 0, that's how
      these kinds of counters work (inc on one CPU, dec on another).
      Fix the md_in_flight() check to sum all counters before returning
      a false positive, potentially.
      
      While at it, remove the inflight read for IO completion. We don't
      need it, just wake anyone that's waiting for the IO count to drop
      to zero. The caller needs to re-check that value anyway when woken,
      which it does.
      
      Fixes: 6f757231 ("dm: remove the pending IO accounting")
      Acked-by: NMike Snitzer <snitzer@redhat.com>
      Reported-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      b7934ba4
  22. 10 12月, 2018 2 次提交
  23. 08 12月, 2018 2 次提交
  24. 16 11月, 2018 1 次提交
  25. 26 10月, 2018 1 次提交
    • C
      block: add a report_zones method · e76239a3
      Christoph Hellwig 提交于
      Dispatching a report zones command through the request queue is a major
      pain due to the command reply payload rewriting necessary. Given that
      blkdev_report_zones() is executing everything synchronously, implement
      report zones as a block device file operation instead, allowing major
      simplification of the code in many places.
      
      sd, null-blk, dm-linear and dm-flakey being the only block device
      drivers supporting exposing zoned block devices, these drivers are
      modified to provide the device side implementation of the
      report_zones() block device file operation.
      
      For device mappers, a new report_zones() target type operation is
      defined so that the upper block layer calls blkdev_report_zones() can
      be propagated down to the underlying devices of the dm targets.
      Implementation for this new operation is added to the dm-linear and
      dm-flakey targets.
      Reviewed-by: NHannes Reinecke <hare@suse.com>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      [Damien]
      * Changed method block_device argument to gendisk
      * Various bug fixes and improvements
      * Added support for null_blk, dm-linear and dm-flakey.
      Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com>
      Reviewed-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NDamien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      e76239a3
  26. 17 10月, 2018 1 次提交
  27. 11 10月, 2018 2 次提交