1. 24 Jul, 2020 1 commit
    • M
      dm integrity: fix integrity recalculation that is improperly skipped · 5df96f2b
      Committed by Mikulas Patocka
      Commit adc0daad ("dm: report suspended device during destroy") broke
      integrity recalculation.
      
      The problem is dm_suspended() returns true not only during suspend,
      but also during resume. So this race condition could occur:
      1. dm_integrity_resume calls queue_work(ic->recalc_wq, &ic->recalc_work)
      2. integrity_recalc (&ic->recalc_work) preempts the current thread
      3. integrity_recalc calls if (unlikely(dm_suspended(ic->ti))) goto unlock_ret;
      4. integrity_recalc exits and no recalculating is done.
      
      To fix this race condition, add a function dm_post_suspending that is
      only true during the postsuspend phase and use it instead of
      dm_suspended().
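
A minimal userspace sketch of the distinction described above, assuming a simplified two-flag model of the device state. The struct and function names are hypothetical and are not the kernel's data structures; the point is only that a "suspended" flag that stays set during resume makes the worker bail out, while a flag limited to the postsuspend phase does not.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the device state; not struct mapped_device. */
struct md_model {
	bool suspended;       /* set during suspend and still set while resume runs */
	bool post_suspending; /* set only while the postsuspend phase runs */
};

/* Old check: the recalc worker gives up whenever "suspended" is set. */
static bool recalc_runs_old(const struct md_model *md)
{
	assert(md);
	return !md->suspended;
}

/* New check: the worker gives up only during the postsuspend phase. */
static bool recalc_runs_new(const struct md_model *md)
{
	assert(md);
	return !md->post_suspending;
}
```

With a state of `{ .suspended = true, .post_suspending = false }`, which models the resume window, the old check skips the work and the new check lets it run.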
      
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Fixes: adc0daad ("dm: report suspended device during destroy")
      Cc: stable@vger.kernel.org # v4.18+
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  2. 09 Jul, 2020 1 commit
  3. 21 May, 2020 1 commit
  4. 15 May, 2020 1 commit
  5. 03 Apr, 2020 1 commit
  6. 13 Nov, 2019 1 commit
  7. 06 Nov, 2019 1 commit
    • G
      dm stripe: use struct_size() in kmalloc() · 8adeac3b
      Committed by Gustavo A. R. Silva
      One of the more common cases of allocation size calculations is finding
      the size of a structure that has a zero-sized array at the end, along
      with memory for some number of elements for that array. For example:
      
      struct stripe_c {
              ...
              struct stripe stripe[0];
      };
      
      In this case alloc_context() and dm_array_too_big() are removed and
      replaced by the direct use of the struct_size() helper in kmalloc().
      
      Notice that open-coded form is prone to type mistakes.
      
      This code was detected with the help of Coccinelle.
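
A userspace sketch of the idea behind struct_size(), assuming GCC or Clang for the overflow builtins. The types and helper names here are hypothetical stand-ins, not the kernel's definitions; the sketch only shows a header-plus-array size computed with overflow checks instead of open-coded arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct stripe_model {
	uint64_t physical_start; /* hypothetical member */
};

struct stripe_c_model {
	uint32_t stripes;
	struct stripe_model stripe[]; /* flexible array member */
};

/* Header size plus count elements; saturates to SIZE_MAX on overflow
 * so that a following allocation fails instead of being undersized. */
static size_t stripe_c_size(size_t count)
{
	size_t bytes;

	if (__builtin_mul_overflow(count, sizeof(struct stripe_model), &bytes) ||
	    __builtin_add_overflow(bytes, sizeof(struct stripe_c_model), &bytes))
		return SIZE_MAX;
	return bytes;
}

/* Allocate a context for count stripes; returns NULL on failure. */
static struct stripe_c_model *stripe_c_alloc(uint32_t count)
{
	struct stripe_c_model *sc = malloc(stripe_c_size(count));

	if (sc)
		sc->stripes = count;
	return sc;
}
```

Saturating to SIZE_MAX means an overflowing count turns into an allocation failure rather than a short buffer.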
      Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  8. 18 Jul, 2019 1 commit
  9. 12 Jul, 2019 1 commit
  10. 26 Apr, 2019 1 commit
    • Y
      dm mpath: fix missing call of path selector type->end_io · 5de719e3
      Committed by Yufen Yu
      After commit 396eaf21 ("blk-mq: improve DM's blk-mq IO merging via
      blk_insert_cloned_request feedback"), map_request() will requeue the tio
      when the issued clone request returns BLK_STS_RESOURCE or BLK_STS_DEV_RESOURCE.
      
      Thus, if the device driver keeps reporting an error status, a tio may be
      requeued multiple times until the return value is not DM_MAPIO_REQUEUE.
      That means type->start_io may be called multiple times, while type->end_io
      is only called when the IO completes.
      
      In fact, even without commit 396eaf21, setup_clone() failure can
      also cause tio requeue and associated missed call to type->end_io.
      
      The service-time path selector selects path based on in_flight_size,
      which is increased by st_start_io() and decreased by st_end_io().
      Missed calls to st_end_io() can lead to in_flight_size count error and
      will cause the selector to make the wrong choice.  In addition,
      queue-length path selector will also be affected.
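
The accounting drift described above can be illustrated with a small userspace model. Everything here is hypothetical (the struct, the function names, and the `balance_on_requeue` switch); it only counts how many bytes remain "in flight" after one request that was requeued several times before completing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-path counter, standing in for in_flight_size. */
struct path_model {
	size_t in_flight_size;
};

static void model_start_io(struct path_model *p, size_t nr_bytes)
{
	p->in_flight_size += nr_bytes;
}

static void model_end_io(struct path_model *p, size_t nr_bytes)
{
	assert(p->in_flight_size >= nr_bytes);
	p->in_flight_size -= nr_bytes;
}

/* One request, requeued "requeues" times, then completed once.
 * balance_on_requeue == 0 models the missed end_io calls,
 * balance_on_requeue != 0 models calling end_io before each requeue.
 * Returns the bytes still counted as in flight afterwards. */
static size_t simulate_request(size_t nr_bytes, unsigned int requeues,
			       int balance_on_requeue)
{
	struct path_model p = { 0 };

	for (unsigned int i = 0; i < requeues; i++) {
		model_start_io(&p, nr_bytes);
		if (balance_on_requeue)
			model_end_io(&p, nr_bytes);
	}
	model_start_io(&p, nr_bytes); /* final, successful attempt */
	model_end_io(&p, nr_bytes);   /* completion */
	return p.in_flight_size;
}
```

In this model, three requeues of a 4096-byte request leave 12288 bytes permanently counted when the end_io calls are missed, and zero when they are balanced.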
      
      To fix the problem, call type->end_io in ->release_clone_rq before the tio
      is requeued.  map_info is passed to ->release_clone_rq() for the
      map_request() error paths that result in a requeue.
      
      Fixes: 396eaf21 ("blk-mq: improve DM's blk-mq IO merging via blk_insert_cloned_request feedback")
      Cc: stable@vger.kernl.org
      Signed-off-by: Yufen Yu <yuyufen@huawei.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  11. 06 Mar, 2019 2 commits
    • H
      dm: add support to directly boot to a mapped device · 6bbc923d
      Committed by Helen Koike
      Add a "create" module parameter, which allows device-mapper targets to
      be configured at boot time. This enables early use of DM targets in the
      boot process (as the root device or otherwise) without the need of an
      initramfs.
      
      The syntax used in the boot param is based on the concise format from
      the dmsetup tool to follow the rule of least surprise:
      
      	dmsetup table --concise /dev/mapper/lroot
      
      Which is:
      	dm-mod.create=<name>,<uuid>,<minor>,<flags>,<table>[,<table>+][;<name>,<uuid>,<minor>,<flags>,<table>[,<table>+]+]
      
      Where,
      	<name>		::= The device name.
      	<uuid>		::= xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx | ""
      	<minor>		::= The device minor number | ""
      	<flags>		::= "ro" | "rw"
      	<table>		::= <start_sector> <num_sectors> <target_type> <target_args>
      	<target_type>	::= "verity" | "linear" | ...
      
      For example, the following could be added in the boot parameters:
      dm-mod.create="lroot,,,rw, 0 4096 linear 98:16 0, 4096 4096 linear 98:32 0" root=/dev/dm-0
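
As a rough illustration of the syntax above, the following userspace sketch splits a single-device argument into its first four comma-separated fields and leaves the table list untouched. It is not the kernel's parser: the struct, the buffer sizes, and the function names are assumptions made for this example, and multiple devices separated by ';' are not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical holder for the leading fields of one device entry. */
struct dm_create_fields {
	char name[32];
	char uuid[40];
	char minor[8];
	char flags[4];
	const char *tables; /* points into the input, after the 4th comma */
};

/* Copy [start, end) into dst as a C string; -1 if it does not fit. */
static int copy_field(char *dst, size_t dst_len,
		      const char *start, const char *end)
{
	size_t n = (size_t)(end - start);

	if (n >= dst_len)
		return -1;
	memcpy(dst, start, n);
	dst[n] = '\0';
	return 0;
}

/* Split "<name>,<uuid>,<minor>,<flags>,<table>..." into its parts.
 * Returns 0 on success, -1 if a field is missing or too long. */
static int split_create_arg(const char *arg, struct dm_create_fields *out)
{
	char *dst[4] = { out->name, out->uuid, out->minor, out->flags };
	size_t len[4] = { sizeof(out->name), sizeof(out->uuid),
			  sizeof(out->minor), sizeof(out->flags) };

	for (int i = 0; i < 4; i++) {
		const char *comma = strchr(arg, ',');

		if (!comma || copy_field(dst[i], len[i], arg, comma))
			return -1;
		arg = comma + 1;
	}
	out->tables = arg;
	return 0;
}
```

Applied to the example string, this yields the name "lroot", empty uuid and minor fields, the flags "rw", and a table list that still contains both linear tables.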
      
      Only targets that were tested, and that don't change any block device
      when the device is created as read-only, are allowed. For example,
      mirror and cache targets are not allowed. The rationale behind
      this is that if the user makes a mistake, choosing the wrong device to
      be the mirror or the cache can corrupt data.
      
      The only targets initially allowed are:
      * crypt
      * delay
      * linear
      * snapshot-origin
      * striped
      * verity
      Co-developed-by: Will Drewry <wad@chromium.org>
      Co-developed-by: Kees Cook <keescook@chromium.org>
      Co-developed-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
      Signed-off-by: Helen Koike <helen.koike@collabora.com>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • N
      dm: fix to_sector() for 32bit · 0bdb50c5
      Committed by NeilBrown
      A dm-raid array with devices larger than 4GB won't assemble on
      a 32 bit host since _check_data_dev_sectors() was added in 4.16.
      This is because to_sector() treats its argument as an "unsigned long"
      which is 32bits (4GB) on a 32bit host.  Using "unsigned long long"
      is more correct.
      
      Kernels as early as 4.2 can have other problems due to to_sector()
      being used on the size of a device.
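
The truncation can be reproduced in userspace on any host by modelling a 32-bit "unsigned long" with uint32_t. This is a sketch with hypothetical function names, assuming the usual 512-byte sector (shift of 9); it is not the kernel's to_sector() itself.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SHIFT 9 /* 512-byte sectors */

/* Models the old helper on a 32-bit host: the byte count is first
 * squeezed into 32 bits, so anything at or above 4 GiB wraps. */
static uint64_t to_sector_32bit_ulong(uint64_t bytes)
{
	uint32_t n = (uint32_t)bytes;

	return n >> SECTOR_SHIFT;
}

/* Models the fixed helper: the full 64-bit value is shifted. */
static uint64_t to_sector_ull(unsigned long long bytes)
{
	return bytes >> SECTOR_SHIFT;
}
```

For a 5 GiB device (5368709120 bytes) the 32-bit model reports 2097152 sectors, the size of a 1 GiB device, while the 64-bit version reports the correct 10485760 sectors.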
      
      Fixes: 0cf45031 ("dm raid: add support for the MD RAID0 personality")
      cc: stable@vger.kernel.org (v4.2+)
      Reported-and-tested-by: Guillaume Perréal <gperreal@free.fr>
      Signed-off-by: NeilBrown <neil@brown.name>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  12. 21 Feb, 2019 1 commit
  13. 26 Oct, 2018 1 commit
    • C
      block: add a report_zones method · e76239a3
      Committed by Christoph Hellwig
      Dispatching a report zones command through the request queue is a major
      pain due to the command reply payload rewriting necessary. Given that
      blkdev_report_zones() is executing everything synchronously, implement
      report zones as a block device file operation instead, allowing major
      simplification of the code in many places.
      
      sd, null-blk, dm-linear and dm-flakey being the only block device
      drivers supporting exposing zoned block devices, these drivers are
      modified to provide the device side implementation of the
      report_zones() block device file operation.
      
      For device mappers, a new report_zones() target type operation is
      defined so that the upper block layer calls blkdev_report_zones() can
      be propagated down to the underlying devices of the dm targets.
      Implementation for this new operation is added to the dm-linear and
      dm-flakey targets.
      Reviewed-by: Hannes Reinecke <hare@suse.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      [Damien]
      * Changed method block_device argument to gendisk
      * Various bug fixes and improvements
      * Added support for null_blk, dm-linear and dm-flakey.
      Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
      Reviewed-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  14. 19 Oct, 2018 1 commit
  15. 11 Oct, 2018 1 commit
  16. 23 May, 2018 1 commit
    • D
      dax: Introduce a ->copy_to_iter dax operation · b3a9a0c3
      Committed by Dan Williams
      Similar to the ->copy_from_iter() operation, a platform may want to
      deploy an architecture or device specific routine for handling reads
      from a dax_device like /dev/pmemX. On x86 this routine will point to a
      machine check safe version of copy_to_iter(). For now, add the plumbing
      to device-mapper and the dax core.
      
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
  17. 05 Apr, 2018 1 commit
  18. 04 Apr, 2018 2 commits
  19. 18 Mar, 2018 1 commit
    • B
      block: Move SECTOR_SIZE and SECTOR_SHIFT definitions into <linux/blkdev.h> · 233bde21
      Committed by Bart Van Assche
      It happens often while I'm preparing a patch for a block driver that
      I'm wondering: is a definition of SECTOR_SIZE and/or SECTOR_SHIFT
      available for this driver? Do I have to introduce definitions of these
      constants before I can use these constants? To avoid this confusion,
      move the existing definitions of SECTOR_SIZE and SECTOR_SHIFT into the
      <linux/blkdev.h> header file such that these become available for all
      block drivers. Make the SECTOR_SIZE definition in the uapi msdos_fs.h
      header file conditional to avoid that including that header file after
      <linux/blkdev.h> causes the compiler to complain about a SECTOR_SIZE
      redefinition.
      
      Note: the SECTOR_SIZE / SECTOR_SHIFT / SECTOR_BITS definitions have
      not been removed from uapi header files nor from NAND drivers in
      which these constants are used for another purpose than converting
      block layer offsets and sizes into a number of sectors.
      
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  20. 30 Jan, 2018 1 commit
  21. 17 Jan, 2018 1 commit
  22. 20 Dec, 2017 1 commit
    • M
      dm: introduce DM_TYPE_NVME_BIO_BASED · 22c11858
      Committed by Mike Snitzer
      If dm_table_determine_type() establishes DM_TYPE_NVME_BIO_BASED then
      all devices in the DM table do not support partial completions.  Also,
      the table has a single immutable target that doesn't require DM core to
      split bios.
      
      This will enable adding NVMe optimizations to bio-based DM.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  23. 17 Dec, 2017 1 commit
    • M
      dm: improve performance by moving dm_io structure to per-bio-data · 64f52b0e
      Committed by Mike Snitzer
      Eliminates need for a separate mempool to allocate 'struct dm_io'
      objects from.  As such, it saves an extra mempool allocation for each
      original bio that DM core is issued.
      
      This complicates the per-bio-data accessor functions by needing to
      conditionally add extra padding to get to a target's per-bio-data.  But
      in the end this provides a decent performance improvement for all
      bio-based DM devices.
      
      On an NVMe-loop based testbed to a ramdisk (~3100 MB/s): bio-based
      DM linear performance improved by 2% (went from 2665 to 2777 MB/s).
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  24. 14 Dec, 2017 1 commit
  25. 11 Sep, 2017 1 commit
    • M
      dax: remove the pmem_dax_ops->flush abstraction · c3ca015f
      Committed by Mikulas Patocka
      Commit abebfbe2 ("dm: add ->flush() dax operation support") is
      buggy. A DM device may be composed of multiple underlying devices and
      all of them need to be flushed. That commit just routes the flush
      request to the first device and ignores the other devices.
      
      It could be fixed by adding more complex logic to the device mapper. But
      there is only one implementation of the method pmem_dax_ops->flush - that
      is pmem_dax_flush() - and it calls arch_wb_cache_pmem(). Consequently, we
      don't need the pmem_dax_ops->flush abstraction at all, we can call
      arch_wb_cache_pmem() directly from dax_flush() because dax_dev->ops->flush
      can't ever reach anything different from arch_wb_cache_pmem().
      
      It should be also pointed out that for some uses of persistent memory it
      is needed to flush only a very small amount of data (such as 1 cacheline),
      and it would be overkill if we go through that device mapper machinery for
      a single flushed cache line.
      
      Fix this by removing the pmem_dax_ops->flush abstraction and call
      arch_wb_cache_pmem() directly from dax_flush(). Also, remove the device
      mapper code that forwards the flushes.
      
      Fixes: abebfbe2 ("dm: add ->flush() dax operation support")
      Cc: stable@vger.kernel.org
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Reviewed-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  26. 28 Aug, 2017 2 commits
  27. 19 Jun, 2017 3 commits
    • D
      dm: introduce dm_remap_zone_report() · 10999307
      Committed by Damien Le Moal
      A target driver that supports zoned block devices and exposes them as
      such may receive REQ_OP_ZONE_REPORT requests from a user who wants to
      determine the mapped device's zone configuration. To process such a
      request properly, the target driver may need to remap the zone
      descriptors provided in the report reply. The helper function
      dm_remap_zone_report() does this generically
      using only the target start offset and length and the start offset
      within the target device.
      
      dm_remap_zone_report() will remap the start sector of all zones
      reported. If the report includes sequential zones, the write pointer
      position of these zones will also be remapped.
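
The remapping described above amounts to shifting sector values from the underlying device's address space into the mapped device's. A simplified userspace sketch follows; the zone struct and parameter names are hypothetical and much smaller than the kernel's zone descriptor, so this is an illustration of the arithmetic rather than the helper's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t sector_t;

/* Hypothetical, reduced zone descriptor. */
struct zone_model {
	sector_t start;  /* first sector of the zone */
	sector_t wp;     /* write pointer, meaningful for sequential zones */
	bool sequential;
};

/* ti_begin:  where the target starts within the mapped device.
 * dev_start: where the target's area starts on the underlying device. */
static void remap_zone(struct zone_model *z, sector_t ti_begin,
		       sector_t dev_start)
{
	assert(z->start >= dev_start);
	z->start = z->start - dev_start + ti_begin;
	if (z->sequential)
		z->wp = z->wp - dev_start + ti_begin;
}
```

For a target that begins at sector 0 of the mapped device and maps an area starting at sector 524288 of the underlying device, a sequential zone reported at 524288 with its write pointer at 524388 becomes a zone at 0 with its write pointer at 100.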
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Reviewed-by: Hannes Reinecke <hare@suse.com>
      Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • D
      dm table: add zoned block devices validation · dd88d313
      Committed by Damien Le Moal
      1) Introduce DM_TARGET_ZONED_HM feature flag:
      
      The target drivers currently available will not operate correctly if a
      table target maps onto a host-managed zoned block device.
      
      To avoid problems, introduce the new feature flag DM_TARGET_ZONED_HM to
      allow a target to explicitly state that it supports host-managed zoned
      block devices.  This feature is checked for all targets in a table if
      any of the table's block devices are host-managed.
      
      Note that as host-aware zoned block devices are backward compatible with
      regular block devices, they can be used by any of the current target
      types.  This new feature is thus restricted to host-managed zoned block
      devices.
      
      2) Check device area zone alignment:
      
      If a target maps to a zoned block device, check that the device area is
      aligned on zone boundaries to avoid problems with REQ_OP_ZONE_RESET
      operations (resetting a partially mapped sequential zone would not be
      possible).  This also facilitates the processing of zone report with
      REQ_OP_ZONE_REPORT bios.
      
      3) Check block devices zone model compatibility
      
      When setting the DM device's queue limits, several possibilities exist
      for zoned block devices:
      1) The DM target driver may want to expose a different zone model
      (e.g. host-managed device emulation or regular block device on top of
      host-managed zoned block devices)
      2) Expose the underlying zone model of the devices as-is
      
      To allow both cases, the underlying block device zone model must be set
      in the target limits in dm_set_device_limits() and the compatibility of
      all devices checked similarly to the logical block size alignment.  For
      this last check, introduce validate_hardware_zoned_model() to check that
      all targets of a table have the same zone model and that the zone sizes
      of the target devices are equal.
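
The zone alignment check from item 2) can be sketched as a pair of mask tests. This is a userspace illustration with a hypothetical function name, and it assumes the zone size is a power of two so that a bit mask can stand in for a modulo.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t sector_t;

/* True if both the start and the length of a device area fall on zone
 * boundaries. zone_sectors is assumed to be a non-zero power of two. */
static bool area_is_zone_aligned(sector_t start, sector_t len,
				 sector_t zone_sectors)
{
	assert(zone_sectors && (zone_sectors & (zone_sectors - 1)) == 0);
	return (start & (zone_sectors - 1)) == 0 &&
	       (len & (zone_sectors - 1)) == 0;
}
```

An area that ends in the middle of a zone fails the length test, which is the case where resetting the partially mapped sequential zone would not be possible.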
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Reviewed-by: Hannes Reinecke <hare@suse.com>
      Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
      [Mike Snitzer refactored Damien's original work to simplify the code]
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • J
      dm: convert DM printk macros to pr_<level> macros · d2c3c8dc
      Committed by Joe Perches
      Using pr_<level> is the more common logging style.
      
      Standardize style and use new macro DM_FMT.
      Use no_printk in DMDEBUG macros when CONFIG_DM_DEBUG is not #defined.
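
A userspace sketch of the macro pattern, under stated assumptions: the DM_FMT layout shown here (name, per-file prefix, message, newline) is assumed rather than quoted from the patch, "demo" is a made-up prefix, snprintf() stands in for pr_err(), and the MODEL_* macros are hypothetical. `##__VA_ARGS__` is a GNU extension accepted by GCC and Clang.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define DM_NAME "device-mapper"
#define DM_MSG_PREFIX "demo" /* hypothetical per-file prefix */

/* Assumed shape of the shared format macro: prefix, message, newline. */
#define DM_FMT(fmt) DM_NAME ": " DM_MSG_PREFIX ": " fmt "\n"

/* Stand-in for a pr_err()-based DMERR: formats into a caller buffer. */
#define MODEL_DMERR(buf, len, fmt, ...) \
	snprintf((buf), (len), DM_FMT(fmt), ##__VA_ARGS__)

/* Stand-in for the no_printk() idea: the arguments are still checked
 * by the compiler, but the call is never executed. */
#define MODEL_DMDEBUG_OFF(buf, len, fmt, ...) \
	do { \
		if (0) \
			snprintf((buf), (len), DM_FMT(fmt), ##__VA_ARGS__); \
	} while (0)
```

Keeping the disabled variant as dead code rather than an empty macro means format-string mistakes in debug messages are still caught at compile time.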
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  28. 16 Jun, 2017 1 commit
  29. 10 Jun, 2017 1 commit
  30. 09 Jun, 2017 3 commits
    • C
      block: switch bios to blk_status_t · 4e4cbee9
      Committed by Christoph Hellwig
      Replace bi_error with a new bi_status to allow for a clear conversion.
      Note that device mapper overloaded bi_error with a private value, which
      we'll have to keep around at least for now and thus propagate to a
      proper blk_status_t value.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • C
      block: introduce new block status code type · 2a842aca
      Committed by Christoph Hellwig
      Currently we use normal Linux errno values in the block layer, and while
      we accept any error a few have overloaded magic meanings.  This patch
      instead introduces a new blk_status_t value that holds block layer specific
      status codes and explicitly explains their meaning.  Helpers to convert from
      and to the previous special meanings are provided for now, but I suspect
      we want to get rid of them in the long run - those drivers that have an
      errno input (e.g. networking) usually get errnos that don't know about
      the special block layer overloads, and similarly returning them to userspace
      will usually return something that strictly speaking isn't correct
      for file system operations, but that's left as an exercise for later.
      
      For now the set of errors is a very limited set that closely corresponds
      to the previous overloaded errno values, but there is some low hanging
      fruit to improve it.
      
      blk_status_t (ab)uses the sparse __bitwise annotations to allow for sparse
      typechecking, so that we can easily catch places passing the wrong values.
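
The conversion helpers mentioned above can be pictured as a small lookup table. The following userspace sketch uses made-up status names and values, not the kernel's BLK_STS_* definitions, and a plain typedef cannot reproduce the sparse __bitwise checking; it only shows the two-way mapping with a generic I/O error as the fallback.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical status type and codes for illustration only. */
typedef unsigned char status_model_t;

enum {
	MODEL_STS_OK = 0,
	MODEL_STS_NOSPC = 1,
	MODEL_STS_TIMEOUT = 2,
	MODEL_STS_IOERR = 3,
};

static const struct {
	status_model_t status;
	int err;
} status_table[] = {
	{ MODEL_STS_OK, 0 },
	{ MODEL_STS_NOSPC, -ENOSPC },
	{ MODEL_STS_TIMEOUT, -ETIMEDOUT },
	{ MODEL_STS_IOERR, -EIO },
};

#define STATUS_TABLE_LEN (sizeof(status_table) / sizeof(status_table[0]))

/* Unknown errnos collapse to the generic I/O error status. */
static status_model_t errno_to_status_model(int err)
{
	for (size_t i = 0; i < STATUS_TABLE_LEN; i++)
		if (status_table[i].err == err)
			return status_table[i].status;
	return MODEL_STS_IOERR;
}

/* Unknown status codes collapse to -EIO. */
static int status_model_to_errno(status_model_t status)
{
	for (size_t i = 0; i < STATUS_TABLE_LEN; i++)
		if (status_table[i].status == status)
			return status_table[i].err;
	return -EIO;
}
```

The round trip is lossy by design: an errno without a table entry, such as -EINVAL, comes back as -EIO.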
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • C
      dm: change ->end_io calling convention · 1be56909
      Committed by Christoph Hellwig
      Turn the error parameter into a pointer so that target drivers can change
      the value, and make sure only DM_ENDIO_* values are returned from the
      methods.
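
A userspace sketch of the calling convention, with hypothetical names and values throughout: the callback receives a pointer to the error so it can rewrite it, and its return value is reserved for a disposition code rather than an errno.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical disposition codes standing in for DM_ENDIO_* values. */
enum {
	MODEL_ENDIO_DONE = 0,
	MODEL_ENDIO_REQUEUE = 1,
};

/* The callback may modify *error and must return a disposition code. */
typedef int (*model_end_io_fn)(int *error);

/* Example target hook: treat "operation not supported" as success. */
static int ignore_eopnotsupp_end_io(int *error)
{
	if (*error == -EOPNOTSUPP)
		*error = 0;
	return MODEL_ENDIO_DONE;
}

/* Stand-in for the core completion path: runs the hook, checks that a
 * valid disposition came back, and returns the possibly updated error. */
static int model_complete(model_end_io_fn end_io, int error)
{
	int r = end_io(&error);

	assert(r == MODEL_ENDIO_DONE || r == MODEL_ENDIO_REQUEUE);
	return error;
}
```

Separating the two channels means a hook can no longer blur "what happened to the I/O" with "what the core should do next" in a single integer.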
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
  31. 02 5月, 2017 2 次提交
  32. 28 4月, 2017 1 次提交