1. 16 11月, 2018 1 次提交
  2. 26 10月, 2018 1 次提交
    • C
      block: add a report_zones method · e76239a3
      Christoph Hellwig 提交于
      Dispatching a report zones command through the request queue is a major
      pain due to the command reply payload rewriting necessary. Given that
      blkdev_report_zones() is executing everything synchronously, implement
      report zones as a block device file operation instead, allowing major
      simplification of the code in many places.
      
      sd, null-blk, dm-linear and dm-flakey being the only block device
      drivers supporting exposing zoned block devices, these drivers are
      modified to provide the device side implementation of the
      report_zones() block device file operation.
      
      For device mappers, a new report_zones() target type operation is
      defined so that the upper block layer calls blkdev_report_zones() can
      be propagated down to the underlying devices of the dm targets.
      Implementation for this new operation is added to the dm-linear and
      dm-flakey targets.
      Reviewed-by: NHannes Reinecke <hare@suse.com>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      [Damien]
      * Changed method block_device argument to gendisk
      * Various bug fixes and improvements
      * Added support for null_blk, dm-linear and dm-flakey.
      Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com>
      Reviewed-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NDamien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      e76239a3
  3. 17 10月, 2018 1 次提交
  4. 11 10月, 2018 2 次提交
  5. 10 10月, 2018 1 次提交
    • D
      dm: fix report zone remapping to account for partition offset · 9864cd5d
      Damien Le Moal 提交于
      If dm-linear or dm-flakey are layered on top of a partition of a zoned
      block device, remapping of the start sector and write pointer position
      of the zones reported by a report zones BIO must be modified to account
      for the target table entry mapping (start offset within the device and
      entry mapping with the dm device).  If the target's backing device is a
      partition of a whole disk, the start sector on the physical device of
      the partition must also be accounted for when modifying the zone
      information.  However, dm_remap_zone_report() was not considering this
      last case, resulting in incorrect zone information remapping with
      targets using disk partitions.
      
      Fix this by calculating the target backing device start sector using
      the position of the completed report zones BIO and the unchanged
      position and size of the original report zone BIO. With this value
      calculated, the start sector and write pointer position of the target
      zones can be correctly remapped.
      
      Fixes: 10999307 ("dm: introduce dm_remap_zone_report()")
      Cc: stable@vger.kernel.org
      Signed-off-by: NDamien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      9864cd5d
  6. 18 7月, 2018 1 次提交
    • M
      block: Add and use op_stat_group() for indexing disk_stat fields. · ddcf35d3
      Michael Callahan 提交于
      Add and use a new op_stat_group() function for indexing partition stat
      fields rather than indexing them by rq_data_dir() or bio_data_dir().
      This function works similarly to op_is_sync() in that it takes the
      request::cmd_flags or bio::bi_opf flags and determines which stats
      should et updated.
      
      In addition, the second parameter to generic_start_io_acct() and
      generic_end_io_acct() is now a REQ_OP rather than simply a read or
      write bit and it uses op_stat_group() on the parameter to determine
      the stat group.
      
      Note that the partition in_flight counts are not part of the per-cpu
      statistics and as such are not indexed via this function.  It's now
      indexed by op_is_write().
      
      tj: Refreshed on top of v4.17.  Updated to pass around REQ_OP.
      Signed-off-by: NMichael Callahan <michaelcallahan@fb.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Joshua Morris <josh.h.morris@us.ibm.com>
      Cc: Philipp Reisner <philipp.reisner@linbit.com>
      Cc: Matias Bjorling <mb@lightnvm.io>
      Cc: Kent Overstreet <kent.overstreet@gmail.com>
      Cc: Alasdair Kergon <agk@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      ddcf35d3
  7. 29 6月, 2018 1 次提交
    • R
      dm: prevent DAX mounts if not supported · dbc62659
      Ross Zwisler 提交于
      Currently device_supports_dax() just checks to see if the QUEUE_FLAG_DAX
      flag is set on the device's request queue to decide whether or not the
      device supports filesystem DAX.  Really we should be using
      bdev_dax_supported() like filesystems do at mount time.  This performs
      other tests like checking to make sure the dax_direct_access() path works.
      
      We also explicitly clear QUEUE_FLAG_DAX on the DM device's request queue if
      any of the underlying devices do not support DAX.  This makes the handling
      of QUEUE_FLAG_DAX consistent with the setting/clearing of most other flags
      in dm_table_set_restrictions().
      
      Now that bdev_dax_supported() explicitly checks for QUEUE_FLAG_DAX, this
      will ensure that filesystems built upon DM devices will only be able to
      mount with DAX if all underlying devices also support DAX.
      Signed-off-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Fixes: commit 545ed20e ("dm: add infrastructure for DAX support")
      Cc: stable@vger.kernel.org
      Acked-by: NDan Williams <dan.j.williams@intel.com>
      Reviewed-by: NToshi Kani <toshi.kani@hpe.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      dbc62659
  8. 23 6月, 2018 1 次提交
    • M
      dm: use bio_split() when splitting out the already processed bio · f21c601a
      Mike Snitzer 提交于
      Use of bio_clone_bioset() is inefficient if there is no need to clone
      the original bio's bio_vec array.  Best to use the bio_clone_fast()
      variant.  Also, just using bio_advance() is only part of what is needed
      to properly setup the clone -- it doesn't account for the various
      bio_integrity() related work that also needs to be performed (see
      bio_split).
      
      Address both of these issues by switching from bio_clone_bioset() to
      bio_split().
      
      Fixes: 18a25da8 ("dm: ensure bio submission follows a depth-first tree walk")
      Cc: stable@vger.kernel.org # 4.15+, requires removal of '&' before md->queue->bio_split
      Reported-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NNeilBrown <neilb@suse.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      f21c601a
  9. 08 6月, 2018 1 次提交
  10. 31 5月, 2018 2 次提交
  11. 23 5月, 2018 1 次提交
    • D
      dax: Introduce a ->copy_to_iter dax operation · b3a9a0c3
      Dan Williams 提交于
      Similar to the ->copy_from_iter() operation, a platform may want to
      deploy an architecture or device specific routine for handling reads
      from a dax_device like /dev/pmemX. On x86 this routine will point to a
      machine check safe version of copy_to_iter(). For now, add the plumbing
      to device-mapper and the dax core.
      
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      b3a9a0c3
  12. 01 5月, 2018 1 次提交
  13. 05 4月, 2018 2 次提交
    • M
      dm: remove fmode_t argument from .prepare_ioctl hook · 5bd5e8d8
      Mike Snitzer 提交于
      Use the fmode_t that is passed to dm_blk_ioctl() rather than
      inconsistently (varies across targets) drop it on the floor by
      overriding it with the fmode_t stored in 'struct dm_dev'.
      
      All the persistent reservation functions weren't using the fmode_t they
      got back from .prepare_ioctl so remove them.
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      5bd5e8d8
    • M
      dm: hold DM table for duration of ioctl rather than use blkdev_get · 971888c4
      Mike Snitzer 提交于
      Commit 519049af ("dm: use blkdev_get rather than bdgrab when issuing
      pass-through ioctl") inadvertantly introduced a regression relative to
      users of device cgroups that issue ioctls (e.g. libvirt).  Using
      blkdev_get() in DM's passthrough ioctl support implicitly introduced a
      cgroup permissions check that would fail unless care were taken to add
      all devices in the IO stack to the device cgroup.  E.g. rather than just
      adding the top-level DM multipath device to the cgroup all the
      underlying devices would need to be allowed.
      
      Fix this, to no longer require allowing all underlying devices, by
      simply holding the live DM table (which includes the table's original
      blkdev_get() reference on the blockdevice that the ioctl will be issued
      to) for the duration of the ioctl.
      
      Also, bump the DM ioctl version so a user can know that their device
      cgroup allow workaround is no longer needed.
      Reported-by: NMichal Privoznik <mprivozn@redhat.com>
      Suggested-by: NMikulas Patocka <mpatocka@redhat.com>
      Fixes: 519049af ("dm: use blkdev_get rather than bdgrab when issuing pass-through ioctl")
      Cc: stable@vger.kernel.org # 4.16
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      971888c4
  14. 04 4月, 2018 2 次提交
  15. 03 4月, 2018 1 次提交
  16. 30 3月, 2018 1 次提交
    • M
      dm: fix dropped return code from dm_get_bdev_for_ioctl · da5dadb4
      Mike Snitzer 提交于
      dm_get_bdev_for_ioctl()'s return of 0 or 1 must be the result from
      prepare_ioctl (1 means the ioctl was issued to a partition, 0 means it
      wasn't).  Unfortunately commit 519049af ("dm: use blkdev_get rather
      than bdgrab when issuing pass-through ioctl") reused the variable 'r'
      to store the return from blkdev_get() that follows prepare_ioctl()
      -- whereby dropping prepare_ioctl()'s result on the floor.
      
      This can lead to an ioctl or persistent reservation being issued to a
      partition going unnoticed, which implies the extra permission check for
      CAP_SYS_RAWIO is skipped.
      
      Fix this by using a different variable to store blkdev_get()'s return.
      
      Fixes: 519049af ("dm: use blkdev_get rather than bdgrab when issuing pass-through ioctl")
      Reported-by: NAlasdair G Kergon <agk@redhat.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      da5dadb4
  17. 07 3月, 2018 1 次提交
  18. 01 3月, 2018 1 次提交
  19. 16 2月, 2018 1 次提交
    • N
      dm: correctly handle chained bios in dec_pending() · 8dd601fa
      NeilBrown 提交于
      dec_pending() is given an error status (possibly 0) to be recorded
      against a bio.  It can be called several times on the one 'struct
      dm_io', and it is careful to only assign a non-zero error to
      io->status.  However when it then assigned io->status to bio->bi_status,
      it is not careful and could overwrite a genuine error status with 0.
      
      This can happen when chained bios are in use.  If a bio is chained
      beneath the bio that this dm_io is handling, the child bio might
      complete and set bio->bi_status before the dm_io completes.
      
      This has been possible since chained bios were introduced in 3.14, and
      has become a lot easier to trigger with commit 18a25da8 ("dm: ensure
      bio submission follows a depth-first tree walk") as that commit caused
      dm to start using chained bios itself.
      
      A particular failure mode is that if a bio spans an 'error' target and a
      working target, the 'error' fragment will complete instantly and set the
      ->bi_status, and the other fragment will normally complete a little
      later, and will clear ->bi_status.
      
      The fix is simply to only assign io_error to bio->bi_status when
      io_error is not zero.
      Reported-and-tested-by: NMilan Broz <gmazyland@gmail.com>
      Cc: stable@vger.kernel.org (v3.14+)
      Signed-off-by: NNeilBrown <neilb@suse.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      8dd601fa
  20. 30 1月, 2018 1 次提交
  21. 17 1月, 2018 1 次提交
  22. 15 1月, 2018 1 次提交
    • M
      dm: fix incomplete request_queue initialization · c100ec49
      Mike Snitzer 提交于
      DM is no longer prone to having its request_queue be improperly
      initialized.
      
      Summary of changes:
      
      - defer DM's blk_register_queue() from add_disk()-time until
        dm_setup_md_queue() by using add_disk_no_queue_reg() in alloc_dev().
      
      - dm_setup_md_queue() is updated to fully initialize DM's request_queue
        (_after_ all table loads have occurred and the request_queue's type,
        features and limits are known).
      
      A very welcome side-effect of these changes is DM no longer needs to:
      1) backfill the "mq" sysfs entry (because historically DM didn't
      initialize the request_queue to use blk-mq until _after_
      blk_register_queue() was called via add_disk()).
      2) call elv_register_queue() to get .request_fn request-based DM
      device's "iosched" exposed in syfs.
      
      In addition, blk-mq debugfs support is now made available because
      request-based DM's blk-mq request_queue is now properly initialized
      before dm_setup_md_queue() calls blk_register_queue().
      
      These changes also stave off the need to introduce new DM-specific
      workarounds in block core, e.g. this proposal:
      https://patchwork.kernel.org/patch/10067961/
      
      In the end DM devices should be less unicorn in nature (relative to
      initialization and availability of block core infrastructure provided by
      the request_queue).
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Tested-by: NMing Lei <ming.lei@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      c100ec49
  23. 07 1月, 2018 1 次提交
  24. 20 12月, 2017 2 次提交
    • M
      dm: optimize bio-based NVMe IO submission · 978e51ba
      Mike Snitzer 提交于
      Upper level bio-based drivers that stack immediately ontop of NVMe can
      leverage direct_make_request().  In addition DM's NVMe bio-based
      will initially only ever have one NVMe device that it submits IO to at a
      time.  There is no splitting needed.  Enhance DM core so that
      DM_TYPE_NVME_BIO_BASED's IO submission takes advantage of both of these
      characteristics.
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      978e51ba
    • M
      dm: introduce DM_TYPE_NVME_BIO_BASED · 22c11858
      Mike Snitzer 提交于
      If dm_table_determine_type() establishes DM_TYPE_NVME_BIO_BASED then
      all devices in the DM table do not support partial completions.  Also,
      the table has a single immutable target that doesn't require DM core to
      split bios.
      
      This will enable adding NVMe optimizations to bio-based DM.
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      22c11858
  25. 18 12月, 2017 1 次提交
    • M
      dm: simplify start of block stats accounting for bio-based · f3986374
      Mike Snitzer 提交于
      No apparent need to generic_start_io_acct() until before the IO is ready
      for submission.  start_io_acct() is the proper place to do this
      accounting -- it is also where DM accounts for pending IO and, if
      enabled, starts dm-stats accounting.
      
      Replace start_io_acct()'s part_round_stats() with generic_start_io_acct().
      This eliminates needing to take part_stat_lock() multiple times when
      starting an IO on bio-based devices.
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      f3986374
  26. 17 12月, 2017 5 次提交
  27. 14 12月, 2017 5 次提交