1. 11 Aug 2021 (1 commit)
    • dm ima: measure data on table load · 91ccbbac
      Authored by Tushar Sugandhi
      DM configures a block device with various target-specific attributes
      passed to it as a table.  DM loads the table and calls each target's
      respective constructor with the attributes as input parameters.
      Some of these attributes are critical to ensuring the device meets a
      certain security bar.  Thus, IMA should measure these attributes to
      ensure they are not tampered with during the lifetime of the device,
      so that external services can have high confidence in the
      configuration of the block devices on a given system.
      
      Some devices may have large tables, and a given device may change its
      state (table-load, suspend, resume, rename, remove, table-clear, etc.)
      many times.  Measuring these attributes each time the device changes
      state would significantly increase the size of the IMA logs.
      Further, once configured, these attributes are not expected to change
      unless a new table is loaded, or a device is removed and recreated.
      Therefore the clear-text of the attributes should only be measured
      during table load, and the hash of the active/inactive table should be
      measured for the remaining device state changes.
      
      Export the IMA function ima_measure_critical_data() to allow
      measurement of DM device parameters, as well as target-specific
      attributes, during table load.  Compute the hash of the inactive
      table and store it for measurements during future state changes.  If
      a load is called multiple times, update the inactive table hash with
      the hash of the latest populated table, so that the correct inactive
      table hash is measured when the device transitions to different
      states such as resume, remove, and rename.
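
      For illustration, a minimal sketch of a call into the exported hook,
      assuming the five-argument v5.14-era signature of
      ima_measure_critical_data(); the buffer names are hypothetical, not
      the actual DM code:

          /*
           * Measure a concatenated clear-text attribute string at
           * table-load time.  "dm_buf"/"dm_buf_len" are hypothetical;
           * passing 'true' instead would record a hash of the buffer.
           */
          ima_measure_critical_data("device_mapper", "dm_table_load",
                                    dm_buf, dm_buf_len, false);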
      Signed-off-by: Tushar Sugandhi <tusharsu@linux.microsoft.com>
      Signed-off-by: Colin Ian King <colin.king@canonical.com> # leak fix
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2. 05 Jun 2021 (2 commits)
    • dm: introduce zone append emulation · bb37d772
      Authored by Damien Le Moal
      For zoned targets that cannot support zone append operations, implement
      an emulation using regular write operations. If the original BIO
      submitted by the user is a zone append operation, change its clone into
      a regular write operation directed at the target zone write pointer
      position.
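
      For illustration, the remapping step can be sketched as follows (a
      minimal sketch using the zwp_offset array introduced below; not the
      exact kernel code):

          /* turn a zone-append clone into a regular write at the tracked
           * write pointer position of its target zone */
          static void dm_emulate_zone_append_sketch(struct bio *clone,
                                                    struct mapped_device *md,
                                                    unsigned int zno,
                                                    sector_t zone_start)
          {
                  /* zwp_offset[] holds offsets relative to the zone start */
                  clone->bi_iter.bi_sector = zone_start + md->zwp_offset[zno];

                  /* replace the operation type, keeping the other flags */
                  clone->bi_opf &= ~REQ_OP_MASK;
                  clone->bi_opf |= REQ_OP_WRITE;
          }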
      
      To do so, an array of write pointer offsets (write pointer position
      relative to the start of a zone) is added to struct mapped_device. All
      operations that modify a sequential zone write pointer (writes, zone
      reset, zone finish and zone append) are intercepted in __map_bio() and
      processed using the new function dm_zone_map_bio().
      
      Detection of a target's ability to natively support zone append
      operations is done from dm_table_set_restrictions() by calling the
      function dm_set_zones_restrictions(). A target that does not support
      zone append operations, either because it explicitly declares so using
      the new struct dm_target field zone_append_not_supported, or because
      the device table contains a non-zoned device, has its mapped device
      marked with the new flag DMF_ZONE_APPEND_EMULATED. The helper function
      dm_emulate_zone_append() is introduced to test a mapped device for this
      new flag.
      
      Atomicity of the zone write pointer tracking and updates is ensured using
      a zone write locking mechanism based on a bitmap. This is similar to
      the block layer method but based on BIOs rather than struct request.
      A zone write lock is taken in dm_zone_map_bio() for any clone BIO with
      an operation type that changes the BIO target zone write pointer
      position. The zone write lock is released if the clone BIO fails
      before submission, or from dm_zone_endio() when the clone BIO
      completes.
      
      The zone write lock bitmap of the mapped device, together with a bitmap
      indicating zone types (conv_zones_bitmap) and the write pointer offset
      array (zwp_offset) are allocated and initialized with a full device zone
      report in dm_set_zones_restrictions() using the function
      dm_revalidate_zones().
      
      For failed operations that may have modified a zone write pointer, the
      zone write pointer offset is marked as invalid in dm_zone_endio().
      Zones with an invalid write pointer offset are checked and the write
      pointer updated using an internal report zone operation when the
      faulty zone is accessed again by the user.
      
      All functions added for this emulation have minimal overhead for
      zoned targets that natively support zone append operations. Regular
      device targets are also not affected. Builds with CONFIG_BLK_DEV_ZONED
      disabled are likewise unaffected, since all dm zone related functions
      are stubbed out.
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
      Reviewed-by: Hannes Reinecke <hare@suse.de>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm: Introduce dm_report_zones() · 912e8875
      Authored by Damien Le Moal
      To simplify the implementation of the report_zones operation of a
      zoned target, introduce the function dm_report_zones() to set a
      target mapping start sector in struct dm_report_zones_args and call
      blkdev_report_zones(). This new function is exported, while the report
      zones callback function dm_report_zones_cb() is not.
      
      dm-linear, dm-flakey and dm-crypt are modified to use dm_report_zones().
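
      A minimal sketch of how a simple remapping target might use the new
      helper, loosely modeled on dm-linear (the context structure and its
      dev/start fields are illustrative):

          static int example_report_zones(struct dm_target *ti,
                          struct dm_report_zones_args *args,
                          unsigned int nr_zones)
          {
                  struct example_c *ec = ti->private;  /* hypothetical */

                  /* dm_report_zones() records the mapping start sector in
                   * args and forwards to blkdev_report_zones() */
                  return dm_report_zones(ec->dev->bdev, ec->start,
                                         ec->start + dm_target_offset(ti,
                                                        args->next_sector),
                                         args, nr_zones);
          }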
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Reviewed-by: Hannes Reinecke <hare@suse.de>
      Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
3. 23 Mar 2021 (1 commit)
    • dm table: Fix zoned model check and zone sectors check · 2d669ceb
      Authored by Shin'ichiro Kawasaki
      Commit 24f6b603 ("dm table: fix zoned iterate_devices based device
      capability checks") triggered dm table load failures when a dm-zoned
      device is set up on zoned block devices with a regular device for
      cache.
      
      The commit inverted the logic of two callback functions for
      iterate_devices: device_is_zoned_model() and
      device_matches_zone_sectors(). With the logic of
      device_is_zoned_model() inverted, all destination devices of all
      targets in the dm table are required to have the expected zoned model.
      This is fine for dm-linear, dm-flakey and dm-crypt on zoned block
      devices, since each target has only one destination device. However,
      this results in failure for dm-zoned with a regular cache device,
      since that target has both regular and zoned block devices.
      
      As for device_matches_zone_sectors(), the commit inverted the logic to
      require that all zoned block devices in each target have the specified
      zone_sectors. This check also fails for regular block devices, which
      do not have zones.
      
      To avoid the check failures, fix the zone model check and the zone
      sectors check. For the zone model check, introduce the new feature
      flag DM_TARGET_MIXED_ZONED_MODEL and set it for the dm-zoned target.
      When a target has this flag, allow it to have destination devices with
      any zoned model. For the zone sectors check, skip the check if the
      destination device is not a zoned block device. Also add comments and
      improve an error message to clarify the expectations of the two
      checks.
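
      A minimal sketch of the two fixes, assuming current helper names such
      as bdev_is_zoned() and bdev_zone_sectors() (the surrounding callback
      plumbing is abridged):

          /* dm-zoned opts out of the uniform-zoned-model requirement */
          static struct target_type dmz_target = {
                  .name     = "zoned",
                  .features = DM_TARGET_SINGLETON | DM_TARGET_MIXED_ZONED_MODEL,
                  /* ... */
          };

          /* zone sectors check: regular (non-zoned) devices are skipped */
          static int device_not_matches_zone_sectors(struct dm_target *ti,
                          struct dm_dev *dev, sector_t start, sector_t len,
                          void *data)
          {
                  unsigned int *zone_sectors = data;

                  if (!bdev_is_zoned(dev->bdev))
                          return 0;  /* not zoned: nothing to match */
                  return bdev_zone_sectors(dev->bdev) != *zone_sectors;
          }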
      
      Fixes: 24f6b603 ("dm table: fix zoned iterate_devices based device capability checks")
      Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
4. 11 Feb 2021 (3 commits)
    • dm: fix deadlock when swapping to encrypted device · a666e5c0
      Authored by Mikulas Patocka
      The system would deadlock when swapping to a dm-crypt device. The
      reason is that for each incoming write bio, dm-crypt allocates memory
      that holds the encrypted data. These excessive allocations exhaust all
      the memory, and the result is either a deadlock or an OOM trigger.
      
      This patch limits the number of in-flight swap bios, so that the
      memory consumed by dm-crypt is limited. The limit is enforced if the
      target sets the "limit_swap_bios" variable and the bio has REQ_SWAP
      set.
      
      Non-swap bios are not affected, because taking the semaphore for them
      would cause performance degradation.
      
      This is similar to request-based drivers - they will also block when the
      number of requests is over the limit.
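
      A minimal sketch of the limiting scheme, assuming a per-device
      semaphore initialized to the configured limit (the names follow the
      description above, but the code is illustrative):

          /* only swap bios on targets that opted in are throttled */
          static bool swap_bios_limit(struct dm_target *ti, struct bio *bio)
          {
                  return ti->limit_swap_bios && (bio->bi_opf & REQ_SWAP);
          }

          /* on submission: block while too many swap bios are in flight */
          if (unlikely(swap_bios_limit(ti, bio)))
                  down(&md->swap_bios_semaphore);

          /* on completion: release the slot */
          if (unlikely(swap_bios_limit(ti, bio)))
                  up(&md->swap_bios_semaphore);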
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm: simplify target code conditional on CONFIG_BLK_DEV_ZONED · e3290b94
      Authored by Mike Snitzer
      Allow removal of CONFIG_BLK_DEV_ZONED conditionals in the target_type
      definitions of various targets.
      Suggested-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm: add support for passing through inline crypto support · aa6ce87a
      Authored by Satya Tangirala
      Update the device-mapper core to support exposing the inline crypto
      support of the underlying device(s) through the device-mapper device.
      
      This works by creating a "passthrough keyslot manager" for the dm
      device, which declares support for the encryption settings that all
      underlying devices support.  When a supported setting is used, the bio
      cloning code handles cloning the crypto context to the bios for all the
      underlying devices.  When an unsupported setting is used, the blk-crypto
      fallback is used as usual.
      
      Crypto support on each underlying device is ignored unless the
      corresponding dm target opts into exposing it.  This is needed because
      for inline crypto to semantically operate on the original bio, the data
      must not be transformed by the dm target.  Thus, targets like dm-linear
      can expose crypto support of the underlying device, but targets like
      dm-crypt can't.  (dm-crypt could use inline crypto itself, though.)
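
      A minimal sketch of how a non-transforming target might opt in,
      assuming the DM_TARGET_PASSES_CRYPTO target feature flag used for this
      capability (the definition is abridged):

          /* dm-linear does not transform data, so it can expose the
           * underlying devices' inline crypto capabilities */
          static struct target_type linear_target = {
                  .name     = "linear",
                  .features = DM_TARGET_PASSES_INTEGRITY | DM_TARGET_PASSES_CRYPTO,
                  /* ... */
          };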
      
      A DM device's table can only be changed if the "new" inline encryption
      capabilities are a (*not* necessarily strict) superset of the "old" inline
      encryption capabilities.  Attempts to make changes to the table that result
      in some inline encryption capability becoming no longer supported will be
      rejected.
      
      For the sake of clarity, key eviction from underlying devices will be
      handled in a future patch.
      Co-developed-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Satya Tangirala <satyat@google.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
5. 24 Jul 2020 (1 commit)
    • dm integrity: fix integrity recalculation that is improperly skipped · 5df96f2b
      Authored by Mikulas Patocka
      Commit adc0daad ("dm: report suspended
      device during destroy") broke integrity recalculation.
      
      The problem is that dm_suspended() returns true not only during
      suspend but also during resume, so this race condition could occur:
      1. dm_integrity_resume calls queue_work(ic->recalc_wq, &ic->recalc_work)
      2. integrity_recalc (&ic->recalc_work) preempts the current thread
      3. integrity_recalc calls if (unlikely(dm_suspended(ic->ti))) goto unlock_ret;
      4. integrity_recalc exits and no recalculation is done.
      
      To fix this race condition, add a function dm_post_suspending() that
      returns true only during the postsuspend phase, and use it instead of
      dm_suspended().
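
      A minimal sketch of the resulting check in integrity_recalc(), using
      the helper introduced by this commit:

          /* before: also skipped recalculation during resume */
          if (unlikely(dm_suspended(ic->ti)))
                  goto unlock_ret;

          /* after: only the postsuspend phase stops recalculation */
          if (unlikely(dm_post_suspending(ic->ti)))
                  goto unlock_ret;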
      
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Fixes: adc0daad ("dm: report suspended device during destroy")
      Cc: stable@vger.kernel.org # v4.18+
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
6. 06 Nov 2019 (1 commit)
    • dm stripe: use struct_size() in kmalloc() · 8adeac3b
      Authored by Gustavo A. R. Silva
      One of the more common cases of allocation size calculations is finding
      the size of a structure that has a zero-sized array at the end, along
      with memory for some number of elements for that array. For example:
      
      struct stripe_c {
              ...
              struct stripe stripe[0];
      };
      
      In this case, alloc_context() and dm_array_too_big() are removed and
      replaced by direct use of the struct_size() helper in kmalloc().
      
      Notice that the open-coded form is prone to type mistakes.
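
      For illustration, the before/after allocation looks roughly like this
      (a sketch, not the exact dm-stripe code):

          /* open-coded: repeats the types and can overflow silently */
          sc = kmalloc(sizeof(struct stripe_c)
                       + stripes * sizeof(struct stripe), GFP_KERNEL);

          /* struct_size(): type-checked and overflow-aware */
          sc = kmalloc(struct_size(sc, stripe, stripes), GFP_KERNEL);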
      
      This code was detected with the help of Coccinelle.
      Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
7. 26 Apr 2019 (1 commit)
    • dm mpath: fix missing call of path selector type->end_io · 5de719e3
      Authored by Yufen Yu
      After commit 396eaf21 ("blk-mq: improve DM's blk-mq IO merging via
      blk_insert_cloned_request feedback"), map_request() will requeue the
      tio when the issued clone request returns BLK_STS_RESOURCE or
      BLK_STS_DEV_RESOURCE.
      
      Thus, if the device driver returns an error, a tio may be requeued
      multiple times until the return value is not DM_MAPIO_REQUEUE.  That
      means type->start_io may be called multiple times, while type->end_io
      is only called when the IO completes.
      
      In fact, even without commit 396eaf21, a setup_clone() failure can
      also cause a tio requeue and an associated missed call to
      type->end_io.
      
      The service-time path selector selects a path based on in_flight_size,
      which is increased by st_start_io() and decreased by st_end_io().
      Missed calls to st_end_io() can lead to an in_flight_size accounting
      error and will cause the selector to make the wrong choice.  The
      queue-length path selector is affected in the same way.
      
      To fix the problem, call type->end_io in ->release_clone_rq before the
      tio is requeued.  The map_info is passed to ->release_clone_rq() for
      the map_request() error paths that result in a requeue.
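
      An abridged sketch of the idea in dm-mpath's release hook, assuming
      the three-argument path selector end_io of that era:

          static void multipath_release_clone(struct request *clone,
                                              union map_info *map_context)
          {
                  if (unlikely(map_context)) {
                          /* requeue path: undo the accounting that
                           * ps.type->start_io did for this tio */
                          struct dm_mpath_io *mpio = get_mpio(map_context);
                          struct pgpath *pgpath = mpio->pgpath;

                          if (pgpath && pgpath->pg->ps.type->end_io)
                                  pgpath->pg->ps.type->end_io(&pgpath->pg->ps,
                                                              &pgpath->path,
                                                              mpio->nr_bytes);
                  }
                  blk_put_request(clone);
          }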
      
      Fixes: 396eaf21 ("blk-mq: improve DM's blk-mq IO merging via blk_insert_cloned_request feedback")
      Cc: stable@vger.kernel.org
      Signed-off-by: Yufen Yu <yuyufen@huawei.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
8. 06 Mar 2019 (2 commits)
    • dm: add support to directly boot to a mapped device · 6bbc923d
      Authored by Helen Koike
      Add a "create" module parameter, which allows device-mapper targets to
      be configured at boot time. This enables early use of DM targets in the
      boot process (as the root device or otherwise) without the need for an
      initramfs.
      
      The syntax used in the boot param is based on the concise format from
      the dmsetup tool to follow the rule of least surprise:
      
      	dmsetup table --concise /dev/mapper/lroot
      
      Which is:
      	dm-mod.create=<name>,<uuid>,<minor>,<flags>,<table>[,<table>+][;<name>,<uuid>,<minor>,<flags>,<table>[,<table>+]+]
      
      Where,
      	<name>		::= The device name.
      	<uuid>		::= xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx | ""
      	<minor>		::= The device minor number | ""
      	<flags>		::= "ro" | "rw"
      	<table>		::= <start_sector> <num_sectors> <target_type> <target_args>
      	<target_type>	::= "verity" | "linear" | ...
      
      For example, the following could be added in the boot parameters:
      dm-mod.create="lroot,,,rw, 0 4096 linear 98:16 0, 4096 4096 linear 98:32 0" root=/dev/dm-0
      
      Only targets that were tested are allowed, and only those that don't
      change any block device when the device is created as read-only. For
      example, the mirror and cache targets are not allowed. The rationale
      behind this is that if the user makes a mistake, choosing the wrong
      device to be the mirror or the cache can corrupt data.
      
      The only targets initially allowed are:
      * crypt
      * delay
      * linear
      * snapshot-origin
      * striped
      * verity
      Co-developed-by: Will Drewry <wad@chromium.org>
      Co-developed-by: Kees Cook <keescook@chromium.org>
      Co-developed-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
      Signed-off-by: Helen Koike <helen.koike@collabora.com>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm: fix to_sector() for 32bit · 0bdb50c5
      Authored by NeilBrown
      A dm-raid array with devices larger than 4GB won't assemble on
      a 32-bit host since _check_data_dev_sectors() was added in 4.16.
      This is because to_sector() treats its argument as an "unsigned long",
      which is 32 bits (4GB) on a 32-bit host.  Using "unsigned long long"
      is more correct.
      
      Kernels as early as 4.2 can have other problems due to to_sector()
      being used on the size of a device.
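
      The fix is essentially a one-type change in the helper; a sketch of
      the corrected form:

          /* 'unsigned long long' keeps byte counts above 4GB intact on
           * 32-bit hosts before the shift down to sectors */
          static inline sector_t to_sector(unsigned long long n)
          {
                  return (n >> SECTOR_SHIFT);
          }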
      
      Fixes: 0cf45031 ("dm raid: add support for the MD RAID0 personality")
      Cc: stable@vger.kernel.org (v4.2+)
      Reported-and-tested-by: Guillaume Perréal <gperreal@free.fr>
      Signed-off-by: NeilBrown <neil@brown.name>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
9. 26 Oct 2018 (1 commit)
    • block: add a report_zones method · e76239a3
      Authored by Christoph Hellwig
      Dispatching a report zones command through the request queue is a
      major pain due to the command reply payload rewriting that is
      necessary. Given that blkdev_report_zones() executes everything
      synchronously, implement report zones as a block device file operation
      instead, allowing major simplification of the code in many places.
      
      With sd, null-blk, dm-linear and dm-flakey being the only block device
      drivers that expose zoned block devices, these drivers are modified to
      provide the device-side implementation of the report_zones() block
      device file operation.
      
      For device mappers, a new report_zones() target type operation is
      defined so that calls to blkdev_report_zones() from the upper block
      layer can be propagated down to the underlying devices of the dm
      targets. An implementation of this new operation is added to the
      dm-linear and dm-flakey targets.
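
      A sketch of the new hook, roughly as it looked at the time (the
      parameter list has changed in later kernels, so treat this as
      illustrative):

          struct block_device_operations {
                  /* ... */
                  /* fill 'zones' starting at 'sector'; *nr_zones is an
                   * in/out count of the zones reported */
                  int (*report_zones)(struct gendisk *disk, sector_t sector,
                                      struct blk_zone *zones,
                                      unsigned int *nr_zones, gfp_t gfp_mask);
          };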
      Reviewed-by: Hannes Reinecke <hare@suse.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      [Damien]
      * Changed method block_device argument to gendisk
      * Various bug fixes and improvements
      * Added support for null_blk, dm-linear and dm-flakey.
      Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
      Reviewed-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
10. 23 May 2018 (1 commit)
    • dax: Introduce a ->copy_to_iter dax operation · b3a9a0c3
      Authored by Dan Williams
      Similar to the ->copy_from_iter() operation, a platform may want to
      deploy an architecture or device specific routine for handling reads
      from a dax_device like /dev/pmemX. On x86 this routine will point to a
      machine check safe version of copy_to_iter(). For now, add the plumbing
      to device-mapper and the dax core.
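
      A sketch of the new operation in struct dax_operations, mirroring the
      existing ->copy_from_iter() signature:

          struct dax_operations {
                  /* ... */
                  /* copy data from the dax device into an iov_iter */
                  size_t (*copy_to_iter)(struct dax_device *dax_dev,
                                         pgoff_t pgoff, void *addr,
                                         size_t bytes, struct iov_iter *i);
          };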
      
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
11. 18 Mar 2018 (1 commit)
    • block: Move SECTOR_SIZE and SECTOR_SHIFT definitions into <linux/blkdev.h> · 233bde21
      Authored by Bart Van Assche
      It happens often while I'm preparing a patch for a block driver that
      I'm wondering: is a definition of SECTOR_SIZE and/or SECTOR_SHIFT
      available for this driver? Do I have to introduce definitions of these
      constants before I can use them? To avoid this confusion, move the
      existing definitions of SECTOR_SIZE and SECTOR_SHIFT into the
      <linux/blkdev.h> header file such that they become available to all
      block drivers. Make the SECTOR_SIZE definition in the uapi msdos_fs.h
      header file conditional, so that including that header file after
      <linux/blkdev.h> does not cause the compiler to complain about a
      SECTOR_SIZE redefinition.
      
      Note: the SECTOR_SIZE / SECTOR_SHIFT / SECTOR_BITS definitions have
      not been removed from uapi header files nor from NAND drivers in
      which these constants are used for another purpose than converting
      block layer offsets and sizes into a number of sectors.
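
      The shared definitions are the usual 512-byte sector constants, as now
      provided by <linux/blkdev.h>:

          #define SECTOR_SHIFT 9
          #define SECTOR_SIZE  (1 << SECTOR_SHIFT)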
      
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
12. 20 Dec 2017 (1 commit)
    • dm: introduce DM_TYPE_NVME_BIO_BASED · 22c11858
      Authored by Mike Snitzer
      If dm_table_determine_type() establishes DM_TYPE_NVME_BIO_BASED, then
      none of the devices in the DM table support partial completions.
      Also, the table has a single immutable target that doesn't require DM
      core to split bios.
      
      This will enable adding NVMe optimizations to bio-based DM.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
13. 17 Dec 2017 (1 commit)
    • dm: improve performance by moving dm_io structure to per-bio-data · 64f52b0e
      Authored by Mike Snitzer
      Eliminates the need for a separate mempool to allocate 'struct dm_io'
      objects from.  As such, it saves an extra mempool allocation for each
      original bio issued to DM core.
      
      This complicates the per-bio-data accessor functions by needing to
      conditionally add extra padding to get to a target's per-bio-data.  But
      in the end this provides a decent performance improvement for all
      bio-based DM devices.
      
      On an NVMe-loop based testbed to a ramdisk (~3100 MB/s): bio-based
      DM linear performance improved by ~4% (went from 2665 to 2777 MB/s).
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
14. 11 Sep 2017 (1 commit)
    • dax: remove the pmem_dax_ops->flush abstraction · c3ca015f
      Authored by Mikulas Patocka
      Commit abebfbe2 ("dm: add ->flush() dax operation support") is
      buggy. A DM device may be composed of multiple underlying devices and
      all of them need to be flushed. That commit just routes the flush
      request to the first device and ignores the other devices.
      
      It could be fixed by adding more complex logic to the device mapper.
      But there is only one implementation of the pmem_dax_ops->flush method
      - pmem_dax_flush() - and it calls arch_wb_cache_pmem(). Consequently,
      we don't need the pmem_dax_ops->flush abstraction at all: we can call
      arch_wb_cache_pmem() directly from dax_flush(), because
      dax_dev->ops->flush can't ever reach anything other than
      arch_wb_cache_pmem().
      
      It should also be pointed out that some uses of persistent memory need
      to flush only a very small amount of data (such as one cacheline), and
      going through the device mapper machinery for a single flushed cache
      line would be overkill.
      
      Fix this by removing the pmem_dax_ops->flush abstraction and calling
      arch_wb_cache_pmem() directly from dax_flush(). Also, remove the
      device mapper code that forwards the flushes.
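
      A sketch of the simplified dax_flush() after the abstraction is
      removed (close to, but not verbatim, the resulting code):

          void dax_flush(struct dax_device *dax_dev, void *addr, size_t size)
          {
                  if (unlikely(!dax_write_cache_enabled(dax_dev)))
                          return;

                  /* no per-driver indirection: write back CPU caches directly */
                  arch_wb_cache_pmem(addr, size);
          }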
      
      Fixes: abebfbe2 ("dm: add ->flush() dax operation support")
      Cc: stable@vger.kernel.org
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Reviewed-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>