1. 06 9月, 2013 2 次提交
  2. 23 8月, 2013 1 次提交
  3. 20 5月, 2013 1 次提交
    • A
      dm thin: fix metadata dev resize detection · 610bba8b
      Alasdair G Kergon 提交于
      Fix detection of the need to resize the dm thin metadata device.
      
      The code incorrectly tried to extend the metadata device when it
      didn't need to due to a merging error with patch 24347e95 ("dm thin:
      detect metadata device resizing").
      
        device-mapper: transaction manager: couldn't open metadata space map
        device-mapper: thin metadata: tm_open_with_sm failed
        device-mapper: thin: aborting transaction failed
        device-mapper: thin: switching pool to failure mode
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      610bba8b
  4. 10 5月, 2013 4 次提交
  5. 21 3月, 2013 2 次提交
    • J
      dm thin: fix non power of two discard granularity calc · 58051b94
      Joe Thornber 提交于
      Fix a discard granularity calculation to work for non power of 2 block sizes.
      
      In order for thinp to passdown discard bios to the underlying data
      device, the data device must have a discard granularity that is a
      factor of the thinp block size.  Originally this check was done by
      using bitops since the block_size was known to be a power of two.
      
      Introduced by commit f13945d7
      ("dm thin: support a non power of 2 discard_granularity").
      Signed-off-by: NJoe Thornber <ejt@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      58051b94
    • J
      dm thin: fix discard corruption · f046f89a
      Joe Thornber 提交于
      Fix a bug in dm_btree_remove that could leave leaf values with incorrect
      reference counts.  The effect of this was that removal of a shared block
      could result in the space maps thinking the block was no longer used.
      More concretely, if you have a thin device and a snapshot of it, sending
      a discard to a shared region of the thin could corrupt the snapshot.
      
      Thinp uses a 2-level nested btree to store it's mappings.  This first
      level is indexed by thin device, and the second level by logical
      block.
      
      Often when we're removing an entry in this mapping tree we need to
      rebalance nodes, which can involve shadowing them, possibly creating a
      copy if the block is shared.  If we do create a copy then children of
      that node need to have their reference counts incremented.  In this
      way reference counts percolate down the tree as shared trees diverge.
      
      The rebalance functions were incrementing the children at the
      appropriate time, but they were always assuming the children were
      internal nodes.  This meant the leaf values (in our case packed
      block/flags entries) were not being incremented.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NJoe Thornber <ejt@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      f046f89a
  6. 02 3月, 2013 7 次提交
    • J
      dm thin: remove cells from stack · 025b9685
      Joe Thornber 提交于
      This patch takes advantage of the new bio-prison interface where the
      memory is now passed in rather than using a mempool in bio-prison.
      This allows the map function to avoid performing potentially-blocking
      allocations that could lead to deadlocks: We want to avoid the cell
      allocation that is done in bio_detain.
      
      (The potential for mempool deadlocks still remains in other functions
      that use bio_detain.)
      Signed-off-by: NJoe Thornber <ejt@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      025b9685
    • J
      dm bio prison: pass cell memory in · 6beca5eb
      Joe Thornber 提交于
      Change the dm_bio_prison interface so that instead of allocating memory
      internally, dm_bio_detain is supplied with a pre-allocated cell each
      time it is called.
      
      This enables a subsequent patch to move the allocation of the struct
      dm_bio_prison_cell outside the thin target's mapping function so it can
      no longer block there.
      Signed-off-by: NJoe Thornber <ejt@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      6beca5eb
    • M
      dm kcopyd: introduce configurable throttling · df5d2e90
      Mikulas Patocka 提交于
      This patch allows the administrator to reduce the rate at which kcopyd
      issues I/O.
      
      Each module that uses kcopyd acquires a throttle parameter that can be
      set in /sys/module/*/parameters.
      
      We maintain a history of kcopyd usage by each module in the variables
      io_period and total_period in struct dm_kcopyd_throttle. The actual
      kcopyd activity is calculated as a percentage of time equal to
      "(100 * io_period / total_period)".  This is compared with the user-defined
      throttle percentage threshold and if it is exceeded, we sleep.
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      df5d2e90
    • A
      dm: rename request variables to bios · 55a62eef
      Alasdair G Kergon 提交于
      Use 'bio' in the name of variables and functions that deal with
      bios rather than 'request' to avoid confusion with the normal
      block layer use of 'request'.
      
      No functional changes.
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      55a62eef
    • M
      dm thin: use block_size_is_power_of_two · 58f77a21
      Mike Snitzer 提交于
      Use block_size_is_power_of_two() rather than checking
      sectors_per_block_shift directly.  Also introduce local pool variable in
      get_bio_block() to eliminate redundant tc->pool dereferences.
      
      No functional change.
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      58f77a21
    • M
      dm thin: support a non power of 2 discard_granularity · f13945d7
      Mike Snitzer 提交于
      Support a non-power-of-2 discard granularity in dm-thin, now that the block
      layer supports this(via 8dd2cb7e "block:
      discard granularity might not be power of 2" and
      59771079 "blk: avoid divide-by-zero with zero
      discard granularity").
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      f13945d7
    • M
      dm: fix truncated status strings · fd7c092e
      Mikulas Patocka 提交于
      Avoid returning a truncated table or status string instead of setting
      the DM_BUFFER_FULL_FLAG when the last target of a table fills the
      buffer.
      
      When processing a table or status request, the function retrieve_status
      calls ti->type->status. If ti->type->status returns non-zero,
      retrieve_status assumes that the buffer overflowed and sets
      DM_BUFFER_FULL_FLAG.
      
      However, targets don't return non-zero values from their status method
      on overflow. Most targets returns always zero.
      
      If a buffer overflow happens in a target that is not the last in the
      table, it gets noticed during the next iteration of the loop in
      retrieve_status; but if a buffer overflow happens in the last target, it
      goes unnoticed and erroneously truncated data is returned.
      
      In the current code, the targets behave in the following way:
      * dm-crypt returns -ENOMEM if there is not enough space to store the
        key, but it returns 0 on all other overflows.
      * dm-thin returns errors from the status method if a disk error happened.
        This is incorrect because retrieve_status doesn't check the error
        code, it assumes that all non-zero values mean buffer overflow.
      * all the other targets always return 0.
      
      This patch changes the ti->type->status function to return void (because
      most targets don't use the return code). Overflow is detected in
      retrieve_status: if the status method fills up the remaining space
      completely, it is assumed that buffer overflow happened.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      fd7c092e
  7. 31 1月, 2013 1 次提交
    • M
      dm thin: fix queue limits stacking · 0f640dca
      Mike Snitzer 提交于
      thin_io_hints() is blindly copying the queue limits from the thin-pool
      which can lead to incorrect limits being set.  The fix here simply
      deletes the thin_io_hints() hook which leaves the existing stacking
      infrastructure to set the limits correctly.
      
      When a thin-pool uses an MD device for the data device a thin device
      from the thin-pool must respect MD's constraints about disallowing a bio
      from spanning multiple chunks.  Otherwise we can see problems.  If the raid0
      chunksize is 1152K and thin-pool chunksize is 256K I see the following
      md/raid0 error (with extra debug tracing added to thin_endio) when
      mkfs.xfs is executed against the thin device:
      
      md/raid0:md99: make_request bug: can't convert block across chunks or bigger than 1152k 6688 127
      device-mapper: thin: bio sector=2080 err=-5 bi_size=130560 bi_rw=17 bi_vcnt=32 bi_idx=0
      
      This extra DM debugging shows that the failing bio is spanning across
      the first and second logical 1152K chunk (sector 2080 + 255 takes the
      bio beyond the first chunk's boundary of sector 2304).  So the bio
      splitting that DM is doing clearly isn't respecting the MD limits.
      
      max_hw_sectors_kb is 127 for both the thin-pool and thin device
      (queue_max_hw_sectors returns 255 so we'll excuse sysfs's lack of
      precision).  So this explains why bi_size is 130560.
      
      But the thin device's max_hw_sectors_kb should be 4 (PAGE_SIZE) given
      that it doesn't have a .merge function (for bio_add_page to consult
      indirectly via dm_merge_bvec) yet the thin-pool does sit above an MD
      device that has a compulsory merge_bvec_fn.  This scenario is exactly
      why DM must resort to sending single PAGE_SIZE bios to the underlying
      layer. Some additional context for this is available in the header for
      commit 8cbeb67a ("dm: avoid unsupported spanning of md stripe boundaries").
      
      Long story short, the reason a thin device doesn't properly get
      configured to have a max_hw_sectors_kb of 4 (PAGE_SIZE) is that
      thin_io_hints() is blindly copying the queue limits from the thin-pool
      device directly to the thin device's queue limits.
      
      Fix this by eliminating thin_io_hints.  Doing so is safe because the
      block layer's queue limits stacking already enables the upper level thin
      device to inherit the thin-pool device's discard and minimum_io_size and
      optimal_io_size limits that get set in pool_io_hints.  But avoiding the
      queue limits copy allows the thin and thin-pool limits to be different
      where it is important, namely max_hw_sectors_kb.
      Reported-by: NDaniel Browning <db@kavod.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      0f640dca
  8. 22 12月, 2012 10 次提交
    • M
      dm: remove map_info · 7de3ee57
      Mikulas Patocka 提交于
      This patch removes map_info from bio-based device mapper targets.
      map_info is still used for request-based targets.
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      7de3ee57
    • M
      dm thin: dont use map_context · 59c3d2c6
      Mikulas Patocka 提交于
      This patch removes endio_hook_pool from dm-thin and uses per-bio data instead.
      
      This patch removes any use of map_info in preparation for the next patch
      that removes map_info from bio-based device mapper.
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      59c3d2c6
    • M
      dm kcopyd: add WRITE SAME support to dm_kcopyd_zero · 70d6c400
      Mike Snitzer 提交于
      Add WRITE SAME support to dm-io and make it accessible to
      dm_kcopyd_zero().  dm_kcopyd_zero() provides an asynchronous interface
      whereas the blkdev_issue_write_same() interface is synchronous.
      
      WRITE SAME is a SCSI command that can be leveraged for more efficient
      zeroing of a specified logical extent of a device which supports it.
      Only a single zeroed logical block is transfered to the target for each
      WRITE SAME and the target then writes that same block across the
      specified extent.
      
      The dm thin target uses this.
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      70d6c400
    • M
      dm thin: use DMERR_LIMIT for errors · c397741c
      Mike Snitzer 提交于
      Throttle all errors logged from the IO path by dm thin.
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      c397741c
    • J
      dm thin: cleanup dead code · 2aab3850
      Joe Thornber 提交于
      Remove unused @data_block parameter from cell_defer.
      Change thin_bio_map to use many returns rather than setting a variable.
      Signed-off-by: NJoe Thornber <ejt@redhat.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      2aab3850
    • J
      dm thin: rename cell_defer_except to cell_defer_no_holder · f286ba0e
      Joe Thornber 提交于
      Rename cell_defer_except() to cell_defer_no_holder() which describes
      its function more clearly.
      Signed-off-by: NJoe Thornber <ejt@redhat.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      f286ba0e
    • M
      dm thin: emit ignore_discard in status when discards disabled · 018debea
      Mike Snitzer 提交于
      If "ignore_discard" is specified when creating the thin pool device then
      discard support is disabled for that device.  The pool device's status
      should reflect this fact rather than stating "no_discard_passdown"
      (which implies discards are enabled but passdown is disabled).
      Reported-by: NZdenek Kabelac <zkabelac@redhat.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      018debea
    • J
      dm thin: wake worker when discard is prepared · 563af186
      Joe Thornber 提交于
      When discards are prepared it is best to directly wake the worker that
      will process them.  The worker will be woken anyway, via periodic
      commit, but there is no reason to not wake_worker here.
      Signed-off-by: NJoe Thornber <ejt@redhat.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      563af186
    • J
      dm thin: fix race between simultaneous io and discards to same block · e8088073
      Joe Thornber 提交于
      There is a race when discard bios and non-discard bios are issued
      simultaneously to the same block.
      
      Discard support is expensive for all thin devices precisely because you
      have to be careful to quiesce the area you're discarding.  DM thin must
      handle this conflicting IO pattern (simultaneous non-discard vs discard)
      even though a sane application shouldn't be issuing such IO.
      
      The race manifests as follows:
      
      1. A non-discard bio is mapped in thin_bio_map.
         This doesn't lock out parallel activity to the same block.
      
      2. A discard bio is issued to the same block as the non-discard bio.
      
      3. The discard bio is locked in a dm_bio_prison_cell in process_discard
         to lock out parallel activity against the same block.
      
      4. The non-discard bio's mapping continues and its all_io_entry is
         incremented so the bio is accounted for in the thin pool's all_io_ds
         which is a dm_deferred_set used to track time locality of non-discard IO.
      
      5. The non-discard bio is finally locked in a dm_bio_prison_cell in
         process_bio.
      
      The race can result in deadlock, leaving the block layer hanging waiting
      for completion of a discard bio that never completes, e.g.:
      
      INFO: task ruby:15354 blocked for more than 120 seconds.
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      ruby            D ffffffff8160f0e0     0 15354  15314 0x00000000
       ffff8802fb08bc58 0000000000000082 ffff8802fb08bfd8 0000000000012900
       ffff8802fb08a010 0000000000012900 0000000000012900 0000000000012900
       ffff8802fb08bfd8 0000000000012900 ffff8803324b9480 ffff88032c6f14c0
      Call Trace:
       [<ffffffff814e5a19>] schedule+0x29/0x70
       [<ffffffff814e3d85>] schedule_timeout+0x195/0x220
       [<ffffffffa06b9bc1>] ? _dm_request+0x111/0x160 [dm_mod]
       [<ffffffff814e589e>] wait_for_common+0x11e/0x190
       [<ffffffff8107a170>] ? try_to_wake_up+0x2b0/0x2b0
       [<ffffffff814e59ed>] wait_for_completion+0x1d/0x20
       [<ffffffff81233289>] blkdev_issue_discard+0x219/0x260
       [<ffffffff81233e79>] blkdev_ioctl+0x6e9/0x7b0
       [<ffffffff8119a65c>] block_ioctl+0x3c/0x40
       [<ffffffff8117539c>] do_vfs_ioctl+0x8c/0x340
       [<ffffffff8119a547>] ? block_llseek+0x67/0xb0
       [<ffffffff811756f1>] sys_ioctl+0xa1/0xb0
       [<ffffffff810561f6>] ? sys_rt_sigprocmask+0x86/0xd0
       [<ffffffff814ef099>] system_call_fastpath+0x16/0x1b
      
      The thinp-test-suite's test_discard_random_sectors reliably hits this
      deadlock on fast SSD storage.
      
      The fix for this race is that the all_io_entry for a bio must be
      incremented whilst the dm_bio_prison_cell is held for the bio's
      associated virtual and physical blocks.  That cell locking wasn't
      occurring early enough in thin_bio_map.  This patch fixes this.
      
      Care is taken to always call the new function inc_all_io_entry() with
      the relevant cells locked, but they are generally unlocked before
      calling issue() to try to avoid holding the cells locked across
      generic_submit_request.
      
      Also, now that thin_bio_map may lock bios in a cell, process_bio() is no
      longer the only thread that will do so.  Because of this we must be sure
      to use cell_defer_except() to release all non-holder entries, that
      were added by the other thread, because they must be deferred.
      
      This patch depends on "dm thin: replace dm_cell_release_singleton with
      cell_defer_except".
      Signed-off-by: NJoe Thornber <ejt@redhat.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      Cc: stable@vger.kernel.org
      e8088073
    • J
      dm thin: replace dm_cell_release_singleton with cell_defer_except · b7ca9c92
      Joe Thornber 提交于
      Change existing users of the function dm_cell_release_singleton to share
      cell_defer_except instead, and then remove the now-unused function.
      
      Everywhere that calls dm_cell_release_singleton, the bio in question
      is the holder of the cell.
      
      If there are no non-holder entries in the cell then cell_defer_except
      behaves exactly like dm_cell_release_singleton.  Conversely, if there
      *are* non-holder entries then dm_cell_release_singleton must not be used
      because those entries would need to be deferred.
      
      Consequently, it is safe to replace use of dm_cell_release_singleton
      with cell_defer_except.
      
      This patch is a pre-requisite for "dm thin: fix race between
      simultaneous io and discards to same block".
      Signed-off-by: NJoe Thornber <ejt@redhat.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      b7ca9c92
  9. 13 10月, 2012 3 次提交
  10. 27 9月, 2012 3 次提交
    • M
      dm thin: fix discard support for data devices · 0424caa1
      Mike Snitzer 提交于
      The discard limits that get established for a thin-pool or thin device
      may be incompatible with the pool's data device.  Avoid this by checking
      the discard limits of the pool's data device.  If an incompatibility is
      found then the pool's 'discard passdown' feature is disabled.
      
      Change thin_io_hints to ensure that a thin device always uses the same
      queue limits as its pool device.
      
      Introduce requested_pf to track whether or not the table line originally
      contained the no_discard_passdown flag and use this directly for table
      output.  We prepare the correct setting for discard_passdown directly in
      bind_control_target (called from pool_io_hints) and store it in
      adjusted_pf rather than waiting until we have access to pool->pf in
      pool_preresume.
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NJoe Thornber <ejt@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      0424caa1
    • M
      dm thin: tidy discard support · 9bc142dd
      Mike Snitzer 提交于
      A little thin discard code refactoring to make the next patch (dm thin:
      fix discard support for data devices) more readable.
      Pull out a couple of functions (and uses bools instead of unsigned for
      features).
      
      No functional changes.
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NJoe Thornber <ejt@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      9bc142dd
    • M
      dm thin: do not set discard_zeroes_data · 307615a2
      Mike Snitzer 提交于
      The dm thin pool target claims to support the zeroing of discarded
      data areas.  This turns out to be incorrect when processing discards
      that do not exactly cover a complete number of blocks, so the target
      must always set discard_zeroes_data_unsupported.
      
      The thin pool target will zero blocks when they are allocated if the
      skip_block_zeroing feature is not specified.  The block layer
      may send a discard that only partly covers a block.  If a thin pool
      block is partially discarded then there is no guarantee that the
      discarded data will get zeroed before it is accessed again.
      Due to this, thin devices cannot claim discards will always zero data.
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NJoe Thornber <ejt@redhat.com>
      Cc: stable@vger.kernel.org # 3.4+
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      307615a2
  11. 27 7月, 2012 6 次提交