  1. 20 Nov 2014: 2 commits
  2. 13 Nov 2014: 2 commits
  3. 11 Nov 2014: 14 commits
  4. 05 Nov 2014: 1 commit
  5. 02 Aug 2014: 3 commits
  6. 12 Jun 2014: 1 commit
    • dm thin: update discard_granularity to reflect the thin-pool blocksize · 09869de5
      Lukas Czerner committed
      DM thinp already checks whether the discard_granularity of the data
      device is a factor of the thin-pool block size.  But when using the
      dm-thin-pool's discard passdown support, DM thinp was not selecting the
      max of the underlying data device's discard_granularity and the
      thin-pool's block size.
      
      Update set_discard_limits() to set discard_granularity to the max of
      these values.  This enables blkdev_issue_discard() to properly align the
      discards that are sent to the DM thin device on a full block boundary.
      As a result, each discard will now cover an entire DM thin-pool block,
      allowing the block to be reclaimed.
      Reported-by: Zdenek Kabelac <zkabelac@redhat.com>
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
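The granularity rule in commit 09869de5 can be illustrated with a small userspace C sketch. The struct `discard_limits` and the helpers `set_discard_granularity()` and `align_discard()` are hypothetical stand-ins. They are not the kernel's `set_discard_limits()` or block-layer code. The alignment helper is a simplified model that trims a discard, given in byte offsets, inward to whole granularity-sized units.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the limit being set; not the kernel's struct queue_limits. */
struct discard_limits {
    uint32_t discard_granularity;   /* bytes */
};

/* Use the larger of the data device's granularity and the pool block size. */
static void set_discard_granularity(struct discard_limits *limits,
                                    uint32_t data_dev_granularity,
                                    uint32_t pool_block_bytes)
{
    limits->discard_granularity =
        data_dev_granularity > pool_block_bytes ?
        data_dev_granularity : pool_block_bytes;
}

/*
 * Trim the discard [start, start + len) inward to whole granularity units.
 * Stores the aligned start and returns the aligned length (0 if no unit fits).
 */
static uint64_t align_discard(uint64_t start, uint64_t len, uint32_t gran,
                              uint64_t *aligned_start)
{
    uint64_t first = (start + gran - 1) / gran * gran;   /* round start up */
    uint64_t last = (start + len) / gran * gran;         /* round end down */

    *aligned_start = first;
    return last > first ? last - first : 0;
}
```

Take a 64 KiB pool block and a 4 KiB device granularity. The chosen granularity is then 64 KiB, so a discard starting at 4 KiB is trimmed to begin at the next 64 KiB boundary. Only whole pool blocks are covered, and whole blocks are what the pool can reclaim.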
  7. 04 Jun 2014: 2 commits
  8. 21 May 2014: 1 commit
    • dm thin: add 'no_space_timeout' dm-thin-pool module param · 80c57893
      Mike Snitzer committed
      Commit 85ad643b ("dm thin: add timeout to stop out-of-data-space mode
      holding IO forever") introduced a fixed 60 second timeout.  Users may
      want to either disable or modify this timeout.
      
      Allow the out-of-data-space timeout to be configured using the
      'no_space_timeout' dm-thin-pool module param.  Setting it to 0 will
      disable the timeout, resulting in IO being queued until more data space
      is added to the thin-pool.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org # 3.14+
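The semantics of the `no_space_timeout` parameter from commit 80c57893 can be modeled as a simple predicate. `should_error_queued_io()` is a hypothetical helper and not part of dm-thin. The sketch assumes that once the timeout expires, IO queued in out-of-data-space mode is failed rather than held, which is the behavior commit 85ad643b introduced. As stated in the commit message, a value of 0 disables the timeout.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model: should IO queued while the pool is out of data space
 * be failed now?  no_space_timeout_secs mirrors the module parameter value.
 */
static bool should_error_queued_io(unsigned int no_space_timeout_secs,
                                   unsigned int secs_out_of_space)
{
    if (no_space_timeout_secs == 0)
        return false;   /* timeout disabled: keep queueing until space is added */
    return secs_out_of_space >= no_space_timeout_secs;
}
```

With the previous fixed behavior, the value is effectively 60. Module parameters are normally supplied at load time, for example `modprobe dm-thin-pool no_space_timeout=0`.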
  9. 15 May 2014: 2 commits
  10. 29 Apr 2014: 1 commit
  11. 08 Apr 2014: 2 commits
    • dm thin: fix rcu_read_lock being held in code that can sleep · b10ebd34
      Joe Thornber committed
      Commit c140e1c4 ("dm thin: use per thin device deferred bio lists")
      introduced the use of an rculist for all active thin devices.  The use
      of rcu_read_lock() in process_deferred_bios() can result in a BUG if a
      dm_bio_prison_cell must be allocated as a side-effect of bio_detain():
      
       BUG: sleeping function called from invalid context at mm/mempool.c:203
       in_atomic(): 1, irqs_disabled(): 0, pid: 6, name: kworker/u8:0
       3 locks held by kworker/u8:0/6:
         #0:  ("dm-" "thin"){.+.+..}, at: [<ffffffff8106be42>] process_one_work+0x192/0x550
         #1:  ((&pool->worker)){+.+...}, at: [<ffffffff8106be42>] process_one_work+0x192/0x550
         #2:  (rcu_read_lock){.+.+..}, at: [<ffffffff816360b5>] do_worker+0x5/0x4d0
      
      We can't process deferred bios with the rcu lock held, since
      dm_bio_prison_cell allocation may block if the bio-prison's cell mempool
      is exhausted.
      
      To fix:
      
      - Introduce a refcount and completion field to each thin_c
      
      - Add thin_get/put methods for adjusting the refcount.  If the refcount
        hits zero then the completion is triggered.
      
      - Initialise refcount to 1 when creating thin_c
      
      - When iterating the active_thins list we thin_get() whilst the rcu
        lock is held.
      
      - After the rcu lock is dropped we process the deferred bios for that
        thin.
      
      - When destroying a thin_c we thin_put() and then wait for the
        completion -- to avoid a race between the worker thread iterating
        from that thin_c and destroying the thin_c.
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
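The refcount/completion protocol listed in commit b10ebd34 can be sketched as a single-threaded userspace model using C11 atomics. `struct thin_model` and its helpers are hypothetical. The kernel version uses `atomic_t` and `struct completion`; here a plain `completed` flag stands in for what `complete()` would signal to `wait_for_completion()`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, single-threaded model of the thin_c refcount/completion pair. */
struct thin_model {
    atomic_int refcount;
    bool completed;          /* stands in for struct completion */
};

static void thin_init(struct thin_model *tc)
{
    atomic_init(&tc->refcount, 1);   /* creator holds the initial reference */
    tc->completed = false;
}

/* Taken by the worker while it still holds the RCU read lock. */
static void thin_get(struct thin_model *tc)
{
    atomic_fetch_add(&tc->refcount, 1);
}

static void thin_put(struct thin_model *tc)
{
    /* atomic_fetch_sub returns the old value: 1 means this was the last ref */
    if (atomic_fetch_sub(&tc->refcount, 1) == 1)
        tc->completed = true;        /* complete() in the kernel */
}
```

The key property is that the destroy path's own `thin_put()` does not complete while the worker still holds the reference it took under the RCU read lock. Only the final put completes, so the destroy path can safely wait for it.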
    • dm thin: irqsave must always be used with the pool->lock spinlock · 5e3283e2
      Joe Thornber committed
      Commit c140e1c4 ("dm thin: use per thin device deferred bio lists")
      incorrectly stopped disabling irqs when taking the pool's spinlock.
      
      Irqs must be disabled when taking the pool's spinlock; otherwise a
      thread could spin_lock(), then get interrupted to service thin_endio()
      in interrupt context, which would then deadlock in spin_lock_irqsave().
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
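A toy single-CPU model in C shows why commit 5e3283e2 needs `spin_lock_irqsave()`. Everything here is hypothetical, including `struct cpu_model`, `endio_handler()` and `process_context()`, and it only simulates the control flow. The interrupt handler takes the pool lock and runs on the same CPU as the interrupted lock holder. If interrupts are left enabled, the handler spins on a lock that its own CPU can never release.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-CPU model: control flow only, no real locking. */
struct cpu_model {
    bool irqs_enabled;
    bool pool_lock_held;
    bool deadlocked;
};

/* Interrupt-context completion path (think thin_endio()) taking the pool lock. */
static void endio_handler(struct cpu_model *cpu)
{
    if (cpu->pool_lock_held) {
        /* The holder was interrupted on this CPU and cannot release the lock. */
        cpu->deadlocked = true;
        return;
    }
    cpu->pool_lock_held = true;    /* spin_lock_irqsave() */
    cpu->pool_lock_held = false;   /* spin_unlock_irqrestore() */
}

/* An interrupt is only serviced while interrupts are enabled. */
static void raise_interrupt(struct cpu_model *cpu)
{
    if (cpu->irqs_enabled)
        endio_handler(cpu);
}

/* Process context takes the lock, disabling irqs only if use_irqsave is set. */
static void process_context(struct cpu_model *cpu, bool use_irqsave)
{
    bool saved = cpu->irqs_enabled;

    if (use_irqsave)
        cpu->irqs_enabled = false;
    cpu->pool_lock_held = true;
    raise_interrupt(cpu);          /* interrupt arrives inside the critical section */
    cpu->pool_lock_held = false;
    cpu->irqs_enabled = saved;
    if (use_irqsave)
        raise_interrupt(cpu);      /* the held-off interrupt runs after irqrestore */
}
```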
  12. 05 Apr 2014: 1 commit
    • dm thin: sort the per thin deferred bios using an rb_tree · 67324ea1
      Mike Snitzer committed
      A thin-pool will allocate blocks using FIFO order for all thin devices
      which share the thin-pool.  Because of this simplistic allocation, the
      thin-pool's space can become fragmented quite easily, especially when
      multiple threads are requesting blocks in parallel.
      
      Sort each thin device's deferred_bio_list based on logical sector to
      help reduce fragmentation of the thin-pool's ondisk layout.
      
      The following tables illustrate the realized gains/potential offered by
      sorting each thin device's deferred_bio_list.  An "io size"-sized random
      read of the device would result in "seeks/io" fragments being read, with
      an average "distance/seek" between each fragment.
      
      Data was written to a single thin device using multiple threads via
      iozone (8 threads, 64K for both the block_size and io_size).
      
      unsorted:
      
           io size   seeks/io distance/seek
        --------------------------------------
                4k    0.000   0b
               16k    0.013   11m
               64k    0.065   11m
              256k    0.274   10m
                1m    1.109   10m
                4m    4.411   10m
               16m    17.097  11m
               64m    60.055  13m
              256m    148.798 25m
                1g    809.929 21m
      
      sorted:
      
           io size   seeks/io distance/seek
        --------------------------------------
                4k    0.000   0b
               16k    0.000   1g
               64k    0.001   1g
              256k    0.003   1g
                1m    0.011   1g
                4m    0.045   1g
               16m    0.181   1g
               64m    0.747   1011m
              256m    3.299   1g
                1g    14.373  1g
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Acked-by: Joe Thornber <ejt@redhat.com>
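The ordering step in commit 67324ea1 can be sketched in userspace C. This sketch deliberately replaces the kernel's rb_tree with `qsort()` over an array, because both hand the bios over in ascending logical-sector order. `struct deferred_bio` and `sort_deferred_bios()` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a deferred bio: only the logical sector matters here. */
struct deferred_bio {
    uint64_t sector;
    int id;
};

static int cmp_sector(const void *a, const void *b)
{
    uint64_t sa = ((const struct deferred_bio *)a)->sector;
    uint64_t sb = ((const struct deferred_bio *)b)->sector;

    return (sa > sb) - (sa < sb);   /* no overflow, unlike plain subtraction */
}

/* Order a thin device's deferred bios by logical sector before processing. */
static void sort_deferred_bios(struct deferred_bio *bios, size_t n)
{
    qsort(bios, n, sizeof(*bios), cmp_sector);
}
```

An rb_tree lets bios be inserted one at a time as they are deferred. A sorted array suits a one-shot batch like this example. In both cases, the bios that need new blocks reach the FIFO allocator in logical order, which is what reduces fragmentation.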
  13. 01 Apr 2014: 1 commit
    • dm thin: use per thin device deferred bio lists · c140e1c4
      Mike Snitzer committed
      The thin-pool previously only had a single deferred_bios list that would
      collect bios for all thin devices in the pool.  Split this per-pool
      deferred_bios list out to per-thin deferred_bios_list -- doing so
      enables increased parallelism when processing deferred bios.  And now
      that each thin device has its own deferred_bios_list we can sort all
      bios in the list by logical sector.  The requeue code in the error
      handling path is also cleaner as a side-effect.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Acked-by: Joe Thornber <ejt@redhat.com>
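The per-thin split in commit c140e1c4 amounts to giving each thin device its own queue instead of one shared pool queue. This is a minimal userspace sketch built on a hypothetical singly linked list (`struct bio_node`, `struct thin_dev`, `defer_bio()`). The kernel uses its own bio list helpers and locking, which are not modeled here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bio stand-in. */
struct bio_node {
    int id;
    struct bio_node *next;
};

/* Hypothetical per-thin device context owning its deferred list. */
struct thin_dev {
    struct bio_node *head, *tail;
    size_t count;
};

/* Append a bio to the deferred list of the thin device it targets. */
static void defer_bio(struct thin_dev *tc, struct bio_node *bio)
{
    bio->next = NULL;
    if (tc->tail)
        tc->tail->next = bio;
    else
        tc->head = bio;
    tc->tail = bio;
    tc->count++;
}
```

Because each list only ever holds one device's bios, the lists can be drained independently and sorted by that device's logical sectors.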
  14. 31 Mar 2014: 1 commit
  15. 29 Mar 2014: 1 commit
  16. 06 Mar 2014: 4 commits
    • dm thin: fix noflush suspend IO queueing · 738211f7
      Joe Thornber committed
      i) By the time DM core calls the postsuspend hook, the dm_noflush flag
      has been cleared, so the old thin_postsuspend did nothing.  We need to
      use the presuspend hook instead.
      
      ii) There was a race between bios leaving DM core and arriving in the
      deferred queue.
      
      thin_presuspend now sets a 'requeue' flag causing all bios destined for
      that thin to be requeued back to DM core.  Then it requeues all held IO,
      and all IO on the deferred queue (destined for that thin).  Finally
      postsuspend clears the 'requeue' flag.
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
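The ordering in commit 738211f7 can be modeled in userspace C: set the 'requeue' flag first, then drain the deferred queue. `struct thin_state`, `thin_map()`, `thin_presuspend()` and `thin_postsuspend()` here are hypothetical simplifications that only count bios. Held IO in bio-prison cells is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum map_result { MAP_DEFERRED, MAP_REQUEUE };

/* Hypothetical model of one thin device during a noflush suspend. */
struct thin_state {
    bool requeue_mode;
    size_t deferred;     /* bios waiting on this thin's deferred queue */
    size_t requeued;     /* bios handed back to DM core */
};

static enum map_result thin_map(struct thin_state *tc)
{
    if (tc->requeue_mode) {
        tc->requeued++;
        return MAP_REQUEUE;      /* bounce straight back to DM core */
    }
    tc->deferred++;
    return MAP_DEFERRED;
}

static void thin_presuspend(struct thin_state *tc)
{
    tc->requeue_mode = true;         /* new bios bounce back from now on */
    tc->requeued += tc->deferred;    /* then hand back what was already queued */
    tc->deferred = 0;
}

static void thin_postsuspend(struct thin_state *tc)
{
    tc->requeue_mode = false;
}
```

The flag is set before the deferred queue is drained. A bio arriving during presuspend is therefore bounced back to DM core instead of landing on a queue that has already been emptied.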
    • dm thin: fix deadlock in __requeue_bio_list · 18adc577
      Joe Thornber committed
      The spin lock in requeue_io() was held for too long, allowing deadlock.
      Don't worry: due to other issues addressed in the following "dm thin:
      fix noflush suspend IO queueing" commit, this code was never called.
      
      Fix this by taking the spin lock for a much shorter period of time.
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
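A common way to shorten a critical section is to detach the whole list while holding the lock and then walk it unlocked. This is one plausible reading of the fix in commit 18adc577, but the sketch below assumes the pattern rather than reproducing the actual patch. The lock is modeled as a flag, and `struct pool_model`, `model_requeue_io()` and `requeue_one()` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct bio_node {
    int id;
    struct bio_node *next;
};

/* Hypothetical pool: the spinlock is modeled as a flag. */
struct pool_model {
    bool lock_held;
    struct bio_node *deferred;
};

static size_t requeued_under_lock;   /* should stay 0 */

/* Stand-in for handing one bio back to DM core, which may take other locks. */
static void requeue_one(struct pool_model *pool, struct bio_node *bio)
{
    (void)bio;
    if (pool->lock_held)
        requeued_under_lock++;
}

/* Hold the lock only long enough to detach the list, then walk it unlocked. */
static size_t model_requeue_io(struct pool_model *pool)
{
    struct bio_node *list, *next;
    size_t n = 0;

    pool->lock_held = true;      /* take the pool lock */
    list = pool->deferred;
    pool->deferred = NULL;
    pool->lock_held = false;     /* drop it before doing any per-bio work */

    for (; list; list = next) {
        next = list->next;
        requeue_one(pool, list);
        n++;
    }
    return n;
}
```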
    • dm thin: fix out of data space handling · 3e1a0699
      Joe Thornber committed
      Ideally a thin pool would never run out of data space; the low water
      mark would trigger userland to extend the pool before we completely run
      out of space.  However, many small random IOs to unprovisioned space can
      consume data space at an alarming rate.  Adjust your low water mark if
      you're frequently seeing "out-of-data-space" mode.
      
      Before this fix, if data space ran out the pool would be put in
      PM_READ_ONLY mode which also aborted the pool's current metadata
      transaction (data loss for any changes in the transaction).  This had a
      side-effect of needlessly compromising data consistency.  And retry of
      queued unserviceable bios, once the data pool was resized, could
      initiate changes to potentially inconsistent pool metadata.
      
      Now, when the pool's data space is exhausted, the pool transitions to a
      new pool mode (PM_OUT_OF_DATA_SPACE) that allows metadata to be changed
      but does not allow data to be allocated.  This allows users to remove
      thin volumes or discard data to recover data space.
      
      The pool is no longer put in PM_READ_ONLY mode in response to the pool
      running out of data space.  And PM_READ_ONLY mode no longer aborts the
      pool's current metadata transaction.  Also, set_pool_mode() will now
      notify userspace when the pool mode is changed.
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
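The mode rules in commit 3e1a0699 can be summarized as a small permission table. In this hypothetical userspace sketch, `PM_OUT_OF_DATA_SPACE` and `PM_READ_ONLY` come from the commit text, while `PM_WRITE` and `PM_FAIL` are assumptions. `model_set_pool_mode()` only counts mode changes to stand in for the userspace notification. It is not the kernel's `set_pool_mode()`.

```c
#include <assert.h>
#include <stdbool.h>

/* PM_OUT_OF_DATA_SPACE and PM_READ_ONLY are from the text; the rest are assumed. */
enum pool_mode { PM_WRITE, PM_OUT_OF_DATA_SPACE, PM_READ_ONLY, PM_FAIL };

static bool may_allocate_data(enum pool_mode m)
{
    return m == PM_WRITE;
}

static bool may_change_metadata(enum pool_mode m)
{
    /* Out-of-data-space still permits removing thins or discarding data. */
    return m == PM_WRITE || m == PM_OUT_OF_DATA_SPACE;
}

static int mode_change_events;   /* stands in for notifying userspace */

static void model_set_pool_mode(enum pool_mode *cur, enum pool_mode new_mode)
{
    if (*cur != new_mode)
        mode_change_events++;
    *cur = new_mode;
}
```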
    • dm thin: ensure user takes action to validate data and metadata consistency · 07f2b6e0
      Mike Snitzer committed
      If a thin metadata operation fails, the current transaction will abort,
      potentially causing data loss for IO layers up the stack (e.g.
      filesystems).  As such, set THIN_METADATA_NEEDS_CHECK_FLAG in the
      thin metadata's superblock which:
      1) requires that the user verify the thin metadata is consistent (e.g.
         use thin_check, etc)
      2) suggests that the user verify the thin data is consistent (e.g. use
         fsck)
      
      The only way to clear the superblock's THIN_METADATA_NEEDS_CHECK_FLAG is
      to run thin_repair.
      
      On metadata operation failure: abort current metadata transaction, set
      pool in read-only mode, and now set the needs_check flag.
      
      As part of this change, constraints are introduced or relaxed:
      * don't allow a pool to transition to write mode if needs_check is set
      * don't allow data or metadata space to be resized if needs_check is set
      * if a thin pool's metadata space is exhausted: the kernel will now
        force the user to take the pool offline for repair before the kernel
        will allow the metadata space to be extended.
      
      Also, update Documentation to include information about when the thin
      provisioning target commits metadata, how it handles metadata failures
      and running out of space.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Joe Thornber <ejt@redhat.com>
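The needs_check constraints in commit 07f2b6e0 can be modeled as gates on a superblock flag. In this hypothetical sketch, `NEEDS_CHECK_FLAG` and its bit value are assumptions, not the real `THIN_METADATA_NEEDS_CHECK_FLAG` definition. `model_offline_repair()` stands in for running thin_repair on the offline pool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit value for this sketch only. */
#define NEEDS_CHECK_FLAG (1u << 0)

/* Hypothetical view of the superblock flags plus the pool's mode. */
struct sb_model {
    uint32_t flags;
    bool read_only;
};

/* Metadata op failed: (abort transaction, not modeled), go read-only, demand a check. */
static void on_metadata_failure(struct sb_model *sb)
{
    sb->read_only = true;
    sb->flags |= NEEDS_CHECK_FLAG;
}

static bool may_enter_write_mode(const struct sb_model *sb)
{
    return !(sb->flags & NEEDS_CHECK_FLAG);
}

/* Applies to both data and metadata space resizing. */
static bool may_resize(const struct sb_model *sb)
{
    return !(sb->flags & NEEDS_CHECK_FLAG);
}

/* Only an offline repair clears the flag. */
static void model_offline_repair(struct sb_model *sb)
{
    sb->flags &= ~NEEDS_CHECK_FLAG;
}
```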
  17. 05 Mar 2014: 1 commit
    • dm thin: synchronize the pool mode during suspend · cdc2b415
      Mike Snitzer committed
      Commit b5330655 ("dm thin: handle metadata failures more consistently")
      increased potential for the pool's mode to be changed in response to
      metadata operation failures.
      
      When the pool mode is changed it isn't synchronized with the mode in
      pool_features stored in the target's context (ti->private) that is used
      as the basis for (re)establishing the pool mode during resume via
      bind_control_target.
      
      It is important that we synchronize the pool mode when it is changed;
      otherwise the pool may experience an unexpected mode transition on the
      next resume (especially if there was no new table load).
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Acked-by: Joe Thornber <ejt@redhat.com>
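The problem fixed by commit cdc2b415 can be reproduced in a hypothetical userspace model. Resume rebuilds the live mode from a saved copy, in the role of the pool_features kept in ti->private. If that copy is not updated when the mode changes, resume silently reverts the mode. `struct pool_m`, `struct pool_target_m`, `change_mode()` and `resume()` are illustrative names. The model syncs at the moment of change, as the commit message describes.

```c
#include <assert.h>
#include <stdbool.h>

enum pool_mode { PM_WRITE, PM_READ_ONLY };

/* Hypothetical: the live pool and the target's saved features. */
struct pool_m { enum pool_mode mode; };
struct pool_target_m { struct pool_m *pool; enum pool_mode saved_mode; };

static void change_mode(struct pool_target_m *pt, enum pool_mode m, bool sync)
{
    pt->pool->mode = m;
    if (sync)
        pt->saved_mode = m;   /* keep the target's copy in step */
}

/* Resume re-establishes the mode from the saved copy. */
static void resume(struct pool_target_m *pt)
{
    pt->pool->mode = pt->saved_mode;
}
```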