1. 22 12月, 2012 2 次提交
    • J
      dm thin: fix race between simultaneous io and discards to same block · e8088073
      Joe Thornber 提交于
      There is a race when discard bios and non-discard bios are issued
      simultaneously to the same block.
      
      Discard support is expensive for all thin devices precisely because you
      have to be careful to quiesce the area you're discarding.  DM thin must
      handle this conflicting IO pattern (simultaneous non-discard vs discard)
      even though a sane application shouldn't be issuing such IO.
      
      The race manifests as follows:
      
      1. A non-discard bio is mapped in thin_bio_map.
         This doesn't lock out parallel activity to the same block.
      
      2. A discard bio is issued to the same block as the non-discard bio.
      
      3. The discard bio is locked in a dm_bio_prison_cell in process_discard
         to lock out parallel activity against the same block.
      
      4. The non-discard bio's mapping continues and its all_io_entry is
         incremented so the bio is accounted for in the thin pool's all_io_ds
         which is a dm_deferred_set used to track time locality of non-discard IO.
      
      5. The non-discard bio is finally locked in a dm_bio_prison_cell in
         process_bio.
      
      The race can result in deadlock, leaving the block layer hanging waiting
      for completion of a discard bio that never completes, e.g.:
      
      INFO: task ruby:15354 blocked for more than 120 seconds.
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      ruby            D ffffffff8160f0e0     0 15354  15314 0x00000000
       ffff8802fb08bc58 0000000000000082 ffff8802fb08bfd8 0000000000012900
       ffff8802fb08a010 0000000000012900 0000000000012900 0000000000012900
       ffff8802fb08bfd8 0000000000012900 ffff8803324b9480 ffff88032c6f14c0
      Call Trace:
       [<ffffffff814e5a19>] schedule+0x29/0x70
       [<ffffffff814e3d85>] schedule_timeout+0x195/0x220
       [<ffffffffa06b9bc1>] ? _dm_request+0x111/0x160 [dm_mod]
       [<ffffffff814e589e>] wait_for_common+0x11e/0x190
       [<ffffffff8107a170>] ? try_to_wake_up+0x2b0/0x2b0
       [<ffffffff814e59ed>] wait_for_completion+0x1d/0x20
       [<ffffffff81233289>] blkdev_issue_discard+0x219/0x260
       [<ffffffff81233e79>] blkdev_ioctl+0x6e9/0x7b0
       [<ffffffff8119a65c>] block_ioctl+0x3c/0x40
       [<ffffffff8117539c>] do_vfs_ioctl+0x8c/0x340
       [<ffffffff8119a547>] ? block_llseek+0x67/0xb0
       [<ffffffff811756f1>] sys_ioctl+0xa1/0xb0
       [<ffffffff810561f6>] ? sys_rt_sigprocmask+0x86/0xd0
       [<ffffffff814ef099>] system_call_fastpath+0x16/0x1b
      
      The thinp-test-suite's test_discard_random_sectors reliably hits this
      deadlock on fast SSD storage.
      
      The fix for this race is that the all_io_entry for a bio must be
      incremented whilst the dm_bio_prison_cell is held for the bio's
      associated virtual and physical blocks.  That cell locking wasn't
      occurring early enough in thin_bio_map.  This patch fixes this.
      
      Care is taken to always call the new function inc_all_io_entry() with
      the relevant cells locked, but they are generally unlocked before
      calling issue() to try to avoid holding the cells locked across
      generic_submit_request.
      
      Also, now that thin_bio_map may lock bios in a cell, process_bio() is no
      longer the only thread that will do so.  Because of this we must be sure
      to use cell_defer_except() to release all non-holder entries, that
      were added by the other thread, because they must be deferred.
      
      This patch depends on "dm thin: replace dm_cell_release_singleton with
      cell_defer_except".
      Signed-off-by: NJoe Thornber <ejt@redhat.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      Cc: stable@vger.kernel.org
      e8088073
    • J
      dm thin: replace dm_cell_release_singleton with cell_defer_except · b7ca9c92
      Joe Thornber 提交于
      Change existing users of the function dm_cell_release_singleton to share
      cell_defer_except instead, and then remove the now-unused function.
      
      Everywhere that calls dm_cell_release_singleton, the bio in question
      is the holder of the cell.
      
      If there are no non-holder entries in the cell then cell_defer_except
      behaves exactly like dm_cell_release_singleton.  Conversely, if there
      *are* non-holder entries then dm_cell_release_singleton must not be used
      because those entries would need to be deferred.
      
      Consequently, it is safe to replace use of dm_cell_release_singleton
      with cell_defer_except.
      
      This patch is a pre-requisite for "dm thin: fix race between
      simultaneous io and discards to same block".
      Signed-off-by: NJoe Thornber <ejt@redhat.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      b7ca9c92
  2. 13 10月, 2012 3 次提交
  3. 27 9月, 2012 3 次提交
    • M
      dm thin: fix discard support for data devices · 0424caa1
      Mike Snitzer 提交于
      The discard limits that get established for a thin-pool or thin device
      may be incompatible with the pool's data device.  Avoid this by checking
      the discard limits of the pool's data device.  If an incompatibility is
      found then the pool's 'discard passdown' feature is disabled.
      
      Change thin_io_hints to ensure that a thin device always uses the same
      queue limits as its pool device.
      
      Introduce requested_pf to track whether or not the table line originally
      contained the no_discard_passdown flag and use this directly for table
      output.  We prepare the correct setting for discard_passdown directly in
      bind_control_target (called from pool_io_hints) and store it in
      adjusted_pf rather than waiting until we have access to pool->pf in
      pool_preresume.
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NJoe Thornber <ejt@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      0424caa1
    • M
      dm thin: tidy discard support · 9bc142dd
      Mike Snitzer 提交于
      A little thin discard code refactoring to make the next patch (dm thin:
      fix discard support for data devices) more readable.
      Pull out a couple of functions (and uses bools instead of unsigned for
      features).
      
      No functional changes.
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NJoe Thornber <ejt@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      9bc142dd
    • M
      dm thin: do not set discard_zeroes_data · 307615a2
      Mike Snitzer 提交于
      The dm thin pool target claims to support the zeroing of discarded
      data areas.  This turns out to be incorrect when processing discards
      that do not exactly cover a complete number of blocks, so the target
      must always set discard_zeroes_data_unsupported.
      
      The thin pool target will zero blocks when they are allocated if the
      skip_block_zeroing feature is not specified.  The block layer
      may send a discard that only partly covers a block.  If a thin pool
      block is partially discarded then there is no guarantee that the
      discarded data will get zeroed before it is accessed again.
      Due to this, thin devices cannot claim discards will always zero data.
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NJoe Thornber <ejt@redhat.com>
      Cc: stable@vger.kernel.org # 3.4+
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      307615a2
  4. 27 7月, 2012 15 次提交
  5. 20 7月, 2012 1 次提交
    • M
      dm thin: do not send discards to shared blocks · 650d2a06
      Mikulas Patocka 提交于
      When process_discard receives a partial discard that doesn't cover a
      full block, it sends this discard down to that block. Unfortunately, the
      block can be shared and the discard would corrupt the other snapshots
      sharing this block.
      
      This patch detects block sharing and ends the discard with success when
      sending it to the shared block.
      
      The above change means that if the device supports discard it can't be
      guaranteed that a discard request zeroes data. Therefore, we set
      ti->discard_zeroes_data_unsupported.
      
      Thin target discard support with this bug arrived in commit
      104655fd (dm thin: support discards).
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Cc: stable@kernel.org
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      650d2a06
  6. 03 7月, 2012 1 次提交
    • J
      dm thin: commit metadata before creating metadata snapshot · 0d200aef
      Joe Thornber 提交于
      Userland sometimes sees a corrupt metadata block if metadata is changing
      rapidly when a metadata snapshot is reserved for userland,  To make the
      problem go away, commit before we take the metadata snapshot (which is a
      sensible thing to do anyway).
      
      The checksums mean userland spots this corruption immediately so there's
      no risk of acting on incorrect data.  No corruption exists from the
      kernel's point of view, and thin_check passes after pool shutdown.
      
      I believe this is to do with shared blocks at the first level of the
      {device, mapping} btree.  Prior to the metadata-snap support no sharing
      at this level was possible, so this patch is only required after commit
      cc8394d8 ("dm thin: provide userspace
      access to pool metadata").
      Signed-off-by: NJoe Thornber <ejt@redhat.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      0d200aef
  7. 03 6月, 2012 2 次提交
  8. 19 5月, 2012 1 次提交
    • M
      dm thin: fix table output when pool target disables discard passdown internally · f402693d
      Mike Snitzer 提交于
      When the thin pool target clears the discard_passdown parameter
      internally, it incorrectly changes the table line reported to userspace.
      This breaks dumb string comparisons on these table lines in generic
      userspace device-mapper library code and leads to tables being reloaded
      repeatedly when nothing is actually meant to be changing.
      
      This patch corrects this by no longer changing the table line when
      discard passdown was disabled.
      
      We can still tell when discard passdown is overridden by looking for the
      message "Discard unsupported by data device (sdX): Disabling discard passdown."
      
      This automatic detection is also moved from the 'load' to the 'resume'
      so that it is re-evaluated should the properties of underlying devices
      change.
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Acked-by: NJoe Thornber <ejt@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      f402693d
  9. 12 5月, 2012 3 次提交
    • A
      dm thin: correct module description · 7cab8bf1
      Alasdair G Kergon 提交于
      Remove duplicate copy of string "device-mapper" (DM_NAME) from
      MODULE_DESCRIPTION.
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      7cab8bf1
    • M
      dm thin: fix unprotected use of prepared_discards list · c3a0ce2e
      Mike Snitzer 提交于
      Fix two places in commit 104655fd ("dm thin: support discards") that
      didn't use pool->lock to protect against concurrent changes to the
      prepared_discards list.
      
      Without this fix, thin_endio() can race with process_discard(), leading
      to concurrent list_add()s that result in the processes locking up with
      an error like the following:
      
      WARNING: at lib/list_debug.c:32 __list_add+0x8f/0xa0()
      ...
      list_add corruption. next->prev should be prev (ffff880323b96140), but was ffff8801d2c48440. (next=ffff8801d2c485c0).
      ...
      Pid: 17205, comm: kworker/u:1 Tainted: G        W  O 3.4.0-rc3.snitm+ #1
      Call Trace:
       [<ffffffff8103ca1f>] warn_slowpath_common+0x7f/0xc0
       [<ffffffff8103cb16>] warn_slowpath_fmt+0x46/0x50
       [<ffffffffa04f6ce6>] ? bio_detain+0xc6/0x210 [dm_thin_pool]
       [<ffffffff8124ff3f>] __list_add+0x8f/0xa0
       [<ffffffffa04f70d2>] process_discard+0x2a2/0x2d0 [dm_thin_pool]
       [<ffffffffa04f6a78>] ? remap_and_issue+0x38/0x50 [dm_thin_pool]
       [<ffffffffa04f7c3b>] process_deferred_bios+0x7b/0x230 [dm_thin_pool]
       [<ffffffffa04f7df0>] ? process_deferred_bios+0x230/0x230 [dm_thin_pool]
       [<ffffffffa04f7e42>] do_worker+0x52/0x60 [dm_thin_pool]
       [<ffffffff81056fa9>] process_one_work+0x129/0x450
       [<ffffffff81059b9c>] worker_thread+0x17c/0x3c0
       [<ffffffff81059a20>] ? manage_workers+0x120/0x120
       [<ffffffff8105eabe>] kthread+0x9e/0xb0
       [<ffffffff814ceda4>] kernel_thread_helper+0x4/0x10
       [<ffffffff8105ea20>] ? kthread_freezable_should_stop+0x70/0x70
       [<ffffffff814ceda0>] ? gs_change+0x13/0x13
      ---[ end trace 7e0a523bc5e52692 ]---
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      c3a0ce2e
    • M
      dm thin: reinstate missing mempool_free in cell_release_singleton · 03aaae7c
      Mike Snitzer 提交于
      Fix a significant memory leak inadvertently introduced during
      simplification of cell_release_singleton() in commit
      6f94a4c4 ("dm thin: fix stacked bi_next
      usage").
      
      A cell's hlist_del() must be accompanied by a mempool_free().
      Use __cell_release() to do this, like before.
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      03aaae7c
  10. 29 3月, 2012 9 次提交