提交 · e8088073c9610af017fd47fddd104a2c3afb32e8 · openeuler / Kernel

22 12月, 2012 2 次提交

dm thin: fix race between simultaneous io and discards to same block · e8088073

由 Joe Thornber 提交于 12月 21, 2012

There is a race when discard bios and non-discard bios are issued
simultaneously to the same block.

Discard support is expensive for all thin devices precisely because you
have to be careful to quiesce the area you're discarding.  DM thin must
handle this conflicting IO pattern (simultaneous non-discard vs discard)
even though a sane application shouldn't be issuing such IO.

The race manifests as follows:

1. A non-discard bio is mapped in thin_bio_map.
   This doesn't lock out parallel activity to the same block.

2. A discard bio is issued to the same block as the non-discard bio.

3. The discard bio is locked in a dm_bio_prison_cell in process_discard
   to lock out parallel activity against the same block.

4. The non-discard bio's mapping continues and its all_io_entry is
   incremented so the bio is accounted for in the thin pool's all_io_ds
   which is a dm_deferred_set used to track time locality of non-discard IO.

5. The non-discard bio is finally locked in a dm_bio_prison_cell in
   process_bio.

The race can result in deadlock, leaving the block layer hanging waiting
for completion of a discard bio that never completes, e.g.:

INFO: task ruby:15354 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ruby            D ffffffff8160f0e0     0 15354  15314 0x00000000
 ffff8802fb08bc58 0000000000000082 ffff8802fb08bfd8 0000000000012900
 ffff8802fb08a010 0000000000012900 0000000000012900 0000000000012900
 ffff8802fb08bfd8 0000000000012900 ffff8803324b9480 ffff88032c6f14c0
Call Trace:
 [<ffffffff814e5a19>] schedule+0x29/0x70
 [<ffffffff814e3d85>] schedule_timeout+0x195/0x220
 [<ffffffffa06b9bc1>] ? _dm_request+0x111/0x160 [dm_mod]
 [<ffffffff814e589e>] wait_for_common+0x11e/0x190
 [<ffffffff8107a170>] ? try_to_wake_up+0x2b0/0x2b0
 [<ffffffff814e59ed>] wait_for_completion+0x1d/0x20
 [<ffffffff81233289>] blkdev_issue_discard+0x219/0x260
 [<ffffffff81233e79>] blkdev_ioctl+0x6e9/0x7b0
 [<ffffffff8119a65c>] block_ioctl+0x3c/0x40
 [<ffffffff8117539c>] do_vfs_ioctl+0x8c/0x340
 [<ffffffff8119a547>] ? block_llseek+0x67/0xb0
 [<ffffffff811756f1>] sys_ioctl+0xa1/0xb0
 [<ffffffff810561f6>] ? sys_rt_sigprocmask+0x86/0xd0
 [<ffffffff814ef099>] system_call_fastpath+0x16/0x1b

The thinp-test-suite's test_discard_random_sectors reliably hits this
deadlock on fast SSD storage.

The fix for this race is that the all_io_entry for a bio must be
incremented whilst the dm_bio_prison_cell is held for the bio's
associated virtual and physical blocks.  That cell locking wasn't
occurring early enough in thin_bio_map.  This patch fixes this.

Care is taken to always call the new function inc_all_io_entry() with
the relevant cells locked, but they are generally unlocked before
calling issue() to try to avoid holding the cells locked across
generic_submit_request.

Also, now that thin_bio_map may lock bios in a cell, process_bio() is no
longer the only thread that will do so.  Because of this we must be sure
to use cell_defer_except() to release all non-holder entries, that
were added by the other thread, because they must be deferred.

This patch depends on "dm thin: replace dm_cell_release_singleton with
cell_defer_except".
Signed-off-by: NJoe Thornber <ejt@redhat.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
Cc: stable@vger.kernel.org

e8088073

dm thin: replace dm_cell_release_singleton with cell_defer_except · b7ca9c92

由 Joe Thornber 提交于 12月 21, 2012

Change existing users of the function dm_cell_release_singleton to share
cell_defer_except instead, and then remove the now-unused function.

Everywhere that calls dm_cell_release_singleton, the bio in question
is the holder of the cell.

If there are no non-holder entries in the cell then cell_defer_except
behaves exactly like dm_cell_release_singleton.  Conversely, if there
*are* non-holder entries then dm_cell_release_singleton must not be used
because those entries would need to be deferred.

Consequently, it is safe to replace use of dm_cell_release_singleton
with cell_defer_except.

This patch is a pre-requisite for "dm thin: fix race between
simultaneous io and discards to same block".
Signed-off-by: NJoe Thornber <ejt@redhat.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

b7ca9c92

13 10月, 2012 3 次提交

dm thin: move bio_prison code to separate module · 4f81a417

由 Mike Snitzer 提交于 10月 12, 2012

The bio prison code will be useful to other future DM targets so
move it to a separate module.
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Signed-off-by: NJoe Thornber <ejt@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

4f81a417

dm thin: prepare to separate bio_prison code · 44feb387

由 Mike Snitzer 提交于 10月 12, 2012

The bio prison code will be useful to share with future DM targets.

Prepare to move this code into a separate module, adding a dm prefix
to structures and functions that will be exported.
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Signed-off-by: NJoe Thornber <ejt@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

44feb387

dm thin: support discard with non power of two block size · 28eed34e

由 Mike Snitzer 提交于 10月 12, 2012

Support discards when the pool's block size is not a power of 2.
The block layer assumes discard_granularity is a power of 2 (in
blkdev_issue_discard), so we set this to the largest power of 2 that is
a divides into the number of sectors in each block, but never less than
DATA_DEV_BLOCK_SIZE_MIN_SECTORS.

This patch eliminates the "Discard support must be disabled when the
block size is not a power of 2" constraint that was imposed in commit
55f2b8bd ("dm thin: support for non power of 2 pool blocksize").  That
commit was incomplete: using a block size that is not a power of 2
shouldn't mean disabling discard support on the device completely.
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Signed-off-by: NJoe Thornber <ejt@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

28eed34e

27 9月, 2012 3 次提交

dm thin: fix discard support for data devices · 0424caa1

由 Mike Snitzer 提交于 9月 26, 2012

The discard limits that get established for a thin-pool or thin device
may be incompatible with the pool's data device.  Avoid this by checking
the discard limits of the pool's data device.  If an incompatibility is
found then the pool's 'discard passdown' feature is disabled.

Change thin_io_hints to ensure that a thin device always uses the same
queue limits as its pool device.

Introduce requested_pf to track whether or not the table line originally
contained the no_discard_passdown flag and use this directly for table
output.  We prepare the correct setting for discard_passdown directly in
bind_control_target (called from pool_io_hints) and store it in
adjusted_pf rather than waiting until we have access to pool->pf in
pool_preresume.
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Signed-off-by: NJoe Thornber <ejt@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

0424caa1

dm thin: tidy discard support · 9bc142dd

由 Mike Snitzer 提交于 9月 26, 2012

A little thin discard code refactoring to make the next patch (dm thin:
fix discard support for data devices) more readable.
Pull out a couple of functions (and uses bools instead of unsigned for
features).

No functional changes.
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Signed-off-by: NJoe Thornber <ejt@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

9bc142dd

dm thin: do not set discard_zeroes_data · 307615a2

由 Mike Snitzer 提交于 9月 26, 2012

The dm thin pool target claims to support the zeroing of discarded
data areas.  This turns out to be incorrect when processing discards
that do not exactly cover a complete number of blocks, so the target
must always set discard_zeroes_data_unsupported.

The thin pool target will zero blocks when they are allocated if the
skip_block_zeroing feature is not specified.  The block layer
may send a discard that only partly covers a block.  If a thin pool
block is partially discarded then there is no guarantee that the
discarded data will get zeroed before it is accessed again.
Due to this, thin devices cannot claim discards will always zero data.
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Signed-off-by: NJoe Thornber <ejt@redhat.com>
Cc: stable@vger.kernel.org # 3.4+
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

307615a2

27 7月, 2012 15 次提交

dm thin: commit before gathering status · 1f4e0ff0

由 Alasdair G Kergon 提交于 7月 27, 2012

Commit outstanding metadata before returning the status for a dm thin
pool so that the numbers reported are as up-to-date as possible.

The commit is not performed if the device is suspended or if
the DM_NOFLUSH_FLAG is supplied by userspace and passed to the target
through a new 'status_flags' parameter in the target's dm_status_fn.

The userspace dmsetup tool will support the --noflush flag with the
'dmsetup status' and 'dmsetup wait' commands from version 1.02.76
onwards.
Tested-by: NMike Snitzer <snitzer@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

1f4e0ff0

dm thin: add read only and fail io modes · e49e5829

由 Joe Thornber 提交于 7月 27, 2012

Add read-only and fail-io modes to thin provisioning.

If a transaction commit fails the pool's metadata device will transition
to "read-only" mode.  If a commit fails once already in read-only mode
the transition to "fail-io" mode occurs.

Once in fail-io mode the pool and all associated thin devices will
report a status of "Fail".
Signed-off-by: NJoe Thornber <ejt@redhat.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

e49e5829

dm thin: reduce number of metadata commits · 4afdd680

由 Joe Thornber 提交于 7月 27, 2012

Reduce the number of metadata commits by using
dm_thin_changed_this_transaction to check if metadata was changed on a
per thin device granularity.
Signed-off-by: NJoe Thornber <ejt@redhat.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

4afdd680

dm thin metadata: add format option to dm_pool_metadata_open · 66b1edc0

由 Joe Thornber 提交于 7月 27, 2012

Add a parameter to dm_pool_metadata_open to indicate whether or not an
unformatted metadata area should be formatted.
Signed-off-by: NJoe Thornber <ejt@redhat.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

66b1edc0

dm: use bool bitfields in struct dm_target · 0ac55489

由 Alasdair G Kergon 提交于 7月 27, 2012

Use boolean bit fields for flags in struct dm_target.
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

0ac55489

dm thin: set flush_supported · 16ad3d10

由 Joe Thornber 提交于 7月 27, 2012

The thin provisioning target commits internal metadata on flush.  So it
should receive flushes regardless of whether the underlying devices
support them.
Signed-off-by: NJoe Thornber <ejt@redhat.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

16ad3d10

dm thin: avoid unnecessarily breaking sharing for flushes · 60049701

由 Joe Thornber 提交于 7月 27, 2012

There's no need to break sharing, triggering a copy, for a write that has no
data (i.e. a flush).
Signed-off-by: NJoe Thornber <ejt@redhat.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

60049701

dm thin: fix memory leak in process_prepared_mapping error paths · 905386f8

由 Joe Thornber 提交于 7月 27, 2012

Fix memory leak in process_prepared_mapping by always freeing
the dm_thin_new_mapping structs from the mapping_pool mempool on
the error paths.
Signed-off-by: NJoe Thornber <ejt@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

905386f8

dm thin: optimize power of two block size · f9a8e0cd

由 Mikulas Patocka 提交于 7月 27, 2012

dm-thin will be most likely used with a block size that is a power of
two. So it should be optimized for this case.

This patch changes division and modulo operations to shifts and bit
masks if block size is a power of two.

A test that bi_sector is divisible by a block size is removed from
io_overlaps_block. Device mapper never sends bios that span a block
boundary. Consequently, if we tested that bi_size is equivalent to block
size, bi_sector must already be on a block boundary.
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

f9a8e0cd

dm thin: split discards on block boundary · 49296309

由 Mikulas Patocka 提交于 7月 27, 2012

This patch sets the variable "ti->split_discard_requests" for the dm thin
target so that device mapper core splits discard requests on a block
boundary.

Consequently, a discard request that spans multiple blocks is never sent
to dm-thin. The patch also removes some code in process_discard that
deals with discards that span multiple blocks.
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

49296309

dm thin: support for non power of 2 pool blocksize · 55f2b8bd

由 Mike Snitzer 提交于 7月 27, 2012

Non power of 2 blocksize support is needed to properly align thinp IO
on storage that has non power of 2 optimal IO sizes (e.g. RAID6 10+2).

Use sector_div to support non power of 2 blocksize for the pool's
data device.  This provides comparable performance to the power of 2
math that was performed until now (as tested on modern x86_64 hardware).

The kernel currently assumes that limits->discard_granularity is a power
of two so the thin target only enables discard support if the block
size is a power of two.

Eliminate pool structure's 'block_shift', 'offset_mask' and
remaining 4 byte holes.
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

55f2b8bd

dm: support non power of two target max_io_len · 542f9038

由 Mike Snitzer 提交于 7月 27, 2012

Remove the restriction that limits a target's specified maximum incoming
I/O size to be a power of 2.

Rename this setting from 'split_io' to the less-ambiguous 'max_io_len'.
Change it from sector_t to uint32_t, which is plenty big enough, and
introduce a wrapper function dm_set_target_max_io_len() to set it.
Use sector_div() to process it now that it is not necessarily a power of 2.
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

542f9038

dm thin: provide specific errors for two table load failure cases · f09996c9

由 Mike Snitzer 提交于 7月 27, 2012

Provide specific error message strings for two pool_ctr() failure cases
that currently give just "Unknown error".

Reference: test_two_pools_pointing_to_the_same_metadata_fails and
test_different_pool_cant_replace_pool in thinp-test-suite.
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

f09996c9

dm thin: clean up compiler warning · 17b7d63f

由 Mike Snitzer 提交于 7月 27, 2012

Clean up "warning: dubious: !x & y".  Also make it clear that
__snapshotted_since() returns a bool and that dm_thin_lookup_result's
'shared' member is a flag.
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

17b7d63f

dm thin: reduce endio_hook pool size · 7768ed33

由 Alasdair G Kergon 提交于 7月 27, 2012

Reduce the slab size used for the dm_thin_endio_hook mempool.

Allocation has been seen to fail on machines with smaller amounts
of memory due to fragmentation.

  lvm: page allocation failure. order:5, mode:0xd0
  device-mapper: table: 253:38: thin-pool: Error creating pool's endio_hook mempool

Cc: stable@vger.kernel.org
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

7768ed33

20 7月, 2012 1 次提交

dm thin: do not send discards to shared blocks · 650d2a06

由 Mikulas Patocka 提交于 7月 20, 2012

When process_discard receives a partial discard that doesn't cover a
full block, it sends this discard down to that block. Unfortunately, the
block can be shared and the discard would corrupt the other snapshots
sharing this block.

This patch detects block sharing and ends the discard with success when
sending it to the shared block.

The above change means that if the device supports discard it can't be
guaranteed that a discard request zeroes data. Therefore, we set
ti->discard_zeroes_data_unsupported.

Thin target discard support with this bug arrived in commit
104655fd (dm thin: support discards).
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Cc: stable@kernel.org
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

650d2a06

03 7月, 2012 1 次提交

dm thin: commit metadata before creating metadata snapshot · 0d200aef

由 Joe Thornber 提交于 7月 03, 2012

Userland sometimes sees a corrupt metadata block if metadata is changing
rapidly when a metadata snapshot is reserved for userland,  To make the
problem go away, commit before we take the metadata snapshot (which is a
sensible thing to do anyway).

The checksums mean userland spots this corruption immediately so there's
no risk of acting on incorrect data.  No corruption exists from the
kernel's point of view, and thin_check passes after pool shutdown.

I believe this is to do with shared blocks at the first level of the
{device, mapping} btree.  Prior to the metadata-snap support no sharing
at this level was possible, so this patch is only required after commit
cc8394d8 ("dm thin: provide userspace
access to pool metadata").
Signed-off-by: NJoe Thornber <ejt@redhat.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

0d200aef

03 6月, 2012 2 次提交

dm thin: provide userspace access to pool metadata · cc8394d8

由 Joe Thornber 提交于 6月 03, 2012

This patch implements two new messages that can be sent to the thin
pool target allowing it to take a snapshot of the _metadata_.  This,
read-only snapshot can be accessed by userland, concurrently with the
live target.

Only one metadata snapshot can be held at a time.  The pool's status
line will give the block location for the current msnap.

Since version 0.1.5 of the userland thin provisioning tools, the
thin_dump program displays the msnap as follows:

    thin_dump -m <msnap root> <metadata dev>

Available here: https://github.com/jthornber/thin-provisioning-tools

Now that userland can access the metadata we can do various things
that have traditionally been kernel side tasks:

     i) Incremental backups.

     By using metadata snapshots we can work out what blocks have
     changed over time.  Combined with data snapshots we can ensure
     the data doesn't change while we back it up.

     A short proof of concept script can be found here:

     https://github.com/jthornber/thinp-test-suite/blob/master/incremental_backup_example.rb

     ii) Migration of thin devices from one pool to another.

     iii) Merging snapshots back into an external origin.

     iv) Asyncronous replication.
Signed-off-by: NJoe Thornber <ejt@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

cc8394d8

dm thin: use slab mempools · a24c2569

由 Mike Snitzer 提交于 6月 03, 2012

Use dedicated caches prefixed with a "dm_" name rather than relying on
kmalloc mempools backed by generic slab caches so the memory usage of
thin provisioning (and any leaks) can be accounted for independently.
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

a24c2569

19 5月, 2012 1 次提交

dm thin: fix table output when pool target disables discard passdown internally · f402693d

由 Mike Snitzer 提交于 5月 19, 2012

When the thin pool target clears the discard_passdown parameter
internally, it incorrectly changes the table line reported to userspace.
This breaks dumb string comparisons on these table lines in generic
userspace device-mapper library code and leads to tables being reloaded
repeatedly when nothing is actually meant to be changing.

This patch corrects this by no longer changing the table line when
discard passdown was disabled.

We can still tell when discard passdown is overridden by looking for the
message "Discard unsupported by data device (sdX): Disabling discard passdown."

This automatic detection is also moved from the 'load' to the 'resume'
so that it is re-evaluated should the properties of underlying devices
change.
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Acked-by: NJoe Thornber <ejt@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

f402693d

12 5月, 2012 3 次提交

dm thin: correct module description · 7cab8bf1

由 Alasdair G Kergon 提交于 5月 12, 2012

Remove duplicate copy of string "device-mapper" (DM_NAME) from
MODULE_DESCRIPTION.
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

7cab8bf1

dm thin: fix unprotected use of prepared_discards list · c3a0ce2e

由 Mike Snitzer 提交于 5月 12, 2012

Fix two places in commit 104655fd ("dm thin: support discards") that
didn't use pool->lock to protect against concurrent changes to the
prepared_discards list.

Without this fix, thin_endio() can race with process_discard(), leading
to concurrent list_add()s that result in the processes locking up with
an error like the following:

WARNING: at lib/list_debug.c:32 __list_add+0x8f/0xa0()
...
list_add corruption. next->prev should be prev (ffff880323b96140), but was ffff8801d2c48440. (next=ffff8801d2c485c0).
...
Pid: 17205, comm: kworker/u:1 Tainted: G        W  O 3.4.0-rc3.snitm+ #1
Call Trace:
 [<ffffffff8103ca1f>] warn_slowpath_common+0x7f/0xc0
 [<ffffffff8103cb16>] warn_slowpath_fmt+0x46/0x50
 [<ffffffffa04f6ce6>] ? bio_detain+0xc6/0x210 [dm_thin_pool]
 [<ffffffff8124ff3f>] __list_add+0x8f/0xa0
 [<ffffffffa04f70d2>] process_discard+0x2a2/0x2d0 [dm_thin_pool]
 [<ffffffffa04f6a78>] ? remap_and_issue+0x38/0x50 [dm_thin_pool]
 [<ffffffffa04f7c3b>] process_deferred_bios+0x7b/0x230 [dm_thin_pool]
 [<ffffffffa04f7df0>] ? process_deferred_bios+0x230/0x230 [dm_thin_pool]
 [<ffffffffa04f7e42>] do_worker+0x52/0x60 [dm_thin_pool]
 [<ffffffff81056fa9>] process_one_work+0x129/0x450
 [<ffffffff81059b9c>] worker_thread+0x17c/0x3c0
 [<ffffffff81059a20>] ? manage_workers+0x120/0x120
 [<ffffffff8105eabe>] kthread+0x9e/0xb0
 [<ffffffff814ceda4>] kernel_thread_helper+0x4/0x10
 [<ffffffff8105ea20>] ? kthread_freezable_should_stop+0x70/0x70
 [<ffffffff814ceda0>] ? gs_change+0x13/0x13
---[ end trace 7e0a523bc5e52692 ]---
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

c3a0ce2e

dm thin: reinstate missing mempool_free in cell_release_singleton · 03aaae7c

由 Mike Snitzer 提交于 5月 12, 2012

Fix a significant memory leak inadvertently introduced during
simplification of cell_release_singleton() in commit
6f94a4c4 ("dm thin: fix stacked bi_next
usage").

A cell's hlist_del() must be accompanied by a mempool_free().
Use __cell_release() to do this, like before.
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

03aaae7c

29 3月, 2012 9 次提交

dm thin: add pool target flags to control discard · 67e2e2b2

由 Joe Thornber 提交于 3月 28, 2012

Add dm thin target arguments to control discard support.

ignore_discard: Disables discard support

no_discard_passdown: Don't pass discards down to the underlying data
device, but just remove the mapping within the thin provisioning target.
Signed-off-by: NJoe Thornber <ejt@redhat.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

67e2e2b2

dm thin: support discards · 104655fd

由 Joe Thornber 提交于 3月 28, 2012

Support discards in the thin target.

On discard the corresponding mapping(s) are removed from the thin
device.  If the associated block(s) are no longer shared the discard
is passed to the underlying device.

All bios other than discards now have an associated deferred_entry
that is saved to the 'all_io_entry' in endio_hook.  When non-discard
IO completes and associated mappings are quiesced any discards that
were deferred, via ds_add_work() in process_discard(), will be queued
for processing by the worker thread.
Signed-off-by: NJoe Thornber <ejt@redhat.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

drivers/md/dm-thin.c |  173 ++++++++++++++++++++++++++++++++++++++++++++++----
 drivers/md/dm-thin.c |  172 ++++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 158 insertions(+), 14 deletions(-)

104655fd

dm thin: prepare to support discard · eb2aa48d

由 Joe Thornber 提交于 3月 28, 2012

This patch contains the ground work needed for dm-thin to support discard.

  - Adds endio function that replaces shared_read_endio.

  - Introduce an explicit 'quiesced' flag into the new_mapping structure.
    Before, this was implicitly indicated by m->list being empty.

  - The map_info->ptr remains constant for the duration of a bio's trip
    through the thin target.  Make it easier to reason about it.
Signed-off-by: NJoe Thornber <ejt@redhat.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

eb2aa48d

dm thin: use dm_target_offset · 6efd6e83

由 Alasdair G Kergon 提交于 3月 28, 2012

Use dm_target_offset wrapper instead of referencing the awkward ti->begin
explicitly.
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

6efd6e83

dm thin: support read only external snapshot origins · 2dd9c257

由 Joe Thornber 提交于 3月 28, 2012

Support the use of an external _read only_ device as an origin for a thin
device.

Any read to an unprovisioned area of the thin device will be passed
through to the origin.  Writes trigger allocation of new blocks as
usual.

One possible use case for this would be VM hosts that want to run
guests on thinly-provisioned volumes but have the base image on another
device (possibly shared between many VMs).
Signed-off-by: NJoe Thornber <ejt@redhat.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

2dd9c257

dm thin: relax hard limit on the maximum size of a metadata device · c4a69ecd

由 Mike Snitzer 提交于 3月 28, 2012

The thin metadata format can only make use of a device that is <=
THIN_METADATA_MAX_SECTORS (currently 15.9375 GB).  Therefore, there is no
practical benefit to using a larger device.

However, it may be that other factors impose a certain granularity for
the space that is allocated to a device (E.g. lvm2 can impose a coarse
granularity through the use of large, >= 1 GB, physical extents).

Rather than reject a larger metadata device, during thin-pool device
construction, switch to allowing it but issue a warning if a device
larger than THIN_METADATA_MAX_SECTORS_WARNING (16 GB) is
provided.  Any space over 15.9375 GB will not be used.
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

c4a69ecd

dm thin: commit outstanding data every second · 905e51b3

由 Joe Thornber 提交于 3月 28, 2012

Commit unwritten data every second to prevent too much building up.

Released blocks don't become available until after the next commit
(for crash resilience).  Prior to this patch commits were only
triggered by a message to the target or a REQ_{FLUSH,FUA} bio.  This
allowed far too big a position to build up.

The interval is hard-coded to 1 second.  This is a sensible setting.
I'm not making this user configurable, since there isn't much to be
gained by tweaking this - and a lot lost by setting it far too high.
Signed-off-by: NJoe Thornber <ejt@redhat.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

905e51b3

dm thin: correct comments · fe878f34

由 Joe Thornber 提交于 3月 28, 2012

Remove documentation for unimplemented 'trim' message.

I'd planned a 'trim' target message for shrinking thin devices, but
this is better handled via the discard ioctl.
Signed-off-by: NJoe Thornber <ejt@redhat.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

fe878f34

dm thin: fix stacked bi_next usage · 6f94a4c4

由 Joe Thornber 提交于 3月 28, 2012

Avoid using the bi_next field for the holder of a cell when deferring
bios because a stacked device below might change it.  Store the
holder in a new field in struct cell instead.

When a cell is created, the bio that triggered creation (the holder) was
added to the same bio list as subsequent bios.  In some cases we pass
this holder bio directly to devices underneath.  If those devices use
the bi_next field there will be trouble...

This also simplifies some code that had to work out which bio was the
holder.
Signed-off-by: NJoe Thornber <ejt@redhat.com>
Cc: stable@kernel.org
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

6f94a4c4

openeuler / Kernel 大约 1 年 前同步成功

openeuler / Kernel
大约 1 年前同步成功