- 02 Dec 2014, 7 commits
-
-
Submitted by Joe Thornber
If the incoming bio is a WRITE and completely covers a block then we don't bother to do any copying for a promotion operation. Once this is done the cache block and origin block will be different, so we need to set the cache block to 'dirty'.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
-
Submitted by Joe Thornber
Overwrite causes the cache block and origin block to diverge, which is only allowed in writeback mode.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
-
Submitted by Joe Thornber
Otherwise the cache blocks may span two discard blocks, which we don't handle when doing the discard lookup.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Submitted by Joe Thornber
It is more correct to hold the cell before checking the discard state. These flags are only used as hints to the policy, so this change will have a negligible effect.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Submitted by Joe Thornber
The discard block size can change if the origin changes size or if an old DM cache is upgraded from using a discard block size that was equal to the cache block size. To fix this, an extent of discarded blocks is established for the purpose of translating the old discard block size to the new in-core discard block size and set bits. The old (potentially huge) discard bitset is left on-disk until it is re-written using the new in-core information on the next successful DM cache shutdown.
Fixes: 7ae34e77 ("dm cache: improve discard support")
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
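As an aside, a stand-alone sketch of the translation step described above (hypothetical names, user-space C rather than the dm-cache code): each set bit recorded at the old discard block size is re-expressed as a sector extent and re-marked at the new in-core discard block size.

```c
#include <stdint.h>
#include <string.h>

static void set_bit64(uint8_t *bits, uint64_t i)  { bits[i / 8] |= 1u << (i % 8); }
static int  test_bit64(const uint8_t *bits, uint64_t i) { return bits[i / 8] & (1u << (i % 8)); }

/*
 * Re-express a discard bitset recorded at one block size as a bitset at
 * another block size, going through the common currency of sectors.
 * Note: this conservatively marks any new block that overlaps an old
 * discarded extent; a real implementation must decide how partially
 * covered blocks are treated.
 */
static void translate_discard_bits(const uint8_t *old_bits, uint64_t old_nr_blocks,
				   uint64_t old_block_sectors,
				   uint8_t *new_bits, uint64_t new_nr_blocks,
				   uint64_t new_block_sectors)
{
	memset(new_bits, 0, (new_nr_blocks + 7) / 8);

	for (uint64_t b = 0; b < old_nr_blocks; b++) {
		if (!test_bit64(old_bits, b))
			continue;

		uint64_t begin = b * old_block_sectors;	/* old extent ... */
		uint64_t end   = begin + old_block_sectors;

		for (uint64_t s = begin / new_block_sectors;	/* ... re-marked */
		     s * new_block_sectors < end && s < new_nr_blocks;
		     s++)
			set_bit64(new_bits, s);
	}
}
```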
-
Submitted by Joe Thornber
Commit 7ae34e77 ("dm cache: improve discard support") needed to also:
- discontinue having DM core split the discard bios on cache block boundaries
- calculate the cache's discard_nr_blocks relative to the determined discard_block_size rather than using oblock_to_dblock()
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Submitted by Joe Thornber
This could've been quite bad (to return success but not update the new root to point at the old) but in practice the only known consumer of the dm array code is the DM cache target. And the DM cache target passes in the same old root to array_resize() anyway.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
- 24 Nov 2014, 1 commit
-
-
Submitted by Eric Dumazet
rcu_dereference() should be used in sections protected by rcu_read_lock. For writers holding some kind of mutex or lock, rcu_dereference_protected() is the way to go, adding explicit lockdep bits. In __unbind(), we are the last user of this mapped device, so we can use the constant '1' instead of a lockdep_is_held() expression; this is not consistent with other uses of rcu_dereference_protected(), which use the md->suspend_lock mutex.
Reported-by: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 33423974 ("dm: Use rcu_dereference() for accessing rcu pointer")
Cc: Pranith Kumar <bobby.prani@gmail.com>
[snitzer: allow lines longer than 80 columns, refine subject]
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
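For context, a kernel-style sketch of the two access patterns contrasted above, using stand-in struct names rather than the actual dm code (the RCU and lockdep calls themselves are the real APIs):

```c
#include <linux/rcupdate.h>
#include <linux/mutex.h>

struct example_table;

struct example_dev {
	struct mutex lock;
	struct example_table __rcu *map;	/* RCU-protected pointer */
};

/* Reader: must be inside an RCU read-side critical section. */
static void reader(struct example_dev *d)
{
	struct example_table *t;

	rcu_read_lock();
	t = rcu_dereference(d->map);
	/* ... use t ... */
	rcu_read_unlock();
}

/* Writer: holds d->lock, so document that with lockdep. */
static struct example_table *writer_get(struct example_dev *d)
{
	return rcu_dereference_protected(d->map, lockdep_is_held(&d->lock));
}

/*
 * Teardown: if we are provably the last user (as in __unbind()), the
 * constant '1' satisfies the check, at the cost of losing the lockdep
 * documentation the commit message mentions.
 */
static struct example_table *teardown_get(struct example_dev *d)
{
	return rcu_dereference_protected(d->map, 1);
}
```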
-
- 22 Nov 2014, 1 commit
-
-
Submitted by Mike Snitzer
Simplify the pool_io_hints code that works to establish a max_sectors value that is a power-of-2 factor of the thin-pool's blocksize. The biggest associated improvement is that the DM thin-pool is no longer concerning itself with the data device's max_hw_sectors when adjusting max_sectors. This fixes the relative fragility of the original "dm thin: adjust max_sectors_kb based on thinp blocksize" commit that only became apparent when testing was performed using a DM thin-pool on top of a virtio_blk device.
One proposed upstream patch detailed the problems inherent in virtio_blk: https://lkml.org/lkml/2014/11/20/611 So even though virtio_blk incorrectly set its max_hw_sectors, it actually helped make it clear that we need DM thinp to be tolerant of any future Linux driver that incorrectly sets max_hw_sectors.
We only need to be concerned with modifying the thin-pool device's max_sectors limit if it is smaller than the thin-pool's blocksize. In this case the value of max_sectors does become a limiting factor when upper layers (e.g. filesystems) construct their bios. But if the hardware can support IOs larger than the thin-pool's blocksize, the user is encouraged to adjust the thin-pool's data device's max_sectors accordingly -- doing so will enable the thin-pool to inherit the established user-defined max_sectors.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
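A stand-alone illustration of the stated policy (this is an interpretation for clarity, not the pool_io_hints() implementation): only intervene when max_sectors is smaller than the pool blocksize, and then fall back to a power-of-2 factor of that blocksize.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Illustration: adjust max_sectors only when it is smaller than the
 * thin-pool blocksize, replacing it with a power-of-2 factor of that
 * blocksize so bios built by upper layers still align to pool blocks.
 */
static uint32_t adjust_max_sectors(uint32_t max_sectors, uint32_t pool_block_sectors)
{
	uint32_t factor;

	if (max_sectors >= pool_block_sectors)
		return max_sectors;	/* blocksize is not the limiting factor */

	/* Largest power of two dividing the blocksize ... */
	factor = pool_block_sectors & -pool_block_sectors;

	/* ... halved until it fits under the existing limit. */
	while (factor > max_sectors)
		factor /= 2;

	return factor;
}

int main(void)
{
	/* 1MiB thin-pool blocks (2048 sectors), device max_sectors 1024:
	 * result is 1024, which divides 2048 evenly. */
	printf("%u\n", adjust_max_sectors(1024, 2048));
	/* 192KiB blocks (384 sectors), device max_sectors 200 -> 128. */
	printf("%u\n", adjust_max_sectors(200, 384));
	return 0;
}
```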
-
- 20 Nov 2014, 5 commits
-
-
Submitted by Mike Snitzer
Before this change it was expected that userspace would first suspend all active thin devices, reload/resize the thin-pool target, then resume all active thin devices. Now the thin-pool suspend/resume will trigger the suspend/resume of all active thins via appropriate calls to dm_internal_suspend and dm_internal_resume. Store the mapped_device for each thin device in struct thin_c to make these calls possible.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>
-
Submitted by Mike Snitzer
Rename dm_internal_{suspend,resume} to dm_internal_{suspend,resume}_fast -- dm-stats will continue using these methods to avoid all the extra suspend/resume logic that is not needed in order to quickly flush IO.
Introduce dm_internal_suspend_noflush() variant that actually calls the mapped_device's target callbacks -- otherwise target-specific hooks are avoided (e.g. dm-thin's thin_presuspend and thin_postsuspend). Common code between dm_internal_{suspend_noflush,resume} and dm_{suspend,resume} was factored out as __dm_{suspend,resume}.
Update dm_internal_{suspend_noflush,resume} to always take and release the mapped_device's suspend_lock. Also update dm_{suspend,resume} to be aware of potential for DM_INTERNAL_SUSPEND_FLAG to be set and respond accordingly by interruptibly waiting for the DM_INTERNAL_SUSPEND_FLAG to be cleared. Add lockdep annotation to dm_suspend() and dm_resume().
The existing DM_SUSPEND_FLAG remains unchanged. DM_INTERNAL_SUSPEND_FLAG is set by dm_internal_suspend_noflush() and cleared by dm_internal_resume(). Both DM_SUSPEND_FLAG and DM_INTERNAL_SUSPEND_FLAG may be set if a device was already suspended when dm_internal_suspend_noflush() was called -- this can be thought of as a "nested suspend". A "nested suspend" can occur with legacy userspace dm-thin code that might suspend all active thin volumes before suspending the pool for resize. But otherwise, in the normal dm-thin-pool suspend case moving forward: the thin-pool will have DM_SUSPEND_FLAG set and all active thins from that thin-pool will have DM_INTERNAL_SUSPEND_FLAG set.
Also add DM_INTERNAL_SUSPEND_FLAG to status report. This new DM_INTERNAL_SUSPEND_FLAG state is being reported to assist with debugging (e.g. 'dmsetup info' will report an internally suspended device accordingly).
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>
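A tiny stand-alone sketch of the "nested suspend" flag interplay described above (the flag names follow this commit message; everything else is illustrative):

```c
#include <stdio.h>

#define DM_SUSPEND_FLAG			(1 << 0)
#define DM_INTERNAL_SUSPEND_FLAG	(1 << 1)

struct fake_md {
	unsigned flags;
};

static void internal_suspend_noflush(struct fake_md *md)
{
	/* May stack on top of a user-initiated suspend: both bits set. */
	md->flags |= DM_INTERNAL_SUSPEND_FLAG;
}

static void internal_resume(struct fake_md *md)
{
	md->flags &= ~DM_INTERNAL_SUSPEND_FLAG;
	/* A pre-existing DM_SUSPEND_FLAG (nested suspend) is left alone. */
}

int main(void)
{
	struct fake_md md = { .flags = DM_SUSPEND_FLAG }; /* user suspended first */

	internal_suspend_noflush(&md);
	printf("nested: %d\n", md.flags == (DM_SUSPEND_FLAG | DM_INTERNAL_SUSPEND_FLAG));

	internal_resume(&md);
	printf("still user-suspended: %d\n", md.flags == DM_SUSPEND_FLAG);
	return 0;
}
```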
-
Submitted by Mike Snitzer
Otherwise IO could be issued to the pool while it is suspended. Care was taken to properly interlock between the thin and thin-pool targets when accessing the pool's 'suspended' flag. The thin_ctr will not add a new thin device to the pool's active_thins list if the pool is suspended.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>
-
Submitted by Mike Snitzer
The DM thin-pool target now must undo the changes performed during pool_presuspend(), so introduce a presuspend_undo hook in target_type.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>
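A generic sketch of the undo-hook pattern being introduced (plain C with made-up names, not the actual struct target_type or dm suspend path):

```c
/* Generic ops table with an optional "undo" partner for presuspend. */
struct target_ops {
	void (*presuspend)(void *ctx);
	void (*presuspend_undo)(void *ctx);	/* new hook */
	void (*postsuspend)(void *ctx);
};

/* If a later step of the suspend path fails, roll back presuspend work. */
static int do_suspend(const struct target_ops *ops, void *ctx,
		      int (*flush_io)(void *ctx))
{
	if (ops->presuspend)
		ops->presuspend(ctx);

	if (flush_io(ctx) < 0) {
		if (ops->presuspend_undo)
			ops->presuspend_undo(ctx);
		return -1;
	}

	if (ops->postsuspend)
		ops->postsuspend(ctx);
	return 0;
}
```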
-
Submitted by Mike Snitzer
No point checking if the device is suspended if the current target doesn't even implement .ioctl.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
- 13 Nov 2014, 3 commits
-
-
Submitted by Mike Snitzer
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Submitted by Mikulas Patocka
As long as struct thin_c is in the list, anyone can grab a reference to it. Consequently, we must wait for the reference count to drop to zero *after* we remove the structure from the list, not before.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
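A stand-alone C sketch of the ordering argument (hypothetical types; not the dm-thin code): unlink first so no new reference can be taken, and only then wait for the count to drain.

```c
#include <stdatomic.h>
#include <pthread.h>
#include <sched.h>
#include <stdlib.h>

struct thin_like {
	struct thin_like *next;		/* singly linked list, for brevity */
	atomic_int refcount;		/* held by anyone who found it on the list */
};

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct thin_like *head;

static void destroy(struct thin_like *t)
{
	/* 1. Unlink, so no new references can be taken. */
	pthread_mutex_lock(&list_lock);
	for (struct thin_like **p = &head; *p; p = &(*p)->next) {
		if (*p == t) {
			*p = t->next;
			break;
		}
	}
	pthread_mutex_unlock(&list_lock);

	/* 2. Only now wait for existing references to drop. Waiting before
	 *    unlinking would let a new holder appear after the wait. */
	while (atomic_load(&t->refcount) > 0)
		sched_yield();

	free(t);
}
```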
-
Submitted by Joe Thornber
Loading and saving millions of block mappings takes time. We may as well explain what's going on, and encourage people to use a larger cache block size.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
- 11 Nov 2014, 23 commits
-
-
Submitted by Joe Thornber
Safely allow the discard blocksize to be larger than the cache blocksize by using the bio prison's range locking support. This also improves discard performance considerably because larger discards are issued to the dm-cache device. The discard blocksize was always intended to be greater than the cache blocksize. But until now it wasn't implemented safely.
Also, by safely restoring the ability to have a discard blocksize larger than the cache blocksize we're able to significantly reduce the memory used for the cache's discard bitset. Before, with a small discard blocksize, the discard bitset could get quite large because its size is a function of the discard blocksize and the origin device's size. For example, previously, using a 32KB cache blocksize with a 40TB origin resulted in 1280MB of in-core memory use for the discard bitset! Now, the discard blocksize is scaled up accordingly to ensure the discard bitset is capped at 2**14 bits, or 16KB.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
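The capping arithmetic, as a stand-alone sketch (the 2**14 cap is from the commit message; the function name and exact rounding are illustrative):

```c
#include <stdint.h>
#include <stdio.h>

#define MAX_DISCARD_BLOCKS (1u << 14)	/* cap the bitset at 16K bits */

/* Scale the discard block size up from the cache block size until the
 * number of discard blocks covering the origin fits under the cap. */
static uint64_t pick_discard_block_sectors(uint64_t cache_block_sectors,
					   uint64_t origin_sectors)
{
	uint64_t dbs = cache_block_sectors;

	while (origin_sectors / dbs > MAX_DISCARD_BLOCKS)
		dbs *= 2;

	return dbs;
}

int main(void)
{
	/* Example from the commit message: 32KB cache blocks (64 sectors)
	 * on a 40TB origin. */
	uint64_t origin = 40ULL * 1024 * 1024 * 1024 * 1024 / 512;
	uint64_t dbs = pick_discard_block_sectors(64, origin);

	printf("discard block: %llu sectors, %llu blocks\n",
	       (unsigned long long)dbs,
	       (unsigned long long)(origin / dbs));
	return 0;
}
```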
-
Submitted by Joe Thornber
This reverts commit d132cc6d because we actually do want to allow the discard blocksize to be larger than the cache blocksize. Further dm-cache discard changes will make this possible.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Submitted by Joe Thornber
This reverts commit 64ab346a because we actually do want to allow the discard blocksize to be larger than the cache blocksize. Further dm-cache discard changes will make this possible.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Submitted by Joe Thornber
Ranges will be placed in the same cell if they overlap. Range locking is a prerequisite for more efficient multi-block discard support in both the cache and thin-provisioning targets.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Submitted by Mike Snitzer
Before, if the user wanted sequential IO to be promoted to the cache they'd have to set sequential_threshold to some nebulous large value. Now, the user may easily disable sequential IO detection (and sequential IO's implicit bypass of the cache) by setting sequential_threshold to 0.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Submitted by Joe Thornber
Rather than maintaining a separate promote_threshold variable that we periodically update, we now use the hit count of the oldest clean block. Also add a fudge factor to discourage demoting dirty blocks. In some tests this makes a sizeable difference, because the old code was too eager to demote blocks. For example, device-mapper-test-suite's git_extract_cache_quick test goes from taking 190 seconds to 142 (linear on spindle takes 250).
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
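A small sketch of the heuristic as described (names and the size of the fudge factor are illustrative, not the actual cache policy code):

```c
/*
 * Promotion threshold derived from the hit count of the oldest clean
 * cache block, plus a fudge to make demoting a dirty block less
 * attractive than demoting a clean one.
 */
struct candidate {
	unsigned hit_count;
	int dirty;
};

static unsigned promote_threshold(unsigned oldest_clean_hits,
				  const struct candidate *victim)
{
	unsigned threshold = oldest_clean_hits;

	if (victim->dirty)
		threshold += 4;	/* illustrative fudge factor */

	return threshold;
}

static int should_promote(unsigned origin_block_hits,
			  unsigned oldest_clean_hits,
			  const struct candidate *victim)
{
	return origin_block_hits >= promote_threshold(oldest_clean_hits, victim);
}
```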
-
Submitted by Hannes Reinecke
When creating new devices dm_sync_table() calls synchronize_rcu_expedited(), causing _all_ pending RCU pointers to be flushed. This causes a latency overhead that is especially noticeable when creating lots of devices. And all of this is pointless as there are no old maps to be disconnected, and hence no stale pointers which would need to be cleared up.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Submitted by Pranith Kumar
Annotate the map field with __rcu since this is an RCU pointer, which is checked by sparse.
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Submitted by Pranith Kumar
The map field in 'struct mapped_device' is an RCU pointer. Use rcu_dereference() while accessing it.
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Submitted by Mike Snitzer
Also refactor some other bio_list erroring helpers.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Submitted by Mike Snitzer
Eliminate redundant should_error_unserviceable_bio check and error loop.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Submitted by Joe Thornber
Sort the cells in logical block order before processing each cell in process_thin_deferred_cells(). This significantly improves the on-disk layout on rotational storage, thereby improving read performance.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
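A stand-alone sketch of the sort step (hypothetical types; the idea is simply to visit deferred cells in ascending block order):

```c
#include <stdint.h>
#include <stdlib.h>

struct cell {
	uint64_t block;		/* logical block the cell locks */
	/* ... deferred bios, holders, etc. ... */
};

static int cmp_cells(const void *a, const void *b)
{
	const struct cell *ca = *(const struct cell * const *)a;
	const struct cell *cb = *(const struct cell * const *)b;

	if (ca->block < cb->block)
		return -1;
	return ca->block > cb->block;
}

/* Process cells in ascending block order so the resulting IO is laid
 * out sequentially on rotational storage. */
static void process_cells_sorted(struct cell **cells, size_t nr,
				 void (*process)(struct cell *))
{
	qsort(cells, nr, sizeof(*cells), cmp_cells);
	for (size_t i = 0; i < nr; i++)
		process(cells[i]);
}
```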
-
Submitted by Joe Thornber
This use of direct submission in process_shared_bio() reduces latency for submitting bios in the shared cell by avoiding adding those bios to the deferred list and waiting for the next iteration of the worker.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Submitted by Joe Thornber
This use of direct submission in process_prepared_mapping() reduces latency for submitting bios in a cell by avoiding adding those bios to the deferred list and waiting for the next iteration of the worker. But this direct submission exposes the potential for a race between releasing a cell and incrementing the deferred set. Fix this by introducing dm_cell_visit_release() and refactoring inc_remap_and_issue_cell() accordingly.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Submitted by Joe Thornber
This avoids dropping the cell, so increases the probability that other bios will collect within the cell, rather than being passed individually to the worker. Also add required process_cell and process_discard_cell error handling wrappers and set associated pool-mode function pointers accordingly.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Submitted by Mike Snitzer
Purely cleanup of duplicated code, no functional change.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Submitted by Joe Thornber
When processing a discard bio, if the block is already quiesced do the discard immediately rather than adding the mapping to a list for the next iteration of the worker thread. Discarding a fully provisioned 100G thin volume with 64k block size goes from 860s to 95s with this change. Clearly there's something wrong with the worker architecture; more investigation is needed.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Submitted by Mike Snitzer
Introduce thin_merge so that any additional constraints from the data volume may be taken into account when determining the maximum number of sectors that can be issued relative to the specified logical offset. This is particularly important if/when the data volume is layered on top of a more sophisticated device (e.g. dm-raid or some other DM target).
Reviewed-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Submitted by Mike Snitzer
These code changes do not introduce a functional change. But bio_add_page() will never attempt to build up a bio larger than queue_max_sectors(). Similarly, bio_get_nr_vecs() is also bound by queue_max_sectors(). Therefore, there is no point in allowing dm_merge_bvec() to answer "how many sectors can a bio have at this offset?" with anything larger than queue_max_sectors(). Using queue_max_sectors() rather than BIO_MAX_SECTORS serves to more accurately convey the limits that are being imposed.
Also, use unlikely() to clarify the fact that the defensive code in dm_merge_bvec() relative to max_size going negative shouldn't ever happen -- if it does happen there is a bug in the block layer for requesting larger than dm_merge_bvec()'s initial response for a given offset. Also, update a comment in dm_merge_bvec() relative to max_hw_sectors_kb. And fix empty newline whitespace.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Submitted by Mike Snitzer
Allow filesystems to submit bios that are a factor of the thinp blocksize, improving dm-thinp efficiency (particularly when the data volume is RAID). Also set io_min to max_sectors_kb if it is a factor of the thinp blocksize.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
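Stated as a stand-alone sketch (illustrative only, not the actual pool_io_hints() code):

```c
#include <stdint.h>

struct io_hints {
	uint32_t io_min;	/* sectors */
	uint32_t io_opt;	/* sectors */
};

/* Advertise the thin-pool blocksize as the optimal IO size, and bump
 * io_min up to max_sectors when that happens to divide the blocksize. */
static void thinp_io_hints(struct io_hints *h, uint32_t pool_block_sectors,
			   uint32_t max_sectors)
{
	h->io_opt = pool_block_sectors;

	if (max_sectors && pool_block_sectors % max_sectors == 0)
		h->io_min = max_sectors;
}
```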
-
Submitted by Joe Thornber
Throttle IO based on the time it's taking the worker to do one loop. There were reports of hung task timeouts occurring and it was observed that the excessively long avgqu-sz (as reported by iostat) was contributing to these hung tasks.
Throttling definitely helps dm-thinp perform better under heavy IO load (without being detrimental by being overzealous). It reduces avgqu-sz drastically, e.g.: from 60K to ~6K, and even as low as 150 once metadata is cached by bufio, when dirty_ratio=5, dirty_background_ratio=2. And avgqu-sz stays at or below 30K even with dirty_ratio=20, dirty_background_ratio=10.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
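A user-space sketch of the throttling idea: time one pass of the worker and back off when it runs long. The real dm throttle mechanism differs in detail, and the budget value here is made up.

```c
#include <time.h>
#include <unistd.h>

/* If one worker loop takes longer than this, start throttling. */
#define LOOP_BUDGET_MS	100

static long elapsed_ms(const struct timespec *a, const struct timespec *b)
{
	return (b->tv_sec - a->tv_sec) * 1000 +
	       (b->tv_nsec - a->tv_nsec) / 1000000;
}

static void worker_loop_once(void (*do_work)(void))
{
	struct timespec start, end;
	long ms;

	clock_gettime(CLOCK_MONOTONIC, &start);
	do_work();
	clock_gettime(CLOCK_MONOTONIC, &end);

	ms = elapsed_ms(&start, &end);
	if (ms > LOOP_BUDGET_MS) {
		/* Loop is falling behind: back off before accepting more
		 * deferred IO, which keeps queue depth (avgqu-sz) bounded. */
		usleep((ms - LOOP_BUDGET_MS) * 1000);
	}
}
```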
-
Submitted by Joe Thornber
Prefetch metadata at the start of the worker thread and then again every 128th bio processed from the deferred list.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
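A kernel-style sketch of that cadence, using the dm_tm_issue_prefetches() interface introduced in the next commit (the surrounding function, its callers, and the exact header path are assumptions, not the dm-thin code):

```c
#include <linux/bio.h>
#include "persistent-data/dm-transaction-manager.h"

/* Illustrative only: prefetch up front, then again every 128 bios. */
static void process_deferred_bios_sketch(struct dm_transaction_manager *tm,
					 struct bio_list *bios,
					 void (*issue)(struct bio *bio))
{
	struct bio *bio;
	unsigned count = 0;

	dm_tm_issue_prefetches(tm);

	while ((bio = bio_list_pop(bios))) {
		issue(bio);
		if (!(++count & 127))
			dm_tm_issue_prefetches(tm);
	}
}
```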
-
Submitted by Joe Thornber
Introduce the dm_tm_issue_prefetches interface. If you're using a non-blocking clone the tm will build up a list of requested blocks that weren't in core. dm_tm_issue_prefetches will request those blocks to be prefetched.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-