- 02 Dec 2014, 7 commits
-
-
Submitted by Joe Thornber
If the incoming bio is a WRITE and completely covers a block then we don't bother to do any copying for a promotion operation. Once this is done the cache block and origin block will be different, so we need to set the cache block to 'dirty'.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
-
Submitted by Joe Thornber
Overwrite causes the cache block and origin block to diverge, which is only allowed in writeback mode.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
-
Submitted by Joe Thornber
Otherwise the cache blocks may span two discard blocks, which we don't handle when doing the discard lookup.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Submitted by Joe Thornber
It is more correct to hold the cell before checking the discard state. These flags are only used as hints to the policy, so this change will have a negligible effect.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Submitted by Joe Thornber
The discard block size can change if the origin changes size or if an old DM cache is upgraded from using a discard block size that was equal to the cache block size. To fix this, an extent of discarded blocks is established for the purpose of translating the old discard block size to the new in-core discard block size and set bits. The old (potentially huge) discard bitset is left on-disk until it is re-written using the new in-core information on the next successful DM cache shutdown.
Fixes: 7ae34e77 ("dm cache: improve discard support")
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
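As an aside, a stand-alone sketch of the translation step described above (hypothetical names, user-space C rather than the dm-cache code): each set bit recorded at the old discard block size is re-expressed as a sector extent and re-marked at the new in-core discard block size.

```c
#include <stdint.h>
#include <string.h>

static void set_bit64(uint8_t *bits, uint64_t i)  { bits[i / 8] |= 1u << (i % 8); }
static int  test_bit64(const uint8_t *bits, uint64_t i) { return bits[i / 8] & (1u << (i % 8)); }

/*
 * Re-express a discard bitset recorded at one block size as a bitset at
 * another block size, going through the common currency of sectors.
 * Note: this conservatively marks any new block that overlaps an old
 * discarded extent; a real implementation must decide how partially
 * covered blocks are treated.
 */
static void translate_discard_bits(const uint8_t *old_bits, uint64_t old_nr_blocks,
				   uint64_t old_block_sectors,
				   uint8_t *new_bits, uint64_t new_nr_blocks,
				   uint64_t new_block_sectors)
{
	memset(new_bits, 0, (new_nr_blocks + 7) / 8);

	for (uint64_t b = 0; b < old_nr_blocks; b++) {
		if (!test_bit64(old_bits, b))
			continue;

		uint64_t begin = b * old_block_sectors;	/* old extent ... */
		uint64_t end   = begin + old_block_sectors;

		for (uint64_t s = begin / new_block_sectors;	/* ... re-marked */
		     s * new_block_sectors < end && s < new_nr_blocks;
		     s++)
			set_bit64(new_bits, s);
	}
}
```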
-
Submitted by Joe Thornber
Commit 7ae34e77 ("dm cache: improve discard support") needed to also:
- discontinue having DM core split the discard bios on cache block boundaries
- calculate the cache's discard_nr_blocks relative to the determined discard_block_size rather than using oblock_to_dblock()
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Submitted by Joe Thornber
This could've been quite bad (to return success but not update the new root to point at the old) but in practice the only known consumer of the dm array code is the DM cache target. And the DM cache target passes in the same old root to array_resize() anyway.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
- 24 Nov 2014, 1 commit
-
-
Submitted by Eric Dumazet
rcu_dereference() should be used in sections protected by rcu_read_lock. For writers holding some kind of mutex or lock, rcu_dereference_protected() is the way to go, adding explicit lockdep bits. In __unbind(), we are the last user of this mapped device, so we can use the constant '1' instead of a lockdep_is_held() expression; this is not consistent with other uses of rcu_dereference_protected(), which use the md->suspend_lock mutex.
Reported-by: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 33423974 ("dm: Use rcu_dereference() for accessing rcu pointer")
Cc: Pranith Kumar <bobby.prani@gmail.com>
[snitzer: allow lines longer than 80 columns, refine subject]
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
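For context, a kernel-style sketch of the two access patterns contrasted above, using stand-in struct names rather than the actual dm code (the RCU and lockdep calls themselves are the real APIs):

```c
#include <linux/rcupdate.h>
#include <linux/mutex.h>

struct example_table;

struct example_dev {
	struct mutex lock;
	struct example_table __rcu *map;	/* RCU-protected pointer */
};

/* Reader: must be inside an RCU read-side critical section. */
static void reader(struct example_dev *d)
{
	struct example_table *t;

	rcu_read_lock();
	t = rcu_dereference(d->map);
	/* ... use t ... */
	rcu_read_unlock();
}

/* Writer: holds d->lock, so document that with lockdep. */
static struct example_table *writer_get(struct example_dev *d)
{
	return rcu_dereference_protected(d->map, lockdep_is_held(&d->lock));
}

/*
 * Teardown: if we are provably the last user (as in __unbind()), the
 * constant '1' satisfies the check, at the cost of losing the lockdep
 * documentation the commit message mentions.
 */
static struct example_table *teardown_get(struct example_dev *d)
{
	return rcu_dereference_protected(d->map, 1);
}
```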
-
- 22 Nov 2014, 1 commit
-
-
Submitted by Mike Snitzer
Simplify the pool_io_hints code that works to establish a max_sectors value that is a power-of-2 factor of the thin-pool's blocksize. The biggest associated improvement is that the DM thin-pool is no longer concerning itself with the data device's max_hw_sectors when adjusting max_sectors. This fixes the relative fragility of the original "dm thin: adjust max_sectors_kb based on thinp blocksize" commit that only became apparent when testing was performed using a DM thin-pool on top of a virtio_blk device.
One proposed upstream patch detailed the problems inherent in virtio_blk: https://lkml.org/lkml/2014/11/20/611 So even though virtio_blk incorrectly set its max_hw_sectors, it actually helped make it clear that we need DM thinp to be tolerant of any future Linux driver that incorrectly sets max_hw_sectors.
We only need to be concerned with modifying the thin-pool device's max_sectors limit if it is smaller than the thin-pool's blocksize. In this case the value of max_sectors does become a limiting factor when upper layers (e.g. filesystems) construct their bios. But if the hardware can support IOs larger than the thin-pool's blocksize, the user is encouraged to adjust the thin-pool's data device's max_sectors accordingly -- doing so will enable the thin-pool to inherit the established user-defined max_sectors.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
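A stand-alone illustration of the stated policy (this is an interpretation for clarity, not the pool_io_hints() implementation): only intervene when max_sectors is smaller than the pool blocksize, and then fall back to a power-of-2 factor of that blocksize.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Illustration: adjust max_sectors only when it is smaller than the
 * thin-pool blocksize, replacing it with a power-of-2 factor of that
 * blocksize so bios built by upper layers still align to pool blocks.
 */
static uint32_t adjust_max_sectors(uint32_t max_sectors, uint32_t pool_block_sectors)
{
	uint32_t factor;

	if (max_sectors >= pool_block_sectors)
		return max_sectors;	/* blocksize is not the limiting factor */

	/* Largest power of two dividing the blocksize ... */
	factor = pool_block_sectors & -pool_block_sectors;

	/* ... halved until it fits under the existing limit. */
	while (factor > max_sectors)
		factor /= 2;

	return factor;
}

int main(void)
{
	/* 1MiB thin-pool blocks (2048 sectors), device max_sectors 1024:
	 * result is 1024, which divides 2048 evenly. */
	printf("%u\n", adjust_max_sectors(1024, 2048));
	/* 192KiB blocks (384 sectors), device max_sectors 200 -> 128. */
	printf("%u\n", adjust_max_sectors(200, 384));
	return 0;
}
```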
-
- 20 Nov 2014, 5 commits
-
-
Submitted by Mike Snitzer
Before this change it was expected that userspace would first suspend all active thin devices, reload/resize the thin-pool target, then resume all active thin devices. Now the thin-pool suspend/resume will trigger the suspend/resume of all active thins via appropriate calls to dm_internal_suspend and dm_internal_resume. Store the mapped_device for each thin device in struct thin_c to make these calls possible.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>
-
Submitted by Mike Snitzer
Rename dm_internal_{suspend,resume} to dm_internal_{suspend,resume}_fast -- dm-stats will continue using these methods to avoid all the extra suspend/resume logic that is not needed in order to quickly flush IO.
Introduce dm_internal_suspend_noflush() variant that actually calls the mapped_device's target callbacks -- otherwise target-specific hooks are avoided (e.g. dm-thin's thin_presuspend and thin_postsuspend). Common code between dm_internal_{suspend_noflush,resume} and dm_{suspend,resume} was factored out as __dm_{suspend,resume}.
Update dm_internal_{suspend_noflush,resume} to always take and release the mapped_device's suspend_lock. Also update dm_{suspend,resume} to be aware of potential for DM_INTERNAL_SUSPEND_FLAG to be set and respond accordingly by interruptibly waiting for the DM_INTERNAL_SUSPEND_FLAG to be cleared. Add lockdep annotation to dm_suspend() and dm_resume().
The existing DM_SUSPEND_FLAG remains unchanged. DM_INTERNAL_SUSPEND_FLAG is set by dm_internal_suspend_noflush() and cleared by dm_internal_resume(). Both DM_SUSPEND_FLAG and DM_INTERNAL_SUSPEND_FLAG may be set if a device was already suspended when dm_internal_suspend_noflush() was called -- this can be thought of as a "nested suspend". A "nested suspend" can occur with legacy userspace dm-thin code that might suspend all active thin volumes before suspending the pool for resize. But otherwise, in the normal dm-thin-pool suspend case moving forward: the thin-pool will have DM_SUSPEND_FLAG set and all active thins from that thin-pool will have DM_INTERNAL_SUSPEND_FLAG set.
Also add DM_INTERNAL_SUSPEND_FLAG to status report. This new DM_INTERNAL_SUSPEND_FLAG state is being reported to assist with debugging (e.g. 'dmsetup info' will report an internally suspended device accordingly).
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>
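A tiny stand-alone sketch of the "nested suspend" flag interplay described above (the flag names follow this commit message; everything else is illustrative):

```c
#include <stdio.h>

#define DM_SUSPEND_FLAG			(1 << 0)
#define DM_INTERNAL_SUSPEND_FLAG	(1 << 1)

struct fake_md {
	unsigned flags;
};

static void internal_suspend_noflush(struct fake_md *md)
{
	/* May stack on top of a user-initiated suspend: both bits set. */
	md->flags |= DM_INTERNAL_SUSPEND_FLAG;
}

static void internal_resume(struct fake_md *md)
{
	md->flags &= ~DM_INTERNAL_SUSPEND_FLAG;
	/* A pre-existing DM_SUSPEND_FLAG (nested suspend) is left alone. */
}

int main(void)
{
	struct fake_md md = { .flags = DM_SUSPEND_FLAG }; /* user suspended first */

	internal_suspend_noflush(&md);
	printf("nested: %d\n", md.flags == (DM_SUSPEND_FLAG | DM_INTERNAL_SUSPEND_FLAG));

	internal_resume(&md);
	printf("still user-suspended: %d\n", md.flags == DM_SUSPEND_FLAG);
	return 0;
}
```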
-
Submitted by Mike Snitzer
Otherwise IO could be issued to the pool while it is suspended. Care was taken to properly interlock between the thin and thin-pool targets when accessing the pool's 'suspended' flag. The thin_ctr will not add a new thin device to the pool's active_thins list if the pool is suspended.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>
-
Submitted by Mike Snitzer
The DM thin-pool target now must undo the changes performed during pool_presuspend(), so introduce a presuspend_undo hook in target_type.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>
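A generic sketch of the undo-hook pattern being introduced (plain C with made-up names, not the actual struct target_type or dm suspend path):

```c
/* Generic ops table with an optional "undo" partner for presuspend. */
struct target_ops {
	void (*presuspend)(void *ctx);
	void (*presuspend_undo)(void *ctx);	/* new hook */
	void (*postsuspend)(void *ctx);
};

/* If a later step of the suspend path fails, roll back presuspend work. */
static int do_suspend(const struct target_ops *ops, void *ctx,
		      int (*flush_io)(void *ctx))
{
	if (ops->presuspend)
		ops->presuspend(ctx);

	if (flush_io(ctx) < 0) {
		if (ops->presuspend_undo)
			ops->presuspend_undo(ctx);
		return -1;
	}

	if (ops->postsuspend)
		ops->postsuspend(ctx);
	return 0;
}
```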
-
Submitted by Mike Snitzer
No point checking if the device is suspended if the current target doesn't even implement .ioctl.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
- 13 Nov 2014, 3 commits
-
-
Submitted by Mike Snitzer
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Submitted by Mikulas Patocka
As long as struct thin_c is in the list, anyone can grab a reference to it. Consequently, we must wait for the reference count to drop to zero *after* we remove the structure from the list, not before.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
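A stand-alone C sketch of the ordering argument (hypothetical types; not the dm-thin code): unlink first so no new reference can be taken, and only then wait for the count to drain.

```c
#include <stdatomic.h>
#include <pthread.h>
#include <sched.h>
#include <stdlib.h>

struct thin_like {
	struct thin_like *next;		/* singly linked list, for brevity */
	atomic_int refcount;		/* held by anyone who found it on the list */
};

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct thin_like *head;

static void destroy(struct thin_like *t)
{
	/* 1. Unlink, so no new references can be taken. */
	pthread_mutex_lock(&list_lock);
	for (struct thin_like **p = &head; *p; p = &(*p)->next) {
		if (*p == t) {
			*p = t->next;
			break;
		}
	}
	pthread_mutex_unlock(&list_lock);

	/* 2. Only now wait for existing references to drop. Waiting before
	 *    unlinking would let a new holder appear after the wait. */
	while (atomic_load(&t->refcount) > 0)
		sched_yield();

	free(t);
}
```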
-
Submitted by Joe Thornber
Loading and saving millions of block mappings takes time. We may as well explain what's going on, and encourage people to use a larger cache block size.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
- 11 Nov 2014, 23 commits
-
-
Submitted by Joe Thornber
Safely allow the discard blocksize to be larger than the cache blocksize by using the bio prison's range locking support. This also improves discard performance considerably because larger discards are issued to the dm-cache device. The discard blocksize was always intended to be greater than the cache blocksize. But until now it wasn't implemented safely.
Also, by safely restoring the ability to have a discard blocksize larger than the cache blocksize we're able to significantly reduce the memory used for the cache's discard bitset. Before, with a small discard blocksize, the discard bitset could get quite large because its size is a function of the discard blocksize and the origin device's size. For example, previously, using a 32KB cache blocksize with a 40TB origin resulted in 1280MB of in-core memory use for the discard bitset! Now, the discard blocksize is scaled up accordingly to ensure the discard bitset is capped at 2**14 bits, or 16KB.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
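The capping arithmetic, as a stand-alone sketch (the 2**14 cap is from the commit message; the function name and exact rounding are illustrative):

```c
#include <stdint.h>
#include <stdio.h>

#define MAX_DISCARD_BLOCKS (1u << 14)	/* cap the bitset at 16K bits */

/* Scale the discard block size up from the cache block size until the
 * number of discard blocks covering the origin fits under the cap. */
static uint64_t pick_discard_block_sectors(uint64_t cache_block_sectors,
					   uint64_t origin_sectors)
{
	uint64_t dbs = cache_block_sectors;

	while (origin_sectors / dbs > MAX_DISCARD_BLOCKS)
		dbs *= 2;

	return dbs;
}

int main(void)
{
	/* Example from the commit message: 32KB cache blocks (64 sectors)
	 * on a 40TB origin. */
	uint64_t origin = 40ULL * 1024 * 1024 * 1024 * 1024 / 512;
	uint64_t dbs = pick_discard_block_sectors(64, origin);

	printf("discard block: %llu sectors, %llu blocks\n",
	       (unsigned long long)dbs,
	       (unsigned long long)(origin / dbs));
	return 0;
}
```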
-
Submitted by Joe Thornber
This reverts commit d132cc6d because we actually do want to allow the discard blocksize to be larger than the cache blocksize. Further dm-cache discard changes will make this possible.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Submitted by Joe Thornber
This reverts commit 64ab346a because we actually do want to allow the discard blocksize to be larger than the cache blocksize. Further dm-cache discard changes will make this possible.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Submitted by Joe Thornber
Ranges will be placed in the same cell if they overlap. Range locking is a prerequisite for more efficient multi-block discard support in both the cache and thin-provisioning targets.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Submitted by Mike Snitzer
Before, if the user wanted sequential IO to be promoted to the cache they'd have to set sequential_threshold to some nebulous large value. Now, the user may easily disable sequential IO detection (and sequential IO's implicit bypass of the cache) by setting sequential_threshold to 0.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Submitted by Joe Thornber
Rather than maintaining a separate promote_threshold variable that we periodically update, we now use the hit count of the oldest clean block. Also add a fudge factor to discourage demoting dirty blocks. In some tests this makes a sizeable difference, because the old code was too eager to demote blocks. For example, device-mapper-test-suite's git_extract_cache_quick test goes from taking 190 seconds to 142 (linear on spindle takes 250).
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
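A small sketch of the heuristic as described (names and the size of the fudge factor are illustrative, not the actual cache policy code):

```c
/*
 * Promotion threshold derived from the hit count of the oldest clean
 * cache block, plus a fudge to make demoting a dirty block less
 * attractive than demoting a clean one.
 */
struct candidate {
	unsigned hit_count;
	int dirty;
};

static unsigned promote_threshold(unsigned oldest_clean_hits,
				  const struct candidate *victim)
{
	unsigned threshold = oldest_clean_hits;

	if (victim->dirty)
		threshold += 4;	/* illustrative fudge factor */

	return threshold;
}

static int should_promote(unsigned origin_block_hits,
			  unsigned oldest_clean_hits,
			  const struct candidate *victim)
{
	return origin_block_hits >= promote_threshold(oldest_clean_hits, victim);
}
```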
-
Submitted by Hannes Reinecke
When creating new devices dm_sync_table() calls synchronize_rcu_expedited(), causing _all_ pending RCU pointers to be flushed. This causes a latency overhead that is especially noticeable when creating lots of devices. And all of this is pointless as there are no old maps to be disconnected, and hence no stale pointers which would need to be cleared up.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Submitted by Pranith Kumar
Annotate the map field with __rcu since this is an RCU pointer, which is checked by sparse.
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Submitted by Pranith Kumar
The map field in 'struct mapped_device' is an RCU pointer. Use rcu_dereference() while accessing it.
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Submitted by Mike Snitzer
Also refactor some other bio_list erroring helpers.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Submitted by Mike Snitzer
Eliminate redundant should_error_unserviceable_bio check and error loop.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Submitted by Joe Thornber
Sort the cells in logical block order before processing each cell in process_thin_deferred_cells(). This significantly improves the on-disk layout on rotational storage, thereby improving read performance.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
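A stand-alone sketch of the sort step (hypothetical types; the idea is simply to visit deferred cells in ascending block order):

```c
#include <stdint.h>
#include <stdlib.h>

struct cell {
	uint64_t block;		/* logical block the cell locks */
	/* ... deferred bios, holders, etc. ... */
};

static int cmp_cells(const void *a, const void *b)
{
	const struct cell *ca = *(const struct cell * const *)a;
	const struct cell *cb = *(const struct cell * const *)b;

	if (ca->block < cb->block)
		return -1;
	return ca->block > cb->block;
}

/* Process cells in ascending block order so the resulting IO is laid
 * out sequentially on rotational storage. */
static void process_cells_sorted(struct cell **cells, size_t nr,
				 void (*process)(struct cell *))
{
	qsort(cells, nr, sizeof(*cells), cmp_cells);
	for (size_t i = 0; i < nr; i++)
		process(cells[i]);
}
```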
-
Submitted by Joe Thornber
This use of direct submission in process_shared_bio() reduces latency for submitting bios in the shared cell by avoiding adding those bios to the deferred list and waiting for the next iteration of the worker.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Submitted by Joe Thornber
This use of direct submission in process_prepared_mapping() reduces latency for submitting bios in a cell by avoiding adding those bios to the deferred list and waiting for the next iteration of the worker. But this direct submission exposes the potential for a race between releasing a cell and incrementing the deferred set. Fix this by introducing dm_cell_visit_release() and refactoring inc_remap_and_issue_cell() accordingly.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Submitted by Joe Thornber
This avoids dropping the cell, so increases the probability that other bios will collect within the cell, rather than being passed individually to the worker. Also add required process_cell and process_discard_cell error handling wrappers and set associated pool-mode function pointers accordingly.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Submitted by Mike Snitzer
Purely cleanup of duplicated code, no functional change.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Submitted by Joe Thornber
When processing a discard bio, if the block is already quiesced do the discard immediately rather than adding the mapping to a list for the next iteration of the worker thread. Discarding a fully provisioned 100G thin volume with 64k block size goes from 860s to 95s with this change. Clearly there's something wrong with the worker architecture; more investigation is needed.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Submitted by Mike Snitzer
Introduce thin_merge so that any additional constraints from the data volume may be taken into account when determining the maximum number of sectors that can be issued relative to the specified logical offset. This is particularly important if/when the data volume is layered on top of a more sophisticated device (e.g. dm-raid or some other DM target).
Reviewed-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Submitted by Mike Snitzer
These code changes do not introduce a functional change. But bio_add_page() will never attempt to build up a bio larger than queue_max_sectors(). Similarly, bio_get_nr_vecs() is also bound by queue_max_sectors(). Therefore, there is no point in allowing dm_merge_bvec() to answer "how many sectors can a bio have at this offset?" with anything larger than queue_max_sectors(). Using queue_max_sectors() rather than BIO_MAX_SECTORS serves to more accurately convey the limits that are being imposed.
Also, use unlikely() to clarify the fact that the defensive code in dm_merge_bvec() relative to max_size going negative shouldn't ever happen -- if it does happen there is a bug in the block layer for requesting larger than dm_merge_bvec()'s initial response for a given offset. Also, update a comment in dm_merge_bvec() relative to max_hw_sectors_kb. And fix empty newline whitespace.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Submitted by Mike Snitzer
Allow filesystems to submit bios that are a factor of the thinp blocksize, improving dm-thinp efficiency (particularly when the data volume is RAID). Also set io_min to max_sectors_kb if it is a factor of the thinp blocksize.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
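Stated as a stand-alone sketch (illustrative only, not the actual pool_io_hints() code):

```c
#include <stdint.h>

struct io_hints {
	uint32_t io_min;	/* sectors */
	uint32_t io_opt;	/* sectors */
};

/* Advertise the thin-pool blocksize as the optimal IO size, and bump
 * io_min up to max_sectors when that happens to divide the blocksize. */
static void thinp_io_hints(struct io_hints *h, uint32_t pool_block_sectors,
			   uint32_t max_sectors)
{
	h->io_opt = pool_block_sectors;

	if (max_sectors && pool_block_sectors % max_sectors == 0)
		h->io_min = max_sectors;
}
```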
-
Submitted by Joe Thornber
Throttle IO based on the time it's taking the worker to do one loop. There were reports of hung task timeouts occurring and it was observed that the excessively long avgqu-sz (as reported by iostat) was contributing to these hung tasks.
Throttling definitely helps dm-thinp perform better under heavy IO load (without being detrimental by being overzealous). It reduces avgqu-sz drastically, e.g.: from 60K to ~6K, and even as low as 150 once metadata is cached by bufio, when dirty_ratio=5, dirty_background_ratio=2. And avgqu-sz stays at or below 30K even with dirty_ratio=20, dirty_background_ratio=10.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
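A user-space sketch of the throttling idea: time one pass of the worker and back off when it runs long. The real dm throttle mechanism differs in detail, and the budget value here is made up.

```c
#include <time.h>
#include <unistd.h>

/* If one worker loop takes longer than this, start throttling. */
#define LOOP_BUDGET_MS	100

static long elapsed_ms(const struct timespec *a, const struct timespec *b)
{
	return (b->tv_sec - a->tv_sec) * 1000 +
	       (b->tv_nsec - a->tv_nsec) / 1000000;
}

static void worker_loop_once(void (*do_work)(void))
{
	struct timespec start, end;
	long ms;

	clock_gettime(CLOCK_MONOTONIC, &start);
	do_work();
	clock_gettime(CLOCK_MONOTONIC, &end);

	ms = elapsed_ms(&start, &end);
	if (ms > LOOP_BUDGET_MS) {
		/* Loop is falling behind: back off before accepting more
		 * deferred IO, which keeps queue depth (avgqu-sz) bounded. */
		usleep((ms - LOOP_BUDGET_MS) * 1000);
	}
}
```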
-
Submitted by Joe Thornber
Prefetch metadata at the start of the worker thread and then again every 128th bio processed from the deferred list.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
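A kernel-style sketch of that cadence, using the dm_tm_issue_prefetches() interface introduced in the next commit (the surrounding function, its callers, and the exact header path are assumptions, not the dm-thin code):

```c
#include <linux/bio.h>
#include "persistent-data/dm-transaction-manager.h"

/* Illustrative only: prefetch up front, then again every 128 bios. */
static void process_deferred_bios_sketch(struct dm_transaction_manager *tm,
					 struct bio_list *bios,
					 void (*issue)(struct bio *bio))
{
	struct bio *bio;
	unsigned count = 0;

	dm_tm_issue_prefetches(tm);

	while ((bio = bio_list_pop(bios))) {
		issue(bio);
		if (!(++count & 127))
			dm_tm_issue_prefetches(tm);
	}
}
```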
-
Submitted by Joe Thornber
Introduce the dm_tm_issue_prefetches interface. If you're using a non-blocking clone the tm will build up a list of requested blocks that weren't in core. dm_tm_issue_prefetches will request those blocks to be prefetched.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-