1. 22 May 2015, 1 commit
    • block: remove management of bi_remaining when restoring original bi_end_io · 326e1dbb
      Committed by Mike Snitzer
      Commit c4cf5261 ("bio: skip atomic inc/dec of ->bi_remaining for
      non-chains") regressed all existing callers that followed this pattern:
       1) saving a bio's original bi_end_io
       2) wiring up an intermediate bi_end_io
       3) restoring the original bi_end_io from intermediate bi_end_io
       4) calling bio_endio() to execute the restored original bi_end_io
      
      The regression was due to BIO_CHAIN only ever being set when
      bio_inc_remaining() is called.  For the above pattern BIO_CHAIN isn't
      set until step 3 (step 2 would have needed to establish it).  As such,
      the first bio_endio(), in step 2 above, never decremented __bi_remaining
      before calling the intermediate bi_end_io -- leaving __bi_remaining at 1
      instead of 0.  The bio_inc_remaining() call in step 3 then raised it to
      2.  So when the second bio_endio() was called, in step 4 above, it
      should have invoked the original bi_end_io but didn't, because an extra
      reference was never dropped (the atomic operations having been optimized
      away since BIO_CHAIN wasn't set upfront).
      
      Fix this issue by removing the __bi_remaining management complexity for
      all callers that use the above pattern -- bio_chain() is the only
      interface that _needs_ to be concerned with __bi_remaining.  For the
      above pattern callers just expect the bi_end_io they set to get called!
      Remove bio_endio_nodec() and also remove all bio_inc_remaining() calls
      that aren't associated with the bio_chain() interface.
      
      Also, the bio_inc_remaining() interface has been moved local to bio.c.
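      For reference, a minimal sketch of the save/restore pattern these
      callers follow, written against the bio API of this era (where
      bio_endio() still takes an error argument); the my_* names are
      hypothetical and not code from this commit:

       struct my_io {
               bio_end_io_t *saved_end_io;     /* step 1: original bi_end_io */
               void *saved_private;
       };

       static void my_intermediate_end_io(struct bio *bio, int error)
       {
               struct my_io *io = bio->bi_private;

               /* step 3: restore the original completion routine */
               bio->bi_end_io = io->saved_end_io;
               bio->bi_private = io->saved_private;

               /*
                * step 4: a plain bio_endio() now runs the restored bi_end_io;
                * no bio_endio_nodec() or bio_inc_remaining() games required.
                */
               bio_endio(bio, error);
       }

       static void my_submit(struct my_io *io, struct bio *bio)
       {
               /* steps 1 and 2: hook an intermediate bi_end_io */
               io->saved_end_io = bio->bi_end_io;
               io->saved_private = bio->bi_private;
               bio->bi_end_io = my_intermediate_end_io;
               bio->bi_private = io;
               generic_make_request(bio);
       }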
      
      Fixes: c4cf5261 ("bio: skip atomic inc/dec of ->bi_remaining for non-chains")
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
  2. 06 May 2015, 1 commit
  3. 27 Feb 2015, 1 commit
    • dm thin: fix to consistently zero-fill reads to unprovisioned blocks · 5f027a3b
      Committed by Joe Thornber
      It was always intended that a read to an unprovisioned block will return
      zeroes regardless of whether the pool is in read-only or read-write
      mode.  thin_bio_map()'s handling of such reads was inconsistent when
      the pool is in read-only mode; it now properly zero-fills the bios it
      returns in response to unprovisioned block reads.

      Eliminate thin_bio_map()'s special read-only mode handling of -ENODATA
      and just allow the IO to be deferred to the worker, which results in
      pool->process_bio() handling the IO (process_bio() already properly
      zero-fills reads to unprovisioned blocks).
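      Roughly, the mapping path now treats -ENODATA like -EWOULDBLOCK and
      defers to the worker.  A sketch of the relevant switch cases (helper
      names as found in dm-thin.c of this era, but treat this as illustrative
      rather than the verbatim diff):

       case -ENODATA:          /* read of an unprovisioned block */
       case -EWOULDBLOCK:
               /*
                * Defer to the worker; pool->process_bio() zero-fills reads
                * to unprovisioned blocks regardless of pool mode.
                */
               thin_defer_cell(tc, virt_cell);
               return DM_MAPIO_SUBMITTED;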
      Reported-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
  4. 10 Feb 2015, 1 commit
  5. 28 Jan 2015, 1 commit
  6. 18 Dec 2014, 3 commits
    • dm thin: fix crash by initializing thin device's refcount and completion earlier · 2b94e896
      Committed by Marc Dionne
      Commit 80e96c54 ("dm thin: do not allow thin device activation
      while pool is suspended") delayed the initialization of a new thin
      device's refcount and completion until after this new thin was added
      to the pool's active_thins list and the pool lock was released.  This
      opened a race with a worker thread that walks the list and calls
      thin_get()/thin_put(): the worker could see the refcount hit 0 and call
      complete() on the not-yet-initialized completion, freezing up the
      system and giving the oops below:
      
       kernel: BUG: unable to handle kernel NULL pointer dereference at           (null)
       kernel: IP: [<ffffffff810d360b>] __wake_up_common+0x2b/0x90
      
       kernel: Call Trace:
       kernel: [<ffffffff810d3683>] __wake_up_locked+0x13/0x20
       kernel: [<ffffffff810d3dc7>] complete+0x37/0x50
       kernel: [<ffffffffa0595c50>] thin_put+0x20/0x30 [dm_thin_pool]
       kernel: [<ffffffffa059aab7>] do_worker+0x667/0x870 [dm_thin_pool]
       kernel: [<ffffffff816a8a4c>] ? __schedule+0x3ac/0x9a0
       kernel: [<ffffffff810b1aef>] process_one_work+0x14f/0x400
       kernel: [<ffffffff810b206b>] worker_thread+0x6b/0x490
       kernel: [<ffffffff810b2000>] ? rescuer_thread+0x260/0x260
       kernel: [<ffffffff810b6a7b>] kthread+0xdb/0x100
       kernel: [<ffffffff810b69a0>] ? kthread_create_on_node+0x170/0x170
       kernel: [<ffffffff816ad7ec>] ret_from_fork+0x7c/0xb0
       kernel: [<ffffffff810b69a0>] ? kthread_create_on_node+0x170/0x170
      
      Set the thin device's initial refcount and initialize the completion
      before adding it to the pool's active_thins list in thin_ctr().
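      In thin_ctr() the resulting order is roughly as follows (a sketch; the
      refcount/completion field names are those introduced by commit
      b10ebd34):

       /* initialize before the thin device is visible to the worker... */
       atomic_set(&tc->refcount, 1);
       init_completion(&tc->can_destroy);

       /* ...and only then publish it on the pool's active_thins list */
       spin_lock_irqsave(&tc->pool->lock, flags);
       list_add_tail_rcu(&tc->list, &tc->pool->active_thins);
       spin_unlock_irqrestore(&tc->pool->lock, flags);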
      Signed-off-by: Marc Dionne <marc.dionne@your-file-system.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm thin: fix missing out-of-data-space to write mode transition if blocks are released · 2c43fd26
      Committed by Joe Thornber
      Discard bios and thin device deletion have the potential to release data
      blocks.  If the thin-pool is in out-of-data-space mode, and blocks were
      released, transition the thin-pool back to full write mode.
      
      The correct time to do this is just after the thin-pool metadata commit.
      It cannot be done before the commit because the space maps will not
      allow immediate reuse of the data blocks in case there's a rollback
      following power failure.
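      The resulting check is along these lines (a sketch; the helper name and
      its placement right after commit() are illustrative, not guaranteed
      verbatim):

       static void check_for_space(struct pool *pool)
       {
               int r;
               dm_block_t nr_free;

               if (get_pool_mode(pool) != PM_OUT_OF_DATA_SPACE)
                       return;

               r = dm_pool_get_free_block_count(pool->pmd, &nr_free);
               if (r)
                       return;

               /* blocks were released (discard or thin delete): resume writes */
               if (nr_free)
                       set_pool_mode(pool, PM_WRITE);
       }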
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
    • dm thin: fix inability to discard blocks when in out-of-data-space mode · 45ec9bd0
      Committed by Joe Thornber
      When the pool was in PM_OUT_OF_DATA_SPACE mode its
      process_prepared_discard function pointer was incorrectly being set to
      process_prepared_discard_passdown rather than process_prepared_discard.

      This incorrect function pointer meant the discard was being passed down
      to the data device but was not updating the thin mapping.  As such, any
      discard issued in an attempt to reclaim blocks would not actually free
      data space.
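      In set_pool_mode()'s out-of-data-space case the assignment becomes,
      roughly:

       case PM_OUT_OF_DATA_SPACE:
               /*
                * Must be process_prepared_discard, not
                * process_prepared_discard_passdown: passdown alone sends the
                * discard to the data device but never updates the thin
                * mapping, so no data blocks are freed.
                */
               pool->process_prepared_discard = process_prepared_discard;
               break;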
      Reported-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
  7. 22 Nov 2014, 1 commit
    • dm thin: fix pool_io_hints to avoid looking at max_hw_sectors · d200c30e
      Committed by Mike Snitzer
      Simplify the pool_io_hints code that works to establish a max_sectors
      value that is a power-of-2 factor of the thin-pool's blocksize.  The
      biggest associated improvement is that the DM thin-pool is no longer
      concerning itself with the data device's max_hw_sectors when adjusting
      max_sectors.
      
      This fixes the relative fragility of the original "dm thin: adjust
      max_sectors_kb based on thinp blocksize" commit, which only became
      apparent when testing was performed using a DM thin-pool on top of a
      virtio_blk device.  One proposed upstream patch detailed the problems
      inherent in virtio_blk: https://lkml.org/lkml/2014/11/20/611

      So even though virtio_blk incorrectly set its max_hw_sectors, it
      actually helped make it clear that we need DM thinp to be tolerant of
      any future Linux driver that incorrectly sets max_hw_sectors.
      
      We only need to be concerned with modifying the thin-pool device's
      max_sectors limit if it is smaller than the thin-pool's blocksize.  In
      this case the value of max_sectors does become a limiting factor when
      upper layers (e.g. filesystems) construct their bios.  But if the
      hardware can support IOs larger than the thin-pool's blocksize the user
      is encouraged to adjust the thin-pool's data device's max_sectors
      accordingly -- doing so will enable the thin-pool to inherit the
      established user-defined max_sectors.
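      In outline, the simplified decision looks like this (illustrative only,
      not the verbatim pool_io_hints() code):

       /* only adjust max_sectors when it is smaller than the pool blocksize */
       if (limits->max_sectors < pool->sectors_per_block) {
               /* keep it a power of two so it evenly divides the blocksize */
               if (!is_power_of_2(limits->max_sectors))
                       limits->max_sectors =
                               rounddown_pow_of_two(limits->max_sectors);
       }
       /* note: limits->max_hw_sectors is deliberately no longer consulted */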
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  8. 20 Nov 2014, 2 commits
  9. 13 Nov 2014, 2 commits
  10. 11 Nov 2014, 14 commits
  11. 05 Nov 2014, 1 commit
  12. 02 Aug 2014, 3 commits
  13. 12 Jun 2014, 1 commit
    • dm thin: update discard_granularity to reflect the thin-pool blocksize · 09869de5
      Committed by Lukas Czerner
      DM thinp already checks whether the discard_granularity of the data
      device is a factor of the thin-pool block size.  But when using the
      dm-thin-pool's discard passdown support, DM thinp was not selecting the
      max of the underlying data device's discard_granularity and the
      thin-pool's block size.
      
      Update set_discard_limits() to set discard_granularity to the max of
      these values.  This enables blkdev_issue_discard() to properly align the
      discards that are sent to the DM thin device on a full block boundary.
      As such, each discard will now cover an entire DM thin-pool block and
      the block will be reclaimed.
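      The set_discard_limits() change amounts to taking the max of the two
      values; a sketch, with data_limits standing for the data device's
      queue_limits:

       limits->discard_granularity =
               max(data_limits->discard_granularity,
                   (unsigned)(pool->sectors_per_block << SECTOR_SHIFT));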
      Reported-by: Zdenek Kabelac <zkabelac@redhat.com>
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
  14. 04 Jun 2014, 2 commits
  15. 21 May 2014, 1 commit
    • dm thin: add 'no_space_timeout' dm-thin-pool module param · 80c57893
      Committed by Mike Snitzer
      Commit 85ad643b ("dm thin: add timeout to stop out-of-data-space mode
      holding IO forever") introduced a fixed 60 second timeout.  Users may
      want to either disable or modify this timeout.
      
      Allow the out-of-data-space timeout to be configured using the
      'no_space_timeout' dm-thin-pool module param.  Setting it to 0 will
      disable the timeout, resulting in IO being queued until more data space
      is added to the thin-pool.
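      The parameter itself is a standard module_param; roughly (sketch only,
      variable name and default are illustrative):

       static unsigned no_space_timeout_secs = 60;

       module_param_named(no_space_timeout, no_space_timeout_secs,
                          uint, S_IRUGO | S_IWUSR);
       MODULE_PARM_DESC(no_space_timeout,
                        "Out of data space queue IO timeout in seconds");

       /* when entering out-of-data-space mode: 0 means never time out */
       unsigned long timeout = no_space_timeout_secs * HZ;

       if (timeout)
               queue_delayed_work(pool->wq, &pool->no_space_timeout, timeout);

      At runtime the value should also be adjustable through the usual module
      parameter sysfs path, e.g. writing 0 to
      /sys/module/dm_thin_pool/parameters/no_space_timeout to disable the
      timeout.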
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org # 3.14+
  16. 15 May 2014, 2 commits
  17. 29 Apr 2014, 1 commit
  18. 08 Apr 2014, 2 commits
    • dm thin: fix rcu_read_lock being held in code that can sleep · b10ebd34
      Committed by Joe Thornber
      Commit c140e1c4 ("dm thin: use per thin device deferred bio lists")
      introduced the use of an rculist for all active thin devices.  The use
      of rcu_read_lock() in process_deferred_bios() can result in a BUG if a
      dm_bio_prison_cell must be allocated as a side-effect of bio_detain():
      
       BUG: sleeping function called from invalid context at mm/mempool.c:203
       in_atomic(): 1, irqs_disabled(): 0, pid: 6, name: kworker/u8:0
       3 locks held by kworker/u8:0/6:
         #0:  ("dm-" "thin"){.+.+..}, at: [<ffffffff8106be42>] process_one_work+0x192/0x550
         #1:  ((&pool->worker)){+.+...}, at: [<ffffffff8106be42>] process_one_work+0x192/0x550
         #2:  (rcu_read_lock){.+.+..}, at: [<ffffffff816360b5>] do_worker+0x5/0x4d0
      
      We can't process deferred bios with the rcu lock held, since
      dm_bio_prison_cell allocation may block if the bio-prison's cell mempool
      is exhausted.
      
      To fix (see the sketch after this list):
      
      - Introduce a refcount and completion field to each thin_c
      
      - Add thin_get/put methods for adjusting the refcount.  If the refcount
        hits zero then the completion is triggered.
      
      - Initialise refcount to 1 when creating thin_c
      
      - When iterating the active_thins list we thin_get() whilst the rcu
        lock is held.
      
      - After the rcu lock is dropped we process the deferred bios for that
        thin.
      
      - When destroying a thin_c we thin_put() and then wait for the
        completion -- to avoid a race between the worker thread iterating
        from that thin_c and destroying the thin_c.
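      A sketch of the resulting scheme (helper and field names follow the
      description above; treat this as illustrative rather than the exact
      code):

       static void thin_get(struct thin_c *tc)
       {
               atomic_inc(&tc->refcount);
       }

       static void thin_put(struct thin_c *tc)
       {
               if (atomic_dec_and_test(&tc->refcount))
                       complete(&tc->can_destroy);
       }

       static struct thin_c *get_first_thin(struct pool *pool)
       {
               struct thin_c *tc = NULL;

               /* take a reference under the rcu lock, then drop the lock */
               rcu_read_lock();
               if (!list_empty(&pool->active_thins)) {
                       tc = list_entry_rcu(pool->active_thins.next,
                                           struct thin_c, list);
                       thin_get(tc);
               }
               rcu_read_unlock();

               return tc;
       }

      The worker can then process a thin's deferred bios with no rcu lock
      held, dropping its reference with thin_put() when it moves on, while
      thin destruction does a final thin_put() and waits on can_destroy.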
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm thin: irqsave must always be used with the pool->lock spinlock · 5e3283e2
      Committed by Joe Thornber
      Commit c140e1c4 ("dm thin: use per thin device deferred bio lists")
      incorrectly stopped disabling irqs when taking the pool's spinlock.
      
      Irqs must be disabled when taking the pool's spinlock; otherwise a
      thread could take the lock with spin_lock(), then get interrupted to
      service thin_endio() in interrupt context, which would then deadlock
      when it tries to take the same lock with spin_lock_irqsave().
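      Illustrative only: every process-context acquisition of pool->lock
      needs the irq-saving variants, e.g.:

       unsigned long flags;

       spin_lock_irqsave(&pool->lock, flags);  /* not spin_lock(&pool->lock) */
       /* manipulate the deferred bio lists / prepared mappings here */
       spin_unlock_irqrestore(&pool->lock, flags);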
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>