提交 · 3510cb94ff7b04b016bd22bfee913e2c1c05c066 · openeuler / Kernel

11 12月, 2009 7 次提交

dm snapshot: rename exception functions · 3510cb94

由 Jon Brassow 提交于 12月 10, 2009

Rename exception functions.  Preparing to pull them out of
dm-snap.c for broader use.
Signed-off-by: NJonathan Brassow <jbrassow@redhat.com>
Reviewed-by: NMike Snitzer <snitzer@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

3510cb94

dm snapshot: rename exception_table to dm_exception_table · 191437a5

由 Jon Brassow 提交于 12月 10, 2009

Rename exception_table for broader use outside dm-snap.c
Signed-off-by: NJonathan Brassow <jbrassow@redhat.com>
Reviewed-by: NMike Snitzer <snitzer@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

191437a5

dm snapshot: rename dm_snap_exception to dm_exception · 1d4989c8

由 Jon Brassow 提交于 12月 10, 2009

The exception structure is not necessarily just a snapshot
element (especially after we pull it out of dm-snap.c).

Renaming appropriately.
Signed-off-by: NJonathan Brassow <jbrassow@redhat.com>
Reviewed-by: NMike Snitzer <snitzer@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

1d4989c8

dm snapshot: consolidate insert exception functions · d32a6ea6

由 Jon Brassow 提交于 12月 10, 2009

Consolidate the insert_*exception functions.  'insert_completed_exception'
already contains all the logic to handle 'insert_exception' (via
check for a hash_shift of 0), so remove redundant function.
Signed-off-by: NJonathan Brassow <jbrassow@redhat.com>
Reviewed-by: NMike Snitzer <snitzer@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

d32a6ea6

dm snapshot: abstract minimum_chunk_size fn · 7e201b35

由 Mikulas Patocka 提交于 12月 10, 2009

The origin needs to find minimum chunksize of all snapshots.  This logic is
moved to a separate function because it will be used at another place in
the snapshot merge patches.
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Reviewed-by: NMike Snitzer <snitzer@redhat.com>
Reviewed-by: NJonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

7e201b35

dm snapshot: cope with chunk size larger than origin · 8e87b9b8

由 Mikulas Patocka 提交于 12月 10, 2009

Under some special conditions the snapshot hash_size is calculated as zero.
This patch instead sets a minimum value of 64, the same as for the
pending exception table.

rounddown_pow_of_two(0) is an undefined operation (it expands to shift
by -1).  init_exception_table with an argument of 0 would fail with -ENOMEM.

The way to trigger the problem is to create a snapshot with a chunk size
that is larger than the origin device.

Cc: stable@kernel.org
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

8e87b9b8

dm snapshot: only take lock for statustype info not table · 94e76572

由 Mikulas Patocka 提交于 12月 10, 2009

Take snapshot lock only for STATUSTYPE_INFO, not STATUSTYPE_TABLE.

Commit 4c6fff44
(dm-snapshot-lock-snapshot-while-supplying-status.patch)
introduced this use of the lock, but userspace applications using
libdevmapper have been found to request STATUSTYPE_TABLE while the device
is suspended and the lock is already held, leading to deadlock.  Since
the lock is not necessary in this case, don't try to take it.

Cc: stable@kernel.org
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

94e76572

17 10月, 2009 5 次提交

dm snapshot: use unsigned integer chunk size · df96eee6

由 Mikulas Patocka 提交于 10月 16, 2009

Use unsigned integer chunk size.

Maximum chunk size is 512kB, there won't ever be need to use 4GB chunk size,
so the number can be 32-bit. This fixes compiler failure on 32-bit systems
with large block devices.

Cc: stable@kernel.org
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Reviewed-by: NJonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

df96eee6

dm snapshot: lock snapshot while supplying status · 4c6fff44

由 Mikulas Patocka 提交于 10月 16, 2009

This patch locks the snapshot when returning status.  It fixes a race
when it could return an invalid number of free chunks if someone
was simultaneously modifying it.

Cc: stable@kernel.org
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

4c6fff44

dm snapshot: require non zero chunk size by end of ctr · 3f2412dc

由 Mikulas Patocka 提交于 10月 16, 2009

If we are creating snapshot with memory-stored exception store, fail if
the user didn't specify chunk size. Zero chunk size would probably crash
a lot of places in the rest of snapshot code.

Cc: stable@kernel.org
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Reviewed-by: NJonathan Brassow <jbrassow@redhat.com>
Reviewed-by: NMike Snitzer <snitzer@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

3f2412dc

dm snapshot: free exception store on init failure · 034a186d

由 Jonathan Brassow 提交于 10月 16, 2009

While initializing the snapshot module, if we fail to register
the snapshot target then we must back-out the exception store
module initialization.

Cc: stable@kernel.org
Signed-off-by: NJonathan Brassow <jbrassow@redhat.com>
Reviewed-by: NMikulas Patocka <mpatocka@redhat.com>
Reviewed-by: NMike Snitzer <snitzer@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

034a186d

dm snapshot: sort by chunk size to fix race · 6d45d93e

由 Mikulas Patocka 提交于 10月 16, 2009

Avoid a race causing corruption when snapshots of the same origin have
different chunk sizes by sorting the internal list of snapshots by chunk
size, largest first.
  https://bugzilla.redhat.com/show_bug.cgi?id=182659

For example, let's have two snapshots with different chunk sizes. The
first snapshot (1) has small chunk size and the second snapshot (2) has
large chunk size.  Let's have chunks A, B, C in these snapshots:
snapshot1: ====A====   ====B====
snapshot2: ==========C==========

(Chunk size is a power of 2. Chunks are aligned.)

A write to the origin at a position within A and C comes along. It
triggers reallocation of A, then reallocation of C and links them
together using A as the 'primary' exception.

Then another write to the origin comes along at a position within B and
C.  It creates pending exception for B.  C already has a reallocation in
progress and it already has a primary exception (A), so nothing is done
to it: B and C are not linked.

If the reallocation of B finishes before the reallocation of C, because
there is no link with the pending exception for C it does not know to
wait for it and, the second write is dispatched to the origin and causes
data corruption in the chunk C in snapshot2.

To avoid this situation, we maintain snapshots sorted in descending
order of chunk size.  This leads to a guaranteed ordering on the links
between the pending exceptions and avoids the problem explained above -
both A and B now get linked to C.

Cc: stable@kernel.org
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

6d45d93e

05 9月, 2009 1 次提交

dm snapshot: implement iterate devices · 8811f46c

由 Mike Snitzer 提交于 9月 04, 2009

Implement the .iterate_devices for the origin and snapshot targets.
dm-snapshot's lack of .iterate_devices resulted in the inability to
properly establish queue_limits for both targets.

With 4K sector drives: an unfortunate side-effect of not establishing
proper limits in either targets' DM device was that IO to the devices
would fail even though both had been created without error.

Commit af4874e0 ("dm target:s introduce
iterate devices fn") in 2.6.31-rc1 should have implemented .iterate_devices
for dm-snap.c's origin and snapshot targets.
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

8811f46c

22 6月, 2009 1 次提交

dm snapshot: support barriers · 494b3ee7

由 Mikulas Patocka 提交于 6月 22, 2009

Flush support for dm-snapshot target.

This patch just forwards the flush request to either the origin or the snapshot
device.  (It doesn't flush exception store metadata.)
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

494b3ee7

15 4月, 2009 1 次提交

block: move bio list helpers into bio.h · 8f3d8ba2

由 Christoph Hellwig 提交于 4月 07, 2009

It's used by DM and MD and generally useful, so move the bio list
helpers into bio.h.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Acked-by: NAlasdair G Kergon <agk@redhat.com>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

8f3d8ba2

03 4月, 2009 13 次提交

dm snapshot: move status to exception store · 1e302a92

由 Jonathan Brassow 提交于 4月 02, 2009

Let the exception store types print out their status through
the new API, rather than having the snapshot code do it.

Adjust the buffer position to allow for the preceding DMEMIT in the
arguments to type->status().
Signed-off-by: NJonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

1e302a92

dm snapshot: move ctr parsing to exception store · fee1998e

由 Jonathan Brassow 提交于 4月 02, 2009

First step of having the exception stores parse their own arguments -
generalizing the interface.
Signed-off-by: NJonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

fee1998e

dm snapshot: use DMEMIT macro for status · 2e4a31df

由 Jonathan Brassow 提交于 4月 02, 2009

Use DMEMIT in place of snprintf.  This makes it easier later when
other modules are helping to populate our status output.
Signed-off-by: NJonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

2e4a31df

dm snapshot: remove dm_snap header · ccc45ea8

由 Jonathan Brassow 提交于 4月 02, 2009

Move some of the last bits from dm-snap.h into dm-snap.c where they
belong and remove dm-snap.h.
Signed-off-by: NJonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

ccc45ea8

dm snapshot: remove dm_snap header use · 71fab00a

由 Jonathan Brassow 提交于 4月 02, 2009

Move useful functions out of dm-snap.h and stop using dm-snap.h.
Signed-off-by: NJonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

71fab00a

dm exception store: move cow pointer · 49beb2b8

由 Jonathan Brassow 提交于 4月 02, 2009

Move COW device from snapshot to exception store.
Signed-off-by: NJonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

49beb2b8

dm exception store: move chunk_fields · d0216849

由 Jonathan Brassow 提交于 4月 02, 2009

Move chunk fields from snapshot to exception store.
Signed-off-by: NJonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

d0216849

dm exception store: move dm_target pointer · 0cea9c78

由 Jonathan Brassow 提交于 4月 02, 2009

Move target pointer from snapshot to exception store.
Signed-off-by: NJonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

0cea9c78

dm exception store: introduce registry · 493df71c

由 Jonathan Brassow 提交于 4月 02, 2009

Move exception stores into a registry.
Signed-off-by: NJonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

493df71c

dm exception store: separate type from instance · b2a11465

由 Jonathan Brassow 提交于 4月 02, 2009

Introduce struct dm_exception_store_type.
Signed-off-by: NJonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

b2a11465

dm snapshot: avoid having two exceptions for the same chunk · 35bf659b

由 Mikulas Patocka 提交于 4月 02, 2009

We need to check if the exception was completed after dropping the lock.

After regaining the lock, __find_pending_exception checks if the exception
was already placed into &s->pending hash.

But we don't check if the exception was already completed and placed into
&s->complete hash. If the process waiting in alloc_pending_exception was
delayed at this point because of a scheduling latency and the exception
was meanwhile completed, we'd miss that and allocate another pending
exception for already completed chunk.

It would lead to a situation where two records for the same chunk exist
and potential data corruption because multiple snapshot I/Os to the
affected chunk could be redirected to different locations in the
snapshot.

Cc: stable@kernel.org
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

35bf659b

dm snapshot: avoid dropping lock in __find_pending_exception · c6621392

由 Mikulas Patocka 提交于 4月 02, 2009

It is uncommon and bug-prone to drop a lock in a function that is called with
the lock held, so this is moved to the caller.

Cc: stable@kernel.org
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

c6621392

dm snapshot: refactor __find_pending_exception · 2913808e

由 Mikulas Patocka 提交于 4月 02, 2009

Move looking-up of a pending exception from __find_pending_exception to another
function.

Cc: stable@kernel.org
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

2913808e

06 1月, 2009 5 次提交

dm snapshot: extend exception store functions · a159c1ac

由 Jonathan Brassow 提交于 1月 06, 2009

Supply dm_add_exception as a callback to the read_metadata function.
Add a status function ready for a later patch and name the functions
consistently.
Signed-off-by: NJonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

a159c1ac

dm snapshot: split out exception store implementations · 4db6bfe0

由 Alasdair G Kergon 提交于 1月 06, 2009

Move the existing snapshot exception store implementations out into
separate files.  Later patches will place these behind a new
interface in preparation for alternative implementations.
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

4db6bfe0

dm snapshot: separate out exception store interface · aea53d92

由 Jonathan Brassow 提交于 1月 06, 2009

Pull structures that bridge the gap between snapshot and
exception store out of dm-snap.h and put them in a new
.h file - dm-exception-store.h.  This file will define the
API for new exception stores.

Ultimately, dm-snap.h is unnecessary, since only dm-snap.c
should be using it.
Signed-off-by: NJonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

aea53d92

dm: consolidate target deregistration error handling · 10d3bd09

由 Mikulas Patocka 提交于 1月 06, 2009

Change dm_unregister_target to return void and use BUG() for error
reporting.

dm_unregister_target can only fail because of programming bug in the
target driver. It can't fail because of user's behavior or disk errors.

This patch changes unregister_target to return void and use BUG if
someone tries to unregister non-registered target or unregister target
that is in use.

This patch removes code duplication (testing of error codes in all dm
targets) and reports bugs in just one place, in dm_unregister_target. In
some target drivers, these return codes were ignored, which could lead
to a situation where bugs could be missed.
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

10d3bd09

dm snapshot: change yield to msleep · 90fa1527

由 Mikulas Patocka 提交于 1月 06, 2009

Change yield() to msleep(1). If the thread had realtime priority,
yield() doesn't really yield, so the yielding process would loop
indefinitely and cause machine lockup.
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

90fa1527

30 10月, 2008 2 次提交

dm snapshot: wait for chunks in destructor · 879129d2

由 Mikulas Patocka 提交于 10月 30, 2008

If there are several snapshots sharing an origin and one is removed
while the origin is being written to, the snapshot's mempool may get
deleted while elements are still referenced.

Prior to dm-snapshot-use-per-device-mempools.patch the pending
exceptions may still have been referenced after the snapshot was
destroyed, but this was not a problem because the shared mempool
was still there.

This patch fixes the problem by tracking the number of mempool elements
in use.

The scenario:
- You have an origin and two snapshots 1 and 2.
- Someone writes to the origin.
- It creates two exceptions in the snapshots, snapshot 1 will be primary
exception, snapshot 2's pending_exception->primary_pe will point to the
exception in snapshot 1.
- The exceptions are being relocated, relocation of exception 1 finishes
(but it's pending_exception is still allocated, because it is referenced
by an exception from snapshot 2)
- The user lvremoves snapshot 1 --- it calls just suspend (does nothing)
and destructor. md->pending is zero (there is no I/O submitted to the
snapshot by md layer), so it won't help us.
- The destructor waits for kcopyd jobs to finish on snapshot 1 --- but
there are none.
- The destructor on snapshot 1 cleans up everything.
- The relocation of exception on snapshot 2 finishes, it drops reference
on primary_pe. This frees its primary_pe pointer. Primary_pe points to
pending exception created for snapshot 1. So it frees memory into
non-existing mempool.
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

879129d2

dm snapshot: fix register_snapshot deadlock · 60c856c8

由 Mikulas Patocka 提交于 10月 30, 2008

register_snapshot() performs a GFP_KERNEL allocation while holding
_origins_lock for write, but that could write out dirty pages onto a
device that attempts to acquire _origins_lock for read, resulting in
deadlock.

So move the allocation up before taking the lock.

This path is not performance-critical, so it doesn't matter that we
allocate memory and free it if we find that we won't need it.
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

60c856c8

22 10月, 2008 2 次提交

dm snapshot: drop unused last_percent · f68d4f3d

由 Mikulas Patocka 提交于 10月 21, 2008

The last_percent field is unused - remove it.
(It dates from when events were triggered as each X% filled up.)
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

f68d4f3d

dm snapshot: fix primary_pe race · 7c5f78b9

由 Mikulas Patocka 提交于 10月 21, 2008

Fix a race condition with primary_pe ref_count handling.

put_pending_exception runs under dm_snapshot->lock, it does atomic_dec_and_test
on primary_pe->ref_count, and later does atomic_read primary_pe->ref_count.

__origin_write does atomic_dec_and_test on primary_pe->ref_count without holding
dm_snapshot->lock.

This opens the following race condition:
Assume two CPUs, CPU1 is executing put_pending_exception (and holding
dm_snapshot->lock). CPU2 is executing __origin_write in parallel.
primary_pe->ref_count == 2.

CPU1:
if (primary_pe && atomic_dec_and_test(&primary_pe->ref_count))
	origin_bios = bio_list_get(&primary_pe->origin_bios);
... decrements primary_pe->ref_count to 1. Doesn't load origin_bios

CPU2:
if (first && atomic_dec_and_test(&primary_pe->ref_count)) {
	flush_bios(bio_list_get(&primary_pe->origin_bios));
	free_pending_exception(primary_pe);
	/* If we got here, pe_queue is necessarily empty. */
	return r;
}
... decrements primary_pe->ref_count to 0, submits pending bios, frees
primary_pe.

CPU1:
if (!primary_pe || primary_pe != pe)
	free_pending_exception(pe);
... this has no effect.
if (primary_pe && !atomic_read(&primary_pe->ref_count))
	free_pending_exception(primary_pe);
... sees ref_count == 0 (written by CPU 2), does double free !!

This bug can happen only if someone is simultaneously writing to both the
origin and the snapshot.

If someone is writing only to the origin, __origin_write will submit kcopyd
request after it decrements primary_pe->ref_count (so it can't happen that the
finished copy races with primary_pe->ref_count decrementation).

If someone is writing only to the snapshot, __origin_write isn't invoked at all
and the race can't happen.

The race happens when someone writes to the snapshot --- this creates
pending_exception with primary_pe == NULL and starts copying. Then, someone
writes to the same chunk in the snapshot, and __origin_write races with
termination of already submitted request in pending_complete (that calls
put_pending_exception).

This race may be reason for bugs:
  http://bugzilla.kernel.org/show_bug.cgi?id=11636
  https://bugzilla.redhat.com/show_bug.cgi?id=465825

The patch fixes the code to make sure that:
1. If atomic_dec_and_test(&primary_pe->ref_count) returns false, the process
must no longer dereference primary_pe (because someone else may free it under
us).
2. If atomic_dec_and_test(&primary_pe->ref_count) returns true, the process
is responsible for freeing primary_pe.
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
Cc: stable@kernel.org

7c5f78b9

21 7月, 2008 3 次提交

dm snapshot: use per device mempools · 92e86812

由 Mikulas Patocka 提交于 7月 21, 2008

Change snapshot per-module mempool to per-device mempool.

Per-module mempools could cause a deadlock if multiple
snapshot devices are stacked above each other.
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

92e86812

dm snapshot: fix race during exception creation · a8d41b59

由 Mikulas Patocka 提交于 7月 21, 2008

Fix a race condition that returns incorrect data when a write causes an
exception to be allocated whilst a read is still in flight.

The race condition happens as follows:
* A read to non-reallocated sector in the snapshot is submitted so that the
  read is routed to the original device.
* A write to the original device is submitted. The write causes an exception
  that reallocates the block.  The write proceeds.
* The original read is dequeued and reads the wrong data.

This race can be triggered with CFQ scheduler and one thread writing and
multiple threads reading simultaneously.

(This patch relies upon the earlier dm-kcopyd-per-device.patch to avoid a
deadlock.)
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

a8d41b59

dm snapshot: track snapshot reads · cd45daff

由 Mikulas Patocka 提交于 7月 21, 2008

Whenever a snapshot read gets mapped through to the origin, track it in
a per-snapshot hash table indexed by chunk number, using memory allocated
from a new per-snapshot mempool.

We need to track these reads to avoid race conditions which will be fixed
by patches that follow.
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

cd45daff

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功