Commits · 7f82f000ed030d1108b4de47d9e2d556092980c6 · OpenHarmony / kernel_linux

30 Oct, 2008 · 2 commits

dm snapshot: wait for chunks in destructor · 879129d2

Committed by Mikulas Patocka on Oct 30, 2008

If there are several snapshots sharing an origin and one is removed
while the origin is being written to, the snapshot's mempool may get
deleted while elements are still referenced.

Prior to dm-snapshot-use-per-device-mempools.patch the pending
exceptions may still have been referenced after the snapshot was
destroyed, but this was not a problem because the shared mempool
was still there.

This patch fixes the problem by tracking the number of mempool elements
in use.

The scenario:
- You have an origin and two snapshots, 1 and 2.
- Someone writes to the origin.
- This creates two exceptions in the snapshots. Snapshot 1's becomes the
  primary exception; snapshot 2's pending_exception->primary_pe points to
  the exception in snapshot 1.
- The exceptions are being relocated. Relocation of exception 1 finishes
  (but its pending_exception is still allocated, because it is referenced
  by an exception from snapshot 2).
- The user lvremoves snapshot 1, which just calls suspend (does nothing)
  and the destructor. md->pending is zero (there is no I/O submitted to the
  snapshot by the md layer), so it won't help us.
- The destructor waits for kcopyd jobs to finish on snapshot 1, but
  there are none.
- The destructor on snapshot 1 cleans up everything.
- The relocation of the exception on snapshot 2 finishes and drops its
  reference on primary_pe, which frees primary_pe. primary_pe points to the
  pending exception created for snapshot 1, so memory is freed into a
  non-existent mempool.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
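The idea of counting pool elements in use, and having the destructor wait for that count to reach zero, can be sketched in userspace C. This is only an illustration under stated assumptions: `snap_pool`, `pool_alloc`, `pool_free` and `pool_wait_idle` are hypothetical names, `malloc` stands in for a mempool, and the one-millisecond polling interval is arbitrary. It is not the kernel code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical stand-in for a snapshot that owns a private pool. */
struct snap_pool {
	atomic_int in_use;	/* elements handed out and not yet returned */
};

static void pool_init(struct snap_pool *p)
{
	atomic_init(&p->in_use, 0);
}

static void *pool_alloc(struct snap_pool *p, size_t size)
{
	void *elem = malloc(size);

	if (elem)
		atomic_fetch_add(&p->in_use, 1);
	return elem;
}

static void pool_free(struct snap_pool *p, void *elem)
{
	assert(atomic_load(&p->in_use) > 0);
	free(elem);
	/* Decrement last: once it hits zero the owner may tear the pool down. */
	atomic_fetch_sub(&p->in_use, 1);
}

/* Destructor side: block until every element has been returned. */
static void pool_wait_idle(struct snap_pool *p)
{
	const struct timespec one_ms = { .tv_sec = 0, .tv_nsec = 1000000 };

	while (atomic_load(&p->in_use) != 0)
		nanosleep(&one_ms, NULL);
}
```

A destructor built this way would call `pool_wait_idle()` before releasing the pool, so an element that is still referenced from another snapshot keeps the pool alive until it is returned.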

dm snapshot: fix register_snapshot deadlock · 60c856c8

Committed by Mikulas Patocka on Oct 30, 2008

register_snapshot() performs a GFP_KERNEL allocation while holding
_origins_lock for write, but that could write out dirty pages onto a
device that attempts to acquire _origins_lock for read, resulting in
deadlock.

So move the allocation up before taking the lock.

This path is not performance-critical, so it doesn't matter that we
allocate memory and free it if we find that we won't need it.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
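The "allocate first, lock second, free if unneeded" pattern described above might look like the following userspace sketch. Everything here is an assumption made for illustration: the `origin` list, `register_origin()`, and the simple `atomic_flag` spinlock standing in for the kernel's reader/writer lock, with `malloc` standing in for a GFP_KERNEL allocation that may block.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical registry: one entry per origin device id. */
struct origin {
	int dev;
	struct origin *next;
};

static atomic_flag origins_lock = ATOMIC_FLAG_INIT;
static struct origin *origins;

static void lock_origins(void)
{
	while (atomic_flag_test_and_set(&origins_lock))
		;	/* spin until the flag was previously clear */
}

static void unlock_origins(void)
{
	atomic_flag_clear(&origins_lock);
}

/* Caller must hold origins_lock. */
static struct origin *lookup_origin(int dev)
{
	struct origin *o;

	for (o = origins; o; o = o->next)
		if (o->dev == dev)
			return o;
	return NULL;
}

/* Returns 0 on success, -1 if the allocation failed. */
static int register_origin(int dev)
{
	struct origin *new_o;

	assert(dev >= 0);
	/* Allocate before taking the lock: this step may block. */
	new_o = malloc(sizeof(*new_o));
	if (!new_o)
		return -1;

	lock_origins();
	if (lookup_origin(dev)) {
		/* Already registered: the speculative allocation is unused. */
		unlock_origins();
		free(new_o);
		return 0;
	}
	new_o->dev = dev;
	new_o->next = origins;
	origins = new_o;
	unlock_origins();
	return 0;
}

static int count_origins(void)
{
	struct origin *o;
	int n = 0;

	lock_origins();
	for (o = origins; o; o = o->next)
		n++;
	unlock_origins();
	return n;
}
```

The cost is an occasional wasted allocation, which the commit message argues is acceptable because the path is not performance-critical.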

22 Oct, 2008 · 2 commits

dm snapshot: drop unused last_percent · f68d4f3d

Committed by Mikulas Patocka on Oct 21, 2008

The last_percent field is unused - remove it.
(It dates from when events were triggered as each X% filled up.)
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm snapshot: fix primary_pe race · 7c5f78b9

Committed by Mikulas Patocka on Oct 21, 2008

Fix a race condition with primary_pe ref_count handling.

put_pending_exception runs under dm_snapshot->lock, it does atomic_dec_and_test
on primary_pe->ref_count, and later does atomic_read primary_pe->ref_count.

__origin_write does atomic_dec_and_test on primary_pe->ref_count without holding
dm_snapshot->lock.

This opens the following race condition:
Assume two CPUs, CPU1 is executing put_pending_exception (and holding
dm_snapshot->lock). CPU2 is executing __origin_write in parallel.
primary_pe->ref_count == 2.

CPU1:
if (primary_pe && atomic_dec_and_test(&primary_pe->ref_count))
	origin_bios = bio_list_get(&primary_pe->origin_bios);
... decrements primary_pe->ref_count to 1. Doesn't load origin_bios

CPU2:
if (first && atomic_dec_and_test(&primary_pe->ref_count)) {
	flush_bios(bio_list_get(&primary_pe->origin_bios));
	free_pending_exception(primary_pe);
	/* If we got here, pe_queue is necessarily empty. */
	return r;
}
... decrements primary_pe->ref_count to 0, submits pending bios, frees
primary_pe.

CPU1:
if (!primary_pe || primary_pe != pe)
	free_pending_exception(pe);
... this has no effect.
if (primary_pe && !atomic_read(&primary_pe->ref_count))
	free_pending_exception(primary_pe);
... sees ref_count == 0 (written by CPU 2), does double free !!

This bug can happen only if someone is simultaneously writing to both the
origin and the snapshot.

If someone is writing only to the origin, __origin_write will submit kcopyd
request after it decrements primary_pe->ref_count (so it can't happen that the
finished copy races with primary_pe->ref_count decrementation).

If someone is writing only to the snapshot, __origin_write isn't invoked at all
and the race can't happen.

The race happens when someone writes to the snapshot --- this creates a
pending_exception with primary_pe == NULL and starts copying. Then, someone
writes to the same chunk in the origin, and __origin_write races with
termination of the already submitted request in pending_complete (which calls
put_pending_exception).

This race may be the reason for these bugs:
  http://bugzilla.kernel.org/show_bug.cgi?id=11636
  https://bugzilla.redhat.com/show_bug.cgi?id=465825

The patch fixes the code to make sure that:
1. If atomic_dec_and_test(&primary_pe->ref_count) returns false, the process
must no longer dereference primary_pe (because someone else may free it under
us).
2. If atomic_dec_and_test(&primary_pe->ref_count) returns true, the process
is responsible for freeing primary_pe.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Cc: stable@kernel.org
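The two rules listed in the commit message can be illustrated with C11 atomics. This is a minimal sketch, not the kernel code: `struct pending`, `dec_and_test()` and `put_pending()` are hypothetical names, and `atomic_fetch_sub()` is used here to play the role of `atomic_dec_and_test()`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical reference-counted object. */
struct pending {
	atomic_int ref_count;
};

/* True only for the caller that dropped the count from 1 to 0. */
static bool dec_and_test(atomic_int *v)
{
	return atomic_fetch_sub(v, 1) == 1;
}

/*
 * Drop one reference. Returns true if this caller freed the object.
 * After a false return the caller must not touch pe again: another
 * holder may free it at any moment.
 */
static bool put_pending(struct pending *pe)
{
	assert(pe != NULL);
	if (!dec_and_test(&pe->ref_count))
		return false;	/* rule 1: no further dereference */

	free(pe);		/* rule 2: we saw 1 -> 0, so we free */
	return true;
}
```

The double free in the scenario above came from re-reading the counter after the decrement; in this sketch the decision to free is taken only from the return value of the decrement itself.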

21 Jul, 2008 · 3 commits

dm snapshot: use per device mempools · 92e86812

Committed by Mikulas Patocka on Jul 21, 2008

Change snapshot per-module mempool to per-device mempool.

Per-module mempools could cause a deadlock if multiple
snapshot devices are stacked above each other.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm snapshot: fix race during exception creation · a8d41b59

Committed by Mikulas Patocka on Jul 21, 2008

Fix a race condition that returns incorrect data when a write causes an
exception to be allocated whilst a read is still in flight.

The race condition happens as follows:
* A read to non-reallocated sector in the snapshot is submitted so that the
  read is routed to the original device.
* A write to the original device is submitted. The write causes an exception
  that reallocates the block.  The write proceeds.
* The original read is dequeued and reads the wrong data.

This race can be triggered with CFQ scheduler and one thread writing and
multiple threads reading simultaneously.

(This patch relies upon the earlier dm-kcopyd-per-device.patch to avoid a
deadlock.)
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm snapshot: track snapshot reads · cd45daff

Committed by Mikulas Patocka on Jul 21, 2008

Whenever a snapshot read gets mapped through to the origin, track it in
a per-snapshot hash table indexed by chunk number, using memory allocated
from a new per-snapshot mempool.

We need to track these reads to avoid race conditions which will be fixed
by patches that follow.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
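A hash table of in-flight reads indexed by chunk number could be sketched as below. The names (`read_tracker`, `track_read`, `untrack_read`, `chunk_is_tracked`), the bucket count, and the use of `malloc` in place of a per-snapshot mempool are all assumptions for illustration; locking is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define TRACK_HASH_SIZE 16	/* hypothetical; must be a power of two */

struct tracked_read {
	uint64_t chunk;
	struct tracked_read *next;
};

struct read_tracker {
	struct tracked_read *buckets[TRACK_HASH_SIZE];
};

static unsigned int bucket_of(uint64_t chunk)
{
	return (unsigned int)(chunk & (TRACK_HASH_SIZE - 1));
}

/* Record a read that was mapped through to the origin. */
static struct tracked_read *track_read(struct read_tracker *t, uint64_t chunk)
{
	struct tracked_read *r = malloc(sizeof(*r));

	if (!r)
		return NULL;
	r->chunk = chunk;
	r->next = t->buckets[bucket_of(chunk)];
	t->buckets[bucket_of(chunk)] = r;
	return r;
}

/* Forget a read once it has completed. */
static void untrack_read(struct read_tracker *t, struct tracked_read *r)
{
	struct tracked_read **pp = &t->buckets[bucket_of(r->chunk)];

	while (*pp && *pp != r)
		pp = &(*pp)->next;
	assert(*pp == r);	/* the read must have been tracked */
	*pp = r->next;
	free(r);
}

/* A writer can ask whether any read of this chunk is still in flight. */
static bool chunk_is_tracked(const struct read_tracker *t, uint64_t chunk)
{
	const struct tracked_read *r;

	for (r = t->buckets[bucket_of(chunk)]; r; r = r->next)
		if (r->chunk == chunk)
			return true;
	return false;
}
```

With such a table, code that is about to relocate a chunk could first check `chunk_is_tracked()` and wait, which is the kind of race avoidance the follow-up patches rely on.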

25 Apr, 2008 · 5 commits

dm: move include files · a765e20e

Committed by Alasdair G Kergon on Apr 24, 2008

Publish the dm-io, dm-log and dm-kcopyd headers in include/linux.
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm kcopyd: clean interface · eb69aca5

Committed by Heinz Mauelshagen on Apr 24, 2008

Clean up the kcopyd interface to prepare for publishing it in include/linux.
Signed-off-by: Heinz Mauelshagen <hjm@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm io: clean interface · 22a1ceb1

Committed by Heinz Mauelshagen on Apr 24, 2008

Clean up the dm-io interface to prepare for publishing it in include/linux.
Signed-off-by: Heinz Mauelshagen <hjm@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm snapshot: store pointer to target instance · 72727bad

Committed by Mikulas Patocka on Apr 24, 2008

Save pointer to dm_target in dm_snapshot structure.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm snapshot: reduce default memory allocation · 8ee2767a

Committed by Milan Broz on Apr 24, 2008

Limit the amount of memory allocated per snapshot on systems
with a large page size.  (The larger default chunk size on
these systems compensates for the smaller number of pages reserved.)
Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

29 Mar, 2008 · 1 commit

dm io: write error bits form long not int · 4cdc1d1f

Committed by Alasdair G Kergon on Mar 28, 2008

write_err is an unsigned long used with set_bit() so should not be passed
around as unsigned int.

http://bugzilla.kernel.org/show_bug.cgi?id=10271

Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

08 Feb, 2008 · 2 commits

dm snapshot: combine consecutive exceptions in memory · d74f81f8

Committed by Milan Broz on Feb 08, 2008

Provided sector_t is 64 bits, reduce the in-memory footprint of the
snapshot exception table by the simple method of using unused bits of
the chunk number to combine consecutive entries.
Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm snapshot: use rounddown_pow_of_two · 8defd830

Committed by Robert P. J. Day on Feb 08, 2008

Since the source file already includes the log2.h header file, it
seems pointless to re-invent the necessary routine.
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
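For readers unfamiliar with the helper, rounding down to a power of two returns the largest power of two that does not exceed its argument. The function below is a portable userspace sketch with a hypothetical name (`rounddown_pow2`); it is not the kernel's implementation.

```c
#include <assert.h>

/* Largest power of two that is <= n. n must be non-zero. */
static unsigned long rounddown_pow2(unsigned long n)
{
	unsigned long p = 1;

	assert(n != 0);
	/* Comparing against n / 2 keeps p * 2 from overflowing. */
	while (p <= n / 2)
		p *= 2;
	return p;
}
```

For example, `rounddown_pow2(1000)` yields 512 and `rounddown_pow2(8)` yields 8.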

20 Oct, 2007 · 1 commit

dm: use is_power_of_2 · 6f3c3f0a

Committed by vignesh babu on Oct 19, 2007

Replace the open-coded n & (n - 1) power-of-2 check with is_power_of_2(n).
Signed-off-by: vignesh babu <vignesh.babu@wipro.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
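The open-coded test and a named helper differ in one detail worth seeing: `n & (n - 1)` alone is also zero for `n == 0`. A userspace sketch of such a helper, with the hypothetical name `is_pow2`, follows.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * A power of two has exactly one bit set, so clearing the lowest set
 * bit with n & (n - 1) leaves zero. Zero itself is excluded explicitly.
 */
static bool is_pow2(unsigned long n)
{
	return n != 0 && (n & (n - 1)) == 0;
}
```

Calling it with 64 returns true, while 0 and 96 both return false.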

10 Oct, 2007 · 1 commit

Drop 'size' argument from bio_endio and bi_end_io · 6712ecf8

Committed by NeilBrown on Sep 27, 2007

As bi_end_io is only called once when the request is complete,
the 'size' argument is now redundant.  Remove it.

Now there is no need for bio_endio to subtract the size completed
from bi_size.  So don't do that either.

While we are at it, change bi_end_io to return void.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

13 Jul, 2007 · 3 commits

dm: disable barriers · 07a83c47

Committed by Stefan Bader on Jul 12, 2007

This patch causes device-mapper to reject any barrier requests.  This is done
since most of the targets won't handle this correctly anyway.  So until the
situation improves it is better to reject these requests in the first place.
Since barrier requests won't get to the targets, the checks there can be
removed.

Cc: stable@kernel.org
Signed-off-by: Stefan Bader <shbader@de.ibm.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

dm snapshot: permit invalid activation · 0764147b

Committed by Milan Broz on Jul 12, 2007

Allow invalid snapshots to be activated instead of failing.

This allows userspace to reinstate any given snapshot state - for
example after an unscheduled reboot - and clean up the invalid snapshot
at its leisure.

Cc: stable@kernel.org
Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

dm: use kmem_cache macro · 028867ac

Committed by Alasdair G Kergon on Jul 12, 2007

Use new KMEM_CACHE() macro and make the newly-exposed structure names more
meaningful.  Also remove some superfluous casts and inlines (let a modern
compiler be the judge).
Acked-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

09 Dec, 2006 · 3 commits

[PATCH] make drivers/md/dm-snap.c:ksnapd static · c642f9e0

Committed by Adrian Bunk on Dec 08, 2006

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] dm: snapshot: abstract memory release · 31c93a0c

Committed by Milan Broz on Dec 08, 2006

Move the code that releases memory used by a snapshot into a separate function.
Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Cc: dm-devel@redhat.com
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] dm: map and endio symbolic return codes · d2a7ad29

Committed by Kiyoshi Ueda on Dec 08, 2006

Update existing targets to use the new symbols for return values from target
map and end_io functions.

There is no effect on behaviour.

Test results:
Done build test without errors.
Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Cc: dm-devel@redhat.com
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

08 Dec, 2006 · 1 commit

[PATCH] slab: remove kmem_cache_t · e18b890b

Committed by Christoph Lameter on Dec 06, 2006

Replace all uses of kmem_cache_t with struct kmem_cache.

The patch was generated using the following script:

	#!/bin/sh
	#
	# Replace one string by another in all the kernel sources.
	#

	set -e

	for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do
		quilt add $file
		sed -e "1,\$s/$1/$2/g" $file >/tmp/$$
		mv /tmp/$$ $file
		quilt refresh
	done

The script was run like this

	sh replace kmem_cache_t "struct kmem_cache"
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

22 Nov, 2006 · 1 commit

WorkStruct: make allyesconfig · c4028958

Committed by David Howells on Nov 22, 2006

Fix up for make allyesconfig.
Signed-Off-By: David Howells <dhowells@redhat.com>

03 Oct, 2006 · 8 commits

[PATCH] dm snapshot: fix freeing pending exception · 695368ac

Committed by Alasdair G Kergon on Oct 03, 2006

If a snapshot becomes invalid while there are outstanding pending_exceptions,
when pending_complete() processes each one it forgets to remove the
corresponding exception from its exception table before freeing it.

Fix this by moving the 'out:' label up one statement so that
remove_exception() is always called. Then __invalidate_exception() no longer
needs to call it and its 'pe' argument becomes superfluous.
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] dm snapshot: tidy pe ref counting · 4b832e8d

Committed by Alasdair G Kergon on Oct 03, 2006

Rename sibling_count to ref_count and introduce get and put functions.
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] dm snapshot: add workqueue · ca3a931f

Committed by Alasdair G Kergon on Oct 03, 2006

Add a workqueue so that I/O can be queued up to be flushed from a separate
thread (e.g.  if local interrupts are disabled).

A new per-snapshot spinlock pe_lock is introduced to protect queued_bios.
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] dm snapshot: tidy pending_complete · 9d493fa8

Committed by Alasdair G Kergon on Oct 03, 2006

This patch rearranges the pending_complete() code so that the functional
changes in subsequent patches are clearer.

By consolidating the error and the non-error paths, we can move
error_snapshot_bios() and __flush_bios() in line.
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] dm snapshot: tidy snapshot_map · ba40a2aa

Committed by Alasdair G Kergon on Oct 03, 2006

This patch rearranges the snapshot_map code so that the functional changes in
subsequent patches are clearer.

The only functional change is to replace the existing read lock with a write
lock which the next patch needs.
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] dm snapshot: fix metadata error handling · f9cea4f7

Committed by Mark McLoughlin on Oct 03, 2006

Fix the error handling when store.read_metadata is called: the error should be
returned immediately.
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] dm snapshot: allow zero chunk_size · 4c7e3bf4

Committed by Mark McLoughlin on Oct 03, 2006

The chunk size of snapshots cannot be changed so it is redundant to require it
as a parameter when activating an existing snapshot.  Allow a value of zero in
this case and ignore it.  For a new snapshot, use a default value if zero is
specified.
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
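The selection rule described in this commit message can be written as a small pure function. This is a sketch under assumptions: `effective_chunk_size()` and `DEFAULT_CHUNK_SECTORS` (including its value) are hypothetical, and an `on_disk` value of zero is used here to mean "new snapshot with no recorded size".

```c
#include <assert.h>

#define DEFAULT_CHUNK_SECTORS 16u	/* hypothetical default value */

/*
 * requested: chunk size passed by userspace, 0 meaning "unspecified".
 * on_disk:   chunk size recorded by an existing snapshot, 0 for a new one.
 */
static unsigned int effective_chunk_size(unsigned int requested,
					 unsigned int on_disk)
{
	if (on_disk)
		return on_disk;	/* existing snapshot: its size cannot change */

	/* New snapshot: fall back to the default when nothing was given. */
	return requested ? requested : DEFAULT_CHUNK_SECTORS;
}
```

Under these assumptions, activating an existing snapshot with a requested size of zero simply uses the recorded size, and creating a new one with zero uses the default.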

[PATCH] dm snapshot: fix invalidation ENOMEM · 92c060a6

Committed by Milan Broz on Oct 03, 2006

Fix ENOMEM error sign.
Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

01 Jul, 2006 · 1 commit

Remove obsolete #include <linux/config.h> · 6ab3d562

Committed by Jörn Engel on Jun 30, 2006

Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

27 Jun, 2006 · 2 commits

[PATCH] dm: improve error message consistency · 72d94861

Committed by Alasdair G Kergon on Jun 26, 2006

Tidy device-mapper error messages to include context information
automatically.
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] dm snapshot: unify chunk_size · c51c2752

Committed by Alasdair G Kergon on Jun 26, 2006

Persistent snapshots currently store a private copy of the chunk size.
Userspace also supplies the chunk size when loading a snapshot.  Ensure
consistency by only storing the chunk_size in one place instead of two.

Currently the two sizes will differ if the chunk size supplied by userspace
does not match the chunk size an existing snapshot actually uses.  Amongst
other problems, this causes an incorrect 'percentage full' to be reported.

The patch ensures consistency by only storing the chunk_size in one place,
removing it from struct pstore.  Some initialisation is delayed until the
correct chunk_size is known.  If read_header() discovers that the wrong chunk
size was supplied, the 'area' buffer (which the header already got read into)
is reinitialised to the correct size.

[akpm: too late for 2.6.17 - suitable for 2.6.17.x after it has settled]
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

28 Mar, 2006 · 4 commits

[PATCH] dm snapshot: fix kcopyd destructor · 138728dc

Committed by Alasdair G Kergon on Mar 27, 2006

Before removing a snapshot, wait for the completion of any kcopyd jobs using
it.

Do this by maintaining a count (nr_jobs) of how many outstanding jobs each
kcopyd_client has.

The snapshot destructor first unregisters the snapshot so that no new kcopyd
jobs (created by writes to the origin) will reference that particular
snapshot.  kcopyd_client_destroy() is now run next to wait for the completion
of any outstanding jobs before the snapshot exception structures (that those
jobs reference) are freed.
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] dm: remove SECTOR_FORMAT · 4ee218cd

Committed by Andrew Morton on Mar 27, 2006

We don't know what type sector_t has.  Sometimes it's unsigned long, sometimes
it's unsigned long long.  For example on ppc64 it's unsigned long with
CONFIG_LBD=n and on x86_64 it's unsigned long long with CONFIG_LBD=n.

The way to handle all of this is to always use unsigned long long and to
always typecast the sector_t when printing it.
Acked-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
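The "always cast, always print as unsigned long long" rule can be shown in a few lines of userspace C. The `sector_t` typedef and `format_sector()` below are stand-ins chosen for this sketch; the point is only that the cast makes one format string correct whatever the underlying width.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Stand-in: on other configurations this could be unsigned long long. */
typedef unsigned long sector_t;

/* Format a sector number without depending on sector_t's real type. */
static int format_sector(char *buf, size_t len, sector_t s)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "%llu", (unsigned long long)s);
}
```

Changing the typedef to `unsigned long long` requires no change to the format string, which is what makes a per-configuration SECTOR_FORMAT macro unnecessary.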

[PATCH] device-mapper snapshot: fix invalidation · 76df1c65

Committed by Alasdair G Kergon on Mar 27, 2006

When a snapshot becomes invalid, s->valid is set to 0.  In this state, a
snapshot can no longer be accessed.

When s->lock is acquired, before doing anything else, s->valid must be checked
to ensure the snapshot remains valid.

This patch eliminates some races (that may cause panics) by adding some
missing checks.  At the same time, some unnecessary levels of indentation are
removed and snapshot invalidation is moved into a single function that always
generates a device-mapper event.
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
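The "take the lock, then check validity before doing anything else" discipline might be sketched like this. `struct snap`, the `atomic_flag` spinlock, and the choice of `-EIO` as the error are assumptions made for this illustration; the single invalidation function here does not generate any event.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical snapshot state; valid and exceptions are protected by lock. */
struct snap {
	atomic_flag lock;
	int valid;
	int exceptions;
};

static void snap_lock(struct snap *s)
{
	while (atomic_flag_test_and_set(&s->lock))
		;	/* spin */
}

static void snap_unlock(struct snap *s)
{
	atomic_flag_clear(&s->lock);
}

/* Single place where a snapshot is marked invalid. */
static void snap_invalidate(struct snap *s)
{
	snap_lock(s);
	s->valid = 0;
	snap_unlock(s);
}

/* Returns 0 on success, -EIO if the snapshot has been invalidated. */
static int snap_add_exception(struct snap *s)
{
	int r = 0;

	snap_lock(s);
	/* Re-check under the lock: it may have been invalidated meanwhile. */
	if (!s->valid) {
		r = -EIO;
		goto out;
	}
	s->exceptions++;
	assert(s->exceptions > 0);
out:
	snap_unlock(s);
	return r;
}
```

Checking `valid` only before taking the lock would leave a window in which another thread invalidates the snapshot, which is the kind of race the missing checks allowed.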

[PATCH] device-mapper snapshot: replace sibling list · b4b610f6

Committed by Alasdair G Kergon on Mar 27, 2006

The siblings "list" is used unsafely at the moment.

Firstly, only the element on the list being changed gets locked (via the
snapshot lock), not the next and previous elements which have pointers that
are also being changed.

Secondly, if you have two or more snapshots and write to the same chunk a
second time before every snapshot has finished making its private copy of the
data, if you're unlucky, _origin_write() could attempt its list_merge() and
dereference a 'last' pointer to a pending_exception structure that has just
been freed.

Analysis reveals that the list is actually only there for reference counting.
If 5 pending_exceptions are needed in origin_write, then the 5 are joined
together into a 5-element list - without a separate list head because there's
nowhere suitable to store it.  As the pending_exceptions complete, they are
removed from the list one-by-one and any contents of origin_bios get moved
across to one of the remaining pending_exceptions on the list.  Whichever one
is last is detected because list_empty() is then true and the origin_bios get
submitted.

The fix proposed here uses an alternative reference counting mechanism by
choosing one of the pending_exceptions as primary and maintaining an atomic
counter there.
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

OpenHarmony / kernel_linux · last synced about 4 years ago