Commits · 71fd5ae25d88841c08d5bbea90c0f0a12ca05509 · openeuler / raspberrypi-kernel

29 Mar, 2012 17 commits

dm persistent data: remove space map ref_count entries if redundant · 71fd5ae2

Authored by Joe Thornber on Mar 28, 2012

Save space by removing entries from the space map ref_count tree if
they're no longer needed.

Ref counts are stored in two places: a bitmap if the ref_count is
below 3, or a btree of uint32_t if 3 or above.

When a ref_count that was 3 or above drops below 3 we can remove it from
the tree and save some metadata space.  This removal was commented out
before because I was unsure why this was causing under-populated btree
nodes.  Earlier patches have fixed this issue.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

71fd5ae2
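
The two-tier ref_count storage described in commit 71fd5ae2 above can be illustrated with a toy model. This is only a sketch and not the kernel code: `toy_space_map`, `toy_get_count` and `toy_set_count` are hypothetical names, plain arrays stand in for the on-disk bitmap and the btree, and the use of the value 3 as a "look in the overflow store" marker is an assumption made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_NR_BLOCKS 16
#define TOY_OVERFLOW  3u  /* assumption: small value meaning "see overflow store" */

/* Hypothetical toy space map: a small per-block value plus an overflow table. */
struct toy_space_map {
	uint8_t  small[TOY_NR_BLOCKS];       /* holds 0..2, or TOY_OVERFLOW */
	uint32_t overflow[TOY_NR_BLOCKS];    /* stands in for the btree of uint32_t */
	uint8_t  in_overflow[TOY_NR_BLOCKS]; /* 1 if the block has an overflow entry */
	size_t   overflow_entries;           /* number of live overflow entries */
};

static uint32_t toy_get_count(const struct toy_space_map *sm, size_t b)
{
	return sm->small[b] == TOY_OVERFLOW ? sm->overflow[b] : sm->small[b];
}

static void toy_set_count(struct toy_space_map *sm, size_t b, uint32_t count)
{
	if (count >= TOY_OVERFLOW) {
		/* Large counts live in the overflow store. */
		if (!sm->in_overflow[b]) {
			sm->in_overflow[b] = 1;
			sm->overflow_entries++;
		}
		sm->overflow[b] = count;
		sm->small[b] = TOY_OVERFLOW;
		return;
	}

	/* The count fits in the small table again: drop the redundant entry. */
	if (sm->in_overflow[b]) {
		sm->in_overflow[b] = 0;
		sm->overflow[b] = 0;
		sm->overflow_entries--;
	}
	sm->small[b] = (uint8_t)count;
}
```

In this model, setting a block's count to 4 creates one overflow entry, and lowering it to 2 removes that entry again, which is the space saving the commit message describes.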

dm thin: commit outstanding data every second · 905e51b3

Authored by Joe Thornber on Mar 28, 2012

Commit unwritten data every second to prevent too much building up.

Released blocks don't become available until after the next commit
(for crash resilience).  Prior to this patch commits were only
triggered by a message to the target or a REQ_{FLUSH,FUA} bio.  This
allowed far too big a position to build up.

The interval is hard-coded to 1 second.  This is a sensible setting.
I'm not making this user configurable, since there isn't much to be
gained by tweaking this - and a lot lost by setting it far too high.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

905e51b3

dm: reject trailing characters in sscanf input · 31998ef1

Authored by Mikulas Patocka on Mar 28, 2012

Device mapper uses sscanf to convert arguments to numbers. The problem is that
the way we use it ignores additional unmatched characters in the scanned string.

For example, this `if (sscanf(string, "%d", &number) == 1)' will match a number,
but also it will match number with some garbage appended, like "123abc".

As a result, device mapper accepts garbage after some numbers. For example
the command `dmsetup create vg1-new --table "0 16384 linear 254:1bla 34816bla"'
will pass without an error.

This patch fixes all sscanf uses in device mapper. It appends "%c" with
a pointer to a dummy character variable to every sscanf statement.

The construct `if (sscanf(string, "%d%c", &number, &dummy) == 1)' succeeds
only if string is a null-terminated number (optionally preceded by some
whitespace characters). If there is some character appended after the number,
sscanf matches "%c", writes the character to the dummy variable and returns 2.
We check the return value for 1 and consequently reject numbers with some
garbage appended.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

31998ef1
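
The `"%d%c"` trick from commit 31998ef1 above can be tried in userspace with the standard C `sscanf`. A minimal sketch: the helper names `parse_int_loose` and `parse_int_strict` are hypothetical and do not appear in the device mapper code.

```c
#include <stdio.h>

/* Accepts "123" but also "123abc": trailing characters are never examined. */
static int parse_int_loose(const char *s, int *out)
{
	return sscanf(s, "%d", out) == 1;
}

/*
 * Accepts only a number followed by the end of the string.
 * If anything follows the number, "%c" consumes it and sscanf
 * returns 2, so the comparison with 1 fails.
 */
static int parse_int_strict(const char *s, int *out)
{
	char dummy;

	return sscanf(s, "%d%c", out, &dummy) == 1;
}
```

Note that `"%c"` does not skip whitespace, so under this check a trailing space is rejected just like any other trailing character.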

dm raid: handle failed devices during start up · 0447568f

Authored by Jonathan E Brassow on Mar 28, 2012

The dm-raid code currently fails to create a RAID array if any of the
superblocks cannot be read.  This was an oversight as there is already
code to handle this case if the values ('- -') were provided for the
failed array position.

With this patch, if a superblock cannot be read, the array position's
fields are initialized as though '- -' was set in the table.  That is,
the device is failed and the position should not be used, but if there
is sufficient redundancy, the array should still be activated.
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

0447568f

dm thin metadata: pass correct space map to dm_sm_root_size · fef838cc

Authored by Joe Thornber on Mar 28, 2012

Fix a harmless typo.

The root is a chunk of data that gets written to the superblock.  This
data is used to recreate the space map when opening a metadata area.
We have two space maps; one tracking space on the metadata device and
one of the data device.  Both of these use the same format for their
root, so this typo was harmless.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

fef838cc

dm persistent data: remove redundant value_size arg from value_ptr · a3aefb39

Authored by Joe Thornber on Mar 28, 2012

Now that the value_size is held within every node of the btrees we can
remove this argument from value_ptr().

For the last few months a BUG_ON has been checking this argument is
the same as that held in the node.  No issues were reported.  So this
is a safe change.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

a3aefb39

dm mpath: detect invalid map_context · 466891f9

Authored by Jun'ichi Nomura on Mar 28, 2012

The map_context pointer should always be set. However, we have reports
that upon requeuing it is not set correctly.  So add set and clear
functions with a BUG_ON() to track the issue properly.
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Acked-by: Hannes Reinecke <hare@suse.de>
Tested-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

466891f9

dm: clear bi_end_io on remapping failure · 4d7b38b7

Authored by Hannes Reinecke on Mar 28, 2012

As a precaution, set bi_end_io to NULL when failing to remap.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

4d7b38b7

dm table: simplify call to free_devices · 574ce07e

Authored by Hannes Reinecke on Mar 28, 2012

free_devices in dm_table.c already uses list_for_each(), so we don't
need to check if the list is empty.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

574ce07e

dm thin: correct comments · fe878f34

Authored by Joe Thornber on Mar 28, 2012

Remove documentation for unimplemented 'trim' message.

I'd planned a 'trim' target message for shrinking thin devices, but
this is better handled via the discard ioctl.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

fe878f34

dm raid: no longer experimental · 035220b3

Authored by Alasdair G Kergon on Mar 28, 2012

The dm raid module (using md) is becoming the preferred way of creating long-lived
mirrors through userspace LVM so remove the EXPERIMENTAL tag.
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

035220b3

dm uevent: no longer experimental · e0b215da

Authored by Alasdair G Kergon on Mar 28, 2012

Drop EXPERIMENTAL tag from dm-uevent.

It hasn't changed for a while and some userspace tools are relying upon it.
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

e0b215da

dm persistent data: fix btree rebalancing after remove · b0988900

Authored by Joe Thornber on Mar 28, 2012

When we remove an entry from a node we sometimes rebalance with its
two neighbours.  This wasn't being done correctly; in some cases
entries have to move all the way from the right neighbour to the left
neighbour, or vice versa.  This patch pretty much re-writes the
balancing code to fix it.

This code is barely used currently; only when you delete a thin
device, and then only if you have hundreds of them in the same pool.
Once we have discard support, which removes mappings, this will be used
much more heavily.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

b0988900

dm thin: fix stacked bi_next usage · 6f94a4c4

Authored by Joe Thornber on Mar 28, 2012

Avoid using the bi_next field for the holder of a cell when deferring
bios because a stacked device below might change it.  Store the
holder in a new field in struct cell instead.

When a cell is created, the bio that triggered creation (the holder) was
added to the same bio list as subsequent bios.  In some cases we pass
this holder bio directly to devices underneath.  If those devices use
the bi_next field there will be trouble...

This also simplifies some code that had to work out which bio was the
holder.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

6f94a4c4

dm crypt: add missing error handling · 72c6e7af

Authored by Mikulas Patocka on Mar 28, 2012

Always set io->error to -EIO when an error is detected in dm-crypt.

There were cases where an error code would be set only if we finish
processing the last sector. If there were other encryption operations in
flight, the error would be ignored and bio would be returned with
success as if no error happened.

This bug is present in kcryptd_crypt_write_convert, kcryptd_crypt_read_convert
and kcryptd_async_done.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@kernel.org
Reviewed-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

72c6e7af

dm crypt: fix mempool deadlock · aeb2deae

Authored by Mikulas Patocka on Mar 28, 2012

This patch fixes a possible deadlock in dm-crypt's mempool use.

Currently, dm-crypt reserves a mempool of MIN_BIO_PAGES reserved pages.
It allocates first MIN_BIO_PAGES with non-failing allocation (the allocation
cannot fail and waits until the mempool is refilled). Further pages are
allocated with different gfp flags that allow failing.

Because allocations may be done in parallel, this code can deadlock. Example:
There are two processes, each tries to allocate MIN_BIO_PAGES and the processes
run simultaneously.
It may end up in a situation where each process allocates (MIN_BIO_PAGES / 2)
pages. The mempool is exhausted. Each process waits for more pages to be freed
to the mempool, which never happens.

To avoid this deadlock scenario, this patch changes the code so that only
the first page is allocated with non-failing gfp mask. Allocation of further
pages may fail.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

aeb2deae

dm exception store: fix init error path · aadbe266

Authored by Andrei Warkentin on Mar 28, 2012

Call the correct exit function on failure in dm_exception_store_init.
Signed-off-by: Andrei Warkentin <andrey.warkentin@gmail.com>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

aadbe266

20 Mar, 2012 2 commits

dm: remove the second argument of k[un]map_atomic() · c2e022cb

Authored by Cong Wang on Nov 28, 2011

Acked-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Cong Wang <amwang@redhat.com>

c2e022cb

md: remove the second argument of k[un]map_atomic() · b2f46e68

Authored by Cong Wang on Nov 28, 2011

Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: Cong Wang <amwang@redhat.com>

b2f46e68

19 Mar, 2012 17 commits

md: Add judgement bb->unacked_exist in function md_ack_all_badblocks(). · ecb178bb

Authored by majianpeng on Mar 19, 2012

If there are no unacked bad blocks, then there is no point searching
for them to acknowledge them.
Signed-off-by: majianpeng <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

ecb178bb

md: fix clearing of the 'changed' flags for the bad blocks list. · d0962936

Authored by NeilBrown on Mar 19, 2012

In super_1_sync (the first hunk) we need to clear 'changed' before
checking read_seqretry(), otherwise we might race with other code
adding a bad block and so won't retry later.

In md_update_sb (the second hunk), in the case where there is no
metadata (neither persistent nor external), we treat any bad blocks as
an error.  However we need to clear the 'changed' flag before calling
md_ack_all_badblocks, else it won't do anything.

This patch is suitable for -stable release 3.0 and later.

Cc: stable@vger.kernel.org
Signed-off-by: NeilBrown <neilb@suse.de>

d0962936

md/bitmap: discard CHUNK_BLOCK_SHIFT macro · 61a0d80c

Authored by NeilBrown on Mar 19, 2012

By redefining ->chunkshift as the shift from sectors to chunks rather
than bytes to chunks, we can just use "bitmap->chunkshift" which is
shorter than the macro call, and less indirect.
Signed-off-by: NeilBrown <neilb@suse.de>

61a0d80c
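
The difference between a bytes-to-chunks shift and a sectors-to-chunks shift (commit 61a0d80c above) comes down to simple arithmetic. The sketch below assumes 512-byte sectors, that is a sector shift of 9. The helper names are hypothetical and are not taken from bitmap.c.

```c
#include <stdint.h>

#define TOY_SECTOR_SHIFT 9 /* assumption: 512-byte sectors */

/*
 * Old style: the stored shift converts bytes to chunks, so every
 * caller working in sectors has to subtract the sector shift first.
 */
static uint64_t toy_chunk_from_byte_shift(uint64_t sector, unsigned byte_shift)
{
	return sector >> (byte_shift - TOY_SECTOR_SHIFT);
}

/* New style: the stored shift already converts sectors to chunks. */
static uint64_t toy_chunk_from_sector_shift(uint64_t sector, unsigned sector_shift)
{
	return sector >> sector_shift;
}
```

For a 64 KiB chunk the byte shift is 16 and the sector shift is 7, and both helpers map sector 1000 to chunk 7.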

md/bitmap: remove unnecessary indirection when allocating. · 792a1d4b

Authored by NeilBrown on Mar 19, 2012

These functions don't add anything useful except possibly the trace
points, and I don't think they are worth the extra indirection.
So remove them.
Signed-off-by: NeilBrown <neilb@suse.de>

792a1d4b

md/bitmap: remove some pointless locking. · 5a6c824e

Authored by NeilBrown on Mar 19, 2012

There is nothing gained by holding a lock while we check if a pointer
is NULL or not.  If there could be a race, then it could become NULL
immediately after the unlock - but there is no race here.

So just remove the locking.
Signed-off-by: NeilBrown <neilb@suse.de>

5a6c824e

md/bitmap: change a 'goto' to a normal 'if' construct. · 278c1ca2

Authored by NeilBrown on Mar 19, 2012

The use of a goto makes the control flow more obscure here.

So make it a normal:
  if (x) {
     Y;
  }

No functional change.
Signed-off-by: NeilBrown <neilb@suse.de>

278c1ca2

md/bitmap: move printing of bitmap status to bitmap.c · 57148964

Authored by NeilBrown on Mar 19, 2012

The part of /proc/mdstat which describes the bitmap should really
be generated by code in bitmap.c.  So move it there.
Signed-off-by: NeilBrown <neilb@suse.de>

57148964

md/bitmap: remove some unused noise from bitmap.h · 4ba97dff

Authored by NeilBrown on Mar 19, 2012

Signed-off-by: NeilBrown <neilb@suse.de>

4ba97dff

md/raid10 - support resizing some RAID10 arrays. · 006a09a0

Authored by NeilBrown on Mar 19, 2012

'resizing' an array in this context means making use of extra
space that has become available in component devices, not adding new
devices.
It also includes shrinking the array to take up less space of
component devices.

This is not supported for arrays with a 'far' layout.  However
for 'near' and 'offset' layout arrays, adding and removing space at
the end of the devices is easy to support, and this patch provides
that support.
Signed-off-by: NeilBrown <neilb@suse.de>

006a09a0

md/raid1: handle merge_bvec_fn in member devices. · 6b740b8d

Authored by NeilBrown on Mar 19, 2012

Currently we don't honour merge_bvec_fn in member devices so if there
is one, we force all requests to be single-page at most.
This is not ideal.

So create a raid1 merge_bvec_fn to check that function in children
as well.

This introduces a small problem.  There is no locking around calls
to ->merge_bvec_fn and subsequent calls to ->make_request.  So a
device added between these could end up getting a request which
violates its merge_bvec_fn.

Currently the best we can do is synchronize_sched().  This will work
providing no preemption happens.  If there is preemption, we just
have to hope that new devices are largely consistent with old devices.
Signed-off-by: NeilBrown <neilb@suse.de>

6b740b8d

md/raid10: handle merge_bvec_fn in member devices. · 050b6615

Authored by NeilBrown on Mar 19, 2012

Currently we don't honour merge_bvec_fn in member devices so if there
is one, we force all requests to be single-page at most.
This is not ideal.

So enhance the raid10 merge_bvec_fn to check that function in children
as well.

This introduces a small problem.  There is no locking around calls
to ->merge_bvec_fn and subsequent calls to ->make_request.  So a
device added between these could end up getting a request which
violates its merge_bvec_fn.

Currently the best we can do is synchronize_sched().  This will work
providing no preemption happens.  If there is preemption, we just
have to hope that new devices are largely consistent with old devices.
Signed-off-by: NeilBrown <neilb@suse.de>

050b6615

md: add proper merge_bvec handling to RAID0 and Linear. · ba13da47

Authored by NeilBrown on Mar 19, 2012

These personalities currently set a max request size of one page
when any member device has a merge_bvec_fn because they don't
bother to call that function.

This causes extra work in splitting and combining requests.

So make the extra effort to call the merge_bvec_fn when it exists
so that we end up with larger requests out the bottom.
Signed-off-by: NeilBrown <neilb@suse.de>

ba13da47

md: tidy up rdev_for_each usage. · dafb20fa

Authored by NeilBrown on Mar 19, 2012

md.h has an 'rdev_for_each()' macro for iterating the rdevs in an
mddev.  However it uses the 'safe' version of list_for_each_entry,
and so requires the extra variable, but doesn't include 'safe' in the
name, which is useful documentation.

Consequently some places use this safe version without needing it, and
many use an explicit list_for_each_entry.

So:
 - rename rdev_for_each to rdev_for_each_safe
 - create a new rdev_for_each which uses the plain
   list_for_each_entry,
 - use the 'safe' version only where needed, and convert all other
   list_for_each_entry calls to use rdev_for_each.
Signed-off-by: NeilBrown <neilb@suse.de>

dafb20fa
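
Why the 'safe' iterator needs an extra variable (commit dafb20fa above) can be shown with a toy singly linked list. This is only a sketch: `toy_rdev` and the `toy_rdev_for_each*` macros are hypothetical stand-ins, and the kernel's real macros are built on `list_for_each_entry` and `list_for_each_entry_safe` over a doubly linked `list_head`.

```c
#include <stddef.h>

/* Hypothetical stand-in for an rdev on an mddev's device list. */
struct toy_rdev {
	int nr;
	struct toy_rdev *next;
};

/* Plain iteration: fine as long as the body never unlinks 'rdev'. */
#define toy_rdev_for_each(rdev, head) \
	for ((rdev) = (head); (rdev); (rdev) = (rdev)->next)

/* Safe iteration: 'tmp' remembers the successor before the body runs. */
#define toy_rdev_for_each_safe(rdev, tmp, head) \
	for ((rdev) = (head), (tmp) = (rdev) ? (rdev)->next : NULL; \
	     (rdev); \
	     (rdev) = (tmp), (tmp) = (rdev) ? (rdev)->next : NULL)

/* Read-only walk, so the plain variant is enough. */
static int toy_count(struct toy_rdev *head)
{
	struct toy_rdev *rdev;
	int n = 0;

	toy_rdev_for_each(rdev, head)
		n++;
	return n;
}

/* Unlinks every entry while walking, so the safe variant is required. */
static int toy_unlink_all(struct toy_rdev **head)
{
	struct toy_rdev *rdev, *tmp;
	int n = 0;

	toy_rdev_for_each_safe(rdev, tmp, *head) {
		rdev->next = NULL; /* clearing the link would end a plain walk early */
		n++;
	}
	*head = NULL;
	return n;
}
```

With the plain macro, `toy_unlink_all` would stop after the first entry because the link it follows has just been cleared. That is the case the `_safe` suffix is meant to flag.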

md/raid1,raid10: avoid deadlock during resync/recovery. · d6b42dcb

Authored by NeilBrown on Mar 19, 2012

If RAID1 or RAID10 is used under LVM or some other stacking
block device, it is possible to enter a deadlock during
resync or recovery.
This can happen if the upper level block device creates
two requests to the RAID1 or RAID10.  The first request gets
processed, blocks recovery and queues requests for underlying
requests in current->bio_list.  A resync request then starts
which will wait for those requests and block new IO.

But then the second request to the RAID1/10 will be attempted
and it cannot progress until the resync request completes,
which cannot progress until the underlying device requests complete,
which are on a queue behind that second request.

So allow that second request to proceed even though there is
a resync request about to start.

This is suitable for any -stable kernel.

Cc: stable@vger.kernel.org
Reported-by: Ray Morris <support@bettercgi.com>
Tested-by: Ray Morris <support@bettercgi.com>
Signed-off-by: NeilBrown <neilb@suse.de>

d6b42dcb

md/bitmap: ensure to load bitmap when creating via sysfs. · 4474ca42

Authored by NeilBrown on Mar 19, 2012

When commit 69e51b44 (md/bitmap:  separate out loading a bitmap...)
created bitmap_load, it missed calling it after bitmap_create when a
bitmap is created through the sysfs interface.
So if a bitmap is added this way, we don't allocate memory properly
and can crash.

This is suitable for any -stable release since 2.6.35.
Cc: stable@vger.kernel.org
Signed-off-by: NeilBrown <neilb@suse.de>

4474ca42

md: don't set md arrays to readonly on shutdown. · c744a65c

Authored by NeilBrown on Mar 19, 2012

It seems that with recent kernels, writeback can still be happening
while shutdown is happening, and consequently data can be written
after the md reboot notifier switches all arrays to read-only.
This causes a BUG.

So don't switch them to read-only - just mark them clean and
set 'safemode' to '2' which means that immediately after any
write the array will be switched back to 'clean'.

This could result in the shutdown happening when array is marked
dirty, thus forcing a resync on reboot.  However if you reboot
without performing a "sync" first, you get to keep both halves.

This is suitable for any stable kernel (though there might be some
conflicts with obvious fixes in earlier kernels).

Cc: stable@vger.kernel.org
Signed-off-by: NeilBrown <neilb@suse.de>

c744a65c

md: allow re-add to failed arrays. · dc10c643

Authored by NeilBrown on Mar 19, 2012

When an array is failed (some data inaccessible) then there is no
point attempting to add a spare as it could not possibly be recovered.

However there may be value in re-adding a recently removed device.
e.g. if there is a write-intent-bitmap and it is clear, then access
to the data could be restored by this action.

So don't reject a re-add to a failed array for RAID10 and RAID5 (the
only array types that check for a failed array).
Signed-off-by: NeilBrown <neilb@suse.de>

dc10c643

13 Mar, 2012 4 commits

md/raid5: use atomic_dec_return() instead of atomic_dec() and atomic_read(). · 41fe75f6

Authored by majianpeng on Mar 13, 2012

Signed-off-by: majianpeng <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

41fe75f6

md/raid5: removed unused 'added_devices' variable. · 9d4c7d87

Authored by NeilBrown on Mar 13, 2012

commit 908f4fbd removed the last user of this variable,
so we should discard it completely.
Signed-off-by: NeilBrown <neilb@suse.de>

9d4c7d87

md/raid10: remove unnecessary smp_mb() from end_sync_write · 547414d1

Authored by NeilBrown on Mar 13, 2012

Recent commit 4ca40c2c (md/raid10: Allow replacement device ...)
added an smp_mb in end_sync_write.
This was to close a possible race with raid10_remove_disk.
However there is no such race as it is never attempted to remove a
disk while resync (or recovery) is happening.
So the smp_mb is just noise.
Signed-off-by: NeilBrown <neilb@suse.de>

547414d1

md/raid5: make sure reshape_position is cleared on error path. · 1e3fa9bd

Authored by NeilBrown on Mar 13, 2012

Leaving a valid reshape_position value in place could be confusing.
Signed-off-by: NeilBrown <neilb@suse.de>

1e3fa9bd