Commits · 2c810cddc44d6f95cef75df3f07fc0850ff92417 · openeuler / Kernel

21 May, 2012 · 2 commits

md: allow a reshape operation to be reversed. · 2c810cdd

Committed by NeilBrown on May 21, 2012

Currently a reshape operation always progresses from the start
of the array to the end unless the number of devices is being
reduced, in which case it progresses in the opposite direction.

To reverse a partial reshape which changes the number of devices
you can stop the array and re-assemble with the raid-disks numbers
reversed and it will undo.

However for a reshape that does not change the number of devices
it is not possible to reverse the reshape in the middle - you have to
wait until it completes.

So add a 'reshape_direction' attribute which is either 'forwards' or
'backwards' and can be explicitly set when delta_disks is zero.

This will become more important when we allow the data_offset to
change in a reshape.  Then the explicit statement of what direction is
being used will be more useful.

This can be enabled in raid5 trivially as it already supports
reverse reshape and just needs to use a different trigger to request it.
Signed-off-by: NeilBrown <neilb@suse.de>

2c810cdd

md: using GFP_NOIO to allocate bio for flush request · b5e1b8ce

Committed by Shaohua Li on May 21, 2012

A flush request is usually issued in transaction commit code path, so
using GFP_KERNEL to allocate memory for flush request bio falls into
the classic deadlock issue.

This is suitable for any -stable kernel to which it applies as it
avoids a possible deadlock.

Cc: stable@vger.kernel.org
Signed-off-by: Shaohua Li <shli@fusionio.com>
Signed-off-by: NeilBrown <neilb@suse.de>

b5e1b8ce

19 May, 2012 · 1 commit

md/raid10: fix transcription error in calc_sectors conversion. · b0d634d5

Committed by NeilBrown on May 19, 2012

The old code was
		sector_div(stride, fc);
the new code was
		sector_div(size, conf->near_copies);

'size' is right (the stride variable wasn't really needed), but
'fc' means 'far_copies', and that is an important difference.

Signed-off-by: NeilBrown <neilb@suse.de>

b0d634d5

17 May, 2012 · 2 commits

MD: Add del_timer_sync to mddev_suspend (fix nasty panic) · 0d9f4f13

Committed by Jonathan Brassow on May 16, 2012

Use del_timer_sync to remove timer before mddev_suspend finishes.

We don't want a timer going off after an mddev_suspend is called. This is
especially true with device-mapper, since it can call the destructor function
immediately following a suspend. This results in the removal (kfree) of the
structures upon which the timer depends - resulting in a very ugly panic.
Therefore, we add a del_timer_sync to mddev_suspend to prevent this.

Cc: stable@vger.kernel.org
Signed-off-by: NeilBrown <neilb@suse.de>

0d9f4f13

md/raid10: set dev_sectors properly when resizing devices in array. · 6508fdbf

Committed by NeilBrown on May 17, 2012

raid10 stores dev_sectors in 'conf' separately from the one in
'mddev' because it can have a very significant effect on block
addressing and so needs to be updated carefully.

However raid10_resize isn't updating it at all!

To update it correctly, we need to make sure it is a proper
multiple of the chunksize taking various details of the layout
into account.
This calculation is currently done in setup_conf.  So split it
out from there and call it from raid10_resize as well.
Then set conf->dev_sectors properly.
Signed-off-by: NeilBrown <neilb@suse.de>

6508fdbf

04 May, 2012 · 1 commit

md/bitmap: fix calculation of 'chunks' - missing shift. · b16b1b6c

Committed by NeilBrown on May 04, 2012

commit 61a0d80c "md/bitmap: discard CHUNK_BLOCK_SHIFT macro"
replaced CHUNK_BLOCK_RATIO() by the same text that was
replacing CHUNK_BLOCK_SHIFT() - which is clearly wrong.

The result is that 'chunks' is often too small by 1,
which can sometimes result in a crash (not sure how).

So use the correct replacement, and get rid of CHUNK_BLOCK_RATIO
which is no longer used.
Reported-by: Karl Newman <siliconfiend@gmail.com>
Tested-by: Karl Newman <siliconfiend@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

b16b1b6c
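
The off-by-one described in b16b1b6c can be illustrated with a small sketch. It assumes the faulty expression added the shift count itself where `1 << shift` was needed; the function names are hypothetical and are not taken from the bitmap code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of chunks needed to cover 'blocks' when each chunk holds
 * (1 << chunkshift) blocks: a round-up division by a power of two.
 */
static uint64_t chunks_correct(uint64_t blocks, unsigned int chunkshift)
{
	assert(chunkshift < 63);
	return (blocks + (1ULL << chunkshift) - 1) >> chunkshift;
}

/*
 * Assumed shape of the bug: the shift count is added instead of the
 * chunk size, so a partial final chunk is usually not counted.
 */
static uint64_t chunks_missing_shift(uint64_t blocks, unsigned int chunkshift)
{
	assert(chunkshift < 63);
	return (blocks + chunkshift - 1) >> chunkshift;
}
```

With 16 blocks per chunk, 17 blocks need two chunks; the second form reports one.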

24 April, 2012 · 3 commits

md: fix possible corruption of array metadata on shutdown. · 30b8aa91

Committed by NeilBrown on April 24, 2012

commit c744a65c
  md: don't set md arrays to readonly on shutdown.

removed the possibility of a 'BUG' when data is written to an array
that has just been switched to read-only, but also introduced the
possibility that the array metadata could be corrupted.

If, when md_notify_reboot gets the mddev lock, the array is
in a state where it is assembled but hasn't been started (as can
happen if the personality module is not available, or in other unusual
situations), then incorrect metadata will be written out making it
impossible to re-assemble the array.

So only call __md_stop_writes() if the array has actually been
activated.

This patch is needed for any stable kernel which has had the above
commit applied.

Cc: stable@vger.kernel.org
Reported-by: Christoph Nelles <evilazrael@evilazrael.de>
Signed-off-by: NeilBrown <neilb@suse.de>

30b8aa91

md: don't call ->add_disk unless there is good reason. · ed209584

Committed by NeilBrown on April 24, 2012

Commit 7bfec5f3

   md/raid5: If there is a spare and a want_replacement device, start replacement.

caused md_check_recovery to call ->add_disk much more often.
Instead of only when the array is degraded, it is now called whenever
md_check_recovery finds anything useful to do, which includes
updating the metadata for clean<->dirty transition.
This causes unnecessary work, and causes info messages from ->add_disk
to be reported much too often.

So refine md_check_recovery to only do any actual recovery checking
(including ->add_disk) if MD_RECOVERY_NEEDED is set.

This fix is suitable for 3.3.y:

Cc: stable@vger.kernel.org
Reported-by: Jan Ceuleers <jan.ceuleers@computer.org>
Signed-off-by: NeilBrown <neilb@suse.de>

ed209584

DM RAID: Use safe version of rdev_for_each · a9ad8526

Committed by Jonathan Brassow on April 24, 2012

Fix segfault caused by using rdev_for_each instead of rdev_for_each_safe

Commit dafb20fa mistakenly replaced a safe
iterator with an unsafe one when making some macro changes.
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>

a9ad8526
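
Why a "safe" iterator matters in a9ad8526 can be sketched in plain C. This is a stand-alone illustration using a hypothetical singly linked list, not the kernel's list macros: the loop saves the successor before the current node may be freed, which is the property the `_safe` variants provide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct node {
	int id;
	struct node *next;
};

/* Prepend a node and return the new head. */
static struct node *push(struct node *head, int id)
{
	struct node *n = malloc(sizeof(*n));

	assert(n != NULL);
	n->id = id;
	n->next = head;
	return n;
}

/* Free every node with an odd id and return the new head. */
static struct node *remove_odd(struct node *head)
{
	struct node **link = &head;
	struct node *cur, *next;

	for (cur = head; cur != NULL; cur = next) {
		next = cur->next;	/* saved before cur may be freed */
		if (cur->id % 2) {
			*link = next;	/* unlink, then free */
			free(cur);
		} else {
			link = &cur->next;
		}
	}
	return head;
}

/* Count the nodes in the list. */
static int count(const struct node *head)
{
	int n = 0;

	for (; head != NULL; head = head->next)
		n++;
	return n;
}
```

An unsafe loop that advanced with `cur = cur->next` after `free(cur)` would read freed memory, which matches the kind of crash the commit describes.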

12 April, 2012 · 3 commits

md/bitmap: prevent bitmap_daemon_work running while initialising bitmap · afbaa90b

Committed by NeilBrown on April 12, 2012

If a bitmap is added while the array is active, it is possible
for bitmap_daemon_work to run while the bitmap is being
initialised.
This is particularly a problem if bitmap_daemon_work sees
bitmap->filemap as non-NULL before it has been filled in properly.
So hold bitmap_info.mutex while filling in ->filemap
to prevent problems.

This patch is suitable for any -stable kernel, though it might not
apply cleanly before about 3.1.

Cc: stable@vger.kernel.org
Signed-off-by: NeilBrown <neilb@suse.de>

afbaa90b

md/raid1,raid10: Fix calculation of 'vcnt' when processing error recovery. · f4380a91

Committed by majianpeng on April 12, 2012

If r1bio->sectors % 8 != 0, then the memcmp and a later
memcpy will omit the last bio_vec.

This is suitable for any stable kernel since 3.1 when bad-block
management was introduced.

Cc: stable@vger.kernel.org
Signed-off-by: majianpeng <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

f4380a91
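
The 'vcnt' problem in f4380a91 is a truncating division. A minimal sketch, assuming 512-byte sectors and 4 KiB pages (so 8 sectors per page, matching the `% 8` in the message); the macro and function names are illustrative only:

```c
#include <assert.h>

#define SECTOR_SHIFT		9
#define ASSUMED_PAGE_SHIFT	12	/* assumption: 4 KiB pages */
#define SECTORS_PER_PAGE	(1 << (ASSUMED_PAGE_SHIFT - SECTOR_SHIFT))

/* Truncating count: drops a final, partially filled page. */
static int vcnt_truncating(int sectors)
{
	assert(sectors >= 0);
	return sectors >> (ASSUMED_PAGE_SHIFT - SECTOR_SHIFT);
}

/* Rounded-up count: a partial page still needs its own bio_vec. */
static int vcnt_rounded_up(int sectors)
{
	assert(sectors >= 0);
	return (sectors + SECTORS_PER_PAGE - 1) >>
		(ASSUMED_PAGE_SHIFT - SECTOR_SHIFT);
}
```

For 12 sectors the truncating form yields one vector and the rounded-up form yields two, so only the latter covers the last 4 sectors.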

MD: Bitmap version cleanup. · 9e41dd35

Committed by Andrei Warkentin on April 12, 2012

bitmap_new_disk_sb() would still create V3 bitmap superblock
with host-endian layout.

Perhaps I'm confused, but shouldn't bitmap_new_disk_sb() be
creating a V4 bitmap superblock instead, that is portable,
as per comment in bitmap.h?
Signed-off-by: Andrei Warkentin <andrey.warkentin@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

9e41dd35

03 April, 2012 · 5 commits

md/raid1,raid10: don't compare excess byte during consistency check. · 5020ad7d

Committed by NeilBrown on April 02, 2012

When comparing two pages read from different legs of a mirror, only
compare the bytes that were read, not the whole page.

In most cases we read a whole page, but in some cases with
bad blocks or odd-sized devices we might read fewer than that.

This bug has been present "forever" but at worst it might cause
a report of too many mismatches and generate a little bit of
extra resync IO, so there is no need to back-port to -stable
kernels.
Reported-by: majianpeng <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

5020ad7d

md/raid5: Fix a bug about judging if the operation is syncing or replacing · c6d2e084

Committed by majianpeng on April 02, 2012

When a raid5 array is created using assume-clean and 'check' or 'repair'
is then echoed to sync_action, the component disks perform no IO, yet the
raid check/resync runs faster than normal.
This is because of the test in analyse_stripe():
		if (do_recovery ||
		    sh->sector >= conf->mddev->recovery_cp)
			s->syncing = 1;
		else
			s->replacing = 1;
During a check or repair, recovery_cp == MaxSector, so syncing is left
at zero instead of being set to one.

This bug was introduced by commit 9a3e1101
    md/raid5:  detect and handle replacements during recovery.
so this patch is suitable for 3.3-stable.

Cc: stable@vger.kernel.org
Signed-off-by: majianpeng <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

c6d2e084
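
The misclassification reported in c6d2e084 can be reproduced outside the kernel. The sketch below copies only the quoted test into a hypothetical `classify()` helper, and assumes `MaxSector` is the all-ones sector value; it shows the problem and does not reproduce the fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t sector_t;
#define MaxSector (~(sector_t)0)	/* assumed: largest sector value */

struct stripe_state {
	bool syncing;
	bool replacing;
};

/* Stand-alone copy of the test quoted in the commit message. */
static struct stripe_state classify(bool do_recovery, sector_t stripe_sector,
				    sector_t recovery_cp)
{
	struct stripe_state s = { false, false };

	if (do_recovery || stripe_sector >= recovery_cp)
		s.syncing = true;
	else
		s.replacing = true;

	assert(s.syncing != s.replacing);	/* exactly one is set */
	return s;
}
```

On a clean array `recovery_cp` is `MaxSector`, so no ordinary stripe sector can satisfy `>=` and a requested check is treated as a replacement.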

md/raid1:Remove unnecessary rcu_dereference(conf->mirrors[i].rdev). · a42f9d83

Committed by majianpeng on April 02, 2012

Because rdev->nr_pending > 0, this disk cannot be removed.
And in any case, we aren't holding rcu_read_lock().
Signed-off-by: majianpeng <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

a42f9d83

md: Avoid OOPS when reshaping raid1 to raid0 · 24b961f8

Committed by Jes Sorensen on April 01, 2012

raid1 arrays do not have the notion of chunk size. Calculate the
largest chunk sector size we can use to avoid a divide by zero OOPS
when aligning the size of the new array to the chunk size.
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>

24b961f8
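
The chunk-size selection described in 24b961f8 amounts to finding the largest power of two that divides the array size. A minimal sketch; the helper name and the 128-sector (64 KiB) starting point used in the usage note are assumptions, not values stated in the message:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return the largest power-of-two chunk size (in sectors), no larger
 * than max_chunk_sectors, that divides array_sectors exactly.
 */
static unsigned int largest_chunk_sectors(uint64_t array_sectors,
					   unsigned int max_chunk_sectors)
{
	unsigned int chunk = max_chunk_sectors;

	/* The starting point must be a non-zero power of two. */
	assert(chunk != 0 && (chunk & (chunk - 1)) == 0);

	/* Halve until the array is an exact multiple of the chunk. */
	while (chunk > 1 && (array_sectors & (chunk - 1)))
		chunk >>= 1;

	return chunk;
}
```

Called with a 1000-sector array and a 128-sector ceiling this returns 8, and the result is never zero, so a later alignment by chunk size cannot divide by zero.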

md/raid5: fix handling of bad blocks during recovery. · 18b9837e

Committed by NeilBrown on April 01, 2012

1/ We can only treat a known-bad-block like a read-error if we
   have the data that belongs in that block.  So fix that test.

2/ If we cannot recover a stripe due to insufficient data,
   don't tell "md_done_sync" that the sync failed unless we really
   did fail something.  If we successfully record bad blocks,
   that is success.
Reported-by: "majianpeng" <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

18b9837e

02 April, 2012 · 3 commits

md/raid1: If md_integrity_register() fails, run() must free the mem. · 5220ea1e

Committed by majianpeng on April 02, 2012

Signed-off-by: majianpeng <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

5220ea1e

md/raid0: If md_integrity_register() fails, raid0_run() must free the mem. · 0366ef84

Committed by majianpeng on April 02, 2012

Signed-off-by: majianpeng <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

0366ef84

md/linear: If md_integrity_register() fails, linear_run() must free the mem. · 98d5561b

Committed by majianpeng on April 02, 2012

Signed-off-by: majianpeng <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

98d5561b

29 March, 2012 · 20 commits

dm: add verity target · a4ffc152

Committed by Mikulas Patocka on March 28, 2012

This device-mapper target creates a read-only device that transparently
validates the data on one underlying device against a pre-generated tree
of cryptographic checksums stored on a second device.

Two checksum device formats are supported: version 0 which is already
shipping in Chromium OS and version 1 which incorporates some
improvements.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
Signed-off-by: Will Drewry <wad@chromium.org>
Signed-off-by: Elly Jones <ellyjones@chromium.org>
Cc: Milan Broz <mbroz@redhat.com>
Cc: Olof Johansson <olofj@chromium.org>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

a4ffc152

dm bufio: prefetch · a66cc28f

Committed by Mikulas Patocka on March 28, 2012

This patch introduces a new function dm_bufio_prefetch. It prefetches
the specified range of blocks into dm-bufio cache without waiting
for i/o completion.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

a66cc28f

dm thin: add pool target flags to control discard · 67e2e2b2

Committed by Joe Thornber on March 28, 2012

Add dm thin target arguments to control discard support.

ignore_discard: Disables discard support

no_discard_passdown: Don't pass discards down to the underlying data
device, but just remove the mapping within the thin provisioning target.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

67e2e2b2

dm thin: support discards · 104655fd

Committed by Joe Thornber on March 28, 2012

Support discards in the thin target.

On discard the corresponding mapping(s) are removed from the thin
device.  If the associated block(s) are no longer shared the discard
is passed to the underlying device.

All bios other than discards now have an associated deferred_entry
that is saved to the 'all_io_entry' in endio_hook.  When non-discard
IO completes and associated mappings are quiesced any discards that
were deferred, via ds_add_work() in process_discard(), will be queued
for processing by the worker thread.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

drivers/md/dm-thin.c |  173 ++++++++++++++++++++++++++++++++++++++++++++++----
 drivers/md/dm-thin.c |  172 ++++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 158 insertions(+), 14 deletions(-)

104655fd

dm thin: prepare to support discard · eb2aa48d

Committed by Joe Thornber on March 28, 2012

This patch contains the ground work needed for dm-thin to support discard.

  - Adds endio function that replaces shared_read_endio.

  - Introduce an explicit 'quiesced' flag into the new_mapping structure.
    Before, this was implicitly indicated by m->list being empty.

  - The map_info->ptr remains constant for the duration of a bio's trip
    through the thin target.  Make it easier to reason about it.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

eb2aa48d

dm thin: use dm_target_offset · 6efd6e83

Committed by Alasdair G Kergon on March 28, 2012

Use dm_target_offset wrapper instead of referencing the awkward ti->begin
explicitly.
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

6efd6e83

dm thin: support read only external snapshot origins · 2dd9c257

Committed by Joe Thornber on March 28, 2012

Support the use of an external _read only_ device as an origin for a thin
device.

Any read to an unprovisioned area of the thin device will be passed
through to the origin.  Writes trigger allocation of new blocks as
usual.

One possible use case for this would be VM hosts that want to run
guests on thinly-provisioned volumes but have the base image on another
device (possibly shared between many VMs).
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

2dd9c257

dm thin: relax hard limit on the maximum size of a metadata device · c4a69ecd

Committed by Mike Snitzer on March 28, 2012

The thin metadata format can only make use of a device that is <=
THIN_METADATA_MAX_SECTORS (currently 15.9375 GB).  Therefore, there is no
practical benefit to using a larger device.

However, it may be that other factors impose a certain granularity for
the space that is allocated to a device (E.g. lvm2 can impose a coarse
granularity through the use of large, >= 1 GB, physical extents).

Rather than reject a larger metadata device, during thin-pool device
construction, switch to allowing it but issue a warning if a device
larger than THIN_METADATA_MAX_SECTORS_WARNING (16 GB) is
provided.  Any space over 15.9375 GB will not be used.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

c4a69ecd

dm persistent data: remove space map ref_count entries if redundant · 71fd5ae2

Committed by Joe Thornber on March 28, 2012

Save space by removing entries from the space map ref_count tree if
they're no longer needed.

Ref counts are stored in two places: a bitmap if the ref_count is
below 3, or a btree of uint32_t if 3 or above.

When a ref_count that was 3 or above drops below 3 we can remove it from
the tree and save some metadata space.  This removal was commented out
before because I was unsure why this was causing under-populated btree
nodes.  Earlier patches have fixed this issue.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

71fd5ae2
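
The two-tier reference counting described in 71fd5ae2 can be modelled with plain arrays. In this sketch a byte array stands in for the 2-bit bitmap entries and a second array stands in for the btree; every name and the fixed block count are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NR_BLOCKS	64
#define OVERFLOW_MARK	3u	/* small value meaning "look in the overflow store" */

struct space_map {
	uint8_t small[NR_BLOCKS];		/* 0..3, stand-in for the bitmap */
	uint32_t overflow[NR_BLOCKS];		/* stand-in for the btree values */
	uint8_t has_overflow[NR_BLOCKS];	/* 1 if an overflow entry exists */
	unsigned int nr_overflow;		/* live overflow entries */
};

static uint32_t sm_get(const struct space_map *sm, unsigned int b)
{
	assert(b < NR_BLOCKS);
	return sm->small[b] < OVERFLOW_MARK ? sm->small[b] : sm->overflow[b];
}

static void sm_set(struct space_map *sm, unsigned int b, uint32_t count)
{
	assert(b < NR_BLOCKS);
	if (count < OVERFLOW_MARK) {
		sm->small[b] = (uint8_t)count;
		/* The overflow entry is now redundant, so drop it. */
		if (sm->has_overflow[b]) {
			sm->has_overflow[b] = 0;
			sm->nr_overflow--;
		}
	} else {
		sm->small[b] = OVERFLOW_MARK;
		if (!sm->has_overflow[b]) {
			sm->has_overflow[b] = 1;
			sm->nr_overflow++;
		}
		sm->overflow[b] = count;
	}
}
```

Raising a count to 7 creates one overflow entry; lowering it back to 1 removes that entry again, which is the space saving the commit enables.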

dm thin: commit outstanding data every second · 905e51b3

Committed by Joe Thornber on March 28, 2012

Commit unwritten data every second to prevent too much building up.

Released blocks don't become available until after the next commit
(for crash resilience).  Prior to this patch commits were only
triggered by a message to the target or a REQ_{FLUSH,FUA} bio.  This
allowed far too big a position to build up.

The interval is hard-coded to 1 second.  This is a sensible setting.
I'm not making this user configurable, since there isn't much to be
gained by tweaking this - and a lot lost by setting it far too high.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

905e51b3

dm: reject trailing characters in sscanf input · 31998ef1

Committed by Mikulas Patocka on March 28, 2012

Device mapper uses sscanf to convert arguments to numbers. The problem is that
the way we use it ignores additional unmatched characters in the scanned string.

For example, this `if (sscanf(string, "%d", &number) == 1)' will match a number,
but also it will match number with some garbage appended, like "123abc".

As a result, device mapper accepts garbage after some numbers. For example
the command `dmsetup create vg1-new --table "0 16384 linear 254:1bla 34816bla"'
will pass without an error.

This patch fixes all sscanf uses in device mapper. It appends "%c" with
a pointer to a dummy character variable to every sscanf statement.

The construct `if (sscanf(string, "%d%c", &number, &dummy) == 1)' succeeds
only if string is a null-terminated number (optionally preceded by some
whitespace characters). If there is some character appended after the number,
sscanf matches "%c", writes the character to the dummy variable and returns 2.
We check the return value for 1 and consequently reject numbers with some
garbage appended.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

31998ef1
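
The `"%d%c"` idiom from 31998ef1 works the same way with the userspace `sscanf`, so it can be tried outside the kernel. The two helpers below are hypothetical wrappers written for this illustration, not device-mapper functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Lenient form: "123abc" is accepted and parsed as 123. */
static bool parse_int_lenient(const char *s, int *out)
{
	assert(s != NULL && out != NULL);
	return sscanf(s, "%d", out) == 1;
}

/*
 * Strict form: the extra "%c" only matches if something follows the
 * number, in which case sscanf returns 2 and the input is rejected.
 */
static bool parse_int_strict(const char *s, int *out)
{
	char dummy;

	assert(s != NULL && out != NULL);
	return sscanf(s, "%d%c", out, &dummy) == 1;
}
```

Because `%c` does not skip whitespace, the strict form also rejects a trailing space or newline, so callers should pass a trimmed string.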

dm raid: handle failed devices during start up · 0447568f

Committed by Jonathan E Brassow on March 28, 2012

The dm-raid code currently fails to create a RAID array if any of the
superblocks cannot be read.  This was an oversight as there is already
code to handle this case if the values ('- -') were provided for the
failed array position.

With this patch, if a superblock cannot be read, the array position's
fields are initialized as though '- -' was set in the table.  That is,
the device is failed and the position should not be used, but if there
is sufficient redundancy, the array should still be activated.
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

0447568f

dm thin metadata: pass correct space map to dm_sm_root_size · fef838cc

Committed by Joe Thornber on March 28, 2012

Fix a harmless typo.

The root is a chunk of data that gets written to the superblock.  This
data is used to recreate the space map when opening a metadata area.
We have two space maps; one tracking space on the metadata device and
one of the data device.  Both of these use the same format for their
root, so this typo was harmless.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

fef838cc

dm persistent data: remove redundant value_size arg from value_ptr · a3aefb39

Committed by Joe Thornber on March 28, 2012

Now that the value_size is held within every node of the btrees we can
remove this argument from value_ptr().

For the last few months a BUG_ON has been checking this argument is
the same as that held in the node.  No issues were reported.  So this
is a safe change.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

a3aefb39

dm mpath: detect invalid map_context · 466891f9

Committed by Jun'ichi Nomura on March 28, 2012

The map_context pointer should always be set. However, we have reports
that upon requeuing it is not set correctly.  So add set and clear
functions with a BUG_ON() to track the issue properly.
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Acked-by: Hannes Reinecke <hare@suse.de>
Tested-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

466891f9

dm: clear bi_end_io on remapping failure · 4d7b38b7

Committed by Hannes Reinecke on March 28, 2012

As a precaution, set bi_end_io to NULL when failing to remap.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

4d7b38b7

dm table: simplify call to free_devices · 574ce07e

Committed by Hannes Reinecke on March 28, 2012

free_devices in dm_table.c already uses list_for_each(), so we don't
need to check if the list is empty.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

574ce07e

dm thin: correct comments · fe878f34

Committed by Joe Thornber on March 28, 2012

Remove documentation for unimplemented 'trim' message.

I'd planned a 'trim' target message for shrinking thin devices, but
this is better handled via the discard ioctl.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

fe878f34

dm raid: no longer experimental · 035220b3

Committed by Alasdair G Kergon on March 28, 2012

The dm raid module (using md) is becoming the preferred way of creating long-lived
mirrors through userspace LVM so remove the EXPERIMENTAL tag.
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

035220b3

dm uevent: no longer experimental · e0b215da

Committed by Alasdair G Kergon on March 28, 2012

Drop EXPERIMENTAL tag from dm-uevent.

It's not changed for a while and some userspace tools are relying upon it.
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

e0b215da

openeuler / Kernel · synced successfully more than 1 year ago
