Commits · bf07bb7d5be813630d3530be274b3324f85e310c · openeuler / raspberrypi-kernel

22 May, 2012 8 commits

md/bitmap: disentangle two different 'pending' flags. · bf07bb7d

Committed by NeilBrown on May 22, 2012

There are two different 'pending' concepts in the handling of the
write intent bitmap.

Firstly, a 'page' from the bitmap (which contains PAGE_SIZE*8 bits)
may have changes (bits cleared) that should be written in due course.
There is no hurry for these and the page will transition from
PENDING to NEEDWRITE and will then be written, though if it ever
becomes DIRTY it will be written much sooner and PENDING will be
cleared.

Secondly, a page of counters - which contains PAGE_SIZE/2 counters, one
for each bit, can usefully have a 'pending' flag which indicates if
any of the counters are low (2 or 1) and ready to be processed by
bitmap_daemon_work().  If this flag is clear we can skip the whole
page.

These two concepts are currently combined in the bitmap-file flag.
This causes a tighter connection between the counters and the bitmap
file than I would like - as I want to add some flexibility to the
bitmap file.

So introduce a new flag with the page-of-counters, and rewrite
bitmap_daemon_work() so that it handles the two different 'pending'
concepts separately.

This also allows us to clear BITMAP_PAGE_PENDING when we write out
a dirty page, which may occasionally reduce the number of times we
write a page.

Signed-off-by: NeilBrown <neilb@suse.de>

bf07bb7d
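
The per-page 'pending' flag for counters described in bf07bb7d can be sketched in userspace C. This is an illustration only, not the kernel code: `struct counter_page`, `COUNTERS_PER_PAGE` (kept tiny here, where the commit speaks of PAGE_SIZE/2 counters) and `daemon_pass()` are hypothetical names, and the ageing rule (2 becomes 1, 1 becomes 0) is an assumption drawn from the phrase "low (2 or 1)".

```c
#include <stdbool.h>
#include <stddef.h>

#define COUNTERS_PER_PAGE 8 /* tiny for illustration only */

/* Hypothetical page of counters with its own 'pending' flag. */
struct counter_page {
    bool pending; /* at least one counter is low (2 or 1) */
    unsigned short count[COUNTERS_PER_PAGE];
};

/*
 * One pass of a daemon-like scan.  Pages whose flag is clear are
 * skipped entirely.  Returns the number of pages actually scanned.
 */
size_t daemon_pass(struct counter_page *pages, size_t npages)
{
    size_t scanned = 0;

    for (size_t p = 0; p < npages; p++) {
        if (!pages[p].pending)
            continue; /* nothing low here: skip the whole page */

        pages[p].pending = false;
        scanned++;

        for (size_t i = 0; i < COUNTERS_PER_PAGE; i++) {
            unsigned short *c = &pages[p].count[i];

            if (*c == 2) {
                *c = 1;
                pages[p].pending = true; /* needs one more pass */
            } else if (*c == 1) {
                *c = 0; /* this is where a bit would be cleared */
            }
        }
    }
    return scanned;
}
```

Keeping this flag with the counters, rather than sharing a bitmap-file flag, means the scan above needs no knowledge of how or when the bitmap pages are written.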

raid5: support sync request · bc0934f0

Committed by Shaohua Li on May 22, 2012

REQ_SYNC is ignored in the current raid5 code. The block layer does use it
for policy decisions, for example in the I/O scheduler. This patch adds
support for it.

Signed-off-by: Shaohua Li <shli@fusionio.com>
Signed-off-by: NeilBrown <neilb@suse.de>

bc0934f0

raid5: remove unused variables · cceeca43

Committed by Shaohua Li on May 22, 2012

The two variables are useless.

Signed-off-by: Shaohua Li <shli@fusionio.com>
Signed-off-by: NeilBrown <neilb@suse.de>

cceeca43

md/raid10: Fix memleak in r10buf_pool_alloc · 5fdd2cf8

Committed by majianpeng on May 22, 2012

If the allocation of rep1_bio fails, we currently don't free the 'bio'
of the same dev.

Reported by kmemleak.

Signed-off-by: majianpeng <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

5fdd2cf8

md/raid1: allow fix_read_error to read from recovering device. · da8840a7

Committed by majianpeng on May 22, 2012

When attempting to fix a read error, it is acceptable to read from a
device that is recovering, provided the recovery has got past the
place we are reading from.  This makes the test for "can we read from
here" the same as the test in read_balance.

Signed-off-by: majianpeng <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

da8840a7
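
The "can we read from here" test described in da8840a7 can be sketched in plain C. This is a minimal userspace illustration under stated assumptions, not the kernel code: `struct dev_state`, its fields and `can_read_from()` are hypothetical names, and a recovering device is assumed to hold valid data for every sector below its recovery offset.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of one mirror leg. */
struct dev_state {
    bool in_sync;             /* device is fully up to date */
    bool faulty;              /* device has failed */
    uint64_t recovery_offset; /* sectors below this are recovered */
};

/* May 'len' sectors starting at 'sector' be read from this device? */
bool can_read_from(const struct dev_state *dev, uint64_t sector, uint64_t len)
{
    if (dev->faulty)
        return false;
    if (dev->in_sync)
        return true;
    /* Recovering: usable only if recovery has passed the whole range. */
    return dev->recovery_offset >= sector + len;
}
```

With a recovery offset of 100, a read of sectors 0..99 is allowed while a read that ends at sector 101 is not.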

md: move freeing of badblocks.page into md_rdev_clear · 4fa2f327

Committed by NeilBrown on May 22, 2012

This ensures that it is always freed - there were cases where
we failed to free the page.

Reported-by: majianpeng <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

4fa2f327

md: dm-raid should call helper function to clear rdev. · 545c8795

Committed by NeilBrown on May 22, 2012

dm-raid currently open-codes the freeing of some members of
an rdev.  It is more maintainable to have it call common code
from md.c which does this for all call-sites.

So rename free_disk_sb to md_rdev_clear, export it, and use it in
dm-raid.c.

Signed-off-by: NeilBrown <neilb@suse.de>

545c8795

md/raid10: add reshape support · 3ea7daa5

Committed by NeilBrown on May 22, 2012

A 'near' or 'offset' layout RAID10 array can be reshaped to a different
'near' or 'offset' layout, a different chunk size, and a different
number of devices.
However the number of copies cannot change.

Unlike RAID5/6, we do not support having user-space backup data that
is being relocated during a 'critical section'.  Rather, the
data_offset of each device must change so that when writing any block
to a new location, it will not over-write any data that is still
'live'.

This means that RAID10 reshape is not supportable on v0.90 metadata.

The difference between the old data_offset and the new_offset must be
at least the larger of two values: the chunksize multiplied by offset
copies for the old layout, and the same product for the new layout
(for 'near' mode, offset_copies == 1).

A larger difference of around 64M seems useful for in-place reshapes
as more data can be moved between metadata updates.
Very large differences (e.g. 512M) seem to slow the process down due
to lots of long seeks (on oldish consumer-grade devices at least).

Metadata needs to be updated whenever the place we are about to write
to is considered - by the current metadata - to still contain data in
the old layout.

[unbalanced locking fix from Dan Carpenter <dan.carpenter@oracle.com>]
Signed-off-by: NeilBrown <neilb@suse.de>

3ea7daa5
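
The minimum data_offset difference stated in 3ea7daa5 can be worked through as a small calculation. The sketch below follows one reading of the message (the larger of chunk size times offset copies, taken over the old and the new layout); `struct layout` and `min_offset_delta()` are hypothetical names, and sizes are assumed to be in sectors.

```c
#include <stdint.h>

/* Hypothetical description of one RAID10 layout. */
struct layout {
    uint64_t chunk_sectors; /* chunk size in sectors */
    unsigned offset_copies; /* 1 for 'near' mode */
};

/* Smallest acceptable gap between old and new data offsets, in sectors. */
uint64_t min_offset_delta(const struct layout *old_l,
                          const struct layout *new_l)
{
    uint64_t a = old_l->chunk_sectors * old_l->offset_copies;
    uint64_t b = new_l->chunk_sectors * new_l->offset_copies;

    return a > b ? a : b;
}
```

For example, going from a 'near' layout with 128-sector chunks to an 'offset' layout with 1024-sector chunks and 2 offset copies gives a minimum of 2048 sectors; the message suggests a much larger gap (around 64M) is useful in practice.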

21 May, 2012 10 commits

md/raid10: split out interpretation of layout to separate function. · deb200d0

Committed by NeilBrown on May 21, 2012

We will soon be interpreting the layout (and chunksize etc) from
multiple places to support reshape.  So split it out into a separate
function.

Signed-off-by: NeilBrown <neilb@suse.de>

deb200d0

md/raid10: Introduce 'prev' geometry to support reshape. · f8c9e74f

Committed by NeilBrown on May 21, 2012

When RAID10 supports reshape it will need a 'previous' and a 'current'
geometry, so introduce that here.
Use the 'prev' geometry when working before the reshape_position, and the
current 'geo' when beyond it.  At other times, use both as
appropriate.

For now, both are identical (and reshape_position is never set).

When we use the 'prev' geometry, we must use the old data_offset.
When we use the current geometry (and a reshape is happening) we must use
the new_data_offset.

Signed-off-by: NeilBrown <neilb@suse.de>

f8c9e74f

md: use resync_max_sectors for reshape as well as resync. · c804cdec

Committed by NeilBrown on May 21, 2012

Some resync type operations need to act on the address space of the
device, others on the address space of the array.

This only affects RAID10, so it sets resync_max_sectors to the array
size (it defaults to the device size), and that is currently used for
resync only.  However reshape of a RAID10 must be done against the
array size, not device size, so change code to use resync_max_sectors
for both the resync and the reshape cases.
This does not affect RAID5 or RAID1, just RAID10.

Signed-off-by: NeilBrown <neilb@suse.de>

c804cdec

md: teach sync_page_io about new_data_offset. · 1fdd6fc9

Committed by NeilBrown on May 21, 2012

Some code in raid1 and raid10 uses sync_page_io to
read/write pages when responding to read errors.
As we will shortly support changing data_offset for
raid10, this function must understand new_data_offset.

So add that understanding.

Signed-off-by: NeilBrown <neilb@suse.de>

1fdd6fc9

md/raid10: collect some geometry fields into a dedicated structure. · 5cf00fcd

Committed by NeilBrown on May 21, 2012

We will shortly be adding reshape support for RAID10 which will
require it to have two concurrent geometries (before and after).
To make that easier, collect most geometry fields into 'struct geom'
and access them from there.  Then we will more easily be able to add
a second set of fields.

Note that 'copies' is not in this struct and so cannot be changed.
There is little need to change this number and doing so is a lot
more difficult as it requires reallocating more things.
So leave it out for now.

Signed-off-by: NeilBrown <neilb@suse.de>

5cf00fcd

md/raid5: allow for change in data_offset while managing a reshape. · b5254dd5

Committed by NeilBrown on May 21, 2012

The important issue here is incorporating the difference in data_offset
into calculations concerning when we might need to over-write data
that is still thought to be valid.

To this end we find the minimum offset difference across all devices
and add that where appropriate.

Signed-off-by: NeilBrown <neilb@suse.de>

b5254dd5

md/raid5: Use correct data_offset for all IO. · 05616be5

Committed by NeilBrown on May 21, 2012

As there can now be two different data_offsets - an 'old' and
a 'new' - we need to carefully choose between them.

Signed-off-by: NeilBrown <neilb@suse.de>

05616be5

md: add possibility to change data-offset for devices. · c6563a8c

Committed by NeilBrown on May 21, 2012

When reshaping we can avoid costly intermediate backup by
changing the 'start' address of the array on the device
(if there is enough room).

So as a first step, allow such a change to be requested
through sysfs, and recorded in v1.x metadata.

(As we didn't previously check that all 'pad' fields were zero,
 we need a new FEATURE flag for this.
 We now (belatedly) check that all remaining 'pad' fields are
 zero to avoid a repeat of this.)

The new data offset must be requested separately for each device.
This allows each to have a different change in the data offset.
This is not likely to be used often but as data_offset can be
set per-device, new_data_offset should be too.

This patch also removes the 'acknowledged' arg to rdev_set_badblocks as
it is never used and never will be.  At the same time we add a new
arg ('in_new') which is currently always zero but will be used more
soon.

When a reshape finishes we will need to update the data_offset
and rdev->sectors.  So provide an exported function to do that.

Signed-off-by: NeilBrown <neilb@suse.de>

c6563a8c

md: allow a reshape operation to be reversed. · 2c810cdd

Committed by NeilBrown on May 21, 2012

Currently a reshape operation always progresses from the start
of the array to the end unless the number of devices is being
reduced, in which case it progresses in the opposite direction.

To reverse a partial reshape which changes the number of devices
you can stop the array and re-assemble with the raid-disks numbers
reversed and it will undo.

However for a reshape that does not change the number of devices
it is not possible to reverse the reshape in the middle - you have to
wait until it completes.

So add a 'reshape_direction' attribute which is either 'forwards' or
'backwards' and can be explicitly set when delta_disks is zero.

This will become more important when we allow the data_offset to
change in a reshape.  Then the explicit statement of what direction is
being used will be more useful.

This can be enabled in raid5 trivially as it already supports
reverse reshape and just needs to use a different trigger to request it.

Signed-off-by: NeilBrown <neilb@suse.de>

2c810cdd

md: using GFP_NOIO to allocate bio for flush request · b5e1b8ce

Committed by Shaohua Li on May 21, 2012

A flush request is usually issued in the transaction commit code path, so
using GFP_KERNEL to allocate memory for the flush request bio falls into
the classic deadlock issue.

This is suitable for any -stable kernel to which it applies as it
avoids a possible deadlock.

Cc: stable@vger.kernel.org
Signed-off-by: Shaohua Li <shli@fusionio.com>
Signed-off-by: NeilBrown <neilb@suse.de>

b5e1b8ce

19 May, 2012 1 commit

md/raid10: fix transcription error in calc_sectors conversion. · b0d634d5

Committed by NeilBrown on May 19, 2012

The old code was
		sector_div(stride, fc);
the new code was
		sector_div(size, conf->near_copies);

'size' is right (the stride variable wasn't really needed), but
'fc' means 'far_copies', and that is an important difference.

Signed-off-by: NeilBrown <neilb@suse.de>

b0d634d5

17 May, 2012 2 commits

MD: Add del_timer_sync to mddev_suspend (fix nasty panic) · 0d9f4f13

Committed by Jonathan Brassow on May 16, 2012

Use del_timer_sync to remove timer before mddev_suspend finishes.

We don't want a timer going off after an mddev_suspend is called. This is
especially true with device-mapper, since it can call the destructor function
immediately following a suspend. This results in the removal (kfree) of the
structures upon which the timer depends - resulting in a very ugly panic.
Therefore, we add a del_timer_sync to mddev_suspend to prevent this.

Cc: stable@vger.kernel.org
Signed-off-by: NeilBrown <neilb@suse.de>

0d9f4f13

md/raid10: set dev_sectors properly when resizing devices in array. · 6508fdbf

Committed by NeilBrown on May 17, 2012

raid10 stores dev_sectors in 'conf' separately from the one in
'mddev' because it can have a very significant effect on block
addressing and so needs to be updated carefully.

However raid10_resize isn't updating it at all!

To update it correctly, we need to make sure it is a proper
multiple of the chunksize, taking various details of the layout
into account.
This calculation is currently done in setup_conf.  So split it
out from there and call it from raid10_resize as well.
Then set conf->dev_sectors properly.

Signed-off-by: NeilBrown <neilb@suse.de>

6508fdbf

04 May, 2012 1 commit

md/bitmap: fix calculation of 'chunks' - missing shift. · b16b1b6c

Committed by NeilBrown on May 04, 2012

commit 61a0d80c "md/bitmap: discard CHUNK_BLOCK_SHIFT macro"
replaced CHUNK_BLOCK_RATIO() by the same text that was
replacing CHUNK_BLOCK_SHIFT() - which is clearly wrong.

The result is that 'chunks' is often too small by 1,
which can sometimes result in a crash (not sure how).

So use the correct replacement, and get rid of CHUNK_BLOCK_RATIO
which is no longer used.

Reported-by: Karl Newman <siliconfiend@gmail.com>
Tested-by: Karl Newman <siliconfiend@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

b16b1b6c
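
The "too small by 1" effect described in b16b1b6c can be reproduced with a round-up division by a power of two. The commit message does not show the code, so the sketch below is an assumed form of the slip: a shift count used where the chunk size (1 << shift) was meant. `chunks_needed()` and `chunks_missing_shift()` are hypothetical names.

```c
#include <stdint.h>

/*
 * Number of chunks needed to cover 'blocks' sectors when each chunk
 * spans (1 << chunkshift) sectors.  Rounds up.
 */
uint64_t chunks_needed(uint64_t blocks, unsigned chunkshift)
{
    return (blocks + ((uint64_t)1 << chunkshift) - 1) >> chunkshift;
}

/*
 * Assumed shape of the slip: the shift count itself is added instead
 * of the chunk size it stands for, so the round-up is far too weak.
 */
uint64_t chunks_missing_shift(uint64_t blocks, unsigned chunkshift)
{
    return (blocks + chunkshift - 1) >> chunkshift;
}
```

With 1025 sectors and a shift of 10, the first function yields 2 chunks and the second yields 1, so the final partial chunk is left uncovered.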

24 April, 2012 3 commits

md: fix possible corruption of array metadata on shutdown. · 30b8aa91

Committed by NeilBrown on April 24, 2012

commit c744a65c
  md: don't set md arrays to readonly on shutdown.

removed the possibility of a 'BUG' when data is written to an array
that has just been switched to read-only, but also introduced the
possibility that the array metadata could be corrupted.

If, when md_notify_reboot gets the mddev lock, the array is
in a state where it is assembled but hasn't been started (as can
happen if the personality module is not available, or in other unusual
situations), then incorrect metadata will be written out making it
impossible to re-assemble the array.

So only call __md_stop_writes() if the array has actually been
activated.

This patch is needed for any stable kernel which has had the above
commit applied.

Cc: stable@vger.kernel.org
Reported-by: Christoph Nelles <evilazrael@evilazrael.de>
Signed-off-by: NeilBrown <neilb@suse.de>

30b8aa91

md: don't call ->add_disk unless there is good reason. · ed209584

Committed by NeilBrown on April 24, 2012

Commit 7bfec5f3

   md/raid5: If there is a spare and a want_replacement device, start replacement.

caused md_check_recovery to call ->add_disk much more often.
Instead of only when the array is degraded, it is now called whenever
md_check_recovery finds anything useful to do, which includes
updating the metadata for clean<->dirty transition.
This causes unnecessary work, and causes info messages from ->add_disk
to be reported much too often.

So refine md_check_recovery to only do any actual recovery checking
(including ->add_disk) if MD_RECOVERY_NEEDED is set.

This fix is suitable for 3.3.y:

Cc: stable@vger.kernel.org
Reported-by: Jan Ceuleers <jan.ceuleers@computer.org>
Signed-off-by: NeilBrown <neilb@suse.de>

ed209584

DM RAID: Use safe version of rdev_for_each · a9ad8526

Committed by Jonathan Brassow on April 24, 2012

Fix segfault caused by using rdev_for_each instead of rdev_for_each_safe.

Commit dafb20fa mistakenly replaced a safe
iterator with an unsafe one when making some macro changes.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>

a9ad8526
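
Why a "safe" iterator matters in a9ad8526 can be shown with an ordinary linked list in userspace C. This is a generic illustration of the technique, not the kernel macros: `struct node`, `push()`, `remove_negative()` and `list_length()` are hypothetical names. The point is that the successor is read before the current element may be freed.

```c
#include <stdlib.h>

struct node {
    int value;
    struct node *next;
};

/* Push a value on the front; returns the new head (old head on failure). */
struct node *push(struct node *head, int value)
{
    struct node *n = malloc(sizeof(*n));

    if (!n)
        return head;
    n->value = value;
    n->next = head;
    return n;
}

/* Remove and free every node with a negative value; returns the new head. */
struct node *remove_negative(struct node *head)
{
    struct node **link = &head;
    struct node *cur, *next;

    for (cur = head; cur; cur = next) {
        next = cur->next; /* saved first: cur may be freed below */
        if (cur->value < 0) {
            *link = next;
            free(cur);
        } else {
            link = &cur->next;
        }
    }
    return head;
}

/* Count the nodes in the list. */
int list_length(const struct node *head)
{
    int n = 0;

    for (; head; head = head->next)
        n++;
    return n;
}
```

An unsafe loop would evaluate `cur->next` after `free(cur)`, which is the kind of use-after-free that can surface as a segfault.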

12 April, 2012 3 commits

md/bitmap: prevent bitmap_daemon_work running while initialising bitmap · afbaa90b

Committed by NeilBrown on April 12, 2012

If a bitmap is added while the array is active, it is possible
for bitmap_daemon_work to run while the bitmap is being
initialised.
This is particularly a problem if bitmap_daemon_work sees
bitmap->filemap as non-NULL before it has been filled in properly.
So hold bitmap_info.mutex while filling in ->filemap
to prevent problems.

This patch is suitable for any -stable kernel, though it might not
apply cleanly before about 3.1.

Cc: stable@vger.kernel.org
Signed-off-by: NeilBrown <neilb@suse.de>

afbaa90b

md/raid1,raid10: Fix calculation of 'vcnt' when processing error recovery. · f4380a91

Committed by majianpeng on April 12, 2012

If r1bio->sectors % 8 != 0, then the memcmp and a later
memcpy will omit the last bio_vec.

This is suitable for any stable kernel since 3.1 when bad-block
management was introduced.

Cc: stable@vger.kernel.org
Signed-off-by: majianpeng <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

f4380a91
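
The arithmetic behind f4380a91 is a truncating versus a rounding-up division. The sketch below assumes 512-byte sectors and 4096-byte pages (hence 8 sectors per page, matching the `% 8` in the message); the function names and the two `_ASSUMED` constants are hypothetical and this is not the kernel code.

```c
#define SECTOR_SHIFT_ASSUMED 9 /* 512-byte sectors (assumption) */
#define PAGE_SHIFT_ASSUMED 12  /* 4096-byte pages (assumption) */

/* Truncating count: a trailing partial page is silently dropped. */
unsigned pages_truncated(unsigned sectors)
{
    return sectors >> (PAGE_SHIFT_ASSUMED - SECTOR_SHIFT_ASSUMED);
}

/* Rounded-up count: a trailing partial page is included. */
unsigned pages_rounded_up(unsigned sectors)
{
    unsigned shift = PAGE_SHIFT_ASSUMED - SECTOR_SHIFT_ASSUMED;
    unsigned per_page = 1u << shift;

    return (sectors + per_page - 1) >> shift;
}
```

For a 12-sector request the truncating form gives 1 page and the rounded-up form gives 2, so only the latter covers the final 4 sectors.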

MD: Bitmap version cleanup. · 9e41dd35

Committed by Andrei Warkentin on April 12, 2012

bitmap_new_disk_sb() would still create V3 bitmap superblock
with host-endian layout.

Perhaps I'm confused, but shouldn't bitmap_new_disk_sb() be
creating a V4 bitmap superblock instead, that is portable,
as per comment in bitmap.h?

Signed-off-by: Andrei Warkentin <andrey.warkentin@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

9e41dd35

03 April, 2012 5 commits

md/raid1,raid10: don't compare excess byte during consistency check. · 5020ad7d

Committed by NeilBrown on April 02, 2012

When comparing two pages read from different legs of a mirror, only
compare the bytes that were read, not the whole page.

In most cases we read a whole page, but in some cases with
bad blocks or odd-sized devices we might read fewer than that.

This bug has been present "forever" but at worst it might cause
a report of too many mismatches and generate a little
extra resync IO, so there is no need to back-port to -stable
kernels.

Reported-by: majianpeng <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

5020ad7d
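
The idea in 5020ad7d, comparing only the bytes that were actually read, can be sketched in userspace C. `PAGE_BYTES` (an assumed 4096-byte page) and `pages_match()` are hypothetical names; the kernel code is not shown in the message and is not reproduced here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define PAGE_BYTES 4096 /* assumed page size */

/*
 * True if the first 'bytes_read' bytes of two page buffers are equal.
 * Anything beyond that was never filled by the read and is ignored.
 */
bool pages_match(const unsigned char *a, const unsigned char *b,
                 size_t bytes_read)
{
    if (bytes_read > PAGE_BYTES)
        bytes_read = PAGE_BYTES;
    return memcmp(a, b, bytes_read) == 0;
}
```

If only 2048 bytes were read, stale data at byte 3000 of one buffer no longer counts as a mismatch, whereas a whole-page comparison would report one.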

md/raid5: Fix a bug about judging if the operation is syncing or replacing · c6d2e084

Committed by majianpeng on April 02, 2012

When a raid5 array is created with assume-clean and 'check' or 'repair' is
then written to sync_action, the component disks see no IO, yet the
check/resync runs faster than normal.
This is because of the test in analyse_stripe():
		if (do_recovery ||
		    sh->sector >= conf->mddev->recovery_cp)
			s->syncing = 1;
		else
			s->replacing = 1;
During a check or repair, recovery_cp == MaxSector, so syncing is set to
zero rather than one.

This bug was introduced by commit 9a3e1101
    md/raid5:  detect and handle replacements during recovery.
so this patch is suitable for 3.3-stable.

Cc: stable@vger.kernel.org
Signed-off-by: majianpeng <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

c6d2e084

md/raid1:Remove unnecessary rcu_dereference(conf->mirrors[i].rdev). · a42f9d83

Committed by majianpeng on April 02, 2012

Because rdev->nr_pending > 0, this disk cannot be removed.
And in any case, we aren't holding rcu_read_lock().

Signed-off-by: majianpeng <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

a42f9d83

md: Avoid OOPS when reshaping raid1 to raid0 · 24b961f8

Committed by Jes Sorensen on April 01, 2012

raid1 arrays do not have the notion of chunk size. Calculate the
largest chunk sector size we can use to avoid a divide by zero OOPS
when aligning the size of the new array to the chunk size.

Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>

24b961f8
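
One way to "calculate the largest chunk sector size we can use", as 24b961f8 puts it, is to halve a power-of-two candidate until it divides the array size exactly. The message does not show the code, so this is only a sketch under that assumption; `largest_chunk()` and its parameters are hypothetical names, with sizes in sectors.

```c
#include <stdint.h>

/*
 * Largest power-of-two chunk, no bigger than 'max_chunk', that divides
 * 'array_sectors' exactly.  'max_chunk' must itself be a power of two.
 * The result is at least 1 for any non-zero max_chunk, so it is safe
 * to use as a divisor or alignment.
 */
uint64_t largest_chunk(uint64_t array_sectors, uint64_t max_chunk)
{
    uint64_t chunk = max_chunk;

    while (chunk && (array_sectors & (chunk - 1)))
        chunk >>= 1;
    return chunk;
}
```

For a 1000-sector array and a 128-sector starting candidate this settles on 8, since 1000 is a multiple of 8 but not of 16.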

md/raid5: fix handling of bad blocks during recovery. · 18b9837e

Committed by NeilBrown on April 01, 2012

1/ We can only treat a known-bad-block like a read-error if we
   have the data that belongs in that block.  So fix that test.

2/ If we cannot recover a stripe due to insufficient data,
   don't tell "md_done_sync" that the sync failed unless we really
   did fail something.  If we successfully record bad blocks,
   that is success.

Reported-by: "majianpeng" <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

18b9837e

02 April, 2012 3 commits

md/raid1: If md_integrity_register() failed,run() must free the mem · 5220ea1e

Committed by majianpeng on April 02, 2012

Signed-off-by: majianpeng <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

5220ea1e

md/raid0: If md_integrity_register() fails, raid0_run() must free the mem. · 0366ef84

Committed by majianpeng on April 02, 2012

Signed-off-by: majianpeng <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

0366ef84

md/linear: If md_integrity_register() fails, linear_run() must free the mem. · 98d5561b

Committed by majianpeng on April 02, 2012

Signed-off-by: majianpeng <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

98d5561b

29 March, 2012 4 commits

dm: add verity target · a4ffc152

Committed by Mikulas Patocka on March 28, 2012

This device-mapper target creates a read-only device that transparently
validates the data on one underlying device against a pre-generated tree
of cryptographic checksums stored on a second device.

Two checksum device formats are supported: version 0 which is already
shipping in Chromium OS and version 1 which incorporates some
improvements.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
Signed-off-by: Will Drewry <wad@chromium.org>
Signed-off-by: Elly Jones <ellyjones@chromium.org>
Cc: Milan Broz <mbroz@redhat.com>
Cc: Olof Johansson <olofj@chromium.org>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

a4ffc152

dm bufio: prefetch · a66cc28f

Committed by Mikulas Patocka on March 28, 2012

This patch introduces a new function dm_bufio_prefetch. It prefetches
the specified range of blocks into dm-bufio cache without waiting
for i/o completion.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

a66cc28f

dm thin: add pool target flags to control discard · 67e2e2b2

Committed by Joe Thornber on March 28, 2012

Add dm thin target arguments to control discard support.

ignore_discard: Disables discard support.

no_discard_passdown: Don't pass discards down to the underlying data
device, but just remove the mapping within the thin provisioning target.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

67e2e2b2

dm thin: support discards · 104655fd

Committed by Joe Thornber on March 28, 2012

Support discards in the thin target.

On discard the corresponding mapping(s) are removed from the thin
device.  If the associated block(s) are no longer shared the discard
is passed to the underlying device.

All bios other than discards now have an associated deferred_entry
that is saved to the 'all_io_entry' in endio_hook.  When non-discard
IO completes and associated mappings are quiesced any discards that
were deferred, via ds_add_work() in process_discard(), will be queued
for processing by the worker thread.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

 drivers/md/dm-thin.c |  172 ++++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 158 insertions(+), 14 deletions(-)

104655fd