Commits · af3a2cd6b8a479345786e7fe5e199ad2f6240e56 · openeuler / raspberrypi-kernel

18 May 2010 · 9 commits

md: Fix read balancing in RAID1 and RAID10 on drives > 2TB · af3a2cd6

Committed by NeilBrown on May 08, 2010

read_balance uses an "unsigned long" for a sector number, which
will get truncated beyond 2TB.
This will cause read-balancing to be non-optimal, and can cause
data to be read from the 'wrong' branch during a resync.  This has a
very small chance of returning wrong data.
Reported-by: Jordan Russell <jr-list-2010@quo.to>
Cc: stable@kernel.org
Signed-off-by: NeilBrown <neilb@suse.de>

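On a 32-bit kernel `unsigned long` is 32 bits wide, so a sector number stored in it wraps once it reaches 2^32, and 2^32 sectors of 512 bytes is exactly 2 TiB. The sketch below reproduces that truncation with fixed-width types: `uint32_t` stands in for a 32-bit `unsigned long` and `uint64_t` for `sector_t`. The helper names are hypothetical; this is an illustration, not the read_balance code itself.

```c
#include <assert.h>
#include <stdint.h>

/* Model of storing a 64-bit sector_t in a 32-bit "unsigned long":
 * conversion to an unsigned type keeps only the low 32 bits. */
static uint32_t store_sector_in_ulong32(uint64_t sector)
{
    return (uint32_t)sector;
}

/* Bytes addressable before the 32-bit value wraps:
 * 2^32 sectors * 512 bytes = 2^41 bytes = 2 TiB. */
static uint64_t ulong32_sector_limit_bytes(void)
{
    return (UINT64_C(1) << 32) * 512;
}
```

For example, sector 2^32 + 10 comes back as 10, so any comparison made with the stored value (such as against the resync position) refers to the wrong location.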

md/raid1: improve printk messages · 9dd1e2fa

Committed by NeilBrown on May 03, 2010

Make sure the array name is included in a uniform way in all printk
messages.
Signed-off-by: NeilBrown <neilb@suse.de>

md/raid1: delay reads that could overtake behind-writes. · e555190d

Committed by NeilBrown on Mar 31, 2010

When a raid1 array is configured to support write-behind
on some devices, it normally only reads from other devices.
If all devices are write-behind (because the rest have failed)
it is possible for a read request to be serviced before a
behind-write request, which would appear as data corruption.

So when forced to read from a WriteMostly device, wait for any
write-behind to complete, and don't start any more behind-writes.
Signed-off-by: NeilBrown <neilb@suse.de>

md/raid1: fix confusing 'redirect sector' message. · d754c5ae

Committed by NeilBrown on Apr 07, 2010

This message seems to suggest the named device is the one on which a
read failed, however it is actually the device that the read will be
redirected to.
So make the message a little clearer.
Reported-by: Tim Burgess <ozburgess@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md: pass mddev to make_request functions rather than request_queue · 21a52c6d

Committed by NeilBrown on Apr 01, 2010

We used to pass the personality make_request function direct
to the block layer so the first argument had to be a queue.
But now we have the intermediary md_make_request so it makes
a lot more sense to pass a struct mddev_s.
It makes it possible to have an mddev without its own queue too.
Signed-off-by: NeilBrown <neilb@suse.de>

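A simplified model of the calling convention described above: the block layer only calls the single md_make_request entry point with a queue, which looks up the array and forwards the bio to the personality's make_request, now taking the mddev. All structs here are cut-down, hypothetical stand-ins for the kernel types (request queue, bio, mddev), not their real definitions, and `demo_make_request` is an invented example personality.

```c
#include <assert.h>

struct bio { unsigned long long sector; };   /* hypothetical stand-in */
struct mddev;

/* Personality hook: receives the array, not a block-layer queue. */
struct personality {
    const char *name;
    int (*make_request)(struct mddev *mddev, struct bio *bio);
};

struct mddev {
    struct personality *pers;
    int requests_seen;   /* illustrative counter only */
};

struct request_queue {
    void *queuedata;     /* points back at the owning mddev */
};

/* Single entry point registered with the block layer (model). */
static int md_make_request(struct request_queue *q, struct bio *bio)
{
    struct mddev *mddev = q->queuedata;
    return mddev->pers->make_request(mddev, bio);
}

/* Example personality: it never needs to look at a queue. */
static int demo_make_request(struct mddev *mddev, struct bio *bio)
{
    (void)bio;
    mddev->requests_seen++;
    return 0;
}
```

Because the personality only sees the mddev, nothing in its signature assumes the array owns a queue, which matches the commit's point about mddevs without their own queue.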

md: remove ->changed and related code. · b821eaa5

Committed by NeilBrown on Mar 29, 2010

We set ->changed to 1 and call check_disk_change at the end
of md_open so that bd_invalidated would be set and thus
partition rescan would happen appropriately.

Now that we call revalidate_disk directly, which sets bd_invalidated,
that indirection is no longer needed and can be removed.
Signed-off-by: NeilBrown <neilb@suse.de>

md: move io accounting out of personalities into md_make_request · 49077326

Committed by NeilBrown on Mar 25, 2010

While I generally prefer letting personalities do as much as possible,
given that we have a central md_make_request anyway we may as well use
it to simplify code.
Also this centralises knowledge of ->gendisk which will help later.
Signed-off-by: NeilBrown <neilb@suse.de>

drivers/md: Remove unnecessary casts of void * · 7b92813c

Committed by H Hartley Sweeten on Mar 08, 2010

void pointers do not need to be cast to other pointer types.
Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md/raid1: fix counting of write targets. · 964147d5

Committed by NeilBrown on May 18, 2010

There is a very small race window when writing to a
RAID1 such that if a device is marked faulty at exactly the wrong
time, the write-in-progress will not be sent to the device,
but the bitmap (if present) will be updated to say that
the write was sent.

Then if the device turned out to still be usable and was re-added
to the array, the bitmap-based-resync would skip resyncing that
block, possibly leading to corruption.  This would only be a problem
if no further writes were issued to that area of the device (i.e.
that bitmap chunk).

Suitable for any pending -stable kernel.

Cc: stable@kernel.org
Signed-off-by: NeilBrown <neilb@suse.de>

16 Mar 2010 · 1 commit

md: deal with merge_bvec_fn in component devices better. · 627a2d3c

Committed by NeilBrown on Mar 08, 2010

If a component device has a merge_bvec_fn then as we never call it
we must ensure we never need to.  Currently this is done by setting
max_sector to 1 PAGE, however this does not stop a bio being created
with several sub-page iovecs that would violate the merge_bvec_fn.

So instead set max_segments to 1 and set the segment boundary to the
same as a page boundary to ensure there is only ever one single-page
segment of IO requested at a time.

This can particularly be an issue when 'xen' is used as it is
known to submit multiple small buffers in a single bio.
Signed-off-by: NeilBrown <neilb@suse.de>
Cc: stable@kernel.org

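A rough illustration of what a segment-boundary mask of one page enforces: a segment covering the byte range [start, start + len) stays inside one page only if its first and last bytes agree in every bit above the mask (mask = PAGE_SIZE - 1). Together with max_segments = 1, each request then carries a single segment that does not cross a page. The 4 KiB page size and the helper name are assumptions for this sketch, not kernel definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096u   /* assumption: 4 KiB pages */

/* True if [start, start + len) does not cross a boundary defined by
 * mask (mask = boundary size - 1). len must be greater than 0. */
static bool within_boundary(uint64_t start, uint64_t len, uint64_t mask)
{
    uint64_t last = start + len - 1;
    return (start | mask) == (last | mask);
}
```

A 1 KiB segment starting 512 bytes before a page boundary fails the check, so a queue with this limit would never be handed such a segment.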

26 Feb 2010 · 1 commit

block: Rename blk_queue_max_sectors to blk_queue_max_hw_sectors · 086fa5ff

Committed by Martin K. Petersen on Feb 26, 2010

The block layer calling convention is blk_queue_<limit name>.
blk_queue_max_sectors predates this practice, leading to some confusion.
Rename the function to appropriately reflect that its intended use is to
set max_hw_sectors.

Also introduce a temporary wrapper for backwards compatibility.  This can
be removed after the merge window is closed.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

14 Dec 2009 · 4 commits

md: add MODULE_DESCRIPTION for all md related modules. · 0efb9e61

Committed by NeilBrown on Dec 14, 2009

Suggested by Oren Held <orenhe@il.ibm.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md: move offset, daemon_sleep and chunksize out of bitmap structure · 42a04b50

Committed by NeilBrown on Dec 14, 2009

... and into bitmap_info.  These are all configuration parameters
that need to be set before the bitmap is created.
Signed-off-by: NeilBrown <neilb@suse.de>

md/raid1: add takeover support for raid5->raid1 · 709ae487

Committed by NeilBrown on Dec 14, 2009

A 2-device raid5 array can now be converted to raid1.
Signed-off-by: NeilBrown <neilb@suse.de>

md: add honouring of suspend_{lo,hi} to raid1. · 6eef4b21

Committed by NeilBrown on Dec 14, 2009

This will allow us to stop writeout to portions of the array
while they are resynced by someone else - e.g. another node in
a cluster.
Signed-off-by: NeilBrown <neilb@suse.de>

01 Dec 2009 · 1 commit

md: revert incorrect fix for read error handling in raid1. · d0e26078

Committed by NeilBrown on Dec 01, 2009

commit 4706b349 was a forward port of a fix that was needed
for SLES10.  But in fact it is not needed in mainline because
the earlier commit dd00a99e fixes the same problem in a
better way.
Further, this commit introduces a bug in the way it interacts with
the automatic read-error-correction.  If, after a read error is
successfully corrected, the same disk is chosen to re-read - the
re-read won't be attempted but an error will be returned instead.

After reverting that commit, there is the possibility that a
read error on a read-only array (where read errors cannot
be corrected as that requires a write) will repeatedly read the same
device and continue to get an error.
So in the "Array is readonly" case, fail the drive immediately on
a read error.
Signed-off-by: NeilBrown <neilb@suse.de>
Cc: stable@kernel.org

16 Oct 2009 · 2 commits

md: raid1/raid10: handle allocation errors during array setup. · ed9bfdf1

Committed by NeilBrown on Oct 16, 2009

Both raid1 and raid10 create a mempool during startup.
If the 'alloc' function for this mempool fails, unplug_slaves
is called.
If that happens when the pool is being initialised, unplug_slaves
will try to use the 'conf' structure that isn't filled in yet, and
badness will happen.

So ensure that unplug_slaves doesn't get called unless we know
that the conf structure is fully initialised.
Signed-off-by: NeilBrown <neilb@suse.de>

md/raid1/raid10: add a cond_resched · 1d9d5241

Committed by NeilBrown on Oct 16, 2009

During 'check' of a raid1 or raid10 it is possible for the management
thread to spend a lot of time running 'memcmp' on blocks from
different devices, so make sure the thread has a chance to schedule.
raid5d already has a cond_resched (in process_stripe).
Reported-By: Lee Howard <faxguy@howardsilvan.com>
Signed-off-by: NeilBrown <neilb@suse.de>

23 Sep 2009 · 3 commits

md: raid-1/10: fix RW bits manipulation · 1ef04fef

Committed by Dmitry Monakhov on Sep 20, 2009

Jens recently changed the bio_rw_flagged() logic in
commit 1f98a13f: it now returns bool instead of int.
This broke the raid1/raid10 RW bit manipulation logic.
One visible result is a BUG_ON triggering due to an empty barrier
at scsi_lib.c:1108, in scsi_setup_fs_cmnd().
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: NeilBrown <neilb@suse.de>

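The breakage pattern can be reproduced in miniature: code that ORs a flag test's return value straight into a new flags word only works while the test returns the masked bit itself. Once the test returns a bool (0 or 1), the OR sets bit 0 instead of the intended bit. The bit number and helper names below are hypothetical, not the kernel's BIO_RW_* definitions, and the actual raid1/raid10 fix may be written differently.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_FLAG_BARRIER 5   /* hypothetical bit number */

/* Old style: returns the masked bit itself (0 or 1UL << bit). */
static unsigned long flagged_int(unsigned long rw, int bit)
{
    return rw & (1UL << bit);
}

/* New style: returns only whether the bit is set. */
static bool flagged_bool(unsigned long rw, int bit)
{
    return (rw & (1UL << bit)) != 0;
}

/* Copy the barrier flag into a new flags word. The bool result has to
 * be shifted back into position; OR-ing it in directly sets bit 0. */
static unsigned long copy_barrier(unsigned long src_rw)
{
    return (unsigned long)flagged_bool(src_rw, DEMO_FLAG_BARRIER)
           << DEMO_FLAG_BARRIER;
}
```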

md: report device as congested when suspended · 3fa841d7

Committed by NeilBrown on Sep 23, 2009

This should stop writeback from coming in while the device is temporarily
suspended.
Signed-off-by: NeilBrown <neilb@suse.de>

md: Improve name of threads created by md_register_thread · 0da3c619

Committed by NeilBrown on Sep 23, 2009

The management threads for raid4,5,6 arrays are all called
mdX_raid5, independent of the actual raid level, which is wrong and
can be confusing.

So change md_register_thread to use the name from the personality
unless an alternate name (like 'resync' or 'reshape') is given.

This is simpler and more correct.

Cc: Jinzc <zhenchengjin@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

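A sketch of the naming rule: use the caller's name when one is given (such as "resync" or "reshape"), otherwise fall back to the personality's name, so a raid6 array's management thread is no longer labelled raid5. The "%s_%s" layout and the helper `md_thread_name` are assumptions for illustration; in the kernel the name is built inside md_register_thread itself.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build "<array>_<name>", preferring an explicit name over the
 * personality name. Returns what snprintf returns. */
static int md_thread_name(char *buf, size_t size, const char *array,
                          const char *pers_name, const char *name)
{
    return snprintf(buf, size, "%s_%s", array, name ? name : pers_name);
}
```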

11 Sep 2009 · 1 commit

bio: first step in sanitizing the bio->bi_rw flag testing · 1f98a13f

Committed by Jens Axboe on Sep 11, 2009

Get rid of any functions that test for these bits and make callers
use bio_rw_flagged() directly. Then it is at least directly apparent
what variable and flag they check.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

03 Aug 2009 · 2 commits

md: Use revalidate_disk to effect changes in size of device. · 449aad3e

Committed by NeilBrown on Aug 03, 2009

As revalidate_disk calls check_disk_size_change, it will cause
any capacity change of a gendisk to be propagated to the blockdev
inode.  So use that instead of mucking about with locks and
i_size_write.

Also add a call to revalidate_disk in do_md_run and a few other places
where the gendisk capacity is changed.
Signed-off-by: NeilBrown <neilb@suse.de>

md: Push down data integrity code to personalities. · ac5e7113

Committed by Andre Noll on Aug 03, 2009

This patch replaces md_integrity_check() by two new public functions:
md_integrity_register() and md_integrity_add_rdev() which are both
personality-independent.

md_integrity_register() is called from the ->run and ->hot_remove
methods of all personalities that support data integrity.  The
function iterates over the component devices of the array and
determines if all active devices are integrity capable and if their
profiles match. If this is the case, the common profile is registered
for the mddev via blk_integrity_register().

The second new function, md_integrity_add_rdev() is called from the
->hot_add_disk methods, i.e. whenever a new device is being added
to a raid array. If the new device does not support data integrity,
or has a profile different from the one already registered, data
integrity for the mddev is disabled.

For raid0 and linear, only the call to md_integrity_register() from
the ->run method is necessary.
Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: NeilBrown <neilb@suse.de>

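The two rules above can be modelled as follows: registration succeeds only if every active member has an integrity profile and all profiles match, and a hot-added device keeps integrity enabled only if it has the same profile. This is a simplified sketch: the `member` struct and the comparison of profile names as strings are hypothetical, whereas the kernel works with struct blk_integrity and its own comparison helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical member device: profile == NULL means no integrity. */
struct member {
    bool active;
    const char *profile;
};

/* Common profile of all active members, or NULL if any active member
 * lacks a profile or the profiles differ (cf. md_integrity_register). */
static const char *common_profile(const struct member *m, size_t n)
{
    const char *ref = NULL;
    for (size_t i = 0; i < n; i++) {
        if (!m[i].active)
            continue;
        if (m[i].profile == NULL)
            return NULL;
        if (ref == NULL)
            ref = m[i].profile;
        else if (strcmp(ref, m[i].profile) != 0)
            return NULL;
    }
    return ref;
}

/* Hot-add (cf. md_integrity_add_rdev): NULL means integrity disabled. */
static const char *after_hot_add(const char *registered,
                                 const struct member *added)
{
    if (registered == NULL || added->profile == NULL ||
        strcmp(registered, added->profile) != 0)
        return NULL;
    return registered;
}
```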

01 Jul 2009 · 1 commit

md: Use new topology calls to indicate alignment and I/O sizes · 8f6c2e4b

Committed by Martin K. Petersen on Jul 01, 2009

Switch MD over to the new disk_stack_limits() function which checks for
alignment and adjusts preferred I/O sizes when stacking.

Also indicate preferred I/O sizes where applicable.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>

18 Jun 2009 · 3 commits

md: Push down reconstruction log message to personality code. · 8c6ac868

Committed by Andre Noll on Jun 18, 2009

Currently, the md layer checks in analyze_sbs() if the raid level
supports reconstruction (mddev->level >= 1) and if reconstruction is
in progress (mddev->recovery_cp != MaxSector).

Move that printk into the personality code of those raid levels that
care (levels 1, 4, 5, 6, 10).
Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: NeilBrown <neilb@suse.de>

md: Convert mddev->new_chunk to sectors. · 664e7c41

Committed by Andre Noll on Jun 18, 2009

A straight-forward conversion which gets rid of some
multiplications/divisions/shifts. The patch also introduces a couple
of new ones, most of which are due to conf->chunk_size still being
represented in bytes. This will be cleaned up in subsequent patches.
Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: NeilBrown <neilb@suse.de>

md: Make mddev->chunk_size sector-based. · 9d8f0363

Committed by Andre Noll on Jun 18, 2009

This patch renames the chunk_size field to chunk_sectors with the
implied change of semantics.  Since

	is_power_of_2(chunk_size) = is_power_of_2(chunk_sectors << 9)
				  = is_power_of_2(chunk_sectors)

these bits don't need an adjustment for the shift.
Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: NeilBrown <neilb@suse.de>

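A quick check of the identity above: shifting left by 9 multiplies by 512 = 2^9, which moves the single set bit of a power of two without adding any (and keeps the odd factor of a non-power), so the test gives the same answer for chunk_sectors and chunk_sectors << 9 as long as the shift does not overflow. The is_power_of_2 below follows the usual `n != 0 && (n & (n - 1)) == 0` definition and is written here for the example rather than taken from kernel headers; `chunk_bytes` is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static bool is_power_of_2(uint64_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Chunk size in bytes from a chunk size in 512-byte sectors. */
static uint64_t chunk_bytes(uint64_t chunk_sectors)
{
    return chunk_sectors << 9;
}
```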

16 Jun 2009 · 1 commit

md: remove mddev_to_conf "helper" macro · 070ec55d

Committed by NeilBrown on Jun 16, 2009

Having a macro just to cast a void* isn't really helpful.
I would much rather see that we are simply de-referencing ->private,
than have to know what the macro does.

So open code the macro everywhere and remove the pointless cast.
Signed-off-by: NeilBrown <neilb@suse.de>

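What the open-coded form looks like: in C a `void *` converts implicitly to any object pointer type, so reading the personality's private data needs neither a helper macro nor a cast. The cut-down structs and the `raid_disks` field are hypothetical stand-ins; only the idea of dereferencing ->private directly comes from the commit.

```c
#include <assert.h>

struct r1conf_demo { int raid_disks; };   /* hypothetical conf type */
struct mddev_demo  { void *private; };    /* personality's data */

/* Before: a macro of the form ((conf_t *)(mddev)->private).
 * After: a plain assignment; the void * converts implicitly. */
static int raid_disks_of(struct mddev_demo *mddev)
{
    struct r1conf_demo *conf = mddev->private;
    return conf->raid_disks;
}
```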

23 May 2009 · 1 commit

block: Use accessor functions for queue limits · ae03bf63

Committed by Martin K. Petersen on May 22, 2009

Convert all external users of queue limits to using wrapper functions
instead of poking the request queue variables directly.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

15 Apr 2009 · 1 commit

block: move bio list helpers into bio.h · 8f3d8ba2

Committed by Christoph Hellwig on Apr 07, 2009

It's used by DM and MD and generally useful, so move the bio list
helpers into bio.h.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

07 Apr 2009 · 1 commit

md/raid1: fix build breakage · 91a9e99d

Committed by Alexander Beregalov on Apr 07, 2009

Fix this build error:

  drivers/md/raid1.c: In function 'raid1_congested':
  drivers/md/raid1.c:589: error: 'BDI_write_congested' undeclared

BDI_write_congested was changed in commit 1faa16d2 ("block: change the
request allocation/congestion logic to be sync/async based")
Signed-off-by: Alexander Beregalov <a.beregalov@gmail.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

06 Apr 2009 · 1 commit

md/raid1 - don't assume newly allocated bvecs are initialised. · 303a0e11

Committed by NeilBrown on Apr 06, 2009

Since commit d3f76110
newly allocated bvecs aren't initialised to NULL, so we have
to be more careful about freeing a bio which only managed
to get a few pages allocated to it.  Otherwise the resync
process crashes.

This patch is appropriate for 2.6.29-stable.

Cc: stable@kernel.org
Cc: "Jens Axboe" <jens.axboe@oracle.com>
Reported-by: Gabriele Tozzi <gabriele@tozzi.eu>
Signed-off-by: NeilBrown <neilb@suse.de>

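The general pattern behind this fix: when an array of page pointers is not zero-filled on allocation, cleanup after a partial failure must free only the entries that were actually filled in, tracked by an explicit index, instead of freeing every slot and trusting the unused ones to be NULL. The sketch uses malloc/free and a hypothetical `alloc_pages_demo` with a simulated failure point; it is not the raid1 resync code.

```c
#include <assert.h>
#include <stdlib.h>

/* Try to fill pages[0..n-1]. fail_at simulates an allocation failure at
 * that index (pass a value >= n for none). Returns 1 on success; on
 * failure frees what was allocated so far and returns 0. */
static int alloc_pages_demo(void **pages, size_t n, size_t fail_at)
{
    size_t i;
    for (i = 0; i < n; i++) {
        pages[i] = (i == fail_at) ? NULL : malloc(4096);
        if (pages[i] == NULL)
            break;
    }
    if (i == n)
        return 1;
    /* Slots after i were never written and may hold garbage,
     * so free only slots 0..i-1. */
    while (i > 0)
        free(pages[--i]);
    return 0;
}
```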

31 Mar 2009 · 7 commits

md: 'array_size' sysfs attribute · b522adcd

Committed by Dan Williams on Mar 31, 2009

Allow userspace to set the size of the array according to the following
semantics:

1/ size must be <= the size returned by mddev->pers->size(mddev, 0, 0)
   a) If size is set before the array is running, do_md_run will fail
      if size is greater than the default size
   b) A reshape attempt that reduces the default size to less than the set
      array size should be blocked
2/ once userspace sets the size the kernel will not change it
3/ writing 'default' to this attribute returns control of the size to the
   kernel and reverts to the size reported by the personality

Also, convert locations that need to know the default size from directly
reading ->array_sectors to <pers>_size.  Resync/reshape operations
always follow the default size.

Finally, fixup other locations that read a number of 1k-blocks from
userspace to use strict_blocks_to_sectors() which checks for unsigned
long long to sector_t overflow and blocks to sectors overflow.
Reviewed-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

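The two overflow checks mentioned for strict_blocks_to_sectors() can be sketched as a userspace model: parse the count of 1K blocks as an unsigned long long, reject it if doubling (two 512-byte sectors per 1K block) would shift out the top bit, then reject it if the doubled value does not survive narrowing to the sector type. The helper name, the use of strtoull, and the 32-bit `demo_sector_t` (chosen so the narrowing check can trigger, as on a kernel without large block device support) are all assumptions; the kernel function's exact code may differ.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: a 32-bit sector type so the narrowing check can fire. */
typedef uint32_t demo_sector_t;

/* Convert a decimal count of 1K blocks into 512-byte sectors.
 * Returns 0 on success, -EINVAL on bad input or overflow. */
static int blocks_to_sectors_demo(const char *buf, demo_sector_t *sectors)
{
    char *end;
    unsigned long long blocks;
    demo_sector_t converted;

    errno = 0;
    blocks = strtoull(buf, &end, 10);
    if (errno != 0 || end == buf || *end != '\0')
        return -EINVAL;

    /* Blocks-to-sectors overflow: doubling would lose the top bit. */
    if (blocks & (1ULL << (8 * sizeof(blocks) - 1)))
        return -EINVAL;

    /* unsigned long long to sector type overflow. */
    converted = (demo_sector_t)(blocks * 2);
    if (converted != blocks * 2)
        return -EINVAL;

    *sectors = converted;
    return 0;
}
```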

md: centralize ->array_sectors modifications · 1f403624

Committed by Dan Williams on Mar 31, 2009

Get personalities out of the business of directly modifying
->array_sectors.  Lays groundwork to introduce policy on when
->array_sectors can be modified.
Reviewed-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

md: add 'size' as a personality method · 80c3a6ce

Committed by Dan Williams on Mar 17, 2009

In preparation for giving userspace control over ->array_sectors we need
to be able to retrieve the 'default' size, and the 'anticipated' size
when a reshape is requested.  For personalities that do not reshape emit
a warning if anything but the default size is requested.

In the raid5 case we need to update ->previous_raid_disks to make the
new 'default' size available.
Reviewed-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

md: enable suspend/resume of md devices. · 409c57f3

Committed by NeilBrown on Mar 31, 2009

To be able to change the 'level' of an md/raid array, we need to
suspend the device so that no requests are active - then move some
pointers around etc.

The code already keeps counts of active requests and the ->quiesce
function can be used to wait until those counts hit zero.
However, the quiesce function blocks new requests once they are
already 'inside' the personality module, and that is too late if we want
to replace the personality modules.

So make all md requests come in through a common md_make_request
function that keeps track of how many requests have entered the
modules but may not yet be on the internal reference counts.
Allow md_make_request to be blocked when we want to suspend the
device, and make it possible to wait for all those in-transit requests
to be added to internal lists so that ->quiesce can wait for them.

There is still a problem that when a request completes, we drop the
ref count inside the personality code so there is a short time between
when the refcount hits zero, and when the personality code is no
longer being used.
The personality code never blocks (schedule or spinlock) between
dropping the refcount and exiting the routine, so this should be safe
(as put_module calls synchronize_sched() before unmapping the module
code).
Signed-off-by: NeilBrown <neilb@suse.de>

md: Make mddev->size sector-based. · 58c0fed4

Committed by Andre Noll on Mar 31, 2009

This patch renames the "size" field of struct mddev_s to "dev_sectors"
and stores the number of 512-byte sectors instead of the number of
1K-blocks in it.

All users of that field, including raid levels 1,4-6,10, are adjusted
accordingly. This simplifies the code a bit because it allows us to get
rid of a couple of divisions/multiplications by two.

In order to make checkpatch happy, some minor coding style issues
have also been addressed. In particular, size_store() now uses
strict_strtoull() instead of simple_strtoull().
Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: NeilBrown <neilb@suse.de>

md: move md_k.h from include/linux/raid/ to drivers/md/ · 43b2e5d8

Committed by NeilBrown on Mar 31, 2009

It really is nicer to keep related code together.
Signed-off-by: NeilBrown <neilb@suse.de>

md: move lots of #include lines out of .h files and into .c · bff61975

Committed by NeilBrown on Mar 31, 2009

This makes the includes more explicit, and is preparation for moving
md_k.h to drivers/md/md.h

Remove include/raid/md.h as its only remaining use was to #include
other files.
Signed-off-by: NeilBrown <neilb@suse.de>