Commits · 3fa841d7e7266f6fcc1b3885b905f5153ba897d8 · openanolis / cloud-kernel

23 Sep, 2009 3 commits

md: report device as congested when suspended · 3fa841d7

Authored by NeilBrown on Sep 23, 2009

This should stop writeback from coming when the device is temporarily
suspended.
Signed-off-by: NeilBrown <neilb@suse.de>

3fa841d7

md: Improve name of threads created by md_register_thread · 0da3c619

Authored by NeilBrown on Sep 23, 2009

The management threads for raid4,5,6 arrays are all called
mdX_raid5, independent of the actual raid level, which is wrong and
can be confusing.

So change md_register_thread to use the name from the personality
unless an alternate name (like 'resync' or 'reshape') is given.

This is simpler and more correct.

Cc: Jinzc <zhenchengjin@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

0da3c619

md: remove sparse warning "symbol xxx shadows an earlier one" · a9f326eb

Authored by NeilBrown on Sep 23, 2009

Rename some variables and remove some duplicate definitions
to avoid these warnings.  None of them are actual errors.
Signed-off-by: NeilBrown <neilb@suse.de>

a9f326eb

11 Sep, 2009 1 commit

bio: first step in sanitizing the bio->bi_rw flag testing · 1f98a13f

Authored by Jens Axboe on Sep 11, 2009

Get rid of any functions that test for these bits and make callers
use bio_rw_flagged() directly. Then it is at least directly apparent
what variable and flag they check.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

1f98a13f
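
The call-site pattern described above can be sketched in userspace as follows. This is only an illustration: the struct is a stand-in that models nothing but the flag word, and the enum names and bit positions are hypothetical, not the kernel's definitions.

```c
#include <stdbool.h>

/* Hypothetical flag positions for illustration; the kernel's enum differs. */
enum bio_rw_flags {
	BIO_RW_EXAMPLE_SYNC = 1,
	BIO_RW_EXAMPLE_BARRIER = 2,
};

/* Simplified stand-in for struct bio: only the flag word is modelled. */
struct bio {
	unsigned long bi_rw;
};

/* Test a single flag bit in bio->bi_rw. */
static inline bool bio_rw_flagged(const struct bio *bio, enum bio_rw_flags flag)
{
	return (bio->bi_rw & (1UL << flag)) != 0;
}
```

A caller then writes `bio_rw_flagged(bio, BIO_RW_EXAMPLE_BARRIER)`, so both the variable and the flag being checked are visible at the call site.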

13 Aug, 2009 3 commits

md/raid5: Properly remove excess drives after shrinking a raid5/6 · 1a67dde0

Authored by NeilBrown on Aug 13, 2009

We were removing the drives from the array, but not
removing symlinks from /sys/.... and not marking the device
as having been removed.
Signed-off-by: NeilBrown <neilb@suse.de>

1a67dde0

md/raid5: make sure a reshape restarts at the correct address. · a639755c

Authored by NeilBrown on Aug 13, 2009

This "if" doesn't allow for the possibility that the number of devices
doesn't change, and so sector_nr isn't set correctly in that case.
So change '>' to '>='.
Signed-off-by: NeilBrown <neilb@suse.de>

a639755c

md/raid5: allow new reshape modes to be restarted in the middle. · 67ac6011

Authored by NeilBrown on Aug 13, 2009

md/raid5 doesn't allow a reshape to restart if it involves writing
over the same part of disk that it would be reading from.
This happens at the beginning of a reshape that increases the number
of devices, at the end of a reshape that decreases the number of
devices, and continuously for a reshape that does not change the
number of devices.

The current code is correct for the "increase number of devices"
case as the critical section at the start is handled by userspace
performing a backup.

It does not work for reducing the number of devices, or the
no-change case.
For 'reducing', we need to invert the test.  For no-change we cannot
really be sure things will be safe, so simply require the array
to be read-only, which is how the user-space code which carefully
starts such arrays works.
Signed-off-by: NeilBrown <neilb@suse.de>

67ac6011

03 Aug, 2009 3 commits

md: Use revalidate_disk to effect changes in size of device. · 449aad3e

Authored by NeilBrown on Aug 03, 2009

As revalidate_disk calls check_disk_size_change, it will cause
any capacity change of a gendisk to be propagated to the blockdev
inode.  So use that instead of mucking about with locks and
i_size_write.

Also add a call to revalidate_disk in do_md_run and a few other places
where the gendisk capacity is changed.
Signed-off-by: NeilBrown <neilb@suse.de>

449aad3e

md: allow raid5_quiesce to work properly when reshape is happening. · 64bd660b

Authored by NeilBrown on Aug 03, 2009

The ->quiesce method is not supposed to stop resync/recovery/reshape,
just normal IO.
But in raid5 we don't have a way to know which stripes are being
used for normal IO and which for resync etc, so we need to wait for
all stripes to be idle to be sure that all writes have completed.

However, reshape keeps at least some stripes busy for an extended period
of time, so a call to raid5_quiesce can block for several seconds
needlessly.
So arrange for reshape etc to pause briefly while raid5_quiesce is
trying to quiesce the array so that the active_stripes count can
drop to zero.
Signed-off-by: NeilBrown <neilb@suse.de>

64bd660b

md/raid5: set reshape_position correctly when reshape starts. · e516402c

Authored by NeilBrown on Aug 03, 2009

As the internal reshape_progress counter is the main driver
for reshape, the fact that reshape_position sometimes starts with the
wrong value has minimal effect.  It is visible in sysfs and that
is all.
Signed-off-by: NeilBrown <neilb@suse.de>

e516402c

31 Jul, 2009 1 commit

md/raid6: release spare page at ->stop() · 95fc17aa

Authored by Dan Williams on Jul 31, 2009

Add missing call to safe_put_page from stop() by unifying open coded
raid5_conf_t de-allocation under free_conf().

Cc: <stable@kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

95fc17aa

01 Jul, 2009 3 commits

md: use interruptible wait when duration is controlled by userspace. · e62e58a5

Authored by NeilBrown on Jul 01, 2009

User space can set various limits on an md array so that resync waits
when it gets to a certain point, or so that I/O is blocked for a short
while.
When md is waiting against one of these limits, it should use an
interruptible wait so as not to add to the load average, and so as
not to trigger a warning if the wait goes on for too long.
Signed-off-by: NeilBrown <neilb@suse.de>

e62e58a5

md/raid5: suspend shouldn't affect read requests. · a5c308d4

Authored by NeilBrown on Jul 01, 2009

md allows writes to regions of an array to be suspended temporarily.
This allows user-space to participate in aspects of reshape.
In particular, data can be copied with no risk of a race.
We should not be blocking read requests though, so don't.

Cc: stable@kernel.org
Signed-off-by: NeilBrown <neilb@suse.de>

a5c308d4

md: Use new topology calls to indicate alignment and I/O sizes · 8f6c2e4b

Authored by Martin K. Petersen on Jul 01, 2009

Switch MD over to the new disk_stack_limits() function which checks for
alignment and adjusts preferred I/O sizes when stacking.

Also indicate preferred I/O sizes where applicable.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>

8f6c2e4b

18 Jun, 2009 11 commits

md/raid5: correctly update sync_completed when we reach max_resync · 48606a9f

Authored by NeilBrown on Jun 18, 2009

At the end of reshape_request we update curr_resync_completed
if we are about to pause due to reaching resync_max.
However we update it to the wrong value.  We need to add the
"reshape_sectors" that have just been reshaped.
Signed-off-by: NeilBrown <neilb@suse.de>

48606a9f

md/raid5: add missing call to schedule() after prepare_to_wait() · 7a3ab908

Authored by Dan Williams on Jun 16, 2009

In the unlikely event that reshape progresses past the current request
while it is waiting for a stripe we need to schedule() before retrying
for 2 reasons:
1/ Prevent list corruption from duplicated list_add() calls without
   intervening list_del().
2/ Give the reshape code a chance to make some progress to resolve the
   conflict.

Cc: <stable@kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

7a3ab908

md: Push down reconstruction log message to personality code. · 8c6ac868

Authored by Andre Noll on Jun 18, 2009

Currently, the md layer checks in analyze_sbs() if the raid level
supports reconstruction (mddev->level >= 1) and if reconstruction is
in progress (mddev->recovery_cp != MaxSector).

Move that printk into the personality code of those raid levels that
care (levels 1, 4, 5, 6, 10).
Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: NeilBrown <neilb@suse.de>

8c6ac868

md: merge reconfig and check_reshape methods. · 50ac168a

Authored by NeilBrown on Jun 18, 2009

The difference between these two methods is artificial.
Both check that a pending reshape is valid, and perform any
aspect of it that can be done immediately.
'reconfig' handles chunk size and layout.
'check_reshape' handles raid_disks.

So make them just one method.
Signed-off-by: NeilBrown <neilb@suse.de>

50ac168a

md: remove unnecessary arguments from ->reconfig method. · 597a711b

Authored by NeilBrown on Jun 18, 2009

Passing the new layout and chunksize as args is not necessary as
the mddev has fields for new_chunk and new_layout.

This is preparation for combining the check_reshape and reconfig
methods.
Signed-off-by: NeilBrown <neilb@suse.de>

597a711b

md: raid5: check stripe cache is large enough in start_reshape · 01ee22b4

Authored by NeilBrown on Jun 18, 2009

In reshape cases that do not change the number of devices,
start_reshape is called without first calling check_reshape.

Currently, the check that the stripe_cache is large enough is
only done in check_reshape.  It should be in start_reshape too.
Signed-off-by: NeilBrown <neilb@suse.de>

01ee22b4

md: fix some comments. · cdc2ae6d

Authored by Andre Noll on Jun 18, 2009

1/ Raid5 has learned to take over also raid4 and raid6 arrays.
2/ new_chunk in mdp_superblock_1 is in sectors, not bytes.
Signed-off-by: NeilBrown <neilb@suse.de>

cdc2ae6d

md/raid5: Use is_power_of_2() in raid5_reconfig()/raid6_reconfig(). · 0ba459d2

Authored by Andre Noll on Jun 18, 2009

Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: NeilBrown <neilb@suse.de>

0ba459d2

md: convert conf->chunk_size and conf->prev_chunk to sectors. · 09c9e5fa

Authored by Andre Noll on Jun 18, 2009

This kills some more shifts.
Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: NeilBrown <neilb@suse.de>

09c9e5fa

md: Convert mddev->new_chunk to sectors. · 664e7c41

Authored by Andre Noll on Jun 18, 2009

A straightforward conversion which gets rid of some
multiplications/divisions/shifts. The patch also introduces a couple
of new ones, most of which are due to conf->chunk_size still being
represented in bytes. This will be cleaned up in subsequent patches.
Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: NeilBrown <neilb@suse.de>

664e7c41

md: Make mddev->chunk_size sector-based. · 9d8f0363

Authored by Andre Noll on Jun 18, 2009

This patch renames the chunk_size field to chunk_sectors with the
implied change of semantics.  Since

	is_power_of_2(chunk_size) = is_power_of_2(chunk_sectors << 9)
				  = is_power_of_2(chunk_sectors)

these bits don't need an adjustment for the shift.
Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: NeilBrown <neilb@suse.de>

9d8f0363
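
The identity quoted in this commit message can be checked with a small userspace sketch. The power-of-two test below is the usual single-bit check; the helper `chunk_bytes_is_power_of_2` is a hypothetical name introduced only for this illustration, and the equivalence assumes the left shift by 9 does not overflow.

```c
#include <stdbool.h>

/* True when exactly one bit is set, i.e. n is a power of two. */
static bool is_power_of_2(unsigned long n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/*
 * Same check applied to the size in bytes (sectors << 9).
 * Shifting moves the set bits without adding or removing any,
 * so the result matches is_power_of_2(chunk_sectors) as long as
 * no bit is shifted out of the word.
 */
static bool chunk_bytes_is_power_of_2(unsigned long chunk_sectors)
{
	return is_power_of_2(chunk_sectors << 9);
}
```

For example, 128 sectors (64 KiB) passes both checks, while 24 sectors fails both.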

16 Jun, 2009 2 commits

md: raid5: chunk size check in setup_conf · 740da449

Authored by raz ben yehuda on Jun 16, 2009

Have raid5 check the chunk size in the run/reshape method instead of in md.

Signed-off-by: raziebe@gmail.com
Signed-off-by: NeilBrown <neilb@suse.de>

740da449

md: remove mddev_to_conf "helper" macro · 070ec55d

Authored by NeilBrown on Jun 16, 2009

Having a macro just to cast a void* isn't really helpful.
I would much rather see that we are simply de-referencing ->private,
than have to know what the macro does.

So open code the macro everywhere and remove the pointless cast.
Signed-off-by: NeilBrown <neilb@suse.de>

070ec55d
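
The difference between the two styles can be illustrated with a simplified userspace sketch. The structs here are hypothetical stand-ins with a single field each, and the macro is a simplified imitation of the removed helper, not its exact definition.

```c
/* Hypothetical, heavily reduced stand-ins for the kernel structures. */
struct raid5_conf {
	int raid_disks;
};

struct mddev {
	void *private;	/* personality-specific data */
};

/* Old style: a macro whose only job is to cast the void pointer. */
#define mddev_to_conf(mddev) ((struct raid5_conf *)(mddev)->private)

static int disks_via_macro(struct mddev *mddev)
{
	return mddev_to_conf(mddev)->raid_disks;
}

/* New style: dereference ->private directly; C converts void* implicitly. */
static int disks_open_coded(struct mddev *mddev)
{
	struct raid5_conf *conf = mddev->private;

	return conf->raid_disks;
}
```

Both functions return the same value; the open-coded form simply makes the `->private` access visible to the reader.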

09 Jun, 2009 3 commits

md/raid5: fix bug in reshape code when chunk_size decreases. · 0e6e0271

Authored by NeilBrown on Jun 09, 2009

Now that we support changing the chunksize, we calculate
"reshape_sectors" to be the max of number of sectors in old
and new chunk size.
However there is one place where we still use 'chunksize'
rather than 'reshape_sectors'.
This causes a reshape that reduces the size of chunks to freeze.
Signed-off-by: NeilBrown <neilb@suse.de>

0e6e0271
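
As a rough illustration of the calculation named in this message, the per-step unit is the larger of the two chunk sizes in sectors. The function name and the `sector_t` typedef below are assumptions made for this userspace sketch, not the kernel code.

```c
/* Userspace stand-in for the kernel's sector_t. */
typedef unsigned long long sector_t;

/* Pick the larger of the old and new chunk sizes, both in sectors. */
static sector_t reshape_sectors(sector_t old_chunk_sectors,
				sector_t new_chunk_sectors)
{
	return old_chunk_sectors > new_chunk_sectors ?
		old_chunk_sectors : new_chunk_sectors;
}
```

When the chunk size shrinks from 128 to 64 sectors, this yields 128, which is why code that still used the new, smaller chunk size disagreed with the rest of the reshape logic.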

md/raid5 - avoid deadlocks in get_active_stripe during reshape · a8c906ca

Authored by NeilBrown on Jun 09, 2009

md has functionality to 'quiesce' an array so that all pending
IO completes and no new IO starts.  This is used to achieve a
stable state before making internal changes.

Currently this quiescing applies equally to normal IO, resync
IO, and reshape IO.
However there is a problem with applying it to reshape IO.
Reshape can have multiple 'stripe_heads' that must be active together.
If the quiesce comes between allocating the first and the last of
such a collection, then we deadlock, as the last will not be allocated
until the quiesce is lifted, the quiesce will not be lifted until the
first (which has been allocated) gets used, and that first cannot be
used until the last is allocated.

It is not necessary to inhibit reshape IO when a quiesce is
requested.  Those places in the code that require a full quiesce will
ensure the reshape thread is not running at all.

So allow reshape requests to get access to new stripe_heads without
being blocked by a 'quiesce'.

This only affects in-place reshapes (i.e. where the array does not
grow or shrink) and these are only newly supported.  So this patch is
not needed in earlier kernels.
Signed-off-by: NeilBrown <neilb@suse.de>

a8c906ca

md/raid5: use conf->raid_disks in preference to mddev->raid_disk · f001a70c

Authored by NeilBrown on Jun 09, 2009

mddev->raid_disks can be changed at any time by a request from
user-space.  It is a suggestion as to what number of raid_disks is
desired.

conf->raid_disks can only be changed by the raid5 module with suitable
locks in place.  It is a statement as to the current number of
raid_disks.

There are two places where the latter should be used, but the former
is used.  This can lead to a crash when reshaping an array.

This patch changes mddev-> to conf->
Signed-off-by: NeilBrown <neilb@suse.de>

f001a70c

27 May, 2009 1 commit

md: raid5: change incorrect usage of 'min' macro to 'min_t' · ed37d83e

Authored by NeilBrown on May 27, 2009

A recent patch to raid5.c uses min on an int and a sector_t.
This isn't allowed.
So change it to min_t(sector_t,x,y).
Signed-off-by: NeilBrown <neilb@suse.de>

ed37d83e
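
The idea behind min_t can be sketched in userspace: both operands are cast to one named type before they are compared, so mixing an int with a sector_t is made explicit. The macro below is a simplified imitation (it evaluates its arguments more than once, which the kernel's version avoids), and `sectors_this_round` is a hypothetical caller.

```c
/* Userspace stand-in for the kernel's sector_t. */
typedef unsigned long long sector_t;

/* Simplified imitation of min_t(): cast both sides, then compare. */
#define min_t(type, x, y) ((type)(x) < (type)(y) ? (type)(x) : (type)(y))

/* Hypothetical caller mixing an int and a sector_t. */
static sector_t sectors_this_round(int chunk_sectors, sector_t remaining)
{
	return min_t(sector_t, chunk_sectors, remaining);
}
```

Naming the type at the call site states which width and signedness the comparison uses instead of leaving it to implicit conversion.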

26 May, 2009 1 commit

md: raid5: avoid sector values going negative when testing reshape progress. · 848b3182

Authored by NeilBrown on May 26, 2009

As sector_t is unsigned, we cannot afford to let 'safepos' etc go
negative.
So replace
   a -= b;
by
   a -= min(b,a);
Signed-off-by: NeilBrown <neilb@suse.de>

848b3182
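
A small userspace sketch shows why the clamped form matters for an unsigned type. The function names and the `sector_t` typedef are assumptions for illustration; only the `a -= min(b,a)` idiom comes from the commit message.

```c
/* Userspace stand-in for the kernel's sector_t (an unsigned type). */
typedef unsigned long long sector_t;

/* Plain subtraction: wraps around to a huge value when b > a. */
static sector_t sub_wrapping(sector_t a, sector_t b)
{
	a -= b;
	return a;
}

/* Clamped subtraction: subtract at most a, so the result stops at zero. */
static sector_t sub_clamped(sector_t a, sector_t b)
{
	a -= (b < a ? b : a);	/* a -= min(b, a) */
	return a;
}
```

With a = 3 and b = 10 the plain form wraps to a very large sector number, while the clamped form yields 0.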

23 May, 2009 1 commit

block: Use accessor functions for queue limits · ae03bf63

Authored by Martin K. Petersen on May 22, 2009

Convert all external users of queue limits to using wrapper functions
instead of poking the request queue variables directly.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

ae03bf63

17 Apr, 2009 1 commit

md: update sync_completed and reshape_position even more often. · c03f6a19

Authored by NeilBrown on Apr 17, 2009

There are circumstances when a user-space process might need to
"oversee" a resync/reshape process.  For example when doing an
in-place reshape of a raid5, it is prudent to take a backup of each
section before reshaping it as this is the only way to provide
safety against an unplanned shutdown (i.e. crash/power failure).

The sync_max sysfs value can be used to stop the resync from
advancing beyond a particular point.
So user-space can:
  suspend IO to the first section and back it up
  set 'sync_max' to the end of the section
  wait for 'sync_completed' to reach that point
  resume IO on the first section and move on to the next section.

However this process requires the kernel and user-space to run in
lock-step which could introduce unnecessary delays.

It would be better if a 'double buffered' approach could be used with
userspace and kernel space working on different sections with the
'next' section always ready when the 'current' section is finished.

One problem with implementing this is that sync_completed is only
guaranteed to be updated when the sync process reaches sync_max.
(it is updated on a time basis at other times, but it is hard to rely
on that).  This defeats some of the double buffering.

With this patch, sync_completed (and reshape_position) get updated as
the current position approaches sync_max, so there is room for
userspace to advance sync_max early without losing updates.

To be precise, sync_completed is updated when the current sync
position reaches half way between the current value of sync_completed
and the value of sync_max.  This will usually be a good time for user
space to update sync_max.

If sync_max does not get updated, the updates to sync_completed
(together with associated metadata updates) will occur at an
exponentially increasing frequency which will get unreasonably fast
(one update every page) immediately before the process hits sync_max
and stops.  So the update rate will be unreasonably fast only for an
insignificant period of time.
Signed-off-by: NeilBrown <neilb@suse.de>

c03f6a19
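
The halving rule described in this message can be modelled with a short userspace sketch. It assumes sync_max stays fixed and that completed never exceeds sync_max; the function names, the `sector_t` typedef, and the `page_sectors` cutoff are illustrative assumptions, not the kernel implementation.

```c
/* Userspace stand-in for the kernel's sector_t. */
typedef unsigned long long sector_t;

/* Next refresh point: halfway between sync_completed and sync_max. */
static sector_t next_update_point(sector_t completed, sector_t sync_max)
{
	return completed + (sync_max - completed) / 2;
}

/*
 * Count how many refreshes happen before the remaining gap shrinks
 * to page_sectors or less, with sync_max left unchanged.
 */
static unsigned int count_updates(sector_t completed, sector_t sync_max,
				  sector_t page_sectors)
{
	unsigned int n = 0;

	while (sync_max - completed > page_sectors) {
		completed = next_update_point(completed, sync_max);
		n++;
	}
	return n;
}
```

Because the gap halves at every refresh, the number of refreshes grows only logarithmically with the distance to sync_max, which matches the message's point that the fast update rate lasts for an insignificant period.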

14 Apr, 2009 1 commit

md: improve usefulness and accuracy of sysfs file md/sync_completed. · acb180b0

Authored by NeilBrown on Apr 14, 2009

The sync_completed file reports how much of a resync (or recovery or
reshape) has been completed.
However due to the possibility of out-of-order completion of writes,
it is not certain to be accurate.

We have an internal value - mddev->curr_resync_completed - which is an
accurate value (though it might not always be quite so uptodate).

So:
 - make curr_resync_completed be uptodate a little more often,
   particularly when raid5 reshape updates status in the metadata
 - report curr_resync_completed in the sysfs file
 - allow poll/select to report all updates to md/sync_completed.

This makes sync_completed completely usable by any external metadata
handler that wants to record this status information in its metadata.
Signed-off-by: NeilBrown <neilb@suse.de>

acb180b0

31 Mar, 2009 5 commits

md/raid5 revise rules for when to update metadata during reshape · c8f517c4

Authored by NeilBrown on Mar 31, 2009

We currently update the metadata :
 1/ every 3Megabytes
 2/ When the place we will write new-layout data to is recorded in
    the metadata as still containing old-layout data.

Rule one exists to avoid having to re-do too much reshaping in the
face of a crash/restart.  So it should really be time based rather
than size based.  So change it to "every 10 seconds".

Rule two turns out to be too harsh when restriping an array
'in-place', as in that case the metadata must be updated for every
stripe.
For the in-place update, it can only possibly be safe from a crash if
some user-space program takes a backup of every e.g. few hundred
stripes before allowing them to be reshaped.  In that case, the
constant metadata update is pointless.
So only update the metadata if the new metadata will report that the
end of the 'old-layout' data is beyond where we are currently
writing 'new-layout' data.
Signed-off-by: NeilBrown <neilb@suse.de>

c8f517c4
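
The time-based form of rule one can be sketched as a wraparound-safe deadline check on a free-running tick counter. This is a userspace illustration only: the function name and parameters are hypothetical, and the signed cast relies on the two's-complement conversion that common compilers provide.

```c
#include <stdbool.h>

/*
 * Return true once 'now' is strictly past last_checkpoint + interval.
 * All values are ticks of a free-running unsigned counter, so the
 * comparison is done on the signed difference to survive wraparound.
 */
static bool checkpoint_due(unsigned long now, unsigned long last_checkpoint,
			   unsigned long interval)
{
	unsigned long deadline = last_checkpoint + interval;	/* may wrap */

	return (long)(deadline - now) < 0;
}
```

With an interval equal to ten seconds' worth of ticks, a caller would write the metadata whenever this returns true and then record the current tick as the new checkpoint.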

md/raid5: minor code cleanups in make_request. · b0f9ec04

Authored by NeilBrown on Mar 31, 2009

... and to be certain that make_request doesn't wait forever,
add a 'wake_up' when ->reshape_progress has been set to MaxSector
Signed-off-by: NeilBrown <neilb@suse.de>

b0f9ec04

md: remove CONFIG_MD_RAID_RESHAPE config option. · 2cffc4a0

Authored by NeilBrown on Mar 31, 2009

This was only needed when the code was experimental.  Most of it
is well tested now, so the option is no longer useful.
Signed-off-by: NeilBrown <neilb@suse.de>

2cffc4a0

md/raid5: be more careful about write ordering when reshaping. · ab69ae12

Authored by NeilBrown on Mar 31, 2009

When we are reshaping an array, it is very important that we read
the data from a particular sector offset before writing new data
at that offset.

In most cases when growing or shrinking an array we read long before
we even consider writing.  But when restriping an array without
changing its size, there is a small possibility that we might have
some data available to write before the read has happened at the same
location.  This would require some stripes to be in cache already.

To guard against this small possibility, we check, before writing,
that the 'old' stripe at the same location is not in the process of
being read.  And we ensure that we mark all 'source' stripes as such
before allowing new 'destination' stripes to proceed.
Signed-off-by: NeilBrown <neilb@suse.de>

ab69ae12

md/raid5: allow layout and chunksize to be changed on active array. · 88ce4930

Authored by NeilBrown on Mar 31, 2009

If an array has 3 or more devices, we allow the chunksize or layout
to be changed and when a reshape starts, we use these as the 'new'
values.
Signed-off-by: NeilBrown <neilb@suse.de>

88ce4930

openanolis / cloud-kernel synced successfully about 1 year ago
