Commits · 7ca263cdf8cf74d0f1c6f48d07d556de92e3bec9 · openeuler / raspberrypi-kernel

23 Sep, 2009 3 commits

md: report device as congested when suspended · 3fa841d7

Committed by NeilBrown on Sep 23, 2009

This should stop writeback from coming when the device is temporarily
suspended.
Signed-off-by: NeilBrown <neilb@suse.de>

md: Improve name of threads created by md_register_thread · 0da3c619

Committed by NeilBrown on Sep 23, 2009

The management threads for raid4,5,6 arrays are all called
mdX_raid5, independent of the actual raid level, which is wrong and
can be confusing.

So change md_register_thread to use the name from the personality
unless an alternate name (like 'resync' or 'reshape') is given.

This is simpler and more correct.

Cc: Jinzc <zhenchengjin@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md: remove sparse waring "symbol xxx shadows an earlier one" · a9f326eb

Committed by NeilBrown on Sep 23, 2009

Rename some variables and remove some duplicate definitions
to avoid these warnings.  None of them are actual errors.
Signed-off-by: NeilBrown <neilb@suse.de>

17 Sep, 2009 2 commits

md/raid6: cleanup ops_run_compute6_2 · 6c910a78

Committed by Dan Williams on Sep 16, 2009

Neil says:
	"It is correct as it stands, but the fact that every branch in
	 the 'if' part ends with a 'return' isn't immediately obvious,
	 so it is clearer if we are explicit about the if / then / else
	 structure."
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

md/raid6: eliminate BUG_ON with side effect · 2d6e4ecc

Committed by Dan Williams on Sep 16, 2009

As pointed out by Neil it should be possible to build a driver with all
BUG_ON statements deleted.  It's bad form to have a BUG_ON with a side
effect.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
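
The point about side effects can be shown with a small user-space sketch. This is not the code touched by the commit: `BUG_ON` is redefined locally as a stand-in, `NO_BUG_ON` is a hypothetical build switch that compiles it out, and `take_slot()` is an invented helper.

```c
#include <stdlib.h>

/* Stand-in for the kernel macro; NO_BUG_ON is a hypothetical build switch. */
#ifdef NO_BUG_ON
#define BUG_ON(cond) do { } while (0)
#else
#define BUG_ON(cond) do { if (cond) abort(); } while (0)
#endif

static int slots_used;

/* Hypothetical helper with a side effect: claims a slot, -1 when full. */
static int take_slot(void)
{
	if (slots_used >= 4)
		return -1;
	return slots_used++;
}

/* Bad form: the call disappears when BUG_ON() compiles to nothing. */
static void claim_bad(void)
{
	BUG_ON(take_slot() < 0);
}

/* Better: do the work unconditionally, then check the result. */
static void claim_good(void)
{
	int slot = take_slot();

	BUG_ON(slot < 0);
	(void)slot; /* silences the unused warning when BUG_ON is empty */
}
```

Built with `-DNO_BUG_ON`, `claim_bad()` no longer claims a slot at all, while `claim_good()` behaves the same in both builds.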

11 Sep, 2009 1 commit

bio: first step in sanitizing the bio->bi_rw flag testing · 1f98a13f

Committed by Jens Axboe on Sep 11, 2009

Get rid of any functions that test for these bits and make callers
use bio_rw_flagged() directly. Then it is at least directly apparent
what variable and flag they check.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
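
A rough sketch of what testing a flag through a single helper can look like. Only the name `bio_rw_flagged()` and the `bi_rw` field come from the commit message; the cut-down `struct bio`, the enum values, the helper's signature and the `needs_ordering()` caller are assumptions made for illustration.

```c
#include <stdbool.h>

/* Illustrative bit numbers only; not the kernel's actual layout. */
enum bio_rw_flags {
	BIO_RW,
	BIO_RW_AHEAD,
	BIO_RW_BARRIER,
	BIO_RW_SYNCIO,
};

/* Cut-down stand-in; the real struct bio has many more members. */
struct bio {
	unsigned long bi_rw; /* one bit per bio_rw_flags value */
};

/* Assumed shape of the helper: test one bit of bi_rw. */
static inline bool bio_rw_flagged(const struct bio *bio, enum bio_rw_flags flag)
{
	return (bio->bi_rw & (1UL << flag)) != 0;
}

/* Hypothetical caller: the variable and flag checked are visible here. */
static bool needs_ordering(const struct bio *bio)
{
	return bio_rw_flagged(bio, BIO_RW_BARRIER);
}
```

Compared with a dedicated wrapper per flag, the call site names both the variable and the flag, which is the readability gain the commit message describes.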

09 Sep, 2009 1 commit

dmaengine: add fence support · 0403e382

Committed by Dan Williams on Sep 08, 2009

Some engines optimize operation by reading ahead in the descriptor chain
such that descriptor2 may start execution before descriptor1 completes.
If descriptor2 depends on the result from descriptor1 then a fence is
required (on descriptor2) to disable this optimization. The async_tx
api could implicitly identify dependencies via the 'depend_tx'
parameter, but that would constrain cases where the dependency chain
only specifies a completion order rather than a data dependency. So,
provide an ASYNC_TX_FENCE to explicitly identify data dependencies.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

30 Aug, 2009 12 commits

md/raid456: distribute raid processing over multiple cores · 07a3b417

Committed by Dan Williams on Aug 29, 2009

Now that the resources to handle stripe_head operations are allocated
percpu it is possible for raid5d to distribute stripe handling over
multiple cores.  This conversion also adds a call to cond_resched() in
the non-multicore case to prevent one core from getting monopolized for
raid operations.

Cc: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

md/raid6: remove synchronous infrastructure · b774ef49

Committed by Yuri Tikhonov on Aug 29, 2009

These routines have been replaced by their asynchronous counterparts.
Signed-off-by: Yuri Tikhonov <yur@emcraft.com>
Signed-off-by: Ilya Yanok <yanok@emcraft.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

md/raid6: asynchronous handle_stripe6 · 6c0069c0

Committed by Yuri Tikhonov on Aug 29, 2009

1/ Use STRIPE_OP_BIOFILL to offload completion of read requests to
   raid_run_ops
2/ Implement a handler for sh->reconstruct_state similar to the raid5 case
   (adds handling of Q parity)
3/ Prevent handle_parity_checks6 from running concurrently with 'compute'
   operations
4/ Hook up raid_run_ops
Signed-off-by: Yuri Tikhonov <yur@emcraft.com>
Signed-off-by: Ilya Yanok <yanok@emcraft.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

md/raid6: asynchronous handle_parity_check6 · d82dfee0

Committed by Dan Williams on Jul 14, 2009

[ Based on an original patch by Yuri Tikhonov ]

Implement the state machine for handling the RAID-6 parity check and
repair functionality.  Note that the raid6 case does not need to check
for new failures, like raid5, as it will always write back the correct
disks.  The raid5 case can be updated to check zero_sum_result to avoid
getting confused by new failures rather than retrying the entire check
operation.
Signed-off-by: Yuri Tikhonov <yur@emcraft.com>
Signed-off-by: Ilya Yanok <yanok@emcraft.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

md/raid6: asynchronous handle_stripe_dirtying6 · a9b39a74

Committed by Yuri Tikhonov on Aug 29, 2009

In the synchronous implementation of stripe dirtying we processed a
degraded stripe with one call to handle_stripe_dirtying6().  I.e.
compute the missing blocks from the other drives, then copy in the new
data and reconstruct the parities.

In the asynchronous case we do not perform stripe operations directly.
Instead, operations are scheduled with flags to be later serviced by
raid_run_ops.  So, for the degraded case the final reconstruction step
can only be carried out after all blocks have been brought up to date by
being read, or computed.  Like the raid5 case schedule_reconstruction()
sets STRIPE_OP_RECONSTRUCT to request a parity generation pass and
through operation chaining can handle compute and reconstruct in a
single raid_run_ops pass.

[dan.j.williams@intel.com: fixup handle_stripe_dirtying6 gating]
Signed-off-by: Yuri Tikhonov <yur@emcraft.com>
Signed-off-by: Ilya Yanok <yanok@emcraft.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

md/raid6: asynchronous handle_stripe_fill6 · 5599becc

Committed by Yuri Tikhonov on Aug 29, 2009

Modify handle_stripe_fill6 to work asynchronously by introducing
fetch_block6 as the raid6 analog of fetch_block5 (schedule compute
operations for missing/out-of-sync disks).

[dan.j.williams@intel.com: compute D+Q in one pass]
Signed-off-by: Yuri Tikhonov <yur@emcraft.com>
Signed-off-by: Ilya Yanok <yanok@emcraft.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

md/raid5,6: common schedule_reconstruction for raid5/6 · c0f7bddb

Committed by Yuri Tikhonov on Aug 29, 2009

Extend schedule_reconstruction5 for reuse by the raid6 path.  Add
support for generating Q and BUG() if a request is made to perform
'prexor'.
Signed-off-by: Yuri Tikhonov <yur@emcraft.com>
Signed-off-by: Ilya Yanok <yanok@emcraft.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

md/raid6: asynchronous raid6 operations · ac6b53b6

Committed by Dan Williams on Jul 14, 2009

[ Based on an original patch by Yuri Tikhonov ]

The raid_run_ops routine uses the asynchronous offload api and
the stripe_operations member of a stripe_head to carry out xor+pq+copy
operations asynchronously, outside the lock.

The operations performed by RAID-6 are the same as in the RAID-5 case
except for no support of STRIPE_OP_PREXOR operations. All the others
are supported:
STRIPE_OP_BIOFILL
 - copy data into request buffers to satisfy a read request
STRIPE_OP_COMPUTE_BLK
 - generate missing blocks (1 or 2) in the cache from the other blocks
STRIPE_OP_BIODRAIN
 - copy data out of request buffers to satisfy a write request
STRIPE_OP_RECONSTRUCT
 - recalculate parity for new data that has entered the cache
STRIPE_OP_CHECK
 - verify that the parity is correct

The flow is the same as in the RAID-5 case, and reuses some routines, namely:
1/ ops_complete_postxor (renamed to ops_complete_reconstruct)
2/ ops_complete_compute (updated to set up to 2 targets uptodate)
3/ ops_run_check (renamed to ops_run_check_p for xor parity checks)

[neilb@suse.de: fixes to get it to pass mdadm regression suite]
Reviewed-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: Yuri Tikhonov <yur@emcraft.com>
Signed-off-by: Ilya Yanok <yanok@emcraft.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

md/raid5: factor out mark_uptodate from ops_complete_compute5 · 4e7d2c0a

Committed by Dan Williams on Aug 29, 2009

ops_complete_compute5 can be reused in the raid6 path if it is updated to
generically handle a second target.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

async_tx: add sum check flags · ad283ea4

Committed by Dan Williams on Aug 29, 2009

Replace the flat zero_sum_result with a collection of flags to contain
the P (xor) zero-sum result, and the soon to be utilized Q (raid6
Reed-Solomon syndrome) zero-sum result.  Use the SUM_CHECK_ namespace
instead of DMA_ since these flags will be used on non-dma-zero-sum
enabled platforms.
Reviewed-by: Andre Noll <maan@systemlinux.org>
Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
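
A minimal sketch of the flag-based result described above, for illustration only. The commit message names just the `SUM_CHECK_` prefix; the exact flag names, their bit positions, the "bit set means mismatch" convention and the helper functions are assumptions here.

```c
#include <stdbool.h>

/* Assumed names in the SUM_CHECK_ namespace; a set bit means a mismatch. */
enum sum_check_flags {
	SUM_CHECK_P_RESULT = 1 << 0, /* P (xor) parity check failed */
	SUM_CHECK_Q_RESULT = 1 << 1, /* Q (Reed-Solomon syndrome) check failed */
};

/* Hypothetical helpers: each parity can be inspected on its own. */
static bool p_failed(unsigned int result)
{
	return (result & SUM_CHECK_P_RESULT) != 0;
}

static bool q_failed(unsigned int result)
{
	return (result & SUM_CHECK_Q_RESULT) != 0;
}

/* A flat zero/non-zero result could only answer this question. */
static bool check_passed(unsigned int result)
{
	return result == 0;
}
```

With separate bits, a raid6 check can tell a bad P block from a bad Q block, which a single flat result cannot express.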

md/raid5,6: add percpu scribble region for buffer lists · d6f38f31

Committed by Dan Williams on Jul 14, 2009

Use percpu memory rather than stack for storing the buffer lists used in
parity calculations.  Include space for dma address conversions and pass
that to async_tx via the async_submit_ctl.scribble pointer.

[ Impact: move memory pressure from stack to heap ]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

md/raid6: move the spare page to a percpu allocation · 36d1c647

Committed by Dan Williams on Jul 14, 2009

In preparation for asynchronous handling of raid6 operations move the
spare page to a percpu allocation to allow multiple simultaneous
synchronous raid6 recovery operations.

Make this allocation cpu hotplug aware to maximize allocation
efficiency.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

13 Aug, 2009 3 commits

md/raid5: Properly remove excess drives after shrinking a raid5/6 · 1a67dde0

Committed by NeilBrown on Aug 13, 2009

We were removing the drives from the array, but not
removing symlinks from /sys/.... and not marking the device
as having been removed.
Signed-off-by: NeilBrown <neilb@suse.de>

md/raid5: make sure a reshape restarts at the correct address. · a639755c

Committed by NeilBrown on Aug 13, 2009

This "if" doesn't allow for the possibility that the number of devices
doesn't change, and so sector_nr isn't set correctly in that case.
So change '>' to '>='.
Signed-off-by: NeilBrown <neilb@suse.de>

md/raid5: allow new reshape modes to be restarted in the middle. · 67ac6011

Committed by NeilBrown on Aug 13, 2009

md/raid5 doesn't allow a reshape to restart if it involves writing
over the same part of disk that it would be reading from.
This happens at the beginning of a reshape that increases the number
of devices, at the end of a reshape that decreases the number of
devices, and continuously for a reshape that does not change the
number of devices.

The current code is correct for the "increase number of devices"
case as the critical section at the start is handled by userspace
performing a backup.

It does not work for reducing the number of devices, or the
no-change case.
For 'reducing', we need to invert the test.  For no-change we cannot
really be sure things will be safe, so simply require the array
to be read-only, which is how the user-space code which carefully
starts such arrays works.
Signed-off-by: NeilBrown <neilb@suse.de>

03 Aug, 2009 3 commits

md: Use revalidate_disk to effect changes in size of device. · 449aad3e

Committed by NeilBrown on Aug 03, 2009

As revalidate_disk calls check_disk_size_change, it will cause
any capacity change of a gendisk to be propagated to the blockdev
inode.  So use that instead of mucking about with locks and
i_size_write.

Also add a call to revalidate_disk in do_md_run and a few other places
where the gendisk capacity is changed.
Signed-off-by: NeilBrown <neilb@suse.de>

md: allow raid5_quiesce to work properly when reshape is happening. · 64bd660b

Committed by NeilBrown on Aug 03, 2009

The ->quiesce method is not supposed to stop resync/recovery/reshape,
just normal IO.
But in raid5 we don't have a way to know which stripes are being
used for normal IO and which for resync etc, so we need to wait for
all stripes to be idle to be sure that all writes have completed.

However reshape keeps at least some stripes busy for an extended period
of time, so a call to raid5_quiesce can block for several seconds
needlessly.
So arrange for reshape etc to pause briefly while raid5_quiesce is
trying to quiesce the array so that the active_stripes count can
drop to zero.
Signed-off-by: NeilBrown <neilb@suse.de>

md/raid5: set reshape_position correctly when reshape starts. · e516402c

Committed by NeilBrown on Aug 03, 2009

As the internal reshape_progress counter is the main driver
for reshape, the fact that reshape_position sometimes starts with the
wrong value has minimal effect.  It is visible in sysfs and that
is all.
Signed-off-by: NeilBrown <neilb@suse.de>

31 Jul, 2009 1 commit

md/raid6: release spare page at ->stop() · 95fc17aa

Committed by Dan Williams on Jul 31, 2009

Add missing call to safe_put_page from stop() by unifying open coded
raid5_conf_t de-allocation under free_conf().

Cc: <stable@kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

15 Jul, 2009 1 commit

md/raid6: release spare page at ->stop() · a11034b4

Committed by Dan Williams on Jul 14, 2009

Add missing call to safe_put_page from stop() by unifying open coded
raid5_conf_t de-allocation under free_conf().
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

01 Jul, 2009 3 commits

md: use interruptible wait when duration is controlled by userspace. · e62e58a5

Committed by NeilBrown on Jul 01, 2009

User space can set various limits on an md array so that resync waits
when it gets to a certain point, or so that I/O is blocked for a short
while.
When md is waiting against one of these limits, it should use an
interruptible wait so as not to add to the load average, and so as
not to trigger a warning if the wait goes on for too long.
Signed-off-by: NeilBrown <neilb@suse.de>

md/raid5: suspend shouldn't affect read requests. · a5c308d4

Committed by NeilBrown on Jul 01, 2009

md allows writes to regions of an array to be suspended temporarily.
This allows user-space to participate in aspects of reshape.
In particular, data can be copied with no risk of a race.
We should not be blocking read requests though, so don't.

Cc: stable@kernel.org
Signed-off-by: NeilBrown <neilb@suse.de>

md: Use new topology calls to indicate alignment and I/O sizes · 8f6c2e4b

Committed by Martin K. Petersen on Jul 01, 2009

Switch MD over to the new disk_stack_limits() function which checks for
alignment and adjusts preferred I/O sizes when stacking.

Also indicate preferred I/O sizes where applicable.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>

18 Jun, 2009 10 commits

md/raid5: correctly update sync_completed when we reach max_resync · 48606a9f

Committed by NeilBrown on Jun 18, 2009

At the end of reshape_request we update curr_resync_completed
if we are about to pause due to reaching resync_max.
However we update it to the wrong value.  We need to add the
"reshape_sectors" that have just been reshaped.
Signed-off-by: NeilBrown <neilb@suse.de>

md/raid5: add missing call to schedule() after prepare_to_wait() · 7a3ab908

Committed by Dan Williams on Jun 16, 2009

In the unlikely event that reshape progresses past the current request
while it is waiting for a stripe we need to schedule() before retrying
for 2 reasons:
1/ Prevent list corruption from duplicated list_add() calls without
   intervening list_del().
2/ Give the reshape code a chance to make some progress to resolve the
   conflict.

Cc: <stable@kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md: Push down reconstruction log message to personality code. · 8c6ac868

Committed by Andre Noll on Jun 18, 2009

Currently, the md layer checks in analyze_sbs() if the raid level
supports reconstruction (mddev->level >= 1) and if reconstruction is
in progress (mddev->recovery_cp != MaxSector).

Move that printk into the personality code of those raid levels that
care (levels 1, 4, 5, 6, 10).
Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: NeilBrown <neilb@suse.de>

md: merge reconfig and check_reshape methods. · 50ac168a

Committed by NeilBrown on Jun 18, 2009

The difference between these two methods is artificial.
Both check that a pending reshape is valid, and perform any
aspect of it that can be done immediately.
'reconfig' handles chunk size and layout.
'check_reshape' handles raid_disks.

So make them just one method.
Signed-off-by: NeilBrown <neilb@suse.de>

md: remove unnecessary arguments from ->reconfig method. · 597a711b

Committed by NeilBrown on Jun 18, 2009

Passing the new layout and chunksize as args is not necessary as
the mddev has fields for new_chunk and new_layout.

This is preparation for combining the check_reshape and reconfig
methods.
Signed-off-by: NeilBrown <neilb@suse.de>

md: raid5: check stripe cache is large enough in start_reshape · 01ee22b4

Committed by NeilBrown on Jun 18, 2009

In reshape cases that do not change the number of devices,
start_reshape is called without first calling check_reshape.

Currently, the check that the stripe_cache is large enough is
only done in check_reshape.  It should be in start_reshape too.
Signed-off-by: NeilBrown <neilb@suse.de>

md: fix some comments. · cdc2ae6d

Committed by Andre Noll on Jun 18, 2009

1/ Raid5 has learned to take over also raid4 and raid6 arrays.
2/ new_chunk in mdp_superblock_1 is in sectors, not bytes.
Signed-off-by: NeilBrown <neilb@suse.de>

md/raid5: Use is_power_of_2() in raid5_reconfig()/raid6_reconfig(). · 0ba459d2

Committed by Andre Noll on Jun 18, 2009

Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: NeilBrown <neilb@suse.de>
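
For readers unfamiliar with the helper, a user-space sketch of the usual power-of-two test follows. The body of `is_power_of_2()` is written from the well-known bit trick and is believed, not verified here, to match the kernel helper; `chunk_size_ok()` and its parameters are hypothetical and are not taken from raid5_reconfig().

```c
#include <stdbool.h>

/* True when exactly one bit is set; zero is not a power of two. */
static inline bool is_power_of_2(unsigned long n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Hypothetical validity check for a chunk size given in bytes. */
static bool chunk_size_ok(unsigned long chunk_bytes, unsigned long page_bytes)
{
	return chunk_bytes >= page_bytes && is_power_of_2(chunk_bytes);
}
```

For instance, `chunk_size_ok(65536, 4096)` holds, while a 96 KiB chunk is rejected because 98304 has two bits set.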

md: convert conf->chunk_size and conf->prev_chunk to sectors. · 09c9e5fa

Committed by Andre Noll on Jun 18, 2009

This kills some more shifts.
Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: NeilBrown <neilb@suse.de>

md: Convert mddev->new_chunk to sectors. · 664e7c41

Committed by Andre Noll on Jun 18, 2009

A straight-forward conversion which gets rid of some
multiplications/divisions/shifts. The patch also introduces a couple
of new ones, most of which are due to conf->chunk_size still being
represented in bytes. This will be cleaned up in subsequent patches.
Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: NeilBrown <neilb@suse.de>
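
The effect of storing chunk sizes in sectors rather than bytes can be sketched as follows. This is an illustration under assumptions, not code from the patch: 512-byte sectors are assumed, and the function names and the chunk-number calculation are invented for the example.

```c
#include <stdint.h>

#define SECTOR_SHIFT 9 /* assumes 512-byte sectors */

static inline uint64_t bytes_to_sectors(uint64_t bytes)
{
	return bytes >> SECTOR_SHIFT;
}

static inline uint64_t sectors_to_bytes(uint64_t sectors)
{
	return sectors << SECTOR_SHIFT;
}

/* Chunk size kept in bytes: each sector calculation needs a shift. */
static uint64_t chunk_number_from_bytes(uint64_t sector, uint32_t chunk_bytes)
{
	return sector / (chunk_bytes >> SECTOR_SHIFT);
}

/* Chunk size kept in sectors: the shift disappears. */
static uint64_t chunk_number_from_sectors(uint64_t sector, uint32_t chunk_sectors)
{
	return sector / chunk_sectors;
}
```

A 64 KiB chunk is 128 sectors, so sector 1000 falls in chunk 7 either way; the second form simply avoids converting on every call, and mixing a field in sectors with one still in bytes is what forces the temporary extra conversions the message mentions.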