Commits · 58691d64c44ae41ddf098ecb31e9a994026e3cff · openanolis / cloud-kernel

30 Aug, 2009 8 commits

Committed by Dan Williams on Aug 29, 2009

Test raid6 p+q operations with a simple "always multiply by 1" q
calculation to fit into dmatest's current destination verification
scheme.
Reviewed-by: Andre Noll <maan@systemlinux.org>
Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

58691d64

async_tx: add support for asynchronous RAID6 recovery operations · 0a82a623

Committed by Dan Williams on Jul 14, 2009

 async_raid6_2data_recov() recovers two data disk failures

 async_raid6_datap_recov() recovers a data disk and the P disk

These routines are a port of the synchronous versions found in
drivers/md/raid6recov.c.  The primary difference is breaking out the xor
operations into separate calls to async_xor.  Two helper routines are
introduced to perform scalar multiplication where needed.
async_sum_product() multiplies two sources by scalar coefficients and
then sums (xor) the result.  async_mult() simply multiplies a single
source by a scalar.

This implementation also includes, in contrast to the original
synchronous-only code, special case handling for the 4-disk and 5-disk
array cases.  In these situations the default N-disk algorithm will
present 0-source or 1-source operations to dma devices.  To cover for
dma devices where the minimum source count is 2 we implement 4-disk and
5-disk handling in the recovery code.

[ Impact: asynchronous raid6 recovery routines for 2data and datap cases ]

Cc: Yuri Tikhonov <yur@emcraft.com>
Cc: Ilya Yanok <yanok@emcraft.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: David Woodhouse <David.Woodhouse@intel.com>
Reviewed-by: Andre Noll <maan@systemlinux.org>
Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

0a82a623
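
The two helper operations named in the 0a82a623 message can be modelled in a few lines of software. The sketch below is not the kernel code: `mult()` and `sum_product()` are hypothetical synchronous stand-ins for async_mult() and async_sum_product(), the field arithmetic assumes the usual RAID-6 polynomial 0x11d with generator {02}, and the recovery formulas follow the standard two-failure derivation rather than anything quoted from this page.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) using the RAID-6 polynomial 0x11d."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11d
    return result


def gf_pow(base: int, exponent: int) -> int:
    result = 1
    for _ in range(exponent):
        result = gf_mul(result, base)
    return result


def gf_inv(a: int) -> int:
    """Inverse of a non-zero element: a**254, because a**255 == 1."""
    return gf_pow(a, 254)


def xor_blocks(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def mult(src: bytes, coef: int) -> bytes:
    """Hypothetical model of async_mult(): scale one source by a scalar."""
    return bytes(gf_mul(coef, byte) for byte in src)


def sum_product(src_a: bytes, src_b: bytes, coef_a: int, coef_b: int) -> bytes:
    """Hypothetical model of async_sum_product(): coef_a*a xor coef_b*b."""
    return xor_blocks(mult(src_a, coef_a), mult(src_b, coef_b))


def syndrome(blocks):
    """Return (P, Q); data block i is weighted by {02}**i in Q."""
    p = bytes(len(blocks[0]))
    q = bytes(len(blocks[0]))
    for index, block in enumerate(blocks):
        p = xor_blocks(p, block)
        q = xor_blocks(q, mult(block, gf_pow(2, index)))
    return p, q


def recover_two_data(blocks, x, y, p, q):
    """Rebuild lost data blocks x < y; their entries in blocks are ignored."""
    zero = bytes(len(p))
    survivors = [zero if i in (x, y) else b for i, b in enumerate(blocks)]
    p_xy, q_xy = syndrome(survivors)
    delta_p = xor_blocks(p, p_xy)  # equals Dx + Dy
    delta_q = xor_blocks(q, q_xy)  # equals g^x*Dx + g^y*Dy
    g_yx = gf_pow(2, y - x)
    denom_inv = gf_inv(g_yx ^ 1)
    coef_p = gf_mul(g_yx, denom_inv)
    coef_q = gf_mul(gf_inv(gf_pow(2, x)), denom_inv)
    d_x = sum_product(delta_p, delta_q, coef_p, coef_q)
    return d_x, xor_blocks(delta_p, d_x)


def recover_data_and_p(blocks, x, q):
    """Rebuild lost data block x and the P block using only Q."""
    zero = bytes(len(q))
    survivors = [zero if i == x else b for i, b in enumerate(blocks)]
    p_x, q_x = syndrome(survivors)
    d_x = mult(xor_blocks(q, q_x), gf_inv(gf_pow(2, x)))
    return d_x, xor_blocks(p_x, d_x)


data = [bytes([1, 2, 3]), bytes([9, 8, 7]), bytes([0xAA, 0x55, 0]), bytes([4, 0xFF, 6])]
p, q = syndrome(data)
```

Calling `recover_two_data(data, 1, 3, p, q)` should return the original blocks 1 and 3, and `recover_data_and_p(data, 2, q)` should return block 2 together with P. In the asynchronous versions each xor and each multiply becomes a separate chained operation, which this model does not attempt to show.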

async_tx: add support for asynchronous GF multiplication · b2f46fd8

Committed by Dan Williams on Jul 14, 2009

[ Based on an original patch by Yuri Tikhonov ]

This adds support for doing asynchronous GF multiplication by adding
two additional functions to the async_tx API:

 async_gen_syndrome() does simultaneous XOR and Galois field
    multiplication of sources.

 async_syndrome_val() validates the given source buffers against known P
    and Q values.

When a request is made to run async_pq against more than the hardware
maximum number of supported sources, we need to reuse the previously
generated P and Q values as sources into the next operation.  Care must
be taken to remove Q from P' and P from Q'.  For example to perform a 5
source pq op with hardware that only supports 4 sources at a time the
following approach is taken:

p, q = PQ(src0, src1, src2, src3, COEF({01}, {02}, {04}, {08}))
p', q' = PQ(p, q, q, src4, COEF({00}, {01}, {00}, {10}))

p' = p + q + q + src4 = p + src4
q' = {00}*p + {01}*q + {00}*q + {10}*src4 = q + {10}*src4

Note: 4 is the minimum acceptable maxpq; otherwise we punt to the
synchronous software path.

The DMA_PREP_CONTINUE flag indicates to the driver to reuse p and q as
sources (in the above manner) and fill the remaining slots up to maxpq
with the new sources/coefficients.

Note1: Some devices have native support for P+Q continuation and can skip
this extra work.  Devices with this capability can advertise it with
dma_set_maxpq.  It is up to each driver how to handle the
DMA_PREP_CONTINUE flag.

Note2: The api supports disabling the generation of P when generating Q;
this is ignored by the synchronous path but is implemented by some dma
devices to save unnecessary writes.  In this case the continuation
algorithm is simplified to only reuse Q as a source.

Cc: H. Peter Anvin <hpa@zytor.com>
Cc: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Yuri Tikhonov <yur@emcraft.com>
Signed-off-by: Ilya Yanok <yanok@emcraft.com>
Reviewed-by: Andre Noll <maan@systemlinux.org>
Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

b2f46fd8
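
The 5-source continuation example in the b2f46fd8 message can be checked numerically. The sketch below is a software model only: `pq()` is a hypothetical helper, not part of the async_tx API, and the field arithmetic assumes the usual RAID-6 polynomial 0x11d. It computes P and Q in one pass over five sources, then again in two passes limited to four sources each, feeding (p, q, q, src4) with coefficients ({00}, {01}, {00}, {10}) into the second pass as the message describes.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) using the RAID-6 polynomial 0x11d."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11d
    return result


def pq(sources, coefs):
    """Hypothetical PQ op: P is the xor of sources, Q the coefficient-weighted xor."""
    p = bytearray(len(sources[0]))
    q = bytearray(len(sources[0]))
    for src, coef in zip(sources, coefs):
        for i, byte in enumerate(src):
            p[i] ^= byte
            q[i] ^= gf_mul(coef, byte)
    return bytes(p), bytes(q)


MAX_PQ = 4  # assumed hardware limit on sources per operation
src = [bytes([n, n * 3 % 256, 0xF0 ^ n]) for n in (0x11, 0x22, 0x33, 0x44, 0x55)]
coefs = [0x01, 0x02, 0x04, 0x08, 0x10]

# Reference: all five sources in a single operation.
p_ref, q_ref = pq(src, coefs)

# First pass: the four sources the hardware can take at once.
p, q = pq(src[:MAX_PQ], coefs[:MAX_PQ])

# Continuation: q appears twice so it cancels out of P', and p gets a
# zero coefficient so it drops out of Q'.
p_cont, q_cont = pq([p, q, q, src[4]], [0x00, 0x01, 0x00, 0x10])
```

After this runs, `(p_cont, q_cont)` should equal `(p_ref, q_ref)`. Three of the four slots in the continuation pass are consumed by p, q, q, which is why a maxpq below 4 leaves no room for new sources.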

async_tx: remove walk of tx->parent chain in dma_wait_for_async_tx · 95475e57

Committed by Dan Williams on Jul 14, 2009

We currently walk the parent chain when waiting for a given tx to
complete; however, this walk may race with the driver cleanup routine.
The routines in async_raid6_recov.c may fall back to the synchronous
path at any point so we need to be prepared to call async_tx_quiesce()
(which calls dma_wait_for_async_tx).  To remove the ->parent walk we
guarantee that every time a dependency is attached ->issue_pending() is
invoked, then we can simply poll the initial descriptor until
completion.

This also allows for a lighter weight 'issue pending' implementation as
there is no longer a requirement to iterate through all the channels'
->issue_pending() routines as long as operations have been submitted in
an ordered chain.  async_tx_issue_pending() is added for this case.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

95475e57

async_tx: kill needless module_{init|exit} · af1f951e

Committed by Dan Williams on Aug 29, 2009

If module_init and module_exit are nops then neither need to be defined.

[ Impact: pure cleanup ]
Reviewed-by: Andre Noll <maan@systemlinux.org>
Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

af1f951e

async_tx: add sum check flags · ad283ea4

Committed by Dan Williams on Aug 29, 2009

Replace the flat zero_sum_result with a collection of flags to contain
the P (xor) zero-sum result, and the soon to be utilized Q (raid6 reed
solomon syndrome) zero-sum result.  Use the SUM_CHECK_ namespace instead
of DMA_ since these flags will be used on non-dma-zero-sum enabled
platforms.
Reviewed-by: Andre Noll <maan@systemlinux.org>
Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

ad283ea4
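
The move in ad283ea4 from a flat zero-sum result to a set of flags can be pictured as below. This is an illustrative model, not the kernel definition: the class name `SumCheck`, the bit positions, and the `check()` helper are assumptions, with only the idea of separate P and Q result bits taken from the commit message. A set bit here means the corresponding parity did not validate.

```python
from enum import IntFlag


class SumCheck(IntFlag):
    """Assumed layout: one result bit per parity block."""
    P_RESULT = 1 << 0  # xor (P) check failed
    Q_RESULT = 1 << 1  # raid6 syndrome (Q) check failed


def check(p_stored, p_computed, q_stored=None, q_computed=None) -> SumCheck:
    """Compare stored parity against recomputed parity and report per-block results."""
    result = SumCheck(0)
    if p_stored != p_computed:
        result |= SumCheck.P_RESULT
    if q_stored is not None and q_stored != q_computed:
        result |= SumCheck.Q_RESULT
    return result


only_q_bad = check(b"\x0f", b"\x0f", b"\x33", b"\x34")
all_good = check(b"\x0f", b"\x0f")
```

With a single integer result the caller could not tell which parity block was stale; with flags, `only_q_bad` carries just the Q bit and `all_good` is empty.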

md/raid5,6: add percpu scribble region for buffer lists · d6f38f31

Committed by Dan Williams on Jul 14, 2009

Use percpu memory rather than stack for storing the buffer lists used in
parity calculations.  Include space for dma address conversions and pass
that to async_tx via the async_submit_ctl.scribble pointer.

[ Impact: move memory pressure from stack to heap ]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

d6f38f31

md/raid6: move the spare page to a percpu allocation · 36d1c647

Committed by Dan Williams on Jul 14, 2009

In preparation for asynchronous handling of raid6 operations move the
spare page to a percpu allocation to allow multiple simultaneous
synchronous raid6 recovery operations.

Make this allocation cpu hotplug aware to maximize allocation
efficiency.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

36d1c647

15 Jul, 2009 1 commit

md/raid6: release spare page at ->stop() · a11034b4

Committed by Dan Williams on Jul 14, 2009

Add missing call to safe_put_page from stop() by unifying open coded
raid5_conf_t de-allocation under free_conf().
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

a11034b4

04 Jun, 2009 3 commits

async_xor: permit callers to pass in a 'dma/page scribble' region · 04ce9ab3

Committed by Dan Williams on Jun 03, 2009

async_xor() needs space to perform dma and page address conversions.  In
most cases the code can simply reuse the struct page * array because the
size of the native pointer matches the size of a dma/page address.  In
order to support archs where sizeof(dma_addr_t) is larger than
sizeof(struct page *), or to preserve the input parameters, we utilize a
memory region passed in by the caller.

Since the code is now prepared to handle the case where it cannot
perform address conversions on the stack, we no longer need the
!HIGHMEM64G dependency in drivers/dma/Kconfig.

[ Impact: don't clobber input buffers for address conversions ]
Reviewed-by: Andre Noll <maan@systemlinux.org>
Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

04ce9ab3

async_tx: structify submission arguments, add scribble · a08abd8c

Committed by Dan Williams on Jun 03, 2009

Prepare the api for the arrival of a new parameter, 'scribble'.  This
will allow callers to identify scratchpad memory for dma address or page
address conversions.  As this adds yet another parameter, take this
opportunity to convert the common submission parameters (flags,
dependency, callback, and callback argument) into an object that is
passed by reference.

Also, take this opportunity to fix up the kerneldoc and add notes about
the relevant ASYNC_TX_* flags for each routine.

[ Impact: moves api pass-by-value parameters to a pass-by-reference struct ]
Signed-off-by: Andre Noll <maan@systemlinux.org>
Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

a08abd8c
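
The refactoring described in a08abd8c, bundling flags, dependency, callback, callback argument and the new scribble pointer into one object passed by reference, can be sketched as follows. The names `SubmitCtl` and `xor_model()` are hypothetical, and the "address conversion" is faked with Python object ids; the point is only the calling convention, not the real async_tx behaviour.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional


@dataclass
class SubmitCtl:
    """Hypothetical model of the common submission parameters."""
    flags: int = 0
    depend_tx: Optional[object] = None
    cb_fn: Optional[Callable[[Any], None]] = None
    cb_param: Any = None
    scribble: Optional[List[int]] = None  # caller-provided scratch space


def xor_model(sources: List[bytes], submit: SubmitCtl) -> bytes:
    """Xor the sources together, taking every common argument from submit."""
    if submit.scribble is not None:
        # Stand-in for dma/page address conversion: the results land in
        # the caller's scratch area, so the sources list is left untouched.
        submit.scribble[:] = [id(src) for src in sources]
    dest = bytearray(len(sources[0]))
    for src in sources:
        for i, byte in enumerate(src):
            dest[i] ^= byte
    if submit.cb_fn is not None:
        submit.cb_fn(submit.cb_param)
    return bytes(dest)


done = []
blocks = [b"\x01\x02", b"\x10\x20"]
submit = SubmitCtl(cb_fn=done.append, cb_param="stripe-0", scribble=[])
result = xor_model(blocks, submit)
```

Adding a further common parameter later only touches the structure, not every call site, which is the motivation the commit message gives for the change.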

async_tx: kill ASYNC_TX_DEP_ACK flag · 88ba2aa5

Committed by Dan Williams on Apr 09, 2009

In support of inter-channel chaining async_tx utilizes an ack flag to
gate whether a dependent operation can be chained to another.  While the
flag is not set the chain can be considered open for appending.  Setting
the ack flag closes the chain and flags the descriptor for garbage
collection.  The ASYNC_TX_DEP_ACK flag essentially means "close the
chain after adding this dependency".  Since each operation can only have
one child the api now implicitly sets the ack flag at dependency
submission time.  This removes an unnecessary management burden from
clients of the api.

[ Impact: clean up and enforce one dependency per operation ]
Reviewed-by: Andre Noll <maan@systemlinux.org>
Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

88ba2aa5
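
The rule from 88ba2aa5, that attaching a dependency implicitly acks the parent and so closes its chain, can be modelled with a toy descriptor. `Descriptor` and `submit()` below are hypothetical names for illustration and do not mirror the kernel structures.

```python
class Descriptor:
    """Toy transaction descriptor with an ack flag and at most one child."""

    def __init__(self, name: str):
        self.name = name
        self.acked = False  # False: chain still open for appending
        self.child = None


def submit(tx: Descriptor, depend_tx: Descriptor = None) -> Descriptor:
    """Submit tx, optionally chained after depend_tx."""
    if depend_tx is not None:
        if depend_tx.acked:
            raise RuntimeError(f"{depend_tx.name} already has a dependent operation")
        depend_tx.child = tx
        # Implicit ack: each operation has one child, so the chain closes
        # here and the parent becomes eligible for cleanup.
        depend_tx.acked = True
    return tx


a = submit(Descriptor("xor"))
b = submit(Descriptor("copy"), depend_tx=a)
```

Callers no longer pass an explicit "close the chain" flag; a second attempt to hang an operation off `a` is rejected in this model.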

09 Apr, 2009 2 commits

async_tx: rename zero_sum to val · 099f53cb

Committed by Dan Williams on Apr 08, 2009

'zero_sum' does not properly describe the operation of generating parity
and checking that it validates against an existing buffer.  Change the
name of the operation to 'val' (for 'validate').  This is in
anticipation of the p+q case where it is a requirement to identify the
target parity buffers separately from the source buffers, because the
target parity buffers will not have corresponding pq coefficients.
Reviewed-by: Andre Noll <maan@systemlinux.org>
Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

099f53cb

Merge branch 'dmaengine' into async-tx-raid6 · fd74ea65
Committed by Dan Williams on Apr 08, 2009

fd74ea65

03 Apr, 2009 1 commit

dma: Add SoF and EoF debugging to ipu_idmac.c, minor cleanup · 8c6db1bb

Committed by Guennadi Liakhovetski on Apr 02, 2009

Add Start-of-Frame and End-of-Frame debugging to ipu_idmac.c.  In the
future it might also be needed for the actual video processing in
mx3-camera, at which point the ISRs will have to be transferred to
mx3_camera.c, for which the ipu_irq_map() and ipu_irq_unmap() functions
will have to be exported.

Also simplify a couple of pointer-dereferences.
Signed-off-by: Guennadi Liakhovetski <lg@denx.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

8c6db1bb

02 Apr, 2009 1 commit

dw_dmac: add cyclic API to DW DMA driver · d9de4519

Committed by Hans-Christian Egtvedt on Apr 01, 2009

This patch adds a cyclic DMA interface to the DW DMA driver. This is
very useful if you want to use the DMA controller in combination with a
sound device which uses cyclic buffers.

Using a DMA channel for cyclic DMA will disable the possibility to use
it as a normal DMA engine until the user calls the cyclic free function
on the DMA channel. Also a cyclic DMA list can not be prepared if the
channel is already active.
Signed-off-by: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
Acked-by: Haavard Skinnemoen <haavard.skinnemoen@atmel.com>
Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

d9de4519
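
The two usage constraints stated in the d9de4519 message (a channel in cyclic mode cannot serve as a normal DMA engine until it is freed, and a cyclic list cannot be prepared on an active channel) amount to a small state machine. The sketch below uses hypothetical method names and is not the DW DMA driver interface.

```python
class Channel:
    """Toy DMA channel tracking only the cyclic/normal exclusion rules."""

    def __init__(self):
        self.active = False  # a normal transfer is in flight
        self.cyclic = False  # channel is reserved for cyclic DMA

    def cyclic_prep(self):
        if self.active or self.cyclic:
            raise RuntimeError("channel already in use")
        self.cyclic = True

    def cyclic_free(self):
        self.cyclic = False

    def prep_normal(self):
        if self.cyclic:
            raise RuntimeError("channel reserved for cyclic DMA")
        self.active = True

    def complete_normal(self):
        self.active = False


chan = Channel()
chan.cyclic_prep()
```

In this model a sound driver would hold the channel in cyclic mode for the lifetime of the stream and call `cyclic_free()` on close, after which `prep_normal()` succeeds again.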

31 Mar, 2009 24 commits

md/raid5 revise rules for when to update metadata during reshape · c8f517c4

Committed by NeilBrown on Mar 31, 2009

We currently update the metadata:
 1/ every 3 megabytes
 2/ when the place we will write new-layout data to is recorded in
    the metadata as still containing old-layout data.

Rule one exists to avoid having to re-do too much reshaping in the
face of a crash/restart.  So it should really be time based rather
than size based.  So change it to "every 10 seconds".

Rule two turns out to be too harsh when restriping an array
'in-place', as in that case the metadata must be updated for every
stripe.
For the in-place update, it can only possibly be safe from a crash if
some user-space program takes a backup of every e.g. few hundred
stripes before allowing them to be reshaped.  In that case, the
constant metadata update is pointless.
So only update the metadata if the new metadata will report that the
end of the 'old-layout' data is beyond where we are currently
writing 'new-layout' data.
Signed-off-by: NeilBrown <neilb@suse.de>

c8f517c4

md/raid5: minor code cleanups in make_request. · b0f9ec04

Committed by NeilBrown on Mar 31, 2009

... and to be certain that make_request doesn't wait forever,
add a 'wake_up' when ->reshape_progress has been set to MaxSector
Signed-off-by: NeilBrown <neilb@suse.de>

b0f9ec04

md: remove CONFIG_MD_RAID_RESHAPE config option. · 2cffc4a0

Committed by NeilBrown on Mar 31, 2009

This was only needed when the code was experimental.  Most of it
is well tested now, so the option is no longer useful.
Signed-off-by: NeilBrown <neilb@suse.de>

2cffc4a0

md/raid5: be more careful about write ordering when reshaping. · ab69ae12

Committed by NeilBrown on Mar 31, 2009

When we are reshaping an array, it is very important that we read
the data from a particular sector offset before writing new data
at that offset.

In most cases when growing or shrinking an array we read long before
we even consider writing.  But when restriping an array without
changing its size, there is a small possibility that we might have
some data available to write before the read has happened at the same
location.  This would require some stripes to be in cache already.

To guard against this small possibility, we check, before writing,
that the 'old' stripe at the same location is not in the process of
being read.  And we ensure that we mark all 'source' stripes as such
before allowing new 'destination' stripes to proceed.
Signed-off-by: NeilBrown <neilb@suse.de>

ab69ae12

md: don't display meaningless values in sysfs files resync_start and sync_speed · d1a7c503

Committed by NeilBrown on Mar 31, 2009

When no resync is happening, both of these files currently have
meaningless values (in slightly different ways).
Change them to "none" in that case.
Signed-off-by: NeilBrown <neilb@suse.de>

d1a7c503

md/raid5: allow layout and chunksize to be changed on active array. · 88ce4930

Committed by NeilBrown on Mar 31, 2009

If an array has 3 or more devices, we allow the chunksize or layout
to be changed and when a reshape starts, we use these as the 'new'
values.
Signed-off-by: NeilBrown <neilb@suse.de>

88ce4930

md/raid5: reshape using largest of old and new chunk size · 7a661381

Committed by NeilBrown on Mar 31, 2009

This ensures that even when old and new stripes are overlapping,
we will try to read all of the old before having to write any
of the new.
Signed-off-by: NeilBrown <neilb@suse.de>

7a661381

md/raid5: prepare for allowing reshape to change layout · e183eaed

Committed by NeilBrown on Mar 31, 2009

Add prev_algo to raid5_conf_t along the same lines as prev_chunk
and previous_raid_disks.
Signed-off-by: NeilBrown <neilb@suse.de>

e183eaed

md/raid5: prepare for allowing reshape to change chunksize. · 784052ec

Committed by NeilBrown on Mar 31, 2009

Add "prev_chunk" to raid5_conf_t, similar to "previous_raid_disks", to
remember what the chunk size was before the reshape that is currently
underway.

This seems like duplication with "chunk_size" and "new_chunk" in
mddev_t, and to some extent it is, but there are differences.
The values in mddev_t are always defined and often the same.
The prev* values are only defined if a reshape is underway.

Also (and more significantly) the raid5_conf_t values will be changed
at the same time (inside an appropriate lock) that the reshape is
started by setting reshape_position.  In contrast, the new_chunk value
is set when the sysfs file is written which could be well before the
reshape starts.
Signed-off-by: NeilBrown <neilb@suse.de>

784052ec

md/raid5: clearly differentiate 'before' and 'after' stripes during reshape. · 86b42c71

Committed by NeilBrown on Mar 31, 2009

During a raid5 reshape, we have some stripes in the cache that are
'before' the reshape (and are still to be processed) and some that are
'after'.  They are currently differentiated by having different
->disks values as the only reshape currently supported involves changing
the number of disks.

However we will soon support reshapes that do not change the number
of disks (chunk parity or chunk size).  So make the difference more
explicit with a 'generation' number.
Signed-off-by: NeilBrown <neilb@suse.de>

86b42c71

Documentation/md.txt update · 11373542

Committed by NeilBrown on Mar 31, 2009

Update md.txt to reflect recent changes in a number of sysfs
attributes.
Signed-off-by: NeilBrown <neilb@suse.de>

11373542

md: allow number of drives in raid5 to be reduced · ec32a2bd

Committed by NeilBrown on Mar 31, 2009

When reshaping a raid5 to have fewer devices, we work from the end of
the array to the beginning.
md_do_sync gives addresses to sync_request that go from the beginning
to the end.  So largely ignore them and use the internal state variable
"reshape_progress" to keep track of what to do next.

Never allow the size to be reduced below the minimum (4 for raid6,
3 otherwise).

We require that the size of the array has already been reduced before
the array is reshaped to a smaller size.  This is because simply
reducing the size is an easily reversible operation, while the reshape
is immediately destructive and so is not reversible for the blocks at
the ends of the devices.
Thus to reshape an array to have fewer devices, you must first write
an appropriately small size to md/array_size.

When the reshape finishes, we remove any drives that are no longer
needed and fix up ->degraded.
Signed-off-by: NeilBrown <neilb@suse.de>

ec32a2bd
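
The two preconditions in the ec32a2bd message (a minimum device count, and an array size that has already been reduced to fit) can be expressed as a simple check. The function names below are hypothetical, and the capacity formula (per-device sectors times the number of data devices, with two parity devices for raid6 and one otherwise) is a simplification that ignores chunk alignment.

```python
def min_raid_disks(level: int) -> int:
    """Minimum device count named in the commit message."""
    return 4 if level == 6 else 3


def parity_disks(level: int) -> int:
    """Assumed parity overhead: two devices for raid6, one otherwise."""
    return 2 if level == 6 else 1


def default_sectors(dev_sectors: int, raid_disks: int, level: int) -> int:
    """Simplified default capacity of an array with raid_disks devices."""
    return dev_sectors * (raid_disks - parity_disks(level))


def may_reduce(level: int, new_disks: int, dev_sectors: int, array_sectors: int) -> bool:
    """True if a reshape down to new_disks devices would be accepted in this model."""
    if new_disks < min_raid_disks(level):
        return False
    # The exported size must already fit in the smaller layout, because
    # the reshape destroys the blocks at the ends of the devices.
    return array_sectors <= default_sectors(dev_sectors, new_disks, level)


# A 4-device raid5 with 1000 sectors per device exports 3000 sectors.
still_full_size = may_reduce(5, 3, 1000, 3000)
already_shrunk = may_reduce(5, 3, 1000, 2000)
```

Here `still_full_size` is False and `already_shrunk` is True: the administrator first writes the smaller size to md/array_size, which is easily undone, and only then starts the destructive reshape.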

md/raid5: change reshape-progress measurement to cope with reshaping backwards. · fef9c61f

Committed by NeilBrown on Mar 31, 2009

When reducing the number of devices in a raid4/5/6, the reshape
process has to start at the end of the array and work down to the
beginning.  So we need to handle expand_progress and expand_lo
differently.

This patch renames "expand_progress" and "expand_lo" to avoid the
implication that anything is getting bigger (expand->reshape) and
every place they are used, we make sure that they are used the right
way depending on whether delta_disks is positive or negative.
Signed-off-by: NeilBrown <neilb@suse.de>

fef9c61f

md: add explicit method to signal the end of a reshape. · cea9c228

Committed by NeilBrown on Mar 31, 2009

Currently raid5 (the only module that supports restriping)
notices that the reshape has finished by sync_request being
given a large value, and handles any cleanup then.

This patch changes it so md_check_recovery calls into an
explicit finish_reshape method as well.

The clean-up from sync_request can do things that need to be
done promptly, typically things local to the raid5_conf_t
structure.

The "finish_reshape" method is called under the mddev_lock
so it can do things involving reconfiguring the device.

This allows us to get rid of md_set_array_sectors_locked, which
would have caused a deadlock if you tried to stop an array
while a reshape was happening.
Signed-off-by: NeilBrown <neilb@suse.de>

cea9c228

md/raid5: enhance raid5_size to work correctly with negative delta_disks · 7ec05478

Committed by NeilBrown on Mar 31, 2009

This is the first of four patches which combine to allow md/raid5 to
reduce the number of devices in the array by restriping the data over
a subset of the devices.

If the number of disks in a raid4/5/6 is being reduced, then the
default size must be based on the new number, not the old number
of devices.
In general, it should be based on the smaller of new and old.
Signed-off-by: NeilBrown <neilb@suse.de>

7ec05478

md/raid5: drop qd_idx from r6_state · 34e04e87

Committed by NeilBrown on Mar 31, 2009

We now have this value in stripe_head so we don't need to duplicate
it.
Signed-off-by: NeilBrown <neilb@suse.de>

34e04e87

md/raid6: move raid6 data processing to raid6_pq.ko · f701d589

Committed by Dan Williams on Mar 31, 2009

Move the raid6 data processing routines into a standalone module
(raid6_pq) to prepare them to be called from async_tx wrappers and other
non-md drivers/modules.  This precludes a circular dependency of raid456
needing the async modules for data processing while those modules in
turn depend on raid456 for the base level synchronous raid6 routines.

To support this move:
1/ The exportable definitions in raid6.h move to include/linux/raid/pq.h
2/ The raid6_call, recovery calls, and table symbols are exported
3/ Extra #ifdef __KERNEL__ statements to enable the userspace raid6test to
   compile
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

f701d589

md: raid5 run(): Fix max_degraded for raid level 4. · 18b00334

Committed by Andre Noll on Mar 31, 2009

raid4 allows only one failed disk.
Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: NeilBrown <neilb@suse.de>

18b00334

md: 'array_size' sysfs attribute · b522adcd

Committed by Dan Williams on Mar 31, 2009

Allow userspace to set the size of the array according to the following
semantics:

1/ size must be <= to the size returned by mddev->pers->size(mddev, 0, 0)
   a) If size is set before the array is running, do_md_run will fail
      if size is greater than the default size
   b) A reshape attempt that reduces the default size to less than the set
      array size should be blocked
2/ once userspace sets the size the kernel will not change it
3/ writing 'default' to this attribute returns control of the size to the
   kernel and reverts to the size reported by the personality

Also, convert locations that need to know the default size from directly
reading ->array_sectors to <pers>_size.  Resync/reshape operations
always follow the default size.

Finally, fixup other locations that read a number of 1k-blocks from
userspace to use strict_blocks_to_sectors() which checks for unsigned
long long to sector_t overflow and blocks to sectors overflow.
Reviewed-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

b522adcd
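
The array_size semantics listed in the b522adcd message can be modelled as a small state object. Everything below is a hypothetical sketch: `ArraySize`, `store()` and `set_default()` are illustrative names, and `blocks_to_sectors()` only imitates the kind of overflow checking the message attributes to strict_blocks_to_sectors(), assuming 1 KiB blocks, 512-byte sectors, and a sector_t width passed in as a parameter.

```python
U64_MAX = (1 << 64) - 1


def blocks_to_sectors(text: str, sector_bits: int = 64) -> int:
    """Convert a count of 1k-blocks to 512-byte sectors with overflow checks."""
    blocks = int(text)
    if blocks < 0 or blocks > U64_MAX:
        raise OverflowError("value does not fit in unsigned long long")
    sectors = blocks * 2
    if sectors >= 1 << sector_bits:
        raise OverflowError("value does not fit in sector_t")
    return sectors


class ArraySize:
    """Toy model of the size the array exports versus the personality default."""

    def __init__(self, default_sectors: int):
        self.default_sectors = default_sectors
        self.external = False  # True once userspace has set the size
        self.sectors = default_sectors

    def store(self, text: str) -> None:
        """Model of a write to the sysfs attribute."""
        if text.strip() == "default":  # rule 3: hand control back
            self.external = False
            self.sectors = self.default_sectors
            return
        sectors = blocks_to_sectors(text)
        if sectors > self.default_sectors:  # rule 1
            raise ValueError("larger than the personality's default size")
        self.external = True
        self.sectors = sectors

    def set_default(self, new_default: int) -> None:
        """Model of a reshape changing the default size."""
        if self.external and new_default < self.sectors:  # rule 1b
            raise ValueError("reshape would shrink below the user-set size")
        self.default_sectors = new_default
        if not self.external:  # rule 2: never override a user-set size
            self.sectors = new_default


size = ArraySize(default_sectors=8000)
size.store("3000")  # 3000 1k-blocks, i.e. 6000 sectors
```

Once userspace has written a value, growing the default leaves `size.sectors` unchanged, and only writing `default` returns control to the personality.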

md: centralize ->array_sectors modifications · 1f403624

Committed by Dan Williams on Mar 31, 2009

Get personalities out of the business of directly modifying
->array_sectors.  Lays groundwork to introduce policy on when
->array_sectors can be modified.
Reviewed-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

1f403624

md: add 'size' as a personality method · 80c3a6ce

Committed by Dan Williams on Mar 17, 2009

In preparation for giving userspace control over ->array_sectors we need
to be able to retrieve the 'default' size, and the 'anticipated' size
when a reshape is requested.  For personalities that do not reshape emit
a warning if anything but the default size is requested.

In the raid5 case we need to update ->previous_raid_disks to make the
new 'default' size available.
Reviewed-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

80c3a6ce

md: fix typo in FSF address · 93ed05e2

Committed by Atsushi SAKAI on Mar 31, 2009

Hello,

 I found a typo Bosto"m" in FSF address.
And I am checking around linux source code.
Here is the only place which uses Bosto"m" (not Boston).
Signed-off-by: Atsushi SAKAI <sakaia@jp.fujitsu.com>
Signed-off-by: NeilBrown <neilb@suse.de>

93ed05e2

md: add takeover support for converting raid6 back into raid5 · fc9739c6

Committed by NeilBrown on Mar 31, 2009

If a raid6 is still in the layout that comes from converting raid5
into a raid6, this will allow us to convert it back again.
Signed-off-by: NeilBrown <neilb@suse.de>

fc9739c6

md: add takeover support for raid4 -> raid5 conversion. · e9d4758f

Committed by NeilBrown on Mar 31, 2009

Signed-off-by: NeilBrown <neilb@suse.de>

e9d4758f

openanolis / cloud-kernel synced successfully more than 1 year ago