1. 31 March 2009, 20 commits
  2. 09 January 2009, 1 commit
    • md: use list_for_each_entry macro directly · 159ec1fc
      Committed by Cheng Renquan
      The rdev_for_each macro defined in <linux/raid/md_k.h> is identical to
      list_for_each_entry_safe from <linux/list.h>; it should be defined in
      terms of list_for_each_entry_safe instead of reinventing the wheel.
      
      But some rdev_for_each call sites don't really need the safe version;
      a plain list_for_each_entry is enough, and it saves a temporary
      variable (tmp) in every function that used rdev_for_each.
      
      In this patch, most rdev_for_each loops are replaced by
      list_for_each_entry, eliminating many of those temporaries; the safe
      version is kept only where the loop calls list_del to remove an entry.
      Signed-off-by: Cheng Renquan <crquan@gmail.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
      159ec1fc
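
      As a hedged illustration of the two loop flavors (the list head and
      member names below match the md code of that era, but the macro names
      and shapes are a sketch, not the exact tree contents):

        /* Plain iteration: one cursor; the body must not delete entries. */
        #define rdev_for_each(rdev, mddev) \
                list_for_each_entry(rdev, &(mddev)->disks, same_set)

        /* Safe iteration: 'tmp' caches the next node before the body runs,
         * so the body may call list_del() on the current entry. */
        #define rdev_for_each_safe(rdev, tmp, mddev) \
                list_for_each_entry_safe(rdev, tmp, &(mddev)->disks, same_set)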
  3. 13 October 2008, 3 commits
  4. 09 October 2008, 4 commits
    • block: move stats from disk to part0 · 074a7aca
      Committed by Tejun Heo
      Move stats related fields - stamp, in_flight, dkstats - from disk to
      part0 and unify stat handling such that...
      
      * part_stat_*() now updates part0 together if the specified partition
        is not part0.  ie. part_stat_*() are now essentially all_stat_*().
      
      * {disk|all}_stat_*() are gone.
      
      * part_round_stats() is updated similarly.  It handles part0 stats
        automatically and disk_round_stats() is killed.
      
      * part_{inc|dec}_in_flight() is implemented and automatically updates
        part0 stats for parts other than part0.
      
      * disk_map_sector_rcu() is updated to return part0 if no part matches.
        Combined with the above changes, this makes NULL special case
        handling in callers unnecessary.
      
      * Separate stats show code paths for disk are collapsed into part
        stats show code paths.
      
      * disk_stat_lock/unlock() are renamed to part_stat_lock/unlock().
      
      While at it, reposition stat handling macros a bit and add missing
      parentheses around macro parameters.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      074a7aca
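
      A sketch of the unified update described above: bumping a counter on
      any partition other than part0 also bumps part0, so the whole-disk
      totals stay consistent.  Treat the macro shape as illustrative;
      __part_stat_add() stands in for the raw per-cpu increment:

        #define part_stat_add(cpu, part, field, addnd)  do {            \
                __part_stat_add((cpu), (part), field, addnd);           \
                if ((part)->partno)     /* part0 has partno == 0 */     \
                        __part_stat_add((cpu),                          \
                                        &part_to_disk(part)->part0,     \
                                        field, addnd);                  \
        } while (0)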
    • block: fix diskstats access · c9959059
      Committed by Tejun Heo
      There are two variants of stat functions - ones prefixed with double
      underbars, which don't care about preemption, and ones without, which
      disable preemption before manipulating per-cpu counters.  It's unclear
      whether the underbarred ones assume that preemption is disabled on
      entry, as some callers don't do that.
      
      This patch unifies diskstats access by implementing disk_stat_lock()
      and disk_stat_unlock() which take care of both RCU (for partition
      access) and preemption (for per-cpu counter access).  diskstats access
      should always be enclosed between the two functions.  As such, there's
      no need for the versions which disable preemption.  They're removed,
      and the double-underbarred ones are renamed to drop the underbars.  As
      an extra argument is added, there's no danger of using the old versions
      unconverted.
      
      disk_stat_lock() uses get_cpu() and returns the cpu index, and all
      diskstat functions which access per-cpu counters now take a @cpu
      argument to help RT.
      
      This change adds RCU or preemption operations at some places but also
      collapses several preemption ops into one at others.  Overall, the
      performance difference should be negligible as all involved ops are
      very lightweight per-cpu ones.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      c9959059
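
      The resulting usage pattern, as a hedged sketch (the field name
      ios[rw] is illustrative of the per-cpu counters involved):

        int cpu = disk_stat_lock();   /* get_cpu() + RCU read lock;
                                       * returns the cpu index */
        disk_stat_inc(cpu, disk, ios[rw]);  /* per-cpu bump, @cpu explicit */
        disk_stat_unlock();           /* put_cpu() + RCU read unlock */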
    • block: make bi_phys_segments an unsigned int instead of short · 5b99c2ff
      Committed by Jens Axboe
      raid5 can overflow with more than 255 stripes, and we can increase it
      to an int for free on both 32 and 64-bit archs due to the padding.
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      5b99c2ff
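
      The change itself is the one-line widening shown below; per the commit,
      existing padding on both 32- and 64-bit archs makes the wider field
      free (surrounding struct bio fields omitted):

        /* before */ unsigned short bi_phys_segments;
        /* after  */ unsigned int   bi_phys_segments;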
    • block: raid fixups for removal of bi_hw_segments · 960e739d
      Committed by Jens Axboe
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      960e739d
  5. 05 August 2008, 2 commits
    • Don't let a blocked_rdev interfere with read request in raid5/6 · ac4090d2
      Committed by NeilBrown
      When we have externally managed metadata, we need to mark a failed
      device as 'Blocked' and not allow any writes until that device has
      been marked as faulty in the metadata and the Blocked flag has been
      removed.
      
      However it is perfectly OK to allow read requests when there is a
      Blocked device, and with a readonly array, there may not be any
      metadata-handler watching for blocked devices.
      
      So in raid5/raid6, only allow a Blocked device to interfere with
      write requests or resync.  Read requests go through untouched.
      
      raid1 and raid10 already differentiate between read and write
      properly.
      Signed-off-by: NeilBrown <neilb@suse.de>
      ac4090d2
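
      A hedged sketch of the rule in the raid5/6 request path; the Blocked
      flag is real, but the surrounding variable names and structure are
      illustrative, not the exact patch:

        /* Only writes and resync must wait for a Blocked device. */
        if (rdev && test_bit(Blocked, &rdev->flags) &&
            (is_write || resyncing)) {
                blocked_rdev = rdev;   /* defer until Blocked is cleared */
                break;
        }
        /* read requests fall through untouched */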
    • Fail safely when trying to grow an array with a write-intent bitmap. · dba034ee
      Committed by NeilBrown
      We cannot currently change the size of a write-intent bitmap.
      So if we change the size of an array which has such a bitmap, it
      tries to set bits beyond the end of the bitmap.
      
      For now, simply reject any request to change the size of an array
      which has a bitmap.  mdadm can remove the bitmap and add a new one
      after the array has changed size.
      Signed-off-by: NeilBrown <neilb@suse.de>
      dba034ee
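
      The guard is a simple early rejection in the resize path; a minimal
      sketch, where the function shape and exact errno are illustrative
      (mddev->bitmap is the real field being tested):

        static int raid5_resize_sketch(mddev_t *mddev, sector_t sectors)
        {
                if (mddev->bitmap)
                        return -EBUSY;  /* bitmap can't grow; refuse resize */
                /* ... otherwise proceed with the size change ... */
                return 0;
        }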
  6. 29 July 2008, 1 commit
  7. 24 July 2008, 2 commits
    • md: fix merge error · 23397883
      Committed by Dan Williams
      The original STRIPE_OP_IO removal patch had the following hunk:
      
      -               for (i = conf->raid_disks; i--; ) {
      +               for (i = conf->raid_disks; i--; )
                              set_bit(R5_Wantwrite, &sh->dev[i].flags);
      -                       if (!test_and_set_bit(STRIPE_OP_IO, &sh->ops.pending))
      -                               sh->ops.count++;
      -               }
      
      However it appears the hunk became broken after merging:
      -               for (i = conf->raid_disks; i--; ) {
      +               for (i = conf->raid_disks; i--; )
                              set_bit(R5_Wantwrite, &sh->dev[i].flags);
                              set_bit(R5_LOCKED, &dev->flags);
                              s.locked++;
      -                       if (!test_and_set_bit(STRIPE_OP_IO, &sh->ops.pending))
      -                               sh->ops.count++;
      -               }
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      23397883
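
      Read as plain C, the mis-merged version leaves only the first
      set_bit() inside the loop, so the R5_LOCKED update and s.locked++ run
      once, outside it.  A hedged sketch of the intended result with the
      braces restored (exact statements taken from the hunks above):

        for (i = conf->raid_disks; i--; ) {
                set_bit(R5_Wantwrite, &sh->dev[i].flags);
                set_bit(R5_LOCKED, &sh->dev[i].flags);
                s.locked++;
        }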
    • md: move async_tx_issue_pending_all outside spin_lock_irq · c9f21aaf
      Committed by Dan Williams
      Some dma drivers need to call spin_lock_bh in their device_issue_pending
      routines.  This change avoids:
      
      WARNING: at kernel/softirq.c:136 local_bh_enable_ip+0x3a/0x85()
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      c9f21aaf
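
      A hedged sketch of the move (the lock name follows raid5's
      conf->device_lock; the placement is the point, not the exact code):

        spin_lock_irq(&conf->device_lock);
        /* ... stripe state handling under the lock ... */
        spin_unlock_irq(&conf->device_lock);
        async_tx_issue_pending_all();   /* drivers may spin_lock_bh() here,
                                         * so call with irqs enabled */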
  8. 21 July 2008, 1 commit
  9. 10 July 2008, 1 commit
  10. 03 July 2008, 1 commit
    • Add bvec_merge_data to handle stacked devices and ->merge_bvec() · cc371e66
      Committed by Alasdair G Kergon
      When devices are stacked, one device's merge_bvec_fn may need to perform
      the mapping and then call one or more functions for its underlying devices.
      
      The following bio fields are used:
        bio->bi_sector
        bio->bi_bdev
        bio->bi_size
        bio->bi_rw  using bio_data_dir()
      
      This patch creates a new struct bvec_merge_data holding a copy of those
      fields to avoid having to change them directly in the struct bio when
      going down the stack only to have to change them back again on the way
      back up.  (And then when the bio gets mapped for real, the whole
      exercise gets repeated, but that's a problem for another day...)
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Milan Broz <mbroz@redhat.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      cc371e66
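
      The new structure simply carries copies of those four fields; this
      sketch mirrors the list above, with types assumed to follow the
      corresponding struct bio members:

        struct bvec_merge_data {
                struct block_device     *bi_bdev;
                sector_t                bi_sector;
                unsigned int            bi_size;
                unsigned long           bi_rw;  /* direction, bio_data_dir() */
        };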
  11. 01 July 2008, 1 commit
    • md: resolve external metadata handling deadlock in md_allow_write · b5470dc5
      Committed by Dan Williams
      md_allow_write() marks the metadata dirty while holding mddev->lock and then
      waits for the write to complete.  For externally managed metadata this causes a
      deadlock as userspace needs to take the lock to communicate that the metadata
      update has completed.
      
      Change md_allow_write() in the 'external' case to start the 'mark active'
      operation and then return -EAGAIN.  The expected side effects while
      waiting for userspace to write 'active' to 'array_state' are: reshape is
      held off (the code currently handles -ENOMEM), some 'stripe_cache_size'
      change requests fail, some GET_BITMAP_FILE ioctl requests fall back to
      GFP_NOIO, and updates to 'raid_disks' fail.  Except for the
      'stripe_cache_size' changes, these failures can be mitigated by
      coordinating with mdmon.
      
      md_write_start() still prevents writes from occurring until the metadata
      handler has had a chance to take action as it unconditionally waits for
      MD_CHANGE_CLEAN to be cleared.
      
      [neilb@suse.de: return -EAGAIN, try GFP_NOIO]
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      b5470dc5
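
      A hedged sketch of the new control flow; the helper name below is
      hypothetical and the waiting details are illustrative:

        static int md_allow_write_sketch(mddev_t *mddev)
        {
                /* ... mark the array active / metadata dirty ... */
                if (mddev->external)
                        return -EAGAIN; /* userspace (mdmon) must complete
                                         * the update; don't block on it */
                md_wait_for_sb_write(mddev);    /* hypothetical: internal
                                                 * metadata can be waited on */
                return 0;
        }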
  12. 28 June 2008, 3 commits
    • md: rationalize raid5 function names · 1fe797e6
      Committed by Dan Williams
      From: Dan Williams <dan.j.williams@intel.com>
      
      Commit a4456856 refactored some of the deep code paths in raid5.c into separate
      functions.  The names chosen at the time do not consistently indicate what is
      going to happen to the stripe.  So, update the names, and since a stripe
      is a cache element, use cache semantics like fill, dirty, and clean.
      
      (also, fix up the indentation in fetch_block5)
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Neil Brown <neilb@suse.de>
      1fe797e6
    • md: handle operation chaining in raid5_run_ops · 7b3a871e
      Committed by Dan Williams
      From: Dan Williams <dan.j.williams@intel.com>
      
      Neil said:
      > At the end of ops_run_compute5 you have:
      >         /* ack now if postxor is not set to be run */
      >         if (tx && !test_bit(STRIPE_OP_POSTXOR, &s->ops_run))
      >                 async_tx_ack(tx);
      >
      > It looks odd having that test there.  Would it fit in raid5_run_ops
      > better?
      
      The intended global interpretation is that raid5_run_ops can build a chain
      of xor and memcpy operations.  When MD registers the compute-xor it tells
      async_tx to keep the operation handle around so that another item in the
      dependency chain can be submitted. If we are just computing a block to
      satisfy a read then we can terminate the chain immediately.  raid5_run_ops
      gives a better context for this test since it cares about the entire chain.
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Neil Brown <neilb@suse.de>
      7b3a871e
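
      Relocated to raid5_run_ops, the test reads roughly as below (a sketch;
      surrounding code and exact signatures omitted):

        if (test_bit(STRIPE_OP_COMPUTE_BLK, &ops_request)) {
                tx = ops_run_compute5(sh);
                /* end the chain now if no postxor will consume it,
                 * i.e. the compute only satisfies a read */
                if (tx && !test_bit(STRIPE_OP_POSTXOR, &ops_request))
                        async_tx_ack(tx);
        }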
    • md: replace R5_WantPrexor with R5_Wantdrain, add 'prexor' reconstruct_states · d8ee0728
      Committed by Dan Williams
      From: Dan Williams <dan.j.williams@intel.com>
      
      Currently ops_run_biodrain and other locations have extra logic to determine
      which blocks are processed in the prexor and non-prexor cases.  This can be
      eliminated if handle_write_operations5 flags the blocks to be processed in all
      cases via R5_Wantdrain.  The presence of the prexor operation is tracked in
      sh->reconstruct_state.
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Neil Brown <neilb@suse.de>
      d8ee0728
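
      A hedged sketch of the scheme: the write path marks every block to be
      drained with R5_Wantdrain, and whether a prexor precedes the drain is
      recorded once in sh->reconstruct_state.  The enum values follow the
      commit's 'prexor' reconstruct_states naming; loop details are
      illustrative:

        for (i = disks; i--; ) {
                struct r5dev *dev = &sh->dev[i];
                if (dev->towrite)
                        set_bit(R5_Wantdrain, &dev->flags); /* uniform mark */
        }
        /* record once whether a prexor runs before the drain */
        sh->reconstruct_state = prexor ? reconstruct_state_prexor_drain_run
                                       : reconstruct_state_drain_run;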