提交 · ccacc7d2cf03114a24ab903f710118e9e5d43273 · openanolis / cloud-kernel

09 1月, 2009 5 次提交

md: raid0: make hash_spacing and preshift sector-based. · ccacc7d2

由 Andre Noll 提交于 1月 09, 2009

This patch renames the hash_spacing and preshift members of struct
raid0_private_data to spacing and sector_shift respectively and
changes the semantics as follows:

We always have spacing = 2 * hash_spacing. In case
sizeof(sector_t) > sizeof(u32) we also have sector_shift = preshift + 1
while sector_shift = preshift = 0 otherwise.

Note that the values of nb_zone and zone are unaffected by these changes
because in the sector_div() preceeding the assignement of these two
variables both arguments double.
Signed-off-by: NAndre Noll <maan@systemlinux.org>
Signed-off-by: NNeilBrown <neilb@suse.de>

ccacc7d2

md: raid0: Represent the size of strip zones in sectors. · 83838ed8

由 Andre Noll 提交于 1月 09, 2009

This completes the block -> sector conversion of struct strip_zone.
Signed-off-by: NAndre Noll <maan@systemlinux.org>
Signed-off-by: NNeilBrown <neilb@suse.de>

83838ed8

md: raid0: Represent zone->zone_offset in sectors. · 6199d3db

由 Andre Noll 提交于 1月 09, 2009

For the same reason as in the previous patch, rename it from zone_offset
to zone_start.
Signed-off-by: NAndre Noll <maan@systemlinux.org>
Signed-off-by: NNeilBrown <neilb@suse.de>

6199d3db

md: raid0: Represent device offset in sectors. · 019c4e2f

由 Andre Noll 提交于 1月 09, 2009

Rename zone->dev_offset to zone->dev_start to make sure all users
have been converted.
Signed-off-by: NAndre Noll <maan@systemlinux.org>
Signed-off-by: NNeilBrown <neilb@suse.de>

019c4e2f

md: use sysfs_notify_dirent to notify changes to md/sync_action. · 0c3573f1

由 NeilBrown 提交于 1月 09, 2009

There is no compelling need for this, but sysfs_notify_dirent is a
nicer interface and the change is good for consistency.
Signed-off-by: NNeilBrown <neilb@suse.de>

0c3573f1

21 10月, 2008 2 次提交

md: use sysfs_notify_dirent to notify changes to md/dev-xxx/state · 3c0ee63a

由 NeilBrown 提交于 10月 21, 2008

The 'state' file for a device reports, for example, when the device
has failed.  Changes should be reported to userspace ASAP without
the possibility of blocking on low-memory.  sysfs_notify does
have that possibility (as it takes a mutex which can be held
across a kmalloc) so use sysfs_notify_dirent instead.
Signed-off-by: NNeilBrown <neilb@suse.de>

3c0ee63a

md: use sysfs_notify_dirent to notify changes to md/array_state · b62b7590

由 NeilBrown 提交于 10月 21, 2008

Now that we have sysfs_notify_dirent, use it to notify changes
to md/array_state.
As sysfs_notify_dirent can be called in atomic context, we can
remove the delayed notify and the MD_NOTIFY_ARRAY_STATE flag.
Signed-off-by: NNeilBrown <neilb@suse.de>

b62b7590

13 10月, 2008 4 次提交

md: remove space after function name in declaration and call. · d710e138

由 NeilBrown 提交于 10月 13, 2008

Having
   function (args)
instead of
   function(args)

make is harder to search for calls of particular functions.
So remove all those spaces.
Signed-off-by: NNeilBrown <neilb@suse.de>

d710e138

N
md: Remove unnecessary #includes, #defines, and function declarations. · fb4d8c76
由 NeilBrown 提交于 10月 13, 2008
```
A lot of cruft has gathered over the years.  Time to remove it.
Signed-off-by: NNeilBrown <neilb@suse.de>
```
fb4d8c76

md: Convert remaining 1k representations in linear.c to sectors. · ab5bd5cb

由 Andre Noll 提交于 10月 13, 2008

This patch renames hash_spacing and preshift to  spacing and
sector_shift respectively with the following change of semantics:

Case 1: (sizeof(sector_t) <= sizeof(u32)).
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In this case, we have sector_shift = preshift = 0 and spacing =
2 * hash_spacing.

Hence, the index for the hash table which is computed by the new code
in which_dev() as sector / spacing equals the old value which was
(sector/2) / hash_spacing.

Note also that the value of nb_zone stays the same because both sz
and base double.

Case 2: (sizeof(sector_t) > sizeof(u32)).
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

(aka the shifting dance case). Here we have sector_shift = preshift +
1 and

spacing = 2 * hash_spacing

during the computation of nb_zone and curr_sector, but

spacing = hash_spacing

in which_dev() because in the last hunk of the patch for linear.c we
shift down conf->spacing (= 2 * hash_spacing) by one more bit than
in the old code.

Hence in the computation of nb_zone, sz and base have the same value
as before, so nb_zone is not affected. Also curr_sector in the next
hunk stays the same.

In which_dev() the hash table index is computed as

(sector >> sector_shift) / spacing

In view of sector_shift = preshift + 1 and spacing = hash_spacing,
this equals

((sector/2) >> preshift) / hash_spacing

which is the value computed by the old code.
Signed-off-by: NAndre Noll <maan@systemlinux.org>
Signed-off-by: NNeilBrown <neilb@suse.de>

ab5bd5cb

md: linear: Represent dev_info->size and dev_info->offset in sectors. · 6283815d

由 Andre Noll 提交于 10月 13, 2008

Rename them to num_sectors and start_sector which is more descriptive.
Signed-off-by: NAndre Noll <maan@systemlinux.org>
Signed-off-by: NNeilBrown <neilb@suse.de>

6283815d

24 7月, 2008 1 次提交

md: delay notification of 'active_idle' to the recovery thread · d8e64406

由 Dan Williams 提交于 7月 23, 2008

sysfs_notify might sleep, so do not call it from md_safemode_timeout.
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

d8e64406

21 7月, 2008 4 次提交

md: Protect access to mddev->disks list using RCU · 4b80991c

由 NeilBrown 提交于 7月 21, 2008

All modifications and most access to the mddev->disks list are made
under the reconfig_mutex lock.  However there are three places where
the list is walked without any locking.  If a reconfig happens at this
time, havoc (and oops) can ensue.

So use RCU to protect these accesses:
  - wrap them in rcu_read_{,un}lock()
  - use list_for_each_entry_rcu
  - add to the list with list_add_rcu
  - delete from the list with list_del_rcu
  - delay the 'free' with call_rcu rather than schedule_work

Note that export_rdev did a list_del_init on this list.  In almost all
cases the entry was not in the list anymore so it was a no-op and so
safe.  It is no longer safe as after list_del_rcu we may not touch
the list_head.
An audit shows that export_rdev is called:
  - after unbind_rdev_from_array, in which case the delete has
     already been done,
  - after bind_rdev_to_array fails, in which case the delete isn't needed.
  - before the device has been put on a list at all (e.g. in
      add_new_disk where reading the superblock fails).
  - and in autorun devices after a failure when the device is on a
      different list.

So remove the list_del_init call from export_rdev, and add it back
immediately before the called to export_rdev for that last case.

Note also that ->same_set is sometimes used for lists other than
mddev->list (e.g. candidates).  In these cases rcu is not needed.
Signed-off-by: NNeilBrown <neilb@suse.de>

4b80991c

md: only count actual openers as access which prevent a 'stop' · f2ea68cf

由 NeilBrown 提交于 7月 21, 2008

Open isn't the only thing that increments ->active.  e.g. reading
/proc/mdstat will increment it briefly.  So to avoid false positives
in testing for concurrent access, introduce a new counter that counts
just the number of times the md device it open.
Signed-off-by: NNeilBrown <neilb@suse.de>

f2ea68cf

A
md: linear: Make array_size sector-based and rename it to array_sectors. · d6e22150
由 Andre Noll 提交于 7月 21, 2008
```
Signed-off-by: NAndre Noll <maan@systemlinux.org>
Signed-off-by: NNeilBrown <neilb@suse.de>
```
d6e22150

md: Make mddev->array_size sector-based. · f233ea5c

由 Andre Noll 提交于 7月 21, 2008

This patch renames the array_size field of struct mddev_s to array_sectors
and converts all instances to use units of 512 byte sectors instead of 1k
blocks.
Signed-off-by: NAndre Noll <maan@systemlinux.org>
Signed-off-by: NNeilBrown <neilb@suse.de>

f233ea5c

11 7月, 2008 2 次提交

md: Remove some unused macros. · 7e93a892

由 Andre Noll 提交于 7月 11, 2008

Signed-off-by: NAndre Noll <maan@systemlinux.org>
Signed-off-by: NNeil Brown <neilb@suse.de>

7e93a892

md: Turn rdev->sb_offset into a sector-based quantity. · 0f420358

由 Andre Noll 提交于 7月 11, 2008

Rename it to sb_start to make sure all users have been converted.
Signed-off-by: NAndre Noll <maan@systemlinux.org>
Signed-off-by: NNeil Brown <neilb@suse.de>

0f420358

01 7月, 2008 1 次提交

md: resolve external metadata handling deadlock in md_allow_write · b5470dc5

由 Dan Williams 提交于 6月 27, 2008

md_allow_write() marks the metadata dirty while holding mddev->lock and then
waits for the write to complete.  For externally managed metadata this causes a
deadlock as userspace needs to take the lock to communicate that the metadata
update has completed.

Change md_allow_write() in the 'external' case to start the 'mark active'
operation and then return -EAGAIN.  The expected side effects while waiting for
userspace to write 'active' to 'array_state' are holding off reshape (code
currently handles -ENOMEM), cause some 'stripe_cache_size' change requests to
fail, cause some GET_BITMAP_FILE ioctl requests to fall back to GFP_NOIO, and
cause updates to 'raid_disks' to fail.  Except for 'stripe_cache_size' changes
these failures can be mitigated by coordinating with mdmon.

md_write_start() still prevents writes from occurring until the metadata
handler has had a chance to take action as it unconditionally waits for
MD_CHANGE_CLEAN to be cleared.

[neilb@suse.de: return -EAGAIN, try GFP_NOIO]
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

b5470dc5

28 6月, 2008 9 次提交

md: replace R5_WantPrexor with R5_WantDrain, add 'prexor' reconstruct_states · d8ee0728

由 Dan Williams 提交于 6月 28, 2008

From: Dan Williams <dan.j.williams@intel.com>

Currently ops_run_biodrain and other locations have extra logic to determine
which blocks are processed in the prexor and non-prexor cases.  This can be
eliminated if handle_write_operations5 flags the blocks to be processed in all
cases via R5_Wantdrain.  The presence of the prexor operation is tracked in
sh->reconstruct_state.
Signed-off-by: NDan Williams <dan.j.williams@intel.com>
Signed-off-by: NNeil Brown <neilb@suse.de>

d8ee0728

md: replace STRIPE_OP_{BIODRAIN,PREXOR,POSTXOR} with 'reconstruct_states' · 600aa109

由 Dan Williams 提交于 6月 28, 2008

From: Dan Williams <dan.j.williams@intel.com>

Track the state of reconstruct operations (recalculating the parity block
usually due to incoming writes, or as part of array expansion)  Reduces the
scope of the STRIPE_OP_{BIODRAIN,PREXOR,POSTXOR} flags to only tracking whether
a reconstruct operation has been requested via the ops_request field of struct
stripe_head_state.

This is the final step in the removal of ops.{pending,ack,complete,count}, i.e.
the STRIPE_OP_{BIODRAIN,PREXOR,POSTXOR} flags only request an operation and do
not track the state of the operation.
Signed-off-by: NDan Williams <dan.j.williams@intel.com>
Signed-off-by: NNeil Brown <neilb@suse.de>

600aa109

md: replace STRIPE_OP_CHECK with 'check_states' · ecc65c9b

由 Dan Williams 提交于 6月 28, 2008

From: Dan Williams <dan.j.williams@intel.com>

The STRIPE_OP_* flags record the state of stripe operations which are
performed outside the stripe lock. Their use in indicating which
operations need to be run is straightforward; however, interpolating what
the next state of the stripe should be based on a given combination of
these flags is not straightforward, and has led to bugs. An easier to read
implementation with minimal degrees of freedom is needed.

Towards this goal, this patch introduces explicit states to replace what was
previously interpolated from the STRIPE_OP_* flags. For now this only converts
the handle_parity_checks5 path, removing a user of the
ops.{pending,ack,complete,count} fields of struct stripe_operations.

This conversion also found a remaining issue with the current code. There is
a small window for a drive to fail between when we schedule a repair and when
the parity calculation for that repair completes. When this happens we will
writeback to 'failed_num' when we really want to write back to 'pd_idx'.
Signed-off-by: NDan Williams <dan.j.williams@intel.com>
Signed-off-by: NNeil Brown <neilb@suse.de>

ecc65c9b

md: kill STRIPE_OP_IO flag · 2b7497f0

由 Dan Williams 提交于 6月 28, 2008

From: Dan Williams <dan.j.williams@intel.com>

The R5_Want{Read,Write} flags already gate i/o.  So, this flag is
superfluous and we can unconditionally call ops_run_io().
Signed-off-by: NDan Williams <dan.j.williams@intel.com>
Signed-off-by: NNeil Brown <neilb@suse.de>

2b7497f0

md: kill STRIPE_OP_MOD_DMA in raid5 offload · b203886e

由 Dan Williams 提交于 6月 28, 2008

From: Dan Williams <dan.j.williams@intel.com>

This micro-optimization allowed the raid code to skip a re-read of the
parity block after checking parity.  It took advantage of the fact that
xor-offload-engines have their own internal result buffer and can check
parity without writing to memory.  Remove it for the following reasons:

1/ It is a layering violation for MD to need to manage the DMA and
   non-DMA paths within async_xor_zero_sum
2/ Bad precedent to toggle the 'ops' flags outside the lock
3/ Hard to realize a performance gain as reads will not need an updated
   parity block and writes will dirty it anyways.
Signed-off-by: NDan Williams <dan.j.williams@intel.com>
Signed-off-by: NNeil Brown <neilb@suse.de>

b203886e

Make sure all changes to md/dev-XX/state are notified · 52664732

由 Neil Brown 提交于 6月 28, 2008

The important state change happens during an interrupt
in md_error.  So just set a flag there and call sysfs_notify
later in process context.
Signed-off-by: NNeil Brown <neilb@suse.de>

52664732

Make sure all changes to md/sync_action are notified. · 72a23c21

由 Neil Brown 提交于 6月 28, 2008

When the 'resync' thread starts or stops, when we explicitly
set sync_action, or when we determine that there is definitely nothing
to do, we notify sync_action.

To stop "sync_action" from occasionally showing the wrong value,
we introduce a new flags - MD_RECOVERY_RECOVER - to say that a
recovery is probably needed or happening, and we make sure
that we set MD_RECOVERY_RUNNING before clearing MD_RECOVERY_NEEDED.
Signed-off-by: NNeil Brown <neilb@suse.de>

72a23c21

Allow setting start point for requested check/repair · 5e96ee65

由 Neil Brown 提交于 6月 28, 2008

This makes it possible to just resync a small part of an array.
e.g. if a drive reports that it has questionable sectors,
a 'repair' of just the region covering those sectors will
cause them to be read and, if there is an error, re-written
with correct data.
Signed-off-by: NNeil Brown <neilb@suse.de>

5e96ee65

Improve setting of "events_cleared" for write-intent bitmaps. · a0da84f3

由 Neil Brown 提交于 6月 28, 2008

When an array is degraded, bits in the write-intent bitmap are not
cleared, so that if the missing device is re-added, it can be synced
by only updated those parts of the device that have changed since
it was removed.

The enable this a 'events_cleared' value is stored. It is the event
counter for the array the last time that any bits were cleared.

Sometimes - if a device disappears from an array while it is 'clean' -
the events_cleared value gets updated incorrectly (there are subtle
ordering issues between updateing events in the main metadata and the
bitmap metadata) resulting in the missing device appearing to require
a full resync when it is re-added.

With this patch, we update events_cleared precisely when we are about
to clear a bit in the bitmap.  We record events_cleared when we clear
the bit internally, and copy that to the superblock which is written
out before the bit on storage.  This makes it more "obviously correct".

We also need to update events_cleared when the event_count is going
backwards (as happens on a dirty->clean transition of a non-degraded
array).

Thanks to Mike Snitzer for identifying this problem and testing early
"fixes".

Cc:  "Mike Snitzer" <snitzer@gmail.com>
Signed-off-by: NNeil Brown <neilb@suse.de>

a0da84f3

25 5月, 2008 4 次提交

md: restart recovery cleanly after device failure. · dfc70645

由 NeilBrown 提交于 5月 23, 2008

When we get any IO error during a recovery (rebuilding a spare), we abort
the recovery and restart it.

For RAID6 (and multi-drive RAID1) it may not be best to restart at the
beginning: when multiple failures can be tolerated, the recovery may be
able to continue and re-doing all that has already been done doesn't make
sense.

We already have the infrastructure to record where a recovery is up to
and restart from there, but it is not being used properly.
This is because:
  - We sometimes abort with MD_RECOVERY_ERR rather than just MD_RECOVERY_INTR,
    which causes the recovery not be be checkpointed.
  - We remove spares and then re-added them which loses important state
    information.

The distinction between MD_RECOVERY_ERR and MD_RECOVERY_INTR really isn't
needed.  If there is an error, the relevant drive will be marked as
Faulty, and that is enough to ensure correct handling of the error.  So we
first remove MD_RECOVERY_ERR, changing some of the uses of it to
MD_RECOVERY_INTR.

Then we cause the attempt to remove a non-faulty device from an array to
fail (unless recovery is impossible as the array is too degraded).  Then
when remove_and_add_spares attempts to remove the devices on which
recovery can continue, it will fail, they will remain in place, and
recovery will continue on them as desired.

Issue:  If we are halfway through rebuilding a spare and another drive
fails, and a new spare is immediately available,  do we want to:
 1/ complete the current rebuild, then go back and rebuild the new spare or
 2/ restart the rebuild from the start and rebuild both devices in
    parallel.

Both options can be argued for.  The code currently takes option 2 as
  a/ this requires least code change
  b/ this results in a minimally-degraded array in minimal time.

Cc: "Eivind Sarto" <ivan@kasenna.com>
Signed-off-by: NNeil Brown <neilb@suse.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

dfc70645

md: allow parallel resync of md-devices. · 90b08710

由 Bernd Schubert 提交于 5月 23, 2008

In some configurations, a raid6 resync can be limited by CPU speed
(Calculating P and Q and moving data) rather than by device speed.  In
these cases there is nothing to be gained byt serialising resync of arrays
that share a device, and doing the resync in parallel can provide benefit.
 So add a sysfs tunable to flag an array as being allowed to resync in
parallel with other arrays that use (a different part of) the same device.
Signed-off-by: NBernd Schubert <bs@q-leap.de>
Signed-off-by: NNeil Brown <neilb@suse.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

90b08710

md: kill file_path wrapper · 6bcfd601

由 Christoph Hellwig 提交于 5月 23, 2008

Kill the trivial and rather pointless file_path wrapper around d_path.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NNeil Brown <neilb@suse.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

6bcfd601

md: proper extern for mdp_major · 03de250a

由 Adrian Bunk 提交于 5月 23, 2008

This patch adds a proper extern for mdp_major in include/linux/raid/md.h
Signed-off-by: NAdrian Bunk <bunk@kernel.org>
Signed-off-by: NNeil Brown <neilb@suse.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

03de250a

30 4月, 2008 1 次提交

md: support blocking writes to an array on device failure · 6bfe0b49

由 Dan Williams 提交于 4月 30, 2008

Allows a userspace metadata handler to take action upon detecting a device
failure.

Based on an original patch by Neil Brown.

Changes:
-added blocked_wait waitqueue to rdev
-don't qualify Blocked with Faulty always let userspace block writes
-added md_wait_for_blocked_rdev to wait for the block device to be clear, if
 userspace misses the notification another one is sent every 5 seconds
-set MD_RECOVERY_NEEDED after clearing "blocked"
-kill DoBlock flag, just test mddev->external
Signed-off-by: NDan Williams <dan.j.williams@intel.com>
Signed-off-by: NNeil Brown <neilb@suse.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

6bfe0b49

28 4月, 2008 1 次提交

md: introduce get_priority_stripe() to improve raid456 write performance · 8b3e6cdc

由 Dan Williams 提交于 4月 28, 2008

Improve write performance by preventing the delayed_list from dumping all its
stripes onto the handle_list in one shot.  Delayed stripes are now further
delayed by being held on the 'hold_list'.  The 'hold_list' is bypassed when:

  * a STRIPE_IO_STARTED stripe is found at the head of 'handle_list'
  * 'handle_list' is empty and i/o is being done to satisfy full stripe-width
    write requests
  * 'bypass_count' is less than 'bypass_threshold'.  By default the threshold
    is 1, i.e. every other stripe handled is a preread stripe provided the
    top two conditions are false.

Benchmark data:
System: 2x Xeon 5150, 4x SATA, mem=1GB
Baseline: 2.6.24-rc7
Configuration: mdadm --create /dev/md0 /dev/sd[b-e] -n 4 -l 5 --assume-clean
Test1: dd if=/dev/zero of=/dev/md0 bs=1024k count=2048
  * patched:  +33% (stripe_cache_size = 256), +25% (stripe_cache_size = 512)

Test2: tiobench --size 2048 --numruns 5 --block 4096 --block 131072 (XFS)
  * patched: +13%
  * patched + preread_bypass_threshold = 0: +37%

Changes since v1:
* reduce bypass_threshold from (chunk_size / sectors_per_chunk) to (1) and
  make it configurable.  This defaults to fairness and modest performance
  gains out of the box.
Changes since v2:
* [neilb@suse.de]: kill STRIPE_PRIO_HI and preread_needed as they are not
  necessary, the important change was clearing STRIPE_DELAYED in
  add_stripe_bio and this has been moved out to make_request for the hang
  fix.
* [neilb@suse.de]: simplify get_priority_stripe
* [dan.j.williams@intel.com]: reset the bypass_count when ->hold_list is
  sampled empty (+11%)
* [dan.j.williams@intel.com]: decrement the bypass_count at the detection
  of stripes being naturally promoted off of hold_list +2%.  Note, resetting
  bypass_count instead of decrementing on these events yields +4% but that is
  probably too aggressive.
Changes since v3:
* cosmetic fixups
Tested-by: NJames W. Laferriere <babydr@baby-dragons.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>
Signed-off-by: NNeil Brown <neilb@suse.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

8b3e6cdc

19 4月, 2008 1 次提交

include: Remove unnecessary inclusions of asm/semaphore.h · 5a6483fe

由 Matthew Wilcox 提交于 2月 26, 2008

None of these files use any of the functionality promised by
asm/semaphore.h. It's possible that they (or some user of them) rely
on it dragging in some unrelated header file, but I can't build all
these files, so we'll have to fix any build failures as they come up.
Signed-off-by: NMatthew Wilcox <willy@linux.intel.com>

5a6483fe

05 3月, 2008 2 次提交

md: clean up irregularity with raid autodetect · d0fae18f

由 NeilBrown 提交于 3月 04, 2008

When a raid1 array is stopped, all components currently get added to the list
for auto-detection.  However we should really only add components that were
found by autodetection in the first place.  So add a flag to record that
information, and use it.
Signed-off-by: NNeil Brown <neilb@suse.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d0fae18f

md: reduce CPU wastage on idle md array with a write-intent bitmap · 8311c29d

由 NeilBrown 提交于 3月 04, 2008

On an md array with a write-intent bitmap, a thread wakes up every few seconds
and scans the bitmap looking for work to do. If the array is idle, there will
be no work to do, but a lot of scanning is done to discover this.

So cache the fact that the bitmap is completely clean, and avoid scanning the
whole bitmap when the cache is known to be clean.
Signed-off-by: NNeil Brown <neilb@suse.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

8311c29d

07 2月, 2008 3 次提交

md: change ITERATE_RDEV_GENERIC to rdev_for_each_list, and remove ITERATE_RDEV_PENDING. · 73c34431

由 NeilBrown 提交于 2月 06, 2008

Finish ITERATE_ to for_each conversion.
Signed-off-by: NNeil Brown <neilb@suse.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

73c34431

md: change ITERATE_RDEV to rdev_for_each · d089c6af

由 NeilBrown 提交于 2月 06, 2008

As this is more in line with common practice in the kernel.  Also swap the
args around to be more like list_for_each.
Signed-off-by: NNeil Brown <neilb@suse.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d089c6af

md: allow devices to be shared between md arrays · c5d79adb

由 NeilBrown 提交于 2月 06, 2008

Currently, a given device is "claimed" by a particular array so that it cannot
be used by other arrays.

This is not ideal for DDF and other metadata schemes which have their own
partitioning concept.

So for externally managed metadata, just claim the device for md in general,
require that "offset" and "size" are set properly for each device, and make
sure that if a device is included in different arrays then the active sections
do not overlap.

This involves adding another flag to the rdev which makes it awkward to set
"->flags = 0" to clear certain flags.  So now clear flags explicitly by name
when we want to clear things.
Signed-off-by: NNeil Brown <neilb@suse.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c5d79adb

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功