1. 17 Feb 2010, 1 commit
    • percpu: add __percpu sparse annotations to what's left · a29d8b8e
      Committed by Tejun Heo
      Add __percpu sparse annotations to places which didn't make it in one
      of the previous patches.  All conversions are trivial.
      
      These annotations are to make sparse consider percpu variables to be
      in a different address space and warn if accessed without going
      through percpu accessors.  This patch doesn't affect normal builds.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Borislav Petkov <borislav.petkov@amd.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Neil Brown <neilb@suse.de>
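
      As background, a minimal sketch of what the annotation buys (the structure and
      function names below are hypothetical, not from this patch): sparse treats the
      annotated pointer as living in the percpu address space and warns if it is
      dereferenced without going through the percpu accessors, while a normal build
      is unchanged.

        #include <linux/percpu.h>
        #include <linux/smp.h>
        #include <linux/errno.h>

        struct my_stats {                               /* hypothetical example */
                unsigned long events;
        };

        static struct my_stats __percpu *stats;         /* annotated percpu pointer */

        static int my_stats_init(void)
        {
                int cpu;

                stats = alloc_percpu(struct my_stats);
                if (!stats)
                        return -ENOMEM;

                /* correct: access through the percpu accessors */
                cpu = get_cpu();
                per_cpu_ptr(stats, cpu)->events++;
                put_cpu();

                /* a bare "stats->events++" here would draw a sparse warning about
                 * an address-space mismatch, which is exactly the point */
                return 0;
        }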
  2. 14 Dec 2009, 4 commits
    • md: add MODULE_DESCRIPTION for all md related modules. · 0efb9e61
      Committed by NeilBrown
      Suggested by  Oren Held <orenhe@il.ibm.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
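
      The change itself is a one-line addition per module; a sketch of the pattern
      (the description string is illustrative, not necessarily the exact text used):

        #include <linux/module.h>

        /* ... existing module code ... */

        MODULE_LICENSE("GPL");
        MODULE_DESCRIPTION("RAID4/5/6 (striping with parity) personality for MD");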
    • md/raid5: don't complete make_request on barrier until writes are scheduled · 729a1866
      Committed by NeilBrown
      The post-barrier-flush is sent by md as soon as make_request on the
      barrier write completes.  For raid5, the data might not be in the
      per-device queues yet.  So for barrier requests, wait for any
      pre-reading to be done so that the request will be in the per-device
      queues.
      
      We use the 'preread_active' count to check that nothing is still in
      the preread phase, and delay the decrement of this count until after
      write requests have been submitted to the underlying devices.
      Signed-off-by: NeilBrown <neilb@suse.de>
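
      A hedged sketch of the ordering this enforces; the structure, field and helper
      names below are invented for illustration, only the atomic/waitqueue primitives
      are real kernel APIs:

        #include <linux/wait.h>
        #include <linux/atomic.h>

        struct sketch_conf {
                atomic_t preread_active;            /* stripes still in the preread phase */
                wait_queue_head_t wait_for_preread;
        };

        /* barrier side: don't let make_request() return while anything is still
         * prereading, so the write is already on the per-device queues */
        static void barrier_wait_sketch(struct sketch_conf *conf)
        {
                wait_event(conf->wait_for_preread,
                           atomic_read(&conf->preread_active) == 0);
        }

        /* stripe side: only drop the count once the writes have been submitted
         * to the underlying devices */
        static void stripe_write_sketch(struct sketch_conf *conf)
        {
                /* ... hand the write requests to the member devices here ... */
                atomic_dec(&conf->preread_active);
                wake_up(&conf->wait_for_preread);
        }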
    • md: support barrier requests on all personalities. · a2826aa9
      Committed by NeilBrown
      Previously barriers were only supported on RAID1.  This is because
      other levels require synchronisation across all devices and so needed
      a different approach.
      Here is that approach.
      
      When a barrier arrives, we send a zero-length barrier to every active
      device.  When that completes - and if the original request was not
      empty -  we submit the barrier request itself (with the barrier flag
      cleared) and then submit a fresh load of zero length barriers.
      
      The barrier request itself is asynchronous, but any subsequent
      request will block until the barrier completes.
      
      The reason for clearing the barrier flag is that a barrier request is
      allowed to fail.  If we pass a non-empty barrier through a striping
      raid level it is conceivable that part of it could succeed and part
      could fail.  That would be way too hard to deal with.
      So if the first run of zero length barriers succeed, we assume all is
      sufficiently well that we send the request and ignore errors in the
      second run of barriers.
      
      RAID5 needs extra care as write requests may not have been submitted
      to the underlying devices yet.  So we flush the stripe cache before
      proceeding with the barrier.
      
      Note that the second set of zero-length barriers is submitted
      immediately after the original request is submitted.  Thus when
      a personality finds mddev->barrier to be set during make_request,
      it should not return from make_request until the corresponding
      per-device request(s) have been queued.
      
      That will be done in later patches.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Reviewed-by: Andre Noll <maan@systemlinux.org>
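
      A schematic sketch of that sequence; the mddev structure and the two helpers
      are invented stand-ins for md's internals, while generic_make_request() and
      the bio fields are the real block-layer interfaces of that era:

        #include <linux/bio.h>
        #include <linux/blkdev.h>

        struct sketch_mddev;
        void submit_zero_length_barriers(struct sketch_mddev *mddev); /* one per active rdev */
        void wait_for_barrier_completion(struct sketch_mddev *mddev);

        static void md_barrier_sketch(struct sketch_mddev *mddev, struct bio *bio)
        {
                /* 1: flush whatever each member device already has queued */
                submit_zero_length_barriers(mddev);
                wait_for_barrier_completion(mddev);

                /* 2: the payload, if any, with the barrier flag cleared so a partial
                 *    failure across a striped array cannot leave a half-done barrier */
                if (bio->bi_size) {
                        bio->bi_rw &= ~(1UL << BIO_RW_BARRIER);
                        generic_make_request(bio);
                }

                /* 3: a second round of empty barriers, submitted immediately;
                 *    errors in this round are ignored */
                submit_zero_length_barriers(mddev);
        }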
    • md/raid5: remove some sparse warnings. · 8553fe7e
      Committed by NeilBrown
      qd_idx was previously declared and given exactly the same value!
      Signed-off-by: NeilBrown <neilb@suse.de>
  3. 13 Nov 2009, 2 commits
    • md/raid5: Allow dirty-degraded arrays to be assembled when only parity is degraded. · c148ffdc
      Committed by NeilBrown
      Normally it is not safe to allow a raid5 that is both dirty and
      degraded to be assembled without an explicit request from the admin, as
      it can cause hidden data corruption.
      This is because 'dirty' means that the parity cannot be trusted, and
      'degraded' means that the parity needs to be used.
      
      However, if the device that is missing contains only parity, then
      there is no issue and assembly can continue.
      This particularly applies when a RAID5 is being converted to a RAID6
      and there is an unclean shutdown while the conversion is happening.
      
      So check whether the degraded space only contains parity, and
      in that case, allow the assembly.
      Signed-off-by: NeilBrown <neilb@suse.de>
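
      A heavily simplified sketch of the decision; the real code walks the raid5/raid6
      layout helpers, so everything below, including the fixed-parity assumption, is
      purely illustrative. The case it targets is the RAID5 to RAID6 conversion, where
      the extra Q device holds nothing but parity until the conversion finishes:

        #include <linux/types.h>

        /* Assuming a layout that keeps parity on fixed devices at the end of the
         * array (as the conversion layouts do), a single missing device is harmless
         * for a dirty array only if it is one of those parity-only slots: the
         * untrusted parity will be rewritten by the resync anyway. */
        static bool can_assemble_dirty_degraded(int missing_disk, int data_disks,
                                                bool parity_on_fixed_devices)
        {
                return parity_on_fixed_devices && missing_disk >= data_disks;
        }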
    • Don't unconditionally set in_sync on newly added device in raid5_reshape · 7ef90146
      Committed by NeilBrown
      When a reshape finds that it can add spare devices into the array,
      those devices might already be 'in_sync' if they are beyond the old
      size of the array, or they might not if they are within the array.
      
      The first case happens when we change an N-drive RAID5 to an
      N+1-drive RAID5.
      The second happens when we convert an N-drive RAID5 to an
      N+1-drive RAID6.
      
      So set the flag more carefully.
      Also, ->recovery_offset is only meaningful when the flag is clear,
      so only set it in that case.
      
      This change needs the preceding two to ensure that the non-in_sync
      device doesn't get evicted from the array when it is stopped, in the
      case where v0.90 metadata is used.
      Signed-off-by: NeilBrown <neilb@suse.de>
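
      A sketch of the distinction being drawn, as a fragment using the md field names
      of that era (In_sync, recovery_offset, raid_disk) and assuming md's internal
      definitions are in scope; illustrative rather than the patch itself:

        if (rdev->raid_disk >= previous_raid_disks) {
                /* beyond the old shape of the array, e.g. N-drive -> (N+1)-drive
                 * RAID5: there is nothing to recover, so the spare is in sync */
                set_bit(In_sync, &rdev->flags);
        } else {
                /* inside the old array, e.g. N-drive RAID5 -> (N+1)-drive RAID6:
                 * the device still has to be recovered */
                clear_bit(In_sync, &rdev->flags);
                rdev->recovery_offset = 0;      /* only meaningful while not in_sync */
        }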
  4. 06 Nov 2009, 1 commit
    • md/raid5: make sure curr_sync_completes is uptodate when reshape starts · 8dee7211
      Committed by NeilBrown
      This value is visible through sysfs and is used by mdadm
      when it manages a reshape (backing up data that is about to be
      rearranged).  So it is important that it is always correct.
      Currently it does not get updated properly when a reshape
      starts which can cause problems when assembling an array
      that is in the middle of being reshaped.
      
      This is suitable for 2.6.31.y stable kernels.
      
      Cc: stable@kernel.org
      Signed-off-by: NeilBrown <neilb@suse.de>
  5. 20 Oct 2009, 1 commit
  6. 16 Oct 2009, 6 commits
    • md/async: don't pass a memory pointer as a page pointer. · 5dd33c9a
      Committed by NeilBrown
      md/raid6 passes a list of 'struct page *' to the async_tx routines,
      which then either DMA map them for offload, or take the page_address
      for CPU based calculations.
      
      For RAID6 we sometimes leave 'blanks' in the list of pages.
      For CPU based calcs, we want to treat these as a page of zeros.
      For offloaded calculations, we simply don't pass a page to the
      hardware.
      
      Currently the 'blanks' are encoded as a pointer to
      raid6_empty_zero_page.  This is a 4096 byte memory region, not a
      'struct page'.  This is mostly handled correctly but is rather ugly.
      
      So change the code to pass and expect a NULL pointer for the blanks.
      When taking page_address of a page, we need to check for a NULL and
      in that case use raid6_empty_zero_page.
      Signed-off-by: NeilBrown <neilb@suse.de>
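
      A small sketch of the convention after this change; the helper name is invented,
      while raid6_empty_zero_page and page_address() are the real interfaces:

        #include <linux/mm.h>
        #include <linux/raid/pq.h>

        /* Blank slots in a source list are now plain NULL pointers.  Only the CPU
         * fallback substitutes the shared zero page when it needs a real address;
         * the offload path simply skips NULL entries. */
        static void *src_page_address(struct page *pg)
        {
                if (!pg)
                        return (void *) raid6_empty_zero_page;
                return page_address(pg);
        }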
    • md: Fix handling of raid5 array which is being reshaped to fewer devices. · 5e5e3e78
      Committed by NeilBrown
      When a raid5 (or raid6) array is being reshaped to have fewer devices,
      conf->raid_disks is the latter and hence smaller number of devices.
      However sometimes we want to use a number which is the total number of
      currently required devices - the larger of the 'old' and 'new' sizes.
      Before we implemented reducing the number of devices, this was always
      'new' i.e. ->raid_disks.
      Now we need max(raid_disks, previous_raid_disks) in those places.
      
      This particularly affects assembling an array that was shutdown while
      in the middle of a reshape to fewer devices.
      
      md.c needs a similar fix when interpreting the md metadata.
      Signed-off-by: NeilBrown <neilb@suse.de>
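
      The resulting pattern, as a fragment (conf->previous_raid_disks is the raid5
      field of that era; the loop body is elided):

        /* cover every slot that may still hold data while a shrinking reshape
         * is in progress, not just the new, smaller count */
        int i, disks = max(conf->raid_disks, conf->previous_raid_disks);

        for (i = 0; i < disks; i++) {
                /* ... examine conf->disks[i] whether it belongs to the old or
                 * the new shape of the array ... */
        }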
    • md: fix problems with RAID6 calculations for DDF. · e4424fee
      Committed by NeilBrown
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid456: downlevel multicore operations to raid_run_ops · 417b8d4a
      Committed by Dan Williams
      The percpu conversion allowed a straightforward handoff of stripe
      processing to the async subsystem that initially showed some modest gains
      (+4%).  However, this model is too simplistic and leads to stripes
      bouncing between raid5d and the async thread pool for every invocation
      of handle_stripe().  As reported by Holger this can fall into a
      pathological situation severely impacting throughput (6x performance
      loss).
      
      By downleveling the parallelism to raid_run_ops the pathological
      stripe_head bouncing is eliminated.  This version still exhibits an
      average 11% throughput loss for:
      
      	mdadm --create /dev/md0 /dev/sd[b-q] -n 16 -l 6
      	echo 1024 > /sys/block/md0/md/stripe_cache_size
      	dd if=/dev/zero of=/dev/md0 bs=1024k count=2048
      
      ...but the results are at least stable and can be used as a base for
      further multicore experimentation.
      Reported-by: Holger Kiehl <Holger.Kiehl@dwd.de>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid5: initialize conf->device_lock earlier · f5efd45a
      Committed by Dan Williams
      Deallocating a raid5_conf_t structure requires taking 'device_lock'.
      Ensure it is initialized before it is used, i.e. initialize the lock
      before attempting any further initializations that might fail.
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
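
      A sketch of the ordering fix in setup_conf()-style code; raid5_conf_t is the
      real type name of that era, the rest is schematic:

        conf = kzalloc(sizeof(raid5_conf_t), GFP_KERNEL);
        if (!conf)
                return ERR_PTR(-ENOMEM);

        spin_lock_init(&conf->device_lock);     /* before anything that can fail */
        init_waitqueue_head(&conf->wait_for_stripe);

        /* ... later allocations may fail and branch to the teardown path, which
         * takes device_lock while freeing stripes, so the lock must already be
         * initialized by then ... */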
    • Revert "md: do not progress the resync process if the stripe was blocked" · 1442577b
      Committed by NeilBrown
      This reverts commit df10cfbc.
      
      This patch was based on a misunderstanding and risks introducing a busy-wait loop.
      So revert it.
      Acked-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
  7. 23 Sep 2009, 3 commits
  8. 17 Sep 2009, 2 commits
  9. 11 Sep 2009, 1 commit
  10. 09 Sep 2009, 1 commit
    • dmaengine: add fence support · 0403e382
      Committed by Dan Williams
      Some engines optimize operation by reading ahead in the descriptor chain
      such that descriptor2 may start execution before descriptor1 completes.
      If descriptor2 depends on the result from descriptor1 then a fence is
      required (on descriptor2) to disable this optimization.  The async_tx
      api could implicitly identify dependencies via the 'depend_tx'
      parameter, but that would constrain cases where the dependency chain
      only specifies a completion order rather than a data dependency.  So,
      provide an ASYNC_TX_FENCE to explicitly identify data dependencies.
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
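
      A sketch of how a caller uses the new flag, based on the async_tx submit API of
      that era (init_async_submit()/async_xor()); the wrapper itself is invented:

        #include <linux/async_tx.h>

        static struct dma_async_tx_descriptor *
        fenced_xor(struct page *dest, struct page **srcs, int src_cnt, size_t len,
                   struct dma_async_tx_descriptor *depend_tx,
                   struct async_submit_ctl *submit)
        {
                /* The next operation will read 'dest' - a data dependency, not just
                 * a completion order - so fence it: a read-ahead engine must not
                 * start the dependent descriptor early. */
                init_async_submit(submit, ASYNC_TX_FENCE | ASYNC_TX_ACK,
                                  depend_tx, NULL, NULL, NULL);
                return async_xor(dest, srcs, 0, src_cnt, len, submit);
        }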
  11. 30 Aug 2009, 12 commits
  12. 13 Aug 2009, 3 commits
    • md/raid5: Properly remove excess drives after shrinking a raid5/6 · 1a67dde0
      Committed by NeilBrown
      We were removing the drives from the array, but not
      removing symlinks from /sys/.... and not marking the device
      as having been removed.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid5: make sure a reshape restarts at the correct address. · a639755c
      Committed by NeilBrown
      This "if" doesn't allow for the possibility that the number of devices
      doesn't change, and so sector_nr isn't set correctly in that case.
      So change '>' to '>='.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid5: allow new reshape modes to be restarted in the middle. · 67ac6011
      Committed by NeilBrown
      md/raid5 doesn't allow a reshape to restart if it involves writing
      over the same part of disk that it would be reading from.
      This happens at the beginning of a reshape that increases the number
      of devices, at the end of a reshape that decreases the number of
      devices, and continuously for a reshape that does not change the
      number of devices.
      
      The current code is correct for the "increase number of devices"
      case as the critical section at the start is handled by userspace
      performing a backup.
      
      It does not work for reducing the number of devices, or the
      no-change case.
      For 'reducing', we need to invert the test.  For no-change we cannot
      really be sure things will be safe, so simply require the array
      to be read-only, which is how the user-space code that carefully
      starts such arrays works.
      Signed-off-by: NeilBrown <neilb@suse.de>
  13. 03 Aug 2009, 3 commits
    • md: Use revalidate_disk to effect changes in size of device. · 449aad3e
      Committed by NeilBrown
      As revalidate_disk calls check_disk_size_change, it will cause
      any capacity change of a gendisk to be propagated to the blockdev
      inode.  So use that instead of mucking about with locks and
      i_size_write.
      
      Also add a call to revalidate_disk in do_md_run and a few other places
      where the gendisk capacity is changed.
      Signed-off-by: NeilBrown <neilb@suse.de>
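
      The resulting pattern, as a fragment; set_capacity() and revalidate_disk() are
      the real block-layer calls, array_sectors is md's size field:

        /* after computing the new array size ... */
        set_capacity(mddev->gendisk, mddev->array_sectors);
        revalidate_disk(mddev->gendisk);        /* propagates the change to the bdev inode */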
    • md: allow raid5_quiesce to work properly when reshape is happening. · 64bd660b
      Committed by NeilBrown
      The ->quiesce method is not supposed to stop resync/recovery/reshape,
      just normal IO.
      But in raid5 we don't have a way to know which stripes are being
      used for normal IO and which for resync etc, so we need to wait for
      all stripes to be idle to be sure that all writes have completed.
      
      However reshape keeps at least some stripes busy for an extended period
      of time, so a call to raid5_quiesce can block for several seconds
      needlessly.
      So arrange for reshape etc to pause briefly while raid5_quiesce is
      trying to quiesce the array so that the active_stripes count can
      drop to zero.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid5: set reshape_position correctly when reshape starts. · e516402c
      Committed by NeilBrown
      As the internal reshape_progress counter is the main driver
      for reshape, the fact that reshape_position sometimes starts with the
      wrong value has minimal effect.  It is visible in sysfs and that
      is all.
      Signed-off-by: NeilBrown <neilb@suse.de>