1. 23 Feb, 2015 1 commit
  2. 06 Feb, 2015 10 commits
    • md: make reconfig_mutex optional for writes to md sysfs files. · 6791875e
      NeilBrown committed
      Rather than using mddev_lock() to take the reconfig_mutex
      when writing to any md sysfs file, we only take mddev_lock()
      in the particular _store() functions that require it.
      Admittedly this is most, but it isn't all.
      
      This also allows us to remove special-case handling for new_dev_store
      (in md_attr_store).
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: move mddev_lock and related to md.h · 5c47daf6
      NeilBrown committed
      The one which is not inline (mddev_unlock) gets EXPORTed.
      
      This makes the locking available to personality modules so that it
      doesn't have to be imposed upon them.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: use mddev->lock to protect updates to resync_{min,max}. · 23da422b
      NeilBrown committed
      There are interdependencies between these two sysfs attributes
      and whether a resync is currently running.
      
      Rather than depending on reconfig_mutex to ensure no races when
      testing these interdependencies are met, use the spinlock.
      This will allow the mutex to be removed from protecting this
      code in a subsequent patch.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: minor cleanup in safe_delay_store. · 1b30e66f
      NeilBrown committed
      There isn't really much room for races with ->safemode_delay.
      But as I am trying to clean up any racy code and will soon
      be removing reconfig_mutex protection from most _store()
      functions:
       - only set mddev->safemode_delay once, to ensure no code
         can see an intermediate value
       - use safemode_timer to call md_safemode_timeout() rather than
         calling it directly, to ensure it never races with itself.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: move GET_BITMAP_FILE ioctl out from mddev_lock. · 4af1a041
      NeilBrown committed
      It makes more sense to report bitmap_info->file, rather than
      bitmap->file (the latter is only available once the array is
      active).
      
      With that change, use mddev->lock to protect bitmap_info being
      set to NULL, and we can call get_bitmap_file() without taking
      the mutex.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: tidy up set_bitmap_file · 1e594bb2
      NeilBrown committed
      1/ delay setting mddev->bitmap_info.file until 'f' looks
         usable, so we don't have to unset it.
      2/ Don't allow bitmap file to be set if bitmap_info.file
         is already set.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: remove unnecessary 'buf' from get_bitmap_file. · f4ad3d38
      NeilBrown committed
      'buf' is only used because d_path fills from the end of the
      buffer instead of from the start.
      We don't need a separate buf to handle that, we just need to use
      memmove() to move the string to the start.
      Signed-off-by: NeilBrown <neilb@suse.de>
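The memmove() idea in the get_bitmap_file commit above can be illustrated in user space. This is only a sketch under stated assumptions: `fill_from_end()` is a hypothetical stand-in for a d_path-style helper that writes the string at the end of the buffer, and `path_to_start()` is an invented name, not the kernel function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a d_path()-style helper: it writes `path`
 * (including the terminating NUL) at the END of buf and returns a pointer
 * to the first character, or NULL if the path does not fit. */
static char *fill_from_end(const char *path, char *buf, size_t buflen)
{
    size_t need = strlen(path) + 1;

    if (need > buflen)
        return NULL;
    char *start = buf + buflen - need;
    memcpy(start, path, need);
    return start;
}

/* Leaves the path at the START of buf without a second buffer.
 * Returns 0 on success, -1 if the path is too long. */
int path_to_start(const char *path, char *buf, size_t buflen)
{
    assert(path != NULL && buf != NULL);

    char *p = fill_from_end(path, buf, buflen);
    if (p == NULL)
        return -1;
    /* Source and destination lie in the same buffer and may overlap,
     * so memmove() is required rather than memcpy(). */
    memmove(buf, p, strlen(p) + 1);
    return 0;
}
```

The caller then reads the result directly from the start of the buffer it passed in, which is what lets the separate 'buf' be dropped.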
    • md: remove mddev_lock from rdev_attr_show() · 758bfc8a
      NeilBrown committed
      No rdev attributes need locking for 'show', though
      state_show() might benefit from ensuring it sees a
      consistent set of flags.
      
      None even use rdev->mddev, so testing for it isn't really
      needed and it certainly doesn't need to be held constant.
      
      So improve state_show() and remove the locking.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: remove mddev_lock() from md_attr_show() · b7b17c9b
      NeilBrown committed
      Most attributes can be read safely without any locking.
      A race might lead to a slightly out-dated value, but nothing wrong.
      
      We already have locking in some places where needed.
      All that remains is can_clear_show(), behind_writes_used_show()
      and action_show() which are easily fixed.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: remove need for mddev_lock() in md_seq_show() · f97fcad3
      NeilBrown committed
      The only access in md_seq_show that could suffer from races
      not protected by ->lock is walking the rdev list.
      This can receive sufficient protection from 'rcu'.
      
      So use rdev_for_each_rcu() and get rid of mddev_lock().
      
      Now reading /proc/mdstat will never block in md_seq_show.
      Signed-off-by: NeilBrown <neilb@suse.de>
  3. 04 Feb, 2015 7 commits
    • md: protect ->pers changes with mddev->lock · 36d091f4
      NeilBrown committed
      ->pers is already protected by ->reconfig_mutex, and
      cannot possibly change when there are threads running or
      outstanding IO.
      
      However there are some places where we access ->pers
      not in a thread or IO context, and where ->reconfig_mutex
      is unnecessarily heavy-weight:  level_show and md_seq_show().
      
      So protect all changes, and those accesses, with ->lock.
      This is a step toward taking those accesses out from under
      reconfig_mutex.
      
      [Fixed missing "mddev->pers" -> "pers" conversion, thanks to
       Dan Carpenter <dan.carpenter@oracle.com>]
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: level_store: group all important changes into one place. · db721d32
      NeilBrown committed
      Gather all the changes that can happen atomically and might
      be relevant to other code into one place.  This will
      make it easier to refine the locking.
      
      Note that this puts quite a few things between mddev_detach()
      and ->free().  Enabling this was the point of some recent patches.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: rename ->stop to ->free · afa0f557
      NeilBrown committed
      Now that the ->stop function only frees the private data,
      rename it accordingly.
      
      Also pass in the private pointer as an arg rather than using
      mddev->private.  This flexibility will be useful in level_store().
      
      Finally, don't clear ->private.  It doesn't make sense to clear
      it seeing that isn't what we free, and it is no longer necessary
      to clear ->private (it was some time ago before  ->to_remove was
      introduced).
      
      Setting ->to_remove in ->free() is a bit of a wart, but not a
      big problem at the moment.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: split detach operation out from ->stop. · 5aa61f42
      NeilBrown committed
      Each md personality has a 'stop' operation which does two
      things:
       1/ it finalizes some aspects of the array to ensure nothing
          is accessing the ->private data
       2/ it frees the ->private data.
      
      All the steps in '1' can apply to all arrays and so can be
      performed in common code.
      
      This is useful as in the case where we change the personality which
      manages an array (in level_store()), it would be helpful to do
      step 1 early, and step 2 later.
      
      So split the 'step 1' functionality out into a new mddev_detach().
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: make merge_bvec_fn more robust in face of personality changes. · 64590f45
      NeilBrown committed
      There is no locking around calls to merge_bvec_fn(), so
      it is possible that calls which coincide with a level (or personality)
      change could go wrong.
      
      So create a central dispatch point for these functions and use
      rcu_read_lock().
      If the array is suspended, reject any merge that can be rejected.
      If not, we know it is safe to call the function.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: make ->congested robust against personality changes. · 5c675f83
      NeilBrown committed
      There is currently no locking around calls to the 'congested'
      bdi function.  If called at an awkward time while an array is
      being converted from one level (or personality) to another, there
      is a tiny chance of running code in an unreferenced module etc.
      
      So add a 'congested' function to the md_personality operations
      structure, and call it with appropriate locking from a central
      'mddev_congested'.
      
      When the array personality is changing the array will be 'suspended'
      so no IO is processed.
      If mddev_congested detects this, it simply reports that the
      array is congested, which is a safe guess.
      As mddev_suspend calls synchronize_rcu(), mddev_congested can
      avoid races by including the whole call inside an rcu_read_lock()
      region.
      This requires that the congested functions for all subordinate devices
      can be run under rcu_read_lock().  Fortunately this is the case.
      Signed-off-by: NeilBrown <neilb@suse.de>
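The central dispatch described in the ->congested commit above might look roughly like the following user-space sketch. All names here (`struct personality`, `struct array`, `array_congested`, `idle_congested`) are hypothetical and simplified; the kernel version runs the check inside an rcu_read_lock() region, which is only marked by comments here. Returning "not congested" when no personality is attached is a choice made for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified personality operations table. */
struct personality {
    bool (*congested)(void *priv);
};

/* Hypothetical, simplified array state. */
struct array {
    bool suspended;                  /* set while the personality changes */
    const struct personality *pers;  /* current personality, may be NULL */
    void *priv;                      /* personality-private data */
};

/* Central dispatch point: callers never call pers->congested directly. */
bool array_congested(const struct array *a)
{
    bool ret;

    assert(a != NULL);
    /* kernel: rcu_read_lock() would be taken here */
    if (a->suspended)
        ret = true;   /* safe guess while no IO is being processed */
    else if (a->pers != NULL && a->pers->congested != NULL)
        ret = a->pers->congested(a->priv);
    else
        ret = false;  /* sketch's choice: nothing to ask */
    /* kernel: rcu_read_unlock() would be released here */
    return ret;
}

/* Example personality that never reports congestion. */
bool idle_congested(void *priv)
{
    (void)priv;
    return false;
}

const struct personality idle_personality = { idle_congested };
```

Because a suspended array is always reported as congested, the personality's own function is never entered while the personality is being swapped.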
    • md: rename mddev->write_lock to mddev->lock · 85572d7c
      NeilBrown committed
      This lock is used for (slightly) more than helping with writing
      superblocks, and it will soon be extended further.  So the
      name is inappropriate.
      
      Also, the _irq variant hasn't been needed since 2.6.37 as it is
      never taken from interrupt or bh context.
      
      So:
        -rename write_lock to lock
        -document what it protects
        -remove _irq ... except in md_flush_request() as there
           is no wait_event_lock() (with no _irq).  This can be
           cleaned up after appropriate changes to wait.h.
      Signed-off-by: NeilBrown <neilb@suse.de>
  4. 11 Dec, 2014 1 commit
    • md: Check MD_RECOVERY_RUNNING as well as ->sync_thread. · f851b60d
      NeilBrown committed
      A recent change to md started the ->sync_thread asynchronously
      from a work_queue rather than synchronously.  This means that there
      can be a small window between the time when MD_RECOVERY_RUNNING is set
      and when ->sync_thread is set.
      
      So code that checks ->sync_thread might now conclude that the thread
      has not been started and (because a lock is held) will not be started.
      That is no longer the case.
      
      Most of those places are best fixed by testing MD_RECOVERY_RUNNING
      as well.  To make this completely reliable, we wake_up(&resync_wait)
      after clearing that flag as well as after clearing ->sync_thread.
      
      Other places are better served by flushing the relevant workqueue
      to ensure that if the sync thread was starting, it has now
      started.  This is particularly best if we are about to stop the
      sync thread.
      
      Fixes: ac05f256
      Signed-off-by: NeilBrown <neilb@suse.de>
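The window described in the MD_RECOVERY_RUNNING commit above can be modelled in a few lines. This is a minimal sketch, not the kernel code: `struct array`, `RECOVERY_RUNNING` and `sync_in_progress()` are hypothetical names, and the flag is a plain bit mask rather than the kernel's bit number.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RECOVERY_RUNNING (1UL << 0)  /* hypothetical flag bit for this sketch */

/* Hypothetical, simplified array state. */
struct array {
    unsigned long recovery;  /* flag word */
    void *sync_thread;       /* set only once the worker has started */
};

/* Checking ->sync_thread alone misses the window in which the flag is
 * already set but the thread pointer has not yet been filled in, so
 * both conditions are tested. */
bool sync_in_progress(const struct array *a)
{
    assert(a != NULL);
    return a->sync_thread != NULL || (a->recovery & RECOVERY_RUNNING) != 0;
}
```

With only the pointer test, the middle state (flag set, pointer still NULL) would be reported as idle.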
  5. 03 Dec, 2014 1 commit
  6. 24 Nov, 2014 1 commit
  7. 17 Nov, 2014 1 commit
    • md: Always set RECOVERY_NEEDED when clearing RECOVERY_FROZEN · 45eaf45d
      NeilBrown committed
      md_check_recovery will skip any recovery and also clear
      MD_RECOVERY_NEEDED if MD_RECOVERY_FROZEN is set.
      So when we clear _FROZEN, we must set _NEEDED and ensure that
      md_check_recovery gets run.
      Otherwise we could miss out on something that is needed.
      
      In particular, this can make it impossible to remove a
      failed device from an array if the 'recovery-needed' processing
      didn't happen.
      Suitable for stable kernels since 3.13.
      
      Cc: stable@vger.kernel.org (3.13+)
      Reported-and-tested-by: Joe Lawrence <joe.lawrence@stratus.com>
      Fixes: 30b8feb7
      Signed-off-by: NeilBrown <neilb@suse.de>
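The interaction between the two flags in the commit above can be shown with a small model. It is a sketch under stated assumptions: the flag values, `struct array_state`, `check_recovery()` and `unfreeze()` are hypothetical, and the `wakeups` counter merely stands in for making sure md_check_recovery gets run again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RECOVERY_NEEDED (1UL << 0)  /* hypothetical flag bits for this sketch */
#define RECOVERY_FROZEN (1UL << 1)

/* Hypothetical, simplified array state. */
struct array_state {
    unsigned long recovery;  /* flag word */
    int wakeups;             /* counts requests to re-run the check */
};

/* Model of the check: while frozen, NEEDED is cleared and nothing is done.
 * Returns true if recovery work would have been started. */
bool check_recovery(struct array_state *s)
{
    assert(s != NULL);
    if (s->recovery & RECOVERY_FROZEN) {
        s->recovery &= ~RECOVERY_NEEDED;
        return false;
    }
    if (!(s->recovery & RECOVERY_NEEDED))
        return false;
    s->recovery &= ~RECOVERY_NEEDED;
    return true;
}

/* Clearing FROZEN must also set NEEDED and trigger the check again,
 * because the request may have been dropped while frozen. */
void unfreeze(struct array_state *s)
{
    assert(s != NULL);
    s->recovery &= ~RECOVERY_FROZEN;
    s->recovery |= RECOVERY_NEEDED;
    s->wakeups++;
}
```

If `unfreeze()` only cleared FROZEN, the later `check_recovery()` call would find NEEDED already cleared and skip the pending work.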
  8. 14 Oct, 2014 14 commits
  9. 08 Aug, 2014 2 commits
  10. 31 Jul, 2014 1 commit
    • md: disable probing for md devices 512 and over. · af5628f0
      NeilBrown committed
      The way md devices are traditionally created in the kernel
      is simply to open the device with the desired major/minor number.
      
      This can be problematic as some support tools, notably udev and
      programs run by udev, can open a device just to see what is there, and
      find that it has created something.  It is easy for a race to cause
      udev to open an md device just after it was destroyed, causing it to
      suddenly re-appear.
      
      For some time we have had an alternate way to create md devices
        echo md_somename > /sys/module/md_mod/parameters/new_array
      
      This will always use a minor number of 512 or higher, which mdadm
      normally avoids.
      Using this makes the creation-by-opening unnecessary, but does
      not disable it, so it is still there to cause problems.
      
      This patch disables probing for devices with a major of 9 (MD_MAJOR)
      and a minor of 512 and up.  As a result, devices created by writing to
      new_array cannot be re-created by opening the node in /dev.
      Signed-off-by: NeilBrown <neilb@suse.de>
  11. 03 Jul, 2014 1 commit
    • md: flush writes before starting a recovery. · 133d4527
      NeilBrown committed
      When we write to a degraded array which has a bitmap, we
      make sure the relevant bit in the bitmap remains set when
      the write completes (so a 're-add' can quickly rebuild a
      temporarily-missing device).
      
      If, immediately after such a write starts, we incorporate a spare,
      commence recovery, and skip over the region where the write is
      happening (because the 'needs recovery' flag isn't set yet),
      then that write will not get to the new device.
      
      Once the recovery finishes the new device will be trusted, but will
      have incorrect data, leading to possible corruption.
      
      We cannot set the 'needs recovery' flag when we start the write as we
      do not know easily if the write will be "degraded" or not.  That
      depends on details of the particular raid level and particular write
      request.
      
      This patch fixes a corruption issue of long standing and so is
      suitable for any -stable kernel.  It applied correctly to 3.0 at
      least and will, with minor editing, apply to earlier kernels.
      Reported-by: Bill <billstuff2001@sbcglobal.net>
      Tested-by: Bill <billstuff2001@sbcglobal.net>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/53A518BB.60709@sbcglobal.net
      Signed-off-by: NeilBrown <neilb@suse.de>