1. 06 Feb, 2015 1 commit
    • md: remove need for mddev_lock() in md_seq_show() · f97fcad3
      NeilBrown committed
      The only access in md_seq_show that could suffer from races
      not protected by ->lock is walking the rdev list.
      This can receive sufficient protection from 'rcu'.
      
      So use rdev_for_each_rcu() and get rid of mddev_lock().
      
      Now reading /proc/mdstat will never block in md_seq_show.
      Signed-off-by: NeilBrown <neilb@suse.de>
  2. 04 Feb, 2015 7 commits
    • md: protect ->pers changes with mddev->lock · 36d091f4
      NeilBrown committed
      ->pers is already protected by ->reconfig_mutex, and
      cannot possibly change when there are threads running or
      outstanding IO.
      
      However there are some places where we access ->pers
      not in a thread or IO context, and where ->reconfig_mutex
      is unnecessarily heavy-weight:  level_show and md_seq_show().
      
      So protect all changes, and those accesses, with ->lock.
      This is a step toward taking those accesses out from under
      reconfig_mutex.
      
      [Fixed missing "mddev->pers" -> "pers" conversion, thanks to
       Dan Carpenter <dan.carpenter@oracle.com>]
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: level_store: group all important changes into one place. · db721d32
      NeilBrown committed
      Gather all the changes that can happen atomically and might
      be relevant to other code into one place.  This will
      make it easier to refine the locking.
      
      Note that this puts quite a few things between mddev_detach()
      and ->free().  Enabling this was the point of some recent patches.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: rename ->stop to ->free · afa0f557
      NeilBrown committed
      Now that the ->stop function only frees the private data,
      rename it accordingly.
      
      Also pass in the private pointer as an arg rather than using
      mddev->private.  This flexibility will be useful in level_store().
      
      Finally, don't clear ->private.  It doesn't make sense to clear
      it, seeing that it isn't what we free, and it is no longer necessary
      to clear ->private (it was some time ago, before ->to_remove was
      introduced).
      
      Setting ->to_remove in ->free() is a bit of a wart, but not a
      big problem at the moment.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: split detach operation out from ->stop. · 5aa61f42
      NeilBrown committed
      Each md personality has a 'stop' operation which does two
      things:
       1/ it finalizes some aspects of the array to ensure nothing
          is accessing the ->private data
       2/ it frees the ->private data.
      
      All the steps in '1' can apply to all arrays and so can be
      performed in common code.
      
      This is useful as in the case where we change the personality which
      manages an array (in level_store()), it would be helpful to do
      step 1 early, and step 2 later.
      
      So split the 'step 1' functionality out into a new mddev_detach().
      Signed-off-by: NeilBrown <neilb@suse.de>
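The split between the common detach step and the per-personality free step can be sketched in user-space C. This is only an illustrative model, not the kernel code: `mddev_model`, `personality_model`, `mddev_detach_model()` and `change_level_model()` are hypothetical names, and the detach step is reduced to a counter. It assumes the ->free(private) calling convention described in the rename commit above.

```c
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct personality_model {
    const char *name;
    void (*free)(void *priv);   /* step 2 only: release the private data */
};

struct mddev_model {
    const struct personality_model *pers;
    void *private_data;
    int detach_calls;           /* counts runs of the common step 1 */
};

/* Step 1, shared by all personalities: stop users of ->private. */
void mddev_detach_model(struct mddev_model *m)
{
    m->detach_calls++;
}

/* Level change: detach early, switch over, free the old data last. */
void change_level_model(struct mddev_model *m,
                        const struct personality_model *new_pers,
                        void *new_priv)
{
    const struct personality_model *old_pers = m->pers;
    void *old_priv = m->private_data;

    mddev_detach_model(m);
    m->pers = new_pers;
    m->private_data = new_priv;
    old_pers->free(old_priv);   /* pointer passed in, not read from m */
}

/* Two example personalities whose private data comes from malloc(). */
const struct personality_model level_a = { "level-a", free };
const struct personality_model level_b = { "level-b", free };
```

Passing the old private pointer explicitly is what lets the free step run after the array already points at the new personality's data.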
    • md: make merge_bvec_fn more robust in face of personality changes. · 64590f45
      NeilBrown committed
      There is no locking around calls to merge_bvec_fn(), so
      it is possible that calls which coincide with a level (or personality)
      change could go wrong.
      
      So create a central dispatch point for these functions and use
      rcu_read_lock().
      If the array is suspended, reject any merge that can be rejected.
      If not, we know it is safe to call the function.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: make ->congested robust against personality changes. · 5c675f83
      NeilBrown committed
      There is currently no locking around calls to the 'congested'
      bdi function.  If called at an awkward time while an array is
      being converted from one level (or personality) to another, there
      is a tiny chance of running code in an unreferenced module etc.
      
      So add a 'congested' function to the md_personality operations
      structure, and call it with appropriate locking from a central
      'mddev_congested'.
      
      When the array personality is changing the array will be 'suspended'
      so no IO is processed.
      If mddev_congested detects this, it simply reports that the
      array is congested, which is a safe guess.
      As mddev_suspend calls synchronize_rcu(), mddev_congested can
      avoid races by including the whole call inside an rcu_read_lock()
      region.
      This requires that the congested functions for all subordinate devices
      can be run under rcu_read_lock().  Fortunately this is the case.
      Signed-off-by: NeilBrown <neilb@suse.de>
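The central dispatch described above might look roughly like the following user-space model. The type and function names (`mddev_model`, `personality_model`, `mddev_congested_model()`) are hypothetical, the RCU read-side section is only noted in a comment, and treating a missing personality as congested is an assumption added here to keep the sketch safe.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct mddev_model;

struct personality_model {
    /* Per-level congestion check, part of the operations structure. */
    bool (*congested)(struct mddev_model *mddev, int bits);
};

struct mddev_model {
    bool suspended;                        /* set while the level changes */
    const struct personality_model *pers;  /* current personality */
};

/*
 * Central dispatch point.  In the kernel the body would run inside an
 * rcu_read_lock()/rcu_read_unlock() pair; that is omitted in this model.
 */
bool mddev_congested_model(struct mddev_model *mddev, int bits)
{
    /* The NULL check is an assumption of this sketch, not from the commit. */
    if (mddev->suspended || mddev->pers == NULL)
        return true;    /* safe guess: report the array as congested */
    return mddev->pers->congested(mddev, bits);
}

/* Example personality that never reports congestion. */
static bool never_congested(struct mddev_model *mddev, int bits)
{
    (void)mddev;
    (void)bits;
    return false;
}

const struct personality_model idle_personality = { never_congested };
```

Reporting congestion while suspended costs at most some delayed writeback, whereas calling into a personality that is being replaced could run code from an unreferenced module.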
    • md: rename mddev->write_lock to mddev->lock · 85572d7c
      NeilBrown committed
      This lock is used for (slightly) more than helping with writing
      superblocks, and it will soon be extended further.  So the
      name is inappropriate.
      
      Also, the _irq variant hasn't been needed since 2.6.37 as it is
      never taken from interrupt or bh context.
      
      So:
        -rename write_lock to lock
        -document what it protects
        -remove _irq ... except in md_flush_request() as there
           is no wait_event_lock() (with no _irq).  This can be
           cleaned up after appropriate changes to wait.h.
      Signed-off-by: NeilBrown <neilb@suse.de>
  3. 11 Dec, 2014 1 commit
    • md: Check MD_RECOVERY_RUNNING as well as ->sync_thread. · f851b60d
      NeilBrown committed
      A recent change to md started the ->sync_thread asynchronously
      from a work_queue rather than synchronously.  This means that there
      can be a small window between the time when MD_RECOVERY_RUNNING is set
      and when ->sync_thread is set.
      
      So code that checks ->sync_thread might now conclude that the thread
      has not been started and (because a lock is held) will not be started.
      That is no longer the case.
      
      Most of those places are best fixed by testing MD_RECOVERY_RUNNING
      as well.  To make this completely reliable, we wake_up(&resync_wait)
      after clearing that flag as well as after clearing ->sync_thread.
      
      Other places are better served by flushing the relevant workqueue
      to ensure that if the sync thread was starting, it has now
      started.  This is particularly best if we are about to stop the
      sync thread.
      
      Fixes: ac05f256
      Signed-off-by: NeilBrown <neilb@suse.de>
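The window described above can be illustrated with a small model. The struct, the bit position and the function names are hypothetical; only the idea of testing the flag as well as the thread pointer comes from the commit message.

```c
#include <stdbool.h>
#include <stddef.h>

#define RECOVERY_RUNNING_BIT (1u << 0)   /* hypothetical bit position */

struct array_state {
    unsigned int recovery;   /* flag word, models mddev->recovery */
    void *sync_thread;       /* models mddev->sync_thread */
};

/* Older style of check: misses the window before the work item has run. */
bool sync_active_old(const struct array_state *s)
{
    return s->sync_thread != NULL;
}

/* Check described in the commit: test the flag as well as the pointer. */
bool sync_active(const struct array_state *s)
{
    return s->sync_thread != NULL ||
           (s->recovery & RECOVERY_RUNNING_BIT) != 0;
}
```

In the window where the flag is set but the thread pointer is still NULL, `sync_active_old()` reports no activity while `sync_active()` reports that a sync is on its way.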
  4. 03 Dec, 2014 1 commit
  5. 24 Nov, 2014 1 commit
  6. 17 Nov, 2014 1 commit
    • md: Always set RECOVERY_NEEDED when clearing RECOVERY_FROZEN · 45eaf45d
      NeilBrown committed
      md_check_recovery will skip any recovery and also clear
      MD_RECOVERY_NEEDED if MD_RECOVERY_FROZEN is set.
      So when we clear _FROZEN, we must set _NEEDED and ensure that
      md_check_recovery gets run.
      Otherwise we could miss out on something that is needed.
      
      In particular, this can make it impossible to remove a
      failed device from an array if the 'recovery-needed' processing
      didn't happen.
      Suitable for stable kernels since 3.13.
      
      Cc: stable@vger.kernel.org (3.13+)
      Reported-and-tested-by: Joe Lawrence <joe.lawrence@stratus.com>
      Fixes: 30b8feb7
      Signed-off-by: NeilBrown <neilb@suse.de>
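The interaction between the two flags can be modelled in a few lines. This is a sketch under stated assumptions: the bit positions, the struct and the helper names are hypothetical, and waking the md thread is reduced to setting a boolean.

```c
#include <stdbool.h>

#define RECOVERY_NEEDED (1u << 0)   /* hypothetical bit positions */
#define RECOVERY_FROZEN (1u << 1)

struct array_state {
    unsigned int recovery;   /* flag word */
    bool thread_woken;       /* stands in for waking the md thread */
    int recovery_runs;       /* how often recovery work was actually done */
};

/* Models the behaviour of md_check_recovery described above. */
void check_recovery(struct array_state *s)
{
    if (s->recovery & RECOVERY_FROZEN) {
        s->recovery &= ~RECOVERY_NEEDED;   /* pending request is dropped */
        return;
    }
    if (s->recovery & RECOVERY_NEEDED) {
        s->recovery &= ~RECOVERY_NEEDED;
        s->recovery_runs++;
    }
}

/* Clearing FROZEN must set NEEDED again and get the check re-run. */
void unfreeze(struct array_state *s)
{
    s->recovery &= ~RECOVERY_FROZEN;
    s->recovery |= RECOVERY_NEEDED;
    s->thread_woken = true;
}
```

Without the re-set in `unfreeze()`, a request that arrived while the array was frozen would be lost after the first `check_recovery()` call.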
  7. 14 Oct, 2014 14 commits
  8. 08 Aug, 2014 2 commits
  9. 31 Jul, 2014 1 commit
    • md: disable probing for md devices 512 and over. · af5628f0
      NeilBrown committed
      The way md devices are traditionally created in the kernel
      is simply to open the device with the desired major/minor number.
      
      This can be problematic as some support tools, notably udev and
      programs run by udev, can open a device just to see what is there, and
      find that it has created something.  It is easy for a race to cause
      udev to open an md device just after it was destroyed, causing it to
      suddenly re-appear.
      
      For some time we have had an alternate way to create md devices
        echo md_somename > /sys/module/md_mod/parameters/new_array
      
      This will always use a minor number of 512 or higher, which mdadm
      normally avoids.
      Using this makes the creation-by-opening unnecessary, but does
      not disable it, so it is still there to cause problems.
      
      This patch disables probing for devices with a major of 9 (MD_MAJOR)
      and a minor of 512 and up.  Thus devices created by writing to
      new_array cannot be re-created by opening the node in /dev.
      Signed-off-by: NeilBrown <neilb@suse.de>
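The resulting policy amounts to a simple range check on the device number. The helper below is a hypothetical illustration of that rule, not the kernel's probe code; the constant names are invented here, and devices with any other major are simply treated as out of scope.

```c
#include <stdbool.h>

#define MD_MAJOR_MODEL    9     /* MD_MAJOR, per the commit message */
#define FIRST_NAMED_MINOR 512   /* minors from here up come from new_array */

/* True if opening the /dev node may still create an array as a side effect. */
bool md_probe_allowed(unsigned int major, unsigned int minor)
{
    if (major != MD_MAJOR_MODEL)
        return false;   /* not covered by this sketch */
    return minor < FIRST_NAMED_MINOR;
}
```

So minor 511 can still be created by opening its node, while minor 512 and above exists only if it was requested through new_array.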
  10. 03 Jul, 2014 2 commits
  11. 29 May, 2014 3 commits
    • md: md_clear_badblocks should return an error code on failure. · 8b32bf5e
      NeilBrown committed
      Julia Lawall and coccinelle report that md_clear_badblocks always
      returns 0, despite appearing to have an error path.
      The error path really should return an error code.  ENOSPC is
      reasonably appropriate.
      Reported-by: Julia Lawall <Julia.Lawall@lip6.fr>
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: refuse to change shape of array if it is active but read-only · bd8839e0
      NeilBrown committed
      Read-only arrays should not be changed.  This includes changing
      the level, layout, size, or number of devices.
      
      So reject those changes for readonly arrays.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: always set MD_RECOVERY_INTR when interrupting a reshape thread. · 2ac295a5
      NeilBrown committed
      Commit 8313b8e5
         md: fix problem when adding device to read-only array with bitmap.
      
      added a call to md_reap_sync_thread() which causes a reshape thread
      to be interrupted (in particular, it could cause md_thread() to never even
      call md_do_sync()).
      However it didn't set MD_RECOVERY_INTR so ->finish_reshape() would not
      know that the reshape didn't complete.
      
      This only happens when mddev->ro is set and normally reshape threads
      don't run in that situation.  But raid5 and raid10 can start a reshape
      thread during "run" if the array is in the middle of a reshape.
      They do this even if ->ro is set.
      
      So it is best to set MD_RECOVERY_INTR before aborting the
      sync thread, just in case.
      
      Though it is rare for this to trigger a problem, it can cause data corruption
      because the reshape isn't finished properly.
      So it is suitable for any stable kernel to which the offending commit was applied
      (3.2 or later).
      
      Fixes: 8313b8e5
      Cc: stable@vger.kernel.org (3.2+)
      Signed-off-by: NeilBrown <neilb@suse.de>
  12. 28 May, 2014 1 commit
    • md: always set MD_RECOVERY_INTR when aborting a reshape or other "resync". · 3991b31e
      NeilBrown committed
      If mddev->ro is set, md_do_sync will (correctly) abort.
      However in that case MD_RECOVERY_INTR isn't set.
      
      If a RESHAPE had been requested, then ->finish_reshape() will be
      called and it will think the reshape was successful even though
      nothing happened.
      
      Normally a resync will not be requested if ->ro is set, but if an
      array is stopped while a reshape is on-going, then when the array is
      started, the reshape will be restarted.  If the array is also set
      read-only at this point, the reshape will instantly appear to succeed,
      resulting in data corruption.
      
      Consequently, this patch is suitable for any -stable kernel.
      
      Cc: stable@vger.kernel.org (any)
      Signed-off-by: NeilBrown <neilb@suse.de>
  13. 06 May, 2014 1 commit
    • md: avoid possible spinning md thread at shutdown. · 0f62fb22
      NeilBrown committed
      If an md array with externally managed metadata (e.g. DDF or IMSM)
      is in use, then we should not set safemode==2 at shutdown because:
      
      1/ this is ineffective: user-space needs to be involved in any 'safemode' handling,
      2/ The safemode management code doesn't cope with safemode==2 on external metadata
         and md_check_recovery enters an infinite loop.
      
      Even at shutdown, an infinite-looping process can be problematic, so this
      could cause shutdown to hang.
      
      Cc: stable@vger.kernel.org (any kernel)
      Signed-off-by: NeilBrown <neilb@suse.de>
  14. 09 Apr, 2014 2 commits
    • md: avoid oops on unload if some process is in poll or select. · e2f23b60
      NeilBrown committed
      If md-mod is unloaded while some process is in poll() or select(),
      then that process maintains a pointer to md_event_waiters, and when
      it tries to unlink from that list, it will oops.
      
      The procfs infrastructure ensures that ->poll won't be called after
      remove_proc_entry, but doesn't provide a wait_queue_head for us to
      use, and the waitqueue code doesn't provide a way to remove all
      listeners from a waitqueue.
      
      So we need to:
       1/ make sure no further references to md_event_waiters are taken (by
          setting md_unloading)
       2/ wake up all processes currently waiting, and
       3/ wait until all those processes have disconnected from our
          wait_queue_head.
      Reported-by: "majianpeng" <majianpeng@gmail.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
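The three steps listed above can be modelled with a flag and a counter. This is a single-threaded sketch only: the struct and function names are hypothetical, real code would need atomic operations and a proper wait, and the wake-up in step 2 is not represented.

```c
#include <stdbool.h>

struct event_waiters {
    bool unloading;   /* models md_unloading */
    int attached;     /* pollers currently linked to the wait queue */
};

/* Poll path: refuse to attach once unloading has started (step 1). */
bool poll_attach(struct event_waiters *w)
{
    if (w->unloading)
        return false;
    w->attached++;
    return true;
}

/* Called when a poller unlinks itself after being woken (step 2). */
void poll_detach(struct event_waiters *w)
{
    w->attached--;
}

/* Unload may only finish once every poller has disconnected (step 3). */
bool unload_may_finish(const struct event_waiters *w)
{
    return w->unloading && w->attached == 0;
}
```

The ordering matters: the flag stops new references first, so the count of attached pollers can only fall while the unload path waits for it to reach zero.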
    • md/bitmap: don't abuse i_writecount for bitmap files. · 035328c2
      NeilBrown committed
      md bitmap code currently tries to use i_writecount to stop any other
      process from writing to our bitmap file.  But that is really an abuse
      and has bit-rotted so locking is all wrong.
      
      So discard that - root should be allowed to shoot self in foot.
      
      Still use it in a much less intrusive way to stop the same file being
      used as bitmap on two different arrays, and apply other checks to
      ensure the file is at least vaguely usable for bitmap storage
      (is regular, is open for write.  Support for ->bmap is already checked
      elsewhere).
      Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
      Signed-off-by: NeilBrown <neilb@suse.de>
  15. 16 Jan, 2014 1 commit
    • md: check command validity early in md_ioctl(). · cb335f88
      Nicolas Schichan committed
      Verify that the cmd parameter passed to md_ioctl() is valid before
      doing anything.
      
      This fixes mddev->hold_active being set to 0 when an invalid ioctl
      command is passed to md_ioctl() before the array has been configured.
      
      Clearing mddev->hold_active in that case can lead to a livelock
      situation when an invalid ioctl number is given to md_ioctl() by a
      process when the mddev is currently being opened by another process:
      
      Process 1                               Process 2
      ---------                               ---------

      md_alloc()
        mddev_find()
        -> returns a new mddev with
           hold_active == UNTIL_IOCTL
        add_disk()
        -> sends KOBJ_ADD uevent

                                              (sees KOBJ_ADD uevent for device)
                                              md_open()
                                              md_ioctl(INVALID_IOCTL)
                                              -> returns ENODEV and clears
                                                 mddev->hold_active
                                              md_release()
                                                md_put()
                                                -> deletes the mddev as
                                                   hold_active is 0

      md_open()
        mddev_find()
        -> returns a newly
          allocated mddev with
          mddev->gendisk == NULL
      -> returns with ERESTARTSYS
         (kernel restarts the open syscall)
      Signed-off-by: Nicolas Schichan <nschichan@freebox.fr>
      Signed-off-by: NeilBrown <neilb@suse.de>
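The effect of validating the command first can be shown with a small model. Everything here is hypothetical: the command numbers, the `hold_model` enum and `ioctl_model()` are invented for illustration, the choice of -ENOTTY for an unknown command is an assumption, and the real handler does far more work after the check.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical command numbers, not the real md ioctl values. */
enum cmd_model { CMD_GET_INFO = 1, CMD_RUN = 2, CMD_STOP = 3 };

enum hold_model { HOLD_NONE, HOLD_UNTIL_IOCTL };

struct mddev_model {
    enum hold_model hold_active;
};

static bool cmd_valid(unsigned int cmd)
{
    switch (cmd) {
    case CMD_GET_INFO:
    case CMD_RUN:
    case CMD_STOP:
        return true;
    default:
        return false;
    }
}

int ioctl_model(struct mddev_model *m, unsigned int cmd)
{
    /* Reject unknown commands before any state is touched. */
    if (!cmd_valid(cmd))
        return -ENOTTY;   /* error code is an assumption of this sketch */

    /* A recognised command ends the "hold until first ioctl" state. */
    if (m->hold_active == HOLD_UNTIL_IOCTL)
        m->hold_active = HOLD_NONE;
    return 0;
}
```

With the check first, a stray ioctl from a process that merely reacted to the uevent leaves `hold_active` untouched, so the mddev is not deleted underneath the process that is still creating it.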
  16. 14 Jan, 2014 1 commit
    • md: ensure metadata is written after raid level change. · 830778a1
      NeilBrown committed
      level_store() currently does not make sure the metadata is
      updated to reflect the new raid level.  It simply sets MD_CHANGE_DEVS.
      
      Any level with a ->thread will quickly notice this and update the
      metadata.  However RAID0 and Linear do not have a thread so no
      metadata update happens until the array is stopped.  At that point the
      metadata is written.
      
      This is later than we would like.  While the delay doesn't risk any
      data it can cause confusion.  So if there is no md thread, immediately
      update the metadata after a level change.
      Reported-by: Richard Michael <rmichael@edgeofthenet.org>
      Signed-off-by: NeilBrown <neilb@suse.de>