1. 14 Jan, 2014 3 commits
    • md: allow a partially recovered device to be hot-added to an array. · 7eb41885
      NeilBrown committed
      When adding a new device into an array it is normally important to
      clear any stale data from ->recovery_offset else the new device may
      not be recovered properly.
      
      However when re-adding a device which is known to be nearly in-sync,
      this is not needed and can be detrimental.  The (bitmap-based)
      resync will still happen, and further recovery is only needed from
      wherever it was already up to.
      
      So if saved_raid_disk is set, signifying a re-add, don't clear
      ->recovery_offset.
      Signed-off-by: NeilBrown <neilb@suse.de>
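The rule in this commit can be sketched as a small userspace model. `struct rdev_model` and `prepare_hot_add()` are hypothetical names for illustration, not the kernel's own code; the only assumption taken from the message is that a non-negative `saved_raid_disk` marks a re-add.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the kernel's per-device state. */
struct rdev_model {
    int saved_raid_disk;                /* >= 0: slot remembered from before, i.e. a re-add */
    unsigned long long recovery_offset; /* sectors already recovered */
};

/* Prepare a device for hot-add: keep the offset for a re-add, clear it otherwise. */
static void prepare_hot_add(struct rdev_model *rdev)
{
    assert(rdev != NULL);
    if (rdev->saved_raid_disk < 0)
        rdev->recovery_offset = 0;  /* new device: a stale offset must not be trusted */
}
```

A device with `saved_raid_disk == -1` has its offset cleared, while a re-added device keeps it and resumes from where it stopped.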
    • md: Change handling of save_raid_disk and metadata update during recovery. · f466722c
      NeilBrown committed
      Since commit d70ed2e4
         MD: Allow restarting an interrupted incremental recovery.
      
      we don't write out the metadata to devices while they are recovering.
      This had a good reason, but has unfortunate consequences.  This patch
      changes things to make them work better.
      
      At issue is what happens if the array is shut down while a recovery is
      happening, particularly a bitmap-guided recovery.
      Ideally the recovery should pick up where it left off.
      However the metadata cannot represent the state "A recovery is in
      process which is guided by the bitmap".
      
      Before the above mentioned commit, we wrote metadata to the device
      which said "this is being recovered and it is up to <here>".  So after
      a restart, a full recovery (not bitmap-guided) would happen from
      wherever it was up to.
      
      After the commit the metadata wasn't updated so it still said "This
      device is fully in sync with <this> event count".  That leads to a
      bitmap-based recovery following the whole bitmap, which should be a
      lot less work than a full recovery from some starting point.  So this
      was an improvement.
      
      However, updating some metadata but not all leads to other problems.
      In particular, the metadata written to the fully-up-to-date devices
      records that the array has all devices present (even though some are
      recovering).  So on restart, mdadm wants to find all devices and
      expects them to have current event counts.
      Obviously they don't all (some have old event counts) so (when assembling
      with --incremental) it waits indefinitely for the rest of the expected
      devices.
      
      It really is wrong to not update all the metadata together.  Doing that
      is bound to cause confusion.
      Instead, we should make it possible to record the truth in the
      metadata.  i.e. we need to be able to record that a device is being
      recovered based on the bitmap.
      We already have a Feature flag to say that recovery is happening.  We
      now add another one to say that it is a bitmap-based recovery.
      
      With this we can remove the code that disables the write-out of
      metadata on some devices.
      
      So this patch:
       - moves the setting of 'saved_raid_disk' from add_new_disk to
         the validate_super methods.  This makes sure it is always set
         properly, both when adding a new device to an array, and when
         assembling an array from a collection of devices.
       - Adds a metadata flag MD_FEATURE_RECOVERY_BITMAP which is only
         used if MD_FEATURE_RECOVERY_OFFSET is set, and records that a
         bitmap-based recovery is allowed.
         This is only present in v1.x metadata. v0.90 doesn't support
         devices which are in the middle of recovery at all.
       - Only skips writing metadata to Faulty devices.
      
       - Also allows rdev state to be set to "-insync" via sysfs.
         This can be used for external-metadata arrays.  When the
         'role' is set the device is assumed to be in-sync.  If, after
         setting the role, we set the state to "-insync", the role is
         moved to saved_raid_disk which effectively says the device is
         partly in-sync with that slot and needs a bitmap recovery.
      
      Cc: Andrei Warkentin <andreiw@vmware.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
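How the two feature flags combine can be illustrated with a short userspace sketch. The numeric flag values, `struct sb_model`, and `classify_recovery()` are assumptions for illustration; the real definitions live in the kernel's md headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flag values are assumed for illustration; consult the md headers for the real ones. */
#define MD_FEATURE_RECOVERY_OFFSET 0x02u
#define MD_FEATURE_RECOVERY_BITMAP 0x80u

/* Hypothetical, simplified view of a v1.x superblock. */
struct sb_model {
    uint32_t feature_map;
};

enum recovery_kind { RECOVERY_NONE, RECOVERY_FULL, RECOVERY_BITMAP };

/* Decide which kind of recovery the metadata describes. */
static enum recovery_kind classify_recovery(const struct sb_model *sb)
{
    assert(sb != NULL);
    if (!(sb->feature_map & MD_FEATURE_RECOVERY_OFFSET))
        return RECOVERY_NONE;    /* the bitmap flag is ignored without the offset flag */
    if (sb->feature_map & MD_FEATURE_RECOVERY_BITMAP)
        return RECOVERY_BITMAP;  /* recovery may be guided by the bitmap */
    return RECOVERY_FULL;        /* plain recovery from the recorded offset */
}
```

The point of the second flag is that the metadata can now state the truth on every device, so no device needs to be skipped when metadata is written.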
    • md: fix problem when adding device to read-only array with bitmap. · 8313b8e5
      NeilBrown committed
      If an array is started degraded, and then the missing device
      is found it can be re-added and a minimal bitmap-based recovery
      will bring it fully up-to-date.
      
      If the array is read-only a recovery would not be allowed.
      But also if the array is read-only and the missing device was
      present very recently, then there could be no need for any
      recovery at all, so we simply include the device in the read-only
      array without any recovery.
      
      However... if the missing device was removed a little longer ago
      it could be missing some updates, but if a bitmap is present it will
      be conditionally accepted pending a bitmap-based update.  We don't
      currently detect this case properly and will include that old
      device into the read-only array with no recovery even though it really
      needs a recovery.
      
      This patch keeps track of whether a bitmap-based-recovery is really
      needed or not in the new Bitmap_sync rdev flag.  If that is set,
      then the device will not be added to a read-only array.
      
      Cc: Andrei Warkentin <andreiw@vmware.com>
      Fixes: d70ed2e4
      Cc: stable@vger.kernel.org (3.2+)
      Signed-off-by: NeilBrown <neilb@suse.de>
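The decision this patch introduces can be modelled in a few lines. `struct readd_model` and `may_add_now()` are hypothetical names; `bitmap_sync_needed` stands in for the new Bitmap_sync rdev flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the state consulted when a missing device reappears. */
struct readd_model {
    bool array_read_only;     /* no recovery may run on this array */
    bool bitmap_sync_needed;  /* models Bitmap_sync: device missed some updates */
};

/* A device that still needs a bitmap-based recovery must not join a read-only array. */
static bool may_add_now(const struct readd_model *m)
{
    assert(m != NULL);
    return !(m->array_read_only && m->bitmap_sync_needed);
}
```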
  2. 28 Nov, 2013 1 commit
  3. 19 Nov, 2013 4 commits
    • md: Convert use of typedef ctl_table to struct ctl_table · 82592c38
      Joe Perches committed
      This typedef is unnecessary and should just be removed.
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid5: avoid deadlock when raid5 array has unack badblocks during md_stop_writes. · 30b8feb7
      NeilBrown committed
      When raid5 recovery hits a fresh badblock, this badblock will be flagged as an
      unack badblock until md_update_sb() is called.
      But md_stop will take the reconfig lock, which means raid5d can't call
      md_update_sb() in md_check_recovery(), so the badblock will always
      be unack, the raid5d thread enters an infinite loop, and md_stop_writes()
      can never stop sync_thread. This causes a deadlock.
      
      To solve this, when the STOP_ARRAY ioctl is issued and sync_thread is
      running, we need to set the md->recovery FROZEN and INTR flags and wait for
      sync_thread to stop before we (re)take the reconfig lock.
      
      This requires that raid5 reshape_request notices MD_RECOVERY_INTR
      (which it probably should have noticed anyway) and stops waiting for a
      metadata update in that case.
      Reported-by: Jianpeng Ma <majianpeng@gmail.com>
      Reported-by: Bian Yu <bianyu@kedacom.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: use MD_RECOVERY_INTR instead of kthread_should_stop in resync thread. · c91abf5a
      NeilBrown committed
      We currently use kthread_should_stop() in various places in the
      sync/reshape code to abort early.
      However some places set MD_RECOVERY_INTR but don't immediately call
      md_reap_sync_thread() (and we will shortly get another one).
      When this happens we are relying on md_check_recovery() to reap the
      thread and that only happens when it finishes normally.
      So MD_RECOVERY_INTR must lead to a normal finish without the
      kthread_should_stop() test.
      
      So replace all relevant tests, and be more careful when the thread is
      interrupted not to acknowledge that latest step in a reshape as it may
      not be fully committed yet.
      
      Also add a test on MD_RECOVERY_INTR in the 'is_mddev_idle' loop
      so we don't have to wait for the speed to drop before we can abort.
      Signed-off-by: NeilBrown <neilb@suse.de>
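The idea of finishing normally on an interrupt flag, without acknowledging the step that was in flight, can be sketched as follows. `struct sync_model`, `run_sync()`, and the `intr_at` parameter (which simulates when the interrupt arrives) are hypothetical and only illustrate the control flow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a sync/reshape pass. */
struct sync_model {
    unsigned long long committed;  /* last position known to be safely recorded */
    bool interrupted;              /* models MD_RECOVERY_INTR */
};

/*
 * Advance in steps up to `total`. An interrupt is simulated during the step
 * that would cross `intr_at`. The loop then ends normally, and the
 * interrupted step is not acknowledged.
 */
static void run_sync(struct sync_model *m, unsigned long long total,
                     unsigned long long step, unsigned long long intr_at)
{
    assert(m != NULL && step > 0);
    unsigned long long pos = m->committed;

    while (pos < total) {
        unsigned long long next = (pos + step < total) ? pos + step : total;

        if (next > intr_at)
            m->interrupted = true;  /* interrupt arrives during this step */
        if (m->interrupted)
            break;                  /* leave the loop; do not commit this step */
        m->committed = next;
        pos = next;
    }
}
```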
    • md: fix some places where mddev_lock return value is not checked. · 29f097c4
      NeilBrown committed
      Sometimes we need to lock an mddev and cannot cope with
      failure due to interrupt.
      In these cases we should use mutex_lock, not mutex_lock_interruptible.
      Signed-off-by: NeilBrown <neilb@suse.de>
  4. 14 Nov, 2013 1 commit
    • md: fix calculation of stacking limits on level change. · 02e5f5c0
      NeilBrown committed
      The various ->run routines of md personalities assume that the 'queue'
      has been initialised by the blk_set_stacking_limits() call in
      md_alloc().
      
      However when the level is changed (by level_store()) the ->run routine
      for the new level is called for an array which has already had the
      stacking limits modified.  This can result in incorrect final
      settings.
      
      So call blk_set_stacking_limits() before ->run in level_store().
      
      A specific consequence of this bug is that it causes
      discard_granularity to be set incorrectly when reshaping a RAID4 to a
      RAID0.
      
      This is suitable for any -stable kernel since 3.3 in which
      blk_set_stacking_limits() was introduced.
      
      Cc: stable@vger.kernel.org (3.3+)
      Reported-and-tested-by: "Baldysiak, Pawel" <pawel.baldysiak@intel.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
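Why a missing reset gives wrong results can be shown with a toy model. Everything here is an assumption for illustration: `struct limits_model` tracks a single field, the default is taken to be 0, and stacking a member is modelled as keeping the larger granularity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical one-field stand-in for the queue limits. */
struct limits_model {
    unsigned int discard_granularity;
};

/* Models the reset to neutral stacking defaults. */
static void set_stacking_defaults(struct limits_model *lim)
{
    lim->discard_granularity = 0;
}

/* Models stacking one member device: keep the coarser granularity (assumed rule). */
static void stack_member(struct limits_model *lim, unsigned int member_granularity)
{
    if (member_granularity > lim->discard_granularity)
        lim->discard_granularity = member_granularity;
}

/* Models the ->run routine of the new level; reset_first corresponds to the fix. */
static unsigned int run_new_level(struct limits_model *lim,
                                  const unsigned int *members, size_t count,
                                  bool reset_first)
{
    assert(lim != NULL);
    if (reset_first)
        set_stacking_defaults(lim);
    for (size_t i = 0; i < count; i++)
        stack_member(lim, members[i]);
    return lim->discard_granularity;
}
```

Without the reset, a large value left behind by the previous level survives the stacking step; with the reset, the result depends only on the member devices.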
  5. 09 Nov, 2013 1 commit
  6. 24 Oct, 2013 1 commit
    • md: avoid deadlock when md_set_badblocks. · 905b0297
      Bian Yu committed
      When operating on a hard disk and hitting errors, md_set_badblocks is called
      after scsi_restart_operations, which has already disabled irqs. But
      md_set_badblocks will call write_sequnlock_irq and enable irqs, so a softirq
      can preempt the current thread and that may cause a deadlock. I think this
      situation should use write_seqlock_irqsave/write_sequnlock_irqrestore instead.
      
      I hit this situation and the call trace is below:
      [  638.919974] BUG: spinlock recursion on CPU#0, scsi_eh_13/1010
      [  638.921923]  lock: 0xffff8800d4d51fc8, .magic: dead4ead, .owner: scsi_eh_13/1010, .owner_cpu: 0
      [  638.923890] CPU: 0 PID: 1010 Comm: scsi_eh_13 Not tainted 3.12.0-rc5+ #37
      [  638.925844] Hardware name: To be filled by O.E.M. To be filled by O.E.M./MAHOBAY, BIOS 4.6.5 03/05/2013
      [  638.927816]  ffff880037ad4640 ffff880118c03d50 ffffffff8172ff85 0000000000000007
      [  638.929829]  ffff8800d4d51fc8 ffff880118c03d70 ffffffff81730030 ffff8800d4d51fc8
      [  638.931848]  ffffffff81a72eb0 ffff880118c03d90 ffffffff81730056 ffff8800d4d51fc8
      [  638.933884] Call Trace:
      [  638.935867]  <IRQ>  [<ffffffff8172ff85>] dump_stack+0x55/0x76
      [  638.937878]  [<ffffffff81730030>] spin_dump+0x8a/0x8f
      [  638.939861]  [<ffffffff81730056>] spin_bug+0x21/0x26
      [  638.941836]  [<ffffffff81336de4>] do_raw_spin_lock+0xa4/0xc0
      [  638.943801]  [<ffffffff8173f036>] _raw_spin_lock+0x66/0x80
      [  638.945747]  [<ffffffff814a73ed>] ? scsi_device_unbusy+0x9d/0xd0
      [  638.947672]  [<ffffffff8173fb1b>] ? _raw_spin_unlock+0x2b/0x50
      [  638.949595]  [<ffffffff814a73ed>] scsi_device_unbusy+0x9d/0xd0
      [  638.951504]  [<ffffffff8149ec47>] scsi_finish_command+0x37/0xe0
      [  638.953388]  [<ffffffff814a75e8>] scsi_softirq_done+0xa8/0x140
      [  638.955248]  [<ffffffff8130e32b>] blk_done_softirq+0x7b/0x90
      [  638.957116]  [<ffffffff8104fddd>] __do_softirq+0xfd/0x330
      [  638.958987]  [<ffffffff810b964f>] ? __lock_release+0x6f/0x100
      [  638.960861]  [<ffffffff8174a5cc>] call_softirq+0x1c/0x30
      [  638.962724]  [<ffffffff81004c7d>] do_softirq+0x8d/0xc0
      [  638.964565]  [<ffffffff8105024e>] irq_exit+0x10e/0x150
      [  638.966390]  [<ffffffff8174ad4a>] smp_apic_timer_interrupt+0x4a/0x60
      [  638.968223]  [<ffffffff817499af>] apic_timer_interrupt+0x6f/0x80
      [  638.970079]  <EOI>  [<ffffffff810b964f>] ? __lock_release+0x6f/0x100
      [  638.971899]  [<ffffffff8173fa6a>] ? _raw_spin_unlock_irq+0x3a/0x50
      [  638.973691]  [<ffffffff8173fa60>] ? _raw_spin_unlock_irq+0x30/0x50
      [  638.975475]  [<ffffffff81562393>] md_set_badblocks+0x1f3/0x4a0
      [  638.977243]  [<ffffffff81566e07>] rdev_set_badblocks+0x27/0x80
      [  638.978988]  [<ffffffffa00d97bb>] raid5_end_read_request+0x36b/0x4e0 [raid456]
      [  638.980723]  [<ffffffff811b5a1d>] bio_endio+0x1d/0x40
      [  638.982463]  [<ffffffff81304ff3>] req_bio_endio.isra.65+0x83/0xa0
      [  638.984214]  [<ffffffff81306b9f>] blk_update_request+0x7f/0x350
      [  638.985967]  [<ffffffff81306ea1>] blk_update_bidi_request+0x31/0x90
      [  638.987710]  [<ffffffff813085e0>] __blk_end_bidi_request+0x20/0x50
      [  638.989439]  [<ffffffff8130862f>] __blk_end_request_all+0x1f/0x30
      [  638.991149]  [<ffffffff81308746>] blk_peek_request+0x106/0x250
      [  638.992861]  [<ffffffff814a62a9>] ? scsi_kill_request.isra.32+0xe9/0x130
      [  638.994561]  [<ffffffff814a633a>] scsi_request_fn+0x4a/0x3d0
      [  638.996251]  [<ffffffff813040a7>] __blk_run_queue+0x37/0x50
      [  638.997900]  [<ffffffff813045af>] blk_run_queue+0x2f/0x50
      [  638.999553]  [<ffffffff814a5750>] scsi_run_queue+0xe0/0x1c0
      [  639.001185]  [<ffffffff814a7721>] scsi_run_host_queues+0x21/0x40
      [  639.002798]  [<ffffffff814a2e87>] scsi_restart_operations+0x177/0x200
      [  639.004391]  [<ffffffff814a4fe9>] scsi_error_handler+0xc9/0xe0
      [  639.005996]  [<ffffffff814a4f20>] ? scsi_unjam_host+0xd0/0xd0
      [  639.007600]  [<ffffffff81072f6b>] kthread+0xdb/0xe0
      [  639.009205]  [<ffffffff81072e90>] ? flush_kthread_worker+0x170/0x170
      [  639.010821]  [<ffffffff81748cac>] ret_from_fork+0x7c/0xb0
      [  639.012437]  [<ffffffff81072e90>] ? flush_kthread_worker+0x170/0x170
      
      This bug was introduced in commit 2e8ac303
      (the first time rdev_set_badblocks was called from interrupt context),
      so this patch is appropriate for 3.5 and subsequent kernels.
      
      Cc: <stable@vger.kernel.org> (3.5+)
      Signed-off-by: Bian Yu <bianyu@kedacom.com>
      Reviewed-by: Jianpeng Ma <majianpeng@gmail.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
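The difference between the two unlock styles can be demonstrated with a userspace simulation of the interrupt state. All `model_*` names and the `irqs_enabled` variable are hypothetical; no real locking is performed, only the enable/disable bookkeeping is modelled.

```c
#include <assert.h>
#include <stdbool.h>

static bool irqs_enabled = true;  /* simulated CPU interrupt state */

/* "_irq" style: disable on lock, unconditionally enable on unlock. */
static void model_lock_irq(void)   { irqs_enabled = false; }
static void model_unlock_irq(void) { irqs_enabled = true; }

/* "_irqsave/_irqrestore" style: remember the previous state and put it back. */
static bool model_lock_irqsave(void)
{
    bool saved = irqs_enabled;
    irqs_enabled = false;
    return saved;
}
static void model_unlock_irqrestore(bool saved) { irqs_enabled = saved; }

/* Returns the interrupt state the caller sees after the critical section. */
static bool critical_section(bool caller_irqs_enabled, bool use_irqsave)
{
    irqs_enabled = caller_irqs_enabled;
    if (use_irqsave) {
        bool saved = model_lock_irqsave();
        assert(!irqs_enabled);
        model_unlock_irqrestore(saved);
    } else {
        model_lock_irq();
        assert(!irqs_enabled);
        model_unlock_irq();
    }
    return irqs_enabled;
}
```

When the caller enters with interrupts already disabled, the `_irq` variant hands back control with interrupts enabled, which is the situation the call trace above shows; the save/restore variant leaves the caller's state untouched.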
  7. 27 Sep, 2013 1 commit
    • sysfs: clean up sysfs_get_dirent() · 388975cc
      Tejun Heo committed
      The pre-existing sysfs interfaces which take an explicit namespace
      argument are weird in that they place the optional @ns in front of
      @name, which is contrary to the established convention.  For example,
      we end up forcing the vast majority of sysfs_get_dirent() users to do
      sysfs_get_dirent(parent, NULL, name), which is silly and error-prone,
      especially as @ns and @name may be interchanged without causing a
      compilation warning.
      
      This renames sysfs_get_dirent() to sysfs_get_dirent_ns() and swaps the
      positions of @name and @ns, and sysfs_get_dirent() is now a wrapper
      around sysfs_get_dirent_ns().  This makes confusion a lot less
      likely.
      
      There are other interfaces which take @ns before @name.  They'll be
      updated by following patches.
      
      This patch doesn't introduce any functional changes.
      
      v2: EXPORT_SYMBOL_GPL() wasn't updated leading to undefined symbol
          error on module builds.  Reported by build test robot.  Fixed.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Kay Sievers <kay@vrfy.org>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
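The wrapper pattern described here, with the optional namespace moved to the last position, can be sketched in userspace. `struct dirent_model`, `lookup_ns()`, and `lookup()` are hypothetical stand-ins and do not reproduce the sysfs signatures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified directory entry: a name plus an optional namespace tag. */
struct dirent_model {
    const char *name;
    const void *ns;  /* NULL means "no namespace" */
};

/* Namespace-aware lookup; @name precedes the optional @ns. */
static const struct dirent_model *
lookup_ns(const struct dirent_model *entries, size_t count,
          const char *name, const void *ns)
{
    assert(name != NULL);
    for (size_t i = 0; i < count; i++) {
        if (entries[i].ns == ns && strcmp(entries[i].name, name) == 0)
            return &entries[i];
    }
    return NULL;
}

/* Wrapper for the common case of no namespace. */
static const struct dirent_model *
lookup(const struct dirent_model *entries, size_t count, const char *name)
{
    return lookup_ns(entries, count, name, NULL);
}
```

Most callers only ever need `lookup()`, so they no longer pass a placeholder NULL that could be swapped with the name by accident.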
  8. 27 Aug, 2013 5 commits
    • md: avoid deadlock when dirty buffers during md_stop. · 260fa034
      NeilBrown committed
      When the last process closes /dev/mdX sync_blockdev will be called so
      that all buffers get flushed.
      So if it is then opened for the STOP_ARRAY ioctl to be sent there will
      be nothing to flush.
      
      However if we open /dev/mdX in order to send the STOP_ARRAY ioctl just
      moments before some other process which was writing closes their file
      descriptor, then there won't be a 'last close' and the buffers might
      not get flushed.
      
      So do_md_stop() calls sync_blockdev().  However at this point it is
      holding ->reconfig_mutex.  So if the array is currently 'clean' then
      the writes from sync_blockdev() will not complete until the array
      can be marked dirty and that won't happen until some other thread
      can get ->reconfig_mutex.  So we deadlock.
      
      We need to move the sync_blockdev() call to before we take
      ->reconfig_mutex.
      However then some other thread could open /dev/mdX and write to it
      after we call sync_blockdev() and before we actually stop the array.
      This can leave dirty data in the page cache which is awkward.
      
      So introduce a new flag, MD_STILL_CLOSED.  Set it before calling
      sync_blockdev(), clear it if anyone does open the file, and abort the
      STOP_ARRAY attempt if it gets cleared before we lock against further
      opens.
      
      It is still possible to get problems if you open /dev/mdX, write to
      it, then issue the STOP_ARRAY ioctl.  Just don't do that.
      Signed-off-by: NeilBrown <neilb@suse.de>
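The flag protocol can be walked through with a small single-threaded model. `struct stop_model` and the three functions are hypothetical; `still_closed` stands in for MD_STILL_CLOSED, and the real code additionally relies on locking that is not modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the array state relevant to STOP_ARRAY. */
struct stop_model {
    bool still_closed;  /* models MD_STILL_CLOSED */
    bool stopped;
};

/* Any open of the device clears the flag. */
static void model_open(struct stop_model *m)
{
    assert(m != NULL);
    m->still_closed = false;
}

/* Step 1 of stopping: set the flag, then flush (the flush itself is not modelled). */
static void begin_stop(struct stop_model *m)
{
    assert(m != NULL);
    m->still_closed = true;
}

/* Step 2, after taking the lock: abort if someone opened the device in between. */
static bool finish_stop(struct stop_model *m)
{
    assert(m != NULL);
    if (!m->still_closed)
        return false;  /* possible new dirty data: give up on this attempt */
    m->stopped = true;
    return true;
}
```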
    • md: Don't test all of mddev->flags at once. · 7a0a5355
      NeilBrown committed
      mddev->flags is mostly used to record if an update of the
      metadata is needed.  Sometimes the whole field is tested
      instead of just the important bits.  This makes it difficult
      to introduce more state bits.
      
      So replace all bare tests of mddev->flags with tests for the bits
      that actually need testing.
      Signed-off-by: NeilBrown <neilb@suse.de>
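The hazard of testing the whole field is easy to show with bit masks. The bit names and positions below are invented for illustration and are not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit layout: three "metadata needs updating" bits and one unrelated bit. */
#define MODEL_CHANGE_DEVS    (1ul << 0)
#define MODEL_CHANGE_CLEAN   (1ul << 1)
#define MODEL_CHANGE_PENDING (1ul << 2)
#define MODEL_STILL_CLOSED   (1ul << 4)

#define MODEL_UPDATE_MASK \
    (MODEL_CHANGE_DEVS | MODEL_CHANGE_CLEAN | MODEL_CHANGE_PENDING)

/* Bare test: any set bit looks like a pending metadata update. */
static bool needs_update_bare(unsigned long flags)
{
    return flags != 0;
}

/* Masked test: only the bits that actually mean "update needed" are consulted. */
static bool needs_update(unsigned long flags)
{
    /* the mask must not cover unrelated state bits */
    assert((MODEL_UPDATE_MASK & MODEL_STILL_CLOSED) == 0);
    return (flags & MODEL_UPDATE_MASK) != 0;
}
```

With only the unrelated bit set, the bare test reports a pending update while the masked test does not, which is why new state bits could not be added safely before this change.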
    • md: Fix apparent cut-and-paste error in super_90_validate · c9ad020f
      Dave Jones committed
      Setting a variable to itself probably wasn't the intention here.
      Signed-off-by: Dave Jones <davej@fedoraproject.org>
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: fix safe_mode buglet. · 275c51c4
      NeilBrown committed
      When we set the safe_mode_timeout to a smaller value we trigger a timeout
      immediately - otherwise the small value might not be honoured.
      However if the previous timeout was 0 meaning "no timeout", we didn't.
      This would mean that no timeout happens until the next write completes,
      which could be a long time.
      Signed-off-by: NeilBrown <neilb@suse.de>
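The corrected condition can be written as a single predicate. `must_fire_timer_now()` is a hypothetical helper; delays are in arbitrary units, with 0 meaning "no timeout" as in the message.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Decide whether a timeout should be triggered immediately after the
 * safe-mode delay changes from old_delay to new_delay.
 */
static bool must_fire_timer_now(unsigned long old_delay, unsigned long new_delay)
{
    assert(new_delay > 0);  /* setting 0 disables the timeout and is not handled here */
    /* a shorter delay, or any delay replacing "no timeout", must take effect at once */
    return new_delay < old_delay || old_delay == 0;
}
```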
    • md: don't call md_allow_write in get_bitmap_file. · 60559da4
      NeilBrown committed
      There is no real need as GFP_NOIO is very likely sufficient,
      and failure is not catastrophic.
      
      Calling md_allow_write here will convert a read-auto array to
      read/write which could be confusing when you are just performing
      a read operation.
      Signed-off-by: NeilBrown <neilb@suse.de>
  9. 18 Jul, 2013 1 commit
    • md: Remove recent change which allows devices to skip recovery. · 5024c298
      NeilBrown committed
      commit 7ceb17e8
          md: Allow devices to be re-added to a read-only array.
      
      allowed a bit more than just that.  It also allows devices to be added
      to a read-write array and to end up skipping recovery.
      
      This patch removes the offending piece of code pending a rewrite for a
      subsequent release.
      
      More specifically:
       If the array has a bitmap, then the device will still need a bitmap
       based resync ('saved_raid_disk' is set under different conditions
       if a bitmap is present).
       If the array doesn't have a bitmap, then this is correct as long as
       nothing has been written to the array since the metadata was checked
       by ->validate_super.  However there is no locking to ensure that there
       was no write.
      
      Bug was introduced in 3.10 and causes data corruption so
      patch is suitable for 3.10-stable.
      
      Cc: stable@vger.kernel.org (3.10)
      Reported-by: Joe Lawrence <joe.lawrence@stratus.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
  10. 26 Jun, 2013 1 commit
    • MD: Remember the last sync operation that was performed · c4a39551
      Jonathan Brassow committed
      This patch adds a field to the mddev structure to track the last
      sync operation that was performed.  This is especially useful when
      it comes to what is recorded in mismatch_cnt in sysfs.  If the
      last operation was "data-check", then it reports the number of
      discrepancies found by the user-initiated check.  If it was a
      "repair" operation, then it is reporting the number of
      discrepancies repaired.  etc.
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
  11. 14 Jun, 2013 2 commits
  12. 13 Jun, 2013 1 commit
  13. 07 May, 2013 1 commit
  14. 30 Apr, 2013 1 commit
    • md: bad block list should default to disabled. · 486adf72
      NeilBrown committed
      Maintenance of a bad-block-list currently defaults to 'enabled'
      and is then disabled when it cannot be supported.
      This is backwards and causes problems for dm-raid which didn't know
      to disable it.
      
      So fix the defaults, and only enable it for v1.x metadata which
      explicitly has bad blocks enabled.
      
      The problem with dm-raid has been present since badblock support was
      added in v3.1, so this patch is suitable for any -stable from 3.1
      onwards.
      
      Cc: stable@vger.kernel.org (3.1+)
      Reported-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
  15. 24 Apr, 2013 6 commits
    • MD: Export 'md_reap_sync_thread' function · a91d5ac0
      Jonathan Brassow committed
      Make 'md_reap_sync_thread' available to other files, specifically dm-raid.c.
      - rename reap_sync_thread to md_reap_sync_thread
      - move the fn after md_check_recovery to match md.h declaration placement
      - export md_reap_sync_thread
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: don't update metadata when stopping a read-only array. · b6d428c6
      NeilBrown committed
      Read-only arrays should stay that way as much as possible.
      Updating the metadata - which could be triggered by a re-add
      while assembling the array metadata - should be avoided.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: Allow devices to be re-added to a read-only array. · 7ceb17e8
      NeilBrown committed
      When assembling an array incrementally we might want to make
      the device available when "enough" devices are present, but maybe
      not "all" devices are present.
      If the remaining devices appear before the array is actually used,
      they should be added transparently.
      
      We do this by using the "read-auto" mode where the array acts like
      it is read-only until a write request arrives.
      
      Currently an add-device request switches a read-auto array to active.
      This means that only one device can be added after the array is first
      made read-auto.  This isn't a problem for RAID5, but is not ideal for
      RAID6 or RAID10.
      Also we don't really want to switch the array out of read-auto at all
      when re-adding a device as this doesn't really imply any change.
      
      So:
       - remove the "md_update_sb()" call from add_new_disk().  This isn't
         really needed as just adding a disk doesn't require a metadata
         update.  Instead, just set MD_CHANGE_DEVS.  This will effect a
         metadata update soon enough, once the array is not read-only.
      
       - Allow the ADD_NEW_DISK ioctl to succeed without activating a
         read-auto array, providing the MD_DISK_SYNC flag is set.
         In this case, the device will be rejected if it cannot be added
         with the correct device number, or has an incorrect event count.
      
       - Teach remove_and_add_spares() to be careful about adding spares
         when the array is read-only (or read-mostly) - only add devices
         that are thought to be in-sync, and only do it if the array is
         in-sync itself.
      
       - In md_check_recovery, use remove_and_add_spares in the read-only
         case, rather than open coding just the 'remove' part of it.
      Reported-by: Martin Wilck <mwilck@arcor.de>
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: HOT_DISK_REMOVE shouldn't make a read-auto device active. · 3ea8929d
      NeilBrown committed
      If a failed device or a spare is removed from an array, there is
      no need to make the array 'active'.  If/when the array does become
      active for some other reason the metadata will be updated to reflect
      the removal.
      If that never happens and the array is stopped while still read-auto,
      then there is no loss in forgetting that the device had 'failed'.
      
      A read-only array will leave failed devices attached to
      the array personality, so we need to explicitly call
      remove_and_add_spares() to free it (clearing Blocked just
      like we do in slot_store()).
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: use common code for all calls to ->hot_remove_disk() · 746d3207
      NeilBrown committed
      slot_store and remove_and_add_spares both call ->hot_remove_disk(),
      but with slightly different tests and consequences, which is
      at least untidy and might be buggy.
      
      So modify remove_and_add_spares() so that it can be asked
      to remove a specific device, and call it from slot_store().
      
      We also clear the Blocked flag to ensure that doesn't prevent
      removal.  The purpose of Blocked is to prevent automatic removal
      by the kernel before an error is acknowledged.
      If the array is read/write then user-space would have no reason
      to remove a device unless it was known to be 'spare' or 'faulty', in
      which case it would have already cleared the Blocked flag.
      If the array is read-only, the flag might still be set, but
      there is no harm in clearing the flag for read-only arrays.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: never update metadata when array is read-only. · d87f064f
      NeilBrown committed
      Normally we don't even try to update the metadata if
      the array is read-only.  However future patches
      will increase the number of things that can happen on a read-only
      array, so it is safest to explicitly disable this.
      
      Every time that mddev->ro is set to 0, either
       - md_update_sb will be called again (at least if MD_CHANGE_DEVS
         is set) or
       - the mddev->thread is scheduled, which will also run
         md_update_sb if needed.
      
      So this is safe: if the array ever becomes read-write the
      metadata will be updated.
      Signed-off-by: NeilBrown <neilb@suse.de>
  16. 24 Mar, 2013 1 commit
  17. 20 Mar, 2013 1 commit
    • MD: Prevent sysfs operations on uninitialized kobjects · 90584fc9
      Jonathan Brassow committed
      Device-mapper does not use sysfs; but when device-mapper is leveraging
      MD's RAID personalities, MD sometimes attempts to update sysfs.  This
      patch adds checks for 'mddev->kobj.sd' in sysfs_[un]link_rdev to ensure
      it is about to operate on something valid.  This patch also checks for
      'mddev->kobj.sd' before calling 'sysfs_notify' in 'remove_and_add_spares'.
      Although 'sysfs_notify' already makes this check, doing so in
      'remove_and_add_spares' prevents an additional mutex operation.
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
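The guard described in this commit amounts to checking one pointer before touching sysfs. The structures and `notify_if_registered()` below are hypothetical userspace stand-ins; the counter replaces the real notification call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins: sd is non-NULL only once the object is registered in sysfs. */
struct kobj_model {
    void *sd;
};

struct md_sysfs_model {
    struct kobj_model kobj;
    int notifications;  /* counts notifications that were actually sent */
};

/* Skip the sysfs operation entirely when there is no sysfs presence (e.g. under dm-raid). */
static bool notify_if_registered(struct md_sysfs_model *m)
{
    assert(m != NULL);
    if (m->kobj.sd == NULL)
        return false;
    m->notifications++;
    return true;
}
```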
  18. 28 Feb, 2013 1 commit
  19. 26 Feb, 2013 1 commit
    • md: fix two bugs when attempting to resize RAID0 array. · a6468539
      NeilBrown committed
      You cannot resize a RAID0 array (in terms of making the devices
      bigger), but the code doesn't entirely stop you.
      So:
      
       - Disable setting of the available size on each device for
         RAID0 and Linear devices.  This must not change as doing so
         can change the effective layout of data.
      
       - Make sure that the size that raid0_size() reports is accurate,
         by rounding device sizes to chunk sizes.  As the device sizes
         cannot change now, this isn't so important, but it is best to be
         safe.
      
      Without this change:
        mdadm --grow /dev/md0 -z max
        mdadm --grow /dev/md0 -Z max
        then read to the end of the array
      
      can cause a BUG in a RAID0 array.
      
      These bugs have been present ever since it became possible
      to resize any device, which is a long time.  So the fix is
      suitable for any -stable kernel.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NeilBrown <neilb@suse.de>
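The rounding mentioned for raid0_size() can be illustrated with plain arithmetic. `raid0_size_model()` is a hypothetical helper that rounds each device down to a whole number of chunks before summing; sizes are in sectors.

```c
#include <assert.h>
#include <stddef.h>

/* Sum the usable size of all members, counting only whole chunks on each device. */
static unsigned long long
raid0_size_model(const unsigned long long *dev_sectors, size_t count,
                 unsigned long long chunk_sectors)
{
    assert(chunk_sectors > 0);
    unsigned long long total = 0;

    for (size_t i = 0; i < count; i++)
        total += dev_sectors[i] - dev_sectors[i] % chunk_sectors;  /* round down */
    return total;
}
```

For two devices of 1000 and 1030 sectors with 128-sector chunks, the usable parts are 896 and 1024 sectors, giving 1920 in total.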
  20. 21 Feb, 2013 1 commit
  21. 13 Dec, 2012 3 commits
  22. 11 Dec, 2012 2 commits
    • md.c: re-indent various 'switch' statements. · c02c0aeb
      NeilBrown committed
      Indent was unnecessarily deep.
      
      Also change one 'switch' which has a single case element, into an
      'if'.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: close race between removing and adding a device. · a7a3f08d
      NeilBrown committed
      When we remove a device from an md array, the final removal of
      the "dev-XX" sysfs entry is run asynchronously.
      If we then re-add that device immediately before the worker thread
      gets to run, we can end up trying to add the "dev-XX" sysfs entry back
      before it has been removed.
      
      So in both places where we add a device, call
        flush_workqueue(md_misc_wq);
      before taking the md lock (as holding the md lock can prevent removal
      from completing).
      Signed-off-by: NeilBrown <neilb@suse.de>