  1. 29 Jul 2016, 1 commit
  2. 20 Jul 2016, 3 commits
  3. 14 Jun 2016, 3 commits
  4. 10 Jun 2016, 1 commit
    • md: use a mutex to protect a global list · 5b1f5bc3
      Committed by Cong Wang
      We saw a list corruption in the list all_detected_devices:
      
       WARNING: CPU: 16 PID: 226 at lib/list_debug.c:29 __list_add+0x3c/0xa9()
       list_add corruption. next->prev should be prev (ffff880859d58320), but was ffff880859ce74c0. (next=ffffffff81abfdb0).
       Modules linked in: ahci libahci libata sd_mod scsi_mod
       CPU: 16 PID: 226 Comm: kworker/u241:4 Not tainted 4.1.20 #1
       Hardware name: Dell Inc. PowerEdge C6220/04GD66, BIOS 2.2.3 11/07/2013
       Workqueue: events_unbound async_run_entry_fn
        0000000000000000 ffff880859a5baf8 ffffffff81502872 ffff880859a5bb48
        0000000000000009 ffff880859a5bb38 ffffffff810692a5 ffff880859ee8828
        ffffffff812ad02c ffff880859d58320 ffffffff81abfdb0 ffff880859eb90c0
       Call Trace:
        [<ffffffff81502872>] dump_stack+0x4d/0x63
        [<ffffffff810692a5>] warn_slowpath_common+0xa1/0xbb
        [<ffffffff812ad02c>] ? __list_add+0x3c/0xa9
        [<ffffffff81069305>] warn_slowpath_fmt+0x46/0x48
        [<ffffffff812ad02c>] __list_add+0x3c/0xa9
        [<ffffffff81406f28>] md_autodetect_dev+0x41/0x62
        [<ffffffff81285862>] rescan_partitions+0x25f/0x29d
        [<ffffffff81506372>] ? mutex_lock+0x13/0x31
        [<ffffffff811a090f>] __blkdev_get+0x1aa/0x3cd
        [<ffffffff811a0b91>] blkdev_get+0x5f/0x294
        [<ffffffff81377ceb>] ? put_device+0x17/0x19
        [<ffffffff8128227c>] ? disk_put_part+0x12/0x14
        [<ffffffff812836f3>] add_disk+0x29d/0x407
        [<ffffffff81384345>] ? __pm_runtime_use_autosuspend+0x5c/0x64
        [<ffffffffa004a724>] sd_probe_async+0x115/0x1af [sd_mod]
        [<ffffffff81083177>] async_run_entry_fn+0x72/0x12c
        [<ffffffff8107c44c>] process_one_work+0x198/0x2ce
        [<ffffffff8107cac7>] worker_thread+0x1dd/0x2bb
        [<ffffffff8107c8ea>] ? cancel_delayed_work_sync+0x15/0x15
        [<ffffffff8107c8ea>] ? cancel_delayed_work_sync+0x15/0x15
        [<ffffffff81080d9c>] kthread+0xae/0xb6
        [<ffffffff81080000>] ? param_array_set+0x40/0xfa
        [<ffffffff81080cee>] ? __kthread_parkme+0x61/0x61
        [<ffffffff81508152>] ret_from_fork+0x42/0x70
        [<ffffffff81080cee>] ? __kthread_parkme+0x61/0x61
      
      I suspect this is because no lock protects this global list:
      autostart_arrays() is called from the ioctl() path, where no
      lock is held.
      
      Cc: Shaohua Li <shli@kernel.org>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
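The fix described in this commit, serializing every access to a shared list behind one mutex, can be sketched in userspace C. This is only an illustration under stated assumptions: it uses POSIX threads rather than the kernel's mutex API, and `struct detected_dev`, `detected_add()` and `detected_drain()` are hypothetical names standing in for the producer and consumer paths mentioned in the commit message.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for an entry on a global "detected devices" list. */
struct detected_dev {
    int dev;                        /* illustrative device number */
    struct detected_dev *next;
};

static struct detected_dev *detected_head;
static pthread_mutex_t detected_lock = PTHREAD_MUTEX_INITIALIZER;

/* Producer path: push one entry while holding the lock. Returns 0 on success. */
int detected_add(int dev)
{
    struct detected_dev *node = malloc(sizeof(*node));

    if (!node)
        return -1;
    node->dev = dev;

    pthread_mutex_lock(&detected_lock);
    node->next = detected_head;
    detected_head = node;
    pthread_mutex_unlock(&detected_lock);
    return 0;
}

/* Consumer path: drain the list under the same lock. Returns entries removed. */
int detected_drain(void)
{
    int removed = 0;

    pthread_mutex_lock(&detected_lock);
    while (detected_head) {
        struct detected_dev *node = detected_head;

        detected_head = node->next;
        free(node);
        removed++;
    }
    pthread_mutex_unlock(&detected_lock);
    return removed;
}
```

Because both paths take the same lock, an add can never observe the list halfway through a drain, which is the kind of interleaving the list-corruption warning above points at.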
  5. 04 Jun 2016, 2 commits
    • md: simplify the code with md_kick_rdev_from_array · db767672
      Committed by Guoqing Jiang
      Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
    • md-cluster: fix deadlock issue when add disk to an recoverying array · bb8bf15b
      Committed by Guoqing Jiang
      Adding a disk to an array that is performing recovery
      is a little complicated: we need to both reap the sync
      thread and perform the disk add, which caused the
      deadlock shown below.
      
      linux44:~ # ps aux|grep md|grep D
      root      1822  0.0  0.0      0     0 ?        D    16:50   0:00 [md127_resync]
      root      1848  0.0  0.0  19860   952 pts/0    D+   16:50   0:00 mdadm --manage /dev/md127 --re-add /dev/vdb
      linux44:~ # cat /proc/1848/stack
      [<ffffffff8107afde>] kthread_stop+0x6e/0x120
      [<ffffffffa051ddb0>] md_unregister_thread+0x40/0x80 [md_mod]
      [<ffffffffa0526e45>] md_reap_sync_thread+0x15/0x150 [md_mod]
      [<ffffffffa05271e0>] action_store+0x260/0x270 [md_mod]
      [<ffffffffa05206b4>] md_attr_store+0xb4/0x100 [md_mod]
      [<ffffffff81214a7e>] sysfs_write_file+0xbe/0x140
      [<ffffffff811a6b98>] vfs_write+0xb8/0x1e0
      [<ffffffff811a75b8>] SyS_write+0x48/0xa0
      [<ffffffff8152a5c9>] system_call_fastpath+0x16/0x1b
      [<00007f068ea1ed30>] 0x7f068ea1ed30
      linux44:~ # cat /proc/1822/stack
      [<ffffffffa05251a6>] md_do_sync+0x846/0xf40 [md_mod]
      [<ffffffffa052402d>] md_thread+0x16d/0x180 [md_mod]
      [<ffffffff8107ad94>] kthread+0xb4/0xc0
      [<ffffffff8152a518>] ret_from_fork+0x58/0x90
      
      Task1848                                Task1822
      md_attr_store (held reconfig_mutex by call mddev_lock())
          action_store
          md_reap_sync_thread
          md_unregister_thread
          kthread_stop                        md_wakeup_thread(mddev->thread);
                                              wait_event(mddev->sb_wait, !test_bit(MD_CHANGE_PENDING))
      
      md_check_recovery is triggered by waking mddev->thread,
      but it can't clear the MD_CHANGE_PENDING flag since it can't
      get the lock, which is already held by md_attr_store.

      To solve the deadlock, we move "->resync_finish()"
      from md_do_sync to md_reap_sync_thread (after md_update_sb).
      MD_HELD_RESYNC_LOCK is also introduced, since it is possible
      that a node can't get the resync lock in md_do_sync.

      We then no longer need to wait for MD_CHANGE_PENDING to be
      cleared, since the metadata should be updated after md_update_sb,
      so we just call resync_finish if MD_HELD_RESYNC_LOCK is set.

      We also unified the code after the skip label, since setting
      PENDING in the non-clustered case should be harmless.
      Reviewed-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
  6. 10 May 2016, 2 commits
    • md: set MD_CHANGE_PENDING in a atomic region · 85ad1d13
      Committed by Guoqing Jiang
      Some code waits for a metadata update by:
      
      1. flagging that it is needed (MD_CHANGE_DEVS or MD_CHANGE_CLEAN)
      2. setting MD_CHANGE_PENDING and waking the management thread
      3. waiting for MD_CHANGE_PENDING to be cleared
      
      If the first two are done without locking, the code in md_update_sb()
      which checks if it needs to repeat might test if an update is needed
      before step 1, then clear MD_CHANGE_PENDING after step 2, resulting
      in the wait returning early.
      
      So make sure all places that set MD_CHANGE_PENDING do so atomically;
      bit_clear_unless (suggested by Neil) is introduced for this purpose.
      
      Cc: Martin Kepplinger <martink@posteo.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: <linux-kernel@vger.kernel.org>
      Reviewed-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
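The race described in this commit comes down to two rules: the waiter must set the "work needed" bit and the PENDING bit in one atomic step, and the updater must clear PENDING only if no "work needed" bit is set at that instant. A minimal userspace sketch with C11 atomics follows. It is not the kernel's bit_clear_unless macro; the flag values and the names `clear_bits_unless()` and `request_update()` are assumptions made for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative flag bits; the values are made up for this sketch. */
#define CHANGE_DEVS    (1UL << 0)
#define CHANGE_CLEAN   (1UL << 1)
#define CHANGE_PENDING (1UL << 2)

/*
 * Atomically clear the bits in `clear` unless any bit in `test` is set.
 * Returns true if the clear happened, false if a `test` bit blocked it.
 */
bool clear_bits_unless(_Atomic unsigned long *flags,
                       unsigned long clear, unsigned long test)
{
    unsigned long old = atomic_load(flags);

    do {
        if (old & test)
            return false;   /* new work was flagged: leave the bits set */
        /* On failure the exchange reloads `old`, and the check repeats. */
    } while (!atomic_compare_exchange_weak(flags, &old, old & ~clear));

    return true;
}

/* Waiter side: flag the work and PENDING in a single atomic step. */
void request_update(_Atomic unsigned long *flags, unsigned long what)
{
    atomic_fetch_or(flags, what | CHANGE_PENDING);
}
```

With this shape, an updater that finishes a pass just as a new request arrives either sees the new "work needed" bit and keeps PENDING set, or clears PENDING before the request lands, in which case the request sets it again. The waiter cannot be woken early in either order.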
    • md: md.c: fix oops in mddev_suspend for raid0 · 092398dc
      Committed by Heinz Mauelshagen
      Introduced by upstream commit 70d9798b
      
      The raid0 personality does not create mddev->thread, as opposed to
      other personalities, so its unconditional access in
      mddev_suspend() causes an oops.

      This patch checks for mddev->thread in order to keep the
      intention of the aforementioned commit.
      
      Fixes: 70d9798b ("MD: warn for potential deadlock")
      Cc: stable@vger.kernel.org (4.5+)
      Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
  7. 05 May 2016, 4 commits
  8. 26 Apr 2016, 1 commit
  9. 13 Apr 2016, 1 commit
  10. 01 Apr 2016, 2 commits
    • MD: add rdev reference for super write · ed3b98c7
      Committed by Shaohua Li
      Xiao Ni reported the crash below:
      [26396.335146] BUG: unable to handle kernel NULL pointer dereference at 00000000000002a8
      [26396.342990] IP: [<ffffffffa0425b00>] super_written+0x20/0x80 [md_mod]
      [26396.349449] PGD 0
      [26396.351468] Oops: 0002 [#1] SMP
      [26396.354898] Modules linked in: ext4 mbcache jbd2 raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_td
      [26396.408404] CPU: 5 PID: 3261 Comm: loop0 Not tainted 4.5.0 #1
      [26396.414140] Hardware name: Dell Inc. PowerEdge R715/0G2DP3, BIOS 3.2.2 09/15/2014
      [26396.421608] task: ffff8808339be680 ti: ffff8808365f4000 task.ti: ffff8808365f4000
      [26396.429074] RIP: 0010:[<ffffffffa0425b00>]  [<ffffffffa0425b00>] super_written+0x20/0x80 [md_mod]
      [26396.437952] RSP: 0018:ffff8808365f7c38  EFLAGS: 00010046
      [26396.443252] RAX: ffffffffa0425ae0 RBX: ffff8804336a7900 RCX: ffffe8f9f7b41198
      [26396.450371] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8804336a7900
      [26396.457489] RBP: ffff8808365f7c50 R08: 0000000000000005 R09: 00001801e02ce3d7
      [26396.464608] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
      [26396.471728] R13: ffff8808338d9a00 R14: 0000000000000000 R15: ffff880833f9fe00
      [26396.478849] FS:  00007f9e5066d740(0000) GS:ffff880237b40000(0000) knlGS:0000000000000000
      [26396.486922] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [26396.492656] CR2: 00000000000002a8 CR3: 00000000019ea000 CR4: 00000000000006e0
      [26396.499775] Stack:
      [26396.501781]  ffff8804336a7900 0000000000000000 0000000000000000 ffff8808365f7c68
      [26396.509199]  ffffffff81308cd0 ffff8804336a7900 ffff8808365f7ca8 ffffffff81310637
      [26396.516618]  00000000a0233a00 ffff880833f9fe00 0000000000000000 ffff880833fb0000
      [26396.524038] Call Trace:
      [26396.526485]  [<ffffffff81308cd0>] bio_endio+0x40/0x60
      [26396.531529]  [<ffffffff81310637>] blk_update_request+0x87/0x320
      [26396.537439]  [<ffffffff8131a20a>] blk_mq_end_request+0x1a/0x70
      [26396.543261]  [<ffffffff81313889>] blk_flush_complete_seq+0xd9/0x2a0
      [26396.549517]  [<ffffffff81313ccf>] flush_end_io+0x15f/0x240
      [26396.554993]  [<ffffffff8131a22a>] blk_mq_end_request+0x3a/0x70
      [26396.560815]  [<ffffffff8131a314>] __blk_mq_complete_request+0xb4/0xe0
      [26396.567246]  [<ffffffff8131a35c>] blk_mq_complete_request+0x1c/0x20
      [26396.573506]  [<ffffffffa04182df>] loop_queue_work+0x6f/0x72c [loop]
      [26396.579764]  [<ffffffff81697844>] ? __schedule+0x2b4/0x8f0
      [26396.585242]  [<ffffffff810a7812>] kthread_worker_fn+0x52/0x170
      [26396.591065]  [<ffffffff810a77c0>] ? kthread_create_on_node+0x1a0/0x1a0
      [26396.597582]  [<ffffffff810a7238>] kthread+0xd8/0xf0
      [26396.602453]  [<ffffffff810a7160>] ? kthread_park+0x60/0x60
      [26396.607929]  [<ffffffff8169bdcf>] ret_from_fork+0x3f/0x70
      [26396.613319]  [<ffffffff810a7160>] ? kthread_park+0x60/0x60
      
      md_super_write() and the corresponding md_super_wait() are generally
      called with reconfig_mutex locked, which prevents the disk from
      disappearing. There is one case where this rule is broken:
      write_sb_page in bitmap.c doesn't hold the mutex. next_active_rdev
      does increase the rdev reference, but it decreases the reference too
      early (e.g., before the IO finishes), so the disk can disappear in
      that window. We unconditionally increase the rdev reference in
      md_super_write() to avoid the race.
      Reported-and-tested-by: Xiao Ni <xni@redhat.com>
      Reviewed-by: Neil Brown <neilb@suse.de>
      Signed-off-by: Shaohua Li <shli@fb.com>
    • md: fix a trivial typo in comments · 466ad292
      Committed by Wei Fang
      Fix a trivial typo in md_ioctl().
      Signed-off-by: Wei Fang <fangwei1@huawei.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
  11. 27 Feb 2016, 2 commits
    • MD: warn for potential deadlock · 70d9798b
      Committed by Shaohua Li
      The personality thread shouldn't call mddev_suspend(), because
      mddev_suspend() waits for all IO to finish, but IO is handled in the
      personality thread, so this could cause a deadlock. To catch this
      early, add a warning if mddev_suspend() is called from the
      personality thread.
      Suggested-by: NeilBrown <neilb@suse.com>
      Cc: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
    • md: Drop sending a change uevent when stopping · 399146b8
      Committed by Sebastian Parschauer
      When an MD device is stopped, its device node /dev/mdX may still
      exist afterwards, or it may be recreated by udev. The next open() call
      can lead to creation of an inoperable MD device. The reason for
      this is that a change event (KOBJ_CHANGE) is sent to udev which
      races against the remove event (KOBJ_REMOVE) from md_free().
      So drop sending the change event.
      
      A change is likely also required in mdadm as many versions send the
      change event to udev as well.
      
      Neil mentioned that the change event is a workaround for old kernels
      (commit 934d9c23, "md: destroy partitions and notify udev when md array is stopped.").
      New mdadm can handle device removal now, so this is no longer required.
      
      Cc: NeilBrown <neilb@suse.com>
      Cc: Hannes Reinecke <hare@suse.de>
      Cc: Jes Sorensen <Jes.Sorensen@redhat.com>
      Signed-off-by: Sebastian Parschauer <sebastian.riemer@profitbricks.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
  12. 14 Jan 2016, 3 commits
    • md/raid: only permit hot-add of compatible integrity profiles · 1501efad
      Committed by Dan Williams
      It is not safe for an integrity profile to be changed while i/o is
      in-flight in the queue.  Prevent adding new disks or otherwise online
      spares to an array if the device has an incompatible integrity profile.
      
      The original change to the blk_integrity_unregister implementation in
      md, commit c7bfced9 "md: suspend i/o during runtime
      blk_integrity_unregister", introduced an immediate hang regression.

      This policy of disallowing changes to the integrity profile once one
      has been established is shared with DM.
      
      Here is an abbreviated log from a test run that:
      1/ Creates a degraded raid1 with an integrity-enabled device (pmem0s) [   59.076127]
      2/ Tries to add an integrity-disabled device (pmem1m) [   90.489209]
      3/ Retries with an integrity-enabled device (pmem1s) [  205.671277]
      
      [   59.076127] md/raid1:md0: active with 1 out of 2 mirrors
      [   59.078302] md: data integrity enabled on md0
      [..]
      [   90.489209] md0: incompatible integrity profile for pmem1m
      [..]
      [  205.671277] md: super_written gets error=-5
      [  205.677386] md/raid1:md0: Disk failure on pmem1m, disabling device.
      [  205.677386] md/raid1:md0: Operation continuing on 1 devices.
      [  205.683037] RAID1 conf printout:
      [  205.684699]  --- wd:1 rd:2
      [  205.685972]  disk 0, wo:0, o:1, dev:pmem0s
      [  205.687562]  disk 1, wo:1, o:1, dev:pmem1s
      [  205.691717] md: recovery of RAID array md0
      
      Fixes: c7bfced9 ("md: suspend i/o during runtime blk_integrity_unregister")
      Cc: <stable@vger.kernel.org>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Reported-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: NeilBrown <neilb@suse.com>
    • MD: add journal with array suspended · 87d4d916
      Committed by Shaohua Li
      Hot-adding a journal disk in the recovery thread context brings a lot of
      trouble, as IO could be running. Unlike spare disk hot add, adding a
      journal disk with the array suspended makes more sense, and the
      implementation is much easier.
      Signed-off-by: Shaohua Li <shli@fb.com>
      Signed-off-by: NeilBrown <neilb@suse.com>
    • md: set MD_HAS_JOURNAL in correct places · a62ab49e
      Committed by Shaohua Li
      Set MD_HAS_JOURNAL when an array is loaded or the journal is initialized.
      This avoids setting the flag too early in journal disk hot-add.
      Signed-off-by: Shaohua Li <shli@fb.com>
      Signed-off-by: NeilBrown <neilb@suse.com>
  13. 10 Jan 2016, 2 commits
  14. 07 Jan 2016, 2 commits
  15. 06 Jan 2016, 9 commits
    • raid5-cache: add journal hot add/remove support · f6b6ec5c
      Committed by Shaohua Li
      Add support for journal disk hot add/remove. The md part is mostly
      trivial checks; the raid5 part is a little tricky. For hot-remove, we
      can't wait for pending writes, as it's called from raid5d and the wait
      would cause a deadlock, so we simply fail the hot-remove. A hot-remove
      retry can succeed eventually, since if the journal disk is faulty all
      pending writes will fail and finish. For hot-add, since an array that
      supports a journal but has no journal disk is marked read-only, it is
      safe to hot-add a journal without stopping IO (which should be read IO,
      while the journal only handles write IO).
      Signed-off-by: Shaohua Li <shli@fb.com>
      Signed-off-by: NeilBrown <neilb@suse.com>
    • drivers: md: use ktime_get_real_seconds() · 9ebc6ef1
      Committed by Deepa Dinamani
      get_seconds() API is not y2038 safe on 32 bit systems and the API
      is deprecated. Replace it with calls to ktime_get_real_seconds()
      API instead. Change mddev structure types to time64_t accordingly.
      
      32 bit signed timestamps will overflow in the year 2038.
      
      Change the user interface mdu_array_info_s structure timestamps:
      ctime and utime values used in ioctls GET_ARRAY_INFO and
      SET_ARRAY_INFO to unsigned int. This will extend the field to last
      until the year 2106.
      The long term plan is to get rid of ctime and utime values in
      this structure as this information can be read from the on-disk
      meta data directly.
      
      Clamp the time64_t timestamps to positive values with a max of U32_MAX
      when returning from GET_ARRAY_INFO ioctl to accommodate above changes
      in the data type of timestamps to unsigned int.
      
      v0.90 on disk meta data uses u32 for maintaining time stamps.
      So this will also last until year 2106.
      Assumption is that the usage of v0.90 will be deprecated by
      year 2106.
      
      Timestamp fields in the on disk meta data for v1.0 version already
      use 64 bit data types. Remove the truncation of the bits while
      writing to or reading from these from the disk.
      Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
      Reviewed-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: NeilBrown <neilb@suse.com>
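The clamping step described in this commit, squeezing a signed 64-bit seconds value into an unsigned 32-bit field, can be sketched as a plain C function. The name `clamp_time_to_u32()` is hypothetical, and the kernel code may express the same thing with its own helpers. An unsigned 32-bit count of seconds since 1970 runs out after about 136 years, which is where the year 2106 in the message comes from.

```c
#include <stdint.h>

/*
 * Clamp a signed 64-bit timestamp (seconds) into the range that an
 * unsigned 32-bit field can hold: negatives become 0, values above
 * UINT32_MAX saturate at UINT32_MAX.
 */
uint32_t clamp_time_to_u32(int64_t t)
{
    if (t < 0)
        return 0;
    if (t > (int64_t)UINT32_MAX)
        return UINT32_MAX;
    return (uint32_t)t;
}
```

Saturating rather than truncating means an out-of-range timestamp shows up as the largest representable value instead of wrapping around to a small, plausible-looking one.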
    • md: avoid warning for 32-bit sector_t · 3312c951
      Committed by Arnd Bergmann
      When CONFIG_LBDAF is not set, sector_t is only 32-bits wide, which
      means we cannot have devices with more than 2TB, and the code that
      is trying to handle compatibility support for large devices in
      md version 0.90 is meaningless but also causes a compile-time warning:
      
      drivers/md/md.c: In function 'super_90_load':
      drivers/md/md.c:1029:19: warning: large integer implicitly truncated to unsigned type [-Woverflow]
      drivers/md/md.c: In function 'super_90_rdev_size_change':
      drivers/md/md.c:1323:17: warning: large integer implicitly truncated to unsigned type [-Woverflow]
      
      This adds a check for CONFIG_LBDAF to avoid even getting into this
      code path, and also adds an explicit cast to let the compiler know
      it doesn't have to warn about the truncation.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: NeilBrown <neilb@suse.com>
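The 2TB limit quoted in this commit follows from simple arithmetic: a 32-bit sector index can address 2^32 sectors, and with 512-byte sectors that is 2^41 bytes, or 2 TiB. A small sketch, with the hypothetical helper `max_bytes_for_sector_bits()` and a 512-byte sector size assumed:

```c
#include <stdint.h>

/* Sector size assumed for this sketch. */
#define SECTOR_BYTES 512u

/*
 * Bytes addressable by a sector index that is `bits` wide.
 * Only meaningful for bits < 55, so the product fits in 64 bits.
 */
uint64_t max_bytes_for_sector_bits(unsigned bits)
{
    return ((uint64_t)1 << bits) * SECTOR_BYTES;
}
```

For `bits = 32` this gives 2199023255552 bytes, which is exactly 2 TiB.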
    • md: update comment for md_allow_write · abf3508d
      Committed by Guoqing Jiang
      MD_CHANGE_CLEAN had been replaced with MD_CHANGE_PENDING after
      commit 070dc6 ("md: resolve confusion of MD_CHANGE_CLEAN"),
      so make the change accordingly.
      Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
      Signed-off-by: NeilBrown <neilb@suse.com>
    • md-cluster: Defer MD reloading to mddev->thread · 15858fa5
      Committed by Guoqing Jiang
      Reloading of superblock must be performed under reconfig_mutex. However,
      this cannot be done with md_reload_sb because it would deadlock with
      the message DLM lock. So, we defer it in md_check_recovery() which is
      executed by mddev->thread.
      
      This introduces a new flag, MD_RELOAD_SB, which, if set, causes the
      superblock to be reloaded. good_device_nr is also added to 'struct mddev';
      it is used to get the number of good devices within the cluster raid.
      Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
      Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
      Signed-off-by: NeilBrown <neilb@suse.com>
    • md-cluster: append some actions when change bitmap from clustered to none · f6a2dc64
      Committed by Guoqing Jiang
      For clustered raid, we need to take extra actions when changing
      the bitmap to none.

      1. Check whether all the bitmap locks can be acquired. If so,
         we can continue the change, since the cluster raid is only active
         on the current node. Otherwise, return failure and unlock the
         related bitmap locks.
      2. Set nodes to 0 and then leave the cluster environment.
      3. Release the other nodes' bitmap locks.
      Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
      Signed-off-by: NeilBrown <neilb@suse.com>
    • md-cluster: Allow spare devices to be marked as faulty · 09afd2a8
      Committed by Goldwyn Rodrigues
      If a spare device was marked faulty, it would not be reflected
      in receiving nodes because it would mark it as activated and continue.
      Continue the operation, so it may be set as faulty.
      Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
      Signed-off-by: NeilBrown <neilb@suse.com>
    • md-cluster: Fix the remove sequence with the new MD reload code · 54a88392
      Committed by Goldwyn Rodrigues
      The remove disk message does not need metadata_update_start(), but
      can be an independent message.
      Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
      Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
      Signed-off-by: NeilBrown <neilb@suse.com>
    • md-cluster: remove a disk asynchronously from cluster environment · 659b254f
      Committed by Guoqing Jiang
      For cluster raid, if a disk can't be reached from one node, the
      other nodes receive a REMOVE message for that disk.

      On the receiving node, we can't call md_kick_rdev_from_array to remove
      the disk from the array synchronously, since the disk might still be
      busy on this node. So set a ClusterRemove flag on the disk and let
      the thread do the removal eventually.
      Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
      Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
      Signed-off-by: NeilBrown <neilb@suse.com>
  16. 21 Dec 2015, 1 commit
    • md: remove check for MD_RECOVERY_NEEDED in action_store. · 312045ee
      Committed by NeilBrown
      md currently doesn't allow a 'sync_action' such as 'reshape' to be set
      while MD_RECOVERY_NEEDED is set.
      
      This is a problem, particularly since commit 738a2738 as that can
      cause ->check_shape to call mddev_resume() which sets
      MD_RECOVERY_NEEDED.  So by the time we come to start 'reshape' it is
      very likely that MD_RECOVERY_NEEDED is still set.
      
      Testing for this flag is not really needed and is in any case very
      racy as it can be set at any moment - asynchronously.  Any race
      between setting a sync_action and setting MD_RECOVERY_NEEDED must
      already be handled properly in some locked code, probably
      md_check_recovery(), so remove the test here.
      
      The test on MD_RECOVERY_RUNNING is also racy in the 'reshape' case
      so we should test it again after getting mddev_lock().
      
      As this fixes a race and a regression which can cause 'reshape' to
      fail, it is suitable for -stable kernels since 4.1
      Reported-by: Xiao Ni <xni@redhat.com>
      Fixes: 738a2738 ("md/raid5: fix allocation of 'scribble' array.")
      Cc: stable@vger.kernel.org (v4.1+)
      Signed-off-by: NeilBrown <neilb@suse.com>
  17. 18 Dec 2015, 1 commit