1. 14 Jun, 2016 4 commits
  2. 10 Jun, 2016 1 commit
    • md: use a mutex to protect a global list · 5b1f5bc3
      Committed by Cong Wang
      We saw a list corruption in the list all_detected_devices:
      
       WARNING: CPU: 16 PID: 226 at lib/list_debug.c:29 __list_add+0x3c/0xa9()
       list_add corruption. next->prev should be prev (ffff880859d58320), but was ffff880859ce74c0. (next=ffffffff81abfdb0).
       Modules linked in: ahci libahci libata sd_mod scsi_mod
       CPU: 16 PID: 226 Comm: kworker/u241:4 Not tainted 4.1.20 #1
       Hardware name: Dell Inc. PowerEdge C6220/04GD66, BIOS 2.2.3 11/07/2013
       Workqueue: events_unbound async_run_entry_fn
        0000000000000000 ffff880859a5baf8 ffffffff81502872 ffff880859a5bb48
        0000000000000009 ffff880859a5bb38 ffffffff810692a5 ffff880859ee8828
        ffffffff812ad02c ffff880859d58320 ffffffff81abfdb0 ffff880859eb90c0
       Call Trace:
        [<ffffffff81502872>] dump_stack+0x4d/0x63
        [<ffffffff810692a5>] warn_slowpath_common+0xa1/0xbb
        [<ffffffff812ad02c>] ? __list_add+0x3c/0xa9
        [<ffffffff81069305>] warn_slowpath_fmt+0x46/0x48
        [<ffffffff812ad02c>] __list_add+0x3c/0xa9
        [<ffffffff81406f28>] md_autodetect_dev+0x41/0x62
        [<ffffffff81285862>] rescan_partitions+0x25f/0x29d
        [<ffffffff81506372>] ? mutex_lock+0x13/0x31
        [<ffffffff811a090f>] __blkdev_get+0x1aa/0x3cd
        [<ffffffff811a0b91>] blkdev_get+0x5f/0x294
        [<ffffffff81377ceb>] ? put_device+0x17/0x19
        [<ffffffff8128227c>] ? disk_put_part+0x12/0x14
        [<ffffffff812836f3>] add_disk+0x29d/0x407
        [<ffffffff81384345>] ? __pm_runtime_use_autosuspend+0x5c/0x64
        [<ffffffffa004a724>] sd_probe_async+0x115/0x1af [sd_mod]
        [<ffffffff81083177>] async_run_entry_fn+0x72/0x12c
        [<ffffffff8107c44c>] process_one_work+0x198/0x2ce
        [<ffffffff8107cac7>] worker_thread+0x1dd/0x2bb
        [<ffffffff8107c8ea>] ? cancel_delayed_work_sync+0x15/0x15
        [<ffffffff8107c8ea>] ? cancel_delayed_work_sync+0x15/0x15
        [<ffffffff81080d9c>] kthread+0xae/0xb6
        [<ffffffff81080000>] ? param_array_set+0x40/0xfa
        [<ffffffff81080cee>] ? __kthread_parkme+0x61/0x61
        [<ffffffff81508152>] ret_from_fork+0x42/0x70
        [<ffffffff81080cee>] ? __kthread_parkme+0x61/0x61
      
      I suspect this is because no lock protects this global
      list: autostart_arrays() is called in the ioctl() path,
      where no lock is held.
      
      Cc: Shaohua Li <shli@kernel.org>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
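The locking idea described in this commit message can be illustrated in user space. The following is a minimal sketch using POSIX threads, not the kernel patch itself: `struct detected_dev`, `detected_lock`, `detected_add()` and `detected_drain()` are hypothetical names chosen for illustration, and a `pthread_mutex_t` stands in for a kernel mutex. The point is only that both the path that adds to the global list and the path that consumes it take the same lock.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for an entry on the global detected-devices list. */
struct detected_dev {
    int dev;
    struct detected_dev *next;
};

/* Global list head and the mutex that protects it (both illustrative). */
static struct detected_dev *all_detected = NULL;
static pthread_mutex_t detected_lock = PTHREAD_MUTEX_INITIALIZER;

/* Producer side: roughly the role md_autodetect_dev() plays in the trace.
 * Returns 0 on success, -1 if allocation fails. */
int detected_add(int dev)
{
    struct detected_dev *node = malloc(sizeof(*node));

    if (!node)
        return -1;
    node->dev = dev;

    pthread_mutex_lock(&detected_lock);
    node->next = all_detected;      /* list is only modified under the lock */
    all_detected = node;
    pthread_mutex_unlock(&detected_lock);
    return 0;
}

/* Consumer side: roughly the role autostart_arrays() plays.
 * Removes every entry and returns how many were removed. */
int detected_drain(void)
{
    int count = 0;

    pthread_mutex_lock(&detected_lock);
    while (all_detected) {
        struct detected_dev *node = all_detected;

        all_detected = node->next;
        free(node);
        count++;
    }
    pthread_mutex_unlock(&detected_lock);
    return count;
}
```

With this shape, concurrent callers of `detected_add()` and `detected_drain()` never observe a half-updated list, which is the kind of corruption the warning above reports.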
  3. 04 Jun, 2016 2 commits
    • md: simplify the code with md_kick_rdev_from_array · db767672
      Committed by Guoqing Jiang
      Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
    • md-cluster: fix deadlock issue when add disk to an recoverying array · bb8bf15b
      Committed by Guoqing Jiang
      Adding a disk to an array that is performing recovery
      is a little complicated: we need to both reap the sync
      thread and perform the add-disk operation, and this
      caused a deadlock as follows.
      
      linux44:~ # ps aux|grep md|grep D
      root      1822  0.0  0.0      0     0 ?        D    16:50   0:00 [md127_resync]
      root      1848  0.0  0.0  19860   952 pts/0    D+   16:50   0:00 mdadm --manage /dev/md127 --re-add /dev/vdb
      linux44:~ # cat /proc/1848/stack
      [<ffffffff8107afde>] kthread_stop+0x6e/0x120
      [<ffffffffa051ddb0>] md_unregister_thread+0x40/0x80 [md_mod]
      [<ffffffffa0526e45>] md_reap_sync_thread+0x15/0x150 [md_mod]
      [<ffffffffa05271e0>] action_store+0x260/0x270 [md_mod]
      [<ffffffffa05206b4>] md_attr_store+0xb4/0x100 [md_mod]
      [<ffffffff81214a7e>] sysfs_write_file+0xbe/0x140
      [<ffffffff811a6b98>] vfs_write+0xb8/0x1e0
      [<ffffffff811a75b8>] SyS_write+0x48/0xa0
      [<ffffffff8152a5c9>] system_call_fastpath+0x16/0x1b
      [<00007f068ea1ed30>] 0x7f068ea1ed30
      linux44:~ # cat /proc/1822/stack
      [<ffffffffa05251a6>] md_do_sync+0x846/0xf40 [md_mod]
      [<ffffffffa052402d>] md_thread+0x16d/0x180 [md_mod]
      [<ffffffff8107ad94>] kthread+0xb4/0xc0
      [<ffffffff8152a518>] ret_from_fork+0x58/0x90
      
      Task1848                                        Task1822
      md_attr_store (holds reconfig_mutex by calling mddev_lock())
      action_store
      md_reap_sync_thread
      md_unregister_thread
      kthread_stop                                    md_wakeup_thread(mddev->thread);
                                                      wait_event(mddev->sb_wait, !test_bit(MD_CHANGE_PENDING))
      
      md_check_recovery is triggered by waking up mddev->thread,
      but it can't clear the MD_CHANGE_PENDING flag because it
      can't get the lock, which is already held by md_attr_store.
      
      To solve the deadlock, we move "->resync_finish()" from
      md_do_sync to md_reap_sync_thread (after md_update_sb).
      MD_HELD_RESYNC_LOCK is also introduced, since it is possible
      that a node can't get the resync lock in md_do_sync.
      
      We then no longer need to wait for MD_CHANGE_PENDING to be
      cleared, since the metadata should be updated after md_update_sb;
      we just call resync_finish if MD_HELD_RESYNC_LOCK is set.
      
      We also unified the code after the skip label, since setting
      PENDING in the non-clustered case should be harmless.
      Reviewed-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
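The shape of the fix described in this commit message can be sketched in user space with POSIX threads. This is an illustrative analogue under stated assumptions, not the md code: `struct array_state`, `sync_worker()`, `update_sb()`, `resync_finish()`, `reap_sync_worker()` and `run_reap_demo()` are hypothetical names, and the `held_resync_lock` field only mimics the role of MD_HELD_RESYNC_LOCK. The worker never waits on anything that needs the reconfiguration lock; it only records that it held the resync lock, and the thread that reaps it performs the finish step after the metadata update.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the array state; not the kernel's struct mddev. */
struct array_state {
    pthread_mutex_t reconfig_mutex;
    bool held_resync_lock;   /* plays the role of MD_HELD_RESYNC_LOCK */
    bool sb_updated;
    bool resync_finished;
};

/* Worker: records that it held the resync lock, then exits.
 * It does not wait for any event that requires reconfig_mutex. */
static void *sync_worker(void *arg)
{
    struct array_state *st = arg;

    st->held_resync_lock = true;
    return NULL;
}

/* Illustrative metadata update. */
static void update_sb(struct array_state *st)
{
    st->sb_updated = true;
}

/* Illustrative finish step, now run by the reaper instead of the worker. */
static void resync_finish(struct array_state *st)
{
    st->held_resync_lock = false;
    st->resync_finished = true;
}

/* Reaper: the caller holds reconfig_mutex. Joining is safe because the
 * worker never blocks on that mutex. Returns 0 on success, -1 on error. */
static int reap_sync_worker(struct array_state *st, pthread_t worker)
{
    if (pthread_join(worker, NULL) != 0)
        return -1;
    update_sb(st);
    if (st->held_resync_lock)
        resync_finish(st);
    return 0;
}

/* Runs the whole sequence once; returns 1 if it completed as intended. */
int run_reap_demo(void)
{
    struct array_state st = {0};
    pthread_t worker;
    int ok;

    pthread_mutex_init(&st.reconfig_mutex, NULL);
    pthread_mutex_lock(&st.reconfig_mutex);

    if (pthread_create(&worker, NULL, sync_worker, &st) != 0) {
        pthread_mutex_unlock(&st.reconfig_mutex);
        pthread_mutex_destroy(&st.reconfig_mutex);
        return 0;
    }

    ok = reap_sync_worker(&st, worker) == 0 &&
         st.sb_updated && st.resync_finished;

    pthread_mutex_unlock(&st.reconfig_mutex);
    pthread_mutex_destroy(&st.reconfig_mutex);
    return ok;
}
```

Reading `held_resync_lock` after `pthread_join()` is safe because the join orders the worker's writes before the reaper's reads. On older toolchains the program may need to be linked with `-pthread`.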
  4. 26 May, 2016 1 commit
  5. 13 May, 2016 4 commits
  6. 10 May, 2016 6 commits
  7. 06 May, 2016 7 commits
  8. 05 May, 2016 13 commits
  9. 30 Apr, 2016 1 commit
  10. 26 Apr, 2016 1 commit