1. 08 Nov 2016, 1 commit
    • md: add bad block support for external metadata · 35b785f7
      Committed by Tomasz Majchrzak
      Add a new rdev flag which an external metadata handler can use to
      switch bad block support on or off. If a new bad block is
      encountered, notify it via the rdev 'unacknowledged_bad_blocks'
      sysfs file. If a bad block has been cleared, notify the update via
      the rdev 'bad_blocks' sysfs file.
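      
      A minimal sketch of such a notification, assuming the sysfs
      attribute name quoted above (the surrounding badblocks plumbing is
      omitted):
      
          /* tell userspace that a new, not-yet-acknowledged bad block
           * was recorded on this rdev */
          sysfs_notify(&rdev->kobj, NULL, "unacknowledged_bad_blocks");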
      
      When bad block support is being removed, just clear the rdev flag.
      It is not necessary to reset the badblocks->shift field. If bad
      blocks are cleared or added at the same time, it is fine for those
      changes to be applied to the structure. The array is in the blocked
      state, and a drive which can no longer handle bad blocks will be
      removed from the array before it is unblocked.
      
      Simplify the state_show function by adding a separator at the end
      of each string and overwriting the last separator with a newline.
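      
      A sketch of that pattern, with only two of the rdev states shown
      for illustration:
      
          const char *sep = ",";
          size_t len = 0;
      
          /* append every active state with a trailing separator ... */
          if (test_bit(Faulty, &rdev->flags))
                  len += sprintf(page + len, "faulty%s", sep);
          if (test_bit(In_sync, &rdev->flags))
                  len += sprintf(page + len, "in_sync%s", sep);
          /* ... then turn the final separator into the newline */
          if (len)
                  page[len - 1] = '\n';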
      Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
      Reviewed-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
  2. 29 Oct 2016, 1 commit
    • md: be careful not to leak internal curr_resync value into metadata. -- (all) · 1217e1d1
      Committed by NeilBrown
      mddev->curr_resync usually records where the current resync is up to,
      but during the starting phase it has some "magic" values.
      
       1 - means that the array is trying to start a resync, but has yielded
           to another array which shares physical devices, and also needs to
           start a resync
       2 - means the array is trying to start resync, but has found another
           array which shares physical devices and has already started resync.
       3 - means that resync has commenced, but it is possible that nothing
           has actually been resynced yet.
      
      It is important that this value not be visible to user-space, and
      particularly that it does not get written to the metadata as the
      resync or recovery checkpoint. In part, this is because it may be
      slightly higher than the correct value, though this is very rare.
      In part, it is because it is not a multiple of 4K, and some devices
      only support 4K-aligned accesses.
      
      There are two places where this value is propagated into either
      ->curr_resync_completed or ->recovery_cp or ->recovery_offset.
      These currently avoid the propagation of values 1 and 2, but will
      allow 3 to leak through.
      
      Change them to only propagate the value if it is > 3.
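      
      A minimal sketch of the guard, using the field names quoted above
      (the real call sites carry more context than shown):
      
          /* propagate only once curr_resync holds a real position,
           * i.e. none of the magic startup values 1..3 */
          if (mddev->curr_resync > 3)
                  mddev->recovery_cp = mddev->curr_resync_completed;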
      
      As this can cause an array to fail, the patch is suitable for -stable.
      
      Cc: stable@vger.kernel.org (v3.7+)
      Reported-by: Viswesh <viswesh.vichu@gmail.com>
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
  3. 25 Oct 2016, 1 commit
    • md: report 'write_pending' state when array in sync · 16f88949
      Committed by Tomasz Majchrzak
      If there is a bad block on a disk and a recovery is performed from
      this disk, the same bad block is reported for the new disk. This
      involves setting the MD_CHANGE_PENDING flag in rdev_set_badblocks.
      For external metadata this flag is never cleared, as the array
      state is reported as 'clean'. A read request to the bad block in a
      RAID5 array then gets stuck, waiting for the flag to be cleared -
      as per commit c3cce6cd ("md/raid5: ensure device failure recorded
      before write request returns.").
      
      The meaning of the MD_CHANGE_PENDING and MD_CHANGE_CLEAN flags was
      clarified in commit 070dc6dd ("md: resolve confusion of
      MD_CHANGE_CLEAN"), however the MD_CHANGE_PENDING flag has since
      been used in personality error handlers and no longer fully matches
      its initial purpose. It was supposed to signal that a write request
      is about to start, but now it is also used to request a metadata
      update. Initially (in md_allow_write and md_write_start) the
      MD_CHANGE_PENDING flag was set and in_sync was set to 0 at the same
      time. Error handlers just set the flag without modifying in_sync.
      The sysfs array state is a single value, so it reports 'clean' when
      the MD_CHANGE_PENDING flag is set but in_sync is 1, and userspace
      has no idea it is expected to take some action.
      
      Swap the order in which the array state is checked, so that
      'write_pending' is reported ahead of 'clean' ('write_pending' is a
      misleading name, but it is too late to rename it now).
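      
      A sketch of the reordered check in the array-state report (a
      simplification; the real state machine has more branches):
      
          /* report 'write_pending' before 'clean': a pending metadata
           * update must be visible to userspace even while in_sync == 1 */
          if (test_bit(MD_CHANGE_PENDING, &mddev->flags))
                  st = write_pending;
          else if (mddev->in_sync)
                  st = clean;
          else
                  st = active;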
      Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
  4. 04 Oct 2016, 1 commit
  5. 22 Sep 2016, 4 commits
    • md: fix a potential deadlock · 90bcf133
      Committed by Shaohua Li
      lockdep reports a potential deadlock. Fix this by dropping the mutex
      before md_import_device.
      
      [ 1137.126601] ======================================================
      [ 1137.127013] [ INFO: possible circular locking dependency detected ]
      [ 1137.127013] 4.8.0-rc4+ #538 Not tainted
      [ 1137.127013] -------------------------------------------------------
      [ 1137.127013] mdadm/16675 is trying to acquire lock:
      [ 1137.127013]  (&bdev->bd_mutex){+.+.+.}, at: [<ffffffff81243cf3>] __blkdev_get+0x63/0x450
      [ 1137.127013]
      but task is already holding lock:
      [ 1137.127013]  (detected_devices_mutex){+.+.+.}, at: [<ffffffff81a5138c>] md_ioctl+0x2ac/0x1f50
      [ 1137.127013]
      which lock already depends on the new lock.
      
      [ 1137.127013]
      the existing dependency chain (in reverse order) is:
      [ 1137.127013]
      -> #1 (detected_devices_mutex){+.+.+.}:
      [ 1137.127013]        [<ffffffff810b6f19>] lock_acquire+0xb9/0x220
      [ 1137.127013]        [<ffffffff81c51647>] mutex_lock_nested+0x67/0x3d0
      [ 1137.127013]        [<ffffffff81a4eeaf>] md_autodetect_dev+0x3f/0x90
      [ 1137.127013]        [<ffffffff81595be8>] rescan_partitions+0x1a8/0x2c0
      [ 1137.127013]        [<ffffffff81590081>] __blkdev_reread_part+0x71/0xb0
      [ 1137.127013]        [<ffffffff815900e5>] blkdev_reread_part+0x25/0x40
      [ 1137.127013]        [<ffffffff81590c4b>] blkdev_ioctl+0x51b/0xa30
      [ 1137.127013]        [<ffffffff81242bf1>] block_ioctl+0x41/0x50
      [ 1137.127013]        [<ffffffff81214c96>] do_vfs_ioctl+0x96/0x6e0
      [ 1137.127013]        [<ffffffff81215321>] SyS_ioctl+0x41/0x70
      [ 1137.127013]        [<ffffffff81c56825>] entry_SYSCALL_64_fastpath+0x18/0xa8
      [ 1137.127013]
      -> #0 (&bdev->bd_mutex){+.+.+.}:
      [ 1137.127013]        [<ffffffff810b6af2>] __lock_acquire+0x1662/0x1690
      [ 1137.127013]        [<ffffffff810b6f19>] lock_acquire+0xb9/0x220
      [ 1137.127013]        [<ffffffff81c51647>] mutex_lock_nested+0x67/0x3d0
      [ 1137.127013]        [<ffffffff81243cf3>] __blkdev_get+0x63/0x450
      [ 1137.127013]        [<ffffffff81244307>] blkdev_get+0x227/0x350
      [ 1137.127013]        [<ffffffff812444f6>] blkdev_get_by_dev+0x36/0x50
      [ 1137.127013]        [<ffffffff81a46d65>] lock_rdev+0x35/0x80
      [ 1137.127013]        [<ffffffff81a49bb4>] md_import_device+0xb4/0x1b0
      [ 1137.127013]        [<ffffffff81a513d6>] md_ioctl+0x2f6/0x1f50
      [ 1137.127013]        [<ffffffff815909b3>] blkdev_ioctl+0x283/0xa30
      [ 1137.127013]        [<ffffffff81242bf1>] block_ioctl+0x41/0x50
      [ 1137.127013]        [<ffffffff81214c96>] do_vfs_ioctl+0x96/0x6e0
      [ 1137.127013]        [<ffffffff81215321>] SyS_ioctl+0x41/0x70
      [ 1137.127013]        [<ffffffff81c56825>] entry_SYSCALL_64_fastpath+0x18/0xa8
      [ 1137.127013]
      other info that might help us debug this:
      
      [ 1137.127013]  Possible unsafe locking scenario:
      
      [ 1137.127013]        CPU0                    CPU1
      [ 1137.127013]        ----                    ----
      [ 1137.127013]   lock(detected_devices_mutex);
      [ 1137.127013]                                lock(&bdev->bd_mutex);
      [ 1137.127013]                                lock(detected_devices_mutex);
      [ 1137.127013]   lock(&bdev->bd_mutex);
      [ 1137.127013]
       *** DEADLOCK ***
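      
      A minimal sketch of the fix in the autostart loop, using the lock
      and function names from the trace above:
      
          /* drop detected_devices_mutex around md_import_device(), which
           * takes bd_mutex via blkdev_get() - the inverse of the
           * bd_mutex -> detected_devices_mutex order taken by
           * rescan_partitions() */
          mutex_unlock(&detected_devices_mutex);
          rdev = md_import_device(dev, 0, 90);
          mutex_lock(&detected_devices_mutex);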
      
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
    • md-cluster: clean related infos of cluster · c20c33f0
      Committed by Guoqing Jiang
      cluster_info and bitmap_info.nodes also need to be
      cleared when the array is stopped.
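      
      A sketch of the cleanup, assuming it sits in the array teardown
      path (e.g. md_clean()) next to the other per-array resets:
      
          /* forget cluster state so a later re-assembly starts clean */
          mddev->cluster_info = NULL;
          mddev->bitmap_info.nodes = 0;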
      Reviewed-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
    • md: changes for MD_STILL_CLOSED flag · af8d8e6f
      Committed by Guoqing Jiang
      When stopping a clustered raid while it is pending on resync,
      the MD_STILL_CLOSED flag can be cleared because the udev rule
      is triggered to open the mddev. The array then can't be stopped
      soon and returns EBUSY.
      
          mdadm -Ss                    md-raid-arrays.rules
          set MD_STILL_CLOSED          md_open()
          ...                          clear MD_STILL_CLOSED
          do_md_stop
      
      We make the changes below to resolve this issue:
      
      1. rename MD_STILL_CLOSED to MD_CLOSING, since it is set
         when stopping the array and means we are closing it.
      2. let md_open return early if MD_CLOSING is set, so no
         other thread can open the array while one thread is trying
         to close it (see the sketch after this list).
      3. no need to clear the MD_CLOSING bit in md_open, because
         change 2 already keeps an open from succeeding while the
         bit is set; we then also don't need to test the bit in
         do_md_stop.
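      
      A minimal sketch of the early return, assuming the renamed flag
      from change 1:
      
          static int md_open(struct block_device *bdev, fmode_t mode)
          {
                  struct mddev *mddev = bdev->bd_disk->private_data;
      
                  /* refuse new opens while the array is being stopped */
                  if (test_bit(MD_CLOSING, &mddev->flags))
                          return -EBUSY;
                  /* ... normal open path continues here ... */
          }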
      Reviewed-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
    • md-cluster: call md_kick_rdev_from_array once ack failed · e566aef1
      Committed by Guoqing Jiang
      new_disk_ack can return failure if WAITING_FOR_NEWDISK
      is not set, so we need to kick the rdev from the array if that
      failure happens.
      
      We also missed checking err before calling new_disk_ack, otherwise
      we could kick an rdev which isn't in the array; thanks to Shaohua
      for the reminder.
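      
      A sketch of the corrected sequence at the md-cluster call site
      (a simplification of the path described above):
      
          /* only ack the newly added disk if the add itself succeeded */
          if (!err) {
                  err = md_cluster_ops->new_disk_ack(mddev, true);
                  /* the ack can fail (e.g. WAITING_FOR_NEWDISK not
                   * set); in that case drop the rdev from the array */
                  if (err)
                          md_kick_rdev_from_array(rdev);
          }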
      Reviewed-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
  6. 09 Sep 2016, 1 commit
  7. 25 Aug 2016, 1 commit
  8. 18 Aug 2016, 2 commits
  9. 17 Aug 2016, 1 commit
  10. 08 Aug 2016, 1 commit
    • block: rename bio bi_rw to bi_opf · 1eff9d32
      Committed by Jens Axboe
      Since commit 63a4cc24, bio->bi_rw contains flags in the lower
      portion and the op code in the higher portions. This means that
      old code that relies on manually setting bi_rw is most likely
      going to be broken. Instead of letting that brokenness linger,
      rename the member to force old and out-of-tree code to break
      at compile time instead of at runtime.
      
      No intended functional changes in this commit.
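      
      A sketch of reading the renamed field with the accessors of that
      kernel era (the handler name is hypothetical):
      
          /* the op lives in the high bits, flags in the low bits */
          if (bio_op(bio) == REQ_OP_WRITE && (bio->bi_opf & REQ_SYNC))
                  handle_sync_write(bio);   /* hypothetical handler */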
      Signed-off-by: Jens Axboe <axboe@fb.com>
  11. 29 Jul 2016, 1 commit
  12. 20 Jul 2016, 3 commits
  13. 14 Jun 2016, 3 commits
  14. 10 Jun 2016, 1 commit
    • md: use a mutex to protect a global list · 5b1f5bc3
      Committed by Cong Wang
      We saw list corruption in the all_detected_devices list:
      
       WARNING: CPU: 16 PID: 226 at lib/list_debug.c:29 __list_add+0x3c/0xa9()
       list_add corruption. next->prev should be prev (ffff880859d58320), but was ffff880859ce74c0. (next=ffffffff81abfdb0).
       Modules linked in: ahci libahci libata sd_mod scsi_mod
       CPU: 16 PID: 226 Comm: kworker/u241:4 Not tainted 4.1.20 #1
       Hardware name: Dell Inc. PowerEdge C6220/04GD66, BIOS 2.2.3 11/07/2013
       Workqueue: events_unbound async_run_entry_fn
        0000000000000000 ffff880859a5baf8 ffffffff81502872 ffff880859a5bb48
        0000000000000009 ffff880859a5bb38 ffffffff810692a5 ffff880859ee8828
        ffffffff812ad02c ffff880859d58320 ffffffff81abfdb0 ffff880859eb90c0
       Call Trace:
        [<ffffffff81502872>] dump_stack+0x4d/0x63
        [<ffffffff810692a5>] warn_slowpath_common+0xa1/0xbb
        [<ffffffff812ad02c>] ? __list_add+0x3c/0xa9
        [<ffffffff81069305>] warn_slowpath_fmt+0x46/0x48
        [<ffffffff812ad02c>] __list_add+0x3c/0xa9
        [<ffffffff81406f28>] md_autodetect_dev+0x41/0x62
        [<ffffffff81285862>] rescan_partitions+0x25f/0x29d
        [<ffffffff81506372>] ? mutex_lock+0x13/0x31
        [<ffffffff811a090f>] __blkdev_get+0x1aa/0x3cd
        [<ffffffff811a0b91>] blkdev_get+0x5f/0x294
        [<ffffffff81377ceb>] ? put_device+0x17/0x19
        [<ffffffff8128227c>] ? disk_put_part+0x12/0x14
        [<ffffffff812836f3>] add_disk+0x29d/0x407
        [<ffffffff81384345>] ? __pm_runtime_use_autosuspend+0x5c/0x64
        [<ffffffffa004a724>] sd_probe_async+0x115/0x1af [sd_mod]
        [<ffffffff81083177>] async_run_entry_fn+0x72/0x12c
        [<ffffffff8107c44c>] process_one_work+0x198/0x2ce
        [<ffffffff8107cac7>] worker_thread+0x1dd/0x2bb
        [<ffffffff8107c8ea>] ? cancel_delayed_work_sync+0x15/0x15
        [<ffffffff8107c8ea>] ? cancel_delayed_work_sync+0x15/0x15
        [<ffffffff81080d9c>] kthread+0xae/0xb6
        [<ffffffff81080000>] ? param_array_set+0x40/0xfa
        [<ffffffff81080cee>] ? __kthread_parkme+0x61/0x61
        [<ffffffff81508152>] ret_from_fork+0x42/0x70
        [<ffffffff81080cee>] ? __kthread_parkme+0x61/0x61
      
      I suspect this is because there is no lock protecting this
      global list; autostart_arrays() is called in the ioctl() path,
      where no lock is held.
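      
      A minimal sketch of the fix, using the list and function names
      from the trace above:
      
          static LIST_HEAD(all_detected_devices);
          static DEFINE_MUTEX(detected_devices_mutex);
      
          void md_autodetect_dev(dev_t dev)
          {
                  struct detected_devices_node *node;
      
                  node = kzalloc(sizeof(*node), GFP_KERNEL);
                  if (node) {
                          node->dev = dev;
                          /* serialize against autostart_arrays(), which
                           * walks the same list from the ioctl() path */
                          mutex_lock(&detected_devices_mutex);
                          list_add_tail(&node->list, &all_detected_devices);
                          mutex_unlock(&detected_devices_mutex);
                  }
          }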
      
      Cc: Shaohua Li <shli@kernel.org>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
  15. 08 Jun 2016, 3 commits
  16. 04 Jun 2016, 2 commits
    • md: simplify the code with md_kick_rdev_from_array · db767672
      Committed by Guoqing Jiang
      Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
    • md-cluster: fix deadlock issue when adding a disk to a recovering array · bb8bf15b
      Committed by Guoqing Jiang
      Adding a disk to an array which is performing recovery
      is a little complicated: we need to both reap the
      sync thread and perform the disk addition, and that
      caused the deadlock below.
      
      linux44:~ # ps aux|grep md|grep D
      root      1822  0.0  0.0      0     0 ?        D    16:50   0:00 [md127_resync]
      root      1848  0.0  0.0  19860   952 pts/0    D+   16:50   0:00 mdadm --manage /dev/md127 --re-add /dev/vdb
      linux44:~ # cat /proc/1848/stack
      [<ffffffff8107afde>] kthread_stop+0x6e/0x120
      [<ffffffffa051ddb0>] md_unregister_thread+0x40/0x80 [md_mod]
      [<ffffffffa0526e45>] md_reap_sync_thread+0x15/0x150 [md_mod]
      [<ffffffffa05271e0>] action_store+0x260/0x270 [md_mod]
      [<ffffffffa05206b4>] md_attr_store+0xb4/0x100 [md_mod]
      [<ffffffff81214a7e>] sysfs_write_file+0xbe/0x140
      [<ffffffff811a6b98>] vfs_write+0xb8/0x1e0
      [<ffffffff811a75b8>] SyS_write+0x48/0xa0
      [<ffffffff8152a5c9>] system_call_fastpath+0x16/0x1b
      [<00007f068ea1ed30>] 0x7f068ea1ed30
      linux44:~ # cat /proc/1822/stack
      [<ffffffffa05251a6>] md_do_sync+0x846/0xf40 [md_mod]
      [<ffffffffa052402d>] md_thread+0x16d/0x180 [md_mod]
      [<ffffffff8107ad94>] kthread+0xb4/0xc0
      [<ffffffff8152a518>] ret_from_fork+0x58/0x90
      
          Task1848                                Task1822
          md_attr_store (holds reconfig_mutex
                         via mddev_lock())
            action_store
              md_reap_sync_thread
                md_unregister_thread
                  kthread_stop                    md_wakeup_thread(mddev->thread);
                                                  wait_event(mddev->sb_wait,
                                                      !test_bit(MD_CHANGE_PENDING))
      
      md_check_recovery is triggered by the wakeup of mddev->thread,
      but it can't clear the MD_CHANGE_PENDING flag, since it can't
      take the lock that md_attr_store already holds.
      
      To solve the deadlock, we move "->resync_finish()"
      from md_do_sync to md_reap_sync_thread (after md_update_sb);
      MD_HELD_RESYNC_LOCK is also introduced, since it is possible
      that a node can't get the resync lock in md_do_sync.
      
      Then we do not need to wait for MD_CHANGE_PENDING to be cleared,
      since the metadata has already been updated by md_update_sb;
      just call resync_finish if MD_HELD_RESYNC_LOCK is set (see the
      sketch below).
      
      We also unified the code after the "skip" label, since setting
      PENDING for the non-clustered case should be harmless.
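      
      A sketch of the reap-side change, assuming the flag name used in
      the text above:
      
          /* in md_reap_sync_thread(), after md_update_sb() has written
           * the metadata: release the cluster-wide resync lock only if
           * this node actually took it in md_do_sync() */
          if (mddev_is_clustered(mddev) &&
              test_and_clear_bit(MD_HELD_RESYNC_LOCK, &mddev->flags))
                  md_cluster_ops->resync_finish(mddev);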
      Reviewed-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
  17. 10 May 2016, 2 commits
    • md: set MD_CHANGE_PENDING in an atomic region · 85ad1d13
      Committed by Guoqing Jiang
      Some code waits for a metadata update by:
      
      1. flagging that it is needed (MD_CHANGE_DEVS or MD_CHANGE_CLEAN)
      2. setting MD_CHANGE_PENDING and waking the management thread
      3. waiting for MD_CHANGE_PENDING to be cleared
      
      If the first two are done without locking, the code in md_update_sb()
      which checks if it needs to repeat might test if an update is needed
      before step 1, then clear MD_CHANGE_PENDING after step 2, resulting
      in the wait returning early.
      
      So make sure all places that set MD_CHANGE_PENDING are atomic, and
      bit_clear_unless (suggested by Neil) is introduced for that purpose.
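      
      A sketch of the helper's semantics - a cmpxchg loop close to what
      was merged into <linux/bitops.h> (rendered here for illustration):
      
          /* atomically clear the bits in 'clear' from *ptr, but only
           * while none of the bits in 'test' are set; returns true if
           * the clear happened */
          #define bit_clear_unless(ptr, _clear, _test)                  \
          ({                                                            \
                  const typeof(*ptr) clear = (_clear), test = (_test);  \
                  typeof(*ptr) old, new;                                \
                                                                        \
                  do {                                                  \
                          old = READ_ONCE(*ptr);                        \
                          new = old & ~clear;                           \
                  } while (!(old & test) &&                             \
                           cmpxchg(ptr, old, new) != old);              \
                                                                        \
                  !(old & test);                                        \
          })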
      
      Cc: Martin Kepplinger <martink@posteo.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: <linux-kernel@vger.kernel.org>
      Reviewed-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
    • md: md.c: fix oops in mddev_suspend for raid0 · 092398dc
      Committed by Heinz Mauelshagen
      Introduced by upstream commit 70d9798b
      
      The raid0 personality does not create mddev->thread, as opposed to
      other personalities, leading to an unconditional access to it in
      mddev_suspend() causing an oops.
      
      The patch checks for mddev->thread in order to keep the
      intention of the aforementioned commit.
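      
      A sketch of the guarded warning, assuming it sits at the top of
      mddev_suspend():
      
          /* raid0 has no mddev->thread, so only warn about a
           * personality thread suspending itself when one exists */
          WARN_ON_ONCE(mddev->thread && current == mddev->thread->tsk);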
      
      Fixes: 70d9798b ("MD: warn for potential deadlock")
      Cc: stable@vger.kernel.org (4.5+)
      Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
  18. 05 May 2016, 4 commits
  19. 26 Apr 2016, 1 commit
  20. 13 Apr 2016, 1 commit
  21. 01 Apr 2016, 2 commits
    • MD: add rdev reference for super write · ed3b98c7
      Committed by Shaohua Li
      Xiao Ni reported the crash below:
      [26396.335146] BUG: unable to handle kernel NULL pointer dereference at 00000000000002a8
      [26396.342990] IP: [<ffffffffa0425b00>] super_written+0x20/0x80 [md_mod]
      [26396.349449] PGD 0
      [26396.351468] Oops: 0002 [#1] SMP
      [26396.354898] Modules linked in: ext4 mbcache jbd2 raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_td
      [26396.408404] CPU: 5 PID: 3261 Comm: loop0 Not tainted 4.5.0 #1
      [26396.414140] Hardware name: Dell Inc. PowerEdge R715/0G2DP3, BIOS 3.2.2 09/15/2014
      [26396.421608] task: ffff8808339be680 ti: ffff8808365f4000 task.ti: ffff8808365f4000
      [26396.429074] RIP: 0010:[<ffffffffa0425b00>]  [<ffffffffa0425b00>] super_written+0x20/0x80 [md_mod]
      [26396.437952] RSP: 0018:ffff8808365f7c38  EFLAGS: 00010046
      [26396.443252] RAX: ffffffffa0425ae0 RBX: ffff8804336a7900 RCX: ffffe8f9f7b41198
      [26396.450371] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8804336a7900
      [26396.457489] RBP: ffff8808365f7c50 R08: 0000000000000005 R09: 00001801e02ce3d7
      [26396.464608] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
      [26396.471728] R13: ffff8808338d9a00 R14: 0000000000000000 R15: ffff880833f9fe00
      [26396.478849] FS:  00007f9e5066d740(0000) GS:ffff880237b40000(0000) knlGS:0000000000000000
      [26396.486922] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [26396.492656] CR2: 00000000000002a8 CR3: 00000000019ea000 CR4: 00000000000006e0
      [26396.499775] Stack:
      [26396.501781]  ffff8804336a7900 0000000000000000 0000000000000000 ffff8808365f7c68
      [26396.509199]  ffffffff81308cd0 ffff8804336a7900 ffff8808365f7ca8 ffffffff81310637
      [26396.516618]  00000000a0233a00 ffff880833f9fe00 0000000000000000 ffff880833fb0000
      [26396.524038] Call Trace:
      [26396.526485]  [<ffffffff81308cd0>] bio_endio+0x40/0x60
      [26396.531529]  [<ffffffff81310637>] blk_update_request+0x87/0x320
      [26396.537439]  [<ffffffff8131a20a>] blk_mq_end_request+0x1a/0x70
      [26396.543261]  [<ffffffff81313889>] blk_flush_complete_seq+0xd9/0x2a0
      [26396.549517]  [<ffffffff81313ccf>] flush_end_io+0x15f/0x240
      [26396.554993]  [<ffffffff8131a22a>] blk_mq_end_request+0x3a/0x70
      [26396.560815]  [<ffffffff8131a314>] __blk_mq_complete_request+0xb4/0xe0
      [26396.567246]  [<ffffffff8131a35c>] blk_mq_complete_request+0x1c/0x20
      [26396.573506]  [<ffffffffa04182df>] loop_queue_work+0x6f/0x72c [loop]
      [26396.579764]  [<ffffffff81697844>] ? __schedule+0x2b4/0x8f0
      [26396.585242]  [<ffffffff810a7812>] kthread_worker_fn+0x52/0x170
      [26396.591065]  [<ffffffff810a77c0>] ? kthread_create_on_node+0x1a0/0x1a0
      [26396.597582]  [<ffffffff810a7238>] kthread+0xd8/0xf0
      [26396.602453]  [<ffffffff810a7160>] ? kthread_park+0x60/0x60
      [26396.607929]  [<ffffffff8169bdcf>] ret_from_fork+0x3f/0x70
      [26396.613319]  [<ffffffff810a7160>] ? kthread_park+0x60/0x60
      
      md_super_write() and the corresponding md_super_wait() are generally
      called with reconfig_mutex held, which prevents the disk from
      disappearing. There is one case where this rule is broken:
      write_sb_page in bitmap.c doesn't hold the mutex. next_active_rdev
      does increase the rdev reference, but it decreases the reference too
      early (i.e. before the IO finishes), so the disk can disappear in
      that window. Unconditionally increase the rdev reference in
      md_super_write() to avoid the race.
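      
      A sketch of the pairing described above between submission and the
      bio end_io callback:
      
          /* md_super_write(): pin the rdev for the duration of the IO */
          atomic_inc(&rdev->nr_pending);
      
          /* super_written() (bio end_io): drop the pin after completion */
          rdev_dec_pending(rdev, mddev);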
      Reported-and-tested-by: Xiao Ni <xni@redhat.com>
      Reviewed-by: Neil Brown <neilb@suse.de>
      Signed-off-by: Shaohua Li <shli@fb.com>
    • md: fix a trivial typo in comments · 466ad292
      Committed by Wei Fang
      Fix a trivial typo in md_ioctl().
      Signed-off-by: Wei Fang <fangwei1@huawei.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
  22. 27 Feb 2016, 2 commits
    • MD: warn for potential deadlock · 70d9798b
      Committed by Shaohua Li
      The personality thread shouldn't call mddev_suspend(), because
      mddev_suspend() waits for all IO to finish, but IO is handled in the
      personality thread, so this could deadlock. To catch this early, add
      a warning if mddev_suspend() is called from the personality thread.
      Suggested-by: NeilBrown <neilb@suse.com>
      Cc: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
    • md: Drop sending a change uevent when stopping · 399146b8
      Committed by Sebastian Parschauer
      When stopping an MD device, its device node /dev/mdX may still
      exist afterwards, or it is recreated by udev. The next open() call
      can then lead to the creation of an inoperable MD device. The
      reason is that a change event (KOBJ_CHANGE) is sent to udev, which
      races against the remove event (KOBJ_REMOVE) from md_free().
      So drop sending the change event.
      
      A change is likely also required in mdadm, as many versions send the
      change event to udev as well.
      
      Neil mentioned that the change event was a workaround for an old
      kernel, commit 934d9c23 ("md: destroy partitions and notify udev
      when md array is stopped."). New mdadm can handle device removal
      now, so this isn't required any more.
      
      Cc: NeilBrown <neilb@suse.com>
      Cc: Hannes Reinecke <hare@suse.de>
      Cc: Jes Sorensen <Jes.Sorensen@redhat.com>
      Signed-off-by: Sebastian Parschauer <sebastian.riemer@profitbricks.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
  23. 14 Jan 2016, 1 commit
    • md/raid: only permit hot-add of compatible integrity profiles · 1501efad
      Committed by Dan Williams
      It is not safe for an integrity profile to be changed while i/o is
      in flight in the queue. Prevent adding new disks, or otherwise
      onlining spares, to an array if the device has an incompatible
      integrity profile.
      
      The original change to the blk_integrity_unregister implementation
      in md, commit c7bfced9 ("md: suspend i/o during runtime
      blk_integrity_unregister"), introduced an immediate hang regression.
      
      This policy of disallowing changes to the integrity profile once one
      has been established is shared with DM.
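      
      A minimal sketch of the compatibility check on hot-add, assuming it
      sits in md's integrity hook for new rdevs (blk_integrity_compare()
      is the block-layer helper of that era):
      
          char name[BDEVNAME_SIZE];
      
          /* refuse the new disk if its integrity profile does not match
           * the one already registered for the array's gendisk */
          if (blk_integrity_compare(mddev->gendisk,
                                    rdev->bdev->bd_disk) != 0) {
                  pr_err("%s: incompatible integrity profile for %s\n",
                         mdname(mddev), bdevname(rdev->bdev, name));
                  return -ENXIO;
          }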
      
      Here is an abbreviated log from a test run that:
      1/ Creates a degraded raid1 with an integrity-enabled device (pmem0s) [   59.076127]
      2/ Tries to add an integrity-disabled device (pmem1m) [   90.489209]
      3/ Retries with an integrity-enabled device (pmem1s) [  205.671277]
      
      [   59.076127] md/raid1:md0: active with 1 out of 2 mirrors
      [   59.078302] md: data integrity enabled on md0
      [..]
      [   90.489209] md0: incompatible integrity profile for pmem1m
      [..]
      [  205.671277] md: super_written gets error=-5
      [  205.677386] md/raid1:md0: Disk failure on pmem1m, disabling device.
      [  205.677386] md/raid1:md0: Operation continuing on 1 devices.
      [  205.683037] RAID1 conf printout:
      [  205.684699]  --- wd:1 rd:2
      [  205.685972]  disk 0, wo:0, o:1, dev:pmem0s
      [  205.687562]  disk 1, wo:1, o:1, dev:pmem1s
      [  205.691717] md: recovery of RAID array md0
      
      Fixes: c7bfced9 ("md: suspend i/o during runtime blk_integrity_unregister")
      Cc: <stable@vger.kernel.org>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Reported-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: NeilBrown <neilb@suse.com>