1. 22 9月, 2016 3 次提交
  2. 09 9月, 2016 1 次提交
  3. 25 8月, 2016 1 次提交
  4. 18 8月, 2016 2 次提交
  5. 17 8月, 2016 1 次提交
  6. 08 8月, 2016 1 次提交
    • J
      block: rename bio bi_rw to bi_opf · 1eff9d32
      Jens Axboe 提交于
      Since commit 63a4cc24, bio->bi_rw contains flags in the lower
      portion and the op code in the higher portions. This means that
      old code that relies on manually setting bi_rw is most likely
      going to be broken. Instead of letting that brokeness linger,
      rename the member, to force old and out-of-tree code to break
      at compile time instead of at runtime.
      
      No intended functional changes in this commit.
      Signed-off-by: NJens Axboe <axboe@fb.com>
      1eff9d32
  7. 29 7月, 2016 1 次提交
  8. 20 7月, 2016 3 次提交
  9. 14 6月, 2016 3 次提交
  10. 10 6月, 2016 1 次提交
    • C
      md: use a mutex to protect a global list · 5b1f5bc3
      Cong Wang 提交于
      We saw a list corruption in the list all_detected_devices:
      
       WARNING: CPU: 16 PID: 226 at lib/list_debug.c:29 __list_add+0x3c/0xa9()
       list_add corruption. next->prev should be prev (ffff880859d58320), but was ffff880859ce74c0. (next=ffffffff81abfdb0).
       Modules linked in: ahci libahci libata sd_mod scsi_mod
       CPU: 16 PID: 226 Comm: kworker/u241:4 Not tainted 4.1.20 #1
       Hardware name: Dell Inc. PowerEdge C6220/04GD66, BIOS 2.2.3 11/07/2013
       Workqueue: events_unbound async_run_entry_fn
        0000000000000000 ffff880859a5baf8 ffffffff81502872 ffff880859a5bb48
        0000000000000009 ffff880859a5bb38 ffffffff810692a5 ffff880859ee8828
        ffffffff812ad02c ffff880859d58320 ffffffff81abfdb0 ffff880859eb90c0
       Call Trace:
        [<ffffffff81502872>] dump_stack+0x4d/0x63
        [<ffffffff810692a5>] warn_slowpath_common+0xa1/0xbb
        [<ffffffff812ad02c>] ? __list_add+0x3c/0xa9
        [<ffffffff81069305>] warn_slowpath_fmt+0x46/0x48
        [<ffffffff812ad02c>] __list_add+0x3c/0xa9
        [<ffffffff81406f28>] md_autodetect_dev+0x41/0x62
        [<ffffffff81285862>] rescan_partitions+0x25f/0x29d
        [<ffffffff81506372>] ? mutex_lock+0x13/0x31
        [<ffffffff811a090f>] __blkdev_get+0x1aa/0x3cd
        [<ffffffff811a0b91>] blkdev_get+0x5f/0x294
        [<ffffffff81377ceb>] ? put_device+0x17/0x19
        [<ffffffff8128227c>] ? disk_put_part+0x12/0x14
        [<ffffffff812836f3>] add_disk+0x29d/0x407
        [<ffffffff81384345>] ? __pm_runtime_use_autosuspend+0x5c/0x64
        [<ffffffffa004a724>] sd_probe_async+0x115/0x1af [sd_mod]
        [<ffffffff81083177>] async_run_entry_fn+0x72/0x12c
        [<ffffffff8107c44c>] process_one_work+0x198/0x2ce
        [<ffffffff8107cac7>] worker_thread+0x1dd/0x2bb
        [<ffffffff8107c8ea>] ? cancel_delayed_work_sync+0x15/0x15
        [<ffffffff8107c8ea>] ? cancel_delayed_work_sync+0x15/0x15
        [<ffffffff81080d9c>] kthread+0xae/0xb6
        [<ffffffff81080000>] ? param_array_set+0x40/0xfa
        [<ffffffff81080cee>] ? __kthread_parkme+0x61/0x61
        [<ffffffff81508152>] ret_from_fork+0x42/0x70
        [<ffffffff81080cee>] ? __kthread_parkme+0x61/0x61
      
      I suspect it is because there is no lock protecting this
      global list, autostart_arrays() is called in ioctl() path
      where there is no lock.
      
      Cc: Shaohua Li <shli@kernel.org>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NShaohua Li <shli@fb.com>
      5b1f5bc3
  11. 08 6月, 2016 3 次提交
  12. 04 6月, 2016 2 次提交
    • G
      md: simplify the code with md_kick_rdev_from_array · db767672
      Guoqing Jiang 提交于
      Signed-off-by: NGuoqing Jiang <gqjiang@suse.com>
      Signed-off-by: NShaohua Li <shli@fb.com>
      db767672
    • G
      md-cluster: fix deadlock issue when add disk to an recoverying array · bb8bf15b
      Guoqing Jiang 提交于
      Add a disk to an array which is performing recovery
      is a little complicated, we need to do both reap the
      sync thread and perform add disk for the case, then
      it caused deadlock as follows.
      
      linux44:~ # ps aux|grep md|grep D
      root      1822  0.0  0.0      0     0 ?        D    16:50   0:00 [md127_resync]
      root      1848  0.0  0.0  19860   952 pts/0    D+   16:50   0:00 mdadm --manage /dev/md127 --re-add /dev/vdb
      linux44:~ # cat /proc/1848/stack
      [<ffffffff8107afde>] kthread_stop+0x6e/0x120
      [<ffffffffa051ddb0>] md_unregister_thread+0x40/0x80 [md_mod]
      [<ffffffffa0526e45>] md_reap_sync_thread+0x15/0x150 [md_mod]
      [<ffffffffa05271e0>] action_store+0x260/0x270 [md_mod]
      [<ffffffffa05206b4>] md_attr_store+0xb4/0x100 [md_mod]
      [<ffffffff81214a7e>] sysfs_write_file+0xbe/0x140
      [<ffffffff811a6b98>] vfs_write+0xb8/0x1e0
      [<ffffffff811a75b8>] SyS_write+0x48/0xa0
      [<ffffffff8152a5c9>] system_call_fastpath+0x16/0x1b
      [<00007f068ea1ed30>] 0x7f068ea1ed30
      linux44:~ # cat /proc/1822/stack
      [<ffffffffa05251a6>] md_do_sync+0x846/0xf40 [md_mod]
      [<ffffffffa052402d>] md_thread+0x16d/0x180 [md_mod]
      [<ffffffff8107ad94>] kthread+0xb4/0xc0
      [<ffffffff8152a518>] ret_from_fork+0x58/0x90
      
                              Task1848                                Task1822
      md_attr_store (held reconfig_mutex by call mddev_lock())
                              action_store
      			md_reap_sync_thread
      			md_unregister_thread
      			kthread_stop                    md_wakeup_thread(mddev->thread);
      						wait_event(mddev->sb_wait, !test_bit(MD_CHANGE_PENDING))
      
      md_check_recovery is triggered by wakeup mddev->thread,
      but it can't clear MD_CHANGE_PENDING flag since it can't
      get lock which was held by md_attr_store already.
      
      To solve the deadlock problem, we move "->resync_finish()"
      from md_do_sync to md_reap_sync_thread (after md_update_sb),
      also MD_HELD_RESYNC_LOCK is introduced since it is possible
      that node can't get resync lock in md_do_sync.
      
      Then we do not need to wait for MD_CHANGE_PENDING is cleared
      or not since metadata should be updated after md_update_sb,
      so just call resync_finish if MD_HELD_RESYNC_LOCK is set.
      
      We also unified the code after skip label, since set PENDING
      for non-clustered case should be harmless.
      Reviewed-by: NNeilBrown <neilb@suse.com>
      Signed-off-by: NGuoqing Jiang <gqjiang@suse.com>
      Signed-off-by: NShaohua Li <shli@fb.com>
      bb8bf15b
  13. 10 5月, 2016 2 次提交
    • G
      md: set MD_CHANGE_PENDING in a atomic region · 85ad1d13
      Guoqing Jiang 提交于
      Some code waits for a metadata update by:
      
      1. flagging that it is needed (MD_CHANGE_DEVS or MD_CHANGE_CLEAN)
      2. setting MD_CHANGE_PENDING and waking the management thread
      3. waiting for MD_CHANGE_PENDING to be cleared
      
      If the first two are done without locking, the code in md_update_sb()
      which checks if it needs to repeat might test if an update is needed
      before step 1, then clear MD_CHANGE_PENDING after step 2, resulting
      in the wait returning early.
      
      So make sure all places that set MD_CHANGE_PENDING are atomicial, and
      bit_clear_unless (suggested by Neil) is introduced for the purpose.
      
      Cc: Martin Kepplinger <martink@posteo.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: <linux-kernel@vger.kernel.org>
      Reviewed-by: NNeilBrown <neilb@suse.com>
      Signed-off-by: NGuoqing Jiang <gqjiang@suse.com>
      Signed-off-by: NShaohua Li <shli@fb.com>
      85ad1d13
    • H
      md: md.c: fix oops in mddev_suspend for raid0 · 092398dc
      Heinz Mauelshagen 提交于
      Introduced by upstream commit 70d9798b
      
      The raid0 personality does not create mddev->thread as oposed to
      other personalities leading to its unconditional access in
      mddev_suspend() causing an oops.
      
      Patch checks for mddev->thread in order to keep the
      intention of aforementioned commit.
      
      Fixes: 70d9798b ("MD: warn for potential deadlock")
      Cc: stable@vger.kernel.org (4.5+)
      Signed-off-by: NHeinz Mauelshagen <heinzm@redhat.com>
      Signed-off-by: NShaohua Li <shli@fb.com>
      092398dc
  14. 05 5月, 2016 4 次提交
  15. 26 4月, 2016 1 次提交
  16. 13 4月, 2016 1 次提交
  17. 01 4月, 2016 2 次提交
    • S
      MD: add rdev reference for super write · ed3b98c7
      Shaohua Li 提交于
      Xiao Ni reported below crash:
      [26396.335146] BUG: unable to handle kernel NULL pointer dereference at 00000000000002a8
      [26396.342990] IP: [<ffffffffa0425b00>] super_written+0x20/0x80 [md_mod]
      [26396.349449] PGD 0
      [26396.351468] Oops: 0002 [#1] SMP
      [26396.354898] Modules linked in: ext4 mbcache jbd2 raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_td
      [26396.408404] CPU: 5 PID: 3261 Comm: loop0 Not tainted 4.5.0 #1
      [26396.414140] Hardware name: Dell Inc. PowerEdge R715/0G2DP3, BIOS 3.2.2 09/15/2014
      [26396.421608] task: ffff8808339be680 ti: ffff8808365f4000 task.ti: ffff8808365f4000
      [26396.429074] RIP: 0010:[<ffffffffa0425b00>]  [<ffffffffa0425b00>] super_written+0x20/0x80 [md_mod]
      [26396.437952] RSP: 0018:ffff8808365f7c38  EFLAGS: 00010046
      [26396.443252] RAX: ffffffffa0425ae0 RBX: ffff8804336a7900 RCX: ffffe8f9f7b41198
      [26396.450371] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8804336a7900
      [26396.457489] RBP: ffff8808365f7c50 R08: 0000000000000005 R09: 00001801e02ce3d7
      [26396.464608] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
      [26396.471728] R13: ffff8808338d9a00 R14: 0000000000000000 R15: ffff880833f9fe00
      [26396.478849] FS:  00007f9e5066d740(0000) GS:ffff880237b40000(0000) knlGS:0000000000000000
      [26396.486922] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [26396.492656] CR2: 00000000000002a8 CR3: 00000000019ea000 CR4: 00000000000006e0
      [26396.499775] Stack:
      [26396.501781]  ffff8804336a7900 0000000000000000 0000000000000000 ffff8808365f7c68
      [26396.509199]  ffffffff81308cd0 ffff8804336a7900 ffff8808365f7ca8 ffffffff81310637
      [26396.516618]  00000000a0233a00 ffff880833f9fe00 0000000000000000 ffff880833fb0000
      [26396.524038] Call Trace:
      [26396.526485]  [<ffffffff81308cd0>] bio_endio+0x40/0x60
      [26396.531529]  [<ffffffff81310637>] blk_update_request+0x87/0x320
      [26396.537439]  [<ffffffff8131a20a>] blk_mq_end_request+0x1a/0x70
      [26396.543261]  [<ffffffff81313889>] blk_flush_complete_seq+0xd9/0x2a0
      [26396.549517]  [<ffffffff81313ccf>] flush_end_io+0x15f/0x240
      [26396.554993]  [<ffffffff8131a22a>] blk_mq_end_request+0x3a/0x70
      [26396.560815]  [<ffffffff8131a314>] __blk_mq_complete_request+0xb4/0xe0
      [26396.567246]  [<ffffffff8131a35c>] blk_mq_complete_request+0x1c/0x20
      [26396.573506]  [<ffffffffa04182df>] loop_queue_work+0x6f/0x72c [loop]
      [26396.579764]  [<ffffffff81697844>] ? __schedule+0x2b4/0x8f0
      [26396.585242]  [<ffffffff810a7812>] kthread_worker_fn+0x52/0x170
      [26396.591065]  [<ffffffff810a77c0>] ? kthread_create_on_node+0x1a0/0x1a0
      [26396.597582]  [<ffffffff810a7238>] kthread+0xd8/0xf0
      [26396.602453]  [<ffffffff810a7160>] ? kthread_park+0x60/0x60
      [26396.607929]  [<ffffffff8169bdcf>] ret_from_fork+0x3f/0x70
      [26396.613319]  [<ffffffff810a7160>] ? kthread_park+0x60/0x60
      
      md_super_write() and corresponding md_super_wait() generally are called
      with reconfig_mutex locked, which prevents disk disappears. There is one
      case this rule is broken. write_sb_page of bitmap.c doesn't hold the
      mutex. next_active_rdev does increase rdev reference, but it decreases
      the reference too early (eg, before IO finish). disk can disappear at
      the window. We unconditionally increase rdev reference in
      md_super_write() to avoid the race.
      Reported-and-tested-by: NXiao Ni <xni@redhat.com>
      Reviewed-by: NNeil Brown <neilb@suse.de>
      Signed-off-by: NShaohua Li <shli@fb.com>
      ed3b98c7
    • W
      md: fix a trivial typo in comments · 466ad292
      Wei Fang 提交于
      Fix a trivial typo in md_ioctl().
      Signed-off-by: NWei Fang <fangwei1@huawei.com>
      Signed-off-by: NShaohua Li <shli@fb.com>
      466ad292
  18. 27 2月, 2016 2 次提交
    • S
      MD: warn for potential deadlock · 70d9798b
      Shaohua Li 提交于
      The personality thread shouldn't call mddev_suspend(). Because
      mddev_suspend() will for all IO finish, but IO is handled in personality
      thread, so this could cause deadlock. To trigger this early, add a
      warning if mddev_suspend() is called from personality thread.
      Suggested-by: NNeilBrown <neilb@suse.com>
      Cc: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
      Signed-off-by: NShaohua Li <shli@fb.com>
      70d9798b
    • S
      md: Drop sending a change uevent when stopping · 399146b8
      Sebastian Parschauer 提交于
      When stopping an MD device, then its device node /dev/mdX may still
      exist afterwards or it is recreated by udev. The next open() call
      can lead to creation of an inoperable MD device. The reason for
      this is that a change event (KOBJ_CHANGE) is sent to udev which
      races against the remove event (KOBJ_REMOVE) from md_free().
      So drop sending the change event.
      
      A change is likely also required in mdadm as many versions send the
      change event to udev as well.
      
      Neil mentioned the change event is a workaround for old kernel
      Commit: 934d9c23 ("md: destroy partitions and notify udev when md array is stopped.")
      new mdadm can handle device remove now, so this isn't required any more.
      
      Cc: NeilBrown <neilb@suse.com>
      Cc: Hannes Reinecke <hare@suse.de>
      Cc: Jes Sorensen <Jes.Sorensen@redhat.com>
      Signed-off-by: NSebastian Parschauer <sebastian.riemer@profitbricks.com>
      Signed-off-by: NShaohua Li <shli@fb.com>
      399146b8
  19. 14 1月, 2016 3 次提交
    • D
      md/raid: only permit hot-add of compatible integrity profiles · 1501efad
      Dan Williams 提交于
      It is not safe for an integrity profile to be changed while i/o is
      in-flight in the queue.  Prevent adding new disks or otherwise online
      spares to an array if the device has an incompatible integrity profile.
      
      The original change to the blk_integrity_unregister implementation in
      md, commmit c7bfced9 "md: suspend i/o during runtime
      blk_integrity_unregister" introduced an immediate hang regression.
      
      This policy of disallowing changes the integrity profile once one has
      been established is shared with DM.
      
      Here is an abbreviated log from a test run that:
      1/ Creates a degraded raid1 with an integrity-enabled device (pmem0s) [   59.076127]
      2/ Tries to add an integrity-disabled device (pmem1m) [   90.489209]
      3/ Retries with an integrity-enabled device (pmem1s) [  205.671277]
      
      [   59.076127] md/raid1:md0: active with 1 out of 2 mirrors
      [   59.078302] md: data integrity enabled on md0
      [..]
      [   90.489209] md0: incompatible integrity profile for pmem1m
      [..]
      [  205.671277] md: super_written gets error=-5
      [  205.677386] md/raid1:md0: Disk failure on pmem1m, disabling device.
      [  205.677386] md/raid1:md0: Operation continuing on 1 devices.
      [  205.683037] RAID1 conf printout:
      [  205.684699]  --- wd:1 rd:2
      [  205.685972]  disk 0, wo:0, o:1, dev:pmem0s
      [  205.687562]  disk 1, wo:1, o:1, dev:pmem1s
      [  205.691717] md: recovery of RAID array md0
      
      Fixes: c7bfced9 ("md: suspend i/o during runtime blk_integrity_unregister")
      Cc: <stable@vger.kernel.org>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Reported-by: NNeilBrown <neilb@suse.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      Signed-off-by: NNeilBrown <neilb@suse.com>
      1501efad
    • S
      MD: add journal with array suspended · 87d4d916
      Shaohua Li 提交于
      Hot add journal disk in recovery thread context brings a lot of trouble
      as IO could be running. Unlike spare disk hot add, adding journal disk
      with array suspended makes more sense and implmentation is much easier.
      Signed-off-by: NShaohua Li <shli@fb.com>
      Signed-off-by: NNeilBrown <neilb@suse.com>
      87d4d916
    • S
      md: set MD_HAS_JOURNAL in correct places · a62ab49e
      Shaohua Li 提交于
      Set MD_HAS_JOURNAL when a array is loaded or journal is initialized.
      This is to avoid the flags set too early in journal disk hotadd.
      Signed-off-by: NShaohua Li <shli@fb.com>
      Signed-off-by: NNeilBrown <neilb@suse.com>
      a62ab49e
  20. 10 1月, 2016 2 次提交
  21. 07 1月, 2016 1 次提交
    • N
      md: Remove 'ready' field from mddev. · 274d8cbd
      NeilBrown 提交于
      This field is always set in tandem with ->pers, and when it is tested
      ->pers is also tested.  So ->ready is not needed.
      
      It was needed once, but code rearrangement and locking changes have
      removed that needed.
      Signed-off-by: NNeilBrown <neilb@suse.com>
      274d8cbd