1. 08 May 2017, 1 commit
    • block/mq: fix potential deadlock during cpu hotplug · 51d638b1
      Authored by Wanpeng Li
      This can be triggered by hot-unplugging one CPU.
      
      ======================================================
       [ INFO: possible circular locking dependency detected ]
       4.11.0+ #17 Not tainted
       -------------------------------------------------------
       step_after_susp/2640 is trying to acquire lock:
        (all_q_mutex){+.+...}, at: [<ffffffffb33f95b8>] blk_mq_queue_reinit_work+0x18/0x110
      
       but task is already holding lock:
        (cpu_hotplug.lock){+.+.+.}, at: [<ffffffffb306d04f>] cpu_hotplug_begin+0x7f/0xe0
      
       which lock already depends on the new lock.
      
       the existing dependency chain (in reverse order) is:
      
       -> #1 (cpu_hotplug.lock){+.+.+.}:
              lock_acquire+0x11c/0x230
              __mutex_lock+0x92/0x990
              mutex_lock_nested+0x1b/0x20
              get_online_cpus+0x64/0x80
              blk_mq_init_allocated_queue+0x3a0/0x4e0
              blk_mq_init_queue+0x3a/0x60
              loop_add+0xe5/0x280
              loop_init+0x124/0x177
              do_one_initcall+0x53/0x1c0
              kernel_init_freeable+0x1e3/0x27f
              kernel_init+0xe/0x100
              ret_from_fork+0x31/0x40
      
       -> #0 (all_q_mutex){+.+...}:
              __lock_acquire+0x189a/0x18a0
              lock_acquire+0x11c/0x230
              __mutex_lock+0x92/0x990
              mutex_lock_nested+0x1b/0x20
              blk_mq_queue_reinit_work+0x18/0x110
              blk_mq_queue_reinit_dead+0x1c/0x20
              cpuhp_invoke_callback+0x1f2/0x810
              cpuhp_down_callbacks+0x42/0x80
              _cpu_down+0xb2/0xe0
              freeze_secondary_cpus+0xb6/0x390
              suspend_devices_and_enter+0x3b3/0xa40
              pm_suspend+0x129/0x490
              state_store+0x82/0xf0
              kobj_attr_store+0xf/0x20
              sysfs_kf_write+0x45/0x60
              kernfs_fop_write+0x135/0x1c0
              __vfs_write+0x37/0x160
              vfs_write+0xcd/0x1d0
              SyS_write+0x58/0xc0
              do_syscall_64+0x8f/0x710
              return_from_SYSCALL_64+0x0/0x7a
      
       other info that might help us debug this:
      
        Possible unsafe locking scenario:
      
              CPU0                    CPU1
              ----                    ----
         lock(cpu_hotplug.lock);
                                      lock(all_q_mutex);
                                      lock(cpu_hotplug.lock);
         lock(all_q_mutex);
      
        *** DEADLOCK ***
      
       8 locks held by step_after_susp/2640:
        #0:  (sb_writers#6){.+.+.+}, at: [<ffffffffb3244aed>] vfs_write+0x1ad/0x1d0
        #1:  (&of->mutex){+.+.+.}, at: [<ffffffffb32d3a51>] kernfs_fop_write+0x101/0x1c0
        #2:  (s_active#166){.+.+.+}, at: [<ffffffffb32d3a59>] kernfs_fop_write+0x109/0x1c0
        #3:  (pm_mutex){+.+...}, at: [<ffffffffb30d2ecd>] pm_suspend+0x21d/0x490
        #4:  (acpi_scan_lock){+.+.+.}, at: [<ffffffffb34dc3d7>] acpi_scan_lock_acquire+0x17/0x20
        #5:  (cpu_add_remove_lock){+.+.+.}, at: [<ffffffffb306d6d7>] freeze_secondary_cpus+0x27/0x390
        #6:  (cpu_hotplug.dep_map){++++++}, at: [<ffffffffb306cfd5>] cpu_hotplug_begin+0x5/0xe0
        #7:  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffffb306d04f>] cpu_hotplug_begin+0x7f/0xe0
      
       stack backtrace:
       CPU: 3 PID: 2640 Comm: step_after_susp Not tainted 4.11.0+ #17
       Hardware name: Dell Inc. OptiPlex 7040/0JCTF8, BIOS 1.4.9 09/12/2016
       Call Trace:
        dump_stack+0x99/0xce
        print_circular_bug+0x1fa/0x270
        __lock_acquire+0x189a/0x18a0
        lock_acquire+0x11c/0x230
        ? lock_acquire+0x11c/0x230
        ? blk_mq_queue_reinit_work+0x18/0x110
        ? blk_mq_queue_reinit_work+0x18/0x110
        __mutex_lock+0x92/0x990
        ? blk_mq_queue_reinit_work+0x18/0x110
        ? kmem_cache_free+0x2cb/0x330
        ? anon_transport_class_unregister+0x20/0x20
        ? blk_mq_queue_reinit_work+0x110/0x110
        mutex_lock_nested+0x1b/0x20
        ? mutex_lock_nested+0x1b/0x20
        blk_mq_queue_reinit_work+0x18/0x110
        blk_mq_queue_reinit_dead+0x1c/0x20
        cpuhp_invoke_callback+0x1f2/0x810
        ? __flow_cache_shrink+0x160/0x160
        cpuhp_down_callbacks+0x42/0x80
        _cpu_down+0xb2/0xe0
        freeze_secondary_cpus+0xb6/0x390
        suspend_devices_and_enter+0x3b3/0xa40
        ? rcu_read_lock_sched_held+0x79/0x80
        pm_suspend+0x129/0x490
        state_store+0x82/0xf0
        kobj_attr_store+0xf/0x20
        sysfs_kf_write+0x45/0x60
        kernfs_fop_write+0x135/0x1c0
        __vfs_write+0x37/0x160
        ? rcu_read_lock_sched_held+0x79/0x80
        ? rcu_sync_lockdep_assert+0x2f/0x60
        ? __sb_start_write+0xd9/0x1c0
        ? vfs_write+0x1ad/0x1d0
        vfs_write+0xcd/0x1d0
        SyS_write+0x58/0xc0
        ? rcu_read_lock_sched_held+0x79/0x80
        do_syscall_64+0x8f/0x710
        ? trace_hardirqs_on_thunk+0x1a/0x1c
        entry_SYSCALL64_slow_path+0x25/0x25
      
      The cpu hotplug path holds cpu_hotplug.lock and then reinitializes all existing
      blk-mq queues under all_q_mutex; however, blk_mq_init_allocated_queue() takes
      these two locks in the opposite order. This is due to commit eabe0659
      (blk/mq: Cure cpu hotplug lock inversion), which fixed a cpu hotplug lock
      inversion introduced by the hotplug rework; that rework is still work in
      progress, lives in a -tip branch, and mainline cannot yet trigger its splat.
      The commit therefore breaks Linus's tree during the merge window, so this
      patch reverts the lock order and avoids the splat in Linus's tree.
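      
      As a rough sketch of the inversion (derived from the report above, not the
      actual kernel source), the two paths take the locks in opposite order, and
      the revert puts blk_mq_init_allocated_queue() back in the hotplug path's order:
      
          /* CPU hotplug path: cpu_hotplug.lock first, then all_q_mutex.
           * cpu_hotplug_begin() has already taken cpu_hotplug.lock here. */
          mutex_lock(&all_q_mutex);          /* blk_mq_queue_reinit_work() */
          /* ... reinit all existing blk-mq queues ... */
          mutex_unlock(&all_q_mutex);
      
          /* Queue init path as of commit eabe0659: all_q_mutex first. */
          mutex_lock(&all_q_mutex);
          get_online_cpus();                 /* acquires cpu_hotplug.lock: inverted order */
          /* ... map software to hardware queues, add the queue to all_q_list ... */
          put_online_cpus();
          mutex_unlock(&all_q_mutex);
      
          /* The fix: take get_online_cpus() before all_q_mutex again, matching the
           * hotplug path's cpu_hotplug.lock -> all_q_mutex ordering. */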
      
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
  2. 04 May 2017, 14 commits
  3. 03 May 2017, 1 commit
  4. 02 May 2017, 3 commits
  5. 28 Apr 2017, 4 commits
  6. 27 Apr 2017, 10 commits
  7. 26 Apr 2017, 1 commit
    • Revert "block: use DAX for partition table reads" · a41fe02b
      Authored by Dan Williams
      commit d1a5f2b4 ("block: use DAX for partition table reads") was
      part of a stalled effort to allow dax mappings of block devices. Since
      then the device-dax mechanism has filled the role of dax-mapping static
      device ranges.
      
      Now that we are moving ->direct_access() from a block_device operation
      to a dax_inode operation, we would need block devices to map and carry
      their own dax_inode reference.
      
      Unless / until we decide to revive dax mapping of raw block devices
      through the dax_inode scheme, there is no need to carry
      read_dax_sector(). Its removal in turn allows for the removal of
      bdev_direct_access() and should have been included in commit
      22375701 ("block_dev: remove DAX leftovers").
      
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
  8. 24 Apr 2017, 1 commit
  9. 22 Apr 2017, 2 commits
    • block: get rid of blk_integrity_revalidate() · 19b7ccf8
      Authored by Ilya Dryomov
      Commit 25520d55 ("block: Inline blk_integrity in struct gendisk")
      introduced blk_integrity_revalidate(), which seems to assume ownership
      of the stable pages flag and unilaterally clears it if no blk_integrity
      profile is registered:
      
          if (bi->profile)
                  disk->queue->backing_dev_info->capabilities |=
                          BDI_CAP_STABLE_WRITES;
          else
                  disk->queue->backing_dev_info->capabilities &=
                          ~BDI_CAP_STABLE_WRITES;
      
      It's called from revalidate_disk() and rescan_partitions(), making it
      impossible to enable stable pages for drivers that support partitions
      and don't use blk_integrity: while the call in revalidate_disk() can be
      trivially worked around (see zram, which doesn't support partitions and
      hence gets away with zram_revalidate_disk()), rescan_partitions() can
      be triggered from userspace at any time.  This breaks rbd, where the
      ceph messenger is responsible for generating/verifying CRCs.
      
      Since blk_integrity_{un,}register() "must" be used for (un)registering
      the integrity profile with the block layer, move BDI_CAP_STABLE_WRITES
      setting there.  This way drivers that call blk_integrity_register() and
      use integrity infrastructure won't interfere with drivers that don't
      but still want stable pages.
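      
      Roughly, the move amounts to the following sketch (the gist of the described
      change, not the exact patch):
      
          void blk_integrity_register(struct gendisk *disk,
                                      struct blk_integrity *template)
          {
                  /* ... copy the template into disk->queue->integrity ... */
      
                  /* Integrity data is generated/verified over the page contents,
                   * so pages must stay stable while a profile is registered. */
                  disk->queue->backing_dev_info->capabilities |=
                          BDI_CAP_STABLE_WRITES;
          }
      
          void blk_integrity_unregister(struct gendisk *disk)
          {
                  /* Only clear what registration set; other stable-pages users
                   * (e.g. rbd) are no longer clobbered on rescan. */
                  disk->queue->backing_dev_info->capabilities &=
                          ~BDI_CAP_STABLE_WRITES;
                  /* ... clear disk->queue->integrity ... */
          }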
      
      Fixes: 25520d55 ("block: Inline blk_integrity in struct gendisk")
      Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org # 4.4+, needs backporting
      Tested-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • blk-mq: Fix preempt count imbalance · abc25a69
      Authored by Bart Van Assche
      Avoid triggering the following kernel bug:
      
      BUG: sleeping function called from invalid context at ./include/linux/buffer_head.h:349
      in_atomic(): 1, irqs_disabled(): 0, pid: 8019, name: find
      CPU: 10 PID: 8019 Comm: find Tainted: G        W I     4.11.0-rc4-dbg+ #2
      Call Trace:
       dump_stack+0x68/0x93
       ___might_sleep+0x16e/0x230
       __might_sleep+0x4a/0x80
       __ext4_get_inode_loc+0x1e0/0x4e0
       ext4_iget+0x70/0xbc0
       ext4_iget_normal+0x2f/0x40
       ext4_lookup+0xb6/0x1f0
       lookup_slow+0x104/0x1e0
       walk_component+0x19a/0x330
       path_lookupat+0x4b/0x100
       filename_lookup+0x9a/0x110
       user_path_at_empty+0x36/0x40
       vfs_statx+0x67/0xc0
       SYSC_newfstatat+0x20/0x40
       SyS_newfstatat+0xe/0x10
       entry_SYSCALL_64_fastpath+0x18/0xad
      
      This happens because the big if/else in blk_mq_make_request() doesn't
      have a final else section that also drops the ctx. Add that.
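      
      Sketched, the missing branch amounts to something like this (a simplified
      sketch of the described fix, not the full function):
      
          /* ... every other branch of the big if/else drops the ctx ... */
          } else {
                  /* Final else: drop the ctx here as well, rebalancing the
                   * preempt count taken when the ctx reference was acquired. */
                  blk_mq_put_ctx(data.ctx);
                  blk_mq_bio_to_request(rq, bio);
                  /* ... queue the request and run the hardware queue ... */
          }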
      
      Fixes: b00c53e8 ("blk-mq: fix schedule-while-atomic with scheduler attached")
      Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
      Cc: Omar Sandoval <osandov@fb.com>
      
      Added a bit more to the commit log.
      Signed-off-by: Jens Axboe <axboe@fb.com>
  10. 21 Apr 2017, 3 commits