1. 30 Jul, 2019 1 commit
  2. 23 Jul, 2019 1 commit
    • nvme: fix memory leak caused by incorrect subsystem free · e654dfd3
      Committed by Logan Gunthorpe
      When freeing the subsystem after finding another match with
      __nvme_find_get_subsystem(), use put_device() instead of
      __nvme_release_subsystem() which calls kfree() directly.
      
      Per the documentation, put_device() should always be used
      after device_initialize() is called. Otherwise, leaks like
      the one below, which was detected by kmemleak, may occur.
      
      Once the call of __nvme_release_subsystem() is removed it no
      longer makes sense to keep the helper, so fold it back
      into nvme_release_subsystem().
      
      unreferenced object 0xffff8883d12bfbc0 (size 16):
        comm "nvme", pid 2635, jiffies 4294933602 (age 739.952s)
        hex dump (first 16 bytes):
          6e 76 6d 65 2d 73 75 62 73 79 73 32 00 88 ff ff  nvme-subsys2....
        backtrace:
          [<000000007d8fc208>] __kmalloc_track_caller+0x16d/0x2a0
          [<0000000081169e5f>] kvasprintf+0xad/0x130
          [<0000000025626f25>] kvasprintf_const+0x47/0x120
          [<00000000fa66ad36>] kobject_set_name_vargs+0x44/0x120
          [<000000004881f8b3>] dev_set_name+0x98/0xc0
          [<000000007124dae3>] nvme_init_identify+0x1995/0x38e0
          [<000000009315020a>] nvme_loop_configure_admin_queue+0x4fa/0x5e0
          [<000000001a63e766>] nvme_loop_create_ctrl+0x489/0xf80
          [<00000000a46ecc23>] nvmf_dev_write+0x1a12/0x2220
          [<000000002259b3d5>] __vfs_write+0x66/0x120
          [<000000002f6df81e>] vfs_write+0x154/0x490
          [<000000007e8cfc19>] ksys_write+0x10a/0x240
          [<00000000ff5c7b85>] __x64_sys_write+0x73/0xb0
          [<00000000fee6d692>] do_syscall_64+0xaa/0x470
          [<00000000997e1ede>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Fixes: ab9e00cc ("nvme: track subsystems")
      Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
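The leak described in this commit can be illustrated with a small userspace model. This is only a sketch, not kernel code: the `Allocator` and `Subsystem` classes and the `raw_free` method are hypothetical stand-ins. It assumes, as the kmemleak backtrace suggests, that dev_set_name() allocates the name string separately and that only the put_device() path frees it.

```python
class Allocator:
    """Tracks live allocations; a stand-in for kmalloc()/kfree() plus kmemleak."""

    def __init__(self):
        self.live = set()

    def alloc(self, label):
        self.live.add(label)
        return label

    def free(self, label):
        self.live.discard(label)


class Subsystem:
    """Hypothetical model of a refcounted device embedded in a subsystem."""

    def __init__(self, mem, name):
        self.mem = mem
        self.struct = mem.alloc("struct:" + name)  # the subsystem object itself
        self.name = None
        self.refs = 0

    def device_initialize(self):
        self.refs = 1  # from here on, the refcount owns the object

    def dev_set_name(self, name):
        self.name = self.mem.alloc("name:" + name)  # separately allocated string

    def put_device(self):
        self.refs -= 1
        if self.refs == 0:
            self.mem.free(self.name)    # the name is released on the last put
            self.mem.free(self.struct)  # then the release callback frees the struct

    def raw_free(self):
        """Buggy path: free the struct directly, bypassing the refcount."""
        self.mem.free(self.struct)


def leaked_after(teardown):
    """Build a subsystem, tear it down with the named method, report leaks."""
    mem = Allocator()
    subsys = Subsystem(mem, "nvme-subsys2")
    subsys.device_initialize()
    subsys.dev_set_name("nvme-subsys2")
    getattr(subsys, teardown)()
    return sorted(mem.live)
```

In this model, `leaked_after("raw_free")` leaves the name string behind, which matches the 16-byte "nvme-subsys2" object in the kmemleak report, while `leaked_after("put_device")` leaves nothing.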
  3. 12 Jul, 2019 1 commit
    • nvme: fix NULL deref for fabrics options · 7d30c81b
      Committed by Minwoo Im
      The nvme-5.3 branch of git://git.infradead.org/nvme.git now causes the
      following NULL deref oops.  Check ctrl->opts before dereferencing it.
      
      [   16.337581] BUG: kernel NULL pointer dereference, address: 0000000000000056
      [   16.338551] #PF: supervisor read access in kernel mode
      [   16.338551] #PF: error_code(0x0000) - not-present page
      [   16.338551] PGD 0 P4D 0
      [   16.338551] Oops: 0000 [#1] SMP PTI
      [   16.338551] CPU: 2 PID: 1035 Comm: kworker/u16:5 Not tainted 5.2.0-rc6+ #1
      [   16.338551] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
      [   16.338551] Workqueue: nvme-wq nvme_scan_work [nvme_core]
      [   16.338551] RIP: 0010:nvme_validate_ns+0xc9/0x7e0 [nvme_core]
      [   16.338551] Code: c0 49 89 c5 0f 84 00 07 00 00 48 8b 7b 58 e8 be 48 39 c1 48 3d 00 f0 ff ff 49 89 45 18 0f 87 a4 06 00 00 48 8b 93 70 0a 00 00 <80> 7a 56 00 74 0c 48 8b 40 68 83 48 3c 08 49 8b 45 18 48 89 c6 bf
      [   16.338551] RSP: 0018:ffffc900024c7d10 EFLAGS: 00010283
      [   16.338551] RAX: ffff888135a30720 RBX: ffff88813a4fd1f8 RCX: 0000000000000007
      [   16.338551] RDX: 0000000000000000 RSI: ffffffff8256dd38 RDI: ffff888135a30720
      [   16.338551] RBP: 0000000000000001 R08: 0000000000000007 R09: ffff88813aa6a840
      [   16.338551] R10: 0000000000000001 R11: 000000000002d060 R12: ffff88813a4fd1f8
      [   16.338551] R13: ffff88813a77f800 R14: ffff88813aa35180 R15: 0000000000000001
      [   16.338551] FS:  0000000000000000(0000) GS:ffff88813ba80000(0000) knlGS:0000000000000000
      [   16.338551] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   16.338551] CR2: 0000000000000056 CR3: 000000000240a002 CR4: 0000000000360ee0
      [   16.338551] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [   16.338551] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [   16.338551] Call Trace:
      [   16.338551]  nvme_scan_work+0x2c0/0x340 [nvme_core]
      [   16.338551]  ? __switch_to_asm+0x40/0x70
      [   16.338551]  ? _raw_spin_unlock_irqrestore+0x18/0x30
      [   16.338551]  ? try_to_wake_up+0x408/0x450
      [   16.338551]  process_one_work+0x20b/0x3e0
      [   16.338551]  worker_thread+0x1f9/0x3d0
      [   16.338551]  ? cancel_delayed_work+0xa0/0xa0
      [   16.338551]  kthread+0x117/0x120
      [   16.338551]  ? kthread_stop+0xf0/0xf0
      [   16.338551]  ret_from_fork+0x3a/0x50
      [   16.338551] Modules linked in: nvme nvme_core
      [   16.338551] CR2: 0000000000000056
      [   16.338551] ---[ end trace b9bf761a93e62d84 ]---
      [   16.338551] RIP: 0010:nvme_validate_ns+0xc9/0x7e0 [nvme_core]
      [   16.338551] Code: c0 49 89 c5 0f 84 00 07 00 00 48 8b 7b 58 e8 be 48 39 c1 48 3d 00 f0 ff ff 49 89 45 18 0f 87 a4 06 00 00 48 8b 93 70 0a 00 00 <80> 7a 56 00 74 0c 48 8b 40 68 83 48 3c 08 49 8b 45 18 48 89 c6 bf
      [   16.338551] RSP: 0018:ffffc900024c7d10 EFLAGS: 00010283
      [   16.338551] RAX: ffff888135a30720 RBX: ffff88813a4fd1f8 RCX: 0000000000000007
      [   16.338551] RDX: 0000000000000000 RSI: ffffffff8256dd38 RDI: ffff888135a30720
      [   16.338551] RBP: 0000000000000001 R08: 0000000000000007 R09: ffff88813aa6a840
      [   16.338551] R10: 0000000000000001 R11: 000000000002d060 R12: ffff88813a4fd1f8
      [   16.338551] R13: ffff88813a77f800 R14: ffff88813aa35180 R15: 0000000000000001
      [   16.338551] FS:  0000000000000000(0000) GS:ffff88813ba80000(0000) knlGS:0000000000000000
      [   16.338551] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   16.338551] CR2: 0000000000000056 CR3: 000000000240a002 CR4: 0000000000360ee0
      [   16.338551] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [   16.338551] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      
      Fixes: 958f2a0f ("nvme-tcp: set the STABLE_WRITES flag when data digests are enabled")
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Keith Busch <kbusch@kernel.org>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
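The guard this commit describes can be sketched in a userspace model. The oops shows a fault at address 0x56 with RDX equal to zero, which is consistent with reading a field at a small offset from a NULL pointer. The classes and function names below are hypothetical, and the sketch assumes that the options pointer is simply unset for controllers that were not created through the fabrics path.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FabricsOptions:
    """Hypothetical stand-in for the fabrics options a controller may carry."""
    data_digest: bool = False


@dataclass
class Controller:
    # Assumption: left as None when the controller has no fabrics options.
    opts: Optional[FabricsOptions] = None


def needs_stable_writes_unchecked(ctrl):
    """Buggy form: dereferences opts without checking it first."""
    return ctrl.opts.data_digest


def needs_stable_writes(ctrl):
    """Fixed form: check opts before reading the data digest setting."""
    return ctrl.opts is not None and ctrl.opts.data_digest
```

Calling `needs_stable_writes_unchecked(Controller())` raises `AttributeError`, the Python analogue of the NULL dereference, whereas `needs_stable_writes(Controller())` returns `False`.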
  4. 11 Jul, 2019 1 commit
  5. 10 Jul, 2019 2 commits
    • nvme-tcp: set the STABLE_WRITES flag when data digests are enabled · 958f2a0f
      Committed by Mikhail Skorzhinskii
      A few false alarms about wrong data digests were seen on the target
      side while running a high-throughput load against an XFS filesystem
      shared through NVMe-oF TCP.
      
      This flag tells the rest of the kernel to ensure that the data buffer
      does not change while the write is in flight.  It incurs a performance
      penalty, so only enable it when it is actually needed, i.e. when we are
      calculating data digests.
      
      Even with this change in place, ext2 users can still experience
      false positives, as ext2 does not respect this flag. This may apply
      to vfat as well.
      Signed-off-by: Mikhail Skorzhinskii <mskorzhinskiy@solarflare.com>
      Signed-off-by: Mike Playle <mplayle@solarflare.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
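Why an unstable buffer produces a false digest alarm can be shown with a short userspace sketch. Everything here is an assumption made for illustration: `zlib.crc32` stands in for the real data digest algorithm, the function names are hypothetical, and the "stable" case is modeled with a snapshot even though the kernel flag works by keeping the buffer from changing while the write is in flight.

```python
import zlib


def digest(buf):
    """Stand-in checksum; the real data digest algorithm is not modeled here."""
    return zlib.crc32(bytes(buf))


def flip_first_byte(page):
    """Models a writer modifying the page while the I/O is in flight."""
    page[0] ^= 0xFF


def send_unstable(page, mutate):
    """Digest is computed first, then the page changes before it is sent."""
    ddgst = digest(page)
    mutate(page)
    payload = bytes(page)
    return payload, ddgst


def send_stable(page, mutate):
    """The data covered by the digest is the data sent, whatever happens later."""
    snapshot = bytes(page)  # stand-in for the buffer being held stable
    ddgst = digest(snapshot)
    mutate(page)            # the late modification does not affect this write
    return snapshot, ddgst


def target_accepts(payload, ddgst):
    """The receiving side recomputes the digest and compares."""
    return digest(payload) == ddgst
```

With `send_unstable` the target sees a mismatch even though nothing was corrupted in transit; with `send_stable` the digest and payload always agree.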
    • nvme: set physical block size and optimal I/O size · 81adb863
      Committed by Bart Van Assche
      From the NVMe 1.4 spec:
      
      NSFEAT bit 4 if set to 1: indicates that the fields NPWG, NPWA, NPDG, NPDA,
      and NOWS are defined for this namespace and should be used by the host for
      I/O optimization;
      [ ... ]
      Namespace Preferred Write Granularity (NPWG): This field indicates the
      smallest recommended write granularity in logical blocks for this namespace.
      This is a 0's based value. The size indicated should be less than or equal
      to Maximum Data Transfer Size (MDTS) that is specified in units of minimum
      memory page size. The value of this field may change if the namespace is
      reformatted. The size should be a multiple of Namespace Preferred Write
      Alignment (NPWA). Refer to section 8.25 for how this field is utilized to
      improve performance and endurance.
      [ ... ]
      Each Write, Write Uncorrectable, or Write Zeroes commands should address a
      multiple of Namespace Preferred Write Granularity (NPWG) (refer to Figure
      245) and Stream Write Size (SWS) (refer to Figure 515) logical blocks (as
      expressed in the NLB field), and the SLBA field of the command should be
      aligned to Namespace Preferred Write Alignment (NPWA) (refer to Figure 245)
      for best performance.
      Signed-off-by: Bart Van Assche <bvanassche@acm.org>
      Reviewed-by: Hannes Reinecke <hare@suse.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
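The arithmetic implied by the quoted spec text can be sketched as follows. This is a minimal illustration, not the driver's code: the function names and the `NSFEAT_BIT4` constant are hypothetical, and falling back to the logical block size when bit 4 is clear is an assumption. The one detail taken from the spec text is that NPWG is a 0's based value, so a field value of 7 means 8 logical blocks; the sketch assumes NOWS and NPWA follow the same convention.

```python
NSFEAT_BIT4 = 1 << 4  # hypothetical name for "NPWG, NPWA, NPDG, NPDA, NOWS are defined"


def queue_limits(lba_size, nsfeat, npwg=0, nows=0):
    """Return (physical_block_size, optimal_io_size) in bytes.

    npwg and nows are the raw 0's based field values.
    """
    phys_bs = lba_size
    io_opt = lba_size  # assumption: fall back to the logical block size
    if nsfeat & NSFEAT_BIT4:
        phys_bs *= 1 + npwg  # preferred write granularity
        io_opt *= 1 + nows   # optimal write size
    return phys_bs, io_opt


def write_follows_hints(slba, num_blocks, npwg, npwa):
    """Check a write against the granularity and alignment recommendations.

    num_blocks is the actual number of logical blocks written;
    npwg and npwa are the raw 0's based field values.
    """
    granularity = npwg + 1
    alignment = npwa + 1
    return num_blocks % granularity == 0 and slba % alignment == 0
```

For example, with 512-byte logical blocks, NPWG = 7 and NOWS = 255, `queue_limits(512, 0x10, npwg=7, nows=255)` gives a 4096-byte physical block size and a 131072-byte optimal I/O size.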
  6. 21 Jun, 2019 4 commits
  7. 07 Jun, 2019 1 commit
  8. 21 May, 2019 1 commit
  9. 18 May, 2019 7 commits
  10. 14 May, 2019 3 commits
  11. 13 May, 2019 1 commit
  12. 01 May, 2019 2 commits
  13. 25 Apr, 2019 1 commit
  14. 10 Apr, 2019 1 commit
  15. 05 Apr, 2019 2 commits
  16. 27 Mar, 2019 1 commit
    • srcu: Remove cleanup_srcu_struct_quiesced() · f5ad3991
      Committed by Paul E. McKenney
      The cleanup_srcu_struct_quiesced() function was added because NVME
      used WQ_MEM_RECLAIM workqueues and SRCU did not, which meant that
      NVME workqueues waiting on SRCU workqueues could result in deadlocks
      during low-memory conditions.  However, SRCU now also has WQ_MEM_RECLAIM
      workqueues, so there is no longer a potential for deadlock.  Furthermore,
      it turns out to be extremely hard to use cleanup_srcu_struct_quiesced()
      correctly due to the fact that SRCU callback invocation accesses the
      srcu_struct structure's per-CPU data area just after callbacks are
      invoked.  Therefore, the usual practice of using srcu_barrier() to wait
      for callbacks to be invoked before invoking cleanup_srcu_struct_quiesced()
      fails because SRCU's callback-invocation workqueue handler might be
      delayed, which can result in cleanup_srcu_struct_quiesced() being invoked
      (and thus freeing the per-CPU data) before the SRCU's callback-invocation
      workqueue handler is finished using that per-CPU data.  Nor is this a
      theoretical problem: KASAN emitted use-after-free warnings because of
      this problem on actual runs.
      
      In short, NVME can now safely invoke cleanup_srcu_struct(), which
      avoids the use-after-free scenario.  And cleanup_srcu_struct_quiesced()
      is quite difficult to use safely.  This commit therefore removes
      cleanup_srcu_struct_quiesced(), switching its sole user back to
      cleanup_srcu_struct().  This effectively reverts the following pair
      of commits:
      
      f7194ac3 ("srcu: Add cleanup_srcu_struct_quiesced()")
      4317228a ("nvme: Avoid flush dependency in delete controller flow")
      Reported-by: Bart Van Assche <bvanassche@acm.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
      Reviewed-by: Bart Van Assche <bvanassche@acm.org>
      Tested-by: Bart Van Assche <bvanassche@acm.org>
  17. 14 Mar, 2019 7 commits
  18. 20 Feb, 2019 3 commits
    • nvme: convert to SPDX identifiers · bc50ad75
      Committed by Christoph Hellwig
      Update license to use SPDX-License-Identifier instead of verbose license
      text.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
    • nvme: return error from nvme_alloc_ns() · ab4ab09c
      Committed by Hannes Reinecke
      nvme_alloc_ns() might fail, so we should be returning an error code.
      Signed-off-by: Hannes Reinecke <hare@suse.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • nvme: avoid that deleting a controller triggers a circular locking complaint · b9c77583
      Committed by Bart Van Assche
      Rework nvme_delete_ctrl_sync() such that it does not have to wait for
      queued work. This prevents test nvme/008 from triggering the following
      lockdep complaint:
      
      WARNING: possible circular locking dependency detected
      5.0.0-rc6-dbg+ #10 Not tainted
      ------------------------------------------------------
      nvme/7918 is trying to acquire lock:
      000000009a1a7b69 ((work_completion)(&ctrl->delete_work)){+.+.}, at: __flush_work+0x379/0x410
      
      but task is already holding lock:
      00000000ef5a45b4 (kn->count#389){++++}, at: kernfs_remove_self+0x196/0x210
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #1 (kn->count#389){++++}:
             lock_acquire+0xc5/0x1e0
             __kernfs_remove+0x42a/0x4a0
             kernfs_remove_by_name_ns+0x45/0x90
             remove_files.isra.1+0x3a/0x90
             sysfs_remove_group+0x5c/0xc0
             sysfs_remove_groups+0x39/0x60
             device_remove_attrs+0x68/0xb0
             device_del+0x24d/0x570
             cdev_device_del+0x1a/0x50
             nvme_delete_ctrl_work+0xbd/0xe0
             process_one_work+0x4f1/0xa40
             worker_thread+0x67/0x5b0
             kthread+0x1cf/0x1f0
             ret_from_fork+0x24/0x30
      
      -> #0 ((work_completion)(&ctrl->delete_work)){+.+.}:
             __lock_acquire+0x1323/0x17b0
             lock_acquire+0xc5/0x1e0
             __flush_work+0x399/0x410
             flush_work+0x10/0x20
             nvme_delete_ctrl_sync+0x65/0x70
             nvme_sysfs_delete+0x4f/0x60
             dev_attr_store+0x3e/0x50
             sysfs_kf_write+0x87/0xa0
             kernfs_fop_write+0x186/0x240
             __vfs_write+0xd7/0x430
             vfs_write+0xfa/0x260
             ksys_write+0xab/0x130
             __x64_sys_write+0x43/0x50
             do_syscall_64+0x71/0x210
             entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      other info that might help us debug this:
      
       Possible unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(kn->count#389);
                                     lock((work_completion)(&ctrl->delete_work));
                                     lock(kn->count#389);
        lock((work_completion)(&ctrl->delete_work));
      
       *** DEADLOCK ***
      
      3 locks held by nvme/7918:
       #0: 00000000e2223b44 (sb_writers#6){.+.+}, at: vfs_write+0x1eb/0x260
       #1: 000000003404976f (&of->mutex){+.+.}, at: kernfs_fop_write+0x128/0x240
       #2: 00000000ef5a45b4 (kn->count#389){++++}, at: kernfs_remove_self+0x196/0x210
      
      stack backtrace:
      CPU: 4 PID: 7918 Comm: nvme Not tainted 5.0.0-rc6-dbg+ #10
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
      Call Trace:
       dump_stack+0x86/0xca
       print_circular_bug.isra.36.cold.54+0x173/0x1d5
       check_prev_add.constprop.45+0x996/0x1110
       __lock_acquire+0x1323/0x17b0
       lock_acquire+0xc5/0x1e0
       __flush_work+0x399/0x410
       flush_work+0x10/0x20
       nvme_delete_ctrl_sync+0x65/0x70
       nvme_sysfs_delete+0x4f/0x60
       dev_attr_store+0x3e/0x50
       sysfs_kf_write+0x87/0xa0
       kernfs_fop_write+0x186/0x240
       __vfs_write+0xd7/0x430
       vfs_write+0xfa/0x260
       ksys_write+0xab/0x130
       __x64_sys_write+0x43/0x50
       do_syscall_64+0x71/0x210
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      Signed-off-by: Bart Van Assche <bvanassche@acm.org>
      Reviewed-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
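The dependency chain in the report above can be reproduced with a toy lock-order graph. This is a loose sketch of the idea behind the warning, not of how the kernel's lock validator is implemented: `find_cycle` is a hypothetical helper, and each edge `(held, wanted)` records that `wanted` was acquired while `held` was already held.

```python
def find_cycle(edges):
    """Return a list of locks forming a cycle in the order graph, or None."""
    graph = {}
    for held, wanted in edges:
        graph.setdefault(held, []).append(wanted)
        graph.setdefault(wanted, [])

    state = {}  # lock -> "active" (on the current path) or "done"
    path = []

    def visit(node):
        state[node] = "active"
        path.append(node)
        for nxt in graph[node]:
            if state.get(nxt) == "active":
                # Back edge: the path from nxt to here closes a cycle.
                return path[path.index(nxt):] + [nxt]
            if nxt not in state:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        state[node] = "done"
        return None

    for node in graph:
        if node not in state:
            found = visit(node)
            if found:
                return found
    return None


# Edge #1 in the report: the delete work removes sysfs files (kn->count).
# Edge #0 in the report: the sysfs write holds kn->count and flushes the work.
BEFORE_FIX = [("delete_work", "kn->count"), ("kn->count", "delete_work")]
# Assumption: once the sysfs path no longer waits for the work, edge #0 is gone.
AFTER_FIX = [("delete_work", "kn->count")]
```

Here `find_cycle(BEFORE_FIX)` returns `["delete_work", "kn->count", "delete_work"]`, the same two-lock loop shown in the "Possible unsafe locking scenario" table, and `find_cycle(AFTER_FIX)` returns `None`.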