1. 30 8月, 2019 4 次提交
  2. 23 7月, 2019 1 次提交
    • M
      nvme: fix multipath crash when ANA is deactivated · 66b20ac0
      Marta Rybczynska 提交于
      Fix a crash with multipath activated. It happends when ANA log
      page is larger than MDTS and because of that ANA is disabled.
      The driver then tries to access unallocated buffer when connecting
      to a nvme target. The signature is as follows:
      
      [  300.433586] nvme nvme0: ANA log page size (8208) larger than MDTS (8192).
      [  300.435387] nvme nvme0: disabling ANA support.
      [  300.437835] nvme nvme0: creating 4 I/O queues.
      [  300.459132] nvme nvme0: new ctrl: NQN "nqn.0.0.0", addr 10.91.0.1:8009
      [  300.464609] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
      [  300.466342] #PF error: [normal kernel read fault]
      [  300.467385] PGD 0 P4D 0
      [  300.467987] Oops: 0000 [#1] SMP PTI
      [  300.468787] CPU: 3 PID: 50 Comm: kworker/u8:1 Not tainted 5.0.20kalray+ #4
      [  300.470264] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [  300.471532] Workqueue: nvme-wq nvme_scan_work [nvme_core]
      [  300.472724] RIP: 0010:nvme_parse_ana_log+0x21/0x140 [nvme_core]
      [  300.474038] Code: 45 01 d2 d8 48 98 c3 66 90 0f 1f 44 00 00 41 57 41 56 41 55 41 54 55 53 48 89 fb 48 83 ec 08 48 8b af 20 0a 00 00 48 89 34 24 <66> 83 7d 08 00 0f 84 c6 00 00 00 44 8b 7d 14 49 89 d5 8b 55 10 48
      [  300.477374] RSP: 0018:ffffa50e80fd7cb8 EFLAGS: 00010296
      [  300.478334] RAX: 0000000000000001 RBX: ffff9130f1872258 RCX: 0000000000000000
      [  300.479784] RDX: ffffffffc06c4c30 RSI: ffff9130edad4280 RDI: ffff9130f1872258
      [  300.481488] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000044
      [  300.483203] R10: 0000000000000220 R11: 0000000000000040 R12: ffff9130f18722c0
      [  300.484928] R13: ffff9130f18722d0 R14: ffff9130edad4280 R15: ffff9130f18722c0
      [  300.486626] FS:  0000000000000000(0000) GS:ffff9130f7b80000(0000) knlGS:0000000000000000
      [  300.488538] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  300.489907] CR2: 0000000000000008 CR3: 00000002365e6000 CR4: 00000000000006e0
      [  300.491612] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  300.493303] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  300.494991] Call Trace:
      [  300.495645]  nvme_mpath_add_disk+0x5c/0xb0 [nvme_core]
      [  300.496880]  nvme_validate_ns+0x2ef/0x550 [nvme_core]
      [  300.498105]  ? nvme_identify_ctrl.isra.45+0x6a/0xb0 [nvme_core]
      [  300.499539]  nvme_scan_work+0x2b4/0x370 [nvme_core]
      [  300.500717]  ? __switch_to_asm+0x35/0x70
      [  300.501663]  process_one_work+0x171/0x380
      [  300.502340]  worker_thread+0x49/0x3f0
      [  300.503079]  kthread+0xf8/0x130
      [  300.503795]  ? max_active_store+0x80/0x80
      [  300.504690]  ? kthread_bind+0x10/0x10
      [  300.505502]  ret_from_fork+0x35/0x40
      [  300.506280] Modules linked in: nvme_tcp nvme_rdma rdma_cm iw_cm ib_cm ib_core nvme_fabrics nvme_core xt_physdev ip6table_raw ip6table_mangle ip6table_filter ip6_tables xt_comment iptable_nat nf_nat_ipv4 nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_CHECKSUM iptable_mangle iptable_filter veth ebtable_filter ebtable_nat ebtables iptable_raw vxlan ip6_udp_tunnel udp_tunnel sunrpc joydev pcspkr virtio_balloon br_netfilter bridge stp llc ip_tables xfs libcrc32c ata_generic pata_acpi virtio_net virtio_console net_failover virtio_blk failover ata_piix serio_raw libata virtio_pci virtio_ring virtio
      [  300.514984] CR2: 0000000000000008
      [  300.515569] ---[ end trace faa2eefad7e7f218 ]---
      [  300.516354] RIP: 0010:nvme_parse_ana_log+0x21/0x140 [nvme_core]
      [  300.517330] Code: 45 01 d2 d8 48 98 c3 66 90 0f 1f 44 00 00 41 57 41 56 41 55 41 54 55 53 48 89 fb 48 83 ec 08 48 8b af 20 0a 00 00 48 89 34 24 <66> 83 7d 08 00 0f 84 c6 00 00 00 44 8b 7d 14 49 89 d5 8b 55 10 48
      [  300.520353] RSP: 0018:ffffa50e80fd7cb8 EFLAGS: 00010296
      [  300.521229] RAX: 0000000000000001 RBX: ffff9130f1872258 RCX: 0000000000000000
      [  300.522399] RDX: ffffffffc06c4c30 RSI: ffff9130edad4280 RDI: ffff9130f1872258
      [  300.523560] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000044
      [  300.524734] R10: 0000000000000220 R11: 0000000000000040 R12: ffff9130f18722c0
      [  300.525915] R13: ffff9130f18722d0 R14: ffff9130edad4280 R15: ffff9130f18722c0
      [  300.527084] FS:  0000000000000000(0000) GS:ffff9130f7b80000(0000) knlGS:0000000000000000
      [  300.528396] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  300.529440] CR2: 0000000000000008 CR3: 00000002365e6000 CR4: 00000000000006e0
      [  300.530739] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  300.531989] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  300.533264] Kernel panic - not syncing: Fatal exception
      [  300.534338] Kernel Offset: 0x17c00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
      [  300.536227] ---[ end Kernel panic - not syncing: Fatal exception ]---
      
      Condition check refactoring from Christoph Hellwig.
      Signed-off-by: NMarta Rybczynska <marta.rybczynska@kalray.eu>
      Tested-by: NJean-Baptiste Riaux <jbriaux@kalray.eu>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      66b20ac0
  3. 10 7月, 2019 1 次提交
    • B
      nvme: set physical block size and optimal I/O size · 81adb863
      Bart Van Assche 提交于
      >From the NVMe 1.4 spec:
      
      NSFEAT bit 4 if set to 1: indicates that the fields NPWG, NPWA, NPDG, NPDA,
      and NOWS are defined for this namespace and should be used by the host for
      I/O optimization;
      [ ... ]
      Namespace Preferred Write Granularity (NPWG): This field indicates the
      smallest recommended write granularity in logical blocks for this namespace.
      This is a 0's based value. The size indicated should be less than or equal
      to Maximum Data Transfer Size (MDTS) that is specified in units of minimum
      memory page size. The value of this field may change if the namespace is
      reformatted. The size should be a multiple of Namespace Preferred Write
      Alignment (NPWA). Refer to section 8.25 for how this field is utilized to
      improve performance and endurance.
      [ ... ]
      Each Write, Write Uncorrectable, or Write Zeroes commands should address a
      multiple of Namespace Preferred Write Granularity (NPWG) (refer to Figure
      245) and Stream Write Size (SWS) (refer to Figure 515) logical blocks (as
      expressed in the NLB field), and the SLBA field of the command should be
      aligned to Namespace Preferred Write Alignment (NPWA) (refer to Figure 245)
      for best performance.
      Signed-off-by: NBart Van Assche <bvanassche@acm.org>
      Reviewed-by: NHannes Reinecke <hare@suse.com>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      81adb863
  4. 21 6月, 2019 3 次提交
  5. 18 5月, 2019 1 次提交
  6. 01 5月, 2019 1 次提交
  7. 14 3月, 2019 1 次提交
  8. 20 2月, 2019 3 次提交
  9. 06 2月, 2019 1 次提交
  10. 04 2月, 2019 1 次提交
  11. 10 1月, 2019 1 次提交
  12. 19 12月, 2018 1 次提交
  13. 13 12月, 2018 2 次提交
  14. 12 12月, 2018 1 次提交
  15. 08 12月, 2018 5 次提交
  16. 01 12月, 2018 1 次提交
  17. 09 11月, 2018 1 次提交
  18. 18 10月, 2018 1 次提交
  19. 02 10月, 2018 1 次提交
  20. 28 9月, 2018 1 次提交
  21. 30 7月, 2018 1 次提交
  22. 28 7月, 2018 3 次提交
  23. 23 7月, 2018 2 次提交
  24. 22 6月, 2018 1 次提交
    • J
      nvme-pci: limit max IO size and segments to avoid high order allocations · 943e942e
      Jens Axboe 提交于
      nvme requires an sg table allocation for each request. If the request
      is large, then the allocation can become quite large. For instance,
      with our default software settings of 1280KB IO size, we'll need
      10248 bytes of sg table. That turns into a 2nd order allocation,
      which we can't always guarantee. If we fail the allocation, blk-mq
      will retry it later. But there's no guarantee that we'll EVER be
      able to allocate that much contigious memory.
      
      Limit the IO size such that we never need more than a single page
      of memory. That's a lot faster and more reliable. Then back that
      allocation with a mempool, so that we know we'll always be able
      to succeed the allocation at some point.
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Acked-by: NKeith Busch <keith.busch@intel.com>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      943e942e
  25. 14 6月, 2018 1 次提交