提交 · 47fcc0360cfb3fe82e4daddacad3c1cd80b0b75d · openeuler / raspberrypi-kernel

26 1月, 2018 3 次提交

nvme: add tracepoint for nvme_complete_rq · ca5554a6

由 Johannes Thumshirn 提交于 1月 26, 2018

Add a tracepoint in nvme_complete_rq() for completions of NVMe commands. An
expmale output of the trace-point is as follows:

<idle>-0     [001] d.h.     3.505266: nvme_complete_rq: cmdid=989, qid=1, res=0, retries=0, flags=0x0, status=0
Signed-off-by: NJohannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: NHannes Reinecke <hare@suse.com>
Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: NKeith Busch <keith.busch@intel.com>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

ca5554a6

nvme: add tracepoint for nvme_setup_cmd · 3d030e41

由 Johannes Thumshirn 提交于 1月 26, 2018

Add tracepoints for nvme_setup_cmd() for tracing admin and/or nvm commands.

Examples of the two tracepoints are as follows for trace_nvme_setup_admin_cmd():
kworker/u8:0-5 [003] .... 2.998792: nvme_setup_admin_cmd: cmdid=14, flags=0x0, meta=0x0, cmd=(nvme_admin_create_cq cqid=1, qsize=1023, cq_flags=0x3, irq_vector=0)

and trace_nvme_setup_nvm_cmd():
dd-205 [001] .... 3.503929: nvme_setup_nvm_cmd: qid=1, nsid=1, cmdid=989, flags=0x0, meta=0x0, cmd=(nvme_cmd_read slba=4096, len=2047, ctrl=0x0, dsmgmt=0, reftag=0)
Signed-off-by: NJohannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: NHannes Reinecke <hare@suse.de>
Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: NKeith Busch <keith.busch@intel.com>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

3d030e41

nvme-pci: introduce RECONNECTING state to mark initializing procedure · ad70062c

由 Jianchao Wang 提交于 1月 22, 2018

After Sagi's commit (nvme-rdma: fix concurrent reset and reconnect),
both nvme-fc/rdma have following pattern:
RESETTING    - quiesce blk-mq queues, teardown and delete queues/
               connections, clear out outstanding IO requests...
RECONNECTING - establish new queues/connections and some other
               initializing things.
Introduce RECONNECTING to nvme-pci transport to do the same mark.
Then we get a coherent state definition among nvme pci/rdma/fc
transports.
Suggested-by: NJames Smart <james.smart@broadcom.com>
Reviewed-by: NJames Smart <james.smart@broadcom.com>
Reviewed-by: NReviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: NJianchao Wang <jianchao.w.wang@oracle.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

ad70062c

16 1月, 2018 1 次提交

nvme: host delete_work and reset_work on separate workqueues · b227c59b

由 Roy Shterman 提交于 1月 14, 2018

We need to ensure that delete_work will be hosted on a different
workqueue than all the works we flush or cancel from it.
Otherwise we may hit a circular dependency warning [1].

Also, given that delete_work flushes reset_work, host reset_work
on nvme_reset_wq and delete_work on nvme_delete_wq. In addition,
fix the flushing in the individual drivers to flush nvme_delete_wq
when draining queued deletes.

[1]:
[  178.491942] =============================================
[  178.492718] [ INFO: possible recursive locking detected ]
[  178.493495] 4.9.0-rc4-c844263313a8-lb #3 Tainted: G           OE
[  178.494382] ---------------------------------------------
[  178.495160] kworker/5:1/135 is trying to acquire lock:
[  178.495894]  (
[  178.496120] "nvme-wq"
[  178.496471] ){++++.+}
[  178.496599] , at:
[  178.496921] [<ffffffffa70ac206>] flush_work+0x1a6/0x2d0
[  178.497670]
               but task is already holding lock:
[  178.498499]  (
[  178.498724] "nvme-wq"
[  178.499074] ){++++.+}
[  178.499202] , at:
[  178.499520] [<ffffffffa70ad6c2>] process_one_work+0x162/0x6a0
[  178.500343]
               other info that might help us debug this:
[  178.501269]  Possible unsafe locking scenario:

[  178.502113]        CPU0
[  178.502472]        ----
[  178.502829]   lock(
[  178.503115] "nvme-wq"
[  178.503467] );
[  178.503716]   lock(
[  178.504001] "nvme-wq"
[  178.504353] );
[  178.504601]
                *** DEADLOCK ***

[  178.505441]  May be due to missing lock nesting notation

[  178.506453] 2 locks held by kworker/5:1/135:
[  178.507068]  #0:
[  178.507330]  (
[  178.507598] "nvme-wq"
[  178.507726] ){++++.+}
[  178.508079] , at:
[  178.508173] [<ffffffffa70ad6c2>] process_one_work+0x162/0x6a0
[  178.509004]  #1:
[  178.509265]  (
[  178.509532] (&ctrl->delete_work)
[  178.509795] ){+.+.+.}
[  178.510145] , at:
[  178.510239] [<ffffffffa70ad6c2>] process_one_work+0x162/0x6a0
[  178.511070]
               stack backtrace:
:
[  178.511693] CPU: 5 PID: 135 Comm: kworker/5:1 Tainted: G           OE   4.9.0-rc4-c844263313a8-lb #3
[  178.512974] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-1ubuntu1 04/01/2014
[  178.514247] Workqueue: nvme-wq nvme_del_ctrl_work [nvme_tcp]
[  178.515071]  ffffc2668175bae0 ffffffffa7450823 ffffffffa88abd80 ffffffffa88abd80
[  178.516195]  ffffc2668175bb98 ffffffffa70eb012 ffffffffa8d8d90d ffff9c472e9ea700
[  178.517318]  ffff9c472e9ea700 ffff9c4700000000 ffff9c4700007200 ab83be61bec0d50e
[  178.518443] Call Trace:
[  178.518807]  [<ffffffffa7450823>] dump_stack+0x85/0xc2
[  178.519542]  [<ffffffffa70eb012>] __lock_acquire+0x17d2/0x18f0
[  178.520377]  [<ffffffffa75839a7>] ? serial8250_console_putchar+0x27/0x30
[  178.521330]  [<ffffffffa7583980>] ? wait_for_xmitr+0xa0/0xa0
[  178.522174]  [<ffffffffa70ac1eb>] ? flush_work+0x18b/0x2d0
[  178.522975]  [<ffffffffa70eb7cb>] lock_acquire+0x11b/0x220
[  178.523753]  [<ffffffffa70ac206>] ? flush_work+0x1a6/0x2d0
[  178.524535]  [<ffffffffa70ac229>] flush_work+0x1c9/0x2d0
[  178.525291]  [<ffffffffa70ac206>] ? flush_work+0x1a6/0x2d0
[  178.526077]  [<ffffffffa70a9cf0>] ? flush_workqueue_prep_pwqs+0x220/0x220
[  178.527040]  [<ffffffffa70ae7cf>] __cancel_work_timer+0x10f/0x1d0
[  178.527907]  [<ffffffffa70fecb9>] ? vprintk_default+0x29/0x40
[  178.528726]  [<ffffffffa71cb507>] ? printk+0x48/0x50
[  178.529434]  [<ffffffffa70ae8c3>] cancel_delayed_work_sync+0x13/0x20
[  178.530381]  [<ffffffffc042100b>] nvme_stop_ctrl+0x5b/0x70 [nvme_core]
[  178.531314]  [<ffffffffc0403dcc>] nvme_del_ctrl_work+0x2c/0x50 [nvme_tcp]
[  178.532271]  [<ffffffffa70ad741>] process_one_work+0x1e1/0x6a0
[  178.533101]  [<ffffffffa70ad6c2>] ? process_one_work+0x162/0x6a0
[  178.533954]  [<ffffffffa70adc4e>] worker_thread+0x4e/0x490
[  178.534735]  [<ffffffffa70adc00>] ? process_one_work+0x6a0/0x6a0
[  178.535588]  [<ffffffffa70adc00>] ? process_one_work+0x6a0/0x6a0
[  178.536441]  [<ffffffffa70b48cf>] kthread+0xff/0x120
[  178.537149]  [<ffffffffa70b47d0>] ? kthread_park+0x60/0x60
[  178.538094]  [<ffffffffa70b47d0>] ? kthread_park+0x60/0x60
[  178.538900]  [<ffffffffa78e332a>] ret_from_fork+0x2a/0x40
Signed-off-by: NRoy Shterman <roys@lightbitslabs.com>
Signed-off-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

b227c59b

15 1月, 2018 1 次提交

nvme-pci: serialize pci resets · 79c48ccf

由 Sagi Grimberg 提交于 1月 14, 2018

Signed-off-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

79c48ccf

11 1月, 2018 2 次提交

nvme/multipath: Consult blk_status_t for failover · 908e4564

由 Keith Busch 提交于 1月 09, 2018

This removes nvme multipath's specific status decoding to see if failover
is needed, using the generic blk_status_t that was decoded earlier. This
abstraction from the raw NVMe status means all status decoding exists
in one place.
Acked-by: NMike Snitzer <snitzer@redhat.com>
Reviewed-by: NHannes Reinecke <hare@suse.com>
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

908e4564

nvme: Add more command status translation · e96fef2c

由 Keith Busch 提交于 1月 09, 2018

This adds more NVMe status code translations to blk_status_t values,
and captures all the current status codes NVMe multipath uses.
Acked-by: NMike Snitzer <snitzer@redhat.com>
Reviewed-by: NHannes Reinecke <hare@suse.com>
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

e96fef2c

09 1月, 2018 1 次提交

treewide: Use DEVICE_ATTR_RO · c828a892

由 Joe Perches 提交于 12月 19, 2017

Convert DEVICE_ATTR uses to DEVICE_ATTR_RO where possible.

Done with perl script:

$ git grep -w --name-only DEVICE_ATTR | \
  xargs perl -i -e 'local $/; while (<>) { s/\bDEVICE_ATTR\s*\(\s*(\w+)\s*,\s*\(?(?:\s*S_IRUGO\s*|\s*0444\s*)\)?\s*,\s*\1_show\s*,\s*NULL\s*\)/DEVICE_ATTR_RO(\1)/g; print;}'
Signed-off-by: NJoe Perches <joe@perches.com>
Acked-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: NRobert Jarzmik <robert.jarzmik@free.fr>
Acked-by: NSagi Grimberg <sagi@grimberg.me>
Acked-by: NZhang Rui <rui.zhang@intel.com>
Acked-by: NHarald Freudenberger <freude@linux.vnet.ibm.com>
Acked-by: NJani Nikula <jani.nikula@intel.com>
Acked-by: NCorey Minyard <cminyard@mvista.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

c828a892

08 1月, 2018 4 次提交

nvme: fix subsystem multiple controllers support check · b837b283

由 Israel Rukshin 提交于 1月 04, 2018

There is a problem when another module (e.g. nvmet) takes a reference on
the nvme block device and the physical nvme drive is removed. In that
case nvme_free_ctrl() will not be called and the controller state will be
"deleting" or "dead" unless nvmet module releases the block device.
Later on, the same nvme drive probes back and nvme_init_subsystem() will
be called and fail due to duplicate subnqn (if the nvme device doesn't
support subsystem with multiple controllers). This will cause a probe
failure. This commit changes the check of multiple controllers support
at nvme_init_subsystem() by not counting all the controllers at "dead" or
"deleting" state (this is safe because controllers at this state will
never be active again).

Fixes: ab9e00cc ("nvme: track subsystems")
Reviewed-by: NMax Gurtovoy <maxg@mellanox.com>
Signed-off-by: NIsrael Rukshin <israelr@mellanox.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

b837b283

nvme: take refcount on transport module · 85088c4a

由 Nitzan Carmi 提交于 1月 04, 2018

The block device is backed by the transport so we must ensure that the
transport driver will not be removed until all references are released.
Otherwise, we might end up referencing freed memory.
Reviewed-by: NMax Gurtovoy <maxg@mellanox.com>
Signed-off-by: NNitzan Carmi <nitzanc@mellanox.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

85088c4a

nvme-pci: fix NULL pointer reference in nvme_alloc_ns · 2b1b7e78

由 Jianchao Wang 提交于 1月 06, 2018

When the io queues setup or tagset allocation failed, ctrl.tagset is
NULL.  But the scan work will still be queued and executed, then panic
comes up due to NULL pointer reference of ctrl.tagset.

To fix this, add a new ctrl state NVME_CTRL_ADMIN_ONLY to inidcate only
admin queue is live. When non io queues or tagset allocation failed, ctrl
enters into this state, scan work will not be started.  But async event
work and nvme dev ioctl will be still available.  This will be helpful to
do further investigation and recovery.
Suggested-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NJianchao Wang <jianchao.w.wang@oracle.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

2b1b7e78

nvme: modify the debug level for setting shutdown timeout · 1a3838d7

由 Max Gurtovoy 提交于 12月 31, 2017

When an NVMe controller reports RTD3 Entry Latency larger than the value
of shutdown_timeout module parameter, we update the shutdown_timeout
accordingly to honor RTD3 Entry Latency. Use an informational debug level
instead of a warning level for it.
Signed-off-by: NMax Gurtovoy <maxg@mellanox.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

1a3838d7

29 12月, 2017 2 次提交

nvme-mpath: fix last path removal during traffic · 479a322f

由 Sagi Grimberg 提交于 12月 21, 2017

In case our last path is removed during traffic, we can end up requeueing
the bio(s) but never schedule the actual requeue work as upper layers
still have open handles on the mpath device node.

Fix this by scheduling requeue work if the namespace being removed is
the last path in the ns_head path list.

Fixes: 32acab31 ("nvme: implement multipath access to nvme subsystems")
Signed-off-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

479a322f

nvme: fix sector units when going between formats · cee160fd

由 Jeff Lien 提交于 12月 19, 2017

If you format a device with a 4k sector size back to 512 bytes, the queue
limit values for physical block size and minimum IO size were not getting
updated; only the logical block size was being updated.  This patch adds
code to update the physical block and IO minimum sizes.
Signed-off-by: NJeff Lien <jeff.lien@wdc.com>
Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

cee160fd

15 12月, 2017 4 次提交

nvme: setup streams after initializing namespace head · 654b4a4a

由 Keith Busch 提交于 12月 14, 2017

Fixes a NULL pointer dereference.
Reported-by: NArnav Dawn <a.dawn@samsung.com>
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

654b4a4a

nvme: check hw sectors before setting chunk sectors · 249159c5

由 Keith Busch 提交于 12月 14, 2017

Some devices with IDs matching the "stripe" quirk don't actually have
this quirk, and don't have an MDTS value. When MDTS is not set, the
driver sets the max sectors to UINT_MAX, which is not a power of 2,
hitting a BUG_ON from blk_queue_chunk_sectors. This patch skips setting
chunk sectors for such devices.
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

249159c5

nvme: call blk_integrity_unregister after queue is cleaned up · bd9f5d65

由 Ming Lei 提交于 12月 06, 2017

During IO complete path, bio_integrity_advance() is often called, and
blk_get_integrity() is called in this function. But in
blk_integrity_unregister, the buffer pointed by queue->integrity
is cleared, and blk_integrity->profile becomes NULL, then blk_get_integrity
returns NULL, and causes kernel oops[1] finally.

This patch fixes this issue by calling blk_integrity_unregister() after
blk_cleanup_queue().

[1] kernel oops log
[  122.068007] BUG: unable to handle kernel NULL pointer dereference at 000000000000000a
[  122.076760] IP: bio_integrity_advance+0x3d/0xf0
[  122.081815] PGD 0 P4D 0
[  122.084641] Oops: 0000 [#1] SMP
[  122.088142] Modules linked in: sunrpc ipmi_ssif intel_rapl vfat fat x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass mei_me ipmi_si crct10dif_pclmul crc32_pclmul sg mei ghash_clmulni_intel mxm_wmi ipmi_devintf iTCO_wdt intel_cstate intel_uncore pcspkr intel_rapl_perf iTCO_vendor_support dcdbas ipmi_msghandler lpc_ich acpi_power_meter shpchp wmi dm_multipath ip_tables xfs libcrc32c sd_mod mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel ahci nvme tg3 libahci nvme_core i2c_core libata ptp megaraid_sas pps_core dm_mirror dm_region_hash dm_log dm_mod
[  122.149577] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.14.0-11.el7a.x86_64 #1
[  122.157635] Hardware name: Dell Inc. PowerEdge R730xd/072T6D, BIOS 2.5.5 08/16/2017
[  122.166179] task: ffff8802ff1e8000 task.stack: ffffc90000130000
[  122.172785] RIP: 0010:bio_integrity_advance+0x3d/0xf0
[  122.178419] RSP: 0018:ffff88047fc03d70 EFLAGS: 00010006
[  122.184248] RAX: ffff880473b08000 RBX: ffff880458c71a80 RCX: ffff880473b08248
[  122.192209] RDX: 0000000000000000 RSI: 000000000000003c RDI: ffffc900038d7ba0
[  122.200171] RBP: ffff88047fc03d78 R08: 0000000000000001 R09: ffffffffa01a78b5
[  122.208132] R10: ffff88047fc1eda0 R11: ffff880458c71ad0 R12: 0000000000007800
[  122.216094] R13: 0000000000000000 R14: 0000000000007800 R15: ffff880473a39b40
[  122.224056] FS:  0000000000000000(0000) GS:ffff88047fc00000(0000) knlGS:0000000000000000
[  122.233083] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  122.239494] CR2: 000000000000000a CR3: 0000000001c09002 CR4: 00000000001606e0
[  122.247455] Call Trace:
[  122.250183]  <IRQ>
[  122.252429]  bio_advance+0x28/0xf0
[  122.256217]  blk_update_request+0xa1/0x310
[  122.260778]  blk_mq_end_request+0x1e/0x70
[  122.265256]  nvme_complete_rq+0x1c/0xd0 [nvme_core]
[  122.270699]  nvme_pci_complete_rq+0x85/0x130 [nvme]
[  122.276140]  __blk_mq_complete_request+0x8d/0x140
[  122.281387]  blk_mq_complete_request+0x16/0x20
[  122.286345]  nvme_process_cq+0xdd/0x1c0 [nvme]
[  122.291301]  nvme_irq+0x23/0x50 [nvme]
[  122.295485]  __handle_irq_event_percpu+0x3c/0x190
[  122.300725]  handle_irq_event_percpu+0x32/0x80
[  122.305683]  handle_irq_event+0x3b/0x60
[  122.309964]  handle_edge_irq+0x8f/0x190
[  122.314247]  handle_irq+0xab/0x120
[  122.318043]  do_IRQ+0x48/0xd0
[  122.321355]  common_interrupt+0x9d/0x9d
[  122.325625]  </IRQ>
[  122.327967] RIP: 0010:cpuidle_enter_state+0xe9/0x280
[  122.333504] RSP: 0018:ffffc90000133e68 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff35
[  122.341952] RAX: ffff88047fc1b900 RBX: ffff88047fc24400 RCX: 000000000000001f
[  122.349913] RDX: 0000000000000000 RSI: fffffcf2e6007295 RDI: 0000000000000000
[  122.357874] RBP: ffffc90000133ea0 R08: 000000000000062e R09: 0000000000000253
[  122.365836] R10: 0000000000000225 R11: 0000000000000018 R12: 0000000000000002
[  122.373797] R13: 0000000000000001 R14: ffff88047fc24400 R15: 0000001c6bd1d263
[  122.381762]  ? cpuidle_enter_state+0xc5/0x280
[  122.386623]  cpuidle_enter+0x17/0x20
[  122.390611]  call_cpuidle+0x23/0x40
[  122.394501]  do_idle+0x17e/0x1f0
[  122.398101]  cpu_startup_entry+0x73/0x80
[  122.402478]  start_secondary+0x178/0x1c0
[  122.406854]  secondary_startup_64+0xa5/0xa5
[  122.411520] Code: 48 8b 5f 68 48 8b 47 08 31 d2 4c 8b 5b 48 48 8b 80 d0 03 00 00 48 83 b8 48 02 00 00 00 48 8d 88 48 02 00 00 48 0f 45 d1 c1 ee 09 <0f> b6 4a 0a 0f b6 52 09 89 f0 48 01 73 08 83 e9 09 d3 e8 0f af
[  122.432604] RIP: bio_integrity_advance+0x3d/0xf0 RSP: ffff88047fc03d70
[  122.439888] CR2: 000000000000000a
Reported-by: NZhang Yi <yizhan@redhat.com>
Tested-by: NZhang Yi <yizhan@redhat.com>
Signed-off-by: NMing Lei <ming.lei@redhat.com>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

bd9f5d65

nvme: set discard_alignment to zero · b224f613

由 David Disseldorp 提交于 11月 24, 2017

Similar to 7c084289 ("rbd: set discard_alignment to zero"), NVMe
devices are currently incorrectly initialised with the block queue
discard_alignment set to the NVMe stream alignment.

As per Documentation/ABI/testing/sysfs-block:
  The discard_alignment parameter indicates how many bytes the beginning
  of the device is offset from the internal allocation unit's natural
  alignment.

Correcting the discard_alignment parameter to zero has no effect on how
discard requests are propagated through the block layer - @alignment in
__blkdev_issue_discard() remains zero. However, it does fix other
consumers, such as LIO's Block Limits VPD response.
Signed-off-by: NDavid Disseldorp <ddiss@suse.de>
Reviewed-by: NJens Axboe <axboe@kernel.dk>
Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

b224f613

20 11月, 2017 2 次提交

nvme: Suppress static analyis warning · 9941a862

由 Keith Busch 提交于 11月 16, 2017

The ns->head is always valid, so we don't need to check for NULL.
Reported-by: NDan Carpenter <dan.caprenter@oracle.com>
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

9941a862

nvme: Fix NULL dereference on reservation request · b0d61d58

由 Keith Busch 提交于 11月 16, 2017

This fixes using the NULL 'head' before getting the reference. It is
however possible the head will always be NULL, so this patch uses the
struct nvme_ns to get the ns_id field.
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

b0d61d58

12 11月, 2017 1 次提交

nvme: fix visibility of "uuid" ns attribute · a04b5de5

由 Martin Wilck 提交于 9月 28, 2017

"uuid" must be invisible if both ns->uuid and ns->nguid are unset,
not if either one is.

Fixes: d934f984 "nvme: provide UUID value to userspace"
Signed-off-by: NMartin Wilck <mwilck@suse.com>
[hch: rebased to the nvme-4.15 tree to help resolving a conflict]
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

a04b5de5

11 11月, 2017 19 次提交

nvme: expose subsys attribute to sysfs · 1e496938

由 Hannes Reinecke 提交于 11月 10, 2017

We should be exposing the subsystem attributes like 'model' and
'subsysnqn' to sysfs to allow for easier identification of the
subsystem.
Signed-off-by: NHannes Reinecke <hare@suse.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

1e496938

nvme: create 'slaves' and 'holders' entries for hidden controllers · e9a48034

由 Hannes Reinecke 提交于 11月 09, 2017

When creating nvme multipath devices we should populate the 'slaves' and
'holders' directorys properly to aid userspace topology detection.
Signed-off-by: NHannes Reinecke <hare@suse.com>
[hch: split from a larger patch, compile fix for CONFIG_NVME_MULTIPATH=n]
Reviewed-by: NKeith Busch <keith.busch@intel.com>
Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

e9a48034

nvme: also expose the namespace identification sysfs files for mpath nodes · 5b85b826

由 Christoph Hellwig 提交于 11月 09, 2017

We do this by adding a helper that returns the ns_head for a device that
can belong to either the per-controller or per-subsystem block device
nodes, and otherwise reuse all the existing code.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NKeith Busch <keith.busch@intel.com>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: NHannes Reinecke <hare@suse.com>
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

5b85b826

nvme: implement multipath access to nvme subsystems · 32acab31

由 Christoph Hellwig 提交于 11月 02, 2017

This patch adds native multipath support to the nvme driver.  For each
namespace we create only single block device node, which can be used
to access that namespace through any of the controllers that refer to it.
The gendisk for each controllers path to the name space still exists
inside the kernel, but is hidden from userspace.  The character device
nodes are still available on a per-controller basis.  A new link from
the sysfs directory for the subsystem allows to find all controllers
for a given subsystem.

Currently we will always send I/O to the first available path, this will
be changed once the NVMe Asynchronous Namespace Access (ANA) TP is
ratified and implemented, at which point we will look at the ANA state
for each namespace.  Another possibility that was prototyped is to
use the path that is closes to the submitting NUMA code, which will be
mostly interesting for PCI, but might also be useful for RDMA or FC
transports in the future.  There is not plan to implement round robin
or I/O service time path selectors, as those are not scalable with
the performance rates provided by NVMe.

The multipath device will go away once all paths to it disappear,
any delay to keep it alive needs to be implemented at the controller
level.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NKeith Busch <keith.busch@intel.com>
Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: NHannes Reinecke <hare@suse.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

32acab31

nvme: track shared namespaces · ed754e5d

由 Christoph Hellwig 提交于 11月 09, 2017

Introduce a new struct nvme_ns_head that holds information about an actual
namespace, unlike struct nvme_ns, which only holds the per-controller
namespace information.  For private namespaces there is a 1:1 relation of
the two, but for shared namespaces this lets us discover all the paths to
it.  For now only the identifiers are moved to the new structure, but most
of the information in struct nvme_ns should eventually move over.

To allow lockless path lookup the list of nvme_ns structures per
nvme_ns_head is protected by SRCU, which requires freeing the nvme_ns
structure through call_srcu.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NKeith Busch <keith.busch@intel.com>
Reviewed-by: NJavier González <javier@cnexlabs.com>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: NHannes Reinecke <hare@suse.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

ed754e5d

nvme: introduce a nvme_ns_ids structure · 002fab04

由 Christoph Hellwig 提交于 11月 09, 2017

This allows us to manage the various uniqueue namespace identifiers
together instead needing various variables and arguments.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NKeith Busch <keith.busch@intel.com>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: NHannes Reinecke <hare@suse.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

002fab04

nvme: track subsystems · ab9e00cc

由 Christoph Hellwig 提交于 11月 09, 2017

This adds a new nvme_subsystem structure so that we can track multiple
controllers that belong to a single subsystem.  For now we only use it
to store the NQN, and to check that we don't have duplicate NQNs unless
the involved subsystems support multiple controllers.

Includes code originally from Hannes Reinecke to expose the subsystems
in sysfs.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NKeith Busch <keith.busch@intel.com>
Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: NHannes Reinecke <hare@suse.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

ab9e00cc

block, nvme: Introduce blk_mq_req_flags_t · 9a95e4ef

由 Bart Van Assche 提交于 11月 09, 2017

Several block layer and NVMe core functions accept a combination
of BLK_MQ_REQ_* flags through the 'flags' argument but there is
no verification at compile time whether the right type of block
layer flags is passed. Make it possible for sparse to verify this.
This patch does not change any functionality.
Signed-off-by: NBart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: NHannes Reinecke <hare@suse.com>
Tested-by: NOleksandr Natalenko <oleksandr@natalenko.name>
Cc: linux-nvme@lists.infradead.org
Cc: Christoph Hellwig <hch@lst.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

9a95e4ef

nvme: send uevent for some asynchronous events · e3d7874d

由 Keith Busch 提交于 11月 07, 2017

This will give udev a chance to observe and handle asynchronous event
notifications and clear the log to unmask future events of the same type.
The driver will create a change uevent of the asyncronuos event result
before submitting the next AEN request to the device if a completed AEN
event is of type error, smart, command set or vendor specific,
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Reviewed-by: NGuan Junxiong <guanjunxiong@huawei.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

e3d7874d

nvme: unexport starting async event work · d99ca609

由 Keith Busch 提交于 11月 07, 2017

Async event work is for core use only and should not be called directly
from drivers.
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Reviewed-by: NGuan Junxiong <guanjunxiong@huawei.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

d99ca609

nvme: remove handling of multiple AEN requests · ad22c355

由 Keith Busch 提交于 11月 07, 2017

The driver can handle tracking only one AEN request, so this patch
removes handling for multiple ones.
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJames Smart  <james.smart@broadcom.com>
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

ad22c355

nvme: centralize AEN defines · 38dabe21

由 Keith Busch 提交于 11月 07, 2017

All the transports were unnecessarilly duplicating the AEN request
accounting. This patch defines everything in one place.
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Reviewed-by: NGuan Junxiong <guanjunxiong@huawei.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

38dabe21

nvme: fix eui_show() print format · ab083b11

由 Javier González 提交于 11月 08, 2017

Fix print formatting, but keep the original output to prevent user
breakage as suggested by Joe Perches.
Signed-off-by: NJavier González <javier@cnexlabs.com>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Reviewed-by: NKeith Busch <keith.busch@intel.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

ab083b11

nvme: compare NQN string with right size · a47619b5

由 Javier González 提交于 11月 08, 2017

Copy subnqns using NVMF_NQN_SIZE as it is < 256
Signed-off-by: NJavier González <javier@cnexlabs.com>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

a47619b5

nvme: check admin passthru command effects · 84fef62d

由 Keith Busch 提交于 11月 07, 2017

The NVMe standard provides a command effects log page so the host may
be aware of special requirements it may need to do for a particular
command. For example, the command may need to run with IO quiesced to
prevent timeouts or undefined behavior, or it may change the logical block
formats that determine how the host needs to construct future commands.

This patch saves the nvme command effects log page if the controller
supports it, and performs appropriate actions before and after an admin
passthrough command is completed. If the controller does not support the
command effects log page, the driver will define the effects for known
opcodes. The nvme format and santize are the only commands in this patch
with known effects.
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

84fef62d

nvme: factor get log into a helper · c627c487

由 Keith Busch 提交于 11月 07, 2017

And fix the warning on a successful firmware log.
Reviewed-by: NJavier González <javier@cnexlabs.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

c627c487

nvme: fix and clarify the check for missing metadata · 715ea9e0

由 Christoph Hellwig 提交于 11月 07, 2017

Update the check in nvme_setup_rw for missing metadata so that it is
together with the other metadata handling, does not contain impossible
to reach conditions and warns if we get an impossible requests for
a (non-PI) metadata-enabled namespace when CONFIG_BLK_DEV_INTEGRITY
is not set.

Also add a little helper that checks if a given metadata configuration
contains protection information
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reported-by: NJavier González <jg@lightnvm.io>
Reviewed-by: NKeith Busch <keith.busch@intel.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

715ea9e0

nvme: split __nvme_revalidate_disk · 24b0b58c

由 Christoph Hellwig 提交于 11月 02, 2017

Split out the code that applies the calculate value to a given disk/queue
into new helper that can be reused by the multipath code.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NKeith Busch <keith.busch@intel.com>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

24b0b58c

nvme: set the chunk size before freezing the queue · 6e78f21a

由 Christoph Hellwig 提交于 11月 02, 2017

We don't need a frozen queue to update the chunk_size, which just is a
hint, and moving it a little earlier will allow for some better code
reuse with the multipath code.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NKeith Busch <keith.busch@intel.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

6e78f21a