- 27 November 2019, 6 commits
-
-
Submitted by Edmund Nadolski
Remove an unnecessary keyword in nvme_create_queue().

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Edmund Nadolski <edmund.nadolski@intel.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
-
Submitted by James Smart
We've seen a few devices that return different controller IDs to the Fabrics Connect command vs the Identify (controller) command. This failure is currently hard to identify from the existing error messages: it comes across as a (re)connect attempt in the transport that fails with a -22 (-EINVAL) status. The issue is compounded by older kernels either not having the controller ID check, or letting the Identify command overwrite the fabrics controller ID value before it was checked. Both resulted in cases where the devices appeared fine until more recent kernels. Clarify the rejection by adding an error message on controller ID mismatches.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
-
Submitted by James Smart
In nvme-fc it's possible to have connected active controllers and, as no references are taken on the LLDD, the LLDD can be unloaded. The controller would enter a reconnect state and, as long as the LLDD resumed within the reconnect timeout, the controller would resume. But if a namespace on the controller is the root device, allowing the driver to unload can be problematic: reloading the driver may require new IO to the boot device, and as it's no longer connected we get into a catch-22 that eventually fails, and the system locks up. Fix this issue by taking a module reference for every connected controller (which is what the core layer did to the transport module). The reference is released when the controller is removed.

Acked-by: Himanshu Madhani <hmadhani@marvell.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
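As a rough illustration of the described approach, a minimal sketch of pinning and releasing the LLDD module, assuming the transport can reach the LLDD's struct module; the function and parameter names here are illustrative, not the actual nvme-fc code:

    #include <linux/module.h>
    #include <linux/errno.h>

    /* Take a reference so the LLDD cannot be unloaded while a controller
     * is connected; fails if the LLDD is already on its way out. */
    static int example_pin_lldd(struct module *lldd_module)
    {
        if (!try_module_get(lldd_module))
            return -ENODEV;
        return 0;
    }

    /* Drop the reference when the controller is removed. */
    static void example_unpin_lldd(struct module *lldd_module)
    {
        module_put(lldd_module);
    }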
-
Submitted by Israel Rukshin
nvme_loop_create_io_queues() preallocates a big buffer for the IO SGL based on SG_CHUNK_SIZE. Modern DMA engines are often capable of dealing with very big segments, so SG_CHUNK_SIZE is often too big; it results in a static 4KB SGL allocation per command. If a controller has lots of deep queues, preallocation for the sg list can consume substantial amounts of memory. For nvmet-loop, nr_hw_queues can be 128 and each queue's depth 128, so the resulting preallocation for the data SGL is 128*128*4K = 64MB per controller. Switch to runtime allocation of the SGL for lists longer than 2 entries. This is the approach used by NVMe PCI, so it should be reasonable for NVMeOF as well. Runtime SGL allocation has always been the case for the legacy I/O path, so this is nothing new.

Tested-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
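A minimal sketch of the small-inline-plus-runtime SGL pattern described above (and in the two similar commits that follow); the two-entry inline count mirrors the idea, while the exact constants and call sites live in the nvme drivers:

    #include <linux/scatterlist.h>

    #define EXAMPLE_INLINE_SG_CNT 2  /* tiny per-command preallocation */

    /* Small requests use the preallocated two-entry chunk; larger ones get
     * a chained SGL allocated at runtime from the sg pools. */
    static int example_alloc_sgl(struct sg_table *sgt, int nents,
                                 struct scatterlist *inline_sg)
    {
        return sg_alloc_table_chained(sgt, nents, inline_sg,
                                      EXAMPLE_INLINE_SG_CNT);
    }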
-
Submitted by Israel Rukshin
nvme_fc_create_io_queues() preallocates a big buffer for the IO SGL based on SG_CHUNK_SIZE. Modern DMA engines are often capable of dealing with very big segments, so SG_CHUNK_SIZE is often too big; it results in a static 4KB SGL allocation per command. If a controller has lots of deep queues, preallocation for the sg list can consume substantial amounts of memory. For nvme-fc, nr_hw_queues can be 128 and each queue's depth 128, so the resulting preallocation for the data SGL is 128*128*4K = 64MB per controller. Switch to runtime allocation of the SGL for lists longer than 2 entries. This is the approach used by NVMe PCI, so it should be reasonable for NVMeOF as well. Runtime SGL allocation has always been the case for the legacy I/O path, so this is nothing new.

Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
-
Submitted by Israel Rukshin
nvme_rdma_alloc_tagset() preallocates a big buffer for the IO SGL based on SG_CHUNK_SIZE. Modern DMA engines are often capable of dealing with very big segments, so SG_CHUNK_SIZE is often too big; it results in a static 4KB SGL allocation per command. If a controller has lots of deep queues, preallocation for the sg list can consume substantial amounts of memory. For nvme-rdma, nr_hw_queues can be 128 and each queue's depth 128, so the resulting preallocation for the data SGL is 128*128*4K = 64MB per controller. Switch to runtime allocation of the SGL for lists longer than 2 entries. This is the approach used by NVMe PCI, so it should be reasonable for NVMeOF as well. Runtime SGL allocation has always been the case for the legacy I/O path, so this is nothing new. The preallocated small SGL depends on SG_CHAIN, so if the architecture doesn't support SG_CHAIN, use only runtime allocation for the SGL. We didn't notice any performance degradation, since for small IOs we'll use the inline SG and for bigger IOs the allocation of a larger SGL from the slab is fast enough.

Suggested-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
-
- 22 November 2019, 2 commits
-
-
Submitted by Akinobu Mita
This adds a new quirk, NVME_QUIRK_NO_TEMP_THRESH_CHANGE, to avoid changing the value of the temperature threshold feature on specific devices that show undesirable behavior.

Guenter reported: "On my Intel NVME drive (SSDPEKKW512G7), writing any minimum limit on the Composite temperature sensor results in a temperature warning, and that warning is sticky until I reset the controller. It doesn't seem to matter which temperature I write; writing -273000 has the same result."

The Intel NVMe has the latest firmware version installed, so this isn't a problem that was ever fixed.

Reported-by: Guenter Roeck <linux@roeck-us.net>
Cc: Keith Busch <kbusch@kernel.org>
Cc: Jens Axboe <axboe@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Jean Delvare <jdelvare@suse.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
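A minimal sketch of how such a quirk is typically honoured in the threshold-setting path; the surrounding function is hypothetical and relies on the nvme driver's internal ctrl->quirks field:

    /* Refuse to change the temperature threshold on quirky devices. */
    static int example_set_temp_threshold(struct nvme_ctrl *ctrl, long value)
    {
        if (ctrl->quirks & NVME_QUIRK_NO_TEMP_THRESH_CHANGE)
            return -EOPNOTSUPP;
        /* ... otherwise issue Set Features (Temperature Threshold) ... */
        return 0;
    }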
-
Submitted by Akinobu Mita
According to the NVMe specification, the over temperature threshold and under temperature threshold features shall be implemented for Composite Temperature if a non-zero WCTEMP field value is reported in the Identify Controller data structure. The features are also implemented for all implemented temperature sensors (i.e., all Temperature Sensor fields that report a non-zero value).

This exposes the over and under temperature thresholds of each sensor as the temperature min and max values of the hwmon sysfs attributes. The WCTEMP is already provided as a temperature max value for Composite Temperature, but this change isn't incompatible, because the default value of the over temperature threshold for Composite Temperature is the WCTEMP. Now the alarm attribute for Composite Temperature indicates that one of the temperatures is outside a temperature threshold, because there is only a single bit in the Critical Warning field that indicates a temperature is outside of a threshold.

Example output from the "sensors" command:

    nvme-pci-0100
    Adapter: PCI adapter
    Composite:  +33.9°C  (low = -273.1°C, high = +69.8°C) (crit = +79.8°C)
    Sensor 1:   +34.9°C  (low = -273.1°C, high = +65261.8°C)
    Sensor 2:   +31.9°C  (low = -273.1°C, high = +65261.8°C)
    Sensor 5:   +47.9°C  (low = -273.1°C, high = +65261.8°C)

This also adds helper macros for kelvin from/to milli Celsius conversion, and replaces the repeated code in hwmon.c.

Cc: Keith Busch <kbusch@kernel.org>
Cc: Jens Axboe <axboe@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Jean Delvare <jdelvare@suse.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
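A sketch of the kelvin/millicelsius conversion helpers mentioned above; the macro names are illustrative, but the arithmetic (0 °C = 273.15 K) is what the change describes:

    #include <linux/kernel.h>  /* DIV_ROUND_CLOSEST */

    #define ABSOLUTE_ZERO_MILLICELSIUS (-273150)

    /* NVMe reports temperatures in kelvin; hwmon sysfs expects millidegrees Celsius. */
    #define example_kelvin_to_millicelsius(t) \
        ((t) * 1000 + ABSOLUTE_ZERO_MILLICELSIUS)
    #define example_millicelsius_to_kelvin(t) \
        DIV_ROUND_CLOSEST((t) - ABSOLUTE_ZERO_MILLICELSIUS, 1000)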
-
- 13 November 2019, 1 commit
-
-
Submitted by Eduard Hasenleithner
Users observe IOMMU-related errors when performing discard on NVMe from non-compliant NVMe devices that read beyond the end of the DMA-mapped ranges to discard. Two different variants of this behavior have been observed: SM22XX controllers round up the read size to a multiple of 512 bytes, and the Phison E12 unconditionally reads the maximum discard size allowed by the spec (256 segments, or 4kB). Make nvme_setup_discard unconditionally allocate the maximum DSM buffer so the driver DMA maps a memory range that will always succeed.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=202665 many
Signed-off-by: Eduard Hasenleithner <eduard@hasenleithner.at>
[changelog, use existing define, kernel coding style]
Signed-off-by: Keith Busch <kbusch@kernel.org>
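A sketch of the unconditional maximum-size allocation being described: 256 ranges of 16 bytes each, i.e. a single 4 KiB buffer, using the existing NVME_DSM_MAX_RANGES define; the helper name is illustrative:

    #include <linux/nvme.h>
    #include <linux/slab.h>

    /* Always size the DSM range buffer for the spec maximum so a device
     * that reads past the requested ranges stays inside the DMA mapping. */
    static struct nvme_dsm_range *example_alloc_dsm_ranges(gfp_t gfp)
    {
        return kmalloc_array(NVME_DSM_MAX_RANGES,
                             sizeof(struct nvme_dsm_range), gfp);
    }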
-
- 12 November 2019, 1 commit
-
-
Submitted by Guenter Roeck
NVMe devices report temperature information in the controller information (for limits) and in the smart log. Currently, the only means to retrieve this information is the nvme command line interface, which requires super-user privileges. At the same time, it would be desirable to be able to use NVMe temperature information for thermal control.

This patch adds support for reading NVMe temperatures from the kernel using the hwmon API, and adds temperature zones for NVMe drives. The thermal subsystem can use this information to set thermal policies, and userspace can access it using libsensors and/or the "sensors" command.

Example output from the "sensors" command:

    nvme0-pci-0100
    Adapter: PCI adapter
    Composite:  +39.0°C  (high = +85.0°C, crit = +85.0°C)
    Sensor 1:   +39.0°C
    Sensor 2:   +41.0°C

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Keith Busch <kbusch@kernel.org>
-
- 05 November 2019, 27 commits
-
-
Submitted by Anton Eidelman
nvme_mpath_clear_ctrl_paths() iterates through the ctrl->namespaces list while holding ctrl->scan_lock. This does not seem to be the correct way of protecting against concurrent list modification. Specifically, nvme_scan_work() sorts ctrl->namespaces AFTER unlocking scan_lock. This may result in the following (rare) crash in ctrl disconnect during scan_work:

    BUG: kernel NULL pointer dereference, address: 0000000000000050
    Oops: 0000 [#1] SMP PTI
    CPU: 0 PID: 3995 Comm: nvme 5.3.5-050305-generic
    RIP: 0010:nvme_mpath_clear_current_path+0xe/0x90 [nvme_core]
    ...
    Call Trace:
     nvme_mpath_clear_ctrl_paths+0x3c/0x70 [nvme_core]
     nvme_remove_namespaces+0x35/0xe0 [nvme_core]
     nvme_do_delete_ctrl+0x47/0x90 [nvme_core]
     nvme_sysfs_delete+0x49/0x60 [nvme_core]
     dev_attr_store+0x17/0x30
     sysfs_kf_write+0x3e/0x50
     kernfs_fop_write+0x11e/0x1a0
     __vfs_write+0x1b/0x40
     vfs_write+0xb9/0x1a0
     ksys_write+0x67/0xe0
     __x64_sys_write+0x1a/0x20
     do_syscall_64+0x5a/0x130
     entry_SYSCALL_64_after_hwframe+0x44/0xa9
    RIP: 0033:0x7f8d02bfb154

Fix: after taking scan_lock in nvme_mpath_clear_ctrl_paths(), also take down_read(&ctrl->namespaces_rwsem) to make the list traversal safe. This will not cause deadlocks because taking scan_lock never happens while holding the namespaces_rwsem. Moreover, scan work downs namespaces_rwsem in the same order.

Alternative: sort ctrl->namespaces in nvme_scan_work() while still holding the scan_lock. This would leave nvme_mpath_clear_ctrl_paths() without correct protection against ctrl->namespaces modification by anyone other than scan_work.

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anton Eidelman <anton@lightbitslabs.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
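A minimal sketch of the traversal under both locks, as the fix describes; the field names follow the nvme core structures, but the snippet is illustrative rather than the exact patch:

    /* Take scan_lock first, then namespaces_rwsem for read, so a concurrent
     * scan_work cannot re-sort the list underneath the traversal. */
    mutex_lock(&ctrl->scan_lock);
    down_read(&ctrl->namespaces_rwsem);
    list_for_each_entry(ns, &ctrl->namespaces, list)
        nvme_mpath_clear_current_path(ns);
    up_read(&ctrl->namespaces_rwsem);
    mutex_unlock(&ctrl->scan_lock);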
-
Submitted by Max Gurtovoy
In case there are controllers that are not associated with any RDMA device (e.g. during unsuccessful reconnection) and the user unloads the module, these controllers will not be freed and will access already freed memory. The same logic appears in other fabric drivers as well.

Fixes: 87fd1253 ("nvme-rdma: remove redundant reference between ib_device and tagset")
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
-
Submitted by Prabhath Sajeepa
Check the validity of the offset into the ANA log buffer before accessing the nvme_ana_group_desc. This check ensures that the size of the ANA log buffer is >= offset + sizeof(nvme_ana_group_desc).

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Prabhath Sajeepa <psajeepa@purestorage.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
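A sketch of the bounds check being described, assuming offset and buffer-size variables from the ANA log parsing loop; the exact variable names are illustrative:

    /* Bail out before dereferencing a descriptor that would extend past
     * the end of the ANA log buffer. */
    if (WARN_ON_ONCE(offset >
                     ana_log_size - sizeof(struct nvme_ana_group_desc)))
        return -EINVAL;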
-
Submitted by Christoph Hellwig
bio_set_op_attrs has long been deprecated; replace it with a direct assignment of the flags to bio->bi_opf.

Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
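The substitution in question, shown as a small before/after sketch:

    /* before: deprecated helper */
    bio_set_op_attrs(bio, REQ_OP_READ, 0);

    /* after: assign the operation (and any flags) directly */
    bio->bi_opf = REQ_OP_READ;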
-
Submitted by Christoph Hellwig
With reference to the following issue reported on the mailing list:

    http://lists.infradead.org/pipermail/linux-nvme/2019-October/027604.html

this patch adds plugging for the bdev-ns under nvmet_bdev_execute_rw(). We can see the following performance improvement in a random write workload with the setup described in the link, when device_path is configured as /dev/md0.

Without this patch:

    write: IOPS=40.8k, BW=159MiB/s (167MB/s)(4777MiB/30002msec)
    write: IOPS=41.2k, BW=161MiB/s (169MB/s)(4831MiB/30011msec)
    slat (usec): min=8, max=10823, avg=15.64, stdev=16.85
    slat (usec): min=8, max=401, avg=15.40, stdev= 9.56
    clat (usec): min=54, max=2492, avg=759.07, stdev=172.62
    clat (usec): min=56, max=1997, avg=768.06, stdev=178.72

With this patch:

    write: IOPS=123k, BW=480MiB/s (504MB/s)(14.1GiB/30011msec)
    write: IOPS=123k, BW=481MiB/s (504MB/s)(14.1GiB/30002msec)
    slat (usec): min=8, max=9941, avg=13.31, stdev= 8.04
    slat (usec): min=8, max=289, avg=13.31, stdev= 3.37
    clat (usec): min=43, max=17635, avg=245.46, stdev=171.23
    clat (usec): min=44, max=17751, avg=245.25, stdev=183.14

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
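A sketch of the plugging pattern added around the per-request bio submissions; the loop body is elided:

    struct blk_plug plug;

    /* Batch the bios so the block layer can merge and dispatch them more
     * efficiently when the plug is released. */
    blk_start_plug(&plug);
    /* ... build and submit_bio() each bio for this request ... */
    blk_finish_plug(&plug);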
-
Submitted by Christoph Hellwig
Move the special cases for fabrics commands and the discovery controller to nvmet_parse_admin_cmd in preparation for adding passthrough support.

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Geert Uytterhoeven
Fix misspelling of "rediscovered".

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Sagi Grimberg
Discovery controllers need this information as well.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Christoph Hellwig
Now that nvmet_req_execute does nothing, open code it.

Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
[split patch, update changelog]
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Christoph Hellwig
Instead of storing the expected length and checking it when the command is executed, just check the length inside the commands themselves. A new helper, nvmet_check_data_len(), is created to help with this check.

Signed-off-by: Christoph Hellwig <hch@lst.de>
[split patch, update changelog]
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Christoph Hellwig
Similar to the nvmet_rw_len helper.

Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
[split patch, update changelog]
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Christoph Hellwig
Push the lid and cns checks into their respective handlers and, while we're at it, rename the functions to be consistent with other discovery handlers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
[split patch, update changelog]
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Christoph Hellwig
Instead of picking the sub-command handler to execute in a nested switch statement, introduce a landing function that calls out to the appropriate sub-command handler. This will allow us to have a common place in the handler to check the transfer length in a future patch.

Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
[split patch, update change log]
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Logan Gunthorpe
It's not appropriate for the transports to set the data_len field of the request, which is only used by the core. In this case, just use a variable on the stack to store the length of the sgl for comparison.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Logan Gunthorpe
None of the other transports check data_len, which is verified in core code. The function should instead check that the sgl length is non-zero.

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Damien Le Moal
Introduce the new helper function nvme_lba_to_sect() to convert a device logical block number to a 512B sector number. Use this new helper in the obvious places, cleaning up the code.

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Damien Le Moal
Rename nvme_block_nr() to nvme_sect_to_lba() and use SECTOR_SHIFT instead of its hard-coded value 9. Also add a comment to describe this helper.

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
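Roughly, the two conversion helpers described by this and the previous commit look like the following sketch (SECTOR_SHIFT is 9, so the shift amount is the difference between the namespace block size and 512-byte sectors):

    /* Convert a 512 B sector number to a device logical block number. */
    static inline u64 nvme_sect_to_lba(struct nvme_ns *ns, sector_t sector)
    {
        return sector >> (ns->lba_shift - SECTOR_SHIFT);
    }

    /* Convert a device logical block number to a 512 B sector number. */
    static inline sector_t nvme_lba_to_sect(struct nvme_ns *ns, u64 lba)
    {
        return lba << (ns->lba_shift - SECTOR_SHIFT);
    }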
-
Submitted by Max Gurtovoy
nvme_cleanup_cmd should be called for each call to nvme_setup_cmd (they are symmetrical functions). Move the call to nvme_cleanup_cmd into the common core layer and call it during nvme_complete_rq for the good flow. For the error flow, each transport will call nvme_cleanup_cmd independently. Also take care of a special case of path failure, where we call nvme_complete_rq without having done nvme_setup_cmd.

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Max Gurtovoy
Fix the status code of canceled requests initiated by the host according to TP4028 (Status Code 0x371): "Command Aborted By host: The command was aborted as a result of host action (e.g., the host disconnected the Fabric connection)." Also, in a multipath environment, unless otherwise specified, errors of this type (path related) should be retried using a different path, if one is available.

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Israel Rukshin
The calls to nvmet_req_alloc_sgl and rdma_rw_ctx_init should usually succeed, so add this simple optimization to the fast path.

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
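A minimal sketch of the fast-path hint being described; the error label is illustrative:

    /* These allocations rarely fail, so hint the compiler to keep the
     * error handling off the hot path. */
    ret = nvmet_req_alloc_sgl(req);
    if (unlikely(ret < 0))
        goto error_out;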
-
Submitted by Israel Rukshin
The call to sgl_alloc shouldn't fail, so add this simple optimization to the fast path.

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Israel Rukshin
This commit doesn't change any logic.

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Israel Rukshin
This function improves code readability and reduces code duplication.

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by James Smart
The code today only clears the association_id if a Disconnect LS is transmitted. Remove the ambiguity and unconditionally clear the association_id if the association has been terminated.

Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by James Smart
Change the wording of a couple of messages to clarify what happened.

Signed-off-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by James Smart
Set the new category field in the FC-NVME CMND_IU based on the queue number.

Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by James Smart
Sync the sources with the revised structure and field names to correspond with the FC-NVME-2 header sync-up.

Tested interoperability with success:
- prior initiator with new target
- prior target with new initiator
- new on new

Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 29 October 2019, 3 commits
-
-
Submitted by Anton Eidelman
The groups_only mode in nvme_read_ana_log() is no longer used: remove it.

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Anton Eidelman <anton@lightbitslabs.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Anton Eidelman
The following scenario results in an IO hang:

1) The ctrl completes a request with NVME_SC_ANA_TRANSITION. The NVME_NS_ANA_PENDING bit in ns->flags is set and ana_work is triggered.
2) ana_work: nvme_read_ana_log() tries to get the ANA log page from the ctrl. This fails because the ctrl disconnects. Therefore nvme_update_ns_ana_state() is not called and the NVME_NS_ANA_PENDING bit in ns->flags is not cleared.
3) The ctrl reconnects: nvme_mpath_init(ctrl, ...) calls nvme_read_ana_log(ctrl, groups_only=true). However, nvme_update_ana_state() does not update namespaces because nr_nsids = 0 (due to groups_only mode).
4) scan_work calls nvme_validate_ns(), finds the ns and re-validates it OK.

Result: the ctrl is now live, but the NVME_NS_ANA_PENDING bit in ns->flags is still set. Consequently the ctrl will never be considered a viable path by __nvme_find_path(). IO will hang if the ctrl is the only or the last path to the namespace.

More generally, while the ctrl is reconnecting, its ANA state may change. And because nvme_mpath_init() requests the ANA log in groups_only mode, these changes are not propagated to the existing ctrl namespaces. This may result in a malfunction or an IO hang.

Solution: have nvme_mpath_init() call nvme_read_ana_log() with groups_only set to false. This will not harm the new-ctrl case (no namespaces present), and will make sure the ANA state of namespaces gets updated after reconnect.

Note: another option would be for nvme_mpath_init() to invoke nvme_parse_ana_log(..., nvme_set_ns_ana_state) for each existing namespace.

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Anton Eidelman <anton@lightbitslabs.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Eric Dumazet
Busy polling usually runs without locks. Let's use skb_queue_empty_lockless() instead of skb_queue_empty(). Also use READ_ONCE() in __skb_try_recv_datagram() to address a similar potential problem.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
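A minimal sketch of the substitution, assuming a helper called from a busy-polling context that does not hold the receive queue lock; the helper name is illustrative:

    #include <linux/skbuff.h>
    #include <net/sock.h>

    /* skb_queue_empty_lockless() reads the queue head with READ_ONCE(),
     * so it is safe to call while another CPU modifies the queue. */
    static bool example_rx_queue_has_data(const struct sock *sk)
    {
        return !skb_queue_empty_lockless(&sk->sk_receive_queue);
    }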
-