- 26 March 2020, 28 commits
-
-
By Christoph Hellwig

There is no non __-prefixed version, so make the name a little more readable.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
-
By Christoph Hellwig

Move the handling of an error into the function from the caller, and only do it for an actual error on the admin command itself, not for the command parsing, as that should be enough to deal with devices claiming a bogus version compliance.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
-
By Israel Rukshin

The transition to the LIVE state should not fail in the case of a new controller. Moving to the DELETING state before nvme_tcp_create_ctrl() has allocated all the resources may lead to a NULL dereference in the teardown flow (e.g., IO tagset, admin_q, connect_q).

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
-
By Israel Rukshin

The transition to the LIVE state should not fail in the case of a new controller. Moving to the DELETING state before nvme_tcp_create_ctrl() has allocated all the resources may lead to a NULL dereference in the teardown flow (e.g., IO tagset, admin_q, connect_q).

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
-
By Israel Rukshin

Calling nvme_sysfs_delete() while the controller is in the middle of creation may cause several bugs. If the controller is in the NEW state we remove the delete_controller file but do not delete the controller, and the user will not be able to use the nvme disconnect command on that controller again, although the controller may be active. Other bugs may happen if the controller is in the middle of the create_ctrl callback when nvme_do_delete_ctrl() starts, for example freeing the I/O tagset in nvme_do_delete_ctrl() before it was allocated in the create_ctrl callback. To fix all those races, do not allow the user to delete the controller before it is fully created.

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
-
By Israel Rukshin

Put the ctrl reference count in nvme_uninit_ctrl, as opposed to nvme_init_ctrl, which takes it. This decreases the reference count at the core layer instead of decreasing it in each transport separately. Also move the call to nvme_uninit_ctrl in the PCI driver to after the calls to nvme_release_prp_pools and nvme_dev_unmap, in order to put the reference count only after the dev is no longer used. This is safe because those functions use nvme_dev, which is freed only later in nvme_pci_free_ctrl.

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
-
By Israel Rukshin

If nvme_sysfs_delete() is called by the user before the ctrl reference count is taken, the ctrl may be freed during creation and cause a bug. Take the reference as soon as the controller is externally visible, which is done by cdev_device_add() in nvme_init_ctrl(). Also take the reference count at the core layer instead of taking it in each transport separately.

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
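A minimal sketch of the ordering described above, assuming the usual tail of nvme_init_ctrl() in drivers/nvme/host/core.c; the error label and surrounding code are illustrative, not the exact upstream diff.

	/* Sketch: take the core-layer reference right after the controller
	 * becomes externally visible via its character device. */
	ret = cdev_device_add(&ctrl->cdev, ctrl->device);
	if (ret)
		goto out_free_name;	/* hypothetical error label */

	/*
	 * Grab the reference here, in the core, rather than in every
	 * transport; the matching nvme_put_ctrl() is done in
	 * nvme_uninit_ctrl() per the companion patch above.
	 */
	nvme_get_ctrl(ctrl);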
-
By Israel Rukshin

Destroy the resources in the same order as in the nvme_probe error flow to improve code readability.

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
-
By Israel Rukshin

The return code of nvme_delete_ctrl_sync is never used, so change it to void.

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
-
By Israel Rukshin

Improve code readability.

Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
-
By Max Gurtovoy

ida instances allocate some internal memory in addition to the base 'struct ida'. Use ida_destroy() to release that memory at module_exit().

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
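A minimal sketch of the pattern, assuming an instance-numbering IDA such as the core's nvme_instance_ida; the exit-function body around it is illustrative.

	#include <linux/idr.h>
	#include <linux/module.h>

	static DEFINE_IDA(nvme_instance_ida);	/* instance numbers handed out at probe time */

	static void __exit nvme_core_exit_sketch(void)
	{
		/*
		 * ida_alloc()/ida_free() can leave internal tree nodes cached
		 * inside the ida; ida_destroy() releases them so nothing is
		 * left behind on module unload.
		 */
		ida_destroy(&nvme_instance_ida);
	}
	module_exit(nvme_core_exit_sketch);
	MODULE_LICENSE("GPL");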
-
By Masahiro Yamada (KIOXIA)

Currently a 32-bit application gets ENOTTY when it calls compat_ioctl with NVME_IOCTL_SUBMIT_IO on a 64-bit kernel. The cause is that the result of sizeof(struct nvme_user_io), which is used to define NVME_IOCTL_SUBMIT_IO, is not the same between a 32-bit and a 64-bit compiler:

* 32 bit: sizeof(struct nvme_user_io) is 44.
* 64 bit: sizeof(struct nvme_user_io) is 48.

The 64-bit compiler adds 4 bytes (32 bits) of tail padding to round the size up to a multiple of 8 bytes. This patch adds a compat_ioctl handler. The handler replaces NVME_IOCTL_SUBMIT_IO32 with NVME_IOCTL_SUBMIT_IO when a 32-bit application calls compat_ioctl for a submit on a 64-bit kernel, then calls nvme_ioctl as usual.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Masahiro Yamada (KIOXIA) <masahiro31.yamada@kioxia.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
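A small userspace check of the size mismatch described above. The struct below mirrors the field layout of struct nvme_user_io rather than including the uapi header, so treat it as an illustration:

	#include <stdint.h>
	#include <stdio.h>

	/* Field-for-field mirror of struct nvme_user_io (layout illustration only). */
	struct user_io_mirror {
		uint8_t  opcode;
		uint8_t  flags;
		uint16_t control;
		uint16_t nblocks;
		uint16_t rsvd;
		uint64_t metadata;
		uint64_t addr;
		uint64_t slba;
		uint32_t dsmgmt;
		uint32_t reftag;
		uint16_t apptag;
		uint16_t appmask;
	};

	int main(void)
	{
		/*
		 * On x86-64, 64-bit fields need 8-byte alignment, so the struct
		 * gets 4 bytes of tail padding: 48 bytes. With a 32-bit x86 ABI
		 * they only need 4-byte alignment: 44 bytes. Because the ioctl
		 * number encodes this size, the two ABIs compute different
		 * NVME_IOCTL_SUBMIT_IO values.
		 */
		printf("sizeof = %zu\n", sizeof(struct user_io_mirror));
		return 0;
	}

Built normally this should print 48; built with -m32 (if the toolchain has 32-bit support) it should print 44, matching the numbers in the commit message.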
-
By Takashi Iwai

Since snprintf() returns the would-be output size instead of the actual output size, succeeding calls may go beyond the given buffer limit. Fix it by replacing snprintf() with scnprintf().

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
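A userspace illustration of the failure mode: when output is appended in a loop, snprintf()'s would-be-length return value lets the running offset run past the buffer, while a scnprintf()-style helper (sketched here, since scnprintf() itself is kernel-only) keeps the offset bounded.

	#include <stdarg.h>
	#include <stdio.h>

	/* Userspace stand-in for the kernel's scnprintf(): returns the number
	 * of characters actually stored, never the would-be length. */
	static int scn(char *buf, size_t size, const char *fmt, ...)
	{
		va_list ap;
		int n;

		if (size == 0)
			return 0;
		va_start(ap, fmt);
		n = vsnprintf(buf, size, fmt, ap);
		va_end(ap);
		if (n < 0)
			return 0;
		return (size_t)n >= size ? (int)(size - 1) : n;
	}

	int main(void)
	{
		char buf[16];
		size_t off = 0;

		/* snprintf() returns the would-be length, so the running offset
		 * can pass the end of the buffer; the next "remaining size"
		 * computation and buffer pointer then go out of bounds. */
		off += (size_t)snprintf(buf, sizeof(buf), "%s", "0123456789");
		off += (size_t)snprintf(buf + off, sizeof(buf) - off, "%s", "abcdefghij");
		printf("snprintf offset: %zu (buffer is %zu bytes)\n", off, sizeof(buf));

		/* The scnprintf()-style helper keeps the offset within bounds. */
		off = 0;
		off += (size_t)scn(buf, sizeof(buf), "%s", "0123456789");
		off += (size_t)scn(buf + off, sizeof(buf) - off, "%s", "abcdefghij");
		printf("scnprintf-style offset: %zu\n", off);
		return 0;
	}

Running this prints an offset of 20 for the snprintf() variant (past the 16-byte buffer) and 15 for the scnprintf()-style variant.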
-
By John Meneghini

The nvme multipath error handling defaults to controller reset if the error is unknown. There are, however, no existing nvme status codes that indicate a reset should be used, and resetting causes unnecessary disruption to the rest of the IO. Change nvme's error handling to first check whether failover should happen. If not, let the normal error handling take over rather than reset the controller.

Based-on-a-patch-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: John Meneghini <johnm@netapp.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
-
By Max Gurtovoy

Align the PCI address print with the fabrics address, which is printed with a trailing newline character.

Before:
[root@server40 linux]# cat /sys/class/nvme/nvme2/address
0000:0b:00.0[root@server40 linux]#

After:
[root@server40 linux]# cat /sys/class/nvme/nvme2/address
0000:0b:00.0
[root@server40 linux]#

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
-
By Sagi Grimberg

If we failed to receive data from the socket, don't try to process it further; we will surely be handling a queue error at this point. While no issue has been seen with the current behavior so far, it is safer to cease socket processing once we have detected an error.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
-
By Sagi Grimberg

Consolidate the request failure handling code into the place where the request is fetched (nvme_tcp_try_send).

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
-
By Sagi Grimberg

Currently, queue io_cpu assignment is done sequentially for the default, read and poll queues based on the queue id. This causes misalignment between the context of the CPU initiating I/O and the I/O worker thread processing queued requests or completions. Change the queue io_cpu assignment to take the queue map offset into account: each queue map's io_cpu assignment starts at zero, which essentially aligns the read/poll queues to start over the same CPU range as the default queues.

Testing performed by Mark with:
- ram device (nvmet)
- single CPU core (pinned)
- 100% 4k reads
- engine io_uring (not using the sq_thread option)
- hipri flag set

Micro-benchmark results show a net gain of:
- 18%-29% increase in IOPs
- 16%-22% reduction in average latency
- 7%-23% reduction in 99.99% latency

Baseline:
========
QDepth/Batch | IOPs [k] | Avg. Lat [us] | 99.99% Lat [us]
---------------------------------------------------------
 1/1         |  32.4    |   30.11       |   50.94
 32/8        |  179     |  168.20       |  371

CPU alignment:
=============
QDepth/Batch | IOPs [k] | Avg. Lat [us] | 99.99% Lat [us]
---------------------------------------------------------
 1/1         |  38.5    |   25.18       |   39.16
 32/8        |  231     |  130.75       |  343

Reported-by: Mark Wunderlich <mark.wunderlich@intel.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
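A sketch of the per-queue-map offset logic described above, assuming nvme-tcp's queue/ctrl fields (io_queues[], queue-id and queue-type helpers); names and details are illustrative rather than the exact upstream code.

	/* Sketch: start io_cpu numbering at zero within each queue map. */
	static void nvme_tcp_set_queue_io_cpu(struct nvme_tcp_queue *queue)
	{
		struct nvme_tcp_ctrl *ctrl = queue->ctrl;
		int qid = nvme_tcp_queue_id(queue);	/* qid 0 is the admin queue */
		int n = 0;

		if (nvme_tcp_default_queue(queue))
			n = qid - 1;
		else if (nvme_tcp_read_queue(queue))
			n = qid - 1 - ctrl->io_queues[HCTX_TYPE_DEFAULT];
		else if (nvme_tcp_poll_queue(queue))
			n = qid - 1 - ctrl->io_queues[HCTX_TYPE_DEFAULT] -
					ctrl->io_queues[HCTX_TYPE_READ];

		/* Map the per-map index onto an online CPU, wrapping as needed,
		 * so read/poll queues cover the same CPU range as default queues. */
		queue->io_cpu = cpumask_next_wrap(n - 1, cpu_online_mask, -1, false);
	}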
-
By Keith Busch

The timeout handler can use the existing nvme_poll() if it needs to check a polled queue, allowing nvme_poll_irqdisable() to handle only irq-driven queues for the remaining callers.

Signed-off-by: Keith Busch <kbusch@kernel.org>
-
By Keith Busch

Completion handling had been done in two steps: find all new completions under a lock, then handle those completions outside the lock. This was done to make the locked section as short as possible so that other threads using the same lock wait less time. The driver no longer shares locks during completion, and is in fact lockless for interrupt-driven queues, so the optimization no longer serves its original purpose. Replace the two-pass completion queue handler with a single pass that completes entries immediately.

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
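A sketch of the single-pass shape described above, using the helper names from drivers/nvme/host/pci.c (nvme_cqe_pending, nvme_handle_cqe, nvme_update_cq_head, nvme_ring_cq_doorbell); the exact signatures are assumptions.

	/* Sketch: consume and complete each CQE as it is found, in one pass. */
	static inline int nvme_process_cq(struct nvme_queue *nvmeq)
	{
		int found = 0;

		while (nvme_cqe_pending(nvmeq)) {
			found++;
			/* Complete the entry at the current head immediately... */
			nvme_handle_cqe(nvmeq, nvmeq->cq_head);
			/* ...then advance the head (and flip the phase on wrap). */
			nvme_update_cq_head(nvmeq);
		}

		if (found)
			nvme_ring_cq_doorbell(nvmeq);
		return found;
	}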
-
By Keith Busch

The only user of tagged completion was timeout handling. That user, though, really only cares whether the timed-out command has completed, which we can safely check within the timeout handler. Remove the tag check to simplify completion handling.

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
-
By Alexey Dobriyan

Update the CQ head with a pre-increment operator. This saves the subtraction of 1 and a few registers. Also update the phase with "^= 1", which generates only one RMW instruction.

ffffffff815ba150 <nvme_update_cq_head>:
ffffffff815ba150: 0f b7 47 70    movzx eax,WORD PTR [rdi+0x70]
ffffffff815ba154: 83 c0 01       add   eax,0x1
ffffffff815ba157: 66 89 47 70    mov   WORD PTR [rdi+0x70],ax
ffffffff815ba15b: 66 3b 47 68    cmp   ax,WORD PTR [rdi+0x68]
ffffffff815ba15f: 74 01          je    ffffffff815ba162 <nvme_update_cq_head+0x12>
ffffffff815ba161: c3             ret
ffffffff815ba162: 31 c0          xor   eax,eax
ffffffff815ba164: 80 77 74 01    ===>  xor   BYTE PTR [rdi+0x74],0x1
ffffffff815ba168: 66 89 47 70    mov   WORD PTR [rdi+0x70],ax
ffffffff815ba16c: c3             ret

add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-119 (-119)
Function             old     new   delta
nvme_poll            690     678     -12
nvme_dev_disable    1230    1177     -53
nvme_irq             613     559     -54

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
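Roughly the C that produces the disassembly above; a sketch using kernel integer types and an assumed minimal struct layout (cq_head, q_depth, cq_phase) rather than the full struct nvme_queue.

	struct nvme_queue_sketch {
		/* ...other fields elided... */
		u16 q_depth;	/* number of CQ entries */
		u16 cq_head;	/* next entry to consume */
		u8  cq_phase;	/* expected phase tag, toggles on each wrap */
	};

	/* Sketch: advance the head with a pre-computed increment and flip the
	 * phase bit with XOR when the head wraps. */
	static inline void nvme_update_cq_head(struct nvme_queue_sketch *nvmeq)
	{
		u16 tmp = nvmeq->cq_head + 1;

		if (tmp == nvmeq->q_depth) {
			nvmeq->cq_head = 0;
			nvmeq->cq_phase ^= 1;	/* single read-modify-write */
		} else {
			nvmeq->cq_head = tmp;
		}
	}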
-
By Josh Triplett

After initialization, nvme_wait_ready checks for readiness every 100ms, even though the drive may be ready far sooner than that. This delays system boot by hundreds of milliseconds. Reduce the delay, checking for readiness every millisecond instead.

Boot-time tests on an AWS c5.12xlarge:

Before:
[ 0.546936] initcall nvme_init+0x0/0x5b returned 0 after 37 usecs
...
[ 0.764178] nvme nvme0: 2/0/0 default/read/poll queues
[ 0.768424] nvme0n1: p1
[ 0.774132] EXT4-fs (nvme0n1p1): mounted filesystem with ordered data mode. Opts: (null)
[ 0.774146] VFS: Mounted root (ext4 filesystem) on device 259:1.
...
[ 0.788141] Run /sbin/init as init process

After:
[ 0.537088] initcall nvme_init+0x0/0x5b returned 0 after 37 usecs
...
[ 0.543457] nvme nvme0: 2/0/0 default/read/poll queues
[ 0.548473] nvme0n1: p1
[ 0.554339] EXT4-fs (nvme0n1p1): mounted filesystem with ordered data mode. Opts: (null)
[ 0.554344] VFS: Mounted root (ext4 filesystem) on device 259:1.
...
[ 0.567931] Run /sbin/init as init process

Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
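A sketch of the kind of wait loop being tuned, assuming the core-layer register accessor and the NVME_REG_CSTS/NVME_CSTS_RDY definitions; the switch from a 100ms sleep to a ~1ms sleep is the essence of the change, and the surrounding details (deadline handling, function name) are illustrative.

	/* Sketch: poll CSTS.RDY until it matches the requested state, sleeping
	 * about a millisecond between reads instead of 100ms. */
	static int nvme_wait_ready_sketch(struct nvme_ctrl *ctrl, bool enabled,
					  unsigned long deadline /* jiffies, derived from CAP.TO */)
	{
		u32 csts, bit = enabled ? NVME_CSTS_RDY : 0;

		while (ctrl->ops->reg_read32(ctrl, NVME_REG_CSTS, &csts) == 0) {
			if (csts == ~0)
				return -ENODEV;		/* device went away */
			if ((csts & NVME_CSTS_RDY) == bit)
				return 0;

			usleep_range(1000, 2000);	/* was msleep(100): up to ~99ms of wasted boot time */
			if (fatal_signal_pending(current))
				return -EINTR;
			if (time_after(jiffies, deadline))
				return -ENODEV;
		}
		return -ENODEV;
	}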
-
By Rupesh Girase

Log the controller status to make it easier to tell whether an issue lies within the kernel nvme subsystem or the controller is unhealthy.

Signed-off-by: Rupesh Girase <rgirase@redhat.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulakrni@wdc.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
-
By Chaitanya Kulkarni

The function nvme_identify_ns_desc() has 3 levels of nesting, which makes the error message exceed 80 characters per line, out of line with the kernel coding standards and the rest of the NVMe subsystem code. Add a helper function that processes the log when the command is successful, reducing the nesting and keeping the code under 80 characters per line.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
-
By Jean Delvare

I see no good reason for the "If unsure, say N" advice in the description of the NVME_HWMON configuration option. It is not dangerous, it does not select any other option, and it has a fairly low overhead. As the option is already not enabled by default, further suggesting that hesitant users not enable it is not useful anyway. Unlike some other options, where the description alone may not be sufficient for users to make a decision, NVME_HWMON is pretty simple to grasp in my opinion, so just let the user do what they want.

Signed-off-by: Jean Delvare <jdelvare@suse.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
-
By Sagi Grimberg

We allow userspace to connect with a custom hostid, which is useful for certain use cases. However, there is no way to tell what hostid was used to connect to a given controller. Expose this so userspace can correlate controllers based on hostid.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
-
By Sagi Grimberg

We allow userspace to connect with a custom hostnqn, which is useful for certain use cases. However, there is no way to tell what hostnqn was used to connect to a given controller. Expose this so userspace can correlate controllers based on hostnqn.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
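A sketch of how such a read-only controller attribute is typically exposed from the core's sysfs group; the attribute wiring and the field access path are assumptions, not the exact patch.

	/* Sketch: read-only sysfs attribute exposing the hostnqn used to connect. */
	static ssize_t nvme_sysfs_show_hostnqn(struct device *dev,
					       struct device_attribute *attr,
					       char *buf)
	{
		struct nvme_ctrl *ctrl = dev_get_drvdata(dev);

		/* Fabrics controllers carry connect options; PCI controllers
		 * would need this attribute hidden or handled separately. */
		return snprintf(buf, PAGE_SIZE, "%s\n", ctrl->opts->host->nqn);
	}
	static DEVICE_ATTR(hostnqn, S_IRUGO, nvme_sysfs_show_hostnqn, NULL);

With the attribute added to the controller's sysfs group, userspace can read it with something like: cat /sys/class/nvme/nvme0/hostnqn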
-
- 05 March 2020, 2 commits
-
-
By Wunderlich, Mark

Enable the ability to associate all sockets related to NVMf TCP traffic with a priority group that will perform optimized network processing for this traffic class. Maintain the initial default behavior of using a priority of zero.

Signed-off-by: Kiran Patil <kiran.patil@intel.com>
Signed-off-by: Mark Wunderlich <mark.wunderlich@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
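A sketch of the mechanism: a module parameter whose value is written into each queue's socket priority (sk_priority) when the queue is set up. The parameter name and the exact hook point are assumptions.

	/* Sketch: let the admin pick a socket priority for all nvme-tcp sockets. */
	static int so_priority;
	module_param(so_priority, int, 0644);
	MODULE_PARM_DESC(so_priority, "socket priority (SO_PRIORITY) for nvme-tcp queues");

	static void nvme_tcp_set_sock_priority(struct nvme_tcp_queue *queue)
	{
		/* 0 keeps the default behavior; anything else tags the socket so
		 * qdiscs / NICs can give this traffic class optimized processing. */
		if (so_priority > 0)
			queue->sock->sk->sk_priority = so_priority;
	}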
-
By Edmund Nadolski

The return code of nvme_alloc_ns is never used, so change it to void.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Edmund Nadolski <edmund.nadolski@intel.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
-
- 28 February 2020, 1 commit
-
-
By Bijan Mottahedeh

Completions need to be consumed in the same order the controller submitted them, otherwise future completion entries may overwrite ones we haven't handled yet. Hold the nvme queue's poll lock while completing new CQEs to prevent another thread from freeing command tags for reuse out of order.

Fixes: dabcefab ("nvme: provide optimized poll function for separate poll queues")
Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Keith Busch <kbusch@kernel.org>
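A sketch of the locking change in the poll path; the two-pass process/complete helpers and their signatures are assumed from the pci driver of that era, so read this as an illustration of "complete inside the lock" rather than the literal diff.

	static int nvme_poll_sketch(struct blk_mq_hw_ctx *hctx)
	{
		struct nvme_queue *nvmeq = hctx->driver_data;
		u16 start, end;
		int found;

		if (!nvme_cqe_pending(nvmeq))
			return 0;

		spin_lock(&nvmeq->cq_poll_lock);
		found = nvme_process_cq(nvmeq, &start, &end, -1);
		/*
		 * Completing the CQEs (which frees tags for reuse) now happens
		 * under cq_poll_lock as well, so another poller cannot see tags
		 * recycled while earlier entries are still unhandled.
		 */
		nvme_complete_cqes(nvmeq, start, end);
		spin_unlock(&nvmeq->cq_poll_lock);

		return found;
	}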
-
- 21 February 2020, 1 commit
-
-
By Logan Gunthorpe

kmemleak reports a memory leak with the ana_log_buf allocated by nvme_mpath_init():

unreferenced object 0xffff888120e94000 (size 8208):
  comm "nvme", pid 6884, jiffies 4295020435 (age 78786.312s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00  ................
    01 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<00000000e2360188>] kmalloc_order+0x97/0xc0
    [<0000000079b18dd4>] kmalloc_order_trace+0x24/0x100
    [<00000000f50c0406>] __kmalloc+0x24c/0x2d0
    [<00000000f31a10b9>] nvme_mpath_init+0x23c/0x2b0
    [<000000005802589e>] nvme_init_identify+0x75f/0x1600
    [<0000000058ef911b>] nvme_loop_configure_admin_queue+0x26d/0x280
    [<00000000673774b9>] nvme_loop_create_ctrl+0x2a7/0x710
    [<00000000f1c7a233>] nvmf_dev_write+0xc66/0x10b9
    [<000000004199f8d0>] __vfs_write+0x50/0xa0
    [<0000000065466fef>] vfs_write+0xf3/0x280
    [<00000000b0db9a8b>] ksys_write+0xc6/0x160
    [<0000000082156b91>] __x64_sys_write+0x43/0x50
    [<00000000c34fbb6d>] do_syscall_64+0x77/0x2f0
    [<00000000bbc574c9>] entry_SYSCALL_64_after_hwframe+0x49/0xbe

nvme_mpath_init() is called by nvme_init_identify(), which is called in multiple places (nvme_reset_work(), nvme_passthru_end(), etc.). This means nvme_mpath_init() may be called multiple times before nvme_mpath_uninit(), which is only called from nvme_free_ctrl(). When nvme_mpath_init() is called multiple times, it overwrites the ana_log_buf pointer with a new allocation, thus leaking the previous allocation. To fix this, free ana_log_buf before allocating a new one.

Fixes: 0d0b660f ("nvme: add ANA support")
Cc: <stable@vger.kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
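The fix amounts to a free-before-(re)allocate in nvme_mpath_init(); a minimal sketch, with the surrounding error handling assumed:

	/*
	 * nvme_mpath_init() can run more than once per controller (every
	 * nvme_init_identify() call); drop any buffer from a previous pass
	 * before allocating a fresh one, so the old allocation is not leaked.
	 */
	kfree(ctrl->ana_log_buf);
	ctrl->ana_log_buf = kmalloc(ctrl->ana_log_size, GFP_KERNEL);
	if (!ctrl->ana_log_buf) {
		error = -ENOMEM;
		goto out;		/* error label assumed */
	}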
-
- 20 February 2020, 1 commit
-
-
By Keith Busch

gcc may report a false-positive use of an uninitialized variable in nvme if setting features fails. Since this is not a fast path, explicitly initialize the variable to suppress the warning.

Reported-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
-
- 19 February 2020, 2 commits
-
-
By Andy Shevchenko

People reported that old Apple machines do not work properly if a non-first IRQ vector is in use. Set a quirk for those models to limit them to the first vector only. Based on an original patch by GitHub user npx001.

Link: https://github.com/Dunedan/mbp-2016-linux/issues/9
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Leif Liddy <leif.liddy@gmail.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
-
By Shyjumon N

The Samsung SSD SM981/PM981 and the Toshiba SSD KBG40ZNT256G on the Lenovo C640 platform experience runtime resume issues when the SSDs are kept in sleep/suspend mode for a long time. This patch applies the 'Simple Suspend' quirk to these configurations. With this patch, the issue was not observed in a 1+ day test.

Reviewed-by: Jon Derrick <jonathan.derrick@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Shyjumon N <shyjumon.n@intel.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
-
- 15 February 2020, 4 commits
-
-
By Yi Zhang

The nvme fw-activate operation produces the warning log below; fix it by correcting the parameter order.

[ 113.231513] nvme nvme0: Get FW SLOT INFO log error

Fixes: 0e98719b ("nvme: simplify the API for getting log pages")
Reported-by: Sujith Pandel <sujith_pandel@dell.com>
Reviewed-by: David Milburn <dmilburn@redhat.com>
Signed-off-by: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
By Keith Busch

Many users have reported nvme-triggered irq_startup() warnings during shutdown. The driver uses the nvme queue's irq to synchronize scanning for completions, and enabling an interrupt affined only to offline CPUs triggers the alarming warning. Move the final CQE check to after disabling the device, once all registered interrupts have been torn down, so that we do not have any IRQ to synchronize.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=206509
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
By Nigel Kirkland

Delayed keep-alive work is queued on the system workqueue and may be cancelled via nvme_stop_keep_alive from nvme_reset_wq, nvme_fc_wq or nvme_wq. check_flush_dependency detects mismatched attributes between the workqueue context used to cancel the keep-alive work and the system workqueue. Specifically, the system workqueue does not have the WQ_MEM_RECLAIM flag, whereas the contexts used to cancel the keep-alive work do. Example warning:

workqueue: WQ_MEM_RECLAIM nvme-reset-wq:nvme_fc_reset_ctrl_work [nvme_fc] is flushing !WQ_MEM_RECLAIM events:nvme_keep_alive_work [nvme_core]

To avoid the flags mismatch, delayed keep-alive work is now queued on nvme_wq. However, this creates a secondary concern where work and a request to cancel that work may be in the same workqueue - namely err_work in the rdma and tcp transports, which will want to flush/cancel the keep-alive work that now lives on nvme_wq. After reviewing the transports, it looks like err_work can be moved to nvme_reset_wq. In fact, that aligns them better with the transition into RESETTING and performing related reset work in nvme_reset_wq. Change nvme-rdma and nvme-tcp to perform err_work on nvme_reset_wq.

Signed-off-by: Nigel Kirkland <nigel.kirkland@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
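A minimal sketch of the two re-queues described above (keep-alive onto nvme_wq, transport error handling onto nvme_reset_wq); field names follow the nvme core/transport structs but should be read as illustrative.

	/* Keep-alive: schedule on nvme_wq instead of the system workqueue, so
	 * the WQ_MEM_RECLAIM workqueues that later cancel it match its
	 * attributes. */
	static void nvme_queue_keep_alive_work(struct nvme_ctrl *ctrl)
	{
		queue_delayed_work(nvme_wq, &ctrl->ka_work, ctrl->kato * HZ);
	}

	/* Transport error recovery: kick err_work on nvme_reset_wq so it can
	 * safely flush/cancel the keep-alive work now living on nvme_wq. */
	static void nvme_tcp_error_recovery(struct nvme_ctrl *ctrl)
	{
		if (!nvme_change_ctrl_state(ctrl, NVME_CTRL_RESETTING))
			return;
		queue_work(nvme_reset_wq, &to_tcp_ctrl(ctrl)->err_work);
	}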
-
By Anton Eidelman

When nvme_tcp_io_work() fails to send to the socket due to a connection close/reset, error_recovery work is triggered from the nvme_tcp_state_change() socket callback. This cancels all the active requests in the tagset, which requeues them. The failed request, however, was ended and thus requeued individually as well, unless send returned -EPIPE. Another return code that should be treated the same way is -ECONNRESET. The double requeue caused BUG_ON(blk_queued_rq(rq)) in blk_mq_requeue_request(), from either the individual requeue of the failed request or the bulk requeue from blk_mq_tagset_busy_iter(, nvme_cancel_request, );

Signed-off-by: Anton Eidelman <anton@lightbitslabs.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 04 February 2020, 1 commit
-
-
By Christoph Hellwig

There is no real need to have a pointer to the tagset in struct nvme_queue, as we only need it in a single place, and that place can derive the used tagset from the device and qid trivially. This fixes a problem with stale pointer exposure when tagsets are reset, and also shrinks the nvme_queue structure. It also matches what most other transports have done since day 1.

Reported-by: Edmund Nadolski <edmund.nadolski@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
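A sketch of the "derive from device and qid" lookup mentioned above; the helper name and the tags[] indexing are assumptions based on how the pci driver splits its admin and I/O tagsets.

	/* Sketch: qid 0 is the admin queue, everything else maps into the
	 * shared I/O tagset, so no per-queue tagset pointer is needed. */
	static inline struct blk_mq_tags *nvme_queue_tags_sketch(struct nvme_queue *nvmeq)
	{
		if (!nvmeq->qid)
			return nvmeq->dev->admin_tagset.tags[0];
		return nvmeq->dev->tagset.tags[nvmeq->qid - 1];
	}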
-