- 02 Feb 2021, 1 commit
-
-
Submitted by Thorsten Leemhuis

Some Kingston A2000 NVMe SSDs sooner or later get confused and stop working when they use the deepest APST sleep state while running Linux. The system then crashes and one has to cold boot it to get the SSD working again. Kingston seems to have known about this since at least mid-September 2020:
https://bbs.archlinux.org/viewtopic.php?pid=1926994#p1926994

Someone working for a German company representing Kingston to the German press confirmed to me that Kingston engineering is aware of the issue and investigating; the person stated that to their current knowledge only the deepest APST sleep state causes trouble. Therefore, make Linux avoid it for now by applying the NVME_QUIRK_NO_DEEPEST_PS quirk to this SSD.

I have two such SSDs, but it seems the problem doesn't occur with them. I hence couldn't verify if this patch really fixes the problem, but all the data in front of me suggests it should. This patch can easily be reverted or improved upon if a better solution surfaces.

FWIW, there are many reports about the issue scattered around the web; most of the users disabled APST completely to make things work, some just made Linux avoid the deepest sleep state:
https://bugzilla.kernel.org/show_bug.cgi?id=195039#c65
https://bugzilla.kernel.org/show_bug.cgi?id=195039#c73
https://bugzilla.kernel.org/show_bug.cgi?id=195039#c74
https://bugzilla.kernel.org/show_bug.cgi?id=195039#c78
https://bugzilla.kernel.org/show_bug.cgi?id=195039#c79
https://bugzilla.kernel.org/show_bug.cgi?id=195039#c80
https://askubuntu.com/questions/1222049/nvmekingston-a2000-sometimes-stops-giving-response-in-ubuntu-18-04dell-inspir
https://community.acer.com/en/discussion/604326/m-2-nvme-ssd-aspire-517-51g-issue-compatibility-kingston-a2000-linux-ubuntu

For the record, some data from 'nvme id-ctrl /dev/nvme0':

NVME Identify Controller:
vid   : 0x2646
ssvid : 0x2646
mn    : KINGSTON SA2000M81000G
fr    : S5Z42105
[...]
ps 0 : mp:9.00W operational enlat:0 exlat:0 rrt:0 rrl:0 rwt:0 rwl:0 idle_power:- active_power:-
ps 1 : mp:4.60W operational enlat:0 exlat:0 rrt:1 rrl:1 rwt:1 rwl:1 idle_power:- active_power:-
ps 2 : mp:3.80W operational enlat:0 exlat:0 rrt:2 rrl:2 rwt:2 rwl:2 idle_power:- active_power:-
ps 3 : mp:0.0450W non-operational enlat:2000 exlat:2000 rrt:3 rrl:3 rwt:3 rwl:3 idle_power:- active_power:-
ps 4 : mp:0.0040W non-operational enlat:15000 exlat:15000 rrt:4 rrl:4 rwt:4 rwl:4 idle_power:- active_power:-

Cc: stable@vger.kernel.org # 4.14+
Signed-off-by: Thorsten Leemhuis <linux@leemhuis.info>
Signed-off-by: Christoph Hellwig <hch@lst.de>
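As a minimal sketch, this is what such a device-quirk entry typically looks like in the NVMe PCI driver's ID table; the Kingston vendor ID 0x2646 comes from the id-ctrl dump above, while the PCI device ID shown is an assumed placeholder, not taken from the patch itself.

```c
/*
 * Sketch of an nvme_id_table-style quirk entry (drivers/nvme/host/pci.c).
 * Vendor ID 0x2646 is from the id-ctrl output above; device ID 0x2263 is
 * an assumed placeholder for illustration only.
 */
{ PCI_DEVICE(0x2646, 0x2263),		/* Kingston A2000 */
	.driver_data = NVME_QUIRK_NO_DEEPEST_PS, },
```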
-
- 29 Jan 2021, 3 commits
-
-
Submitted by Chao Leng

The "list" of nvme_ns_head is used as an RCU list, but nvme_init_ns_head currently uses list_add_tail() to add ns->siblings to it. That is not safe for lockless readers; list_add_tail_rcu() should be used instead of list_add_tail().

Signed-off-by: Chao Leng <lengchao@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
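A minimal sketch of the fix described above, using the field names from the commit text (ns->siblings, the head's list): entries added to a list that readers traverse under RCU must be published with the _rcu variant so the pointer update is ordered against the entry's initialization.

```c
/* Publish the new namespace on the RCU-protected sibling list so that
 * lockless readers observe a fully initialized entry. */
list_add_tail_rcu(&ns->siblings, &head->list);	/* was: list_add_tail() */
```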
-
Submitted by Daniel Wagner

nvme_round_robin_path() should test whether the returned ns pointer is valid: nvme_next_ns() will return a NULL pointer if there is no path left.

Fixes: 75c10e73 ("nvme-multipath: round-robin I/O policy")
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
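A hedged sketch of the check being added; the surrounding round-robin loop in nvme_round_robin_path() is simplified and the exact control flow may differ.

```c
ns = nvme_next_ns(head, old);
if (!ns)
	return NULL;	/* no usable path left; don't dereference a NULL ns */
```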
-
Submitted by Chaitanya Kulkarni

This adds a quirk for the SPCC 256GB NVMe 1.3 drive which fixes timeouts and I/O errors caused by the controller not properly handling the Write Zeroes command:

[ 2745.659527] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G E 5.10.6-BET #1
[ 2745.659528] Hardware name: System manufacturer System Product Name/PRIME X570-P, BIOS 3001 12/04/2020
[ 2776.138874] nvme nvme1: I/O 414 QID 3 timeout, aborting
[ 2776.138886] nvme nvme1: I/O 415 QID 3 timeout, aborting
[ 2776.138891] nvme nvme1: I/O 416 QID 3 timeout, aborting
[ 2776.138895] nvme nvme1: I/O 417 QID 3 timeout, aborting
[ 2776.138912] nvme nvme1: Abort status: 0x0
[ 2776.138921] nvme nvme1: I/O 428 QID 3 timeout, aborting
[ 2776.138922] nvme nvme1: Abort status: 0x0
[ 2776.138925] nvme nvme1: Abort status: 0x0
[ 2776.138974] nvme nvme1: Abort status: 0x0
[ 2776.138977] nvme nvme1: Abort status: 0x0
[ 2806.346792] nvme nvme1: I/O 414 QID 3 timeout, reset controller
[ 2806.363566] nvme nvme1: 15/0/0 default/read/poll queues
[ 2836.554298] nvme nvme1: I/O 415 QID 3 timeout, disable controller
[ 2836.672064] blk_update_request: I/O error, dev nvme1n1, sector 16350 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
[ 2836.672072] blk_update_request: I/O error, dev nvme1n1, sector 16093 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
[ 2836.672074] blk_update_request: I/O error, dev nvme1n1, sector 15836 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
[ 2836.672076] blk_update_request: I/O error, dev nvme1n1, sector 15579 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
[ 2836.672078] blk_update_request: I/O error, dev nvme1n1, sector 15322 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
[ 2836.672080] blk_update_request: I/O error, dev nvme1n1, sector 15065 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
[ 2836.672082] blk_update_request: I/O error, dev nvme1n1, sector 14808 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
[ 2836.672083] blk_update_request: I/O error, dev nvme1n1, sector 14551 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
[ 2836.672085] blk_update_request: I/O error, dev nvme1n1, sector 14294 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
[ 2836.672087] blk_update_request: I/O error, dev nvme1n1, sector 14037 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
[ 2836.672121] nvme nvme1: failed to mark controller live state
[ 2836.672123] nvme nvme1: Removing after probe failure status: -19
[ 2836.689016] Aborting journal on device dm-0-8.
[ 2836.689024] Buffer I/O error on dev dm-0, logical block 25198592, lost sync page write
[ 2836.689027] JBD2: Error -5 detected when updating journal superblock for dm-0-8.

Reported-by: Bradley Chapman <chapman6235@comcast.net>
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Tested-by: Bradley Chapman <chapman6235@comcast.net>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
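A hedged sketch of a core-quirk entry of the kind used to disable Write Zeroes for a device matched by vendor ID and model string; the .vid and .mn values below are assumed placeholders, not values from the patch.

```c
/*
 * Sketch of a struct nvme_core_quirk_entry for this kind of fix;
 * the vendor ID and model string are assumptions for illustration.
 */
{
	.vid	= 0x1d97,			/* assumed vendor ID */
	.mn	= "SPCC 256GB NVMe SSD",	/* assumed model string */
	.quirks	= NVME_QUIRK_DISABLE_WRITE_ZEROES,
},
```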
-
- 21 Jan 2021, 2 commits
-
-
Submitted by Christoph Hellwig

Properly unwind step by step using the refactored helpers from nvme_unmap_data to avoid a potential double dma_unmap on a mapping failure.

Fixes: 7fe07d14 ("nvme-pci: merge nvme_free_iod into nvme_unmap_data")
Reported-by: Marc Orr <marcorr@google.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Marc Orr <marcorr@google.com>
-
Submitted by Christoph Hellwig

Split out three helpers from nvme_unmap_data that will allow finer grained unwinding from nvme_map_data.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Marc Orr <marcorr@google.com>
-
- 19 Jan 2021, 4 commits
-
-
Submitted by Klaus Jensen

Since NVMe v1.4 the Controller Memory Buffer must be explicitly enabled by the host.

Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
[hch: avoid a local variable and add a comment]
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Submitted by Chao Leng

Each namespace has a request queue. If requests take a long time to complete, multiple request queues may have timed-out requests at the same time, so nvme_tcp_timeout can execute concurrently. Requests from different request queues may be queued on the same TCP queue, so multiple nvme_tcp_timeout calls may invoke nvme_tcp_stop_queue at the same time. The first nvme_tcp_stop_queue clears NVME_TCP_Q_LIVE and continues stopping the TCP queue (canceling io_work), but the others see that NVME_TCP_Q_LIVE is already cleared and directly complete the requests. Completing requests before io_work is completely canceled may lead to a use-after-free condition. Add a mutex to serialize nvme_tcp_stop_queue.

Signed-off-by: Chao Leng <lengchao@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
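A hedged sketch of the serialization described above, assuming a per-queue mutex (the name queue_lock and the simplified signature are illustrative): only the caller that actually clears NVME_TCP_Q_LIVE tears the queue down, and concurrent callers block until that teardown has finished.

```c
static void nvme_tcp_stop_queue_sketch(struct nvme_tcp_queue *queue)
{
	mutex_lock(&queue->queue_lock);		/* assumed per-queue mutex */
	if (test_and_clear_bit(NVME_TCP_Q_LIVE, &queue->flags))
		__nvme_tcp_stop_queue(queue);	/* cancels io_work, closes socket */
	mutex_unlock(&queue->queue_lock);
}
```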
-
Submitted by Chao Leng

A crash happens when fault injection makes requests take a long time (nearly 30s) to complete. Each namespace has a request queue; when completion is delayed that long, multiple request queues may have timed-out requests at the same time, so nvme_rdma_timeout can execute concurrently. Requests from different request queues may be queued on the same RDMA queue, so multiple nvme_rdma_timeout calls may invoke nvme_rdma_stop_queue at the same time. The first nvme_rdma_timeout clears NVME_RDMA_Q_LIVE and continues stopping the RDMA queue (draining the qp), but the others see that NVME_RDMA_Q_LIVE is already cleared and directly complete the requests. Completing requests before the qp is fully drained may lead to a use-after-free condition. Add a mutex to serialize nvme_rdma_stop_queue.

Signed-off-by: Chao Leng <lengchao@huawei.com>
Tested-by: Israel Rukshin <israelr@nvidia.com>
Reviewed-by: Israel Rukshin <israelr@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Submitted by Revanth Rajashekar

According to NVMe spec v1.4, section 8.3.1, the PRINFO bit and the metadata size play a vital role in determining the host buffer size. If the PRINFO bit is set and MS==8, the host doesn't add the metadata buffer; instead the controller adds it.

Signed-off-by: Revanth Rajashekar <revanth.rajashekar@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
- 15 Jan 2021, 3 commits
-
-
Submitted by Sagi Grimberg

Discovery controllers usually don't support the SMART log page command, so when we connect to a discovery controller we see this warning:

nvme nvme0: Failed to read smart log (error 24577)
nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 192.168.123.1:8009
nvme nvme0: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"

Introduce a new helper to check whether the controller is a discovery controller and use it to skip nvme_init_hwmon (and also use it in the other places where we check whether the controller is a discovery controller).

Fixes: 400b6a7b ("nvme: Add hardware monitoring support")
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
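A hedged sketch of what such a helper could look like, assuming the fabrics connect options record whether the well-known discovery NQN was used; the field names are assumptions for illustration.

```c
static inline bool nvme_discovery_ctrl(struct nvme_ctrl *ctrl)
{
	/* fabrics connects to the well-known discovery NQN set this flag */
	return ctrl->opts && ctrl->opts->discovery_nqn;
}

/* ... later, skip hwmon setup for discovery controllers: */
if (!nvme_discovery_ctrl(ctrl))
	nvme_init_hwmon(ctrl);
```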
-
Submitted by Sagi Grimberg

When bios merge, we can get a request that spans multiple bios, and the overall request payload size is the sum of all bios. When we calculate how much we need to send from the existing bio (and bvec), we did not take into account the iov_iter byte count cap. Since multipage bvec support, a bvec can be split in the middle, which means that when we account for the last bvec send we should also take the iov_iter byte count cap into account, as it might be lower than the last bvec size.

Reported-by: Hao Wang <pkuwangh@gmail.com>
Fixes: 3f2304f8 ("nvme-tcp: add NVMe over TCP host driver")
Tested-by: Hao Wang <pkuwangh@gmail.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Submitted by Sagi Grimberg

We shouldn't call smp_processor_id() in a preemptible context, but the CPU number here is advisory at best, so instead call __smp_processor_id().

Fixes: db5ad6b7 ("nvme-tcp: try to send request in queue_rq context")
Reported-by: Or Gerlitz <gerlitz.or@gmail.com>
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
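A hedged sketch of the kind of call site involved (the surrounding condition and identifiers are simplified): since the CPU number is only used as an optimization hint for sending inline, the raw, non-preemption-checking accessor is sufficient.

```c
/* Send inline only if we are already running on the queue's CPU. */
if (queue->io_cpu == __smp_processor_id() &&	/* was: smp_processor_id() */
    sync && empty && mutex_trylock(&queue->send_mutex)) {
	/* ... send the request directly from queue_rq context ... */
	mutex_unlock(&queue->send_mutex);
} else {
	queue_work_on(queue->io_cpu, nvme_tcp_wq, &queue->io_work);
}
```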
-
- 06 Jan 2021, 6 commits
-
-
Submitted by Max Gurtovoy

The only used argument in this function is the "req".

Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Submitted by Minwoo Im

There are no callers of nvme_reset_ctrl_sync() and nvme_alloc_request_qid() that require the symbols to be exported. Unexport those functions, mark them static, and update the header file respectively.

Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Submitted by Lalithambika Krishna Kumar

While handling the completion queue, keep a local copy of the command id from the DMA-accessible completion entry. This silences a time-of-check to time-of-use (TOCTOU) warning from KF/x [1], with respect to a Thunderclap [2] vulnerability analysis. The double-read impact appears benign. There may be a theoretical window for @command_id to be used as an adversary-controlled array index value for mounting a speculative execution attack, but that mitigation is saved for a potential follow-on. A man-in-the-middle attack on the data payload is out of scope for this analysis and is hopefully mitigated by filesystem integrity mechanisms.

[1] https://github.com/intel/kernel-fuzzer-for-xen-project
[2] http://thunderclap.io/thunderclap-paper-ndss2019.pdf

Signed-off-by: Lalithambika Krishna Kumar <lalithambika.krishnakumar@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
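A hedged sketch of the pattern described above; the tagset helper name is illustrative. The key point is that the command id is read exactly once from the DMA-visible CQE into a local variable, and only the local copy is used afterwards.

```c
/* Single read of the DMA-visible entry: the device can no longer change the
 * id between the validity check and the tag-to-request lookup. */
u16 command_id = READ_ONCE(cqe->command_id);
struct request *req;

req = blk_mq_tag_to_rq(nvme_queue_tagset(nvmeq), command_id);	/* helper name illustrative */
if (unlikely(!req)) {
	dev_warn(nvmeq->dev->ctrl.device,
		 "invalid id %d completed\n", command_id);
	return;
}
```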
-
Submitted by Sagi Grimberg

We may send a request (with or without its data) from two paths:

1. From our I/O context, nvme_tcp_io_work, which is triggered from:
   - queue_rq
   - r2t reception
   - the socket data_ready and write_space callbacks
2. Directly from queue_rq if the send_list is empty (because we want to save the context switch associated with scheduling our io_work).

However, given that we now have the send_mutex, we may run into a race condition where none of these contexts will send the pending payload to the controller. Both the io_work send path and the queue_rq send path opportunistically attempt to acquire the send_mutex, however queue_rq only attempts to send a single request, and if the io_work context fails to acquire the send_mutex it will complete without rescheduling itself.

The race can trigger with the following sequence:

1. queue_rq sends a request (no in-capsule data) and blocks
2. the RX path receives an r2t, prepares a data PDU to send, adds the h2cdata PDU to the send_list and schedules io_work
3. io_work triggers and cannot acquire the send_mutex because of (1), and ends without rescheduling itself
4. queue_rq completes the send, and completes

==> no context will send the h2cdata, leading to a timeout.

Fix this by having queue_rq send as much as it can from the send_list, so that if anything is still left, it is because the socket buffer is full and the socket write_space callback will trigger, thus guaranteeing that a context will be scheduled to send the h2cdata PDU.

Fixes: db5ad6b7 ("nvme-tcp: try to send request in queue_rq context")
Reported-by: Potnuri Bharat Teja <bharat@chelsio.com>
Reported-by: Samuel Jones <sjones@kalrayinc.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Tested-by: Potnuri Bharat Teja <bharat@chelsio.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Submitted by Gopal Tiwari

A system with more than one of these SSDs will only have one usable, because the kernel fails to detect the other nvme devices due to duplicate cntlids:

[ 6.274554] nvme nvme1: Duplicate cntlid 33 with nvme0, rejecting
[ 6.274566] nvme nvme1: Removing after probe failure status: -22

Adding the NVME_QUIRK_IGNORE_DEV_SUBNQN quirk resolves the issue.

Signed-off-by: Gopal Tiwari <gtiwari@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Submitted by James Smart

Recent patches changed the calling sequences: nvme_fc_abort_outstanding_ios used to be called from a timeout or work context, but now it is being called in an io completion context, which can be an interrupt handler. Unfortunately, the abort-outstanding-ios routine attempts to stop nvme queues and calls nested routines that may try to sleep, which conflicts with the interrupt handler.

Correct this by replacing the direct call with scheduling of a work element; the abort-outstanding-ios routine is then called from the work element.

Fixes: 95ced8a2 ("nvme-fc: eliminate terminate_io use by nvme_fc_error_recovery")
Signed-off-by: James Smart <james.smart@broadcom.com>
Reported-by: Daniel Wagner <dwagner@suse.de>
Tested-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
- 05 Dec 2020, 1 commit
-
-
Submitted by Christoph Hellwig

The request_queue can trivially be derived from the bio.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 02 Dec 2020, 13 commits
-
-
Submitted by Javier González

Allow ZNS NVMe SSDs to present a read-only namespace when append is not supported, instead of rejecting the namespace directly. This allows (i) the namespace to be used in read-only mode, which is not a problem as the append command only affects the write path, and (ii) the use of standard management tools such as nvme-cli to choose a different format or firmware slot that is compatible with the Linux zoned block device.

Signed-off-by: Javier González <javier.gonz@samsung.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Submitted by Javier González

Rename the block device operations in preparation for adding char device file operations.

Signed-off-by: Javier González <javier.gonz@samsung.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Submitted by Javier González

Rename the controller base dev_t char device in preparation for adding a namespace char device.

Signed-off-by: Javier González <javier.gonz@samsung.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Submitted by Javier González

Clean up unnecessary ret values that are not checked or used in nvme_alloc_ns().

Signed-off-by: Javier González <javier.gonz@samsung.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Submitted by Minwoo Im

During scan_work, an Identify command is issued to figure out which namespaces are active. If this command fails, the nvme driver falls back to scanning namespaces sequentially. In this situation we don't see any warnings and can't even easily tell whether the list-ns command has failed or not. Print a warning when the Identify command execution fails:

[ 1.108399] nvme nvme0: Identify NS List failed (status=0x400b)
[ 1.109583] nvme0n1: detected capacity change from 0 to 1048576
[ 1.112186] nvme nvme0: Identify Descriptors failed (nsid=2, status=0x4002)
[ 1.113929] nvme nvme0: Identify Descriptors failed (nsid=3, status=0x4002)
[ 1.116537] nvme nvme0: Identify Descriptors failed (nsid=4, status=0x4002)
...

Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Submitted by Minwoo Im

Add the namespace ID to the error message when the Identify command used to retrieve the Namespace Identification Descriptor list fails. This avoids rather useless and duplicative messages like the following:

[ 1.321031] nvme nvme0: Identify Descriptors failed (16386)
[ 1.321948] nvme nvme0: Identify Descriptors failed (16386)
[ 1.322872] nvme nvme0: Identify Descriptors failed (16386)
[ 1.323775] nvme nvme0: Identify Descriptors failed (16386)
[ 1.324687] nvme nvme0: Identify Descriptors failed (16386)
...

Also, print the nvme status code in hexadecimal rather than decimal format for better readability.

Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
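A hedged sketch of the resulting message format (the wording is taken from the sample log lines above); nsid and status are assumed local variables of the surrounding function.

```c
dev_warn(ctrl->device,
	 "Identify Descriptors failed (nsid=%u, status=0x%x)\n",
	 nsid, status);
```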
-
Submitted by Victor Gladkov

Commands get stuck while the host NVMe-oF controller is in the reconnect state. The controller enters the reconnect state when it loses the connection with the target. It tries to reconnect every 10 seconds (default) until a successful reconnect or until the reconnect timeout is reached. The default reconnect timeout is 10 minutes.

Applications expect commands to complete with success or error within a certain timeout (30 seconds by default). The NVMe host enforces that timeout while it is connected, but during reconnect the timeout is not enforced and commands may get stuck for a long period or even forever.

To fix this long delay due to the default timeout, introduce a new "fast_io_fail_tmo" session parameter. The timeout is measured in seconds from the controller reconnect, and any command beyond that timeout is rejected. The new parameter value may be passed during 'connect'. The default value of -1 means no timeout (similar to current behavior).

Signed-off-by: Victor Gladkov <victor.gladkov@kioxia.com>
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chao Leng <lengchao@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Submitted by Niklas Schnelle

Currently the NVME_QUIRK_SHARED_TAGS quirk for Apple devices is handled during the assignment of nr_io_queues in nvme_setup_io_queues(). This however means that for these devices nvme_max_io_queues() will not actually return the supported maximum, which is confusing and unexpected, and also means that in nvme_probe() we are allocating for I/O queues that will never be used. Fix this by moving the quirk handling into nvme_max_io_queues().

Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Submitted by Niklas Schnelle

In nvme_setup_io_queues() the number of I/O queues is set to either 1 in case of a quirky Apple device or to the minimum of nvme_max_io_queues() and dev->nr_allocated_queues - 1. This is unnecessarily complicated, as dev->nr_allocated_queues is only assigned once and is nvme_max_io_queues() + 1.

Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Submitted by Chaitanya Kulkarni

Right now nvme_alloc_request() allocates a request from the block layer based on the value of the qid: when qid is set to NVME_QID_ANY it uses blk_mq_alloc_request(), otherwise blk_mq_alloc_request_hctx(). nvme_alloc_request() is called from different contexts, and the only place where it uses a non-NVME_QID_ANY value is for the fabrics connect commands:

nvme_submit_sync_cmd()          NVME_QID_ANY
nvme_features()                 NVME_QID_ANY
nvme_sec_submit()               NVME_QID_ANY
nvmf_reg_read32()               NVME_QID_ANY
nvmf_reg_read64()               NVME_QID_ANY
nvmf_reg_write32()              NVME_QID_ANY
nvmf_connect_admin_queue()      NVME_QID_ANY
nvme_submit_user_cmd()          NVME_QID_ANY    nvme_alloc_request()
nvme_keep_alive()               NVME_QID_ANY    nvme_alloc_request()
nvme_timeout()                  NVME_QID_ANY    nvme_alloc_request()
nvme_delete_queue()             NVME_QID_ANY    nvme_alloc_request()
nvmet_passthru_execute_cmd()    NVME_QID_ANY    nvme_alloc_request()
nvmf_connect_io_queue()         QID             __nvme_submit_sync_cmd()  nvme_alloc_request()

With passthru, nvme_alloc_request() now falls into the I/O fast path, so blk_mq_alloc_request_hctx() never gets called there, yet the qid check adds an additional branch in the fast path.

Split nvme_alloc_request() into nvme_alloc_request() and nvme_alloc_request_qid(). Replace each call of nvme_alloc_request() with the NVME_QID_ANY parameter by a call to the new nvme_alloc_request() without NVME_QID_ANY. Replace the call with a QID parameter by a call to either nvme_alloc_request() or nvme_alloc_request_qid(), based on the qid value set in __nvme_submit_sync_cmd().

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
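A hedged sketch of the resulting pair of interfaces; the parameter lists are abbreviated and may not match the tree exactly.

```c
/* Fast path: no qid, always allocates via blk_mq_alloc_request(). */
struct request *nvme_alloc_request(struct request_queue *q,
		struct nvme_command *cmd, blk_mq_req_flags_t flags);

/* Fabrics connect path: targets a specific hw queue via blk_mq_alloc_request_hctx(). */
struct request *nvme_alloc_request_qid(struct request_queue *q,
		struct nvme_command *cmd, blk_mq_req_flags_t flags, int qid);
```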
-
Submitted by Chaitanya Kulkarni

This is purely a cleanup patch: add the NVME prefix to ADMIN_TIMEOUT to make it consistent with NVME_IO_TIMEOUT.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Submitted by Chaitanya Kulkarni

The function nvme_alloc_request() is called from different contexts (I/O and admin queue), and callers do not pass the I/O timeout when calling from the I/O queue context. Update nvme_alloc_request() to set the default I/O or admin timeout value based on whether queuedata is set or not.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Submitted by Baolin Wang

Use the request's '->mq_hctx->queue_num' directly to simplify the nvme_req_qid() function.

Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
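A hedged sketch of the simplified helper, assuming the admin queue (which has no queuedata) still maps to qid 0.

```c
static inline u16 nvme_req_qid(struct request *req)
{
	if (!req->q->queuedata)		/* admin queue has no queuedata */
		return 0;

	return req->mq_hctx->queue_num + 1;	/* I/O queues are 1-based */
}
```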
-
- 16 Nov 2020, 3 commits
-
-
Submitted by Christoph Hellwig

Use the block layer helper to update both the disk and block device sizes. Contrary to the name, no notification is sent in this case, as a size of 0 is special-cased.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
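A hedged sketch of the call shape; the disk pointer and size expressions are illustrative, not taken from the patch.

```c
/* Normal resize: update gendisk + bdev size and emit a resize uevent. */
set_capacity_and_notify(ns->disk, nvme_lba_to_sect(ns, le64_to_cpu(id->nsze)));

/* Tear-down/removal path: a new size of 0 is special-cased, no uevent is sent. */
set_capacity_and_notify(ns->disk, 0);
```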
-
Submitted by Christoph Hellwig

The update_bdev argument is always set to true, so remove it. Also rename the function to the slightly less verbose set_capacity_and_notify, as propagating the disk size to the block device isn't really revalidation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Petr Vorel <pvorel@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Christoph Hellwig

There is no good reason to call revalidate_disk_size separately.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 14 Nov 2020, 3 commits
-
-
Submitted by Keith Busch

xa_destroy() frees only the xarray's internal data. The caller is responsible for freeing the external objects referenced by the xarray.

Fixes: 1cf7a12e ("nvme: use an xarray to lookup the Commands Supported and Effects log")
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
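A hedged sketch of the required teardown pattern, assuming the effects logs are stored in an xarray named cels on the controller (treat the field name as illustrative): free the externally allocated entries first, then release the xarray's internal nodes.

```c
struct nvme_effects_log *cel;
unsigned long i;

/* Free the externally allocated entries first ... */
xa_for_each(&ctrl->cels, i, cel)
	kfree(cel);
/* ... then release the xarray's internal nodes. */
xa_destroy(&ctrl->cels);
```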
-
Submitted by Keith Busch

Remove the struct used for tracking known command effects logs in a list. This is now saved in an xarray that doesn't use these elements. Instead, store the log directly instead of the wrapper struct.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Submitted by Minwoo Im

If the Doorbell Buffer Config command fails even though 'dev->dbbuf_dbs != NULL', which means OACS indicates that NVME_CTRL_OACS_DBBUF_SUPP is set, nvme_dbbuf_update_and_check_event() will still check the event even though the doorbell buffers have not been successfully configured. This patch fixes the mismatch among the dbbuf for sq/cqs in case the dbbuf command fails.

Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
- 13 Nov 2020, 1 commit
-
-
Submitted by Christoph Hellwig

->dma_device is a private implementation detail of the RDMA core. Use the ibdev_to_node helper to get the NUMA node for an ib_device instead of poking into ->dma_device.

Link: https://lore.kernel.org/r/20201106181941.1878556-5-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
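A hedged sketch of the substitution as it would appear in an RDMA consumer such as nvme-rdma; the variable names are illustrative.

```c
/* Ask the RDMA core for the device's NUMA node instead of reaching into
 * the private ->dma_device member. */
ctrl->ctrl.numa_node = ibdev_to_node(ctrl->device->dev);
/* was: dev_to_node(ctrl->device->dev->dma_device) */
```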
-