Commits · b21ffd2ee27a1dc25b506715722d21593e6b8bd7 · openanolis / cloud-kernel

02 Sep, 2020 31 commits

alinux: nvme: pci: Fix the incorrect ways to calculate the request size · b21ffd2e

Authored by Baolin Wang on Aug 31, 2020

fix #29375191

An NVMe discard request uses special_vec to describe its payload, so
blk_rq_bytes() returns an incorrect request size when handling an NVMe
discard request.

Use blk_rq_payload_bytes() to calculate the data transfer size instead,
which fixes this issue.

Fixes: 220741e8c12d ("alios: nvme-pci: Improve mapping single segment requests using SGLs")
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Reviewed-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
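Illustration for the commit above: a minimal user-space sketch, not the kernel code, of why the two size helpers can disagree for a discard. The `model_*` names and the struct layout are hypothetical. The sketch assumes only what the message states, namely that a discard carries its payload in `special_vec` while the request's own length describes the range being discarded.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the kernel's struct request. */
struct model_request {
    unsigned int data_len;        /* logical size, e.g. the range being discarded */
    bool has_special_payload;     /* models a request that uses special_vec */
    unsigned int special_vec_len; /* models the length stored in special_vec */
};

/* Models blk_rq_bytes(): always reports the logical size. */
unsigned int model_rq_bytes(const struct model_request *rq)
{
    assert(rq != 0);
    return rq->data_len;
}

/* Models blk_rq_payload_bytes(): reports the bytes actually transferred. */
unsigned int model_rq_payload_bytes(const struct model_request *rq)
{
    assert(rq != 0);
    if (rq->has_special_payload)
        return rq->special_vec_len;
    return rq->data_len;
}
```

For a modeled discard covering 1 MiB whose payload is a single 16-byte range descriptor, `model_rq_bytes()` reports 1048576 while `model_rq_payload_bytes()` reports 16, which is the size the DMA mapping needs.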

alinux: nvme: pci: Use bio->bi_vcnt directly · 2f68309b

Authored by Baolin Wang on Jul 31, 2020

fix #29327388

Use bio->bi_vcnt directly to check whether a bio contains only one bvec
in PRP mode, which removes warnings for dm devices.
No functional changes.

Fixes: c8b92b847512 ("alios: nvme-pci: Improve mapping single segment requests using PRP")
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>

alinux: nvme-pci: hold cq_lock while completing CQEs · ba34628b

Authored by Xiaoguang Wang on Jul 28, 2020

fix #29535320

In __nvme_poll(), nvme_complete_cqes() should also be protected
by nvmeq->cq_lock.

Fixes: 0d326c85dba5 ("nvme: provide optimized poll function for separate poll queues")
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>

alinux: nvme-pci: Improve mapping single segment requests using SGLs · add0d5e4

Authored by Baolin Wang on Jul 15, 2020

fix #29327388

blk-mq does not yet support multi-page bvecs, which means each bvec
can contain only one page. Even when a bio has a single physical segment,
it can still contain multiple bvecs that are physically contiguous, so we
cannot use the length of one bvec to map the request. Instead, when a
request has a single physical segment, we should use the full length of
the request to map it.

If multi-page bvecs are supported in the future, this patch can be dropped.

Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>

alinux: nvme-pci: Improve mapping single segment requests using PRP · 79a76a7b

Authored by Baolin Wang on Jul 15, 2020

fix #29327388

blk-mq does not yet support multi-page bvecs, which means each bvec
can contain only one page. The simple PRP path therefore supports only
a bio with a single bvec, even when the request has just one physical
segment; otherwise it cannot map the whole request.

If multi-page bvecs are supported in the future, this patch can be dropped.

Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>

nvme-pci: fix psdt field for single segment sgls · c201c72d

Authored by Klaus Birkelund Jensen on Apr 30, 2019

fix #29327388

commit 049bf37262c61c99f45438910711b55054b24838 upstream

The shortcut for single segment SGL requests did not set the PSDT field
to mark the request as using SGLs.

Fixes: 297910571f08 ("nvme-pci: optimize mapping single segment requests using SGLs")
Signed-off-by: Klaus Birkelund Jensen <klaus.jensen@cnexlabs.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>

nvme-pci: Set the prp2 correctly when using more than 4k page · 92d3a23a

Authored by Kevin Hao on Oct 18, 2019

fix #29327388

commit a4f40484e7f1dff56bb9f286cc59ffa36e0259eb upstream

In the current code, nvme uses a fixed 4k PRP entry size, but if the
kernel uses a page size larger than 4k, we must consider that
bv_offset may be larger than dev->ctrl.page_size. Otherwise we may
miss setting prp2, and the command then cannot be executed
correctly.

Fixes: dff824b2aadb ("nvme-pci: optimize mapping of small single segment requests")
Cc: stable@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
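Illustration for the commit above: a user-space sketch, under stated assumptions, of how reducing the offset modulo the controller page size keeps the length covered by the first PRP entry meaningful when the kernel page is larger than the controller page. The function name and parameters are hypothetical; `dma_addr` is assumed to be the DMA address of the first data byte, and the controller page size is assumed to be a power of two.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true when a second PRP entry is needed for a single-bvec
 * transfer, and stores the address that entry should point at.
 */
bool model_needs_prp2(uint64_t dma_addr, uint32_t bv_offset, uint32_t bv_len,
                      uint32_t ctrl_page_size, uint64_t *prp2)
{
    assert(ctrl_page_size != 0 && (ctrl_page_size & (ctrl_page_size - 1)) == 0);

    /* Offset inside the controller page, even if bv_offset exceeds it. */
    uint32_t offset = bv_offset & (ctrl_page_size - 1);
    /* Bytes that the first PRP entry can cover. */
    uint32_t first_prp_len = ctrl_page_size - offset;

    if (bv_len <= first_prp_len)
        return false;

    *prp2 = dma_addr + first_prp_len;
    return true;
}
```

With a 4096-byte controller page, `bv_offset = 0x1800` and a 4096-byte transfer, the masked offset is 0x800, so the first entry covers 2048 bytes and a second entry is required. Subtracting the unmasked offset from the page size in unsigned arithmetic would instead wrap around to a huge value, and the second entry would be skipped.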

nvme-pci: tidy up nvme_map_data · f19a739a

Authored by Christoph Hellwig on Mar 05, 2019

fix #29327388

commit 70479b71bc80ae6f63c8d6644cc76dff99f79686 upstream

Remove two pointless local variables, remove ret assignment that is
never used, move the use_sgl initialization closer to where it is used.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>

nvme-pci: optimize mapping single segment requests using SGLs · d1348f8f

Authored by Christoph Hellwig on Mar 05, 2019

fix #29327388

commit 297910571f08f1d7e398793df6e606ebb375a3f1 upstream

If the controller supports SGLs we can take another short cut for single
segment request, given that we can always map those without another
indirection structure, and thus don't need to create a scatterlist
structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>

nvme-pci: optimize mapping of small single segment requests · d3cd5ddb

Authored by Christoph Hellwig on Mar 05, 2019

fix #29327388

commit dff824b2aadb7808f50ceb0927acaec5ad750ce7 upstream

If a request is single segment and fits into one or two PRP entries we
do not have to create a scatterlist for it, but can just map the bio_vec
directly.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>

nvme-pci: remove the inline scatterlist optimization · 2c30abde

Authored by Christoph Hellwig on Mar 05, 2019

fix #29327388

commit d43f1ccfad053dbefba1d15443cdc36ca60958f0 upstream

We'll have a better way to optimize for small I/O that doesn't
require it soon, so remove the existing inline_sg case to make that
optimization easier to implement.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>

nvme-pci: split metadata handling from nvme_map_data / nvme_unmap_data · 25fa89d4

Authored by Christoph Hellwig on Mar 03, 2019

fix #29327388

commit 4aedb705437f6f98b45f45c394e6803ca67abd33 upstream

This prepares for some bigger changes to the data mapping helpers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>

nvme-pci: do not build a scatterlist to map metadata · 4fbeeb5c

Authored by Christoph Hellwig on Mar 03, 2019

fix #29327388

commit 783b94bd9250478154904fa782d2cfc46336cdf6 upstream

We always have exactly one segment, so we can simply call dma_map_bvec.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>

nvme-pci: only call nvme_unmap_data for requests transferring data · 1a1efe1d

Authored by Christoph Hellwig on Mar 03, 2019

fix #29327388

commit b15c592de37ed9d71499a3b8a750d1b235fcba3d upstream

This mirrors how nvme_map_pci is called and will allow simplifying some
checks in nvme_unmap_pci later on.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>

nvme-pci: merge nvme_free_iod into nvme_unmap_data · 6eebc7c9

Authored by Christoph Hellwig on Mar 03, 2019

fix #29327388

commit 7fe07d14f71fabef642a478626248a9121e95b7b upstream

This means we now have a function that undoes everything nvme_map_data
does and we can simplify the error handling a bit.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>

nvme-pci: move the call to nvme_cleanup_cmd out of nvme_unmap_data · 51d93c1d

Authored by Christoph Hellwig on Mar 03, 2019

fix #29327388

commit 915f04c93db4e3a7388c8ad8ddfc28830e4cbce3 upstream

Cleaning up the command setup isn't related to unmapping data, and
disentangling them will simplify error handling a bit down the road.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>

nvme-pci: remove nvme_init_iod · 57835706

Authored by Christoph Hellwig on Mar 03, 2019

fix #29327388

commit 9b048119a153590b934ef49aae309b723587f527 upstream

nvme_init_iod should really be split into two parts: initialize a few
general iod fields, which can easily be done at the beginning of
nvme_queue_rq, and allocating the scatterlist if needed, which logically
belongs into nvme_map_data with the code making use of it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>

nvme-pci: remove unused nvme_iod member · d10e2314

Authored by Keith Busch on Mar 08, 2019

fix #29327388

commit 39f8e36401142d73e33a954ac4bdf844fb5de9ae upstream

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>

nvme-pci: Hold cq_poll_lock while completing CQEs · 26f3eab7

Authored by Bijan Mottahedeh on Feb 26, 2020

to #28991349

commit 9515743bfb39c61aaf3d4f3219a645c8d1fe9a0e upstream

Completions need to be consumed in the same order the controller submitted
them, otherwise future completion entries may overwrite ones we haven't
handled yet. Hold the nvme queue's poll lock while completing new CQEs to
prevent another thread from freeing command tags for reuse out-of-order.

Fixes: dabcefab45d3 ("nvme: provide optimized poll function for separate poll queues")
Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>

nvme-pci: don't share queue maps · 04f11e89

Authored by Christoph Hellwig on Dec 17, 2018

to #28991349

commit e5edd5f298fafda28284bafb8371e6f0b7681035 upstream

Now that the block layer checks if a queue map has any queues inside
it there is no more reason to duplicate the maps for the non-default
types.

Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>

nvme-pci: fix nvme_setup_irqs() · 22c50895

Authored by Ming Lei on Jan 03, 2019

to #28991349

commit c45b1fa2433c65e44bdf48f513cb37289f3116b9 upstream

When -ENOSPC is returned from pci_alloc_irq_vectors_affinity(),
we still try to allocate multiple irq vectors again, so the irq queues
actually cover the admin queue. But we don't take that into account, so
the number of allocated irq vectors may equal the sum of
io_queues[HCTX_TYPE_DEFAULT] and io_queues[HCTX_TYPE_READ]. This is
obviously wrong: it breaks nvme_pci_map_queues() and triggers a
warning from pci_irq_get_affinity().

IRQ queues should cover the admin queue; this patch makes this
point explicit in nvme_calc_io_queues().

We got several internal reports of boot failures on aarch64, so please
consider fixing this in v4.20.

Fixes: 6451fe73fa0f ("nvme: fix irq vs io_queue calculations")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Tested-by: fin4478 <fin4478@hotmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>

nvme: fix irq vs io_queue calculations · 1c5ffc03

Authored by Jens Axboe on Dec 09, 2018

to #28991349

commit 6451fe73fa0f542a49bfacd7205b88a597897f58 upstream

Guenter reported a boot hang issue on HPPA after we default to 0 poll
queues. We have two issues in the queue count calculations:

1) We don't separate the poll queues from the read/write queues. This is
   important, since the former doesn't need interrupts.
2) The adjust logic is broken.

Adjust the poll queue count before doing nvme_calc_io_queues(). The poll
queue count is only limited by the IO queue count we were able to get
from the controller, not failures in the IRQ allocation loop. This
leaves nvme_calc_io_queues() just adjusting the read/write queue map.

Reported-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>

block: move queues types to the block layer · cd87b6e5

Authored by Christoph Hellwig on Dec 02, 2018

to #28991349

commit e20ba6e1da029136ded295f33076483d65ddf50a upstream

Having another indirect call in the fast path doesn't really help
in our post-spectre world.  Also having too many queue types is just
going to create confusion, so I'd rather manage them centrally.

Note that the queue type naming and ordering changes a bit - the
first index now is the default queue for everything not explicitly
marked, the optional ones are read and poll queues.

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>

nvme: provide optimized poll function for separate poll queues · 0b4694bc

Authored by Jens Axboe on Nov 14, 2018

to #28991349

commit dabcefab45d36ecb5a22f16577bb0f298876a22d upstream

If we have separate poll queues, we know that they aren't using
interrupts. Hence we don't need to disable interrupts around
finding completions.

Provide a separate set of blk_mq_ops for such devices.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>

nvme: default to 0 poll queues · 05ee2b9b

Authored by Jens Axboe on Nov 19, 2018

to #28991349

commit a4668d9ba4be1ca9f4a39798ba3419fdfef0750d upstream

We need a better way of configuring this, and given that polling is
(still) a bit niche, let's default to using 0 poll queues. That way
we'll have the same read/write/poll behavior as 4.20, and users that
want to test/use polling are required to do manual configuration of the
number of poll queues.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>

nvme: fix handling of EINVAL on pci_alloc_irq_vectors_affinity() · b35f58df

Authored by Jens Axboe on Nov 15, 2018

to #28991349

commit db29eb059cdc571f9d75cec4a41b9884b3b8286a upstream

At least on SPARC, if MSI/MSI-X isn't supported, we get EINVAL if
we ask for more than one vector. This isn't covered by our ENOSPC
check.

If we get EINVAL, decrease our ask to just one vector, instead of
bailing out in error.

Fixes: 3b6592f70ad7 ("nvme: utilize two queue maps, one for reads and one for writes")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>

nvme: fix boot hang with only being able to get one IRQ vector · e84fc182

Authored by Jens Axboe on Nov 14, 2018

to #28991349

commit 30e066286e232772cad72c87008a958e23e40a33 upstream

NVMe always asks for io_queues + 1 worth of IRQ vectors, which
means that even when we scale all the way down, we still ask
for 2 vectors and get -ENOSPC in return if the system can't
support more than 1.

Getting just 1 vector is fine, it just means that we'll have
1 IO queue and 1 admin queue, with a shared vector between them.
Check for this case and don't add our + 1 if it happens.

Fixes: 3b6592f70ad7 ("nvme: utilize two queue maps, one for reads and one for writes")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
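Illustration for the commit above and the EINVAL fix listed just before it: a user-space sketch of the two fallback rules the messages describe, shrinking the request on -ENOSPC and dropping to a single shared vector when only one is available or when -EINVAL signals that multiple vectors are unsupported. Both functions and the platform model are hypothetical; this is not the driver's allocation loop and it does not call any PCI API.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical platform: grants at most platform_max vectors, and may
 * reject any request for more than one vector outright. */
int model_alloc_vectors(unsigned int ask, unsigned int platform_max,
                        bool multi_vector_ok)
{
    assert(ask >= 1);
    if (ask > 1 && !multi_vector_ok)
        return -EINVAL;
    if (ask > platform_max)
        return -ENOSPC;
    return (int)ask;
}

/* Returns the number of vectors obtained, or a negative errno. */
int model_negotiate_vectors(unsigned int io_queues, unsigned int platform_max,
                            bool multi_vector_ok)
{
    bool single = false; /* true once we settle for one shared vector */

    assert(io_queues >= 1);
    for (;;) {
        /* Normally one vector per IO queue plus one for the admin queue. */
        unsigned int ask = single ? 1 : io_queues + 1;
        int ret = model_alloc_vectors(ask, platform_max, multi_vector_ok);

        if (ret > 0 || single)
            return ret;
        if (ret == -EINVAL || io_queues == 1)
            single = true;   /* share one vector between admin and IO */
        else
            io_queues--;     /* -ENOSPC: ask for less and retry */
    }
}
```

In this model a platform limited to one vector ends with a single shared vector instead of failing with -ENOSPC, which is the outcome the commit message describes as acceptable.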

nvme: add separate poll queue map · b244a53e

Authored by Jens Axboe on Nov 05, 2018

to #28991349

commit 4b04cc6a8f86c4842314def22332de1f15de8523 upstream

Adds support for defining a variable number of poll queues, currently
configurable with the 'poll_queues' module parameter. Defaults to
a single poll queue.

And now we finally have poll support without triggering interrupts!

Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>

nvme-pci: check kstrtoint() return value in queue_count_set() · dcb5f45f

Authored by Bart Van Assche on Feb 14, 2019

to #28991349

commit e895fedf12dc0663a925b54eb0961fc927208097 upstream

This patch avoids that the compiler complains about 'ret' being set
but not being used when building with W=1.

Fixes: 3b6592f70ad7 ("nvme: utilize two queue maps, one for reads and one for writes") # v5.0-rc1
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>

nvme: utilize two queue maps, one for reads and one for writes · 850bbb5a

Authored by Jens Axboe on Oct 31, 2018

to #28991349

commit 3b6592f70ad7b4c24dd3eb2ac9bbe3353d02c992 upstream

NVMe does round-robin between queues by default, which means that
sharing a queue map for both reads and writes can be problematic
in terms of read servicing. It's much easier to flood the queue
with writes and reduce the read servicing.

Implement two queue maps, one for reads and one for writes. The
write queue count is configurable through the 'write_queues'
parameter.

By default, we retain the previous behavior of having a single
queue set, shared between reads and writes. Setting 'write_queues'
to a non-zero value will create two queue sets, one for reads and
one for writes, the latter using the configurable number of
queues (hardware queue counts permitting).

Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
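Illustration for the commit above: a small sketch of how a `write_queues` setting could divide a fixed number of IO queues into a read set and a write set. The struct, the function name and the clamping rule (always leave at least one read queue) are assumptions made for this example; they are one reading of "hardware queue counts permitting", not a copy of the driver logic.

```c
#include <assert.h>

/* Hypothetical result type: a write count of 0 means reads and writes
 * share the single queue set counted in read_queues. */
struct model_queue_split {
    unsigned int read_queues;
    unsigned int write_queues;
};

struct model_queue_split model_split_queues(unsigned int nr_io_queues,
                                            unsigned int write_queues)
{
    struct model_queue_split split = { 0, 0 };

    assert(nr_io_queues >= 1);

    /* With a single queue there is nothing to split. */
    if (nr_io_queues == 1) {
        split.read_queues = 1;
        return split;
    }

    /* Assumed clamp: never let writes take every queue. */
    if (write_queues >= nr_io_queues)
        write_queues = nr_io_queues - 1;

    split.write_queues = write_queues;
    split.read_queues = nr_io_queues - write_queues;
    return split;
}
```

With 8 IO queues, `write_queues = 0` keeps one shared set of 8, while `write_queues = 3` yields 5 read queues and 3 write queues.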

blk-mq: abstract out queue map · f63859ea

Authored by Jens Axboe on Oct 29, 2018

to #28991349

commit ed76e329d74a4b15ac0f5fd3adbd52ec0178a134 upstream

This is in preparation for allowing multiple sets of maps per
queue, if so desired.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>

29 Jun, 2020 1 commit

nvme: implement mq_ops->commit_rqs() hook · a2724bab

Authored by Jens Axboe on Nov 29, 2018

fix #28871358

commit 04f3eafda6e05adc56afed4d3ae6e24aaa429058 upstream

Split the command submission and the SQ doorbell ring, and add the
doorbell ring as our ->commit_rqs() hook. This allows a list of
requests to be issued, with nvme only writing the SQ update when
it's necessary. This is more efficient if we have lists of requests
to issue, particularly on virtualized hardware, where writing the
SQ doorbell is more expensive than on real hardware. For those cases,
performance increases of 2-3x have been observed.

The use case for this is plugged IO, where blk-mq flushes a batch of
requests at a time.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
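Illustration for the commit above: a user-space model of separating "place a command in the submission queue" from "ring the doorbell". All names are hypothetical; the doorbell is a plain struct field standing in for an MMIO register, and the `last` flag is an assumed way for the caller to say that no further requests follow in the batch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical submission queue state. */
struct model_sq {
    uint16_t tail;                /* next free slot */
    uint16_t last_tail;           /* tail value last written to the doorbell */
    uint16_t depth;               /* number of slots in the ring */
    uint16_t doorbell;            /* simulated doorbell register */
    unsigned int doorbell_writes; /* counts simulated MMIO writes */
};

/* Write the doorbell only if the device has not seen the current tail. */
void model_sq_ring(struct model_sq *sq)
{
    if (sq->tail == sq->last_tail)
        return;
    sq->doorbell = sq->tail;
    sq->last_tail = sq->tail;
    sq->doorbell_writes++;
}

/* Queue one command; ring immediately only for the last one in a batch. */
void model_sq_submit(struct model_sq *sq, bool last)
{
    assert(sq->depth > 0);
    sq->tail = (uint16_t)((sq->tail + 1) % sq->depth);
    if (last)
        model_sq_ring(sq);
}

/* Models a commit hook: flush whatever was queued without a doorbell write. */
void model_sq_commit(struct model_sq *sq)
{
    model_sq_ring(sq);
}
```

Submitting three commands with `last == false` and then committing costs one simulated doorbell write instead of three, which is where the saving on virtualized hardware would come from.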

15 Jun, 2020 1 commit

nvme: retain split access workaround for capability reads · c2fe0cbc

Authored by Ard Biesheuvel on Oct 03, 2019

task #28557808

[ Upstream commit 3a8ecc935efabdad106b5e06d07b150c394b4465 ]

Commit 7fd8930f ("nvme: add a common helper to read Identify Controller data")
has re-introduced an issue that we have attempted to work around in the
past, in commit a310acd7 ("NVMe: use split lo_hi_{read,write}q").

The problem is that some PCIe NVMe controllers do not implement 64-bit
outbound accesses correctly, which is why the commit above switched
to using lo_hi_[read|write]q for all 64-bit BAR accesses occurring in
the code.

In the meantime, the NVMe subsystem has been refactored, and now calls
into the PCIe support layer for NVMe via a .reg_read64() method, which
fails to use lo_hi_readq(), and thus reintroduces the problem that the
workaround above aimed to address.

Given that, at the moment, .reg_read64() is only used to read the
capability register [which is known to tolerate split reads], let's
switch .reg_read64() to lo_hi_readq() as well.

This fixes a boot issue on some ARM boxes with NVMe behind a Synopsys
DesignWare PCIe host controller.

Fixes: 7fd8930f ("nvme: add a common helper to read Identify Controller data")
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
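Illustration for the commit above: a user-space sketch of a split 64-bit read, in which the value is assembled from two 32-bit accesses, low half first. The function name is hypothetical and the register is modeled as an ordinary array of two 32-bit words with the low word at the lower address; real MMIO would go through the kernel's I/O accessors rather than plain loads.

```c
#include <assert.h>
#include <stdint.h>

/* Read a 64-bit register as two 32-bit halves, low dword first. */
uint64_t model_lo_hi_read64(const volatile uint32_t *reg)
{
    assert(reg != 0);
    uint32_t low = reg[0];
    uint32_t high = reg[1];
    return (uint64_t)low | ((uint64_t)high << 32);
}
```

A device that mishandles a single 64-bit access only ever sees two 32-bit accesses here, which is the property the workaround relies on.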

17 Jan, 2020 1 commit

blk-mq: when polling for IO, look for any completion · d25f577c

Authored by Jens Axboe on Nov 26, 2018

commit 1052b8ac5282daf35df331edcbdb645839d17e6a upstream.

If we want to support async IO polling, then we have to allow finding
completions that aren't just for the one we are looking for. Always pass
in -1 to the mq_ops->poll() helper, and have that return how many events
were found in this poll loop.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Acked-by: Caspar Zhang <caspar@linux.alibaba.com>

15 Jan, 2020 1 commit

alinux: nvme-pci: Disable dicard zero-out functionality on Intel's P3600 NVMe disk drive · 2cde0dfb

Authored by Wenwei Tao on Dec 13, 2017

We found a huge performance loss on the Intel disk drive below when
the discard zero-out functionality is enabled on it. The issue was
found with an ext4 filesystem mounted on the disk drive while running
regular FIO testing. With it disabled, we no longer observe the
performance loss.

81:00.0 Non-Volatile memory controller: Intel Corporation \
             PCIe Data Center SSD (rev 01)

We therefore disable the discard zero-out functionality on the above
disk drive in order to regain the high performance that an NVMe disk
drive is supposed to provide.

Differential Revision: https://aone.alibaba-inc.com/code/D377540
Signed-off-by: Wenwei Tao <wenwei.tao@linux.alibaba.com>
Reviewed-by: Gavin Shan <shan.gavin@linux.alibaba.com>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Acked-by: Caspar Zhang <caspar@linux.alibaba.com>

01 Dec, 2019 2 commits

nvme-pci: fix conflicting p2p resource adds · fa7f1bce

Authored by Keith Busch on Oct 31, 2018

[ Upstream commit 9fe5c59ff6a1e5e26a39b75489a1420e7eaaf0b1 ]

The nvme pci driver had been adding its CMB resource to the P2P DMA
subsystem every time on a controller reset. This results in the
following warning:

    ------------[ cut here ]------------
    nvme 0000:00:03.0: Conflicting mapping in same section
    WARNING: CPU: 7 PID: 81 at kernel/memremap.c:155 devm_memremap_pages+0xa6/0x380
    ...
    Call Trace:
     pci_p2pdma_add_resource+0x153/0x370
     nvme_reset_work+0x28c/0x17b1 [nvme]
     ? add_timer+0x107/0x1e0
     ? dequeue_entity+0x81/0x660
     ? dequeue_entity+0x3b0/0x660
     ? pick_next_task_fair+0xaf/0x610
     ? __switch_to+0xbc/0x410
     process_one_work+0x1cf/0x350
     worker_thread+0x215/0x3d0
     ? process_one_work+0x350/0x350
     kthread+0x107/0x120
     ? kthread_park+0x80/0x80
     ret_from_fork+0x1f/0x30
    ---[ end trace f7ea76ac6ee72727 ]---
    nvme nvme0: failed to register the CMB

This patch fixes this by registering the CMB with P2P only once.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>

nvme-pci: fix hot removal during error handling · 305c262f

Authored by Keith Busch on Oct 15, 2018

[ Upstream commit cb4bfda62afa25b4eee3d635d33fccdd9485dd7c ]

A removal waits for the reset_work to complete. If a surprise removal
occurs around the same time as an error triggered controller reset, and
reset work happened to dispatch a command to the removed controller, the
command won't be recovered since the timeout work doesn't do anything
during error recovery. We wouldn't want to wait for timeout handling
anyway, so this patch fixes this by disabling the controller and killing
admin queues prior to syncing with the reset_work.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>

06 Sep, 2019 1 commit

nvme-pci: Fix async probe remove race · 4a982919

Authored by Keith Busch on Jul 29, 2019

[ Upstream commit bd46a90634302bfe791e93ad5496f98f165f7ae0 ]

Ensure the controller is not in the NEW state when nvme_probe() exits.
This will always allow a subsequent nvme_remove() to set the state to
DELETING, fixing a potential race between the initial asynchronous probe
and device removal.

Reported-by: Li Zhong <lizhongfs@gmail.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Sasha Levin <sashal@kernel.org>

26 Jul, 2019 2 commits

nvme-pci: set the errno on ctrl state change error · 762bba1b

Authored by Chaitanya Kulkarni on Jun 08, 2019

[ Upstream commit e71afda49335620e3d9adf56015676db33a3bd86 ]

This patch removes the confusing assignment of the variable result at
the time of declaration and sets the value in error cases next to the
places where the actual error is happening.

Here we also set the result value to -ENODEV when we fail at the final
ctrl state transition in nvme_reset_work(). Without this assignment
result will hold 0 from nvme_setup_io_queue() and on failure 0 will be
passed to nvme_remove_dead_ctrl() from the final state transition.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>

nvme-pci: properly report state change failure in nvme_reset_work · c876a665

Authored by Minwoo Im on Jun 09, 2019

[ Upstream commit cee6c269b016ba89c62e34d6bccb103ee2c7de4f ]

If the state change to NVME_CTRL_CONNECTING fails, the dmesg is going to
be like:

  [  293.689160] nvme nvme0: failed to mark controller CONNECTING
  [  293.689160] nvme nvme0: Removing after probe failure status: 0

Even though the first line indicates the situation, the second line
is misleading because a status of 0 normally means the previous
operation succeeded.

This patch makes it report the proper error value when it fails:

  [   25.932367] nvme nvme0: failed to mark controller CONNECTING
  [   25.932369] nvme nvme0: Removing after probe failure status: -16

This situation can be easily reproduced by:
  root@target:~# rmmod nvme && modprobe nvme && rmmod nvme

Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
