提交 · e7832cb48a654cd12b2bc9181b2f0ad49d526ac6 · openeuler / Kernel

30 8月, 2019 25 次提交

nvme: make fabrics command run on a separate request queue · e7832cb4

由 Sagi Grimberg 提交于 8月 02, 2019

We have a fundamental issue that fabric commands use the admin_q.
The reason is, that admin-connect, register reads and writes and
admin commands cannot be guaranteed ordering while we are running
controller resets.

For example, when we reset a controller we perform:
1. disable the controller
2. teardown the admin queue
3. re-establish the admin queue
4. enable the controller

In order to perform (3), we need to unquiesce the admin queue, however
we may have some admin commands that are already pending on the
quiesced admin_q and will immediate execute when we unquiesce it before
we execute (4). The host must not send admin commands to the controller
before enabling the controller.

To fix this, we have the fabric commands (admin connect and property
get/set, but not I/O queue connect) use a separate fabrics_q and make
sure to quiesce the admin_q before we disable the controller, and
unquiesce it only after we enable the controller.

This fixes the error prints from nvmet in a controller reset storm test:
kernel: nvmet: got cmd 6 while CC.EN == 0 on qid = 0
Which indicate that the host is sending an admin command when the
controller is not enabled.
Reviewed-by: NJames Smart <james.smart@broadcom.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NSagi Grimberg <sagi@grimberg.me>

e7832cb4

nvme-pci: Support shared tags across queues for Apple 2018 controllers · d38e9f04

由 Benjamin Herrenschmidt 提交于 8月 07, 2019

Another issue with the Apple T2 based 2018 controllers seem to be
that they blow up (and shut the machine down) if there's a tag
collision between the IO queue and the Admin queue.

My suspicion is that they use our tags for their internal tracking
and don't mix them with the queue id. They also seem to not like
when tags go beyond the IO queue depth, ie 128 tags.

This adds a quirk that marks tags 0..31 of the IO queue reserved
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: NMing Lei <ming.lei@redhat.com>
Acked-by: NKeith Busch <keith.busch@intel.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NSagi Grimberg <sagi@grimberg.me>

d38e9f04

nvme-pci: Add support for Apple 2018+ models · 66341331

由 Benjamin Herrenschmidt 提交于 8月 07, 2019

Based on reverse engineering and original patch by

Paul Pawlowski <paul@mrarm.io>

This adds support for Apple weird implementation of NVME in their
2018 or later machines. It accounts for the twice-as-big SQ entries
for the IO queues, and the fact that only interrupt vector 0 appears
to function properly.
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: NMinwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NSagi Grimberg <sagi@grimberg.me>

66341331

nvme-pci: Add support for variable IO SQ element size · c1e0cc7e

由 Benjamin Herrenschmidt 提交于 8月 07, 2019

The size of a submission queue element should always be 6 (64 bytes)
by spec.

However some controllers such as Apple's are not properly implementing
the standard and require a different size.

This provides the ground work for the subsequent quirks for these
controllers.
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: NMinwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NSagi Grimberg <sagi@grimberg.me>

c1e0cc7e

nvme-pci: Pass the queue to SQ_SIZE/CQ_SIZE macros · 8a1d09a6

由 Benjamin Herrenschmidt 提交于 8月 07, 2019

This will make it easier to handle variable queue entry sizes
later. No functional change.
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NMinwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: NSagi Grimberg <sagi@grimberg.me>

8a1d09a6

nvme: trace bio completion · 35fe0d12

由 Hannes Reinecke 提交于 7月 24, 2019

When native multipathing is enabled we cannot enable blktrace for
the underlying paths, so any completion is never traced.
Signed-off-by: NHannes Reinecke <hare@suse.com>
[fixed-up by Mikhail for non-multipath-build]
Signed-off-by: NMikhail Skorzhinskii <mskorzhinskiy@solarflare.com>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NSagi Grimberg <sagi@grimberg.me>

35fe0d12

nvme-multipath: fix ana log nsid lookup when nsid is not found · e01f91df

由 Anton Eidelman 提交于 8月 16, 2019

ANA log parsing invokes nvme_update_ana_state() per ANA group desc.
This updates the state of namespaces with nsids in desc->nsids[].

Both ctrl->namespaces list and desc->nsids[] array are sorted by nsid.
Hence nvme_update_ana_state() performs a single walk over ctrl->namespaces:
- if current namespace matches the current desc->nsids[n],
  this namespace is updated, and n is incremented.
- the process stops when it encounters the end of either
  ctrl->namespaces end or desc->nsids[]

In case desc->nsids[n] does not match any of ctrl->namespaces,
the remaining nsids following desc->nsids[n] will not be updated.
Such situation was considered abnormal and generated WARN_ON_ONCE.

However ANA log MAY contain nsids not (yet) found in ctrl->namespaces.
For example, lets consider the following scenario:
- nvme0 exposes namespaces with nsids = [2, 3] to the host
- a new namespace nsid = 1 is added dynamically
- also, a ANA topology change is triggered
- NS_CHANGED aen is generated and triggers scan_work
- before scan_work discovers nsid=1 and creates a namespace, a NOTICE_ANA
  aen was issues and ana_work receives ANA log with nsids=[1, 2, 3]

Result: ana_work fails to update ANA state on existing namespaces [2, 3]

Solution:
Change the way nvme_update_ana_state() namespace list walk
checks the current namespace against desc->nsids[n] as follows:
a) ns->head->ns_id < desc->nsids[n]: keep walking ctrl->namespaces.
b) ns->head->ns_id == desc->nsids[n]: match, update the namespace
c) ns->head->ns_id >= desc->nsids[n]: skip to desc->nsids[n+1]

This enables correct operation in the scenario described above.
This also allows ANA log to contain nsids currently invisible
to the host, i.e. inactive nsids.
Signed-off-by: NAnton Eidelman <anton@lightbitslabs.com>
Reviewed-by: NJames Smart <james.smart@broadcom.com>
Reviewed-by: NHannes Reinecke <hare@suse.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NSagi Grimberg <sagi@grimberg.me>

e01f91df

nvmet-tcp: Add TOS for tcp transport · 89275a96

由 Israel Rukshin 提交于 8月 18, 2019

Set the outgoing packets type of service (TOS) according to the
receiving TOS.
Signed-off-by: NIsrael Rukshin <israelr@mellanox.com>
Suggested-by: NSagi Grimberg <sagi@grimberg.me>
Reviewed-by: NMax Gurtovoy <maxg@mellanox.com>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NSagi Grimberg <sagi@grimberg.me>

89275a96

nvme-tcp: Add TOS for tcp transport · bb13985d

由 Israel Rukshin 提交于 8月 18, 2019

TOS provide clients the ability to segregate traffic flows for
different type of data.
One of the TOS usage is bandwidth management which allows setting bandwidth
limits for QoS classes, e.g. 80% bandwidth to controllers at QoS class A
and 20% to controllers at QoS class B.

usage examples:
nvme connect --tos=0 --transport=tcp --traddr=10.0.1.1 --nqn=test-nvme
Signed-off-by: NIsrael Rukshin <israelr@mellanox.com>
Reviewed-by: NMax Gurtovoy <maxg@mellanox.com>
Signed-off-by: NSagi Grimberg <sagi@grimberg.me>

bb13985d

nvme-tcp: Use struct nvme_ctrl directly · 9924b030

由 Israel Rukshin 提交于 8月 18, 2019

This patch doesn't change any functionality.
Signed-off-by: NIsrael Rukshin <israelr@mellanox.com>
Reviewed-by: NMax Gurtovoy <maxg@mellanox.com>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NSagi Grimberg <sagi@grimberg.me>

9924b030

nvme-rdma: Add TOS for rdma transport · e63440d6

由 Israel Rukshin 提交于 8月 18, 2019

For RDMA transports, TOS is an extension of IB QoS to provide clients
the ability to segregate traffic flows for different type of data.
RDMA CM abstract it for ULPs using rdma_set_service_type().
Internally, each traffic flow is represented by a connection with all of
its independent resources like that of a normal connection, and is
differentiated by service type. In other words, there can be multiple qp
connections between an IP pair and each supports a unique service type.

One of the TOS usage is bandwidth management which allows setting bandwidth
limits for QoS classes, e.g. 80% bandwidth to controllers at QoS class A
and 20% to controllers at QoS class B.

Note: In addition to the TOS configuration, QOS must be configured on the
relevant HCA on the target (send RDMA commands) and initiator to effect
the traffic.

usage examples:
nvme connect --tos=0 --transport=rdma --traddr=10.0.1.1 --nqn=test-nvme
Signed-off-by: NIsrael Rukshin <israelr@mellanox.com>
Reviewed-by: NMax Gurtovoy <maxg@mellanox.com>
Signed-off-by: NSagi Grimberg <sagi@grimberg.me>

e63440d6

nvme-fabrics: Add type of service (TOS) configuration · 52b4451a

由 Israel Rukshin 提交于 8月 18, 2019

TOS is user-defined and needs to be configured via nvme-cli.
It must be set before initiating any traffic and once set the TOS
cannot be changed.
Signed-off-by: NIsrael Rukshin <israelr@mellanox.com>
Reviewed-by: NMax Gurtovoy <maxg@mellanox.com>
Signed-off-by: NSagi Grimberg <sagi@grimberg.me>

52b4451a

nvmet-tcp: fix possible memory leak · 35d1a938

由 Sagi Grimberg 提交于 8月 02, 2019

when we uninit a command in error flow we also need to
free an iovec if it was allocated.
Reviewed-by: NMax Gurtovoy <maxg@mellanox.com>
Signed-off-by: NSagi Grimberg <sagi@grimberg.me>

35d1a938

nvmet-tcp: fix possible NULL deref · b6272007

由 Sagi Grimberg 提交于 8月 02, 2019

We must only call sgl_free for sgl that we actually
allocated.
Signed-off-by: NSagi Grimberg <sagi@grimberg.me>

b6272007

nvmet: trace: parse Get LBA Status command in detail · 42df26d4

由 Minwoo Im 提交于 8月 04, 2019

Four different fields are in CDWs of Get LBA Status command which means
it would be great if we can see in detail when tracing in target side
also.
Signed-off-by: NMinwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: NSagi Grimberg <sagi@grimberg.me>

42df26d4

nvme: trace: parse Get LBA Status command in detail · 177b06ed

由 Minwoo Im 提交于 8月 04, 2019

Four different fields are in CDWs of Get LBA Status command which means
it would be great if we can see in detail when tracing.
Signed-off-by: NMinwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: NSagi Grimberg <sagi@grimberg.me>

177b06ed

nvmet: fix data units read and written counters in SMART log · 3bec2e37

由 Tom Wu 提交于 8月 08, 2019

In nvme spec 1.3 there is a definition for data write/read counters
from SMART log, (See section 5.14.1.2):
	This value is reported in thousands (i.e., a value of 1
	corresponds to 1000 units of 512 bytes read) and is rounded up.

However, in nvme target where value is reported with actual units,
but not thousands of units as the spec requires.
Signed-off-by: NTom Wu <tomwu@mellanox.com>
Reviewed-by: NIsrael Rukshin <israelr@mellanox.com>
Reviewed-by: NMax Gurtovoy <maxg@mellanox.com>
Reviewed-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NSagi Grimberg <sagi@grimberg.me>

3bec2e37

nvme-tcp: support simple polling · 1a9460ce

由 Sagi Grimberg 提交于 7月 03, 2019

Simple polling support via socket busy_poll interface.
Although we do not shutdown interrupts but simply hammer
the socket poll, we can sometimes find completions faster
than the normal interrupt driven RX path.

We add per queue nr_cqe counter that resets every time
RX path is invoked such that .poll callback can return it
to stay consistent with the semantics.
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NSagi Grimberg <sagi@grimberg.me>

1a9460ce

nvme: tcp: selects CRYPTO_CRC32C for nvme-tcp · 79fd751d

由 Minwoo Im 提交于 7月 14, 2019

The tcp host module is now taking those APIs from crypto ahash:
	(1) crypto_ahash_final()
	(2) crypto_ahash_digest()
	(3) crypto_alloc_ahash()

nvme-tcp should depends on CRYPTO_CRC32C.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Keith Busch <kbusch@kernel.org>
Cc: Jens Axboe <axboe@fb.com>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NMinwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: NSagi Grimberg <sagi@grimberg.me>

79fd751d

nvme: don't pass cap to nvme_disable_ctrl · b5b05048

由 Sagi Grimberg 提交于 7月 22, 2019

All seem to call it with ctrl->cap so no need to pass it
at all.
Reviewed-by: NMinwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NSagi Grimberg <sagi@grimberg.me>

b5b05048

nvme: move sqsize setting to the core · c0f2f45b

由 Sagi Grimberg 提交于 7月 22, 2019

nvme_enable_ctrl reads the cap register right after, so
no need to do that locally in the transport driver. Have
sqsize setting in nvme_init_identify.
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NSagi Grimberg <sagi@grimberg.me>

c0f2f45b

S
nvme-pci: set ctrl sqsize to the device q_depth · aa22c8e6
由 Sagi Grimberg 提交于 8月 22, 2019
```
Align with what the rest of the transports are doing.
Signed-off-by: NSagi Grimberg <sagi@grimberg.me>
```
aa22c8e6

nvme: have nvme_init_identify set ctrl->cap · 4fba4458

由 Sagi Grimberg 提交于 7月 22, 2019

No need to use a stack cap variable.
Reviewed-by: NMinwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NSagi Grimberg <sagi@grimberg.me>

4fba4458

nvme-tcp: Use protocol specific operations while reading socket · 10407ec9

由 Potnuri Bharat Teja 提交于 7月 08, 2019

Using socket specific read_sock() calls instead of directly calling
tcp_read_sock() helps lld module registered handlers if any, to be called
from nvme-tcp host.
This patch therefore replaces the tcp_read_sock() with socket specific
prot_ops.
Signed-off-by: NPotnuri Bharat Teja <bharat@chelsio.com>
Acked-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NSagi Grimberg <sagi@grimberg.me>

10407ec9

nvme-tcp: cleanup nvme_tcp_recv_pdu · 6be18260

由 Sagi Grimberg 提交于 7月 19, 2019

Can return directly in the switch statement
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NSagi Grimberg <sagi@grimberg.me>

6be18260

28 8月, 2019 3 次提交

raid5 improve too many read errors msg by adding limits · 0009fad0

由 Nigel Croxon 提交于 8月 21, 2019

Often limits can be changed by admin. When discussing such things
it helps if you can provide "self-sustained" facts. Also
sometimes the admin thinks he changed a limit, but it did not
take effect for some reason or he changed the wrong thing.

V3: Only pr_warn when Faulty is 0.
V2: Add read_errors value to pr_warn.
Signed-off-by: NNigel Croxon <ncroxon@redhat.com>
Signed-off-by: NSong Liu <songliubraving@fb.com>

0009fad0

md: don't report active array_state until after revalidate_disk() completes. · 9d4b45d6

由 NeilBrown 提交于 8月 20, 2019

Until revalidate_disk() has completed, the size of a new md array will
appear to be zero.
So we shouldn't report, through array_state, that the array is active
until that time.
udev rules check array_state to see if the array is ready.  As soon as
it appear to be zero, fsck can be run.  If it find the size to be
zero, it will fail.

So add a new flag to provide an interlock between do_md_run() and
array_state_show().  This flag is set while do_md_run() is active and
it prevents array_state_show() from reporting that the array is
active.

Before do_md_run() is called, ->pers will be NULL so array is
definitely not active.
After do_md_run() is called, revalidate_disk() will have run and the
array will be completely ready.

We also move various sysfs_notify*() calls out of md_run() into
do_md_run() after MD_NOT_READY is cleared.  This ensure the
information is ready before the notification is sent.

Prior to v4.12, array_state_show() was called with the
mddev->reconfig_mutex held, which provided exclusion with do_md_run().

Note that MD_NOT_READY cleared twice.  This is deliberate to cover
both success and error paths with minimal noise.

Fixes: b7b17c9b ("md: remove mddev_lock() from md_attr_show()")
Cc: stable@vger.kernel.org (v4.12++)
Signed-off-by: NNeilBrown <neilb@suse.com>
Signed-off-by: NSong Liu <songliubraving@fb.com>

9d4b45d6

md: only call set_in_sync() when it is expected to succeed. · 480523fe

由 NeilBrown 提交于 8月 20, 2019

Since commit 4ad23a97 ("MD: use per-cpu counter for
writes_pending"), set_in_sync() is substantially more expensive: it
can wait for a full RCU grace period which can be 10s of milliseconds.

So we should only call it when the cost is justified.

md_check_recovery() currently calls set_in_sync() every time it finds
anything to do (on non-external active arrays).  For an array
performing resync or recovery, this will be quite often.
Each call will introduce a delay to the md thread, which can noticeable
affect IO submission latency.

In md_check_recovery() we only need to call set_in_sync() if
'safemode' was non-zero at entry, meaning that there has been not
recent IO.  So we save this "safemode was nonzero" state, and only
call set_in_sync() if it was non-zero.

This measurably reduces mean and maximum IO submission latency during
resync/recovery.
Reported-and-tested-by: NJack Wang <jinpu.wang@cloud.ionos.com>
Fixes: 4ad23a97 ("MD: use per-cpu counter for writes_pending")
Cc: stable@vger.kernel.org (v4.12+)
Signed-off-by: NNeilBrown <neilb@suse.com>
Signed-off-by: NSong Liu <songliubraving@fb.com>

480523fe

24 8月, 2019 1 次提交

null_blk: fix inline misuse · 38b4e09f

由 Jens Axboe 提交于 8月 23, 2019

You can't magically mark a function inline and expect that to work.

Fixes: fceb5d1b ("null_blk: create a helper for zoned devices")
Signed-off-by: NJens Axboe <axboe@kernel.dk>

38b4e09f

23 8月, 2019 6 次提交

null_blk: create a helper for req completion · a3d7d674

由 Chaitanya Kulkarni 提交于 8月 22, 2019

This patch creates a helper function for handling the request
completion in the null_handle_cmd().
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

a3d7d674

null_blk: create a helper for zoned devices · fceb5d1b

由 Chaitanya Kulkarni 提交于 8月 22, 2019

This patch creates a helper function for handling zoned block device
operations.

This patch also restructured the code for null_blk_zoned.c and uses the
pattern to return blk_status_t and catch the error in the function
null_handle_cmd() into cmd->error variable instead of setting it up in
the deeper layer just like the way it is done for flush, badblocks and
memory backed case in the null_handle_cmd(). We also move
null_handle_zoned() to the null_blk_zoned.c to keep the zoned code
separate.
Signed-off-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

fceb5d1b

null_blk: create a helper for mem-backed ops · 7ea88e22

由 Chaitanya Kulkarni 提交于 8月 22, 2019

This patch creates a helper for handling requests when null_blk is
memory backed in the null_handle_cmd(). Although the helper is very
simple right now, it makes the code flow consistent with the rest of
code in the null_handle_cmd() and provides a uniform code structure
for future code.
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

7ea88e22

null_blk: create a helper for badblocks · 8f94d1c1

由 Chaitanya Kulkarni 提交于 8月 22, 2019

This patch creates a helper for handling badblocks code in the
null_handle_cmd().
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

8f94d1c1

null_blk: create a helper for throttling · adb84284

由 Chaitanya Kulkarni 提交于 8月 22, 2019

This patch creates a helper for handling throttling code in the
null_handle_cmd().
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

adb84284

null_blk: move duplicate code to callers · d4b186ed

由 Chaitanya Kulkarni 提交于 8月 22, 2019

This is a preparation patch which moves the duplicate code for sectors
and nr_sectors calculations for bio vs request mode into their
respective callers (null_queue_bio(), null_qeueue_req()). Now the core
function only deals with the respective actions and commands instead of
having to calculte the bio vs req operations and different sector
related variables. We also move the flush command handling at the top
which significantly simplifies the rest of the code.
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

d4b186ed

21 8月, 2019 5 次提交

nbd: fix max number of supported devs · e9e006f5

由 Mike Christie 提交于 8月 04, 2019

This fixes a bug added in 4.10 with commit:

commit 9561a7ad
Author: Josef Bacik <jbacik@fb.com>
Date:   Tue Nov 22 14:04:40 2016 -0500

    nbd: add multi-connection support

that limited the number of devices to 256. Before the patch we could
create 1000s of devices, but the patch switched us from using our
own thread to using a work queue which has a default limit of 256
active works.

The problem is that our recv_work function sits in a loop until
disconnection but only handles IO for one connection. The work is
started when the connection is started/restarted, but if we end up
creating 257 or more connections, the queue_work call just queues
connection257+'s recv_work and that waits for connection 1 - 256's
recv_work to be disconnected and that work instance completing.

Instead of reverting back to kthreads, this has us allocate a
workqueue_struct per device, so we can block in the work.

Cc: stable@vger.kernel.org
Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
Signed-off-by: NMike Christie <mchristi@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

e9e006f5

nbd: fix zero cmd timeout handling v2 · 2da22da5

由 Mike Christie 提交于 8月 13, 2019

This fixes a regression added in 4.9 with commit:

commit 0eadf37a
Author: Josef Bacik <jbacik@fb.com>
Date:   Thu Sep 8 12:33:40 2016 -0700

    nbd: allow block mq to deal with timeouts

where before the patch userspace would set the timeout to 0 to disable
it. With the above patch, a zero timeout tells the block layer to use
the default value of 30 seconds. For setups where commands can take a
long time or experience transient issues like network disruptions this
then results in IO errors being sent to the application.

To fix this, the patch still uses the common block layer timeout
framework, but if zero is set, nbd just logs a message and then resets
the timer when it expires.
Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
Signed-off-by: NMike Christie <mchristi@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

2da22da5

nbd: add missing config put · 887e975c

由 Mike Christie 提交于 8月 13, 2019

Fix bug added with the patch:

commit 8f3ea359
Author: Josef Bacik <josef@toxicpanda.com>
Date:   Mon Jul 16 12:11:35 2018 -0400

    nbd: handle unexpected replies better

where if the timeout handler runs when the completion path is and we fail
to grab the mutex in the timeout handler we will leave a config reference
and cannot free the config later.
Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
Signed-off-by: NMike Christie <mchristi@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

887e975c

nbd: add function to convert blk req op to nbd cmd · 00514677

由 Mike Christie 提交于 8月 13, 2019

This adds a helper function to convert a block req op to a nbd cmd type.
It will be used in the last patch to log the type in the timeout
handler.
Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
Signed-off-by: NMike Christie <mchristi@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

00514677

nbd: add set cmd timeout helper · 55313e92

由 Mike Christie 提交于 8月 13, 2019

Add a helper to set the cmd timeout. It does not really do a lot now,
but will be more useful in the next patches.
Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
Signed-off-by: NMike Christie <mchristi@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

55313e92

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功