提交 · 82654b6b8ef8b93ee87a97fc562f87f081fc2f91 · openanolis / cloud-kernel

07 6月, 2017 1 次提交

nvme: fix hang in remove path · 82654b6b

由 Ming Lei 提交于 6月 02, 2017

We need to start admin queues too in nvme_kill_queues()
for avoiding hang in remove path[1].

This patch is very similar with 806f026f(nvme: use
blk_mq_start_hw_queues() in nvme_kill_queues()).

[1] hang stack trace
[<ffffffff813c9716>] blk_execute_rq+0x56/0x80
[<ffffffff815cb6e9>] __nvme_submit_sync_cmd+0x89/0xf0
[<ffffffff815ce7be>] nvme_set_features+0x5e/0x90
[<ffffffff815ce9f6>] nvme_configure_apst+0x166/0x200
[<ffffffff815cef45>] nvme_set_latency_tolerance+0x35/0x50
[<ffffffff8157bd11>] apply_constraint+0xb1/0xc0
[<ffffffff8157cbb4>] dev_pm_qos_constraints_destroy+0xf4/0x1f0
[<ffffffff8157b44a>] dpm_sysfs_remove+0x2a/0x60
[<ffffffff8156d951>] device_del+0x101/0x320
[<ffffffff8156db8a>] device_unregister+0x1a/0x60
[<ffffffff8156dc4c>] device_destroy+0x3c/0x50
[<ffffffff815cd295>] nvme_uninit_ctrl+0x45/0xa0
[<ffffffff815d4858>] nvme_remove+0x78/0x110
[<ffffffff81452b69>] pci_device_remove+0x39/0xb0
[<ffffffff81572935>] device_release_driver_internal+0x155/0x210
[<ffffffff81572a02>] device_release_driver+0x12/0x20
[<ffffffff815d36fb>] nvme_remove_dead_ctrl_work+0x6b/0x70
[<ffffffff810bf3bc>] process_one_work+0x18c/0x3a0
[<ffffffff810bf61e>] worker_thread+0x4e/0x3b0
[<ffffffff810c5ac9>] kthread+0x109/0x140
[<ffffffff8185800c>] ret_from_fork+0x2c/0x40
[<ffffffffffffffff>] 0xffffffffffffffff

Fixes: c5552fde("nvme: Enable autonomous power state transitions")
Reported-by: NRakesh Pandit <rakesh@tuxera.com>
Tested-by: NRakesh Pandit <rakesh@tuxera.com>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NMing Lei <ming.lei@redhat.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

82654b6b

26 5月, 2017 2 次提交

nvme: only setup block integrity if supported by the driver · c81bfba9

由 Christoph Hellwig 提交于 5月 20, 2017

Currently only the PCIe driver supports metadata, so we should not claim
integrity support for the other drivers. This prevents nasty crashes
with targets that advertise metadata support on fabrics.

Also use the opportunity to factor out some code into a separate helper
that isn't even compiled if CONFIG_BLK_DEV_INTEGRITY is disabled.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NKeith Busch <keith.busch@intel.com>

c81bfba9

nvme: replace is_flags field in nvme_ctrl_ops with a flags field · d3d5b87d

由 Christoph Hellwig 提交于 5月 20, 2017

So that we can have more flags for transport-specific behavior.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NKeith Busch <keith.busch@intel.com>

d3d5b87d

23 5月, 2017 2 次提交

nvme: avoid to use blk_mq_abort_requeue_list() · 986f75c8

由 Ming Lei 提交于 5月 22, 2017

NVMe may add request into requeue list simply and not kick off the
requeue if hw queues are stopped. Then blk_mq_abort_requeue_list()
is called in both nvme_kill_queues() and nvme_ns_remove() for
dealing with this issue.

Unfortunately blk_mq_abort_requeue_list() is absolutely a
race maker, for example, one request may be requeued during
the aborting. So this patch just calls blk_mq_kick_requeue_list() in
nvme_kill_queues() to handle this issue like what nvme_start_queues()
does. Now all requests in requeue list when queues are stopped will be
handled by blk_mq_kick_requeue_list() when queues are restarted, either
in nvme_start_queues() or in nvme_kill_queues().

Cc: stable@vger.kernel.org
Reported-by: NZhang Yi <yizhan@redhat.com>
Reviewed-by: NKeith Busch <keith.busch@intel.com>
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: NMing Lei <ming.lei@redhat.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

986f75c8

nvme: use blk_mq_start_hw_queues() in nvme_kill_queues() · 806f026f

由 Ming Lei 提交于 5月 22, 2017

Inside nvme_kill_queues(), we have to start hw queues for
draining requests in sw queues, .dispatch list and requeue list,
so use blk_mq_start_hw_queues() instead of blk_mq_start_stopped_hw_queues()
which only run queues if queues are stopped, but the queues may have
been started already, for example nvme_start_queues() is called in reset work
function.

blk_mq_start_hw_queues() run hw queues in current context, instead
of running asynchronously like before. Given nvme_kill_queues() is
run from either remove context or reset worker context, both are fine
to run hw queue directly. And the mutex of namespaces_mutex isn't a
problem too becasue nvme_start_freeze() runs hw queue in this way
already.

Cc: stable@vger.kernel.org
Reported-by: NZhang Yi <yizhan@redhat.com>
Reviewed-by: NKeith Busch <keith.busch@intel.com>
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: NMing Lei <ming.lei@redhat.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

806f026f

25 4月, 2017 3 次提交

nvme: Add nvme_core.force_apst to ignore the NO_APST quirk · c35e30b4

由 Andy Lutomirski 提交于 4月 21, 2017

We're probably going to be stuck quirking APST off on an over-broad
range of devices for 4.11.  Let's make it easy to override the quirk
for testing.
Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

c35e30b4

nvme: Display raw APST configuration via DYNAMIC_DEBUG · fb0dc399

由 Andy Lutomirski 提交于 4月 21, 2017

Debugging APST is currently a bit of a pain.  This gives optional
simple log messages that describe the APST state.

The easiest way to use this is probably with the nvme_core.dyndbg=+p
module parameter.
Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

fb0dc399

nvme: Fix APST comment · 76e4ad09

由 Andy Lutomirski 提交于 4月 21, 2017

There was a typo in the description of the timeout heuristic.
Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

76e4ad09

21 4月, 2017 6 次提交

nvme: let dm-mpath distinguish nvme error codes · e02ab023

由 Junxiong Guan 提交于 4月 21, 2017

Currently most IOs which return the nvme error codes are retried on
the other path if those IOs returns EIO from NVMe driver. This
patch let Multipath distinguish nvme media error codes and some
generic or cmd-specific nvme error codes so that multipath will
not retry those kinds of IO, to save bandwidth.
Signed-off-by: NJunxiong Guan <guanjunxiong@huawei.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

e02ab023

nvme: Quirk APST off on "THNSF5256GPUK TOSHIBA" · be56945c

由 Andy Lutomirski 提交于 4月 20, 2017

There's a report that it malfunctions with APST on.

See https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1678184

Cc: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Signed-off-by: NJens Axboe <axboe@fb.com>

be56945c

nvme: Adjust the Samsung APST quirk · ff5350a8

由 Andy Lutomirski 提交于 4月 20, 2017

I got a couple more reports: the Samsung APST issues appears to
affect multiple 950-series devices in Dell XPS 15 9550 and Precision
5510 laptops.  Change the quirk: rather than blacklisting the
firmware on the first problematic SSD that was reported, disable
APST on all 144d:a802 devices if they're installed in the two
affected Dell models.  While we're at it, disable only the deepest
sleep state instead of all of them -- the reporters say that this is
sufficient to fix the problem.

(I have a device that appears to be entirely identical to one of the
affected devices, but I have a different Dell laptop, so it's not
the case that all Samsung devices with firmware BXW75D0Q are broken
under all circumstances.)

Samsung engineers have an affected system, and hopefully they'll
give us a better workaround some time soon.  In the mean time, this
should minimize regressions.

See https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1678184

Cc: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Signed-off-by: NJens Axboe <axboe@fb.com>

ff5350a8

blk-mq: remove the error argument to blk_mq_complete_request · 08e0029a

由 Christoph Hellwig 提交于 4月 20, 2017

Now that all drivers that call blk_mq_complete_requests have a
->complete callback we can remove the direct call to blk_mq_end_request,
as well as the error argument to blk_mq_complete_request.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: NBart Van Assche <Bart.VanAssche@sandisk.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

08e0029a

nvme: make nvme_error_status private · 65ba6b54

由 Christoph Hellwig 提交于 4月 20, 2017

Currently it's used by the lighnvm passthrough ioctl, but we'd like to make
it private in preparation of block layer specific error code.  Lighnvm already
returns the real NVMe status anyway, so I think we can just limit it to
returning -EIO for any status set.

This will need a careful audit from the lightnvm folks, though.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

65ba6b54

nvme: split nvme status from block req->errors · 27fa9bc5

由 Christoph Hellwig 提交于 4月 20, 2017

We want our own clearly defined error field for NVMe passthrough commands,
and the request errors field is going away in its current form.

Just store the status and result field in the nvme_request field from
hardirq completion context (using a new helper) and then generate a
Linux errno for the block layer only when we actually need it.

Because we can't overload the status value with a negative error code
for cancelled command we now have a flags filed in struct nvme_request
that contains a bit for this condition.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

27fa9bc5

09 4月, 2017 1 次提交

nvme: implement REQ_OP_WRITE_ZEROES · e850fd16

由 Christoph Hellwig 提交于 4月 05, 2017

But now for the real NVMe Write Zeroes yet, just to get rid of the
discard abuse for zeroing.  Also rename the quirk flag to be a bit
more self-explanatory.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: NHannes Reinecke <hare@suse.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

e850fd16

06 4月, 2017 4 次提交

nvme: move the retries count to struct nvme_request · 44e44b29

由 Christoph Hellwig 提交于 4月 05, 2017

The way NVMe uses this field is entirely different from the older
SCSI/BLOCK_PC usage, so move it into struct nvme_request.

Also reduce the size of the file to a unsigned char so that we leave
space for additional smaller fields that will appear soon.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NJens Axboe <axboe@fb.com>

44e44b29

nvme: mark nvme_max_retries static · 83f3aeb3

由 Christoph Hellwig 提交于 4月 05, 2017

Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NJens Axboe <axboe@fb.com>

83f3aeb3

nvme: cleanup nvme_req_needs_retry · f6324b1b

由 Christoph Hellwig 提交于 4月 05, 2017

Don't pass the status explicitly but derive it from the requeust,
and unwind the complex condition to be more readable.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NJens Axboe <axboe@fb.com>

f6324b1b

nvme: move ->retries setup to nvme_setup_cmd · 987f699a

由 Christoph Hellwig 提交于 4月 05, 2017

->retries is counting the number of times a command is resubmitted, and
be cleared on the first time we see the command.  We currently don't do
that for non-PCIe command, which is easily fixed by moving the setup
to common code.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NJens Axboe <axboe@fb.com>

987f699a

04 4月, 2017 1 次提交

nvme: factor request completion code into a common helper · 77f02a7a

由 Christoph Hellwig 提交于 3月 30, 2017

This avoids duplicating the logic four times, and it also allows to keep
some helpers static in core.c or just opencode them.

Note that this loses printing the aborted status on completions in the
PCI driver as that uses a data structure not available any more.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NJens Axboe <axboe@fb.com>

77f02a7a

02 4月, 2017 1 次提交

nvme: add missing byte swap in nvme_setup_discard · f1dd03a8

由 Christoph Hellwig 提交于 3月 31, 2017

Fixes: b35ba01e ("nvme: support ranged discard requests")
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NSagi Grimberg <sagi@grimberg.me>

f1dd03a8

29 3月, 2017 1 次提交

block: rename blk_mq_freeze_queue_start() · 1671d522

由 Ming Lei 提交于 3月 27, 2017

As the .q_usage_counter is used by both legacy and
mq path, we need to block new I/O if queue becomes
dead in blk_queue_enter().

So rename it and we can use this function in both
paths.
Reviewed-by: NBart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: NHannes Reinecke <hare@suse.com>
Signed-off-by: NMing Lei <tom.leiming@gmail.com>
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

1671d522

02 3月, 2017 1 次提交

nvme: Complete all stuck requests · 302ad8cc

由 Keith Busch 提交于 3月 01, 2017

If the nvme driver is shutting down its controller, the drievr will not
start the queues up again, preventing blk-mq's hot CPU notifier from
making forward progress.

To fix that, this patch starts a request_queue freeze when the driver
resets a controller so no new requests may enter. The driver will wait
for frozen after IO queues are restarted to ensure the queue reference
can be reinitialized when nvme requests to unfreeze the queues.

If the driver is doing a safe shutdown, the driver will wait for the
controller to successfully complete all inflight requests so that we
don't unnecessarily fail them. Once the controller has been disabled,
the queues will be restarted to force remaining entered requests to end
in failure so that blk-mq's hot cpu notifier may progress.
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NJens Axboe <axboe@fb.com>

302ad8cc

23 2月, 2017 6 次提交

nvme/core: Fix race kicking freed request_queue · f33447b9

由 Keith Busch 提交于 2月 10, 2017

If a namespace has already been marked dead, we don't want to kick the
request_queue again since we may have just freed it from another thread.
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NJens Axboe <axboe@fb.com>

f33447b9

nvme: Enable autonomous power state transitions · c5552fde

由 Andy Lutomirski 提交于 2月 07, 2017

NVMe devices can advertise multiple power states.  These states can
be either "operational" (the device is fully functional but possibly
slow) or "non-operational" (the device is asleep until woken up).
Some devices can automatically enter a non-operational state when
idle for a specified amount of time and then automatically wake back
up when needed.

The hardware configuration is a table.  For each state, an entry in
the table indicates the next deeper non-operational state, if any,
to autonomously transition to and the idle time required before
transitioning.

This patch teaches the driver to program APST so that each successive
non-operational state will be entered after an idle time equal to 100%
of the total latency (entry plus exit) associated with that state.
The maximum acceptable latency is controlled using dev_pm_qos
(e.g. power/pm_qos_latency_tolerance_us in sysfs); non-operational
states with total latency greater than this value will not be used.
As a special case, setting the latency tolerance to 0 will disable
APST entirely.  On hardware without APST support, the sysfs file will
not be exposed.

The latency tolerance for newly-probed devices is set by the module
parameter nvme_core.default_ps_max_latency_us.

In theory, the device can expose "default" APST table, but this
doesn't seem to function correctly on my device (Samsung 950), nor
does it seem particularly useful.  There is also an optional
mechanism by which a configuration can be "saved" so it will be
automatically loaded on reset.  This can be configured from
userspace, but it doesn't seem useful to support in the driver.

On my laptop, enabling APST seems to save nearly 1W.

The hardware tables can be decoded in userspace with nvme-cli.
'nvme id-ctrl /dev/nvmeN' will show the power state table and
'nvme get-feature -f 0x0c -H /dev/nvme0' will show the current APST
configuration.

This feature is quirked off on a known-buggy Samsung device.
Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NJens Axboe <axboe@fb.com>

c5552fde

nvme: Add a quirk mechanism that uses identify_ctrl · bd4da3ab

由 Andy Lutomirski 提交于 2月 22, 2017

Currently, all NVMe quirks are based on PCI IDs.  Add a mechanism to
define quirks based on identify_ctrl's vendor id, model number,
and/or firmware revision.
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Signed-off-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NJens Axboe <axboe@fb.com>

bd4da3ab

nvme: Use CNS as 8-bit field and avoid endianness conversion · 986994a2

由 Parav Pandit 提交于 1月 26, 2017

This patch defines CNS field as 8-bit field and avoids cpu_to/from_le
conversions.
Also initialize nvme_command cns value explicitly to NVME_ID_CNS_NS
for readability (don't rely on the fact that NVME_ID_CNS_NS = 0).
Reviewed-by: NMax Gurtovoy <maxg@mellanox.com>
Signed-off-by: NParav Pandit <parav@mellanox.com>
Reviewed-by: NKeith Busch <keith.busch@intel.com>
Signed-off-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NJens Axboe <axboe@fb.com>

986994a2

nvme: add semicolon in nvme_command setting · 778f067c

由 Max Gurtovoy 提交于 1月 26, 2017

Reviewed-by: NParav Pandit <parav@mellanox.com>
Signed-off-by: NMax Gurtovoy <maxg@mellanox.com>
Reviewed-by: NKeith Busch <keith.busch@intel.com>
Signed-off-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NJens Axboe <axboe@fb.com>

778f067c

nvme: Make controller state visible via sysfs · 8432bdb2

由 Sagi Grimberg 提交于 11月 28, 2016

Easier for debugging and testing state machine
transitions.
Signed-off-by: NSagi Grimberg <sagi@grimberg.me>
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

8432bdb2

18 2月, 2017 2 次提交

nvme: Check for Security send/recv support before issuing commands. · 8a9ae523

由 Scott Bauer 提交于 2月 17, 2017

We need to verify that the controller supports the security
commands before actually trying to issue them.
Signed-off-by: NScott Bauer <scott.bauer@intel.com>
[hch: moved the check so that we don't call into the OPAL code if not
      supported]
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

8a9ae523

block/sed-opal: allocate struct opal_dev dynamically · 4f1244c8

由 Christoph Hellwig 提交于 2月 17, 2017

Insted of bloating the containing structure with it all the time this
allocates struct opal_dev dynamically.  Additionally this allows moving
the definition of struct opal_dev into sed-opal.c.  For this a new
private data field is added to it that is passed to the send/receive
callback.  After that a lot of internals can be made private as well.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Tested-by: NScott Bauer <scott.bauer@intel.com>
Reviewed-by: NScott Bauer <scott.bauer@intel.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

4f1244c8

15 2月, 2017 1 次提交

Move stack parameters for sed_ioctl to prevent oversized stack with CONFIG_KASAN · e225c20e

由 Scott Bauer 提交于 2月 14, 2017

When CONFIG_KASAN is enabled, compilation fails:

block/sed-opal.c: In function 'sed_ioctl':
block/sed-opal.c:2447:1: error: the frame size of 2256 bytes is larger than 2048 bytes [-Werror=frame-larger-than=]

Moved all the ioctl structures off the stack and dynamically allocate
using _IOC_SIZE()

Fixes: 455a7b23 ("block: Add Sed-opal library")
Reported-by: NArnd Bergmann <arnd@arndb.de>
Signed-off-by: NScott Bauer <scott.bauer@intel.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

e225c20e

09 2月, 2017 1 次提交

nvme: support ranged discard requests · b35ba01e

由 Christoph Hellwig 提交于 2月 08, 2017

NVMe supports up to 256 ranges per DSM command, so wire up support
for ranged discards up to that limit.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

b35ba01e

07 2月, 2017 1 次提交

nvme: Add Support for Opal: Unlock from S3 & Opal Allocation/Ioctls · a98e58e5

由 Scott Bauer 提交于 2月 03, 2017

This patch implements the necessary logic to unlock an Opal
enabled device coming back from an S3.

The patch also implements the SED/Opal allocation necessary to support
the opal ioctls.
Signed-off-by: NScott Bauer <scott.bauer@intel.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

a98e58e5

01 2月, 2017 1 次提交

block: fold cmd_type into the REQ_OP_ space · aebf526b

由 Christoph Hellwig 提交于 1月 31, 2017

Instead of keeping two levels of indirection for requests types, fold it
all into the operations.  The little caveat here is that previously
cmd_type only applied to struct request, while the request and bio op
fields were set to plain REQ_OP_READ/WRITE even for passthrough
operations.

Instead this patch adds new REQ_OP_* for SCSI passthrough and driver
private requests, althought it has to add two for each so that we
can communicate the data in/out nature of the request.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

aebf526b

31 1月, 2017 1 次提交

lightnvm: add ioctls for vector I/Os · 84d4add7

由 Matias Bjørling 提交于 1月 31, 2017

Enable user-space to issue vector I/O commands through ioctls. To issue
a vector I/O, the ppa list with addresses is also required and must be
mapped for the controller to access.

For each ioctl, the result and status bits are returned as well, such
that user-space can retrieve the open-channel SSD completion bits.

The implementation covers the traditional use-cases of bad block
management, and vectored read/write/erase.
Signed-off-by: NMatias Bjørling <matias@cnexlabs.com>
Metadata implementation, test, and fixes.
Signed-off-by: NSimon A.F. Lund <slund@cnexlabs.com>
Signed-off-by: NMatias Bjørling <matias@cnexlabs.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

84d4add7

12 1月, 2017 1 次提交

nvme: apply DELAY_BEFORE_CHK_RDY quirk at probe time too · b5a10c5f

由 Guilherme G. Piccoli 提交于 12月 28, 2016

Commit 54adc010 ("nvme/quirk: Add a delay before checking for adapter
readiness") introduced a quirk to adapters that cannot read the bit
NVME_CSTS_RDY right after register NVME_REG_CC is set; these adapters
need a delay or else the action of reading the bit NVME_CSTS_RDY could
somehow corrupt adapter's registers state and it never recovers.

When this quirk was added, we checked ctrl->tagset in order to avoid
quirking in probe time, supposing we would never require such delay
during probe. Well, it was too optimistic; we in fact need this quirk
at probe time in some cases, like after a kexec.

In some experiments, after abnormal shutdown of machine (aka power cord
unplug), we booted into our bootloader in Power, which is a Linux kernel,
and kexec'ed into another distro. If this kexec is too quick, we end up
reaching the probe of NVMe adapter in that distro when adapter is in
bad state (not fully initialized on our bootloader). What happens next
is that nvme_wait_ready() is unable to complete, except if the quirk is
enabled.

So, this patch removes the original ctrl->tagset verification in order
to enable the quirk even on probe time.

Fixes: 54adc010 ("nvme/quirk: Add a delay before checking for adapter readiness")
Reported-by: NAndrew Byrne <byrneadw@ie.ibm.com>
Reported-by: NJaime A. H. Gomez <jahgomez@mx1.ibm.com>
Reported-by: NZachary D. Myers <zdmyers@us.ibm.com>
Signed-off-by: NGuilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
Acked-by: NJeffrey Lien <Jeff.Lien@wdc.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

b5a10c5f

21 12月, 2016 1 次提交

nvme: simplify stripe quirk · e6282aef

由 Keith Busch 提交于 12月 19, 2016

Some OEMs believe they own the Identify Controller vendor specific
region and will repurpose it with their own values. While not common,
we can't rely on the PCI VID:DID to tell use how to decode the field
we reserved for this as the stripe size so we need to do something else
for the list of devices using this quirk.

The field was supposed to allow flexibility on the device's back-end
striping, but it turned out that never materialized; the chunk is always
the same as MDTS in the products subscribing to this quirk, so this
patch removes the stripe_size field and sets the chunk to the max hw
transfer size for the devices using this quirk.
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

e6282aef

14 12月, 2016 1 次提交

Revert "nvme: add support for the Write Zeroes command" · cdb98c26

由 Linus Torvalds 提交于 12月 13, 2016

This reverts commit 6d31e3ba.

This causes bootup problems for me both on my laptop and my desktop.
What they have in common is that they have NVMe disks with dm-crypt, but
it's not the same controller, so it's not controller-specific.

Jens does not see it on his machine (also NVMe), so it's presumably
something that triggers just on bootup.  Possibly related to dm-crypt
and the fact that I mark my luks volume with "allow-discards" in
/etc/crypttab.

It's 100% repeatable for me, which made it fairly straightforward to
bisect the problem to this commit. Small mercies.

So we don't know what the reason is yet, but the revert is needed to get
things going again.
Acked-by: NJens Axboe <axboe@fb.com>
Cc: Chaitanya Kulkarni <chaitanya.kulkarni@hgst.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

cdb98c26

09 12月, 2016 1 次提交

block: improve handling of the magic discard payload · f9d03f96

由 Christoph Hellwig 提交于 12月 08, 2016

Instead of allocating a single unused biovec for discard requests, send
them down without any payload.  Instead we allow the driver to add a
"special" payload using a biovec embedded into struct request (unioned
over other fields never used while in the driver), and overloading
the number of segments for this case.

This has a couple of advantages:

 - we don't have to allocate the bio_vec
 - the amount of special casing for discard requests in the block
   layer is significantly reduced
 - using this same scheme for other request types is trivial,
   which will be important for implementing the new WRITE_ZEROES
   op on devices where it actually requires a payload (e.g. SCSI)
 - we can get rid of playing games with the request length, as
   we'll never touch it and completions will work just fine
 - it will allow us to support ranged discard operations in the
   future by merging non-contiguous discard bios into a single
   request
 - last but not least it removes a lot of code

This patch is the common base for my WIP series for ranges discards and to
remove discard_zeroes_data in favor of always using REQ_OP_WRITE_ZEROES,
so it would be good to get it in quickly.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

f9d03f96

openanolis / cloud-kernel 接近 2 年 前同步成功

openanolis / cloud-kernel
接近 2 年前同步成功